The value specified in pipe_compute_state is in addition to the implicit
value computed by NIR.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7047>
It's included in declaration of INTEL_DEBUG.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6732>
This makes it explicit that this intrinsic is only for SSBOs. For the
v3dv driver, we'll be adding a get_ubo_size intrinsic and we want to be
able to distinguish between the two.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6812>
Clover doesn't upload a cbuf0 but instead provides the kernel inputs as
part of the pipe_grid. The most obvious thing to do is to upload them
along with system values.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6280>
We were space-leaking iris_compiled_shader objects, leaving them around
basically forever - long after the associated iris_uncompiled_shader was
deleted. Perhaps even more importantly, this left the BO containing the
assembly referenced, meaning those were never reclaimed either. For
long running applications, this can leak quite a bit of memory.
Now, when freeing iris_uncompiled_shader, we hunt down any associated
iris_compiled_shader objects and pitch those (and their BO) as well.
One issue is that the shader variants can still be bound, because we
haven't done a draw that updates the compiled shaders yet. This can
cause issues because state changes want to look at the old program to
know what to flag dirty. It's a bit tricky to get right, so instead
we defer variant deletion until the shaders are properly unbound, by
stashing them on a "dead" list and tidying that each time we try and
delete some shader variants.
This ensures long running programs delete their shaders eventually.
Fixes: ed4ffb9715 ("iris: rework program cache interface")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6075>
We've hand-rolled this loop 10 places and those are just the ones I
found easily.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
Instead of having separate lists of variables, roughly sorted by mode,
use a single list for all shader-level NIR variables. This makes a few
list walks a bit longer here and there but list walks aren't a very
common thing in NIR at all. On the other hand, it makes a lot of things
like validation, printing, etc. way simpler. Also, there are a number
of cases where we move variables from inputs/outputs to globals and this
makes it way easier because we no longer have to move them between
lists. We only have to deal with that if moving them from the shader to
a nir_function_impl.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
We're nearly out of dirty bits, and some patches pending review on
GitLab no longer apply due to that. Make room for them by splitting
off shader stage-specific bits into a separate stage_dirty mask.
An alternative would be to split compute-related bits into a separate
mask, but that would prevent the '<< stage' indexing done in various
parts of the driver from working.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5279>
This was used to decide which SIMD width to generate code for
ARB_compute_variable_group_size. Now that compiler will generate
multiple SIMD widths, this information is unused.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5142>
For now this parameter doesn't do anything.
It means the implementation is allowed to use
a cache on disk.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4993>
The motivating factor is: this lowering may cause
nir_intrinsic_load_local_group_size intrinsics to be added to the
shader, and by moving this around we make possible for the drivers to
lower that intrinsic by themselves.
Iris will do just that in a later patch for implementing variable
group size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
This is a preparation for dropping this field since this value is
expected to be calculated by the drivers now for variable group size
case. And also the field would get in the way of brw_compile_cs
producing multiple SIMD variants (like FS).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
The push.total field had three values but only one was directly
used (size). Replace it with a helper function that explicitly takes
the cs_prog_data and the number of threads -- and use that in the
drivers.
This is a preparation for ARB_compute_variable_group_size where the
number of threads (hence the total size for push constants) is not
defined at compile time (not cs_prog_data->threads).
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
Change brw_compute_vue_map() to also take the number of pos slots. If
more than one slot is used, the VARYING_SLOT_POS is treated as an
array.
When using Primitive Replication, instead of a single position, the
VUE must contain an array of positions. Padding might be
necessary (after clip distance) to ensure rest of attributes start
aligned.
v2: Add note about array in the commit message and assert that
pos_slots >= 1 to make clear 0 is invalid. (Jason)
Move padding to be after the clip distance.
v3: Apply the correct offset when gathering the sources from outputs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>
Patch adds a new arg and modifies existing calls from i965, anv
pass NULL but iris stores this information for later use.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4080>
This is the same idea as "intel: fix the gen 11 compute shader scratch
IDs".
The number of EUs on TGL is not the same as ICL, but the
MEDIA_VFE_STATE restrictions stay the same, so adapt the code to it.
Also, consider the base configuration instead of what we read from the
Kernel.
According to Mark, this fixes the following piglit tests on TGL:
piglit.spec.arb_compute_shader.execution.shared-atomicmax-uint.tglm64
piglit.spec.arb_compute_shader.execution.shared-atomicmax-int.tglm64
piglit.spec.intel_shader_atomic_float_minmax.execution.shared-atomicmax-float.tglm64
v2: s/ICL+/Gen11+/ (Jason).
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
Scratch space allocation is based on the number of threads in the base
configuration, and we only have one base configuration for ICL, with 8
subslices.
This fixes an issue with Aztec on Vulkan in a machine with a
configuration that's not the base. The issue looks like a regression
from b9e93db208, but it seems things are broken since forever, just
not easily reproducible.
v2: Reimplement it using the subslices variable. Don't touch TGL.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
Now that it uses ISL rather than genxml code, there's no need for it to
live as a vtable function inside the state module. We can just make it
a static inline helper in iris_resource.h so it's available throughout
the codebase.
Fixes: a4da6008b6 ("iris: Use mocs from isl_dev.")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3720>
I was passing iris keys to brw_debug_key_recompile, leading to out of
bounds memory reads.
Fixes: 2e654db27a ("iris: Create smaller program keys without legacy features")
A lot of the brw_*_prog_key fields are for emulating features on legacy
hardware that iris doesn't support. In particular, all of the texture
swizzle fields take up a lot of space. These dead fields make hashing
the shader keys more expensive than it ought to be.
We introduce iris-specific keys with only the information we need, and
translate them to brw keys when actually compiling new variants. This
way, key comparisons can use the small keys. The size reductions are:
VS: 328 bytes -> 8 bytes
TCS: 312 bytes -> 24 bytes
TES: 304 bytes -> 24 bytes
GS: 284 bytes -> 8 bytes
FS: 304 bytes -> 16 bytes
CS: 280 bytes -> 4 bytes
Scores for the Piglit drawoverhead microbenchmark case with a shader
program change improve by roughly 30%.
Reviewed-by: Eric Anholt <eric@anholt.net>
In d1c4e64a69, we added a parameter to tell the back-end compiler to
ignore the param array and just push however many constants you ask it
to push. Iris doesn't want to push anything so it gives a bogus number
of parameters and trusts the back-end compiler to dead-code all of them.
Now that we can tell the back-end compiler to stop re-arranging things,
delete the hack and enable the new simpler code path.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
So nir_validate happens properly. Unfortunately this means we have
to play the metadata song and dance, so walk over all impls and say
that we didn't hurt anything.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When demoting it from an output to a global, we need to actually move
it to the correct list. While here, we also refactor so it's clear
we aren't mutating the list while iterating.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2106
Fixes: f9fd04aca1 ("nir: Fix non-determinism in lower_global_vars_to_local")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We were relying on specific pass ordering in st to avoid setting
inputs_read/outputs_written for edge flags. Instead, just assume
that it happens and throw out the results we don't want.
We should probably revisit this and try and add a vertex element
property like I originally wanted so we can avoid having it be
associated with the VS altogether.
This allows us to make sure clipdist is emitted as a scalar array rather
than two vec4s. This matches SPIR-V semantics, and will be useful for
Zink.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
so that drivers don't have to call nir_strip manually.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
This should help avoid stalls in the pixel mask array in certain
non-promoted depth cases. It especially helps for Z16, as each bit
in the PMA corresponds to two pixels when using Z16, as opposed to
the usual one pixel.
Improves performance in GFXBench5 TRex by 22% (n=1).
From the MEDIA_VFE_STATE docs:
"Starting with this configuration, the Maximum Number of Threads must
be set to (#EU * 8) for GPGPU dispatches.
Although there are only 7 threads per EU in the configuration, the
FFTID is calculated as if there are 8 threads per EU, which in turn
requires a larger amount of Scratch Space to be allocated by the
driver."
It's pretty clear that we need to increase this for scratch address
calculations, because the FFTID has a certain bit-pattern. The quote
above seems to indicate that we should increase the actual thread count
programmed in MEDIA_VFE_STATE as well, but we think the intention is to
only bump the scratch space.
Fixes GPU hangs in Bioshock Infinite and Synmark's CSDof on Icelake 8x8.
Fixes: 5ac804bd9a ("intel: Add a preliminary device for Ice Lake")
Reviewed-by: Matt Turner <mattst88@gmail.com>
If we can entirely push uniform data, we don't need a SURFACE_STATE
descriptor for pulling data. Since constant uploads are a very common
operation, and being able to push all data is also very common, we would
like to avoid the overhead in this case.
This patch defers uploading new descriptors. Instead of handling that
at iris_set_constant_buffer, we do it at iris_update_compiled_shaders,
where we can see the currently bound shader variants. If any need pull
descriptors, and descriptors are missing, we update them and flag that
the binding table also needs to be refreshed.
Improves performance in GFXBench5 gl_driver2 on an i7-6770HQ by
31.9774% +/- 1.12947% (n=15).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The compiler now sets the "Null Render Target" bit in the RT write
extended message descriptor, causing it to write to an implicit null
surface without us needing to set one up in the binding table.
Together with the last patch, this improves performance in Car Chase on
an Icelake 8x8 (locked to 700Mhz) by 0.0445526% +/- 0.0132736% (n=832).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Otherwise it's impossible to know the maximum SSBO index for both
internal TGSI shaders from TTN (which don't have any notion of atomic
counters and no offset) as well as shaders from GLSL.
I fixed everything I could find while grepping for num_ssbos and
num_abos, which hopefully is everything (iris was the only user I could
find that uses it in a meaningful way).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It's not used by anything anymore now that so much lowering has been
moved into NIR. Sadly, we still need on in brw_compile_gs() for
geometry shaders on Sandy Bridge. Short of a lot of pointless work,
that one's probably not going away.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>