In Fragment Shader, regular inputs are laid out in the thread payload
in a one dword per each half-GRF, that gives room for having the two
delta dwords needed for interpolation.
Per-primitive inputs are laid out before the regular inputs, and since
there's no need to have delta information, they are packed. So
half-GRF will be fully filled with 4 dwords of input.
When num_per_primitive_inputs is zero (the default case), behavior
should be the same as before.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
Instead of checking for MESA_SHADER_COMPUTE (and KERNEL). Where
appropriate, also use gl_shader_stage_is_compute().
This allows most of the workgroup-related lowering to be applied to
Task and Mesh shaders. These will be added later and "inherit" from
cs_prog_data structure.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13629>
Found this nugget while looking at the ACO driver. It seems sensible to
avoid SLM fences if there is no SLM. This also makes the check depend
on SLM usage rather than just shader stage which will be useful if we
ever implement task/mesh because task shaders also have SLM.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13092>
Start off making everything look like LSC where we have three types of
fences: TGM, UGM, and SLM. Then, emit the actual code in a generation-
aware way. There are three HW generation cases we care about:
XeHP+ (LSC), ICL-TGL, and IVB-SKL. Even though it looks like there's a
lot to deduplicate, it only increases the number of ubld.emit() calls
from 5 to 7 and entirely gets rid of the SFID juggling and other
weirdness we've introduced along the way to make those cases "general".
While we're here, also clean up the code for stalling after fences and
clearly document every case where we insert a stall.
There are only three known functional changes from this commit:
1. We now avoid the render cache fence on IVB if we don't need image
barriers.
2. On ICL+, we no longer unconditionally stall on barriers. We still
stall if we have more than one to help tie them together but
independent barriers are independent. Barrier instructions will
still operate in write-commit mode and still be scheduling barriers
but won't necessarily stall.
3. We now assert-fail for URB fences on LSC platforms. We'll be adding
in the new URB fence message for those platforms in a follow-on
commit.
It is a big enough refactor, however, that other minor changes may be
present.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13092>
v2: Very significant rebase on changes to previous commits.
Specifically, brw_fs_nir.cpp changes were pretty much rewritten from
scratch after changing the NIR opcode names and types.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
Program the producers/consumer fields for TCS Barrier messages.
Producer and consumer fields are set to number of TCS threads.
Ref: Bspec 54006 for Barrier Data Payload
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11963>
Rearrange if/else fragments to unify case for Gen11 or later
platforms. This will help the code look cleaner for adding
unified barrier support to TCS.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11963>
A lot of CTS tests write a u8vec4 or an i8vec4 to an SSBO. This results
in a lot of shifts and MOVs. When that pattern can be recognized, the
individual 8-bit components can be packed much more efficiently.
v2: Rebase on b4369de27f ("nir/lower_packing: use
shader_instructions_pass")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
Emitting the instructions one by one results in two MOV instructions
that won't be propagated. By handling both instructions at once, a
single MOV is emitted. For example, on Ice Lake this helps
dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag:
SIMD8 shader: 49 instructions. 1 loops. 4044 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 41 instructions. 1 loops. 3804 cycles. 0:0 spills:fills, 5 sends
Without "intel/fs: Allow copy propagation between MOVs of mixed sizes,"
the improvement is still 8 instructions, but there are more instructions
to begin with:
SIMD8 shader: 52 instructions. 1 loops. 4164 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 44 instructions. 1 loops. 3944 cycles. 0:0 spills:fills, 5 sends
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
In the vec4 compiler, 8-bit types should never exist.
In the scalar compiler, 8-bit types should only ever be able to exist on
Gfx ver 8 and 9.
Some instructions are handled in non-obvious ways.
Hopefully this will save the next person some time.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
This fixes a hang in the following piglit test when GCM moves a
UBO load outside of the loop.
tests/shaders/ssa/fs-if-def-else-break.shader_test
The end NIR ends up looking like this:
vec2 32 ssa_3 = intrinsic load_ubo (ssa_2, ssa_0) (0, 1073741824, 0, 0, 8)
vec1 32 ssa_4 = mov ssa_3.x
vec1 32 ssa_5 = inot ssa_3.y
/* succs: block_1 */
loop {
...
if ssa_5 { }
}
Fixes: 1edf67fc3f ("intel/fs: Generate if instructions with inverted conditions")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
It's intel-specific, used to get at MSAA compression information.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>
Be consistent with other usages in Vulkan and SPIR-V, and the recently
added workgroup_size field.
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
One exception is src/amd/addrlib/, for which -Wimplicit-fallthrough is
explicitly disabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
When a surface of type SURFTYPE_NULL is accessed by resinfo, the MIPCount
returned is undefined instead of 0.
Closes#4309
Fixes dEQP-VK.robustness.robustness2.*.sampled_image.*.null_descriptor.*
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10147>
The messages for those 16-bit operations still use 32-bit sources and
destinations, so expand them accordingly when building the payload.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8750>
This allows us to do bounds checked A64 block load without the it being
counted as control-flow by NIR. This means that NIR optimizations like
CSE will be able to work on these the same as a regular load.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8635>
We currently don't use push constants with the COMPUTE_WALKER command.
Make all uniforms to be pull constants.
The local group id previously was a push constant, but is now
available in R0.2[7:0].
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8342>