On a fossil from the blender 4.5.0 vulkan backend, this improves compile
times in nak by about 17%. Compile time of other shaders improves by a
more modest 1.2%.
No stat changes on shader-db.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36184>
When disassembling and BRW IR is available (which happens in the
generator), there will be pointers to the BRW's basic block structures
that are used to print the block numbers and predecessor/successors
in the output.
There are two challenges:
- Because DO and FLOW instructions are not real instructions, they are
not emitted in the output but would still cause the output to contain
empty blocks. Previous code accounted for DO but still had problems.
- DO blocks have special physical links that don't make sense when the
DO is not emitted at the end, but they would be shown even if that
block was omitted.
These issues can be seen here (edited to remove non-essential bits)
```
START B0 (2 cycles)
mov(8) g126<1>UD 0x3f800000UD
END B0 ->B1
START B2 <-B1 <-B4 (0 cycles)
END B2 ->B3
START B3 <-B2 (260 cycles)
LABEL1:
mov(8) g1<1>D 0D
cmp.ge.f0.0(8) null<1>D g2<0,1,0>D 10D
sync nop(1) null<0,1,0>UB
send(1) g0UD g1UD nullUD
(+f0.0) break(8) JIP: LABEL0 UIP: LABEL0
END B3 ->B1 ->B5 ->B4
START B4 <-B3 (1000 cycles)
sync nop(1) null<0,1,0>UB
mov(8) g126<1>UD g0<0,1,0>UD
LABEL0:
while(8) JIP: LABEL1
END B4 ->B2
START B5 <-B1 <-B3 (20 cycles)
```
For example:
- Block 1 is missing (a skipped DO block)
- Block 2 is empty (it was a FLOW block)
- Block 3 ends with a link to Block 1 (the special links involving DO
blocks).
Two key changes were made to fix this. First, skip the DO and FLOW
blocks completely. The use_tail ensures that the instruction group is
reused to avoid empty blocks. Second, when printing, the successors and
predecessors, walk through the skipped blocks. And finally, don't print
the special blocks.
With the fix, here's the output. Note the blocks retain their original
BRW IR number.
```
START B0 (2 cycles)
mov(8) g127<1>UD 0x3f800000UD
END B0 ->B3
START B3 <-B0 <-B4 (260 cycles)
LABEL1:
mov(8) g1<1>D 0D
cmp.ge.f0.0(8) null<1>D g2<0,1,0>D 10D
sync nop(1) null<0,1,0>UB
send(1) g0UD g1UD nullUD
(+f0.0) break(8) JIP: LABEL0 UIP: LABEL0
END B3 ->B5 ->B4
START B4 <-B3 (1000 cycles)
sync nop(1) null<0,1,0>UB
mov(8) g127<1>UD g0<0,1,0>UD
LABEL0:
while(8) JIP: LABEL1
END B4 ->B3
START B5 <-B3 (20 cycles)
```
Issue was spotted by Ken.
Fixes: d2c39b1779 ("intel/brw: Always have a (non-DO) block after a DO in the CFG")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36226>
Tessellation factors have to be written dynamically (based on the next
shader primitive topology) and the builtins read using a dynamic
offset (based on the preceeding shader's VUE).
Anv is updated to use this new infrastructure for dynamic
patch_control_points.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
Drivers can provide the inputs required for the backend to call the
compute function.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
v2: Rebase on ac2b072312 ("brw: Add more specific brw_builder
helpers"), and fix a bug that caused the new instruction to possibly be
put in the wrong place.
No shader-db changes on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 233675305 -> 233641585 (-0.01%)
Cycle count: 32593658094 -> 32591467794 (-0.01%); split: -0.01%, +0.00%
Totals from 33513 (4.25% of 789264) affected shaders:
Instrs: 5200332 -> 5166612 (-0.65%)
Cycle count: 1499831128 -> 1497640828 (-0.15%); split: -0.15%, +0.00%
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>
Remove the cfg variables and use the shader pointers directly. Reset
the variant pointer if a shader failed or will not be used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
The brw_shader::uniforms now is derived from the nir_shader. The
only exception is compute shaders for older Gfx versions, so we
move the adjust logic for that.
The benefit here is untangling the code for compilation variants,
that before needed to keep track of the first that compiled to,
in most cases, copy an integer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
And unify the initialization code for brw_shader. Avoid passing
brw_compile_params since for a single compilation we might have
multiple shaders (the case for BS stage).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
The problem with the current code is that there is a disconnect between :
- the virtual register size allocated
- the dispatch size
- the size_written value
Only the last 2 are in sync and this confuses the spiller that only
looks at the destination register allocation & dispatch size to figure
out how much to spill.
The solution in this change is to make BROADCAST more like
MOV_INDIRECT, so that you can do a BROADCAST(8) that actually reads a
SIMD32 register. We put the size of the register read into src2.
Now the spiller sees correct read/write sizes just looking at the
destination register & dispatch size.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 662339a2ff ("brw/build: Use SIMD8 temporaries in emit_uniformize")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13614
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36564>
Those are for push constants, no point in doing that because :
- there is no HW constant offsets in push constants (payload
delivery), it's just register offset calculation
- if we have an dynamic value it's already using MOV_INDIRECT
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e103afe7be ("brw: run the nir_opt_offsets pass and set the maximum offset size")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36958>
Instead of being encoded as a contiguous 64-bit mask of individual registers,
the robustness information is now encoded as a vector of up to 4 bytes that
represent the limits of each of the pushed UBO ranges in 16 byte units.
Some buggy Direct3D workloads are known to depend on a robustness alignment
as low as 16 bytes to work properly.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36455>
We can just place the set structures inside nir_block.
This reduces the number of ralloc calls by 6.7% when compiling Heaven
shaders with radeonsi+ACO using a release build (i.e. not including
nir_validate set allocations, which are also removed).
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36728>
Unfortunately we cannot use the indirect descriptor on Gfx11, it
appears to just drop writes. Other platforms appear to be fine.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36883>
Usually JIP will be valid, but as part of other changes, it will be
possible to have a shader that have multiple EOT messages and end with
and ENDIF instruction. Its JIP will point after the program ends.
This is fine but was tripping up the compaction code.
Change compaction to not read its internal structures beyond the last
instruction.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36822>
Move it before the new source is used. This currently works because all
instructions have a minimum amount of sources allocated, but a later commit
will change that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36817>
For example Xe2 uses the LSC and doesn´t need the shifting, so let's
just apply it where it's needed.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36757>
So that the driver can allocate an array of relocations using
BRW_SHADER_RELOC_EMBEDDED_SAMPLER_HANDLE + number_of_embedded_samplers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36757>
We can optimize the VUE layout in cases where all shaders are compiled
together and some outputs are unused. So we need to have consistent
clip/cull_distance_mask with the VUE.
Previously we could have a VUE without ClipDistance present in the
header and yet have a non zero clip_distance_mask. This would trip the
HW into taking into account a VUE field that doesn't exist.
Here we set the clip/cull_distance_mask to 0 if the associated output
is not written by the shader. The written outputs are always
consistent with what's in the VUE.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2d396f6085 ("intel: prepare VUE layout for more than 2 layouts")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13685
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36734>
For unaligned invocations, don't launch two COMPUTE_WALKER, instead we
can mask off excessive invocations in the shader itself at nir level and
launch one additional workgroup.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36245>