On a fossil from the blender 4.5.0 vulkan backend, this improves compile
times in nak by about 17%. Compile time of other shaders improves by a
more modest 1.2%.
No stat changes on shader-db.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36184>
src/intel/perf/xe/intel_perf.c:420:27: warning: implicit conversion from enumeration type 'enum drm_xe_eu_stall_property_id' to different enumeration type 'enum drm_xe_oa_property_id' [-Wenum-conversion]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37172>
When disassembling with BRW IR available (which happens in the
generator), there are pointers to the BRW basic block structures
that are used to print the block numbers and predecessors/successors
in the output.
There are two challenges:
- Because DO and FLOW instructions are not real instructions, they are
not emitted in the output but would still cause the output to contain
empty blocks. Previous code accounted for DO but still had problems.
- DO blocks have special physical links that don't make sense when the
DO is not emitted at the end, but they would be shown even if that
block was omitted.
These issues can be seen here (edited to remove non-essential bits)
```
START B0 (2 cycles)
mov(8) g126<1>UD 0x3f800000UD
END B0 ->B1
START B2 <-B1 <-B4 (0 cycles)
END B2 ->B3
START B3 <-B2 (260 cycles)
LABEL1:
mov(8) g1<1>D 0D
cmp.ge.f0.0(8) null<1>D g2<0,1,0>D 10D
sync nop(1) null<0,1,0>UB
send(1) g0UD g1UD nullUD
(+f0.0) break(8) JIP: LABEL0 UIP: LABEL0
END B3 ->B1 ->B5 ->B4
START B4 <-B3 (1000 cycles)
sync nop(1) null<0,1,0>UB
mov(8) g126<1>UD g0<0,1,0>UD
LABEL0:
while(8) JIP: LABEL1
END B4 ->B2
START B5 <-B1 <-B3 (20 cycles)
```
For example:
- Block 1 is missing (a skipped DO block)
- Block 2 is empty (it was a FLOW block)
- Block 3 ends with a link to Block 1 (the special links involving DO
blocks).
Two key changes were made to fix this. First, skip the DO and FLOW
blocks completely; use_tail ensures that the instruction group is
reused to avoid empty blocks. Second, when printing the successors and
predecessors, walk through the skipped blocks. In addition, don't print
the special blocks.
With the fix, here's the output. Note the blocks retain their original
BRW IR number.
```
START B0 (2 cycles)
mov(8) g127<1>UD 0x3f800000UD
END B0 ->B3
START B3 <-B0 <-B4 (260 cycles)
LABEL1:
mov(8) g1<1>D 0D
cmp.ge.f0.0(8) null<1>D g2<0,1,0>D 10D
sync nop(1) null<0,1,0>UB
send(1) g0UD g1UD nullUD
(+f0.0) break(8) JIP: LABEL0 UIP: LABEL0
END B3 ->B5 ->B4
START B4 <-B3 (1000 cycles)
sync nop(1) null<0,1,0>UB
mov(8) g127<1>UD g0<0,1,0>UD
LABEL0:
while(8) JIP: LABEL1
END B4 ->B3
START B5 <-B3 (20 cycles)
```
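The walk through skipped blocks described above can be sketched roughly
as follows. This is a minimal illustration, not the actual disassembler
code: `struct block`, `resolve_printed()` and `printed_successors()` are
hypothetical names, only logical successor links are modelled (the
special physical links of DO blocks are left out), and every skipped
block is assumed to have a single successor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUCCS 2

/* Hypothetical, simplified block type; not the real BRW IR structure. */
struct block {
   int num;
   bool skipped;                   /* true for DO and FLOW blocks */
   struct block *succs[MAX_SUCCS]; /* logical successors, NULL if unused */
};

/* Follow successor links until reaching a block that is actually printed.
 * Assumes every skipped block has exactly one successor, in succs[0].
 */
static struct block *
resolve_printed(struct block *b)
{
   while (b != NULL && b->skipped) {
      assert(b->succs[1] == NULL);
      b = b->succs[0];
   }
   return b;
}

/* Write the numbers of the printed successors of `b` into `out` and
 * return how many were written.
 */
static int
printed_successors(const struct block *b, int out[MAX_SUCCS])
{
   int count = 0;
   for (int i = 0; i < MAX_SUCCS; i++) {
      struct block *s = resolve_printed(b->succs[i]);
      if (s != NULL)
         out[count++] = s->num;
   }
   return count;
}
```

With B1 and B2 marked as skipped in a CFG shaped like the example, the
successor of both B0 and B4 resolves to B3, matching the output above.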
Issue was spotted by Ken.
Fixes: d2c39b1779 ("intel/brw: Always have a (non-DO) block after a DO in the CFG")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36226>
It always comes in through the create flags now.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36957>
We will use those where no associated shader is active but we still
need some default values programmed.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
Tessellation factors have to be written dynamically (based on the next
shader's primitive topology) and the builtins read using a dynamic
offset (based on the preceding shader's VUE).
Anv is updated to use this new infrastructure for dynamic
patch_control_points.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
Drivers can provide the inputs required for the backend to call the
compute function.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34872>
Partial results should be computed for all types of queries.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36916>
Implement HSD 16028171704/14025112257:
LSC state cache livelock: Once the state cache entries are full,
subsequent walker dispatches with two threads per thread group may
get stuck indefinitely because of a state cache livelock.
One thread is continuously stuck in a loop doing UGM fence + evict and
UGM read, waiting for the UGM read to return a certain value, while the
other thread is supposed to update the value that the first thread is
waiting for. But since the state cache entries are full, the second
thread never makes progress.
Closes: #12352
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37128>
The non-compute end flag should be INTEL_DS_TRACEPOINT_FLAG_END_OF_PIPE.
This fixes the broken anv utrace for anything non-compute that can
potentially overlap (execute in parallel).
Fixes: 6281b207db ("anv: add tracepoints timestamp mode for empty dispatches")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37155>
Layered surfaces (array textures) with video encode/decode usage bits
will have their slices aligned to make them addressable by the media
engine. Multi-planar layered surfaces will be stored with their slices
interleaved so that a relative offset can be programmed between the
luma and chroma slices.
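As a rough illustration of what interleaving buys, here is a minimal
sketch of a two-plane layout in which every layer stores an aligned luma
slice followed by an aligned chroma slice. The struct, the helper names
and the alignment value are all assumptions made for this example; they
are not the real surface layout code. The point is only that the
chroma-to-luma offset is the same for every layer.

```c
#include <assert.h>
#include <stdint.h>

/* Round `v` up to a multiple of `a`; `a` must be a power of two. */
static uint64_t
align_u64(uint64_t v, uint64_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

/* Hypothetical two-plane layout: each layer is an aligned luma slice
 * immediately followed by an aligned chroma slice.
 */
struct layered_layout {
   uint64_t luma_slice_size;   /* aligned size in bytes */
   uint64_t chroma_slice_size; /* aligned size in bytes */
};

static struct layered_layout
layout_init(uint64_t luma_size, uint64_t chroma_size, uint64_t slice_align)
{
   struct layered_layout l = {
      .luma_slice_size = align_u64(luma_size, slice_align),
      .chroma_slice_size = align_u64(chroma_size, slice_align),
   };
   return l;
}

/* Byte offset of the luma slice of `layer` from the start of the surface. */
static uint64_t
luma_offset(const struct layered_layout *l, uint32_t layer)
{
   return (uint64_t)layer * (l->luma_slice_size + l->chroma_slice_size);
}

/* Byte offset of the chroma slice of `layer`; it always sits exactly one
 * aligned luma slice after the luma slice of the same layer.
 */
static uint64_t
chroma_offset(const struct layered_layout *l, uint32_t layer)
{
   return luma_offset(l, layer) + l->luma_slice_size;
}
```

In this sketch the relative offset `chroma_offset() - luma_offset()` does
not depend on the layer index, which is what allows a single value to be
programmed for the whole array.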
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
Adds support for creating layered surfaces with slices that are addressable
to the media engine for video encoding and decoding.
Co-authored-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
Pre-rasterization stages need a CS stall if they need to wait on the
flushes from a PIPE_CONTROL.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37132>
This fixes graphics artifacts happening with a particular shader.
This 'heuristic' hits a few very similar shaders but should provide better
performance than the current fix of turning off caching for all shaders.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35929>
The Xe ioctl DRM_XE_DEVICE_QUERY_ENGINE_CYCLES provides accurate
timestamps correlated between the CPU and GPU. However, it is slow and
impacts performance while collecting Perfetto traces.
Instead, use Perfetto's GetBootTimeNs() to track when to emit the
BUILTIN_CLOCK_BOOTTIME clock sync event so it only occurs once per
second. This reduces the impact of recording gpu.renderstages from
about -21% to about -11%.
More concretely, FPS measurements when tracing Unity BoatAttack demo on
an Intel ADL device:
* gpu.renderstages disabled: 48.044293667
* gpu.renderstages enabled: 38.119778333 (-20.66%)
* gpu.renderstages enabled + this fix: 42.641818333 (-11.24%)
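The once-per-second throttling can be sketched as below. This is a
simplified illustration rather than the driver's code: `struct
clock_sync_state` and `should_emit_clock_sync()` are hypothetical names,
and `now_ns` is assumed to come from a cheap boot-time clock such as
Perfetto's GetBootTimeNs(). Only when the function returns true would
the caller perform the expensive engine cycles query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CLOCK_SYNC_PERIOD_NS UINT64_C(1000000000) /* 1 second */

/* Hypothetical state; the real driver keeps its own bookkeeping. */
struct clock_sync_state {
   uint64_t next_sync_ns; /* boot-time timestamp of the next allowed sync */
};

/* Return true when a clock sync event should be emitted at `now_ns`,
 * and schedule the next one a full period later.
 */
static bool
should_emit_clock_sync(struct clock_sync_state *state, uint64_t now_ns)
{
   if (now_ns < state->next_sync_ns)
      return false;

   state->next_sync_ns = now_ns + CLOCK_SYNC_PERIOD_NS;
   return true;
}
```

Starting with `next_sync_ns` at zero makes the first call emit a sync
immediately; later calls within the same second return false.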
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37095>
Apply any outstanding accumulated PC bits before we proceed with
building the acceleration structure.
There are two reasons for this:
- some of the data accessed by the build might need to be flushed
as a result of a previous barrier
- the scratch buffer might get reused between builds
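The ordering this enforces can be sketched as follows. All names here
(`struct cmd_state`, the `PIPE_BIT_*` values, `record_as_build()`) are
made up for the illustration and are not the driver's API; the sketch
only shows pending bits accumulated by earlier barriers being applied
before a build is recorded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flush bits; the names are illustrative only. */
enum pipe_bits {
   PIPE_BIT_DATA_CACHE_FLUSH = 1u << 0,
   PIPE_BIT_CS_STALL         = 1u << 1,
};

struct cmd_state {
   uint32_t pending_pipe_bits; /* accumulated by earlier barriers */
   uint32_t emitted_pipe_bits; /* stands in for flushes written so far */
   int builds_recorded;
};

/* A barrier only accumulates bits; nothing is emitted yet. */
static void
add_pending_pipe_bits(struct cmd_state *cmd, uint32_t bits)
{
   cmd->pending_pipe_bits |= bits;
}

/* Emit everything that has been accumulated and clear the pending set. */
static void
apply_pending_pipe_bits(struct cmd_state *cmd)
{
   cmd->emitted_pipe_bits |= cmd->pending_pipe_bits;
   cmd->pending_pipe_bits = 0;
}

/* Record a build, applying outstanding bits first so that data touched
 * by earlier work, and any reused scratch memory, is flushed.
 */
static void
record_as_build(struct cmd_state *cmd)
{
   apply_pending_pipe_bits(cmd);
   assert(cmd->pending_pipe_bits == 0);
   cmd->builds_recorded++;
}
```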
Cc: mesa-stable
Closes: #13711
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36951>
v2: Rebase on ac2b072312 ("brw: Add more specific brw_builder
helpers"), and fix a bug that caused the new instruction to possibly be
put in the wrong place.
No shader-db changes on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 233675305 -> 233641585 (-0.01%)
Cycle count: 32593658094 -> 32591467794 (-0.01%); split: -0.01%, +0.00%
Totals from 33513 (4.25% of 789264) affected shaders:
Instrs: 5200332 -> 5166612 (-0.65%)
Cycle count: 1499831128 -> 1497640828 (-0.15%); split: -0.15%, +0.00%
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>
Remove the cfg variables and use the shader pointers directly. Reset
the variant pointer if a shader failed or will not be used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>