Like accumulators and ARF address registers, the virtual address
registers are not tracked in a way the defs analysis can know
about. This could actually be fixed, but that is future work.
Fixes: b110b06447 ("brw: introduce a new register type for the address register")
Suggested-by: Lionel
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40083>
brw_reg::nr encodes both which ARF it is and which instance of that
ARF. In other words, nr for acc0 and acc2 have some bits that say
BRW_ARF_ACCUMULATOR and some bits that say 0 vs 2. The previous test
would only detect acc0.
Fixes: 0d144821f0 ("intel/brw: Add a new def analysis pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40083>
This can occur if NULL or an accumulator is an explicit destination.
update_for_reads still needs to process the sources.
v2: Pass a brw_reg to ::mark_invalid, and do the VGRF check in that one
place.
Fixes: 0d144821f0 ("intel/brw: Add a new def analysis pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40083>
Set max outstanding ray queries to 1024. This value can be tuned later
specific to apps.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40182>
Applications often miss emitting barriers between a shader
initializing data & another shader writing data in the same location
afterward. This is very common for UAVs (see vkd3d-proton).
Vkd3d-proton does a pretty good job as inserting missing barriers
between UAV clears & writes. But some applications also have similar
issues with custom shaders. Here we introduce an analysis pass that
recognize shaders doing clear/initialization. We'll use that
information in the following commit to insert barriers after those
shaders.
Since Gfx12.5 our HW has become a lot more sensitive to those issues
due to the introduction of an L1 untyped data cache that is not
coherent across the shader units. On Gfx20+, typed data is also L1
cacheable exposing even more issues.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40187>
This doesn't have a functional impact, but when this command is run:
$ src/intel/genxml/genxml_import.py --import
It causes the gen125_rt.xml file to be processed in the expected
order, before xe2.xml.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40217>
The code is a bit simpler to follow. We add new files so rarely, that
it shouldn't be much of a burden add new items.
Reworks:
* Iván Briano recommended to use array of names rather than more
complicated filename parsing.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40103>
Coverity notes that we break out of the loop walking the analysis_ranges
early if n_push_ranges >= max_push_buffers, so it notes that
n_push_ranges could already be 3 or 4 (depending on whether we're doing
mesh), and that then if we need the padding we insert another, which
would write past the end of the array.
I don't think this is actually possible in practice, but we can add an
assert to both keep coverity happy and detect that this has actually
happened.
CID: 1681478
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40147>
this isn't C++ brw code, it's just a devinfo query.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40143>
Shaders are allocated from anv_shader_heap, which is backed by the
util_vma_heap. Rename the VA range field to shader_heap to match current
usage and avoid confusion.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40131>
Binding report test cases are failing for Gen12.5 platforms due to
missing binding events from private bindings for images creating
inconsistencies in the number of unbind events reported from
anv_DestroyImage(). Adding bind events from alloc_private_binding() to
fix the issue.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37422>
For Gfx125 workloads that use systolic mode, this might mean
an extra PIPELINE_SELECT when flipping between a compute shader
that use the mode and another that doesn't use the mode
(or vice-versa).
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40014>
There were 2 issues in the commit being fixed :
1. loading from the wrong surface state
2. not being able to have the optimization passes cleanup the
nir_vector_extract()
We fix the first issue by reusing the nir_vector_extract() pattern in
the broken places.
We fix the second issue by reworking the internal vec4 format we use
for passing around descriptor information. In particular we put the
set in its own component so that it can be easily optimized and the
vector extraction constant folded.
Fixes: e94cb92cb0 ("anv: use internal surface state on Gfx12.5+ to access descriptor buffers")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40076>
Dependencies in UNDEFs were already not propagated by
update_inst_scoreboard(), since the instruction there
was not consider neither ordered or unordered; and also
not being used to resolve implicit dependencies.
The generator was already ignoring any baked dependency
but for cases where UNDEF had two dependencies, a sync nop
would be generated -- which would be redundant with a
later sync nop.
Since we know UNDEFs have no dependencies, stop treating
them specially when trimming dependencies.
This patch remove this particular class of redundant sync nops.
No functional change is expected.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39875>
Push constants on bindless stages of Gfx12.5+ don't get the data
delivered in the registers automatically. Instead the shader needs to
load the data with SEND messages.
Those stages do get a single InlineParameter 32B block of data
delivered into the EU. We can use that to promote some of the push
constant data that has to be pulled otherwise.
The driver will try to promote all push constant data (app + driver
values) if it can, if it can't it'll try to promote only the driver
values (usually a shader will only use a few driver values). If even
the drivers values won't fit, give up and don't use the inline
parameter at all.
LNL internal fossil-db:
Totals from 315738 (20.08% of 1572649) affected shaders:
Instrs: 155053691 -> 154920901 (-0.09%); split: -0.09%, +0.00%
CodeSize: 2578204272 -> 2574991568 (-0.12%); split: -0.15%, +0.02%
Send messages: 8235628 -> 8184485 (-0.62%); split: -0.62%, +0.00%
Cycle count: 43911938816 -> 43901857748 (-0.02%); split: -0.05%, +0.03%
Spill count: 481329 -> 473185 (-1.69%); split: -1.82%, +0.13%
Fill count: 405617 -> 399243 (-1.57%); split: -1.86%, +0.28%
Max live registers: 34309395 -> 34309300 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 8298224 -> 8299168 (+0.01%)
Non SSA regs after NIR: 18492887 -> 17631285 (-4.66%); split: -4.73%, +0.08%
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39405>
It's pretty much the same mechanism, except it's a different register
location.
With this change we gain indirect loading support.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39405>