Commit graph

12198 commits

Author SHA1 Message Date
Lionel Landwerlin
7f1ca16e3b brw: enable rematerialization of non 32bit uniforms
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>
2024-06-21 08:29:44 +00:00
Lionel Landwerlin
0531f568ac brw: remove some brackets
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>
2024-06-21 08:29:44 +00:00
Lionel Landwerlin
11a634151b brw: remove rematerialization assert
The default case should lead us to the next rematerialization block so
this is useless.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>
2024-06-21 08:29:44 +00:00
Lionel Landwerlin
d42bc0d3fc brw: bound the amount of rematerialized NIR instructions
Some of the instructions we don't need to rematerialize because we
already know they are executed with NoMask so we can use their
destination without reemitting them again.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>
2024-06-21 08:29:44 +00:00
Lionel Landwerlin
4bfb4f35a8 brw: improve rematalization of surface/sampler handles
This change handles patterns like this

con v0 = load_ubo ...
con v1 = add v0, 0x30
con v2 = load_ubo v1, 0x0

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>
2024-06-21 08:29:44 +00:00
Lionel Landwerlin
c7b312ad45 brw: factor out source extraction for rematerialization
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>
2024-06-21 08:29:44 +00:00
Lionel Landwerlin
8fbbc9c301 brw: add missing break
Not fixing anything because of the default case below.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>
2024-06-21 08:29:44 +00:00
Lionel Landwerlin
a869c57250 anv: don't apply descriptor array bound checking
This is a follow up to 059e82a4 ("anv: remove descriptor array bounds
checking"), that kind of bound checking is not required by the spec.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>
2024-06-21 08:29:44 +00:00
Jianxun Zhang
02813f341b isl: Remove code for Xe2 from isl_gfx12.c
Xe2 code is in isl_gfx20.* now.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11329

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29702>
2024-06-21 04:02:08 +00:00
Jianxun Zhang
4debb5bbc4 isl: Implement a part of WA_22018390030 (xe2)
Fix: piglit test
gl-3.2-layered-rendering-clear-color-all-types 2d_array mipmapped -auto
-fbo

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29702>
2024-06-21 04:02:08 +00:00
Jianxun Zhang
8b084df0c0 isl: Add dispatching in isl.c (xe2)
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29702>
2024-06-21 04:02:08 +00:00
Jianxun Zhang
8d3093a329 isl: Add isl_gfx20 into build (xe2)
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29702>
2024-06-21 04:02:08 +00:00
Jianxun Zhang
5de9df094f isl: Update isl_gfx20 code (xe2)
Purge code for previous platforms and rename functions
in Xe2 files.

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29702>
2024-06-21 04:02:07 +00:00
Jianxun Zhang
67fb44ccd6 isl: Clone from isl_gfx12.* files (xe2)
The new Xe2 files are copyed from intel/isl/isl_gfx12.*, as the
base for a seperation.

From 59218cdf07.

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29702>
2024-06-21 04:02:07 +00:00
Paulo Zanoni
87787c4a87 anv/xe: fix declaration of memory flags for integrated non-LLC platforms
Makes Cyberpunk, Hitman and Total War Warhammer 3 run on LNL.

Fixes: c9e41f25a1 ("anv: Add heaps for Xe KMD in platforms without LLC")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29775>
2024-06-21 02:49:24 +00:00
José Roberto de Souza
73ce3143a8 anv: Fix assert in xe_gem_create()
In this assert we want to enforce that if a cached buffer is created
it is a cached+coherent as Xe KMD don't support cached+incoherent.

Did not caught this issue because it only reproduces in platforms with
GPU outside of LLC.

Fixes: 9d8d5cf8c9 ("anv: Remove block promoting non CPU mapped bos to coherent")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29826>
2024-06-21 02:19:55 +00:00
Francisco Jerez
c1feccdd90 intel/fs/gfx20+: Fix surface state address on extended descriptors for NIR scratch intrinsics.
The r0.5 thread payload register contains Surface State Offset bits
[27:6] as bits [31:10], so we need to shift the register right by 4 in
order to get the surface state offset expected in ExBSO mode, which is
the only extended descriptor encoding supported by the UGM shared
function for SS addressing on Xe2+.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29543>
2024-06-21 01:49:43 +00:00
Francisco Jerez
8bbad903a2 anv/xe2+: Fix format of scratch space surface address in various 3DSTATE packets.
This field encodes bits [27:6] of the scratch surface state offset
according to the hardware spec, already on XeHP platforms.  However,
on previous platforms we were passing bits [25:4] instead, which was
apparently okay for two reasons:

 1/ We never used more than 8 MB of scratch surface states apparently.
 2/ A shift right by 2 was implicitly happening while copying the
    value of r0.5 into the address register holding the extended
    descriptor, which with the ExBSO addressing mode disabled
    considered bits [31:12] as the surface state index within the
    pool.

However on Xe2 ExBSO addressing mode is always enabled for the UGM
shared function, so we have to add an extra SHR instruction to format
the extended descriptor regardless, and there is no point in
disobeying the hardware spec passing a left-shifted offset.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29543>
2024-06-21 01:49:43 +00:00
José Roberto de Souza
f5a6b84dd6 anv: Give apps the choice of compressed or uncompressed but cpu visible images
Compressed memory types are not CPU visible and Vulkan specification
don't have any requirement about that but some applications like
vkcube fails to run without a host visible option, so here appending
default_buffer_mem_types and compressed_mem_types.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28833>
2024-06-21 01:19:12 +00:00
José Roberto de Souza
8aec37fe0c anv: Add support for compressed images allocation in Xe2
Xe2 replaces auxiliary surface mapping by software to compress buffers,
instead it reserves part of the memory for the compression purpose.

To enable compression in Xe2 it is necessary bind memory with one of
the PAT indexes that has compression enabled.

It is still always returning false in anv_image_is_pat_compressible()
as it still needs more work before compression can be enabled but the
foundation for the compressed allocation is here.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28833>
2024-06-21 01:19:12 +00:00
José Roberto de Souza
90b223331f intel/dev: Add compressed PAT entry
This will be used in Xe2+ to store images compressed in memory.

Still missing add the compressed PAT index and attributes to
LNL intel_device_info.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28833>
2024-06-21 01:19:12 +00:00
Nanley Chery
b49182bed0 intel/isl: Pad the pitch on gfx12.0 for fast-clears
On gfx12.0, CCS fast clears don't seem to cover the correct portion of
the aux buffer when the pitch is not 512B-aligned. Pad the pitch unless
Wa_18020603990 applies (slow clear surfaces up to 256x256, 32bpp).

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29659>
2024-06-21 00:08:38 +00:00
Nanley Chery
30ed4a7500 intel/isl: Require display flag for 512B pitch alignment
When CCS is enabled on a surface on gfx12, a 512B-aligned pitch is
required for the display engine. This is not required by the render
engine.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10740
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29659>
2024-06-21 00:08:38 +00:00
Nanley Chery
eff2fab0bc intel/isl: Consolidate some tiling checks for CCS
Filter out X-tiling early to avoid an assert failure in the next patch.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29659>
2024-06-21 00:08:38 +00:00
Nanley Chery
26802b3224 iris,anv: Disable gfx12.0 fast-clears with unaligned pitch
We'll reduce pitch alignment in a following patch. However, CCS
fast-clears don't seem to work unless the pitch is 512B aligned.
Disable fast clears for unaligned pitches.

Prevents the next patch from failing the following piglit tests:
* fbo-attachments-blit-scaled-linear
* hiz-stencil-test-fbo-d24s8
* hiz
* polygon-mode-facing
* clearbuffer-mixed-format
* glsl-lod-bias (transient failure)

No failures have been observed in anv, but there are more restrictions
for fast-clears in that driver compared to iris.

Note:
* The -fbo flag is necessary to make these fail. Otherwise, they end up
  with aligned render targets.
* Each of these tests allocate an image that has a pitch greater than
  512B and they collectively cover all the misalignment options - 128B,
  256B and 384B.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29659>
2024-06-21 00:08:38 +00:00
Nanley Chery
695577e5b0 intel/isl: Add and use isl_drm_modifier_needs_display_layout
Intel modifiers supporting compression are specified to be compatible
with the display engine, even if they won't actually be used for
scanout.

Attempting to capture a wider scope of modifiers resulted in test
errors. I chose to narrow the scope instead of digging into them.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29659>
2024-06-21 00:08:38 +00:00
Nanley Chery
483707e901 intel/isl: Drop support for the gfx12 CCS ISL surf
Now that we're using macros to handle aux-map CCS layout, we have no
need for the ISL surface representation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29659>
2024-06-21 00:08:38 +00:00
Nanley Chery
236c4597fa anv: Restrict CCS ISL surface creation to gfx9-11
ISL surfaces for CCS are not needed to describe flat CCS and aux-map
CCS.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29659>
2024-06-21 00:08:38 +00:00
Rohan Garg
2c00b7d1e6 anv: flag WSI images as scanout images for ISL
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29465>
2024-06-20 22:34:52 +00:00
José Roberto de Souza
19a8abde5f anv: Implement Wa_14019857787
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29619>
2024-06-20 21:47:59 +00:00
José Roberto de Souza
f7e3aecb87 anv: Implement Wa_14019708328
As each anv_device has its own address space it was necessary create
one dummy_aux_bo per anv_device.

Also this workaround requires us to disable the
buffer_length_in_aux_addr optimization, that is done in the physical
device creating because isl_dev of physical device is copied
to isl_dev in anv_device.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29619>
2024-06-20 21:47:59 +00:00
José Roberto de Souza
3ddcf17a12 intel/isl: Set dummy_aux_address to implement Wa_14019708328
This workaround ask us to set a dummy aux address to all
SURFTYPE_BUFFERs with AuxiliarySurfaceMode == AUX_NONE.
It also says that the same dummy aux address can be reused acrsoss all
buffers.

So here adding dummy_aux_address to isl_device, ANV and Iris will
set a value to when running a in a GPU affected.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29619>
2024-06-20 21:47:59 +00:00
Lionel Landwerlin
00982e1af6 anv: fix vkCmdWaitEvents2 handling
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29716>
2024-06-20 17:18:35 +00:00
Lionel Landwerlin
1ca97f019e anv: avoid initalizing TRTT stuff without sparseBinding
7da5b1caef ("anv: move trtt submissions over to the anv_async_submit")
added a hard dependency on timeline semaphore which is still optional.
And since it gates the sparseBinding feature, we should not use it if
sparseBinding is not enabled.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 7da5b1caef ("anv: move trtt submissions over to the anv_async_submit")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29779>
2024-06-20 11:38:16 +00:00
Kenneth Graunke
50519598ff intel/brw: Skip discarding the interference graph
We no longer need to reserve registers for constructing spill/fill
messages.  We have split sends and construct message headers in new
temporary registers with a very short lifespan which are simply added
to the existing interference graph as new nodes and allocated via the
normal mechanism.

This means that when we need to spill for the first time, we can avoid
discarding and recomputing the entire interference graph.  We also avoid
needing to recreate all spill candidate information once ra_allocate()
fails, because the graph remains valid, and none of the existing nodes
had any changes to their interference.  The existing spill candidates
remain valid.

This will slightly help improve compile time when needing to spill.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25811>
2024-06-20 09:47:18 +00:00
Kenneth Graunke
29d6264627 intel/brw: Build the scratch header on the fly for pre-LSC systems
Instead of reserving a register to contain the spill header, which
gets marked live for the entire program, we can just emit the ALU
instructions to build it on the fly.  (This is similar to the way
we handle scratch on Alchemist with the newer LSC data port.)

There are a couple of downsides that make this not obviously a win.
First, in order to construct the scratch header on Gfx9-12, we have
to use fields from g0, which will have to remain live anywhere that
scratch access is required.  This could negate the register pressure
benefits of creating the header on the fly.  However, g0 is oft used
in other places anyway, so it may already be there.  Another is that
it's a non-trivial number of ALU instructions to construct the value.
Still, trading lower pressure (so fewer spills, less memory access
and stalls) for more cheap ALU seems like it ought to be a win.

There is another valuable benefit: by not reserving a register, we
eliminate the need to reconstruct the interference graph.  (The next
patch will actually do so.)

shader-db on Icelake shows spills/fills at 54/53 helped, 4/10 hurt,
and an 8% increase in ALU on affected shaders.  Synmark's OglCSDof
(a benchmark that spills) performance remains the same on Alderlake.

fossil-db on Icelake shows a 5.6%/5.1% reduction in spills/fills and a
4% reduction in scratch memory size on affected shaders.  Instruction
counts go up by 11.07%, but cycle estimates only increase by 0.57%.
Assassin's Creed Odyssey and Wolfenstein Youngblood both see 20-30%
reductions in spills/fills, a significant improvement.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25811>
2024-06-20 09:47:18 +00:00
José Roberto de Souza
9d8d5cf8c9 anv: Remove block promoting non CPU mapped bos to coherent
The intention of this block was to set one of the flags that is used
to select a PAT index but this was doing more than that.
It was promoting WB+0 way coherency BOs to WC+1 way coherency possibly
causing regression in platforms without LLC.

anv_device_get_pat_entry() return WC/writecombining if no flags is
set so we don't need this block after all.

Reported-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Fixes: a65e982b44 ("anv: Split ANV_BO_ALLOC_HOST_CACHED_COHERENT into two actual flags")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29769>
2024-06-19 16:34:21 +00:00
Faith Ekstrand
4b6970cf36 ci: Update trace SHAs
I have no idea which commit changed these but they look fine.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28793>
2024-06-19 01:56:23 +00:00
Caio Oliveira
2a9f4618c5 intel/brw: Make component_size() consistent between VGRF and FIXED_GRF
Change so the size rounds up to the next multiple of the horizontal stride like
is done for VGRF.  This was causing an inconsistency in regs_read() -- The original
component_size() calculation for FIXED_GRF excluded any padding at the end but it was
still being discounted by regs_read().

Suggested by Curro.

Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11069
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29736>
2024-06-19 01:33:58 +00:00
Caio Oliveira
8fb70f0746 intel/brw: Add unit tests for scoreboard handling FIXED_GRF with stride
Based on shaders reported in
https://gitlab.freedesktop.org/mesa/mesa/-/issues/11069 and
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29723.  These
currently fail, later patch will enable them.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29736>
2024-06-19 01:33:58 +00:00
Lionel Landwerlin
c4e952dbd9 anv: reuse device local variable
No functional changes.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29595>
2024-06-18 20:44:51 +00:00
Lionel Landwerlin
0147908a89 anv: predicate emission of STATE_BASE_ADDRESS
Completely skip the stall & programming if the bindless address has
not changed. Only on Gfx12.5+ since previous generations also program
the binding table pool base address through STATE_BASE_ADDRESS.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29595>
2024-06-18 20:44:51 +00:00
Lionel Landwerlin
9a3e8508a7 anv: factor out STATE_BASE_ADDRESS filling to helper function
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29595>
2024-06-18 20:44:51 +00:00
Lionel Landwerlin
f8c0a99d52 anv: emit conditional after gfx state flushing
In a following change the predicate registers might be used when
flushing the state.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29595>
2024-06-18 20:44:51 +00:00
Lionel Landwerlin
ed43be941e anv: add custom mi write fences
The mi-builder already takes care of mi write/read fences, but we have
a few cases in Anv where we also need to fence mi-write ->
shader-read.

We also have one case where a command buffer jump address is modified
by a previous mi write command.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29595>
2024-06-18 20:44:51 +00:00
Caio Oliveira
f982d2bb79 intel/brw: Fix typo in DPAS emission code
The enums were mixed up.  Code was working because they were being
used only for their numerical values.

Fixes: e666872c75 ("intel/compiler: Initial bits for DPAS instruction")
Acked-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29762>
2024-06-18 18:25:21 +00:00
Kenneth Graunke
9e750f00c3 intel/brw: Make opt_copy_propagation_defs clean up its own trash
Copy propagation often eliminates all uses of an instruction.  If we
detect that we've done so, we can eliminate the instruction ourselves
rather than leaving it hanging until the next DCE pass.

This saves some CPU time as other passes don't see dead code.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
2af84c2d49 intel/brw: Use the defs-based copy propagation along with the old one
The new def-based pass works better in many cases, and should be less
resource intensive.  However, the limited visibility of the defs-based
pass due to many values not being SSA yet makes it unable to fully
replace the old pass.  Try the new one, and if it can't make progress,
then try the old one.  That way, things will mostly be handled by the
new pass, but everything that was being cleaned up still will be.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
580e1c592d intel/brw: Introduce a new SSA-based copy propagation pass
(Quite a few of the restrictions here are ported from the old pass.)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
9690bd369d intel/brw: Delete old local common subexpression elimination pass
We no longer use this older pass, so there's no need to keep it.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00