Commit graph

12582 commits

Author SHA1 Message Date
Lionel Landwerlin
778cb59086 anv: optimize STATE_BYTE_STRIDE emission
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30803>
2024-08-23 10:52:19 +00:00
Lionel Landwerlin
195c5b68ba anv: don't miss workaround for indirect draws
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30803>
2024-08-23 10:52:19 +00:00
Lionel Landwerlin
f25b500af4 anv: move conditional render predicate after gfx_flush_state
Following up on f8c0a99d52 ("anv: emit conditional after gfx state
flushing"), this should have been applied everywhere.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0147908a89 ("anv: predicate emission of STATE_BASE_ADDRESS")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30803>
2024-08-23 10:52:19 +00:00
Tapani Pälli
5bf6602d23 anv: check if RT writes are happening for HasWriteableRT
Fixes: eebb6cd236 ("anv: stop using 3DSTATE_WM::ForceThreadDispatchEnable")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11749
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30785>
2024-08-23 06:28:00 +00:00
Lionel Landwerlin
a88898a28f anv: optimize CLIP::MaximumVPIndex setting
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11746
Fixes: 982106e676 ("anv: only set 3DSTATE_CLIP::MaximumVPIndex once")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30762>
2024-08-23 05:45:03 +00:00
Kenneth Graunke
b97e10208c intel/brw: Add a file parameter to idom_tree::dump()
The other dump methods in this file also take a file parameter,
defaulting to stderr.  Dumping dot files to stdout is probably not
what anybody really wanted.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30530>
2024-08-22 22:54:45 +00:00
Kenneth Graunke
bb4f05005e intel/brw: Print blocks in brw_print_instructions_to_file()
Useful when examining the control flow graph.  For some reason,
we printed this for the final assembly but not the IR.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30530>
2024-08-22 22:54:45 +00:00
Kenneth Graunke
2d73e42333 intel/brw: Fix OOB reads when printing instructions post-reg-alloc
Post-register allocation, but before brw_fs_lower_vgrfs_to_fixed_grfs,
we have registers with the VGRF file but they are actually fixed GRFs.

brw_print_instructions_to_file() was seeing VGRFs and trying to access
their size, but using bogus register numbers that could be out-of-bound.

Detect when we're post-RA and avoid doing this.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30530>
2024-08-22 22:54:45 +00:00
Lionel Landwerlin
d9406658ed brw: remove unused prog_data field
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30713>
2024-08-22 19:44:40 +00:00
Lionel Landwerlin
3769b58272 anv: move lowering of descriptor intrinsics to apply_layout
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30713>
2024-08-22 19:44:40 +00:00
Lionel Landwerlin
45117c0ed5 anv: simplify loading driver internal constants
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30713>
2024-08-22 19:44:39 +00:00
Lionel Landwerlin
7a55a930f6 anv: reuse common pipeline state for compute push allocations
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30713>
2024-08-22 19:44:39 +00:00
Eric Engestrom
d7f7aede15 intel/ci: don't trigger anv-jsl-full & anv-tgl-full on GL changes
These are pure VK-CTS jobs, they don't run any GL tests.

It doesn't matter right now because these two jobs are disabled, but
when they get re-enabled, we'll want this to have been fixed.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30677>
2024-08-22 16:24:24 +00:00
Daniel Stone
cc507536db ci/intel: Move manual/nightly jobs to postmerge stage
Create a new stage called intel-postmerge and move the full and manual
jobs over there, to avoid entanglement with the pre-merge jobs.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30784>
2024-08-22 15:35:18 +00:00
Daniel Stone
f1aab081b5 ci: Create new 'performance' stage
Move all jobs doing performance testing to a separate stage.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30784>
2024-08-22 15:35:18 +00:00
Kenneth Graunke
6a292c2699 intel: Fix bad align_offset on global_constant_uniform_block_intel
We were specifying align_offset = 64 and align_mul = 64, which is
invalid.  nir_combined_align() asserts that align_offset < align_mul.

Our intention here is to perform cacheline-aligned (64B-aligned) block
loads, so we should set align_mul = 64 and can leave align_offset = 0.

Fixes: fbafa9cabd ("intel/nir: remove load_global_const_block_intel intrinsic")
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30755>
2024-08-21 20:44:57 +00:00
Ian Romanick
c96ceb50d0 intel/brw/xe2: Allow int64 conversions
As far as I can tell from looking at the Bspec, MOV between integers
of all sizes appears to be supported.

shader-db:

total instructions in shared programs: 17480631 -> 17480535 (<.01%)
instructions in affected programs: 26284 -> 26188 (-0.37%)
helped: 21 / HURT: 13

total cycles in shared programs: 897601907 -> 897664293 (<.01%)
cycles in affected programs: 10929664 -> 10992050 (0.57%)
helped: 48 / HURT: 45

fossil-db:

Totals:
Instrs: 140686824 -> 140686155 (-0.00%); split: -0.00%, +0.00%
Cycle count: 21525129188 -> 21524717729 (-0.00%); split: -0.01%, +0.00%
Spill count: 70778 -> 70776 (-0.00%)
Fill count: 139172 -> 139168 (-0.00%)
Max live registers: 47513859 -> 47513795 (-0.00%)

Totals from 612 (0.11% of 549272) affected shaders:
Instrs: 964441 -> 963772 (-0.07%); split: -0.09%, +0.02%
Cycle count: 1215564312 -> 1215152853 (-0.03%); split: -0.09%, +0.06%
Spill count: 16172 -> 16170 (-0.01%)
Fill count: 37962 -> 37958 (-0.01%)
Max live registers: 70749 -> 70685 (-0.09%)

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30700>
2024-08-21 20:16:00 +00:00
Ian Romanick
09cf9fe8ab anv: Larger memory pools for huge shaders
At least one ray tracing shader in cp2077 is over 4MB on Xe2. There
isn't a memory pool large enough for the allocation, so the driver
crashes instead. This commit adds 8MB and 16MB pools.

I intend this as a stop gap fix. I would prefer to figure out why this
shader is so much larger than on previous platforms. The shader in
question has 3824 spills and 8625 fills. That is not good. I suspect
dealing with that will also solve the problem, but that will require a
bit more time.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11739
Suggested-by: Lionel Landwerlin
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30751>
2024-08-21 19:45:17 +00:00
Ian Romanick
0921dfa044 anv: Protect against OOB access to anv_state_pool::buckets
Suggested-by: Paulo Zanoni
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30751>
2024-08-21 19:45:17 +00:00
Rohan Garg
29a2e5358d anv: enable KHR_shader_relaxed_extended_instruction
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30726>
2024-08-21 14:13:46 +00:00
Francisco Jerez
71ca8529c5 intel/brw/gfx12.5+: Fix IR of sub-dword atomic LSC operations.
We were currently emitting logical atomic instructions with a packed
destination region for sub-dword LSC atomics, along the lines of:

> untyped_atomic_logical(32) dst<1>:HF, ...

However, these instructions use an LSC data size D16U32, which means
that the 16b data on the return payload is expanded to 32b by the LSC
shared function, so we were lying to the compiler about the location
of the individual channels on the return payload, its execution
masking, etc.  This is why the hacks that manually set the
'inst->size_written' of the instruction were required.

In some cases this worked, but any non-trivial manipulation of the
instruction destination by lowering or optimization passes could have
led to corruption, as has been reproduced in deqp-vk during
lower_simd_width() for shaders that use 16-bit atomics in SIMD32
dispatch mode.

Note that LSC sub-dword reads aren't affected by this because they use
raw UD destinations and specify the actual bit size of the operation
datatype as the immediate SURFACE_LOGICAL_SRC_IMM_ARG, which doesn't
work for atomic operations since that immediate specifies the atomic
opcode.

Instead, have the logical operation implement the behavior of 16-bit
destinations correctly instead of silently replacing the 16-bit region
with an inconsistent 32-bit region -- This is done by emitting the MOV
instructions used to pack the data from the UD temporary into the
packed destination from the lower_logical_sends() pass instead of from
the NIR translation pass.

Fixes: 43169dbbe5 ("intel/compiler: Support 16 bit float ops")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30683>
2024-08-21 02:33:12 +00:00
Nanley Chery
5e86087940 intel: Move depth clear value writes to drivers
This improves drivers in the following ways:

* iris_hiz_exec() and crocus_hiz_exec() gets rid of the narrowly-used
  update_clear_depth parameters.
* iris avoids fast-clearing if the aux state is CLEAR. crocus avoids
  this too, but didn't actually need it in the first place.
* iris updates the value once per fast_clear_depth() call instead of
  doing an update for each layer being cleared.
* anv now updates the clear value when transitioning from an undefined
  layout instead of doing so on every fast-clear. This should be safer
  because we don't perform state cache invalidates when changing the
  clear value. So, existing surface states won't have any stale values.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30520>
2024-08-20 21:29:43 +00:00
Nanley Chery
d7b0d32c28 intel/blorp: Simplify depth clear value updates
Use a single MI_STORE_DATA_IMM instead of five.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30520>
2024-08-20 21:29:43 +00:00
Nanley Chery
3294200098 intel: Add and use isl_get_sampler_clear_field_offset
Add and use a function which documents the sampler's behavior around
fast-clears on gfx11-12.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30520>
2024-08-20 21:29:43 +00:00
Nanley Chery
07e0834774 intel: Use a simpler workaround for HiZ WT fast-clears
The new workaround tries to strike a balance between simplicity and
functionality (for testing purposes). Instead of checking for the
alignment of a specific LOD when fast-clearing, we take an
all-or-nothing approach for LOD1+.

I haven't found any app to clear LOD1+ except for a Dirt Rally trace
some time ago. If I remember correctly, that trace clears all LODs,
doesn't render to them, then clears again with a different color,
incurring resolves. So, skipping LOD1+ fast clears will avoid those
resolves. Other apps I tested include Synmark2, glmark2, GfxBench5, and
the Vulkan games in internal our benchmarking tool.

Now that we've added updated and simplified checks in the drivers
themselves, we delete blorp_can_hiz_clear_depth.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30250>
2024-08-20 19:43:15 +00:00
Nanley Chery
a28bd0abdf intel: Adjust partial depth fast clear checks
None of our tracked games use partial depth clears, so only allow it in
simple cases for testing purposes. This change also fixes an issue on
gfx8, where we had been accidentally disabling full surface clears if
the LOD was not 8x4 aligned.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30250>
2024-08-20 19:43:15 +00:00
Nanley Chery
dd384104b7 intel/blorp: Allow LOD0 fast-clears with HiZ WT
I did some more debugging of this feature, but this time with a modified
version of the piglit test, ./bin/depthstencil-render-miplevels.
I modified the test to:

* Control which LOD to stop populating/clearing
* Print out the results of readpixels to stderr

From there, I could see how different surface dimensions affected
fast-clears. Depending on the surface dimensions, fast-clearing an LOD
above the LOD0 could cause other LODs to be cleared and/or cause the
targeted LOD to be only partially cleared (for example, when the LOD0
dimension is 66x66 and the test doesn't clear LOD3+). This never happens
when fast-clearing LOD0 however.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5258
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30250>
2024-08-20 19:43:15 +00:00
Nanley Chery
6afdc9c5a6 intel: Enable more LOD0 HIZ+CCS fast clears
For correct fast-clearing with HiZ+CCS, we require roughly 16x8
alignment of LODs. The next patch will cause drivers to ignore the
alignment of LOD0, so align the qpitch to 8 to avoid breakage and so
that fast clears will be enabled more often.

Prevents failures with the piglit test case:

	./bin/fbo-depth-array depth-clear -fbo

in the next patch.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30250>
2024-08-20 19:43:15 +00:00
Kenneth Graunke
d22d6d814d intel/brw: Fix Xe2+ SWSB encoding/decoding for DPAS instructions
SBID SET can only be used on SEND, SENDC, or DPAS instructions.  The
existing code was handling SET for SEND/SENDC, but was using the wrong
encoding for DPAS.  Add a new case to handle that and make it clear that
the existing code is only for SEND/SENDC.

While here, rewrite the encoder to use 2-bit binary immediates shifted
up into the mode [9:8] field, rather than pre-shifted hex values.  This
matches the documentation better and is a little easier to follow.

On the decode side, we were incorrectly decoding MATH instructions.
Because they're marked is_unordered, we were hitting the SEND/SENDC
decoding, which is incorrect for MATH.

Fixes 22 cooperative matrix tests on Lunar Lake.

Huge thanks to Paulo Zanoni for bisecting failures to one of my commits,
then analyzing shaders and experimenting to discover that the failure
was really an unrelated bug, just being provoked by different choices of
registers.  His work narrowing the problem down made it much easier to
discover and fix this bug.

Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30705>
2024-08-20 19:09:37 +00:00
Kenneth Graunke
89f9a6e10b intel/brw: Pass opcode to brw_swsb_encode/decode
We're going to need to handle encoding/decoding differently for DPAS vs.
SEND/SENDC vs. other instructions.  Pass the opcode so we can figure out
the encodings for each type of instruction.

Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30705>
2024-08-20 19:09:37 +00:00
Rohan Garg
1f06e70bdc anv: migrate indirect mesh draws to indirect draws on ARL+
Backport-to: 24.2
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>
2024-08-20 09:41:51 +00:00
Rohan Garg
f69c74b6d5 anv: dispatch indirect draws with a count buffer through the XI hardware on ARL+
ARL+ can dispatch indirect draws through the hardware.

Backport-to: 24.2
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>
2024-08-20 09:41:51 +00:00
Rohan Garg
74cd70841d anv: refactor indirect draw support into it's own function
ARL+ supports some form of indirect draws, instead of trying to mash
support for indirect draws across various generations, let's make things
cleaner by factoring out XI support into it's own function.

Backport-to: 24.2
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>
2024-08-20 09:41:51 +00:00
Rohan Garg
c1af71c9c2 anv,iris: prefix the argument format with XI for a upcoming refactor
Backport-to: 24.2
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>
2024-08-20 09:41:51 +00:00
Rohan Garg
dc23db2a0d anv: program a custom byte stride on Xe2 for indirect draws
Xe2 allows us to program in a custom byte stride for indirect draws

Backport-to: 24.2
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>
2024-08-20 09:41:50 +00:00
Tapani Pälli
d4e8c8f874 anv: move setting 3DSTATE_CLIP::MaximumVPIndex from loop
Loop iterates viewports but for MaximumVPIndex we only need viewport
count and last stage that writes viewport.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30732>
2024-08-20 06:48:50 +00:00
Jianxun Zhang
8c623b6a7e Revert "anv: Disable PAT-based compression on depth images (xe2)"
This reverts commit 6073f091bb.

With the progress on Xe2 platforms, we are not seeing many issues
caused by compression on depth buffers.

Backport-to: 24.2
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30653>
2024-08-19 17:50:10 -07:00
José Roberto de Souza
12656571fd anv/gfx20: Enable depth buffer write through for multi sampled images
BSpec: 56419
Backport-to: 24.2
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29615>
2024-08-19 20:04:36 +00:00
Nanley Chery
ebe3eabda6 anv: Add want_hiz_wt_for_image()
Backport-to: 24.2
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29615>
2024-08-19 20:04:36 +00:00
José Roberto de Souza
2553878fba intel/isl/gfx20: Alow hierarchial depth buffer write through for multi sampled surfaces
BSpec: 56419
Backport-to: 24.2
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29615>
2024-08-19 20:04:36 +00:00
Lionel Landwerlin
e10cbb59a5 anv: add assert to detect problematic instruction merges
We stick to a rule in the driver that each field is only set in a
single place in the driver. Therefore when merging instructions, we
should never have any bit set to 1 from both sides.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30684>
2024-08-19 11:02:44 +00:00
Lionel Landwerlin
982106e676 anv: only set 3DSTATE_CLIP::MaximumVPIndex once
Currently we can end up merging 2 prepacked 3DSTATE_CLIP instructions
where 2 different places in the driver fill the MaximumVPIndex.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 50f6903bd9 ("anv: add new low level emission & dirty state tracking")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30684>
2024-08-19 11:02:44 +00:00
Lionel Landwerlin
7c73346549 anv: remove unused macro
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30684>
2024-08-19 11:02:44 +00:00
Lionel Landwerlin
9eff285a46 anv: fix extended buffer flags usages
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: bcc0ec8e6c ("anv: enable KHR_maintenance5")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30714>
2024-08-19 10:13:09 +00:00
Caio Oliveira
40f77b6936 intel/brw: Avoid modifying the shader in assign_curb_setup if not needed
If there are no uniforms to push, don't emit the AND or invalidate the
shader analysis.  This affects only compute shaders.

Not a significant impact since lots of shaders end up pushing
uniforms.  Fossil-db numbers (restricted to compute pipelines only) for DG2

```
Totals:
Instrs: 3071016 -> 3070894 (-0.00%)
Cycle count: 8320268863 -> 8320264519 (-0.00%)

Totals from 122 (2.70% of 4520) affected shaders:
Instrs: 10675 -> 10553 (-1.14%)
Cycle count: 2060003 -> 2055659 (-0.21%)
```

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30631>
2024-08-17 16:25:01 -07:00
José Roberto de Souza
38c989ada2 anv: Nuke anv_utrace_submit::trace_bo
There is no usage for this bo.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30676>
2024-08-16 19:38:19 +00:00
José Roberto de Souza
f7b386bd6d anv: Use batch_bo_pool in utrace anv_async_submit_init() calls
In pratical the only change here is that batch_bo_pool
are captured to error dumps.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30676>
2024-08-16 19:38:19 +00:00
José Roberto de Souza
168e26fc04 anv: Add trivial_batch and query-pool to the error capture
Those are batch buffers that are not allocated from batch_bo_pool,
so they were left out of error capture without the capture-all
parameter.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30676>
2024-08-16 19:38:18 +00:00
Sagar Ghuge
c4f2a8d984 intel/compiler: Fix indirect offset in GS input read for Xe2+
Make sure to take new GRF size into consideration and adjust the
indirect offset according to new size so that when we do the indirect
load with address register, we load right values.

This helps pass the following tests:
   - dEQP-VK.binding_model.descriptor_buffer.mutable_descriptor.*geom*
   - dEQP-VK.ray_query.*geometry_shader.*

Backport-to: 24.2
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30679>
2024-08-16 18:40:13 +00:00
Ian Romanick
c8038643b8 intel/brw: Make ifind_msb SSA friendly
No shader-db changes on any Intel platform.

v2: Use negate(tmp) instead of creating a new temporary. Suggested by
Ken.

fossil-db:

Meteor Lake, DG2, and Skylake had similar results. (Meteor Lake shown)
Totals:
Instrs: 152535897 -> 152535883 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17112329592 -> 17112406110 (+0.00%); split: -0.06%, +0.06%

Totals from 40 (0.01% of 633223) affected shaders:
Instrs: 458813 -> 458799 (-0.00%); split: -0.01%, +0.00%
Cycle count: 4358016282 -> 4358092800 (+0.00%); split: -0.23%, +0.24%

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150560511 -> 150560465 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15484534441 -> 15482372893 (-0.01%); split: -0.12%, +0.11%
Spill count: 59795 -> 59794 (-0.00%)
Fill count: 103513 -> 103509 (-0.00%)

Totals from 40 (0.01% of 632445) affected shaders:
Instrs: 368877 -> 368831 (-0.01%); split: -0.01%, +0.00%
Cycle count: 3918398264 -> 3916236716 (-0.06%); split: -0.49%, +0.43%
Spill count: 16896 -> 16895 (-0.01%)
Fill count: 27819 -> 27815 (-0.01%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>
2024-08-16 14:52:04 +00:00