The ambiguity of the Vulkan spec was clarified, and we don't need to
support sparse depth/stencil with exactly the same number of samples
as non-sparse.
If you want to pass CTS, you'll need VK-GL-CTS commit 03976477f521
("Don't require more than VK_SAMPLE_COUNT_1_BIT for non-color sparse
resident images").
This is essentially a revert of d5da6980d3 ("anv/sparse: don't
support depth/stencil with sparse") and 7b337e214d ("anv: remove
dead code").
Thanks to Iván Briano for working with Khronos to get clarification on
the spec and for implementing the VK-GL-CTS fix.
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37423>
While Jay overwrites sparse_tex->op with the newer opcodes that only
return red and the sparse stuff, BRW keeps using the original opcode
of the cloned instruction, so it can't change def->num_components.
This was not previously detectable since we did not have sparse
enabled for depth/stencil on Anv for a while. A patch to re-enable
that was proposed a while ago (MR !37423), never merged, but then a
recent attempt to try to merge it (by me) detected this regression.
Let's fix the regression first, then we can finally re-enable sparse
depth/stencil support in Anv, hopefully.
Fixes: 7468261d3d ("intel/nir: Make intel_nir_lower_sparse work for either brw or jay")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37423>
This removes 18042479026 as we don't utilize BRW_AOP_MOV in compiler
and adds missing xe2 entries for 14025112257.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41281>
I originally added this mechanism to have the first (SIMD8) compile
note that certain features were in use which would prevent SIMD16/32
from compiling, so we could skip the work of trying those.
But these days, there aren't many cases, and the ones we have are
easily detectable based on the NIR. We can detect it earlier without
even having to do the SIMD8 compile.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
We checked that ver is 11 or 12. It can't be >= 20. This is dead code.
Dual source blending on Xe2 does not have native SIMD32 RT write message
support, but SIMD splitting is currently lowering it to low/high SIMD16
message pairs when using SIMD32 dispatch. I'm not aware of any of the
hardware errata from previous platform still applying.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
The new FRAG_RESULT_DUAL_SRC_BLEND option is easier to work with than
looking for FRAG_RESULT_DATA0 with an index of 1. This also means we
no longer care about the dual source blend index, and can just use the
FRAG_RESULT location. That cascades to meaning we no longer have to
store a tuple in driver_location. And, if we just need location, we
can avoid populating that at all and use nir_io_semantics to get it.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
We can just have iris look at its own program key and change the
fragment shader output variable's location/index in the NIR. By
doing this before lowering fragment shader outputs, the rest of
the output lowering does the right thing, and the backend no longer
has to consider hacks for broken OpenGL apps.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
Some vulkancts tests rely on vkGetImageMemoryRequirements to return the same
exact size after exporting and importing an image. This broke when we started
adding padding to sampled surfaces to manage overfetch, because the texture
usage flag does not get applied to the ISL surface when the image is recreated
using an explicit layout.
Fixes: 8d13628f7 ("isl: Add additional alignment/padding requirements to prevent overfetch")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41376>
This is less heavy handed, avoiding unnecessary stalls after SENDs in a
bunch of common cases. The stats (SIMD32) are:
Totals:
CodeSize: 70345392 -> 71674272 (+1.89%)
Totals from 1774 (67.02% of 2647) affected shaders:
CodeSize: 67359248 -> 68688128 (+1.97%)
What's happening here is we are inserting extra SYNC.nop instructions in a
bunch of cases for the .src preceding the eventual .dst. However, putting aside
the i-cache impact for a moment, this is showing the optimization doing what it
should (deferring dst syncs and inserting cheaper src syncs first). So this
should be positive in reality despite the negative stat impact.
The most hurt shaders are pooling up SYNC.nop's at the end of blocks due to
local-only SWSB and lack of SYNC.allwr optimization. The latter is added later
in this MR. The former is planned.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
update the tracking with what we actually waited on, not what we ideally wanted
to wait on. reduces extra annotations in some cases.
SIMD32:
Totals from 194 (7.33% of 2647) affected shaders:
CodeSize: 14473840 -> 14469088 (-0.03%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
IGC does these optimizations and I think they should be safe given my mental
model. Given a sequence like:
r0 = add.f32 r1, r2
r1 = add.f32 r3, r4
Each ALU pipe is pipelined but in-order. Therefore, the second add cannot
possibly complete before the first add, so it cannot write r1 before the first
add reads r1, so we can elide the write-after-read dependency. That in term
avoids a pipeline bubble between the two instructions. Ditto for
write-after-write.
Similarly if the distance is too great within an in-order pipe since there is a
maximum pipeline length, it's not infinite.
Note that if there was cross-pipe dependencies we do need the annotation since
the pipes themselves are parallel.
SIMD32:
Totals from 58 (2.19% of 2647) affected shaders:
CodeSize: 3316592 -> 3315056 (-0.05%); split: -0.05%, +0.00%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
Greedy post-RA substitution pass, similar to IGC's AccSubstitution pass.
Stats together with the previous commits.
SIMD16:
Totals from 2209 (83.45% of 2647) affected shaders:
Instrs: 2701029 -> 2696350 (-0.17%)
CodeSize: 39166720 -> 40372272 (+3.08%); split: -0.36%, +3.44%
SIMD32:
Totals from 2211 (83.53% of 2647) affected shaders:
Instrs: 4691165 -> 4641188 (-1.07%)
CodeSize: 69365792 -> 69341616 (-0.03%); split: -0.50%, +0.47%
The instruction count reduction is from RA shuffle code getting coalesced via
accumulators. The code size changes are from:
* Fewer moves from the instr count reduction (helped)
* Smaller MADs encoded as MACs (helped)
* Fewer SYNC.nop due to fewer scoreboarding annotations (helped)
* Less compaction due to explicit accumulator operands (hurt)
I expect significant cycle count changes from this but we don't have a cycle
model wired up yet, so reading the assembly will have to do.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
According to documents linked in HSD 1209977789, the push constant
allocation for PS stage is not applicable on Gfx12.5+ (removed). The
documents says push constant data is fetched by SBE in URB.
The HW must still parse the command and do nothing with it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39584>
sed -i "s/nir_src_parent_instr/nir_src_use_instr/" `find ./ -type f`
sed -i "s/nir_src_parent_if/nir_src_use_if/" `find ./ -type f`
sed -i "s/nir_src_set_parent/nir_src_set_use/" `find ./ -type f`
There are two kinds of "parent" in relation to a src/def:
- the instruction where the def or src's def is defined
- the instruction which the src is a part of and where the def is used
Clarify that the parent here is where the src's def is used, not where
it's defined.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41344>
This platform just needs a bit more care around vertex buffer state
emission.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
Some kernel will be to large and potentially change too often to
really have a consistent count.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
Up to now all push descriptor accesses where going through the binding
table. That's not going to be the case anymore with descriptor buffers
device bindable shaders. Those will do A64 messages to read the
descriptor buffer (for example when build a bounded 64bit address for
storage buffers, or 64bit image format atomic emulation, etc...)
We need to have the offset relative to the push descriptor heap
(internal_state_heap in this case).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
When we don't know what shader is executed. We'll still have the bind
map from the indirect execution set.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
We consider them like bindless stages (no binding table) as much as
possible.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>