Those intrinsics have different semantics in particular with regards
to divergence. Turning one into the other without invalidating the
divergence information breaks NIR validation. But also the conversion
means we get artificially less convergent values in the shaders.
So just handle load_push_constants in the backend and stop changing
things in Anv.
Fixes a bunch of tests in
dEQP-VK.descriptor_indexing.*
dEQP-VK.pipeline.*.push_constant.graphics_pipeline.dynamic_index_*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34546>
(cherry picked from commit df15968813)
Tested on BMG and PTL using both settings for RT_CTRL.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35044>
(cherry picked from commit 5828612da2)
Xe driver will be disabling the HW functionality for null any-hit
shaders, drivers need to take care of it instead. This commit brings
back parts of older workaround (see b0624e414f) we used to have to
handle the null any-hit case.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35044>
(cherry picked from commit 0f591425c9)
The 2 helpers we're using for doing internal operations (copies,
command generation, etc...) can work on command buffers or lower level
batches.
When working with command buffers, the helpers should set the
preemption using genX(cmd_buffer_set_preemption) so that whatever
operation comes after toggles the state back to what it needs and we
minimize the toggles.
When working with batchs, the helpers should disable preemption using
genX(batch_set_preemption) and turn it back on when done.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35030>
(cherry picked from commit c570740272)
The issues preventing it to be enabled were fixed so now we can enable
it but we need also to enable workaround 16013994831 back again.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34988>
(cherry picked from commit 3cd972a2d3)
Description of this workaround are not clear but looking at Iris
implementation we need to emit all 3DSTATE_PUSH_CONSTANT_ALLOC_XS if
any 3DSTATE_PUSH_CONSTANT_ALLOC_XS is emitted.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34988>
(cherry picked from commit 2432d6677e)
In a case like this :
block_0:
%5 = ...
%6 = ...
block_1:
%7 = load_interpolated_input %5, %6
The current logic would move load_interpolated_input to block_0 before
%5 but not move %5 & %6 which are sources of that instruction.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
(cherry picked from commit 6230f3029f)
Some intrinsics are implemented by reading memory location that could
be rewritten by a further tracing calls. So we need to move those
reads prior to tracing operations in the shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8979
Tested-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34214>
(cherry picked from commit c434050a00)
intel_decoder_init() initializes intel_batch_decode_ctx so later
we can call decode functions but it depends on data stored in
brw/elk_isa_info but that was being allocated in stack
of intel_decoder_init() then when the decode functions were executed
it was accessing garbage at the brw/elk_isa_info memory.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ec2d20a70d ("intel/tools: Add helpers for decoder_init/disasm")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34776>
(cherry picked from commit 3e5a735d01)
Or current render target cache setting is to key on the binding table
index, meaning the HW associates a number in the range [0, 7] to a
RENDER_SURFACE_STATE description. If you want change the render target
0 between 2 draw calls, you need to insert a PIPE_CONTROL in between
the 2 draw calls with pb-stall + rt-flush in order to flush an writes
to a previous RENDER_SURFACE_STATE that has now becomed disassociated
with the [0, 7] number.
This PIPE_CONTROL taking care of the flush is dealt with in
cmd_buffer_maybe_flush_rt_writes(). This function diffs the current
BTI setup for render targets (first 0 to 7 BTIs) with what the next
fragment shader wants.
The issue here is we might have a render pass with 0 color attachments
and yet in 98cdb9349a we added one pointing to the render target 0,
but in the emit_binding_table() when we finally program the BTI, we
check the render pass color count and program a null surface state
instead of an actual surface state. And this leads to hangs because
the render target cache will end up with inconsistent state data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 98cdb9349a ("anv: ensure null-rt bit in compiler isn't used when there is ds attachment")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12955
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34603>
(cherry picked from commit 63f633557f)
This brings back c9e33e5cbf ("intel/fs/cse: Make HALT instruction act
as CSE barrier."), from the old CSE pass into the new one.
Fixes new CTS test: dEQP-VK.subgroups.shader_quad_control.terminated_invocation
Fixes: 9690bd369d ("intel/brw: Delete old local common subexpression elimination pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34643>
(cherry picked from commit 29d7b90cfc)
For Xe2+, from Bspec 64643, bit field "StackID": The maximum number of
StackIDs can be 2^12- 1.
Cc: mesa-stable
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34709>
(cherry picked from commit 821c1bfa7e)
In float pointing rules adding +0.0f preserves all values except
for -0.0f, so what we want here is to add -0.0f. In the future
we should add proper support for float immediates in the assembler.
Fixes: fafdd24285 ("intel/executor: Update bfloat example")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>
(cherry picked from commit 3e0418ba02)
Most of the churn in this commit is changing unit tests that were
testing things that are now invalid.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17122204 -> 17122669 (<.01%)
instructions in affected programs: 120669 -> 121134 (0.39%)
helped: 0 / HURT: 124
total cycles in shared programs: 895602370 -> 895613210 (<.01%)
cycles in affected programs: 17868974 -> 17879814 (0.06%)
helped: 35 / HURT: 85
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 210736518 -> 210743769 (+0.00%)
Cycle count: 30377733040 -> 30377699060 (-0.00%); split: -0.00%, +0.00%
Max live registers: 66056852 -> 66056966 (+0.00%)
Totals from 1505 (0.21% of 706776) affected shaders:
Instrs: 1890151 -> 1897402 (+0.38%)
Cycle count: 48397408 -> 48363428 (-0.07%); split: -0.11%, +0.04%
Max live registers: 256821 -> 256935 (+0.04%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>
(cherry picked from commit e26270249b)
When I originally wrote that code, I didn't understand what a jerk NaN
can be.
v2: Remove the brw_type_is_uint stuff. This function is currently only
called for float types. In a later commit, integer types will be
supported but only for NZ and Z conditions. Noticed by Matt.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17122197 -> 17122204 (<.01%)
instructions in affected programs: 1691 -> 1698 (0.41%)
helped: 0 / HURT: 4
total cycles in shared programs: 895602484 -> 895602370 (<.01%)
cycles in affected programs: 912964 -> 912850 (-0.01%)
helped: 2 / HURT: 2
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 210736388 -> 210736518 (+0.00%)
Cycle count: 30377728900 -> 30377733040 (+0.00%); split: -0.00%, +0.00%
Totals from 130 (0.02% of 706776) affected shaders:
Instrs: 169911 -> 170041 (+0.08%)
Cycle count: 18021210 -> 18025350 (+0.02%); split: -0.00%, +0.02%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>
(cherry picked from commit 0dab520a19)
One more instruction were the MOCS value was splited into two
registes.
Cc: mesa-stable
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34592>
(cherry picked from commit fcb6dfb29c)
Xe2 changed the MOCS field in few instructions, those now have a field
for the MOCS index and other the encryption enable bit but ISL returns
the combination of both aka MEMORY_OBJECT_CONTROL_STATE.
To minimize changes I have added 2 macros to extract the values
from the value returned by isl.
From all the instructions changed Mesa only make use of two, so the
other instruction will be handled in the next patch.
Cc: mesa-stable
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34592>
(cherry picked from commit 161c412a82)
Copy engine is not used in gfx12 platforms on ANV but that is possible
in Iris.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34560>
(cherry picked from commit a96e280dfe)
I accidentally disabled compression on CPS surfaces marked as storage or
color attachment for all platforms, when this should only be limited to
Xe.
Fixes: 80f9b6 ('anv: CPB surfaces that are used as color attachments or for stores cannot be compressed')
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34297>
(cherry picked from commit cbc1ec4f73)
Floating point SEL.CMOD may flush denorms to zero. We don't have enough
information at this point in compilation to know whether or not it is
safe to remove that.
Integer SEL or SEL without a conditional modifier is just a fancy
MOV. Those are always safe to eliminate.
See also 3f782cdd25.
Fixes: fab92fa1cb ("i965/fs: Optimize SEL with the same sources into a MOV.")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
The condition modifier on SEL means something completely different than
it means on MOV. On MOV it means to modify the flags based on the value
written to the destination. On SEL it means to compare the sources using
that mode and pick the result (i.e., as min() or max()) without
modifying the flags.
The resulting MOV should not have a condition modifier for the same
reason it (already) doesn't have a predicate. This bug was found by
inspection, so I added a unit test.
Fixes: fab92fa1cb ("i965/fs: Optimize SEL with the same sources into a MOV.")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
Floating point SEL.CMOD may flush denorms to zero. We don't have enough
information at this point in compilation to know whether or not it is
safe to remove that.
Integer SEL or SEL without a conditional modifier is just a fancy
MOV. Those are always safe to eliminate.
See also 3f782cdd25.
Fixes: fab92fa1cb ("i965/fs: Optimize SEL with the same sources into a MOV.")
No shader-db changes on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209903490 -> 209903492 (+0.00%)
Cycle count: 30546025224 -> 30546021980 (-0.00%); split: -0.00%, +0.00%
Max live registers: 65516231 -> 65516235 (+0.00%)
Totals from 2 (0.00% of 706657) affected shaders:
Instrs: 3197 -> 3199 (+0.06%)
Cycle count: 361650 -> 358406 (-0.90%); split: -10.05%, +9.15%
Max live registers: 300 -> 304 (+1.33%)
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
The condition modifier on SEL means something completely different than
it means on MOV. On MOV it means to modify the flags based on the value
written to the destination. On SEL it means to compare the sources using
that mode and pick the result (i.e., as min() or max()) without
modifying the flags.
The resulting MOV should not have a condition modifier for the same
reason it (already) doesn't have a predicate. This bug was found by
inspection, so I added a unit test.
No shader-db or shader-db changes on any Intel platform.
Fixes: fab92fa1cb ("i965/fs: Optimize SEL with the same sources into a MOV.")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
Elaborate on the packed/unpack restrictions, use ADD(x, 0.0f)
as a workaround for F->BF conversion.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>
Not supported. Some operations *do* work, but proper support
was removed since it also doesn't support DPAS.
Fixes: 9916cc1050 ("brw: Add BRW_TYPE_BF for bfloat16")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>
Gfx125+ has systolic, with exception for MTL and some ARL
variants. Update code and tests to use it.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>
This patch does a couple of things to make CL integration with drivers
as seamless as possible:
- We pull in opencl-c.h and opencl-c-base.h to stop relying on system
headers.
- Parts of libcl.h are moved to new headers that are incomplete CL-safe
variants of libc headers.
- A couple of util headers are changed to remove now unnecessary
__OPENCL_VERSION__ guards and make more headers CL safe.
- Drivers now include src/compiler/libcl and use headers like
macros.h,u_math.h instead of libcl.h.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33576>
This allows us to create temporary VGRFs that are larger than
MAX_VGRF_SIZE(devinfo), which will be split eventually. They may not
be split on the initial pass, because we may need LOAD_PAYLOAD lowering,
copy propagation, and so on to occur first. So we allow registers to
exceed that size initially.
The "Register allocation relies on split_virtual_grfs()" assertion in
brw_reg_allocate.cpp still asserts that all VGRFs which reach the
register allocator have been properly split.
One case where this is useful is for vectorizing convergent block loads.
We create temporaries to splat the SIMD1 values out to SIMD(N), which
can lead to some very large temporaries. However, copy propagation and
so on ultimately eliminate these and they'll get split down to proper
sizes or elided entirely in the end.
(Note: both this and the prior commits from this merge request are
needed to close the linked issue.)
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12324
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
Post-RA scheduling doesn't use liveness analysis, so we continue using
MAX_VGRF_SIZE(devinfo). But for pre-RA scheduling, we now use
live->max_vgrf_size.
This helps get us to a place where we can emit arbitrarily large VGRFs
early on in compilation, but which will be split and cleaned up prior to
register allocation. It may also allocate smaller arrays in practice
since MAX_VGRF_SIZE(devinfo) assumes the worst case scenario for things
we actually could need to allocate.
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
We're already looking at this data to calculate the per-component
vars_from_vgrf[] and vgrf_from_vars[] mappings, so just record the
largest VGRF size while we're here. This will allow passes to size
arrays based on the actual size needed, rather than hardcoding some
fixed size. In many cases, MAX_VGRF_SIZE(devinfo) is larger than
necessary, because e.g. vec5 sparse sampling results aren't used.
Not hardcoding this means we can also temporarily handle very large
VGRFs which we know will be split eventually, without having to
increase the maximum which is ultimately used for RA classes.
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
Not used anymore, just use the existing ADL definitions.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34433>
Some upcoming changes in the runtime will make it impossible to rely
on the pipeline or runtime information to know whether a fragment
shader has input attachments.
Instead we gather that information at compile time and store it in our
shader bind_map.
At runtime we check whether the fragment shader has input attachments
and whether those map to the runtime depth/stencil input attachments
to set the 3DSTATE_PS_EXTRA::PixelShaderKillsPixel.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d2f7b6d5a7 ("anv: implement VK_KHR_dynamic_rendering_local_read")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
If you suspect a workload is failing because it needs more memory, you
can set ANV_SYS_MEM_LIMIT=100 to give it all the memory available.
This could make, for example, certain games start working (it really
depends on how much RAM you have and how much the game wants).
If you suspect a workload is too resource hungry, you can try to limit
it with ANV_SYS_MEM_LIMIT=30 (or some other value) to see if it can
deal with the more restricted environment and behave accordingly.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28513>
"We paid for sixteen gigs of RAM, so we gonna use the whole damn
sixteen gigs of RAM!"
- My Mom
First, some history:
The Anv 50%-or-75% rule was originally added in 2017 by 060a6434ec
("anv: Advertise larger heap sizes"). When i915.ko started reporting
memory sizes in its ioctls, it didn't impose any restrictions: 100% of
SRAM was reported as available, so the restriction was in Mesa. When
xe.ko was introduced, it only reported 50% of the SRAM as available
through its ioctls, so commit b571ae6e7a ("intel: Make memory heaps
consistent between KMDs") adapted the code to not take an extra 25% of
the 50% that was already cut, and restricted i915.ko to 50% instead of
the 50%-or-75%. In Kernel commit d2d5f6d57884 ("drm/xe: Increase the
XE_PL_TT watermark"), xe.ko changed to reporting 100% of SRAM through
its ioctls, so we adapted Mesa to do the right thing depending on
which Kernel version was running.
While this was all happening, we were discussing about which behavior
was actually the best: restrict everything to 50% in order to avoid
issues when many things are running in parallel, or keep the
restriction only at 75% in order to allow high demanding workloads to
make full use of the hardware.
The way I see, if parallel applications are causing the system to run
out of resources, the user always has the option to kill applications
and use one thing at a time. On the other hand, if a single
application needs more than 50% of the SRAM and we don't allow it in
our heaps, the application will never work (unless, of course, the
user patches Mesa). So in this commit we go back to allowing
high-demanding applications to work by restoring the 50%-or-75% rule.
This commit is especially useful in systems with integrated graphics,
like LNL, where the option to upgrade RAM is not present.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28513>