On Xe platforms, many fragment shaders have patterns like:
asr(8) g21<2>W g1.2<0,1,0>W 15D
...
mov(8) g11<1>UW g21<16,8,2>UW
...
not.nz.f0.0(8) null<1>D g11<8,8,1>W
Converting the NOT.NZ to MOV.Z enables copy propagation to eliminate the
original MOV. Then cmod propagation is able to eliminate the
NOT-converted-to-MOV.
It might be possible to cover this case by adding more opcodes to the
list NOT can propagate to. The next commit will show that just
converting to MOV is a better approach anyway.
v2: Fix a bad squash. Changes that were supposed to be in this commit
were accidentally in the next commit. Noticed by Ivan.
shader-db:
Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 20069804 -> 20065167 (-0.02%)
instructions in affected programs: 592450 -> 587813 (-0.78%)
helped: 2300 / HURT: 0
total cycles in shared programs: 884534032 -> 884496201 (<.01%)
cycles in affected programs: 13064194 -> 13026363 (-0.29%)
helped: 1285 / HURT: 790
LOST: 18
GAINED: 15
fossil-db:
Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 234506495 -> 234468664 (-0.02%)
Cycle count: 24444825202 -> 24445710703 (+0.00%); split: -0.01%, +0.01%
Max live registers: 42349793 -> 42349789 (-0.00%)
Max dispatch width: 7131344 -> 7131744 (+0.01%); split: +0.05%, -0.04%
Totals from 16673 (2.07% of 805781) affected shaders:
Instrs: 6497669 -> 6459838 (-0.58%)
Cycle count: 435877770 -> 436763271 (+0.20%); split: -0.54%, +0.74%
Max live registers: 1122972 -> 1122968 (-0.00%)
Max dispatch width: 151528 -> 151928 (+0.26%); split: +2.19%, -1.92%
No shader-db or fossil-db on any other Intel platforms.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>
brw_opt_constant_fold_instruction can either do nothing or replace the
instruction with a MOV of an immediate value. Previously each opcode
case would perform this replacement, and code at the bottom of the
function would verify the results.
It is much simpler if each opcode case calculates a result in a brw_reg,
and code at the bottom of the function performs the replacement. There
are two outlier cases that cannot use this pattern: MAD and
BROADCAST. These cases simply return directly from the switch-statement
after performing the replacement.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34707>
In multi-layered framebuffer LRZ also has several layers and binning
pass needs to write depth to a correct layer, so binning VS needs
VARYING_SLOT_LAYER.
Fixes: 9775b33d0f ("tu: Enable GMEM with layered rendering")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34728>
The hardware expects textureGrad (txd) derivatives to be specified as
offset coordinates from the base texture coordinate, rather than raw
gradients.
This change fixes the majority of the
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.* tests on
GC7000. Remaining failures are due to shadow sampler handling, which
will be addressed separately.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34704>
This is already supported, we just need to toggle the switch.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34695>
This will be needed for indirect draws, where the VS driver desc table
needs to be updated per-draw, which also forces us to have one resource
table per draw.
Signed-off-by: Ryan Mckeever <ryan.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Olivia Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34617>
With indirect multi draw, we'll need to allocate N times the vertex
FAUs so we can patch each of them with the draw parameters.
Add a repeat_count argument to allow that.
Signed-off-by: Ryan Mckeever <ryan.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Olivia Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34617>
apitrace at commit b6102d10 has a bugfix regarding a recent regression
in GPU frame time calculation that made we miss weeks of performance
data.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34713>
While not yet officially conformant, we support all the required
features, and we pass the CTS. Let's mark off Vulkan 1.2, to make things
easier for applications.
Backport-to: 25.1
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34512>
The register file tests here should be done after update_renames().
Normally, get_reg() wouldn't have to move anything to make space for a 1-3
byte definition. This was encountered with skip_optimistic_path=true and a
get_reg_impl() bug (fixed in a later commit) which caused suboptimal
register assignment.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34679>
We added CmdBindVertexBuffers2(), the rest is handled by the core.
Advertising this feature also works around a CTS bug that was
calling CmdSetPatchControlPointsEXT() without checking for this
extension.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Olivia Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34696>
The vk runtime has a helper to help us with the strides, we just
need to use vi_binding_strides[] when emitting the attributes.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Olivia Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34696>
This is the most serious bug we've had in a long time due to a fundamental
misunderstanding of the hardware (due to incomplete reverse-engineering). It
caught me off guard.
The texture descriptor has "mode" bits which configure two aspects of how the
address pointer is interpreted:
* whether it is indirected, pointing to a secondary page table for sparse
* whether it writes texture access counters (for Metal's idea of sparse).
...Neither of these is a "null texture" mode.
So why did I see Apple's blob using a non-normal mode for null textures, and why
did I copy those settings?
1. Because the hardware texture access counters provide a cheap way to detect
null texture accesses after the fact, which I think their GPU debug tools
use. I'm not sure why release builds of the driver do/did that, but whatever.
2. Because I assumed Cupertino knew best and I didn't bother looking too close.
We can't use them here (without doing extra memory allocations), since then
the GPU will increment access counters. And since our null texture address used
to just be a pointer in the command buffer, that mean the GPU will trash
whatever memory happened to be 0x400 bytes after the start of the null texture
descriptor. The symptom being random faults.
This bug was caught when trying to use the zero-page instead, which raised a
permission fault when the GPU tried to write counts. Then I remembered the
sparse mechanism and had a bit of a eureka moment. Immediately followed by an
"oh, f#$&" moment as I realized how many random bugs could potentially be root
caused to this.
The fix is two-fold:
1. Use normal layout instead.
2. Set the address to the zero-page (which is a fixed VA) and detect null
textures by checking the address, instead of the mode.
The latter is a good idea anyway, but both parts needs to be done atomically to
maintain bisectability.
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34703>
The src[2] value 0x1100 is set based on observed behavior of the blob driver,
though its exact meaning remains to be documented.
Passes all dEQP-GLES3.functional.shaders.texture_functions.texelfetch.*
tests on GC7000.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34685>
The LOD must be a float, unlike the GLSL function, which expects an integer.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34685>
This instruction is used to implement texelfetch.
Blob generates such txf's for
dEQP-GLES3.functional.shaders.texture_functions.texelfetch.+
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34685>
Fixes crash in Piglit arb_separate_shader_object-xfb-explicit-location-array
test.
Signed-off-by: Brian Paul <brian.paul@broadcom.com>
Reviewed-by: Roland Scheidegger <roland.scheidegger@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34635>