_mesa_update_state() receives the _NEW_PROGRAM flag, so we can handle
any shader changes there.
There may be some overhead reduction because gfx_shaders_may_be_dirty
is removed.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19859>
In a tesselation control shader where an input array is accessed using
the index gl_InvocationID, we can end up accessing elements beyond the
number of input vertices specified in the shader key.
This happens because of the lowering in nir_lower_indirect_derefs().
This lowering will affect compact variables which happens in this
case :
in gl_PerVertex {
vec4 gl_Position;
float gl_ClipDistance[1];
} gl_in[gl_MaxPatchVertices];
The lowered code produced by NIR is somewhat ineffecient (implements a
binary seach) :
if (gl_InvocationID < 16) {
if (gl_InvocationID < 8) {
if (gl_InvocationID < 4) {
vec4 vals = load_at_offset(0);
value = bcsel(vals, gl_InvocationID);
} else {
vec4 vals = load_at_offset(4);
value = bcsel(vals, gl_InvocationID - 4);
}
} else {
if (gl_InvocationID < 12) {
vec4 vals = load_at_offset(8);
value = bcsel(vals, gl_InvocationID - 8);
} else {
vec4 vals = load_at_offset(12);
value = bcsel(vals, gl_InvocationID - 12);
}
}
} else {
if (gl_InvocationID < 24) {
...
} else {
...
}
}
By default the gl_MaxPatchVertices must be set at 32 items and that's
what the lowering code will use to divide the access into chunks of 4.
But when running with 3 input vertices, this means we'll pull one more
item than what was delivered in the shader payload.
This triggers issues further down the register scheduling where the
g5UD (register for the 4th item) is overwritten by a previous SEND,
leading the URB read to use an invalid handle.
This pass clamps any access load_per_vertex_input intrinsic vertex
indice to (input_vertices - 1).
Fixes issues with tests like :
dEQP-VK.clipping.user_defined.clip_distance.vert_tess.*
Also fixes a hang with zink/anv on :
KHR-GL46.draw_elements_base_vertex_tests.AEP_shader_stages
v2: Don't replace source register
v3: Implement in NIR
v4: Clamp per vertex array sizes in NIR (Jason)
v5: Move the clamping on the intel compiler
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9749>
If points or lines are drawn using the polygon mode, the guardband
should be adjusted for large points/lines.
Cc: 22.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20185>
While running VK-CTS with valgrind, the application hit the max
thread count of 500. After further investigation, this was due to
multiple instances being created with the disk cache spinning up
worker threads which wouldn't be cleaned as disk_cache_destroy
wasn't being called.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20178>
We lost the race in a recent MR of mine.
Fixes: 381e0b43d6 ("mesa: Add test to prevent windows.h to be included in shared headers")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20170>
The `coord_offset` pass is responsible for upgrading any eligible
texture loads into prefetches, but a texture prefetch's capabilities
are limited and cannot handle any interpolation modes aside from
`smooth`.
An exception is carved out for `flat` interpolation modes, but this
doesn't exclude upgrading `noperspective` texture loads and results
in perspective-corrected samples being provided that can severely
break applications depending on this behaviour.
Fixes incorrect lighting projection on Super Mario Odyssey on
Skyline Emulator.
Fixes incorrect dirt texture mapping on Portal 2 trace on Turnip and
Zink on Turnip.
Fixes incorrect lighter shadowing on Half Life 2 trace on Turnip.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19842>
`coord_offset` is called on the source of `alu` instructions and
it returns -1 for failures, this not explicitly checked for and
as a result the fetch can incorrectly be upgraded to a prefetch
when it isn't appropriate to do so.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19842>
Otherwise, making a CS using the memory will use the uninitialized .map
value (when checking the size of the CS in in begin's tu_cs_is_empty()
check), causing valgrind noise in
dEQP-VK.binding_model.descriptorset_random.sets4.dynindexed.ubolimitlow.sbolimitlow.sampledimghigh.lowimgsingletex.iublimitlow.nouab.vert.noia.0
(thanks to vi_info->vertexBindingDescriptionCount==0).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20173>
This reduces the number of instructions in task shaders when payload
size is not aligned to vec4 and payload_in_shared WA is enabled,
because nir_lower_task_shader will not need to handle the unaligned
size case.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20080>
We were not taking into account that when all invocations within workgroup
are active, we'll copy more data than needed, corrupting task payload
of other workgroups.
Fixes: 8aff8d3dd4 ("nir: Add common task shader lowering to make the backend's job easier.")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20080>
Death stranding does scratch_arr[80-idx]. This doesn't seem to work if we
try to combine the subtraction into the access.
fossil-db (navi21):
Totals from 52 (0.04% of 135636) affected shaders:
Instrs: 78560 -> 79036 (+0.61%)
CodeSize: 427940 -> 431188 (+0.76%)
Latency: 1313809 -> 1318142 (+0.33%)
InvThroughput: 292833 -> 293842 (+0.34%)
VClause: 2361 -> 2555 (+8.22%); split: -0.51%, +8.73%
Copies: 8767 -> 8746 (-0.24%); split: -0.35%, +0.11%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 0e783d687a ("aco: use scratch_* for scratch load/store on GFX9+")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7735
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20117>
It's illegal and LLVM always knows which intrinsics don't read memory.
This started failing IR validation with LLVM 16.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20146>
READONLY is illegal on calls. Others were unused.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20146>
D3D10 established that rasterization should be discarded when a null PS was
bound, and depth/stencil state was disabled, and llvmpipe followed those
semantics. Nowadays all APIs have explicit rasterization discard flag,
and so does Gallium, so it's better for llvmpipe to faithfully follow
that flag, and trust the state tracker to follow the right semantics.
Second guessing pipe_rasterizer_state::rasterizer_discard actually
causes problems, specially when no depth-stencil surface is bound, as
D3D10 mandates rasterization should still happen, yet among all the
translation layers it often happens depth-stencil enablement is
optimized away when no depth-stencil is bound, which in turn was causing
llvmpipe to disable rasterization when it shouldn't.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20155>
When a null PS is bound, the
pipe_query_data_pipeline_statistics::ps_invocations counter should not
be incremented.
However llvmpipe can't cope with a null PS bound, requiring the state
tracker to bind an empty pixel shader instead. llvmpipe infers empty
TGSI pixel shaders by looking tgsi_shader_info::num_instructions, as an
empty shader should have a single END instruction, but this logic wasn't
working for NIR shaders.
I mulled over the possibility of making llvmpipe handle null pixel
shaders. Spreading null checks everywhere would be invasive and error
prone, but it would be quite simple if llvmpipe simply created a dummy
PS internally, to be used as a replacement whenever a null PS was bound.
That said, I'm not sure if other gallium drivers can cope with a null PS
neither, and if nought, might as well keep using an empty PS in lavapipe
state tracker. An any rate, this change makes sense on its own.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20155>