061b8bfd29 moved handling of fixed operands earlier, but it should have
moved the fixing of writelane operands earlier too.
This fixes Crucible's func.uniform-subgroup.exclusive.imin64 on GFX8.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 061b8bfd29 ("aco/ra: rework fixed operands")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27583>
According to Valgrind, vcc/m0 are uninitialized and this fixes it.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27583>
We've implemented another workaround that completely disables
high-priority preemption.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e6e320fc79 ("anv: make Wa_16013994831 to use intel_needs_workaround")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27716>
With TES, the primitive ID is an input variable, but SPIRV->NIR treats
it as a sysval. However, its value is greater than VARYING_SLOT_VAR0,
which means its location was adjusted by mistake.
This fixes compiling a tessellation evaluation shader in a debug build
with Enshrouded.
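A minimal sketch of the kind of guard this implies, assuming the per-patch location remap is applied to every variable on that path. The slot numbers and the helper fix_patch_location() are hypothetical stand-ins, not the real spirv_to_nir code or Mesa's actual enum values:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical slot numbers; Mesa's real gl_varying_slot values differ. */
enum {
   SLOT_VAR0   = 32, /* first generic varying slot */
   SLOT_PATCH0 = 64, /* first per-patch varying slot */
};

/* Hypothetical helper: rebase a per-patch generic varying onto the
 * per-patch slot range. A variable treated as a system value stores a
 * system-value enum in its location, so comparing it against SLOT_VAR0
 * is meaningless and the location must be left untouched. */
static int
fix_patch_location(int location, bool is_sysval)
{
   assert(location >= 0);
   if (is_sysval)
      return location;
   if (location >= SLOT_VAR0)
      return location - SLOT_VAR0 + SLOT_PATCH0;
   return location;
}
```

Without the is_sysval check, any system value whose enum happens to be at or above SLOT_VAR0 would be remapped exactly like a generic varying.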
Fixes: dfbc03fa88 ("spirv: Fix locations for per-patch varyings")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27413>
In the past, we didn't have a good solution for combining scalar loads
with a variable index plus a constant offset. To handle that, we took
our load offset and rounded it down to the nearest vec4, loaded an
entire vec4, and trusted in the backend CSE pass to detect loads from
the same address and remove redundant ones.
These days, nir_opt_load_store_vectorize() does a good job of taking
those scalar loads and combining them into vector loads for us, so we
no longer need to do this trick. In fact, it can be better not to:
our offset need only be 4-byte (scalar) aligned, but we were making it
16-byte (vec4) aligned. So if you wanted to load an unaligned vec2,
we might actually load two vec4s (___X | Y___) instead of doing a
single load at the starting offset.
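To make the (___X | Y___) case concrete, here is a small sketch that counts how many whole vec4s the old round-down-to-vec4 strategy fetches for a load of `size` bytes at byte offset `offset`. The helper names are illustrative only and do not exist in the backend:

```c
#include <assert.h>

/* Offset rounded down to the start of its 16-byte (vec4) block. */
static unsigned
vec4_base(unsigned offset)
{
   return offset & ~15u;
}

/* Number of whole vec4s fetched when every load is rounded down to a
 * vec4 boundary: one per 16-byte block the requested bytes touch. */
static unsigned
vec4_loads_needed(unsigned offset, unsigned size)
{
   assert(size > 0);
   unsigned first = vec4_base(offset);
   unsigned last = vec4_base(offset + size - 1);
   return (last - first) / 16 + 1;
}
```

A vec2 of 32-bit values at byte offset 12 covers bytes 12 to 19, so vec4_loads_needed(12, 8) is 2, whereas a single load starting at offset 12 only needs 4-byte alignment.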
This should also reduce the work the backend CSE pass has to do,
since we just emit a single VARYING_PULL_CONSTANT_LOAD instead of 4.
shader-db results on Alchemist:
- No changes in SEND count or spills/fills
- Instructions: helped 95, hurt 100, +/- 1-3 instructions
- Cycles: helped 3411, hurt 1868, -0.01% (-0.28% in affected)
- SIMD32: gained 5, lost 3
fossil-db results on Alchemist:
- Instrs: 161381427 -> 161384130 (+0.00%); split: -0.00%, +0.00%
- Cycles: 14258305873 -> 14145884365 (-0.79%); split: -0.95%, +0.16%
- SIMD32: Gained 42, lost 26
- Totals from 56285 (8.63% of 652236) affected shaders:
  - Instrs: 13318308 -> 13321011 (+0.02%); split: -0.01%, +0.03%
  - Cycles: 7464985282 -> 7352563774 (-1.51%); split: -1.82%, +0.31%
From this we can see that we aren't doing more loads than before
and the change is pretty inconsequential, but it requires less
optimizing to produce similar results.
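The two fossil-db Cycles lines describe the same absolute reduction of 112421508 cycles (and the two Instrs lines the same +2703); only the denominator differs. A small sketch of that arithmetic, with an illustrative helper name that is not part of fossil-db:

```c
#include <assert.h>
#include <stdint.h>

/* Relative change in percent between two totals. */
static double
pct_change(int64_t before, int64_t after)
{
   assert(before != 0);
   return 100.0 * (double)(after - before) / (double)before;
}
```

pct_change(14258305873, 14145884365) is about -0.79%, while the same delta over the affected shaders only, pct_change(7464985282, 7352563774), is about -1.51%.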
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27568>
We need to allocate "shared size" bytes for each workgroup but
we were incorrectly multiplying by the number of workgroups in
each supergroup instead, which would typically cause us to allocate
less memory than actually required.
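A sketch of the sizing difference, assuming the buffer must hold one `shared_size` region per workgroup in the dispatch. The function and parameter names (wgs_per_sg, num_wgs) are hypothetical and not taken from the driver:

```c
#include <stdint.h>

/* Buggy: scales by the number of workgroups packed into one supergroup. */
static uint64_t
shared_bo_size_buggy(uint32_t shared_size, uint32_t wgs_per_sg)
{
   return (uint64_t)shared_size * wgs_per_sg;
}

/* Fixed: every workgroup in the dispatch gets its own shared_size bytes. */
static uint64_t
shared_bo_size(uint32_t shared_size, uint32_t num_wgs)
{
   return (uint64_t)shared_size * num_wgs;
}
```

With 1 KiB of shared memory, 16 workgroups per supergroup and a dispatch of 100 workgroups, the buggy formula yields 16 KiB where 100 KiB are needed.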
The reason this issue was not visible until now is that the kernel
driver is using a large page alignment on all BO allocations and
this causes us to "waste" a lot of memory after each allocation.
Incidentally, this wasted memory ensured that out-of-bounds
accesses would not cause issues, since they would typically land
in unused memory regions between aligned allocations. However,
experimenting with reduced memory alignments exposed the issue,
which manifested with the UE4 Shooter demo as a GPU hang caused
by state corrupted by out-of-bounds writes to CS shared memory.
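Why large alignment hid the bug can be seen from the padding it leaves after each BO. A minimal sketch; the alignment values used below are illustrative, since the message does not state the kernel's actual alignment:

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to a power-of-two alignment. */
static uint64_t
align_up(uint64_t size, uint64_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (size + alignment - 1) & ~(alignment - 1);
}

/* Unused bytes after an allocation: an out-of-bounds write that stays
 * within this slack lands in padding rather than in a neighboring BO. */
static uint64_t
alloc_slack(uint64_t size, uint64_t alignment)
{
   return align_up(size, alignment) - size;
}
```

A 5000-byte BO leaves 3192 bytes of slack with 4096-byte alignment but only 56 bytes with 64-byte alignment, so shrinking the alignment turns silent overruns into corruption of adjacent allocations.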
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27675>
This is the case where:
* a batch A is submitted
* a no-op flush occurs
* the frontend gets the fence from already-flushed batch A
* zink recycles batch A
* the frontend waits on fence A
fixes #10598
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27623>
They are still the same, but we don't rely on the BRW-compiler-specific
symbols. STATIC_ASSERT catches at compile time any case where they
change independently. At some point we might revisit the need for
them to match.
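A sketch of the pattern, using C11 static_assert from <assert.h> in place of Mesa's STATIC_ASSERT macro. Both constant names are hypothetical:

```c
#include <assert.h>

/* Hypothetical constants: one owned by the compiler, one by the driver. */
#define BRW_EXAMPLE_MAX_SLOTS  32
#define IRIS_EXAMPLE_MAX_SLOTS 32

/* Breaks the build, not a test run, if the two values ever diverge. */
static_assert(IRIS_EXAMPLE_MAX_SLOTS == BRW_EXAMPLE_MAX_SLOTS,
              "iris and brw slot counts must match");
```

The driver then uses only its own constant, while the check keeps the two definitions in sync at no runtime cost.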
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27646>
Once the brw_*_prog_data are available, copy down all the relevant
fields to iris_compiled_shader (and the corresponding iris_*_data structs)
so that most of the Iris code will be independent of brw types.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27646>
At this point in the code, the prog_data is always non-NULL (it was
already used earlier by setup_constant_buffers() to fill push_bos).
Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27646>
With the advent of software scoreboarding, we emit sync instructions
in various places to synchronize the execution pipelines. This results
in assembly being littered with a bunch of sync.nop instructions. That
means that when you reorder anything in the program, the scoreboarding
changes, and the number of sync.nops can vary wildly, even if the code
isn't really materially better or worse. This makes it hard to use
tools like shader-db or fossil-db on post-Icelake platforms.
For now, exclude sync.nops from the instruction count statistic. One
day we may want to consider improving the software scoreboarding pass
to emit fewer redundant sync.nop instructions, at which point tracking
this as a separate stat might be useful. For now though, it's simply
cluttering and confusing our results.
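A minimal sketch of an instruction-count statistic that skips sync.nop. The opcode and sync-function enums are hypothetical and much smaller than the real backend's:

```c
#include <stddef.h>

/* Hypothetical opcodes and sync functions; OP_SYNC with SYNC_NOP is
 * the "sync.nop" emitted by software scoreboarding. */
enum opcode { OP_MOV, OP_ADD, OP_SEND, OP_SYNC };
enum sync_fn { SYNC_NONE, SYNC_NOP, SYNC_OTHER };

struct inst {
   enum opcode op;
   enum sync_fn sync;
};

static int
is_sync_nop(const struct inst *inst)
{
   return inst->op == OP_SYNC && inst->sync == SYNC_NOP;
}

/* Count every instruction except sync.nop; other sync variants still count. */
static unsigned
count_instructions(const struct inst *insts, size_t n)
{
   unsigned count = 0;
   for (size_t i = 0; i < n; i++) {
      if (!is_sync_nop(&insts[i]))
         count++;
   }
   return count;
}
```

Reordering code that only adds or removes sync.nops then leaves this statistic unchanged.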
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27701>
The atomics lowering only applies to Gfx <= 7.5.
The get_size lowering only applies to Gfx <= 8.
Note: lower_store still applies on Gfx9+, to perform color
conversion.
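The generation checks above can be summarized as a small table of flags. This is a sketch with a hypothetical struct, assuming a "verx10" encoding where Gfx7.5 is 75 and Gfx8 is 80:

```c
#include <stdbool.h>

/* Hypothetical lowering flags for image intrinsics. */
struct image_lowering {
   bool lower_atomics;
   bool lower_get_size;
   bool lower_store;
};

static struct image_lowering
image_lowering_for(int verx10)
{
   struct image_lowering l = {
      .lower_atomics  = verx10 <= 75, /* Gfx <= 7.5 only */
      .lower_get_size = verx10 <= 80, /* Gfx <= 8 only */
      .lower_store    = true,         /* still needed on Gfx9+ for color conversion */
   };
   return l;
}
```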
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27638>
This avoids generic pointers, which makes the NIR prints a bit more readable.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: 41b2ed65 ("genxml: generate opencl packing headers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27637>
When moving the static part, I missed that the
pipeline->primitive_id_override field isn't set yet when we check it
to emit 3DSTATE_TE.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1e081bd680 ("anv: split 3DSTATE_TE packing between static & dynamic parts")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27692>
We don't need to start iterating from `pProperties`, as the first member
is already handled in vk_common_GetPhysicalDeviceProperties2. Eliminate
this iteration by starting from pProperties->pNext.
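The loop shape described above, sketched with a minimal stand-in for Vulkan's VkBaseOutStructure (sType plus pNext) so it compiles without Vulkan headers. In Mesa such loops are typically written with the vk_foreach_struct() helper from vk_util.h:

```c
#include <stddef.h>

/* Minimal stand-in for VkBaseOutStructure. */
struct base_out {
   int sType;
   struct base_out *pNext;
};

/* Walk only the extension structs chained after the head; the head
 * (pProperties) is handled elsewhere, so iteration starts at pNext. */
static unsigned
count_chained_structs(const struct base_out *pProperties)
{
   unsigned n = 0;
   for (const struct base_out *ext = pProperties->pNext; ext != NULL;
        ext = ext->pNext)
      n++;
   return n;
}
```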
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27671>