When lowering tg4 sparse testing to a non-gather opcode, we were adding
an explicit LOD 0 parameter. But we might already have a LOD or bias.
Fixes tests like:
dEQP-VK.glsl.texture_gather.basic.2d.rgba8.base_level.sparse_level_1_amd_lod
dEQP-VK.glsl.texture_gather.basic.2d.rgba8.base_level.sparse_level_1_amd_bias
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41535>
This reverts commit 2ee6b4d96e.
The previous change avoids 0.25MB (1%) size change on the driver binary file,
but blocks the runtime enablement for some intel tools which is critical
to our optimization tasks.
It's not a good tradeoff based on the new need of the tool in runtime,
so revert this change.
Test: meson setup builddir -Dallow-fallback-for=libdrm -D build-tests=true -Dbuildtype=release --reconfigure && ninja -C builddir && cd builddir && meson test
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: hwandy <hwandy@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41525>
allows deleting piles of moves & pressure.
simd16 results:
Totals:
Instrs: 2759547 -> 2753358 (-0.22%); split: -0.29%, +0.06%
CodeSize: 41141280 -> 41071072 (-0.17%); split: -0.23%, +0.06%
Totals from 332 (12.54% of 2647) affected shaders:
Instrs: 648080 -> 641891 (-0.95%); split: -1.23%, +0.28%
CodeSize: 9782272 -> 9712064 (-0.72%); split: -0.97%, +0.25%
simd32 is a loss because of RA being stupid. again, this is obviously the right
thing to do so we're doing it. stats are just a hint.
Totals:
Instrs: 4683556 -> 4689193 (+0.12%); split: -0.25%, +0.37%
CodeSize: 70072256 -> 70171920 (+0.14%); split: -0.23%, +0.38%
Number of spill instructions: 50320 -> 50316 (-0.01%)
Number of fill instructions: 51530 -> 51526 (-0.01%)
Totals from 351 (13.26% of 2647) affected shaders:
Instrs: 1349954 -> 1355591 (+0.42%); split: -0.86%, +1.28%
CodeSize: 20484224 -> 20583888 (+0.49%); split: -0.80%, +1.29%
Number of spill instructions: 21762 -> 21758 (-0.02%)
Number of fill instructions: 26328 -> 26324 (-0.02%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
this is both a correctness fix (insufficient MEM registers reserved in some
cases) and a performance fix (unnecessary allocations & zeroing in the RA when
we don't spill).
fixes dEQP-VK.dgc.ext.compute.misc.scratch_space
stats are noise but positive i guess.
Totals from 35 (1.32% of 2647) affected shaders:
Instrs: 396770 -> 396690 (-0.02%)
CodeSize: 6040832 -> 6039600 (-0.02%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
poking around, it seems branches stall the pipelines so we don't need to do any
dataflow analysis, but we do need to fall through for correctness. just keep
going across block boundaries. this isn't optimal yet but it reduces a
pile of A@1's already.
Totals from 1389 (52.47% of 2647) affected shaders:
CodeSize: 56385376 -> 56325776 (-0.11%); split: -0.13%, +0.03%
--
this also fixes issues where the first instruction of a block is a SEND that has
an unmet register dependency, since the old code was fundamentally broken. oops.
lol. fixes
dEQP-VK.compute.pipeline.workgroup_memory_explicit_layout.zero.uint8_t_array_to_uint_array_1
among many others.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
Lets us use more accumulators, I think this is well motivated. Saw this in a
test shader.
Totals from 242 (9.14% of 2647) affected shaders:
Instrs: 1365060 -> 1365035 (-0.00%); split: -0.00%, +0.00%
CodeSize: 20678592 -> 20680096 (+0.01%); split: -0.01%, +0.02%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
We were previously assuming that potentially stale divergence data was
valid. On some paths the register pressure estimator would recalculate
this, but, as is obvious from the results, not always.
v2: Add an assertion in brw_from_nir_emit_impl to ensure we don't end
up in this situation again.
v3: Call nir_divergence_analysis from
brw_nir_lower_deferred_urb_writes. This fixes assertion failures (the
assertion added in v2) in basically every graphics shader. The
altnerative was to call it from brw_compile_vs, brw_compile_gs, and
brw_compile_tes.
shader-db:
All Intel platformms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17050403 -> 17054033 (0.02%)
instructions in affected programs: 296344 -> 299974 (1.22%)
helped: 0 / HURT: 376
total cycles in shared programs: 876063126 -> 875817316 (-0.03%)
cycles in affected programs: 78627328 -> 78381518 (-0.31%)
helped: 91 / HURT: 276
LOST: 1
GAINED: 10
fossil-db:
All Intel platformms had similar results. (Lunar Lake shown)
Totals:
Instrs: 913770429 -> 916075391 (+0.25%); split: -0.00%, +0.26%
CodeSize: 14647414640 -> 14726176320 (+0.54%); split: -0.02%, +0.56%
Cycle count: 102308091527 -> 102290664775 (-0.02%); split: -0.26%, +0.24%
Spill count: 3469632 -> 3469124 (-0.01%); split: -0.08%, +0.07%
Fill count: 5007038 -> 4998674 (-0.17%); split: -0.51%, +0.34%
Max live registers: 192568853 -> 192595355 (+0.01%); split: -0.00%, +0.02%
Max dispatch width: 48713168 -> 48712880 (-0.00%); split: +0.00%, -0.00%
Non SSA regs after NIR: 140252767 -> 140253718 (+0.00%)
Totals from 223099 (11.11% of 2007586) affected shaders:
Instrs: 314077245 -> 316382207 (+0.73%); split: -0.01%, +0.75%
CodeSize: 5335583824 -> 5414345504 (+1.48%); split: -0.06%, +1.54%
Cycle count: 45868025821 -> 45850599069 (-0.04%); split: -0.58%, +0.54%
Spill count: 2062649 -> 2062141 (-0.02%); split: -0.14%, +0.11%
Fill count: 3343019 -> 3334655 (-0.25%); split: -0.76%, +0.51%
Max live registers: 36762498 -> 36789000 (+0.07%); split: -0.02%, +0.09%
Max dispatch width: 5542224 -> 5541936 (-0.01%); split: +0.03%, -0.03%
Non SSA regs after NIR: 43727142 -> 43728093 (+0.00%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> [v1]
Fixes: 1bff4f93ca ("brw: Basic infrastructure to store convergent values as scalars")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41370>
In Gfx9 the enum value was changed to mean SIMD8 double precision, so
drop the old unused enum. At least on Gfx9 there is an extension bit
to set to use the old SIMD4x2 mode, we can recover if we ever need this
in the future.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41457>
FullyCovered will need to know if conservative rasterization is enabled,
so pass it on to the shader.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
Add a new intrinsic to read the raw shading rate provided to the FS
payload, and lower load_frag_shading_rate in NIR using it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
We'll need the raw coverage mask provided to the fragment shader in a
future patch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
Drivers using blorp on ELK platforms don't need the special
color->depth conversion path that needs 64bit floating point math.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Allen Ballway <ballway@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39341>
Format draw and draw_indexed Perfetto events with their vertex count.
For draw_indirect and draw_indexed_indirect, include the draw count
when indirect tracing is enabled (MESA_GPU_TRACES=indirects), otherwise
fall back to the static name.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41374>
Format compute events as compute(x,y,z) using the end-payload group
dimensions. Trailing dimensions that equal 1 are omitted to keep labels
concise — e.g. compute(128,1,1) becomes compute(128).
For compute_indirect, the dispatch dimensions are not known at command
record time since they live in GPU memory as a VkDispatchIndirectCommand.
The u_trace framework reads them back at trace flush time via the
is_indirect mechanism: the GPU address is recorded alongside the
tracepoint, and u_trace copies the pointed-to struct into indirect_data
once the GPU has finished. The same trailing-1 trimming is applied when
indirect tracing is enabled (MESA_GPU_TRACES=indirects); otherwise the
event falls back to the static "compute_indirect" name.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41374>