This fixes a page fault when nr_samples=4 but nr_storage_samples=2.
Based on si_is_format_supported this is only supported for color
formats and when has_eqaa_surface_allocator is true (< GFX11).
The referenced commit below didn't introduce the issue but it
exposed it by forcing the gfx blit path to be used.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13255
Fixes: 3424e16ece ("radeonsi: add decision code to select when to use CB_RESOLVE for performance")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38925>
It displays the renderer string and the PCIe bus info.
It's not a real graph because hud_graph is built to draw
numbers and 'dev' is the only use case so far where we
just want to draw a string.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38925>
This makes the layout of "fps,cpu" identical to "fps,stdout,cpu".
Without this change, the ',' separator after 'stdout' would increase
y and we would have a gap between the fps and cpu graphs.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38925>
The steering bits tell the GPU which caches to invalidate on the
subsequent uniform state writes. There is no point in writing
those steering bits when there are no uniforms to emit.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38998>
We leave GS pushed inputs using load_per_vertex_input for now - they're
relatively simple, and using load_attribute_payload doesn't work well
since it's assumed to be convergent (for TES, FS inputs) while GS inputs
are divergent.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>
We're going to be deciding on push vs. pull in the NIR lowering pass
soon, so move the code to limit our register usage from brw's thread
payload code to brw_nir_lower_gs_inputs().
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>
This reduces the number of initialized VGPRs by 1 when no barycentric
coordinates are used.
I have verified with zink that this indeed increases performance for
cases where sysvals like frag_coord and front_face are used without
interpolated PS inputs.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38936>
When no barycentric VGPRs are needed, we always enabled one of the pairs
(e.g. PERSP_SAMPLE_ENA) because it's a HW requirement. However,
the requirement says that LINE_STIPPLE_TEX_ENA can be enabled instead,
which occupies only 1 VGPR.
To get maximum pixel throughput, we can only have 2 initialized VGPRs
at most. By reducing initialized VGPRs from 2 (with PERSP_SAMPLE_ENA) to 1
(with LINE_STIPPLE_TEX_ENA), we can have 1 additional initialized VGPR
for free with maximum pixel throughput, such as POS_FIXED_PT for
frag_coord.xy without MSAA.
Only ACO gets this perf improvement because the change would be more
complicated with LLVM.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38915>
This game sets the reset isolation bit which causes the GL context
creation to fail as Mesa doesn't support the
GLX_ARB_robustness_application_isolation extension. Here we override
and clear the bit.
According to the spec says:
"The GLX_ARB_robustness_application_isolation and
GLX_ARB_robustness_share_group_isolation extensions do not provide
guarantees for graphics resets caused by applications which did
not create their contexts with both the LOSE_CONTEXT_ON_RESET_ARB
reset notification strategy and the
GLX_CONTEXT_RESET_ISOLATION_BIT_ARB bit."
And the game doesn't set LOSE_CONTEXT_ON_RESET_ARB so technically
we could ignore the reset isolation bit even if Mesa did support
the extension.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13336
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38668>
This allows us to override and clear the reset isolation bit.
It will be used in the following patch to override missing support
for GLX_CONTEXT_RESET_ISOLATION_BIT_ARB.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38668>
A in-place resolve via the BLT engine is only supposed to fill the
tiles of a single layer of a resource, so the size to calculate the
number of tiles is the layer stride, same as done for the in-place
resolve via the RS engine in
8df11f3fad ("etnaviv: fix in-place resolve tile count.")
CC: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39005>
Previously, we would spill at the NIR level any temp array over 16 vec4s.
This had two problems:
1) We wouldn't spill for the worst case scenario: a MAD accessing a dst
array and 3 different src arrays (that all get fully unspilled, rather
than just reloading the specific reg in the operand). This would fail to
register allocate. We haven't seen this in practice.
2) We would spill vec4[17] and larger arrays that weren't necessary to get
the shader to register allocate. This occurred on a FS for in Stray that
had a vec4[24] array and just 4 vec4s of register pressure other than the
array.
Instead, use NIR scratch spilling when the worst case set of vars to
reference in an instruction would overflow GPR space. This makes the
shader in Stray go from 11ms to .5ms, by eliminating all spilling and
leaving the array in GPRs. On the other hand, if leaving the arrays
unspilled in NIR means that we cause spilling in ir3, the fact that ir3
spills/reloads work on the whole array may cause the amount of spilling to
increase. However, we can see the effect is very small in terms of number
of shaders affected in shader-db and an overwhelmingly positive effect on
spills:
MaxWaves: 22522470 -> 22520664 (-0.01%)
Instrs: 396093281 -> 396122221 (+0.01%); split: -0.00%, +0.01%
STPs: 218915 -> 182907 (-16.45%)
LDPs: 155374 -> 153364 (-1.29%); split: -2.79%, +1.50%
Totals from 496 (0.03% of 1561298) affected shaders:
MaxWaves: 3792 -> 1986 (-47.63%)
Instrs: 441224 -> 470164 (+6.56%); split: -0.00%, +6.57%
CodeSize: 926164 -> 976734 (+5.46%); split: -0.05%, +5.52%
NOPs: 58896 -> 52765 (-10.41%); split: -14.95%, +4.60%
MOVs: 16314 -> 57901 (+254.92%)
COVs: 3293 -> 5146 (+56.27%)
Full: 12876 -> 23632 (+83.54%)
(ss): 18613 -> 11573 (-37.82%); split: -47.53%, +9.71%
(sy): 2539 -> 2505 (-1.34%); split: -10.75%, +9.41%
(ss)-stall: 40682 -> 26413 (-35.07%); split: -47.90%, +12.80%
(sy)-stall: 147862 -> 117004 (-20.87%); split: -37.65%, +16.69%
STPs: 38566 -> 2558 (-93.37%)
LDPs: 5060 -> 3050 (-39.72%); split: -85.77%, +45.93%
Cat0: 65593 -> 59487 (-9.31%); split: -13.42%, +4.15%
Cat1: 19667 -> 63105 (+220.87%)
Cat2: 155958 -> 157879 (+1.23%); split: -0.05%, +1.28%
Cat6: 105228 -> 94910 (-9.81%); split: -12.36%, +2.54%
Cat7: 2480 -> 2485 (+0.20%); split: -0.08%, +0.28%
Subgroup size: 31872 -> 31744 (-0.40%)
The primary impacted application from shader-db is gfxbench aztec ruins.
A quick test of it showed no significant performance improvement (n=3).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>
This is mostly a s/dyn ShaderModel/ShaderModelInfo/ with a few manual fixes.
With this change, we now statically dispatch into ShaderModel, which is
a bit faster than dynamically dispatching. Together, this commit and the
last one improve compile times by about 1% geomean.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38913>
Because the tracked registers are really driver dependant, the driver
is expected to handle the tracked_registers struct itself.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740>
Both PVR and PanVK are drivers for generic embedded GPU IP cores, so
just take the can_present_on_device implementation from PanVK, which
allows any platform devices for presentation.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38985>