This avoids some performance regressions on Gen12 platforms caused by
SIMD32 fragment shaders reported in titles like Dota2, TF2, Xonotic,
and GFXBench5 Car Chase and Aztec Ruins.
The most obvious pattern in the regressing shaders I identified among
these workloads is that they all had non-uniform discard statements,
which are handled rather optimistically by the current IR analysis
pass: No penalty is currently applied to the SIMD32 variant of the
shader in the form of differing branching weights like we do for other
control flow instructions in order to account for the greater
likelihood of divergence of a SIMD32 shader.
Simply changing that by giving the same treatment to discard
statements as we give to other branching instructions seemed to hurt
more than it helped on platforms earlier than Gen12, since it reversed
most of the improvement obtained from SIMD32 fragment shaders in
Manhattan for no measurable benefit in other workloads (Manhattan has
a handful of shaders with statically non-uniform discard statements
which actually perform better in SIMD32 mode due to their approximate
dynamic uniformity). For that reason this change is applied to Gen12+
platforms only.
I've been running a number of tests trying to understand the
difference in behavior between Gen12 and earlier platforms, and most
of the evidence I've gathered seems to point at EU fusion being the
culprit: Unlike previous generations, on Gen12 EUs are arranged in
pairs which execute instructions in lockstep, giving an effective warp
size of 64 threads in SIMD32 mode, which seems to increase the
likelihood for control flow divergence in some of the affected shaders
significantly.
Fixes: 188a3659ae "intel/ir: Import shader performance analysis pass."
Reported-by: Caleb Callaway <caleb.callaway@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5910>
tgsi_exec.c uses the generic src load path for indirects, so we don't
actually need addr regs. Saves extra intructions.
shader-db results:
total instructions in shared programs: 3346685 -> 3249052 (-2.92%)
instructions in affected programs: 961832 -> 864199 (-10.15%)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6018>
The tgsi_exec path can handle it, and otherwise when we start using NIR
our MAX_VARYINGS value will cause us to have VARYING_SLOT_VARx above the
maximum.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6018>
In doing the softpipe NIR and NIR-to-TGSI transition, I want to make sure
I don't make shaders significantly worse, so I need shader-db output.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6018>
This makes us more like other drivers, and avoids having tons of different
names (particularly when you want to dump vs and fs in debugging). In the
process, having a debug flag for vertex shaders just falls out.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6018>
Turning on robust buffer access enables GLES 3.2, also
finished GL 4.3 support.
The post depth coverage fail is expected, it's a test bug
This also introduce a fail in the invalid flag test that I can't reproduce out of CI.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5971>
TGSI expect vec4 of constants for it's current code paths, and when
doing indirect accesses it does the comparison on vec4 indexes,
however NIR does the indexing on packed float indexes.
This also align the compute path with the other shaders, and
should improve robustness (at least under Vulkan)
Fixes:
KHR-NoContext.gl43.robust_buffer_access_behavior.uniform_buffer
v1.1:
rename variable to something more meaningful (Roland)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5971>
This provides an alternate lowering for scratch in which it uses global
reads/writes and bases scratch addresses on a base pointer.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5927>
This way we can avoid some unnecessary conversions because there's no
need to sanitize to 0/1 for scratch.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5927>
u_screen will return 0 for all of these, which means that this is one
less driver to see in git grep when I'm checking who exposes a cap.
The exception is the texel/gather offsets and stream output
components, which will not be exposed since we don't expose the
corresponding GLSL version.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3298>
u_screen will return 0 for all of these, which means that this is one
less driver to see in git grep when I'm checking who exposes a cap.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3298>
Since this is a software rasterizer, we really don't care whether the
buffers are "mapped" since it's just malloc. This will drop a bit of
pointless CPU overhead to throw errors.
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3298>
u_screen will return 0 for all of these, which means that this is one
less driver to see in git grep when I'm checking who exposes a cap.
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3298>
u_screen will return 0 for all of these, which means that this is one
less driver to see in git grep when I'm checking who exposes a cap.
Reviewed-by Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3298>
Since this is a software rasterizer, we really don't care whether the
buffers are "mapped" since it's just malloc. This will drop a bit of
pointless CPU overhead to throw errors.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3298>
u_screen will return 0 for all of these, which means that this is one
less driver to see in git grep when I'm checking who exposes a cap.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3298>
Since this is a software rasterizer, we really don't care whether the
buffers are "mapped" since it's just malloc. This will drop a bit of
pointless CPU overhead to throw errors.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3298>
u_screen will return 0 for all of these, which means that this is one
less driver to see in git grep when I'm checking who uses a cap.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3298>
When the lava files were moved out of the container, this stopped
working which caused the traces job for Freedreno to not run any traces
at all.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Fixes: dcd171f5e9 ("gitlab-ci: More stable URL for kernel and ramdisk artifacts, for LAVA")
Acked-by: Andres Gomez <agomez@igalia.com>
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6021>
- Execute cs_preamble_state as a separate IB with different flags.
- Set the PREEMPT flag for the main IB.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5798>