We set need_check_render_feedback for fragment shader only:
69ad0fc61e ("radeonsi: only set need_check_render_feedback if binding textures for PS")
This helps adding mesh shader to not involve other geometry
stages so we can do it in either vertex or mesh pipeline.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>
Series shows improvement on
TotalWarPharaoh-trace-dx11-1440p-ultra-n=2080 title by 0.96% (not a lot
but still it's improvement, so will take that.)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35904>
The recommended settings is just a guidance and not a programming
requirement as per the Bspec.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35904>
It was previously hardcoded since the switch to nvk_mem_arena.
Fixes: 9e52e296f7 ("nvk/heap: Use an nvk_mem_arena")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36059>
It was only added to indirect compute walkers while HSD don't say
anything about this optimization be specific to indirect compute
walkers.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36058>
It was only added to indirect compute walkers while HSD don't say
anything about this optimization be specific to indirect compute
walkers.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36058>
It was only added to indirect compute walkers while HSD don't say
anything about this optimization be specific to indirect compute
walkers.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36058>
Improves average FPS across a set of 63 android and GL-with-zink traces by
1.9% (+/- 0.1%). If we assume that SpaceEngine (most-improved traces by a
significant margin) is just an outlier, it still shows a .4% improvement.
Closes: #12747
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35759>
It's a feature of the compiled shader that affects how it executes, but
it's not present in the binary itself. Needed for debug tooling looking
into the effects of double_threadsize.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35759>
hardware seems to sign extend with a signed comparison, which I guess is
reasonable! so our logic was busted if we had a zero-extend source with a signed
comparison. this broke someone's OpenCL app, and could probably be hit from
GLES/Vulkan too...
on fossil-db, only parallel-rdp affected:
Totals from 312 (0.58% of 53701) affected shaders:
Instrs: 404772 -> 405697 (+0.23%); split: -0.01%, +0.24%
CodeSize: 2863314 -> 2868998 (+0.20%); split: -0.01%, +0.21%
Spills: 40239 -> 40286 (+0.12%); split: -0.02%, +0.14%
Fills: 33763 -> 33810 (+0.14%); split: -0.03%, +0.17%
ALU: 290757 -> 291071 (+0.11%); split: -0.02%, +0.13%
FSCIB: 261844 -> 262652 (+0.31%); split: -0.02%, +0.33%
IC: 230312 -> 230336 (+0.01%); split: -0.01%, +0.02%
GPRs: 24656 -> 24648 (-0.03%); split: -0.05%, +0.02%
Reported-by: RowanG
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
this is more explicit than vec2's and hence has fewer footguns. in particular
it's easier to handle with preambles in a sane way.
modelled on what ir3 does.
there's probably room for more clean up but for now this unblocks what I want to
do.
stats don't seem concerning.
Totals from 692 (1.29% of 53701) affected shaders:
MaxWaves: 441920 -> 442112 (+0.04%)
Instrs: 1588748 -> 1589304 (+0.03%); split: -0.05%, +0.08%
CodeSize: 11487976 -> 11491620 (+0.03%); split: -0.04%, +0.07%
ALU: 1234867 -> 1235407 (+0.04%); split: -0.06%, +0.10%
FSCIB: 1234707 -> 1235249 (+0.04%); split: -0.06%, +0.10%
IC: 380514 -> 380518 (+0.00%)
GPRs: 117292 -> 117332 (+0.03%); split: -0.08%, +0.11%
Preamble instrs: 314064 -> 313948 (-0.04%); split: -0.05%, +0.01%
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
it is sometimes useful to turn lowered bindless intrinsics into bound or vice
versa, and it is annoying to do so without this helper, so generalize the
helper.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Job Noorman <job@noorman.info>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
Class represents an indexed "ideal" register class, where non-general classes
only allow defs that choose that class in the def_size callback.
nir_opt_preamble will try to assign specialized classes where possible, falling
back to the general class once the special-purpose classes are exhausted.
AGX will use this mechanism to promote bindless texture handles to bound texture
registers where possible, falling back to pushing the handle as a uniform where
not possible. Supporting multiple classes in nir_opt_preamble allows this
multi-level hoisting to work in a single nir_opt_preamble call with proper
global behaviour.
Add this concept to nir_opt_preamble so we can use it in AGX later in this MR.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Job Noorman <job@noorman.info>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
That farm is not coming back any time soon, so let's just disable the
jobs to avoid having to keep them working while refactoring; they'll
have to be mostly re-written if/when they're brought back anyway.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36024>
This hasn't been reproducible because RADV and GLSL always lower
non-constant slot and vertex indexing of GS inputs, but we'll stop
lowering it.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This reduces GLSL compile times with the gallium noop driver by 0.6%.
This might decrease register usage and do less code reordering because
nir_lower_io_vars_to_temporaries is no longer called for inputs, which
moved most input loads to the top.
radeonsi+ACO shader-db results are noise.
More uniforms are identified as inlinable.
TOTALS FROM ALL SHADERS (58138):
VGPRs: 2152680 -> 2158032 (0.25 %)
Code Size: 71008908 -> 71064812 (0.08 %) bytes
Max Waves: 916943 -> 916924 (-0.00 %)
Inline Uniforms: 6395 -> 6414 (0.30 %)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This must be done before the GLSL compiler stops using
nir_lower_io_vars_to_temporaries for inputs to work around an LLVM bug.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This can be used to move input loads to top after we stop using
nir_lower_io_vars_to_temporaries that does it unconditionally.
It's more flexible than what nir_lower_io_vars_to_temporaries was doing,
and can be extended to handle any instructions.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This is a partial replacement for nir_lower_io_vars_to_temporaries.
It supports all input and output loads. It doesn't handle stores.
The motivation is to improve compile times.
The main differences compared to nir_lower_io_vars_to_temporaries are:
- it only lowers indirect loads to temps and doesn't touch direct loads
which improves compile times and removes the need for nir_lower_vars_to_ssa
afterward because indirect temp access can't be lowered to SSA
- it doesn't move all input loads to the top; it only moves those input
loads to the top whose indirect loads are lowered (which improves
register usage because direct loads are not moved)
- it doesn't have to deal with complexities of variables
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>