needed by GLSL compiler optimizations that have unlowered sysvals
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28049>
This intrinsic helps to read the W coordinate stored in the QPU register
when initializing the input data for the fragment shaders.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28072>
If the app requests a swizzle on the shadow sampler which doesn't just
return the red channel or literal 0s/1s, we'll crash attempting to build
the result vector. Use something that's probably valid.
Cc: mesa-stable
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28001>
These don't seem useful, since they're already done in the early optimizations.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27335>
These depend on the subgroup invocation id, so they are divergent.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: df86c5ffb3 ("nir: add divergence analysis pass.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27962>
These will be used in future to do more validation on functions as
the glsl nir linker is expanded. The first use is in the following
patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27841>
Here we remove the special handling for input params that was hard
to work with and unite it with the output and inout params.
Here a mediump test needs to be updated to what is a more expected
outcome anyway.
We also need to update the code that inserts software f64 to the
new way input params are handled.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27108>
No shader-db changes on any Intel platform.
fossil-db:
All Ice Lake and newer platforms had similar results. (Ice Lake)
Totals:
Instrs: 165513303 -> 165511820 (-0.00%)
Cycles: 15125314947 -> 15125211500 (-0.00%); split: -0.00%, +0.00%
Totals from 82 (0.01% of 656120) affected shaders:
Instrs: 544627 -> 543144 (-0.27%)
Cycles: 22616493 -> 22513046 (-0.46%); split: -0.46%, +0.00%
No fossil-db changes on Gfx9.
Suggested-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
This adds optimizations for iadd, fadd, and ixor with reduce,
inclusive scan, and exclusive scan.
NOTE: The fadd and ixor optimizations had no shader-db or fossil-db
changes on any Intel platform.
NOTE 2: This change "fixes" arb_compute_variable_group_size-local-size
and base-local-size.shader_test on DG2 and MTL. This is just changing
the code path taken to not use whatever path was not working properly
before.
This is a subset of the things optimized by ACO. See also
https://gitlab.freedesktop.org/mesa/mesa/-/issues/3731#note_682802. The
min, max, iand, and ior exclusive_scan optimizations are not
implemented.
Broadwell on shader-db is not happy. I have not investigated.
v2: Silence some warnings about discarding const.
v3: Rename mbcnt to count_active_invocations. Add a big comment
explaining the differences between the two paths. Suggested by Rhys.
shader-db:
All Gfx9 and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20300384 -> 20299545 (<.01%)
instructions in affected programs: 19167 -> 18328 (-4.38%)
helped: 35 / HURT: 0
total cycles in shared programs: 842809750 -> 842766381 (<.01%)
cycles in affected programs: 2160249 -> 2116880 (-2.01%)
helped: 33 / HURT: 2
total spills in shared programs: 4632 -> 4626 (-0.13%)
spills in affected programs: 206 -> 200 (-2.91%)
helped: 3 / HURT: 0
total fills in shared programs: 5594 -> 5581 (-0.23%)
fills in affected programs: 664 -> 651 (-1.96%)
helped: 3 / HURT: 1
fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165551893 -> 165513303 (-0.02%)
Cycles: 15132539132 -> 15125314947 (-0.05%); split: -0.05%, +0.00%
Spill count: 45258 -> 45204 (-0.12%)
Fill count: 74286 -> 74157 (-0.17%)
Scratch Memory Size: 2467840 -> 2451456 (-0.66%)
Totals from 712 (0.11% of 656120) affected shaders:
Instrs: 598931 -> 560341 (-6.44%)
Cycles: 184650167 -> 177425982 (-3.91%); split: -3.95%, +0.04%
Spill count: 983 -> 929 (-5.49%)
Fill count: 2274 -> 2145 (-5.67%)
Scratch Memory Size: 52224 -> 35840 (-31.37%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
This is divergent because it specifically loads sequential values into
successive SIMD lanes.
No shader-db or fossil-db changes on any Intel platform.
Fixes: 9f44a26462 ("nir/divergence: handle load_global_block_intel")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
This allow us to invoke the quad helper.
v2: (Georg)
- Add check for is_gather_implicit_lod
Fixes: 48158636bf ("nir: add is_gather_implicit_lod")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
1. Mesh shaders don't have inputs (only task payload),
so remove them from handling load_input.
2. Clarify in comments that loading any mesh shader
output is an NV_mesh_shader only feature.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27680>
load_patch_vertices_in can only occur in tessellation shaders,
and contains the number of vertices in an input patch.
* TCS: patch_vertices_in is equal to the input patch size
* TES: patch_vertices_in is equal to the TCS output patch size
The patch sizes may be set by a pipeline or dynamic states,
however in both cases it is definitely uniform within a subgroup.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27680>
By accident, the function would return without setting
the divergence information.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27680>
even with shobjs, we know the class of topology statically, so we just need to
select between the (up to) 3 compatible topologies, and luckily there are common
subexpressions we can factor out when calculating all 3 at once.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27616>
GL semantics. GLES (weaker) and VK (stronger) semantics are left as a todo, with
explanations given. Enabled always to deal with null VBOs, this should be
optimized once we have soft fault.
This necessitates a rework of VBO keys, but hopefully for the best.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27616>
This implements a rough skeleton of what's needed for tessellation. It contains
the relevant lowerings to merge the VS and TCS, running them as a compute
kernel, and to lower the TES to a new VS (possibly merged in with a subsequent
GS). This is sufficient for both standalone tessellation and tess + geom/xfb
together. It does not yet contain a GPU accellerated tessellator, simply falling
back to the CPU for that for now. Nevertheless the data structures are
engineered with that end goal in mind, in particular to be able to tessellate
all patches in parallel without needing any prefix sums etc (using simple
watermark allocation for the heap).
Work on fleshing out the skeleton continues in parallel. For now, this does pass
the tests and lets the harder stuff get regression tested more easily. And
merging early will ease rebase.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27616>
this logic relies on constant indexing for compact arrays, but this is
frequently not the case for compact array builtins (e.g., gl_TessLevelOuter).
the usual strategy of lowering to temps isn't viable in TCS, which means
io lowering has to be able to handle indirect access to these builtins
without crashing
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27534>
Add a separate pass which uses the analyze_ubo_ranges machinery to
construct ranges of readonly globals accessed in the shader and push
them to constants in the preamble, using ldg.k if possible. This is
enough to handle inline uniforms in turnip but also provides a base for
OpenCL, although the pass would need further work for that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>