Commit graph

7394 commits

Author SHA1 Message Date
Yonggang Luo
bf338c3d7f mesa: #include "util/glheader.h" instead GL/gl.h in shared code
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Brian Paul brianp@vmware.com
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19472>
2022-11-03 16:07:31 +00:00
Yonggang Luo
bfa3ce44a6 mesa: Move glheader.h from mesa/main/glheader.h to util/glheader.h
So it's can be accessed in broader places

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Brian Paul brianp@vmware.com
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19472>
2022-11-03 16:07:31 +00:00
Yonggang Luo
b0016bc36a mesa: Use DEBUG_NAMED_VALUE_END for const struct debug_named_value
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19448>
2022-11-03 14:40:33 +00:00
Karol Herbst
e18512fe88 nir: set range and base for load_kernel_input
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18581>
2022-11-02 23:36:56 +00:00
Illia Abernikhin
aa4ac5ff8b utils: Merge util/debug.* into util/u_debug.* and remove util/debug.*
Rename env_var_as_unsigned() -> debug_get_num_option(), because duplicate
Rename env_var_as_bool() -> debug_get_bool_option(), because duplicate

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7177

Signed-off-by: Illia Abernikhin <illia.abernikhin@globallogic.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19336>
2022-11-02 07:25:39 +00:00
Alyssa Rosenzweig
2a6338722e panfrost: Don't use nir_variable in the compilers
More future proof, simpler, and works with early I/O lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19456>
2022-11-02 04:22:06 +00:00
Kenneth Graunke
fde99747e9 nir: Drop infer_non_readable option for nir_opt_access()
Everybody sets it to true now, and the only reason for the option to
exist was to work around a bug that's now been fixed.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19162>
2022-11-02 03:42:04 +00:00
Alyssa Rosenzweig
45a111c21c nir/opt_algebraic: Fuse c - a * b to FMA
Algebraically it is clear that

   -(a * b) + c = (-a) * b + c = fma(-a, b, c)

But this is not clear from the NIR

   ('fadd', ('fneg', ('fmul', a, b)), c)

Add rules to handle this case specially. Note we don't necessarily want
to  solve this by pushing fneg into fmul, because the rule opt_algebraic
(not the late part where FMA fusing happens) specifically pulls fneg out
of fmul to push fneg up multiplication chains.

Noticed in the big glmark2 "terrain" shader, which has a cycle count
reduced by 22% on Mali-G57 thanks to having this pattern a ton and being
FMA bound.

BEFORE: 1249 inst, 16.015625 cycles, 16.015625 fma, ... 632 quadwords
AFTER: 997 inst, 12.437500 cycles, .... 504 quadwords

Results on the same shader on AGX are also quite dramatic:

BEFORE: 1294 inst, 8600 bytes, 50 halfregs, ...
AFTER: 1154 inst, 8040 bytes, 50 halfregs, ...

Similar rules apply for fabs.

v2: Use a loop over the bit sizes (suggested by Emma).

shader-db on Valhall (open + small subset of closed), results on Bifrost
are similar:

total instructions in shared programs: 167975 -> 164970 (-1.79%)
instructions in affected programs: 92642 -> 89637 (-3.24%)
helped: 492
HURT: 25
helped stats (abs) min: 1.0 max: 252.0 x̄: 6.25 x̃: 3
helped stats (rel) min: 0.30% max: 20.18% x̄: 3.21% x̃: 2.91%
HURT stats (abs)   min: 1.0 max: 5.0 x̄: 2.80 x̃: 3
HURT stats (rel)   min: 0.46% max: 9.09% x̄: 3.89% x̃: 3.37%
95% mean confidence interval for instructions value: -6.95 -4.68
95% mean confidence interval for instructions %-change: -3.08% -2.65%
Instructions are helped.

total cycles in shared programs: 10556.89 -> 10538.98 (-0.17%)
cycles in affected programs: 265.56 -> 247.66 (-6.74%)
helped: 88
HURT: 2
helped stats (abs) min: 0.015625 max: 3.578125 x̄: 0.20 x̃: 0
helped stats (rel) min: 0.65% max: 22.34% x̄: 5.65% x̃: 4.25%
HURT stats (abs)   min: 0.0625 max: 0.0625 x̄: 0.06 x̃: 0
HURT stats (rel)   min: 8.33% max: 12.50% x̄: 10.42% x̃: 10.42%
95% mean confidence interval for cycles value: -0.28 -0.12
95% mean confidence interval for cycles %-change: -6.30% -4.30%
Cycles are helped.

total fma in shared programs: 1582.42 -> 1535.06 (-2.99%)
fma in affected programs: 871.58 -> 824.22 (-5.43%)
helped: 502
HURT: 9
helped stats (abs) min: 0.015625 max: 3.578125 x̄: 0.09 x̃: 0
helped stats (rel) min: 0.60% max: 25.00% x̄: 5.46% x̃: 4.82%
HURT stats (abs)   min: 0.015625 max: 0.0625 x̄: 0.03 x̃: 0
HURT stats (rel)   min: 4.35% max: 12.50% x̄: 6.22% x̃: 4.35%
95% mean confidence interval for fma value: -0.11 -0.08
95% mean confidence interval for fma %-change: -5.58% -4.93%
Fma are helped.

total cvt in shared programs: 665.55 -> 665.95 (0.06%)
cvt in affected programs: 61.72 -> 62.12 (0.66%)
helped: 33
HURT: 43
helped stats (abs) min: 0.015625 max: 0.359375 x̄: 0.04 x̃: 0
helped stats (rel) min: 1.01% max: 25.00% x̄: 6.68% x̃: 4.35%
HURT stats (abs)   min: 0.015625 max: 0.109375 x̄: 0.04 x̃: 0
HURT stats (rel)   min: 0.78% max: 38.46% x̄: 10.85% x̃: 6.90%
95% mean confidence interval for cvt value: -0.01 0.02
95% mean confidence interval for cvt %-change: 0.23% 6.24%
Inconclusive result (value mean confidence interval includes 0).

total quadwords in shared programs: 93376 -> 91736 (-1.76%)
quadwords in affected programs: 25376 -> 23736 (-6.46%)
helped: 169
HURT: 1
helped stats (abs) min: 8.0 max: 128.0 x̄: 9.75 x̃: 8
helped stats (rel) min: 1.52% max: 33.33% x̄: 8.35% x̃: 8.00%
HURT stats (abs)   min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00%
95% mean confidence interval for quadwords value: -11.18 -8.11
95% mean confidence interval for quadwords %-change: -8.95% -7.36%
Quadwords are helped.

total threads in shared programs: 4697 -> 4701 (0.09%)
threads in affected programs: 4 -> 8 (100.00%)
helped: 4
HURT: 0
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.00 1.00
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Ol<C5><A1><C3><A1>k <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19312>
2022-11-01 22:39:45 -04:00
Jason Ekstrand
15796bdd0e nir/types: Add some asserts to glsl_get_struct_field()
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19430>
2022-11-01 14:48:41 +00:00
Rhys Perry
e6d26cb288 nir,ac/nir,aco,radv: replace has_input_*_amd with more general intrinsics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19228>
2022-10-31 14:33:43 +00:00
Francisco Jerez
be6da31034 nir/lower_int64: Implement lowering of 64-bit integer to 64-bit float conversions.
This involves computing the significand with a 64-bit precision type,
and implementing the normalization and packing manually instead of
relying on u2f32, since the significand can no longer be represented
as a 32-bit integer.  This fixes 64-bit integer to 64-bit float
conversions on devices that support 64-bit float natively but lack
64-bit integer support, like Intel MTL hardware.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19128>
2022-10-29 19:45:44 +00:00
Francisco Jerez
29da985682 nir/lower_int64: Enable lowering of 64-bit float to 64-bit integer conversions.
The existing code for this appears to work okay for conversions
involving 64-bit floats, relax the assert and enable the lowering
path.  This fixes 64-bit float to 64-bit integer integer conversions
on devices that have native support for 64-bit floats but lack 64-bit
integer support, like Intel MTL hardware.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19128>
2022-10-29 19:45:44 +00:00
Marek Olšák
0ac37b595a nir: add nir_intrinsic_optimization_barrier_vgpr_amd for LLVM
We need this for the MSAA resolve shader.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19243>
2022-10-29 18:38:33 +00:00
Rob Clark
5d3895d13b nir: Add way to create passthrough TCS without VS nir
In the case of disk-cache hits, radeonsi no longer has the nir shader
around.  So add a way to create a passthrough TCS with just the VS
output locations.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7567
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19382>
2022-10-29 17:46:23 +00:00
Karol Herbst
e58c004870 nir/algebraic: add vec8/16 cmp lowering
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19150>
2022-10-29 10:31:39 +00:00
Karol Herbst
5efbef833a nir/algebraic: generalize vector_cmp lowering
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19150>
2022-10-29 10:31:39 +00:00
Karol Herbst
f27e2234e1 nir/algebraic: support CL vector accessors
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19150>
2022-10-29 10:31:39 +00:00
Karol Herbst
1d6014f267 nir/algebraic: add 8 and 64 bit urol and uror lowering
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19150>
2022-10-29 10:31:39 +00:00
Mykhailo Skorokhodov
f8425e661a glsl/meson: Add variable to export float64.glsl
Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18854>
2022-10-28 10:08:50 +00:00
Mykhailo Skorokhodov
4692c66358 nir: Add assert in nir_lower_doubles
Cc: mesa-stable

Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18854>
2022-10-28 10:08:50 +00:00
Mykhailo Skorokhodov
e4b7bf1a6d nir: Make lower_double_ops recognize SPIR-V mangling
Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18854>
2022-10-28 10:08:50 +00:00
Alyssa Rosenzweig
63320c691a nir/lower_idiv: Inline convert_instr_precise
Now that we only have one convert_instr path, this is simpler.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19303>
2022-10-27 19:37:14 +00:00
Alyssa Rosenzweig
941c37c085 nir/lower_idiv: Remove imprecise_32bit_lowering
NIR has two implementations of lower_idiv, keyed on the
imprecise_32bit_lowering flag. This flag is misleading: the results when
setting this flag "imprecise", they're completely wrong for some values.
If a backend has a native implementation of umul_high, the correct path
isn't that much more expensive. If it doesn't, it's substantially slower
for highp integer divison... but in practice, non-constant highp integer
division is pretty rare.

After a painful migration of the tree, this code path has no more users.
Remove it so nobody else gets the bright idea of using it again.

Closes: #6555
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19303>
2022-10-27 19:37:14 +00:00
Daniel Schürmann
22534e0d1a nir: add AMD RT traversal intrinsics
These I/O intrinsics help to create an enclosed traversal shader.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19188>
2022-10-27 09:45:39 +00:00
Qiang Yu
3d6cce2e4c nir: add two amd ngg lds base load intrinsics
These two values are not known when compile for radeonsi.
They are relocated when link/upload time.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18832>
2022-10-27 07:35:01 +00:00
Dave Airlie
6a29cb2654 nir/lower_bool_to_int32: add support for lowering functions.
Change the function parameters to 32-bit.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19291>
2022-10-26 21:47:29 +00:00
Lionel Landwerlin
117b32a594 nir/divergence_analysis: add missing desc_set_address_intel
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19320>
2022-10-26 21:09:20 +00:00
Lionel Landwerlin
edda5731c0 nir/divergence_analysis: add some missing RT intrinsics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19320>
2022-10-26 21:09:20 +00:00
Jason Ekstrand
5e05d98848 nir: Unconditionally call nir_trim_vector in nir_lower_readonly_images_to_tex
It will already short-circuit if the number of components matches.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19301>
2022-10-26 17:11:44 +00:00
Jason Ekstrand
d9cf6de4a8 nir: Misc. style fixes to nir_lower_readonly_images_to_tex
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19301>
2022-10-26 17:11:44 +00:00
Jason Ekstrand
b684a603f1 nir: Use nir_shader_instructions_pass in nir_lower_readonly_images_to_tex
nir_shader_lower_instructions is overkill and this makes the pass
generally easier to understand.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19301>
2022-10-26 17:11:44 +00:00
Jason Ekstrand
a3c3d0d287 nir: Reformat a comment
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19301>
2022-10-26 17:11:44 +00:00
Lionel Landwerlin
29da1c8253 nir/lower_shader_calls: run opt_cse after lower stack intrinsics
In particular when using scratch_base_ptr

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
2022-10-26 12:53:25 +00:00
Lionel Landwerlin
3c242e551d nir/lower_shader_calls: move scratch loads closer to where they're needed
The intel backend compiler is not dealing with the scratch loads
emitted by this pass very well. There are 2 reasons for this :

  - all loads are at the top of the shader

  - the loads are global load intrinsics (cannot be differentiated
    from ssbo loads for example)

This leads the backend to generate ridiculous amount of spills.

To help a bit (actually quite a lot), we can move the scratch loads in
the blocks where they're needed, using the dominance information.
Quite often that also ends up moving loads in a block that might not
be reached by all the lanes, so we're potentially avoiding some loads.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
2022-10-26 12:53:25 +00:00
Lionel Landwerlin
5717f13dff nir/lower_shader_calls: add a pass to sort/pack values on the stack
The previous pass shrinking values stored on the stack might have left
some gaps on the stack (a vec4 turned into a vec3 for instance).

This pass reorders variables on the stack, by component bit size and
by ssa value number. The component size is useful to pack smaller
values together. The ssa value number is also important because if we
have 2 calls spilling the same values, then we can avoid reemiting the
spillings if the values are stored in the same location.

v2: Remove unused sorting function (Konstantin)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
2022-10-26 12:53:25 +00:00
Lionel Landwerlin
4cd90ed7bc nir/lower_shader_calls: add a pass to trim scratch values
For example, if we store to scratch a vec4 but only a subset of
components are used after the load operation.

v2: Use nir_intrinsic_write_mask (Konstantin)
    Use u_foreach_bit() instead of u_bit_scan() (Konstantin)
    Fix mask building loop (Konstantin)

v3: Fix reswizzle (Konstantin)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
2022-10-26 12:53:25 +00:00
Lionel Landwerlin
1d10d17817 nir/lower_shader_calls: add an option structure for future optimizations
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
2022-10-26 12:53:25 +00:00
Lionel Landwerlin
d0543bfbec nir/lower_shader_calls: cleanup shaders a bit more post split
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
2022-10-26 12:53:25 +00:00
Lionel Landwerlin
6d7e04d924 nir/lower_shader_calls: add NIR_PASS_V internally
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
2022-10-26 12:53:25 +00:00
Lionel Landwerlin
dc70519c8a nir/lower_shader_calls: rematerialize values in more complex cases
Previously when considering whether to rematerialize or spill/fill
ssa_1954, we would go for a spill/fill :

   vec4 32 ssa_388 = (float32)txf ssa_387 (texture_handle), ssa_86 (coord), ssa_23 (lod), 0 (texture), 0 (sampler)
   ...
   vec1 32 ssa_1953 = load_const (0xbd23d70a = -0.040000)
   vec1 32 ssa_1954 = fadd ssa_388.x, ssa_1953
   vec1 32 ssa_1955 = fneg ssa_1954

This is because when looking at ssa_1955 the first time, we would
consider ssa_388 unrematerialiable, and therefore all values built on
top of it would be considered unrematerialiable as well.

The missing piece when considering whether to rematerialize ssa_1954
is that we should look at filled values. Now that ssa_388 has been
spilled/filled, we can rebuild ssa_1955 on top of the filled value and
avoid spilling/filling ssa_1955 at all.

This requires a bit more work though. We can't just look at an
instruction in isolation, we need to go through the ssa chains until
we find values we can rematerialize or not.

In this change we build a list of all ssa values involved in building
a given value, up to the point there we find a filled or a
rematerializable value.

In this particular case, looking at ssa_1955 :

   * We can rematerialize ssa_388 from its filled value

   * We can rematerialize ssa_1953 trivially

   * We can rematerialize ssa_1954 because its 2 inputs are rematerializable

   * We can rematerialize ssa_1955 because ssa_1954 is rematerializable

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
2022-10-26 12:53:25 +00:00
Lionel Landwerlin
ca2a1340a2 nir/lower_shader_calls: avoid respilling values
Currently we do something like this :

  ssa_0 = ...
  ssa_1 = ...
  * spill ssa_0, ssa_1
  call1()
  * fill ssa_0, ssa_1
  ssa_2 = ...
  ssa_3 = ...
  * spill ssa_0, ssa_1, ssa_2, ssa_3
  call2()
  * fill ssa_0, ssa_1, ssa_2, ssa_3

If we assign the same possition to ssa_0 & ssa_1 in the spilling
stack, then on call2(), we know that those values are already present
in memory at the right location and we can avoid respilling them.

The result would be something like this :

  ssa_0 = ...
  ssa_1 = ...
  * spill ssa_0, ssa_1
  call1()
  * fill ssa_0, ssa_1
  ssa_2 = ...
  ssa_3 = ...
  * spill ssa_2, ssa_3
  call2()
  * fill ssa_0, ssa_1, ssa_2, ssa_3

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
2022-10-26 12:53:25 +00:00
Lionel Landwerlin
5a9f8d21d0 nir/lower_shader_calls: lower scratch access to format internally
For a follow up optimization, we would like to track scratch loads.
This isn't possible with global load/store intrinsics. So use a couple
of special intrinsic in the pass and only lower it to global
intrinsics at the end.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
2022-10-26 12:53:25 +00:00
Lionel Landwerlin
df685b4f9c nir/lower_shader_calls: rematerialize more trivial values
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>
2022-10-26 12:53:25 +00:00
Rhys Perry
382831c986 radv,nir: add intrinsics for streamout and GS copy shaders
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19302>
2022-10-25 17:35:08 +00:00
Qiang Yu
7fb506d068 nir: add nir_load_prim_xfb_query_enabled_amd
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17457>
2022-10-25 12:58:43 +00:00
Qiang Yu
a119a6464f nir,ac,radv: add primitive count add intrinsics
radeonsi use shader buffer, but radv use gds for the query
result storage.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17457>
2022-10-25 12:58:43 +00:00
Qiang Yu
83643e4dc8 nir,ac/nir/ngg,radv: split shader_query_enabled_amd
For used by different counter.

Vulkan:
1. VK_QUERY_PIPELINE_STATISTIC_GEOMETRY_SHADER_PRIMITIVES_BIT,
   sum generated primitives of all 4 streams when GS.
2. VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT, count generated
   primitives for all 4 streams when VS/TES/GS.
3. VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_EXT, count generated
   and streamout primitives for all 4 streams when VS/TES/GS.

OpenGL:
1. GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB, sum generated
   primitives for all 4 streams when GS.
2. GL_PRIMITIVES_GENERATED, count generated primitives for all 4
   streams when VS/TES/GS.
3. GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, count streamout
   primitives for all 4 streams when VS/TES/GS.

pipeline_stat_query_enabled_amd is for Vulkan 1 and OpenGL 1.
xfb_query_enabled_amd is for Vulkan 2/3 and OpenGL 2/3.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19015>
2022-10-25 02:42:52 +00:00
Rob Clark
a8e84f50bc nir: Add helper to create passthrough TCS shader
Based on si_create_passthrough_tcs() as that seemed the most generic of
the various different backend driver implementations.  Uses the
load_tess_level_outer_default and load_tess_level_inner_default
intrinsics to load the gl_TessLevelOuter and gl_TessLevelInner values,
so driver will somehow need to implement those to load the values set
by pipe_context::set_tess_state() or similar.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19259>
2022-10-24 21:39:38 +00:00
Dave Airlie
55d2b82cc0 glsl/types: fix dword slots calc for float16 matricies.
The current uniform query uploader for mat3 calcs things as
if the vector elements are f16vec4 wide, so fix the calcs
here to do the same.

Fixes GTF-GL46.gtf21.GL.mat3.mat3arraysimple_frag on llvmpipe
when 16-bit uniform lowering is allowed.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14817>
2022-10-23 01:43:44 +00:00
Alyssa Rosenzweig
80de33cf6a nir/opt_preamble: Move load_texture_base_agx
nir_opt_preamble will be crucial to optimize out the lowering for array
textures on AGX, which involves this AGX-specific sysval.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18813>
2022-10-22 14:48:04 -04:00