This enables drivers and utils to get all IO information from intrinsics,
so that they don't have to walk the complex types of NIR variables to find
out other information about IO intrinsics.
NIR in/out variables can be removed after nir_lower_io. We could remove
the variables in the pass, but for now I just decided to remove
the variables in radeonsi before shaders are returned to st/mesa.
(st/mesa just needs adjustments to work without NIR in/out variables)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6442>
Instead of having separate lists of variables, roughly sorted by mode,
use a single list for all shader-level NIR variables. This makes a few
list walks a bit longer here and there but list walks aren't a very
common thing in NIR at all. On the other hand, it makes a lot of things
like validation, printing, etc. way simpler. Also, there are a number
of cases where we move variables from inputs/outputs to globals and this
makes it way easier because we no longer have to move them between
lists. We only have to deal with that if moving them from the shader to
a nir_function_impl.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
This provides an alternate lowering for scratch in which it uses global
reads/writes and bases scratch addresses on a base pointer.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5927>
This way we can avoid some unnecessary conversions because there's no
need to sanitize to 0/1 for scratch.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5927>
No drivers are using this anymore so we can delete it and not keep
maintaining this legacy code-path. If any drivers want this in the
future, they should use nir_lower_varst_to_explicit_types followed by
nir_lower_explicit_io.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>
For turnip, we use the "bindless" model on a6xx. Loads and stores with
the bindless model require a bindless base, which is an immediate field
in the instruction that selects between 5 different 64-bit "bindless
base registers", a 32-bit descriptor index that's added to the base, and
the usual 32-bit offset. The bindless base usually, but not always,
corresponds to the Vulkan descriptor set. We can handle the case where
the base is non-constant by using a bunch of if-statements, to make it a
little easier in core NIR, and this seems to be what Qualcomm's driver
does too. Therefore, the pointer format we need to use in NIR has a vec2
index, for the bindless base and descriptor index. Plumb this format
through core NIR.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5683>
Make sure to propagate the NON_UNIFORM access for UBO loads, so
that non-uniform loads are correctly lowered.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5311>
By inserting a b2b1 around the load_ubo, load_input, etc. intrinsics
generated by nir_lower_io, we can ensure that the intrinsic has the
correct destination bit size. Not having the right size can mess up
passes which try to optimize access. In particular, it was causing
brw_nir_analyze_ubo_ranges to ignore load_ubo of booleans which meant
that booleans uniforms weren't getting pushed as push constants. I
don't think this is an actual functional bug anywhere hence no CC to
stable but it may improve perf somewhere.
Shader-db results on ICL with iris:
total instructions in shared programs: 16076707 -> 16075246 (<.01%)
instructions in affected programs: 129034 -> 127573 (-1.13%)
helped: 487
HURT: 0
helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.45% max: 3.00% x̄: 1.33% x̃: 1.36%
95% mean confidence interval for instructions value: -3.00 -3.00
95% mean confidence interval for instructions %-change: -1.37% -1.29%
Instructions are helped.
total cycles in shared programs: 338015639 -> 337983311 (<.01%)
cycles in affected programs: 971986 -> 939658 (-3.33%)
helped: 362
HURT: 110
helped stats (abs) min: 1 max: 1664 x̄: 97.37 x̃: 43
helped stats (rel) min: 0.03% max: 36.22% x̄: 5.58% x̃: 2.60%
HURT stats (abs) min: 1 max: 554 x̄: 26.55 x̃: 18
HURT stats (rel) min: 0.03% max: 10.99% x̄: 1.04% x̃: 0.96%
95% mean confidence interval for cycles value: -79.97 -57.01
95% mean confidence interval for cycles %-change: -4.60% -3.47%
Cycles are helped.
total sends in shared programs: 815037 -> 814550 (-0.06%)
sends in affected programs: 5701 -> 5214 (-8.54%)
helped: 487
HURT: 0
LOST: 2
GAINED: 0
The two lost programs were SIMD16 shaders in CS:GO. However, CS:GO was
also one of the most helped programs where it shaves sends off of 134
programs. This seems to reduce GPU core clocks by about 4% on the first
1000 frames of the PTS benchmark.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
This introduces a new NIR intrinsic for loading inputs at a specific
vertex index.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
From the SPV_AMD_shader_explicit_vertex_parameter extension:
"Returns the value of the input <interpolant> without any
interpolation, i.e. the raw output value of previous shader
stage."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
Used for address/offset calculation (ie. array derefs), where we can
potentially use less than 32b for the multiply of array idx by element
size. For backends that support `imul24`, this gives a lowering pass
an easy way to find multiplies that potentially can be converted to
`imul24`.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Load a 32-bit value then convert to 1-bit. Convert 1-bit to 32-bit
value, then Store it.
These cases started to appear when we changed Anvil to use derefs for
shared memory.
v2: Use `bit_size` in a couple of places we were missing. (Jason)
Reassign `value` instead of `src[0]`. (Jason)
Fixes: 024a46a407 ("anv: use derefs for shared memory access")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: use glsl_type_size_align_func
v2: move get_explicit_type() to glsl_types.cpp/nir_types.cpp
v2: use align() instead of util_align_npot()
v2: pack arrays a bit tighter
v2: rename mem_* to field_*
v2: don't attempt to handle when struct offsets are already set
v2: use column_type() instead of recreating it
v2: use a branch instead of |= in nir_lower_to_explicit_impl()
v2: assign locations to variables and update shared_size and num_shared
v2: allow the pass to be used with nir_var_{shader_temp,function_temp}
v4: rebase
v5: add TODO
v5: small formatting changes
v5: remove incorrect assert in get_explicit_type()
v5: rename to nir_lower_vars_to_explicit_types
v5: correctly update progress when only variables are updated
v5: rename get_explicit_type() to get_explicit_shared_type()
v5: add comment explaining how get_explicit_shared_type() is different
v5: update cast strides
v6: update progress when lowering nir_var_function_temp variables
v6: formatting changes
v6: add more detailed documentation comment for get_explicit_shared_type
v6: rename get_explicit_shared_type to get_explicit_type_for_size_align
v7: fix comment in nir_lower_vars_to_explicit_types_impl()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (v5)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: require nir_address_format_32bit_offset instead
v3: don't call nir_intrinsic_set_access() for shared atomics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I can't find a single place where nir_lower_io is called after going out
of SSA which is the only real reason why you wouldn't do this. Returning
SSA defs is more idiomatic and is required for the next commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Drivers only use lower_io for modes where pointers don't have a
meaningful value, and dereferences can always be traced back to a
variable. But there can be other modes, like global mode with
VK_EXT_buffer_device_address, where pointers cannot be traced back to a
variable, and lower_io would segfault on loads/stores of these since
nir_deref_instr_get_variable() would return NULL.
Just use the mode on the deref itself to filter out these modes before
we try to get the variable.
Fixes: 118a66df99 ("radv: Use NIR barycentric coordinates")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Pretty much every driver using nir_lower_io_to_temporaries followed by
nir_lower_io is going to want this. In particular, radv and radeonsi in
the next commits.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: Fix comparing addresses from formats that have more than one
component by using nir_ball_iequal(). (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This type information will be used by gather_ssa_types to get usable results
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Returns the nir_const_value * with the representation of the NULL
pointer for each address format.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is a simple 32-bit address which is not a global address. Gives
us a format that don't use 0 as its null pointer value. We will need
this in anv to represent nir_var_mem_shared addresses.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
An address format representing a purely logical addressing model. In
this model, all deref chains must be complete from the dereference
operation to the variable. Cast derefs are not allowed. These
addresses will be 32-bit scalars but the format is immaterial because
you can always chase the chain. E.g. push constants in anv.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This commit adds new nir_load/store_scratch opcodes which read and write
a virtual scratch space. It's up to the back-end to figure out what to
do with it and where to put the actual scratch data.
v2: Drop const_index comments (by anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
We will need them for a new ACCESS_NON_UNIFORM flag that's about to be
added in the next commit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It's just a 32-bit index and offset. We're going to want to use it in
GL as well so stop talking about Vulkan.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>