Add an option to nir_lower_gs_intrinsics so that it can also track
the number of emitted vertices per primitive, not just the total
vertex count.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Add an option to nir_lower_gs_intrinsics which tells it to track
the number of emitted primitives, not just vertices. Additionally,
also make it per-stream.
Also rename the set_vertex_count intrinsic to
set_vertex_and_primitive_count.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
This new intrinsic is capable of handling the full range of conversions
from OpenCL including rounding modes and possible saturation. The
intention is that we'll emit this intrinsic directly from spirv_to_nir
and then lower it to ALU ops later.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
We're about to introduce conversion ops which are going to want two
different types. We may as well just split the one we have rather than
end up with three. There are a couple places where this is mildly
inconvenient but most of the time I find it to actually be nicer.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
This makes it explicit that this intrinsic is only for SSBOs. For the
v3dv driver, we'll be adding a get_ubo_size intrinsic and we want to be
able to distinguish between the two.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6812>
For UBO accesses to be the same performance as classic GL default uniform
block uniforms, we need to be able to push them through the same path. On
freedreno, we haven't been uploading UBOs as push constants when they're
used for indirect array access, because we don't know what range of the
UBO is needed for an access.
I believe we won't be able to calculate the range in general in spirv
given casts that can happen, so we define a [0, ~0] range to be "We don't
know anything". We use that at the moment for all UBO loads except for
nir_lower_uniforms_to_ubo, where we now avoid losing the range information
that default uniform block loads come with.
In a departure from other NIR intrinsics with a "base", I didn't make the
base an be something you have to add to the src[1] offset. This keeps us
from needing to modify all drivers (particularly since the base+offset
thing can mean needing to do addition in the backend), makes backend
tracking of ranges easy, and makes the range calculations in
load_store_vectorizer reasonable. However, this could definitely cause
some confusion for people used to the normal NIR base.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6359>
This commit adds support for nir_var_mem_constant various places. It
also adds a pass similar to nir_lower_vars_to_explicit_types except it
also scrapes out the constants and stuffs them into constant_data.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6379>
This has the same semantics as load_global except the memory it reads is
known to be constant so load_global_constant intrinsics can be CSEd
rather than relying on more complex copy-propagation.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6379>
When using nir_lower_interpolation, we need to propagate the IO
semantics from the load_interpolated_input to the new
load_fs_input_interp_deltas intrinsics. nir_lower_io assumes
they will be filled out.
This fixes assertions in most tests on iris since commit
01ab308edc, where nir_lower_io
started reading this field.
Fixes: 01ab308edc ("nir: update IO semantics in nir_io_add_const_offset_to_base")
Fixes: 502abfce7f ("nir: save IO semantics in lowered IO intrinsics")
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6450>
This enables drivers and utils to get all IO information from intrinsics,
so that they don't have to walk the complex types of NIR variables to find
out other information about IO intrinsics.
NIR in/out variables can be removed after nir_lower_io. We could remove
the variables in the pass, but for now I just decided to remove
the variables in radeonsi before shaders are returned to st/mesa.
(st/mesa just needs adjustments to work without NIR in/out variables)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6442>
This gets us fewer comparisons in the shaders that we need to optimize
back out, and reduces backend code.
total instructions in shared programs: 11547270 -> 7219930 (-37.48%)
total full in shared programs: 334268 -> 319602 (-4.39%)
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6378>
This is very common for backends -- r600, freedreno, and nir_to_tgsi all
needed versions of it. Make a common intrinsic to use for it with a
shared, slightly-tuned-from-ir3 lowering pass.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6378>
New intrinsics are added for global invocation IDs and work group IDs to
deal with offsets in both. The only one of these that needs a system value
is global invocation offset, for CL's get_global_offset().
Note that CL requires very large work group sizes, so these intrinsics
are modified to be able to use 64bit values, for 64bit SPIR-V.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5891>
The OpenCL image_width/height/depth functions have variants which can
take an LOD parameter. More importantly, LLVM-SPIRV-Translator always
generates OpImageQuerySizeLod even if the LOD is guaranteed to be zero.
Given that over half the hardware out there has an LOD field for image
size queries (based on a rudimentary scan through their NIR -> whatever
code), we may as well just add the source to the NIR intrinsic. If this
is ever a problem for anyone, the lowering is pretty trivial.
I've also added asserts to everyone's drivers that should alert them if
they ever see an LOD other than zero. This will never happen with GL or
Vulkan so there's no need for panic.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6396>
This provides an alternate lowering for scratch in which it uses global
reads/writes and bases scratch addresses on a base pointer.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5927>
EXT_shader_framebuffer_fetch requires the fetched value to be per-sample, so we
need to load the sample id when in a fragment shader.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5930>
The first intrinsic is intended to expose the value set by glLineWidth
to shaders internally. The second intrinsic exposes the value actually
sent to the hardware. This may be wider than the first one in order to
implement anti-aliasing. These will be used in later patches to
implement a line smoothing lowering pass.
v2: Add a second intrinsic for the expanded line width for
anti-aliasing.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
The line coord is a coordinate along the axis perpendicular to the line.
It is in the range [0,1] between the two edges of the line. It is
available at least on Broadcom hardware.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
For turnip, we use the "bindless" model on a6xx. Loads and stores with
the bindless model require a bindless base, which is an immediate field
in the instruction that selects between 5 different 64-bit "bindless
base registers", a 32-bit descriptor index that's added to the base, and
the usual 32-bit offset. The bindless base usually, but not always,
corresponds to the Vulkan descriptor set. We can handle the case where
the base is non-constant by using a bunch of if-statements, to make it a
little easier in core NIR, and this seems to be what Qualcomm's driver
does too. Therefore, the pointer format we need to use in NIR has a vec2
index, for the bindless base and descriptor index. Plumb this format
through core NIR.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5683>
Add the possibility to specify the source components. This is necessary
to let the UBO/SSBO index have more than one component, and it also lets
us remove a few hand-rolled load intrinsic definitions.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5683>
All the other driver-specific intrinsics are at the end of the file so
Intel's should go there too.
Reviewed-by: Sagar Ghuge<sagar.ghuge@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5503>
Depth and stencil writes are combined with color writes, so we need
this intrinsic which has sources for color, RT, depth and stencil.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5065>
SPIRV OpControlBarrier can have both a memory and a control barrier
which some hardware can handle with a single instruction. Let's
turn the scoped_memory_barrier into a scoped barrier which can embed
both barrier types. Note that control-only or memory-only barriers can
be supported through this new intrinsic by passing NIR_SCOPE_NONE to the
unused barrier type.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4900>
Somehow we ended up with an extra scalar source up-front. It doesn't
look like any drivers use this opcode yet so no real harm has been done
by it being wrong.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5218>
Caught by the sanity checking in nir_intrinsic_copy_const_indices()
(which is introduced by the next patch).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
These intrinsics are supposed to map to the underlying hardware
instructions, which don't have wrmask. We use them when we lower
store_output in the geometry pipeline and since store_output gets
lowered to temps, we always see full wrmasks there.
Intrinsic to get the SIMD width, which not always the same as subgroup
size. Starting with a small scope (Intel), but we can rename it later
to generalize if this turns out useful for other drivers.
Change brw_nir_lower_cs_intrinsics() to use this intrinsic instead of
a width will be passed as argument. The pass also used to optimized
load_subgroup_id for the case that the workgroup fitted into a single
thread (it will be constant zero). This optimization moved together
with lowering of the SIMD.
This is a preparation for letting the drivers call it before the
brw_compile_cs() step.
No shader-db changes in BDW, SKL, ICL and TGL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
In f1883cc73d we tried to pass through alignments from load_constant
intrinsics when rewriting them to load_ubo in iris. However, those
intrinsics don't have ALIGN_MUL or ALIGN_OFFSET indices. It's easy
enough to add them. We just call the size/align function on the vector
type at the end of our deref chain and use the alignment returned from
there. It's possible we could do better by walking the whole deref
chain but this should be good enough.
Fixes: f1883cc73d "iris: Set alignments on cbuf0 and constant reads"
Closes: #2739
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4468>
I had missed that LDC actually uses vec4 units for its offset. This
means that we have to create a new instruction, and lower it in
ir3_nir_lower_io_offsets, similar to the existing SSBO instructions.
Unfortunately we can't assume that loads are always vec4-aligned, so we
have to use the alignment information that NIR gives us. Unfortunately,
it's currently woefully inadequate, and will have to be fixed to give us
good codegen in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4568>
To facilitate lowering SSBOs to globals, we need a load_ssbo_address
intrinsic. This intrinsic takes an SSBO index and loads the address in
global memory of the SSBO (likely implemented via a uniform in the
driver). In the future, we'll support bounds checking, but at the moment
this is not supported (this pass should only be used for trusted
contexts at the moment, i.e. contexts without robustness extensions).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2753>
r600 reads vec4 from the UBO, but the offsets in nir are evaluated to the component.
If the offsets are not literal then all non-vec4 reads must resolve the component
after reading a vec4 component (TODO: figure out whether there is a consistent way
to deduct the component that is actually read).
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3225>
Midgard can't write depth and stencil separately. It has to happen in
a single store operation containing both. Let's add a panfrost specific
intrinsic and turn all depth/stencil stores into a packed depth+stencil
one.
Note that this intrinsic is not yet handled in emit_intrinsic(), but
we'll address that later.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3697>
This introduces a new NIR intrinsic for loading inputs at a specific
vertex index.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
From the SPV_AMD_shader_explicit_vertex_parameter extension:
"Returns the value of the input <interpolant> without any
interpolation, i.e. the raw output value of previous shader
stage."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Right now, it's implemented as a no-op for everyone. For most drivers,
it's a switch case in the NIR -> whatever which just breaks. For ir3,
they already have code to delete tessellation barriers so we just add a
case to also delete memory_barrier_tcs_patch.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
SPV_AMD_shader_image_load_store_lod allows to use a lod parameter
with OpImageRead, OpImageWrite and OpImageSparseRead.
According to the specification, this parameter should be a 32-bit
integer. It is initialized to 0 when no lod parameter is found
during SPIR-V->NIR translation.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When geometry shaders write a value to gl_Layer that doesn't correspond to
an existing layer in the target framebuffer the rendering behavior is
undefined according to the spec, however, there are CTS tests that trigger
this scenario on purpose, probably to ensure that nothing terrible happens.
For V3D, this situation is problematic because the binner uses the layer
index to select the offset to write into the tile state data, and we only
allocate tile state for MAX2(num_layers, 1), so we want to make sure we
don't produce values that would lead to out of bounds writes. The simulator
has an assert to catch this, although we haven't observed issues in actual
hardware it is probably best to play safe.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>