This better matches all the other atomic intrinsics such as those for
SSBOs and shared variables where the sign is part of the intrinsic
opcode. Both generators (GLSL and SPIR-V) know the sign from the type
of the image variable or handle. In SPIR-V, signed min/max are separate
opcodes from unsigned.
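For illustration, a minimal sketch of how a generator can pick the opcode
from the image type (glsl_get_sampler_result_type() and the imin/umin
intrinsic names are real; the helper itself is hypothetical):

    static nir_intrinsic_op
    image_atomic_min_op(const struct glsl_type *image_type)
    {
       /* the sign now lives in the opcode, not in a source or flag */
       return glsl_get_sampler_result_type(image_type) == GLSL_TYPE_INT ?
          nir_intrinsic_image_deref_atomic_imin :
          nir_intrinsic_image_deref_atomic_umin;
    }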
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will effectively enable the optimization in anv.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
gl_PointCoord handling needs some special bits set in lima/ppir code
generation. Treating gl_PointCoord as a system value makes it easier
to distinguish from a regular varying.
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
For per-sample color writes we need the output intrinsic to pack the
sample index, which is not provided with regular store_output intrinsics
unless we figure out a way to encode it into the base or the offset.
v2:
- Drop the writemask (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
This is intended to be used, for example, with OpenGL logic operations. It
takes a render target as source and a sample index in the base index for
MSAA color reads.
v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric).
Reviewed-by: Eric Anholt <eric@anholt.net>
This gives more flexibility than the normal store_deref/store_output
versions (particularly, it allows us to abuse the type system in awful
ways, which is necessary for efficient format conversion in blend
shaders).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
From SPV_EXT_demote_to_helper_invocation. Demote will be implemented
as a variant of discard, so mark uses_discard if it is used.
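A minimal sketch of emitting the new intrinsic (nir_intrinsic_demote is
the opcode added here; the builder calls are standard NIR):

    static void
    emit_demote(nir_builder *b)
    {
       nir_intrinsic_instr *demote =
          nir_intrinsic_instr_create(b->shader, nir_intrinsic_demote);
       nir_builder_instr_insert(b, &demote->instr);
       /* demote is a variant of discard, so flag it the same way */
       b->shader->info.fs.uses_discard = true;
    }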
v2: Add CAN_ELIMINATE flag to the new intrinsic. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is nice to have with radeonsi, where color varyings are handled
specially to avoid recompiles.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In the next commit, we'll properly handle access qualifiers on struct
members by propagating them to load/store instructions, but these
instructions had no way to specify the qualifier.
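As a sketch, once the intrinsics carry an ACCESS index, the propagation
looks roughly like this (nir_intrinsic_set_access() and the
gl_access_qualifier bits are existing API; member_access is an assumed
value taken from the struct member):

    /* copy the member's qualifiers (e.g. coherent/restrict) onto the
     * load/store instruction */
    nir_intrinsic_set_access(intrin, member_access);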
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This type information will be used by gather_ssa_types to get usable results.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This represents a float vec4 constant color, as passed to glBlendColor.
While the existing 4 shader sysvals are retained to minimize code churn,
a single vectorized intrinsic is required for efficient blending on
vector architectures. (This may also apply to architectures like
Bifrost where ALU is scalar but load/store is vector; it largely depends
on how blending is implemented per-driver.)
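A sketch of loading the vectorized constant in a blend shader (the
intrinsic name is the one added here; the builder calls are standard):

    static nir_ssa_def *
    load_blend_color(nir_builder *b)
    {
       nir_intrinsic_instr *load = nir_intrinsic_instr_create(
          b->shader, nir_intrinsic_load_blend_const_color_rgba);
       load->num_components = 4;
       nir_ssa_dest_init(&load->instr, &load->dest, 4, 32, NULL);
       nir_builder_instr_insert(b, &load->instr);
       return &load->dest.ssa;
    }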
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Calculates i,j at a specified offset within a pixel. A new load_size_ir3
intrinsic is used in conjunction with fddx/fddy to translate the offset
into primitive space and adjust the i,j from load_barycentric_pixel
accordingly.
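A rough shape of the adjustment (the load_size_ir3-based scaling into
primitive space is omitted here; ij is the vec2 from
load_barycentric_pixel and off is the requested vec2 offset):

    static nir_ssa_def *
    adjust_ij(nir_builder *b, nir_ssa_def *ij, nir_ssa_def *off)
    {
       /* derivatives of i,j per pixel turn the pixel-space offset
        * into a barycentric delta */
       nir_ssa_def *dx = nir_fddx(b, ij);
       nir_ssa_def *dy = nir_fddy(b, ij);
       return nir_fadd(b, ij,
                       nir_fadd(b, nir_fmul(b, nir_channel(b, off, 0), dx),
                                   nir_fmul(b, nir_channel(b, off, 1), dy)));
    }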
Signed-off-by: Rob Clark <robdclark@chromium.org>
This commit adds new nir_load/store_scratch opcodes which read and write
a virtual scratch space. It's up to the back-end to figure out what to
do with it and where to put the actual scratch data.
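A sketch of what a scratch load looks like at the NIR level (the opcode
is the one added here; where the data actually lives is the back-end's
problem):

    static nir_ssa_def *
    load_scratch_dword(nir_builder *b, nir_ssa_def *byte_offset)
    {
       nir_intrinsic_instr *load =
          nir_intrinsic_instr_create(b->shader, nir_intrinsic_load_scratch);
       load->src[0] = nir_src_for_ssa(byte_offset); /* offset into scratch */
       load->num_components = 1;
       nir_ssa_dest_init(&load->instr, &load->dest, 1, 32, NULL);
       nir_builder_instr_insert(b, &load->instr);
       return &load->dest.ssa;
    }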
v2: Drop const_index comments (by anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
The constant_index slots are named right there in the intrinsic
definition, and the comment is just a chance to get out of sync. Noticed
this while reviewing the lower_to_scratch changes, which copy-and-pasted
wrong comments; load_ubo and load_per_vertex_output also currently had
incorrect comments.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Otherwise nir_lower_non_uniform_access crashes when it tries
to get the access of a load_ubo.
Fixes: 8ed583fe52 "spirv: Handle the NonUniformEXT decoration"
Fixes: e50ab2c0f2 "nir: Add access flags to deref and SSBO atomics"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
While a partial set of viewport system values exists, these are scalar
values, which is a poor fit for viewport transformations on vector ISAs
like Midgard (where the vec3 values for scale and offset each need to be
coherent in a vec4 uniform slot to take advantage of vectorized
transform math). This patch adds vec3 scale/offset fields corresponding
to the 3D Gallium viewport / glViewport+depth.
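A sketch of the vectorized transform this enables (the sysval names are
the ones added here; the generated nir_load_* helpers are assumed):

    static nir_ssa_def *
    viewport_transform(nir_builder *b, nir_ssa_def *ndc /* vec3 */)
    {
       nir_ssa_def *scale  = nir_load_viewport_scale(b);
       nir_ssa_def *offset = nir_load_viewport_offset(b);
       /* window = ndc * scale + offset, as one vectorized FMA */
       return nir_ffma(b, ndc, scale, offset);
    }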
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
We will need them for a new ACCESS_NON_UNIFORM flag that's about to be
added in the next commit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
On Intel, we have both bindless and bindful and we'd like to use them at
the same time if we can so we need to be able to distinguish at the NIR
level between the two. This also fixes nir_lower_tex to properly handle
bindless in its tex_texture_size and get_texture_lod helpers.
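For instance, a pass can now tell the two apart with a simple check
(nir_tex_src_texture_handle is the source type used for bindless):

    /* a tex instruction is bindless if it carries a handle source */
    bool bindless =
       nir_tex_instr_src_index(tex, nir_tex_src_texture_handle) >= 0;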
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
These are ir3 specific versions of SSBO intrinsics that add an
extra source to hold the element offset (dword), which is what the
backend instructions need.
The original byte-offset source provided by NIR is not replaced
because on a4xx and a5xx the backend still needs it.
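A sketch of how the extra source can be derived when building the ir3
variants (plain NIR builder calls; byte_off is the existing byte-offset
source):

    /* SSBO offsets are in bytes; the backend instructions want dwords */
    nir_ssa_def *dword_off = nir_ushr(b, byte_off, nir_imm_int(b, 2));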
Reviewed-by: Rob Clark <robdclark@gmail.com>
v2: use formula with fewer operations
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: add assert in else clause
    make local group intrinsics 32 bit wide
v3: always use 32 bit constant for local_size
v4: add comment by Jason
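For context, a hedged sketch of the relation these lowerings build on
(using the generated sysval builders; the 8x8x1 local size is purely
illustrative):

    /* global_id = work_group_id * local_size + local_invocation_id,
     * with local_size emitted as a 32-bit constant (v3) */
    nir_ssa_def *local_size = nir_imm_ivec3(b, 8, 8, 1);
    nir_ssa_def *global_id =
       nir_iadd(b, nir_imul(b, nir_load_work_group_id(b), local_size),
                   nir_load_local_invocation_id(b));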
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This lets us emit the VPM_WRITEs directly from
nir_intrinsic_store_output() (useful once NIR scheduling is in place so
that we can reduce register pressure), and lets future NIR scheduling
schedule the math to generate them. Even in the meantime, it looks like
this lets NIR DCE some more code and make better decisions.
total instructions in shared programs: 6429246 -> 6412976 (-0.25%)
total threads in shared programs: 153924 -> 153934 (<.01%)
total loops in shared programs: 486 -> 483 (-0.62%)
total uniforms in shared programs: 2385436 -> 2388195 (0.12%)
Acked-by: Ian Romanick <ian.d.romanick@intel.com> (nir)
We need more space than just a 32-bit scalar, and since we have to burn
all that space anyway, we may as well expose it to the driver. This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
These correspond roughly to reading/writing OpenCL global pointers. The
idea is that they just take a bare address and load/store from it. Of
course, exactly what this address means is driver-dependent.
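A sketch of the load side (the opcode is the one added here; the address
width is whatever the driver uses for global pointers):

    static nir_ssa_def *
    load_global_dword(nir_builder *b, nir_ssa_def *addr)
    {
       nir_intrinsic_instr *load =
          nir_intrinsic_instr_create(b->shader, nir_intrinsic_load_global);
       load->src[0] = nir_src_for_ssa(addr); /* bare, driver-defined address */
       load->num_components = 1;
       nir_ssa_dest_init(&load->instr, &load->dest, 1, 32, NULL);
       nir_builder_instr_insert(b, &load->instr);
       return &load->dest.ssa;
    }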
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
With OpenCL some system values match the address bits, but in GLSL we also
have 64-bit system values, such as the subgroup masks.
With this, the builder functions can be adjusted so that, depending on the
legal bit_sizes, either the correct bit_size is picked automatically or an
additional argument is added when multiple values are possible.
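Hypothetically, a generated builder for a sysval with bit_sizes = [32, 64]
would then take the size explicitly (the exact generated signature may
differ):

    /* subgroup masks allow both 32 and 64 bits, so the caller decides */
    nir_ssa_def *mask = nir_load_subgroup_eq_mask(b, 4, 64 /* bit_size */);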
v2: validate dest bit_size
v3: generate hex values in python code
    remove useless imports
    rename and move bit_sizes
v4: add 1 to legal bit_sizes for front_face
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For now, it's hidden behind a cap. Hopefully, we can eventually drop
that along with all the manual offset code in spirv_to_nir.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This commit adds a new num_components value of -1 for intrinsic sources,
which means that the source consumes everything and the number of
components effectively isn't validated. This is useful for deref sources,
which just take the result of the deref; we leave it up to the driver to
decide what that size should be.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This extension is not properly tested (testing for
GL_ARB_fragment_shader_interlock is not sufficient), and since this was
noted in review on August 28th, no tests have been sent.
Revert "i965: Add INTEL_fragment_shader_ordering support."
Revert "mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering"
This reverts commit 03ecec9ed2.
This reverts commit 119435c877.
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Eric Anholt <eric@anholt.net>
This also changes spirv_to_nir and glsl_to_nir to set them. The one
place that doesn't set them is shared memory access lowering in
nir_lower_io. That will have to be updated before any consumers of it
can effectively use these new alignments.
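A sketch of how a producer annotates an access (nir_intrinsic_set_align()
is the helper that goes with these indices; the values are illustrative):

    /* guarantee: the address sits align_offset bytes past a multiple of
     * align_mul, i.e. addr % align_mul == align_offset */
    nir_intrinsic_set_align(load, 16 /* align_mul */, 4 /* align_offset */);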
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
v2: - change how the access qualifiers are accumulated
v3: - duplicate members in struct_member_decoration_cb()
    - handle access qualifiers on variables
    - remove access qualifiers handling in _vtn_variable_load_store()
    - fix setting access qualifiers on type->array_element
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously, the back-end compiler turned image access into magic uniform
reads, and there was a complex contract between the back-end compiler and
the driver about setting up and filling out those params. As of this
commit, both drivers now lower image_deref_load_param_intel intrinsics
to load_uniform intrinsics controlled by the driver and lower the other
image_deref_* intrinsics to image_* intrinsics which take an actual
binding table index. There are still "magic" uniforms but they are now
added and controlled entirely by the driver and that contract no longer
spans components.
This also has the side-effect of making most image use compile-time
binding table indices. Previously, all image access pulled the binding
table index from a uniform. Part of the reason for this was that the
magic uniforms made it difficult to decouple binding table indices from
the uniforms and, since they are indexed completely differently
(especially in Vulkan), it was hard to pull them apart. Now that the
driver is handling both, it's trivial to decouple the two and provide
actual binding table indices.
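The shape of the rewrite, roughly (a hedged sketch; the real pass also
rewrites the remaining sources and intrinsic-specific indices, and
binding_index is a driver-assigned value):

    /* swap the deref source for a compile-time binding table index */
    intrin->intrinsic = nir_intrinsic_image_load; /* was image_deref_load */
    nir_instr_rewrite_src(&intrin->instr, &intrin->src[0],
                          nir_src_for_ssa(nir_imm_int(b, binding_index)));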
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166872 -> 15164293 (-0.02%)
instructions in affected programs: 115834 -> 113255 (-2.23%)
helped: 191
HURT: 0
total cycles in shared programs: 571311495 -> 571196465 (-0.02%)
cycles in affected programs: 4757115 -> 4642085 (-2.42%)
helped: 73
HURT: 67
total spills in shared programs: 10951 -> 10926 (-0.23%)
spills in affected programs: 742 -> 717 (-3.37%)
helped: 7
HURT: 0
total fills in shared programs: 22226 -> 22201 (-0.11%)
fills in affected programs: 1146 -> 1121 (-2.18%)
helped: 7
HURT: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit moves our storage image format conversion codegen into NIR
instead of doing it in the back-end. This has the advantage of letting
us run it through NIR's optimizer which is pretty effective at shrinking
things down. In the common case of rgba8, the number of instructions
emitted after NIR is done with it is half of what it was with the
lowering happening in the back-end. On the downside, the back-end's
lowering is able to directly use predicates and the NIR lowering has to
use IFs.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166910 -> 15166872 (<.01%)
instructions in affected programs: 5895 -> 5857 (-0.64%)
helped: 15
HURT: 0
Clearly, we don't have that much image_load_store happening in the
shaders in shader-db....
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of requiring 4 components, this allows them to potentially use
fewer. Both the SPIR-V and GLSL paths still generate vec4 intrinsics so
drivers which assume 4 components should be safe. However, we want to
be able to shrink them for i965.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extension provides a new GLSL built-in function,
beginFragmentShaderOrderingINTEL(), which guarantees
(taking the wording of the GL_INTEL_fragment_shader_ordering
extension) that any memory transactions issued by
shader invocations from previous primitives mapped to
the same xy window coordinates (and the same sample, when
per-sample shading is active) complete and are visible
to the shader invocation that called
beginFragmentShaderOrderingINTEL().
One advantage of INTEL_fragment_shader_ordering over
ARB_fragment_shader_interlock is that it provides a
function that operates as a memory barrier (instead
of defining a critical section) and that can be called
under arbitrary control flow from any function (in
contrast, the begin/end of ARB_fragment_shader_interlock
may only be called once, from main(), under no control
flow).
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>