When the option is enabled, lower memory barriers to the
unified nir_intrinsic_scoped_barrier.
The translation of the following is based on
https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_gl_spirv.txt
- memoryBarrier()
- memoryBarrierBuffer()
- memoryBarrierImage()
- memoryBarrierShared()
- groupMemoryBarrier()
Also use scoped barrier for the memory counterparts of the GLSL
(control) barrier() when the option is enabled. The execution
part of a (control) barrier() remains using the old intrinsic.
For memoryBarrierAtomicCounter() there's no corresponding
nir_var_atomic_counter mode. Since atomic counters are lowered
to SSBOs, use the nir_var_mem_ssbo mode in the scoped barrier
instead.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3339>
If a dual source blend colour is never written, src1 will be null and it will be
invalid to dereference it. src1 is dereferenced both for the f2fN instruction
but also if a dual blend factor is used... even if the latter isn't strictly
valid, segfaulting in the NIR pass seems a lot meaner than blending with zero.
The referenced commit hosed Asahi, causing anything that used blending to crash.
Panfrost is unaffected since it always supplies a dual colour due to our crude
construction of blend shaders.
Fixes: 8313016543 ("nir/lower_blend: Consume dual stores")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21544>
Add a new texture opcode that returns the LOD bias of the sampler. This will be
used on AGX to lower sampler LOD bias to txb and friends. This needs to be a
texture op (and not a new intrinsic) to handle both bindless and bindful
samplers across GL and Vulkan in a uniform way.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21276>
Now that we're working on lowered I/O, passing in the dual source blend colour
via a sideband doesn't make any sense. The primary source blend colours are
implicitly passed in as the sources of store_output intrinsics; likewise, we
should get dual source blend colours from their respective stores. And since
dual colours are only needed by blending, we can delete the stores as we go.
That means nir_lower_blend now provides an all-in-one software lowering of dual
source blending with no driver support needed! It even works for 8 dual-src
render targets, but I don't have a use case for that.
The only tricky bit here is making sure we are robust against different orders
of store_output within the exit block. In particular, if we naively lower
x = ...
primary color = x
y = ...
dual color = y
we end up emitting uses of y before it has been defined, something like
x = ...
primary color = blend(x, y)
y = ...
Instead, we remove dual stores and sink blend stores to the bottom of the block,
so we end up with the correct
x = ...
y = ...
primary color = blend(x, y)
lower_io_to_temporaries ensures that the stores will be in the same (exit)
block, so we don't need to sink further than that ourselves.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21426>
This will emulate VGT_ESGS_RING_ITEMSIZE, which does the multiplication
for us. It's beneficial to stop setting VGT_ESGS_RING_ITEMSIZE to reduce
context rolls, and also the register will be removed in the future.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21525>
The scaling needs to be ubo * MAX_INLINABLE_UNIFORMS, not
ubo * PIPE_MAX_CONSTANT_BUFFERS, otherwise accesses beyond buffer size
will result for ubo >= 4 (and we'd also access the wrong values later
for other non-zero ubo indices).
Fixes: a7696a4d98 ("lavapipe: Fix bad array index scale factor in lvp_inline_uniforms pass")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21506>
The result might be used in a deref_ptr_as_array, which requires a proper
stride within lower_explicit_io. If we'd lose that information or end up
with a different stride don't execute this optimization.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8289
Fixes: b779baa9bf ("nir/deref: fix struct wrapper casts. (v3)")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21458>
It's not connected up to anything at the moment, and even if I do enable
it for crocus HSW it only shaves 3 instructions off of one particular VS
in an old synthetic benchmark, not affecting anything else in shader-db.
I don't think anyone will care to ever fix or port this to NIR, let's just
retire it.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21353>
It's unused and if we ever want to use it again we should make it an alu
opcode instead.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21445>
We have specialized lowering passes dealing with most of that already:
1. gl_nir_lower_samplers_as_deref
2. nir_lower_samplers
3. nir_lower_cl_images
If we need more than that, those passes can deal with following deref
chains as well.
We _might_ need to improve nir_lower_cl_images a bit for more complex
kernels, but CL also doesn't allow indirect images, so we are always able
to optimize the entire deref chain away.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20161>
If a loop has only a single continue, the control flow is already
converged and we can inline the continue construct.
If a loop has no continue statement at all, the Continue Construct
is unreachable and can simply be deleted.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>
This pass lowers Loop Continue Constructs to the previous solution
by inserting it at the beginning of the loop:
loop {
if (i != 0) {
continue construct
}
loop body
}
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>
Hoping that I didn't miss any, this *should* add assertions
to all functions and passes which explicitly handle 'nir_loop'.
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>
The added continue_list corresponds to the SPIR-V
Continue Construct and serves as a converged control-flow
construct and is executed after each continue statement
and before the next iteration of the loop body.
Also adds validation rules for loops with Continue Construct
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>
This is one step towards lowering I/O during shader preprocess rather than at
variant create time, which helps mitigate shader variant jank. It's also a lot
simpler.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> [v1]
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20836>
In OpenGL, FRAG_RESULT_COLOR implicitly broadcasts to every render target. Our
existing lower_blend code (somewhat arbitrarily) aliases to the the first render
target's format and blend settings. That said, I don't think that works if
different render targets have different settings -- or blend with their
different destinations -- though I don't have relevant spec text right now.
The actual reason this works is that all users of this pass either call
nir_lower_fragcolor first (panfrost, asahi) or don't have FRAG_RESULT_COLOR as
part of their API (panvk, soon agxv). Unless/until we actually have a use case
for nir_lower_blend with gl_FragColor, assert that gl_FragColor is lowered first
so we don't need to worry about this imaginary case.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20836>
Stores don't have destinations, and if they did, it would be invalid to change
their ssa_def's num_components without also changing the SSA def. Remove the
nonsensical (but harmless) assignment.
This fixes 25249e8be2 ("nir/lower_blend: Expand or shrink output variables as
needed"), but as the bug is harmless in practice, it does not need to be
backported.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20836>
This is a form of lowered I/O, it needs I/O semantics so we can know the
location to store to instead of passing via a sideband.
Over in !20906, we will use the BASE to lower blend shader with multisampling in
NIR instead of passing the number of samples and framebuffer format along a
sideband to the Midgard compiler. That's not needed for this series (this patch
was cherry-picked to avoid regressions in the lower_blend changes) but it's good
to model the full form of the I/O lowered intrinsic here.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20836>
No longer used. All of the information that was previously track here is
tracked directly in nir_loop_variable... and, technically speaking, has
been tracked there ever since 0b9639c35d.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
These track the same information in a slightly different way. Since
nir_loop_variable::init_src is visible outside this module, it cannot
be eliminated.
As an intentional side effect, induction variables with constant
initializers will now have their nir_loop_induction_variable::init_src
field point to the load_const source. Previously this pointer would be
NULL.
v2: Update unit tests and commit message. Remove the now unused ind_var
variable in find_trip_count.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
These track the same information in a slightly different way. Since
nir_loop_variable::update_src is visible outside this module, it cannot
be eliminated.
This leads to some nice simplification in find_trip_count. Previously
this code only had access to the ALU instruction that performs the
increment. It had to "search" the parameters to determine which (if any)
was the constant. With this change, this code has access to the
nir_alu_src of the ALU instruction that performs the increment. It no
longer needs to search the parameters for the constant. It's either the
supplied nir_alu_src or nothing.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
As an intentional side effect, induction variables with constant
increments will now have their nir_loop_induction_variable::update_src
field point to the load_const source. Previously this pointer would be
NULL.
v2: Update unit tests and commit message.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
A couple basic tests for loops with the exit condition after the
increment. In compiler literature, the optimization that moves the exit
condition from the top to the bottom is called "loop inversion."
v2: Pass parameters to loop_builder_invert using a struct. Add a comment
describing the loop being constructed to loop_builder_invert. Both
suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
Inspired heavily by the work by Yevhenii Kolesnikov in the original
versions of !3445.
v2: Pass parameters to loop_builder using a struct. Add a comment
describing the loop being constructed to loop_builder. Both suggested by
Caio.
v3: mscv C++ designated initializer lolz.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
All of the other tests only log the shader when validation fails, so
having that shader scroll by in the output is very distracting.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
This shader originates from Granite 3D engine and has been adapted
to be used with Open GL and some GLSL ES specifics.
GLSL ES adaptation:
- remove Vulkan specifics: EXT_samplerless_texture_functions usage,
specialization constants, push constant usage
- inline bitextract.h
- always DECODE_8BIT and hardcode error color (for now)
- port to GLSL ES, required some type changes, explicit type
conversions and setting up precisions for types
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19886>
This is based on brw_nir_lower_mem_access_bit_sizes() but ended up being
substantially different. While the core concepts are all the same, the
brw_* version made a lot of Intel-specific assumptions. The new version
takes a callback which takes a number of bytes of data and an alignment
pair and returns a bit size and number of components to load/store.
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21232>
This makes the pass significantly faster cutting execution time
by around 30% in the cts test
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20
This 30% improvement is in addition to all the improvements from
the proceeding patches.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20381>