Rather than ingesting separate control and memory barriers, ingest only the
combined and optimized scoped_barrier intrinsic. For barriers originating from
GLSL, this makes it easier to ensure correctness. For barriers originating from
SPIR-V, this is required for translation at all, as spirv_to_nir knows only
scoped barriers. So this gets us closer to Vulkan and OpenCL.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21752>
Lowers away 64-bit loads, which we'll create in the sysval lowering for
dynamically indexed UBOs/VBOs. The lowering generates pack_64_2x32 instructions,
so lower those too.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
Instead of lowering to bitwise ops. Yet another way of subdividing in NIR.
Probably insignificant but makes it easy to check that the pass ordering from the
previous pass is right. It does let us get much better codegen for
unpacksnorm2x16, whatever that's worth.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
As intended. We can't CSE with partial null destinations in the way, so we
shouldn't eliminate dead destinations until after CSE has run. But we should
still eliminate dead instructions to ensure CSE doesn't move things around
needlessly, hurting register pressure.
Noticed while debugging live range splitting.
No GLES3.0 shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
Our dead code elimination pass does two things:
1. delete instructions that are entirely unnecessary
2. delete unnecessary destinations of necessary instructions
To deal with pass ordering issues, we sometimes want to do #1 without #2.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
We should handle nir_op_unpack_32_2x16_split_* natively, since we can generate
better code with agx_subdivide (coalescing the ops away) than the bitshift
lowering.
That said, we do need some extra instructions for the floating point
conversions.
No shader-db changes (which makes sense because we're targetting the GLES3.0
shader-db, which doesn't have the packing GLSL functions).
The real motivation of this change isn't optimizing some GLSL pack functions,
though, it's avoiding a code regression from using NIR's memory bit size
lowering in a future MR. That lowering will turn things like "load i16vec4" into
"load i32vec2 + unpack_32_2x16", so we need to be able to coalesce that unpack.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
Implement buffer textures in full generality. There are a few issues here:
* OpenGL requires buffer textures support a minimum size of 65536 elements,
however 1D textures in AGX are (at most) 8192 elements.
* OpenGL 4.0 (and OpenGL ES) require buffer textures to support the "RGB32"
texture formats. These are 3 packed channels of 32-bits each. In general,
non-power-of-two texel sizes are problematic. AGX does not support any such
formats and we rely on the GL frontend to lower to a padded format (RGBX) if
necessary. Such a lowering cannot work for buffer textures, however, so we
need to find a way to implement RGB32 buffer textures.
We solve these issues in the follow way:
* Use 2D texture descriptors for buffer textures, with a large fixed
power-of-two size along one axis. Then large texel indices may be accessed at
a small vec2 texel coordinate, and since the fixed dimension is a
power-of-two, that vector may be recovered by simply shifting and masking.
This effectively avoids size restriction. We do need to clamp texel indices to
the buffer size to avoid faulting on OOB reads, since we may read past the end
of the buffer (if the app binds a non-page-aligned offset into the buffer).
* Use a general purpose memory load for RGB32 buffer textures. Lower the texture
load instruction to a memory load from the buffer and some address arithmetic.
There's no format conversion needed for RGB32, other than maybe filling in a
format-appropriate alpha, so this is straightforward. Again, we need to clamp
the texel index for robustness with OOB reads.
Each of these solutions brings its own problem.
* Using 2D textures instead of 1D requires physically rounding up the buffer
size when packing the descriptor, so we can no longer implement textureSize()
by reading off the texture descriptor like normal.
* We don't know at compile-time whether a given texture load will read from an
RGB32 buffer texture or not, so we need to emit code for both. In Vulkan, we
can't key the shader to this property, either, since it's descriptor set state
and not pipeline state.
And each of these problems in turn brings its own solution:
* The texture descriptor is linear, so the "compression buffer address" field is
ignored by the hardware. We stash the real buffer size there so that
textureSize becomes a load from the texture descriptor like usual, without
requiring a sideband (which would complicate bindless textures).
* If we determine a texture descriptor contains RGB32 data, then it will never
be interpreted by the hardware and hence does not need to be a valid texture
descriptor. So, we extend the hardware's format enum to contain a
software-defined RGB32 format enum. Then, when lowering texture buffer loads,
we either read it as a typed RGB32 memory load or as a texture load depending
on the value of the format field in the texture descriptor.
All of this is accomplished with a big NIR pass generating a pile of strange
looking code. But it should be good enough in practice for this silly feature.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21672>
In NIR, texelFetch (txf) does not use a sampler, but in AGX, it does -- even
though the contents of the sampler are semantically irrelevant. Rather than
requiring the state tracker to bind a sampler anyway (indicated for texture
buffers with PIPE_CAP_TEXTURE_BUFFER_SAMPLER), just add a dummy sampler
ourselves if txf is used and there are otherwise no samplers. This is helpful
because PIPE_CAP_TEXTURE_BUFFER_SAMPLER isn't honoured by Rusticl or seemingly
mesa/st's PBO code, and after implementing this dummy sampler workaround in
Panfrost for Rusticl, I realized this CAP is silly and shouldn't exist in the
first place. (And I regret pushing for its reinclusion.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21672>
Get the texture/sampler index from the texture/sampler_offset source (which
is an offset from 0 thanks to the lower_index_to_offset lowering) and feed it in
as corresponding 16-bit texture instruction sources.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21704>
For indirect indexing into the binding table. Note this does not handle packing
the bindless forms, since that's a bit more involved.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21704>
The gallium disk cache is about to depend on these, and I don't want to
create a dependency on agx_opcodes.h.py for that. So, make a new header
for them that doesn't have build dependencies.
Rename them to agx_compiler_* too, to avoid collisions with the other
driver debug flags.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21776>
We can't add a 64-bit immediate with the hardware iadd, that won't work. What we
can do is add a 32-bit immediate, derived as the low 32-bits of a 64-bit
nir_ssa_def.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21643>
If we manage to fold in a left shift that's bigger than the hardware can do, we
should at least avoid generating a useless right shift to feed the hardware
rather bailing completely.
For motivation, this form of address arithmetic is encountered when indexing
into arrays with large power-of-two element sizes (array-of-structs).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21643>
Optimize address arithmetic of the form
base + u2u64((index << shift) + const)
into hardware operands
base, index << (shift - format_shift) + const'
which (if format_shift = shift) can be simply
base, index + const'
rather than the current naive translation
base, ((index << shift) + const) >> format_shift
This saves at least one pointless shift. We can't do this optimization with
nir_opt_algebraic, because explicitly optimizing "(a << #b) >> #b" to "a" isn't
sound due to overflow. But there's no overflow issue here, which is what this
whole pass is designed around.
For motivation, this address arithmetic implements "dynamically indexing into an
array inside of a C structure", where the const is the offset of the array
relative to the structure.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21643>
Once we've matched a summand, commit to it. This avoids needlessly checking the
second source if the first matched, and removes some indentation/funny control
flow.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21643>
Fragment shaders with side effects need to be lowered to ensure they execute for
all shaded pixels but no helper threads. Add a lowering pass to handle this.
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.const_literal_fragment
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21712>
This is required for address arithmetic to be lowered properly for compute
kernels, which may have u2u64 in the source NIR.
No shader-db changes (for GLES3.0).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21430>
Most integer immediates are only 8-bit, but load/store instructions allow their
immediate offsets to be 16-bit instead. Take advantage of this in the optimizer.
This eliminates 36% of the instructions in
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36, a fitting
percentage.
Insignificant effect on dEQP-GLES31.functional.ssbo.* performance... Only a
small % of our compile-time pie is actually spent in the backend anyway (as
opposed to NIR passes or GLSL IR).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21430>
This avoids creating silly preambles that don't actually do anything except push
a constant that we could've inlined for cheaper anyway, since nir_opt_preamble's
cost model is sensitive to e.g. constant folding.
This avoids a pointless preamble in split-hell.
As a nice bonus, this also improves compile-time on address-heavy shaders. With
a release build, CPU time in dEQP-GLES31.functional.ssbo.* reduces from 12.87s
to 10.77... a 16% improvement is nothing to sneeze at.
shader-db results are mostly noise.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21430>
Useful both for ruling out issues with shader preambles as well as (in some
cases) making for a nicer reading experience of the compiled assembly.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21430>
It doesn't make sense, they're basically little compute kernel environments.
Noticed while debugging dEQP-GLES31.functional.fbo.no_attachments.multisample.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21710>
Sample index and layer index are both 16-bits, even though they are zero
extended for compiler simplicity in some cases. In particular this means that 2D
MSAA arrays consume 6 half-regs for their coordinates, not 8. This is what the
IR translation (actually agx_nir_lower_texture) produces, we just need to fix
the calculation in agx_read_registers to agree.
Fixes validation failure in tests like
dEQP-GLES31.functional.texture.multisample.samples_4.use_texture_color_2d_array
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21708>
This gives our shifts SM5 behaviour at the cost of a little extra ALU. That way,
we match NIR's shifts.
This fixes unsoundness of GLSL expressions like "a << (b & 31)", where the &
would mistakenly get optimized away.
Closes: #8181
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reported-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21673>
These just don't seem to work. macOS falls back to eMRT here...
dEQP-GLES3.functional.draw_buffers_indexed.random.max_implementation_draw_buffers.13
from Fail -> Crash. Proper solution will come when we implement eMRT later on.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21705>
Some things were missed (like winsys) and there's still some bad include
orders lying around and some other randomness.
We should set up CI checks for this soon... ^^;;
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21687>
I'm not sure if this was always broken downstream or just got dropped at
some point, but it's definitely UAPI-agnostic and missing now that we
have all the non-UAPI bits upstream.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21677>
We expect to forward GPU fault information to userspace. Since Mesa can
get that information, we can look up the fault address to log what was
the containing or nearest BO. Add a helper for that, so it can be called
from the driver.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21662>
With macOS support out of the way, we can start implementing a lot of
the Linux driver interface and bookkeeping without actually adding the
UAPI proper. Let's do that to reduce the size of the UAPI patchset.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21662>
Nothing implemented, but this lets us get the batch tracking bits in,
including explicit sync/DMA-BUF integration which uses generic ioctls.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21662>
This might be useful in the future, but it is best reimplemented in
terms of the upcoming Linux UAPI instead of having parallel codepaths.
Let's drop it.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21662>
G13 does not support sampler descriptor LOD biasing, so this needs to be lowered
to shader code for APIs that require this functionality. Add an option to do
this lowering while doing our other backend texture lowerings. This generates
lod_bias_agx texture instructions which the driver is expected to lower
according to its binding model.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21276>