lower_mul_2x32_64 generates mul_high opcodes, and lower_mul_high is done by
nir_lower_alu, so call nir_lower_alu after nir_opt_algebraic.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
If for some reason the fence associated with an image doesn't signal,
we're likely in a device lost scenario, we should report that error.
We can't really wait for a given amount of time because we could get a
timeout and that is not a valid error to report for vkQueuePresentKHR,
so just wait forever.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/830
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Was hoping to find potential issues but nothing. Still probably a good
idea.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I'm honestly unsure what this is for, but it's needed on MFBD systems
for unknown reasons, at least when MRT is actually in use and then
sometimes without MRT (it fixes a blend shader issue on T760?)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
Epilogues are special fixed-function blocks, so they need special
handling for liveness analysis to work completely. This in turns fixes
RA issues for many shaders using MRT.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
The flow is considerably more complicated. Instead of one writeout loop
like usual, we have a separate write loop for each render target. This
requires some scheduling shenanigans to get right.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
This is a branch, like discard, so we need a barrier to make it safe.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
We have to key the blend shader for the render target number due to
writeout silliness.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
This fixes a series of dEQP tests on Raven platforms:
- dEQP-GLES3.functional.fbo.msaa.2_samples.rgba4
- dEQP-GLES3.functional.fbo.msaa.2_samples.rgb5_a1
- dEQP-GLES3.functional.fbo.msaa.2_samples.rgb565
- dEQP-GLES3.functional.fbo.msaa.2_samples.rg8
- dEQP-GLES3.functional.fbo.msaa.2_samples.r16f
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3090>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3090>
V3D can do indirect inputs so we don't need it. Also, the lowering
produces horrible if-ladder code that is particularly bad for geometry
shaders where inputs are always arrays and shader bodies usually have
a loop indexing into them.
This fixes a couple of geometry shader tests in CTS that would fail to
register allocate otherwise.
There are no changes in shader-db.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
With geometry shaders the number of emitted primitived is decided
at run time, so we cannot precompute it in the CPU and we need to
use the PRIMITIVE_COUNTS_FEEDBACK commands to have the GPU provide
the number like we do for the number of primitives written to
transform feedback. This may have a performance impact though, since
it requires a sync wait for the draw to complete, so we only do
it when geometry shaders are present.
v2: remove '> 0' comparison for ponter type (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
When geometry shaders write a value to gl_Layer that doesn't correspond to
an existing layer in the target framebuffer the rendering behavior is
undefined according to the spec, however, there are CTS tests that trigger
this scenario on purpose, probably to ensure that nothing terrible happens.
For V3D, this situation is problematic because the binner uses the layer
index to select the offset to write into the tile state data, and we only
allocate tile state for MAX2(num_layers, 1), so we want to make sure we
don't produce values that would lead to out of bounds writes. The simulator
has an assert to catch this, although we haven't observed issues in actual
hardware it is probably best to play safe.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
When doing layered rendering the binning stage will prepare per-tile
lists for each layer in the framebuffer, so we need to make sure
we allocate enough space for them .
We also need to emit the NUMBER_OF_LAYERS packet. This is required
even when the number of layers is only 1, otherwise the simulator
detects buffer overflows in the tile_state BO during some CTS test
cases involving layered FBOs.
When rendering, we need to emit commands for each layer of the
framebuffer separately and make sure we address the correct layers for
each one.
v2: fixed typo in comment (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
For layered rendering we need to emit per layer rendering commands
lists so we we can end up requiring a fairly large buffer for this
if the number of layers is large enough.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
OES_geometry_shader introduced the concept of layered framebuffers.
Removing this assertion gets a bunch of CTS tests to pass. We will
also need layered images to implement layered rendering with geometry
shaders.
v2: fix typo in commit message (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
If we program an output size of 0 the simulator asserts. This was
not a problem until now because our VS would always have to
emit fixed function outputs, however, now that it can be paired
with a GS we can end up with a VS shader that no longer emits
any outputs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Geometry shaders can output many vertices and thus have higher VPM memory
pressure as a result. It is possible that too wide geometry shader dispatches
exceed the maximum available VPM output allocated, in which case we need
to reduce the dispatch width until we can fit the VPM memory requirements.
Supported dispatch widths for geometry shaders are 16, 8, 4, 1.
There is a limit in the number of VPM output sectors that can be used by a
geometry shader that we can meet by lowering the dispatch width at compile
time, however, at draw time we need to revisit this number and, together with
other elements that can contribute to total VPM memory requirements, decide
on a configuration that can fit the program into the available VPM memory.
Ideally, we also want to aim for not using more than half of the available
memory so we that we can run a pair of bin and render programs in parallel.
v2: fixed language in comment and typo in commit log. (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
According to the documentation, the 1-way dispatch width is only supported
with geometry shaders.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This is good enough to get basic GS workloads working, later patches will
improve this by adding instancing support, proper SIMD configuration, etc.
Notice that most of the TESSELLATION_GEOMETRY_SHADER_PARAMS fields are only
relevant when tessellation shaders are present. We do not support tessellation
yet, but we still need to fill in these tessellation state with default values
since our packing functions require some of these to have non-zero values.
v2:
- Add a comment in the code explaining why we fill in
tessellation fields (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Every code address starts at bit 3 (addresses must be 64-bit aligned),
with the first 3 bits used to specify threading and NaN propagation
parameters for the shader program.
We generally skip "reserved" bits, however, doing this when the
reserved field is the last in a struct and it is large enough can
make us compute incorrect (smaller) struct sizes which can
lead to corrupt CLs. In particular, the "Tess/Geom Common Params"
struct has a reserved field at the end that is 8-bit, so if we
don't include this we compute a packet size that is 1 byte smaller
than it shold, making the next packet we emit start 1 byte
earlier and therefore leading to incorrect CL data from that point
forward.
The name of one of the fields was not correct.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Most of the relevant work happens in the v3d_nir_lower_io. Since
geometry shaders can write any number of output vertices, this pass
injects a few variables into the shader code to keep track of things
like the number of vertices emitted or the offsets into the VPM
of the current vertex output, etc. This is also where we handle
EmitVertex() and EmitPrimitive() intrinsics.
The geometry shader VPM output layout has a specific structure
with a 32-bit general header, then another 32-bit header slot for
each output vertex, and finally the actual vertex data.
When vertex shaders are paired with geometry shaders we also need
to consider the following:
- Only geometry shaders emit fixed function outputs.
- The coordinate shader used for the vertex stage during binning must
not drop varyings other than those used by transform feedback, since
these may be read by the binning GS.
v2:
- Use MAX3 instead of a chain of MAX2 (Alejandro).
- Make all loop variables unsigned in ntq_setup_gs_inputs (Alejandro)
- Update comment in IO owering so it includes the GS stage (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
While lowering vpm outputs we look for the NIR variables matching
particular store output instructions and we expect to find a match,
so assert on that.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Even without depth+stencil addrlib can (correctly!) decide to
disable tc compatible HTILE.
One example is 8x sampling with 32-bit depth on Stoney. The row size
on Stoney is 1024, while the tile size is 2048, which results in
tile splits which are not supported with tc-compat.
On Stoney, this fixes
dEQP-VK.glsl.builtin_var.fragdepth.*_list_d32_sfloat_multisample_8
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3054>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3054>
addrlib sometimes returns smaller sizes for tcCompat as it does
not seem to take into account the depth+stencil matching config
gymnastics with tcCompat.
This fixes
dEQP-VK.pipeline.render_to_image.core.2d_array.huge.height.r8g8b8a8_unorm_d32_sfloat_s8_uint
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3054>
Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: Marek Olšák <maraeo@gmail.com>
CC: Rhys Perry <pendingchaos02@gmail.com>
CC: Bruce Cherniak <bruce.cherniak@intel.com>
CC: Matt Turner <mattst88@gmail.com>
Reuse Code from:
f69bc797e1 gallium/auxiliary: Add helper support for bptc format compress/decompress
Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: Marek Olšák <maraeo@gmail.com>
CC: Tim Rowley <timothy.o.rowley@intel.com>