This will be required for `cl_intel_required_subgroup_size`, but it
already helps implementing OpenCL subgroups as this allows us to check
with every subgroup size when implementing
`CL_KERNEL_LOCAL_SIZE_FOR_SUB_GROUP_COUNT`.
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22893>
When clearing an alpha-only format, set the alpha channel into red
channel.
Fixes `spec@ext_texture_integer@multisample-fast-clear
gl_ext_texture_integer`.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23497>
It is now set by all relevant drivers and not checked anywhere.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23191>
Now everyone's saying NIR, and doing any NTT internally. The only returns
of TGSI were in gallivm_get_shader_param() and
tgsi_exec_get_shader_param(), but the drivers were returning NIR instead
of calling down to them.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23114>
If the viewport center is not positive we can't express it
through the fine coordinates, which are unsigned, and we
need to use the coarse coordinates too.
Fixes new crashes in Vulkan CTS 1.3.6.0:
dEQP-VK.draw.renderpass.offscreen_viewport.*negative*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23489>
These are split between fine and coarse coordinates. We have only been
using fine until now, so we kept the same naming convention we had
prior to V3D 4.1 for simplicity, but we will start using the coarse
coordinates soon too.
Also, the signedness was reversed: coarse coordinates are signed and
fine coordinates are unsigned.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23489>
16K * 16K * 16bpp = 4G, which overflows int32, so layer_stride needs to
have 64 bits. More generally, any "byte_stride * height" computation
can overflow int32.
Use (u)intptr_t in some gallium and st/mesa places where we do CPU access.
Use uint64_t otherwise.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23389>
Similar to checking that a z/s format can not be used for color
blit with tile-based blit path, color format can not be used for z/s
blit.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23390>
They are exactly the same, so it's safe to do the replace
Also gen OS_TIMEOUT_INFINITE var with rusticl_mesa_bindings_rs by OS_ prefix and
include "util/os_time.h" in rusticl/rusticl_mesa_bindings.h
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23401>
This is a prepare step to remove depends on p_defines.h in src/util/*
This is done by:
replace pipe_prim_type with mesa_prim
replace shader_prim with mesa_prim
replace PIPE_PRIM_MAX with MESA_PRIM_COUNT
replace SHADER_PRIM_ with MESA_PRIM_
replace PIPE_PRIM_ with MESA_PRIM_
This patch only replace code only
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23369>
Add proper support for ASTC texture compression in the v3d
gallium driver, instead of relying on the fallback software
conversion from gallium, as the hardware has native support
for ASTC compressed textures.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23260>
Now that it is exposing GLSL 1.30, and we can read clipdistance arrays
in the fragment shader, let's enable this capability.
It fixes
`spec@glsl-1.30@execution@clipping@fs-clip-distance-interpolated,Crash`.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23232>
Now that we have nir_fsub_imm, let's use it to save some typing!
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23179>
nir_registers are only supposed to be used temporarily. They may be created by a
producer, but then must be immediately lowered prior to optimizing the produced
shader. They may be created internally by an optimization pass that doesn't want
to deal with phis, but that pass needs to lower them back to phis immediately.
Finally they may be created when going out-of-SSA if a backend chooses, but that
has to happen late.
Regardless, there should be no case where a backend sees a shader that comes in
with nir_registers needing to be lowered. The two frontend producers of
registers (tgsi_to_nir and mesa/st) both call nir_lower_regs_to_ssa to clean up
as they should. Some backend (like intel) already depend on this behaviour.
There's no need for other backends to call nir_lower_regs_to_ssa too.
Drop the pointless calls as a baby step towards replacing nir_register.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23181>
Ideally we would like to trigger a compilation error like we do on
v3dv (VK_ERROR_UNKNOWN). But with v3d we can't really do that, as this
could happen on a draw call. Let's at least assert so debug builds
stops at this point.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23203>
This is a one-line wrapper, so let's just use the v3d_X or v3dX macros
instead.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23172>
This is a one-line wrapper, so let's just use the v3d_X or v3dX macros
instead.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23172>
This is a one-line wrapper, so let's just use the v3d_X or v3dX macros
instead.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23172>
Improve readability by using an auxiliar
struct v3d_device_info *devinfo = &screen->devinfo;
this was triggered by the use of the v3d_X macro, where just having a
devinfo makes is more friendly. As we are here, we used it on other
places of the code.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23172>
Some values like the transform feedback offset or the number of output
vertices in VS can be obtained knowing how many vertices and primitive
type are used in the drawcall.
But when the primitive restart is enabled, doing this is quite more
complex, as we should parse the vertex buffer to know where is the
restart values, and so on.
In this case, delay this computation after the drawcall is executed, by
querying the GPU to know these values.
Similarly, this delay is also applied to compute the transform feedback
buffer offsets when there is a geometry shader, as we don't know
beforehand how many vertices it is going to output.
This fixes `spec@!opengl 3.1@primitive-restart-xfb flush` and
`spec@!opengl 3.1@primitive-restart-xfb generated`.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22716>
Ensure the render target values are in the proper range.
This fixes `spec@!opengl 3.0@render-integer`.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22733>
Some of the new features require at least V3D 4.2. And actually, 4.2 is
the version used by the Raspberry Pi 4 hardware.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22733>
Instead of hardcoding conditionals to know which hardwared-based version
of a function to call, just wrap them in a macro to use
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22733>
1D texture miplevels are aligned to 64b, but this should include also
texture arrays.
Fixes
`spec@glsl-1.30@execution@texelfetchoffset@vs-texelfetch-usampler1darray`
and several other piglit tests.
CC: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22775>
tex_format should be `enum V3DX(Texture_Data_Formats)`, but using that enum
type in the header requires including `v3dx_pack.h`, which triggers circular
include dependencies issues, so use a `uint32_t` for now.
"fix" the one place that was using the correct enum, because doing so
triggers `-Wenum-int-mismatch` in GCC 13 as the function declaration
doesn't match the function definition.
Reported-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Eric Engestrom <eric@igalia.com>
Acked-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22739>
So far we were only considering the number of vertices to draw to
compute the offset in a stream output buffer.
But this is not correct, as it depends on the primitive type too. For
instance, with 4 vertices, if we use a triangle strip primitive, then 2
triangles are generated from those 4 vertices, so 6 vertices will be
captured.
This fixes spec@!opengl es
3.0@gles-3.0-transform-feedback-uniform-buffer-object.
CC: 23.1
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22607>
Depth compare function must be set to the configured one only when
compare mode is enabled; otherwise it must be configured to never.
v2 (Eric):
- Handle V3D < 4.0 case
CC: mesa-stable
Reviewed-by: Eric Engestrom <eric@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22470>
All drivers should now be using the appropriate NIR lowering, so we can
drop this pile of code.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22083>
In V3D we were doing this incorrectly by peeking into the sampler state
unconditionally, which is not correct if the TMU operations don't use
sampler state at all (like PBOs). This was causing us to fail the second
test in this sequence when both tests run back back to back in the same
process:
dEQP-GLES3.functional.texture.shadow.2d.linear.greater_or_equal_depth_component32f
dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rg32f_cube
Here, the first test would setup sampler state for shadow comparisons and
the second test would setup a PBO upload, which would incorrectly pick
up the sampler state to decide about the TMU output size for the PBO
operation.
In V3DV we were doing this right looking through each texture/sampler
instruction and checking if they all involved shadow comparisons or had
relaxed precission, defaulting to 32-bit otherwise.
This special-casing for shadow comparisons also leaks from drivers
into the compiler where we are forced to emit some pieces of sampler
state for 32-bit outputs, so we had to special-case shadow instructions
there as well and we also had a fix for CS textures not having correct
sampler state representing shadow operations too. Finally,
we also had at least a couple of bugs where forcing 32-bit TMU output
through V3D_DEBUG wasn't correctly forcing shadow comparisons to actually
be 32-bit in all the right places, leading to visual bugs with the
option enabled (Sponza being one example of this). This change eliminates
all of these issues.
Finally, the performance improvement observed from special casing shadow
comparison is negligible, and in specific scenarios it can even be
detrimental to performance due to increased register pressure (Sponza with
PCF filtering set to 4 is an example of this again).
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8684
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22284>
When stencil is enabled and it isn't non-op, Early-Z must be disabled.
The condition that checks this for stencil[0] is correct, but the one
for stencil[1] is wrong: it uses an "and" instead of "or" condition.
This affects dEQP-GLES3.functional.fragment_ops.interaction.basic_shader.14
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22081>
The hardware doesn't support native conditional rendering, so it is
implemented by software.
Code borrowed from Freedreno and Panfrost.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17373>
When dealing with multiple Transform Feedback buffers, each of them
needs to have their own offset, so when resuming from one to another we
know exactly were to continue adding primitives.
Fixes "spec@arb_transform_feedback2@change objects while paused (gles3)"
piglit test.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17373>
As the BO storing the results is destroyed after getting the query
results, store the results in case requesting the results again.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17373>
Many of the `V3D_DIRTY_*` flags are above 32 bits, but for now the only
one used here is V3D_DIRTY_SSBO.
`shader->uniform_dirty_bits`, where `dirty` ends up, is already 64 bits.
Fixes: 45bb8f2957 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22019>
Both OpenGL and Vulkan drivers share the same performance counters.
Let's move them to a common place instead of duplicating.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21420>
These are supported, and in fact we are exposing them through
Vulkan. Makes SuperTuxKart significantly faster in GL, I've
observed an FPS increase from ~100% to ~500% depending on the
track.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21361>