finalize_nir requires that calling it multiple times on the same shader
doesn't break it.
RV530 shader-db:
total instructions in shared programs: 132915 -> 132851 (-0.05%)
instructions in affected programs: 2016 -> 1952 (-3.17%)
helped: 16
HURT: 0
total temps in shared programs: 18238 -> 18232 (-0.03%)
temps in affected programs: 42 -> 36 (-14.29%)
helped: 6
HURT: 0
total cycles in shared programs: 197510 -> 197446 (-0.03%)
cycles in affected programs: 2102 -> 2038 (-3.04%)
helped: 16
HURT: 0
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32160>
No effect in shader-db right now, but without it the next commit
leads to a small regression in instruction count (0.03%) instead
of the small win we have now (-0.05%).
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32160>
This makes instruction selection simpler and fixes potential issues with
allocated_vec or the optimizer moving SGPR uses out of the loop.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143>
The VK_STRUCTURE_TYPE_IMPORT_ANDROID_HARDWARE_BUFFER_INFO_ANDROID is handled
by the common code.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32314>
Some MTK display controller drivers support only this AFBC modifier.
Give it a chance to use AFBC for scanout resources.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31948>
When dealing with AFBC render targets using wide blocks, the GPU needs to
keep rendering tiles that are a multiple of 16x16. This is described
as AFBC render block size, and adds extra constraints:
- render target buffers need to be aligned on 16 pixels in the vertical
direction, even if the AFBC super block size is 4 or 8 pixels.
- if the effective tile size is smaller than the render block size, we
should force a clean write and discard+ignore the CRC
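The two constraints above can be sketched in C as follows. This is only an illustration under stated assumptions: the helper names (align_pot, afbc_rt_aligned_height, afbc_needs_clean_write) and the AFBC_RENDER_BLOCK_SIZE constant are hypothetical and are not the names used by the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, taken from the 16x16 render block described above. */
#define AFBC_RENDER_BLOCK_SIZE 16u

/* Round v up to a multiple of a power-of-two alignment. */
static inline uint32_t
align_pot(uint32_t v, uint32_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

/* Height to allocate for an AFBC render target: aligned on the render
 * block size even when the superblock is only 4 or 8 pixels tall. */
static inline uint32_t
afbc_rt_aligned_height(uint32_t height, uint32_t superblock_height)
{
   uint32_t align = superblock_height > AFBC_RENDER_BLOCK_SIZE
                       ? superblock_height
                       : AFBC_RENDER_BLOCK_SIZE;
   return align_pot(height, align);
}

/* True when the effective tile is smaller than the render block, in
 * which case a clean write is forced and the CRC is discarded/ignored. */
static inline bool
afbc_needs_clean_write(uint32_t tile_w, uint32_t tile_h)
{
   return tile_w < AFBC_RENDER_BLOCK_SIZE || tile_h < AFBC_RENDER_BLOCK_SIZE;
}
```

For example, a 100-pixel-tall target with 8-pixel-tall superblocks would be allocated with a height of 112 under this sketch.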
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31948>
On all previous GPUs, the effective tile size was limited to 16x16, but
it got increased on v10. Add a helper to query this maximum effective
tile size.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31948>
This allows using the tile size to make decisions not related to the
framebuffer descriptor. Mainly, for the near future, to decide
whether some tiling hierarchy levels should be disabled.
The color buffer allocation size is also calculated at the same time
as it's using common data underneath.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31948>
This is mostly useful so that we can set the hierarchy level mask
using information from the `pan_fb_info` structure that isn't filled
yet when the tiler descriptor is first allocated.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31948>
AFBC body is required to be aligned on 128 bytes on v6+ hardware.
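As a rough sketch of what this alignment means for the layout, the body offset can be computed from the header size as below. The function name and the older_align parameter are hypothetical; the alignment used before v6 is not stated here, so it is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset of the AFBC body, placed after a header of
 * header_size bytes. On v6+ the body must start on a 128-byte boundary;
 * older_align is whatever the caller uses for older hardware (assumption,
 * must be a power of two). */
static inline uint32_t
afbc_body_offset(unsigned arch, uint32_t header_size, uint32_t older_align)
{
   uint32_t align = arch >= 6 ? 128u : older_align;

   assert(align != 0 && (align & (align - 1)) == 0);
   return (header_size + align - 1) & ~(align - 1);
}
```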
Cc: mesa-stable
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31948>
It's an experimental feature that we may enable later.
Instead of exporting NULL primitives, perform a compaction
on primitives after culling.
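The idea of compaction can be illustrated with a serial C sketch. The function name and array layout are hypothetical, and the actual shader does this per workgroup rather than in a loop; the running counter below plays the role of an exclusive prefix sum over the "survived culling" flags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the primitives that survived culling to the front of out[],
 * preserving their order. Returns the number of primitives kept.
 * The value of n at each iteration is the exclusive prefix sum of the
 * survivor flags, i.e. the compacted index of primitive i. */
static inline size_t
compact_primitives(const uint32_t *prims, const bool *culled, size_t count,
                   uint32_t *out)
{
   size_t n = 0;

   for (size_t i = 0; i < count; i++) {
      if (!culled[i])
         out[n++] = prims[i];
   }

   return n;
}
```

With this approach only the surviving primitives are exported, so no NULL primitives are needed to fill the gaps.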
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32290>
Implement two workgroup scans over two boolean values in parallel,
so that they can be done with very minimal ALU overhead.
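One common way to get two scans for roughly the cost of one is to pack both counters into a single integer, as in the serial C sketch below. This is an assumption about the general technique, not a description of the actual implementation; the function name and the 16-bit packing are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Exclusive prefix sums of two boolean arrays using a single running add.
 * The count for a[] lives in the low 16 bits and the count for b[] in the
 * high 16 bits, so count must fit in 16 bits to avoid a carry between
 * the two halves. */
static inline void
dual_exclusive_scan(const bool *a, const bool *b, size_t count,
                    uint16_t *scan_a, uint16_t *scan_b)
{
   uint32_t packed = 0;

   assert(count <= UINT16_MAX);

   for (size_t i = 0; i < count; i++) {
      scan_a[i] = (uint16_t)(packed & 0xffff);
      scan_b[i] = (uint16_t)(packed >> 16);
      packed += (uint32_t)a[i] | ((uint32_t)b[i] << 16);
   }
}
```

The extra ALU work compared to a single scan is limited to the packing and unpacking of the two halves.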
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32290>
In certain cases, the hardware fails to properly process a mipmap level
of these special stencil and depth formats. This happens at width=16.
This change adds a software workaround.
Modifying the corresponding mipmap nblk_x, and the other related
values, could make the tests below pass. However, that approach
causes regressions.
This change was tested on palm and cayman and fixes the following tests:
spec/arb_framebuffer_object/framebuffer-blit-levels read stencil: fail pass
spec/arb_depth_buffer_float/fbo-clear-formats stencil/gl_depth32f_stencil8: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31957>
This situation is happening, for instance, when the hardware is
using the type FMT_8_8_8_8 (4 bytes) while the software was
requesting a 3 bytes type. The width should be adjusted to the
expected hardware size; otherwise, the last vertex is lost.
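The size mismatch can be illustrated with a small C sketch. The function name is hypothetical and the formula is a simplified model of a bounds check, not the driver's code: it computes how many bytes must be visible for every vertex fetch to succeed.

```c
#include <stdint.h>

/* Smallest buffer range, in bytes, for which a fetch of fetch_size bytes
 * succeeds for each of num_vertices vertices laid out with the given
 * offset and stride. */
static inline uint64_t
vertex_range_needed(uint32_t offset, uint32_t stride, uint32_t num_vertices,
                    uint32_t fetch_size)
{
   if (num_vertices == 0)
      return 0;

   return (uint64_t)offset + (uint64_t)stride * (num_vertices - 1) + fetch_size;
}
```

With a stride of 32 and 4 vertices, a 3-byte software type needs 99 bytes, while a 4-byte hardware fetch needs 100. Under this model, a range sized for the software type would be one byte short for the last vertex.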
Note: The rv770 didn't behave like this. This is definitely
a hardware change between these GPUs.
This change was tested on palm and cayman. Here are the tests fixed:
spec/!opengl 2.0/gl-2.0-vertexattribpointer-size-3: fail pass
deqp-gles2/functional/draw/random/62: fail pass
deqp-gles2/functional/vertex_arrays/single_attribute/strides/buffer_0_32_byte3_vec4_dynamic_draw_quads_1: fail pass
deqp-gles2/functional/vertex_arrays/single_attribute/strides/buffer_0_32_short3_vec4_dynamic_draw_quads_1: fail pass
deqp-gles2/functional/vertex_arrays/single_attribute/strides/buffer_0_32_short3_vec4_dynamic_draw_quads_256: fail pass
deqp-gles3/functional/draw/random/117: fail pass
deqp-gles3/functional/vertex_arrays/single_attribute/strides/byte/buffer_stride32_components3_quads1: fail pass
deqp-gles3/functional/vertex_arrays/single_attribute/strides/short/buffer_stride32_components3_quads1: fail pass
deqp-gles3/functional/vertex_arrays/single_attribute/strides/short/buffer_stride32_components3_quads256: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32184>
This change fixes the evergreen nonconformity issue on non-mipmap
textures when the minification and the magnification are not in
the same state.
This modification disables 5278436d67 when the minification and
the magnification are different. This fixes the nonconformity
without new regressions. That said, I was unable to reproduce
the issue described by 5278436d67 on palm and cayman.
This change was tested on cayman and palm. It fixes 84 deqp-gles2
tests and 128 deqp-gles3 tests:
deqp-gles2/functional/texture/filtering/2d/linear_nearest_*
deqp-gles2/functional/texture/filtering/2d/nearest_linear_*
deqp-gles2/functional/texture/filtering/cube/linear_nearest_*
deqp-gles2/functional/texture/filtering/cube/nearest_linear_*
deqp-gles2/functional/texture/vertex/2d/filtering/linear_nearest_*
deqp-gles2/functional/texture/vertex/2d/filtering/nearest_linear_*
deqp-gles2/functional/texture/vertex/cube/filtering/linear_nearest_*
deqp-gles2/functional/texture/vertex/cube/filtering/nearest_linear_*
deqp-gles3/functional/texture/filtering/2d/combinations/linear_nearest_*
deqp-gles3/functional/texture/filtering/2d/combinations/nearest_linear_*
deqp-gles3/functional/texture/filtering/2d_array/combinations/linear_nearest_*
deqp-gles3/functional/texture/filtering/2d_array/combinations/nearest_linear_*
deqp-gles3/functional/texture/filtering/3d/combinations/linear_nearest_*
deqp-gles3/functional/texture/filtering/3d/combinations/nearest_linear_*
deqp-gles3/functional/texture/filtering/cube/combinations/linear_nearest_*
deqp-gles3/functional/texture/filtering/cube/combinations/nearest_linear_*
deqp-gles3/functional/texture/vertex/2d/filtering/linear_nearest_*
deqp-gles3/functional/texture/vertex/2d/filtering/nearest_linear_*
deqp-gles3/functional/texture/vertex/2d_array/filtering/linear_nearest_*
deqp-gles3/functional/texture/vertex/2d_array/filtering/nearest_linear_*
deqp-gles3/functional/texture/vertex/3d/filtering/linear_nearest_*
deqp-gles3/functional/texture/vertex/3d/filtering/nearest_linear_*
deqp-gles3/functional/texture/vertex/cube/filtering/linear_nearest_*
deqp-gles3/functional/texture/vertex/cube/filtering/nearest_linear_*
Fixes: 5278436d67 ("r600: force LOD range to be only one value when mip.min filter is NONE")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32185>
On evergreen depth-stencil textures are allocated as two objects, and
when using the eg_surface_init_1d_miptrees code path the size evaluation
uses the generalized surf_minify function. Here when allocating the
depth texture the alignment takes the depth bpe value into account, and
uses bpe=1 for the stencil texture. As a result the texture pair may
consist of textures with two different nblk_x sizes and this seems to
be a problem with some textures, namely npot and small (width < 32), but
not for mipmapped textures. In the problematic cases, if the so allocated
depth texture is larger than the stencil texture, then the kernel may reject
sent data with an error message like:
evergreen_cs_track_validate_stencil:622 stencil read bo too
small (layer size 131072, offset 524288, max layer 1, bo size 606208)
- because apparently the expected layer size is evaluated from the depth
texture size, but the actual bo size is evaluated based on the true texture
size values. If, on the other hand, the stencil texture is larger than the
depth texture, then the data is sent with a wrong alignment, and certain
dEQP-GLES31 tests fail.
In order to obtain equal texture sizes in the problematic cases, magnify
the depth texture alignment requirement by its bpe, so that the relative
alignment is the same for depth and stencil texture.
Fixes:
dEQP-GLES31.functional.stencil_texturing.format
.depth32f_stencil8_2d
.depth32f_stencil8_2d_array
.depth24_stencil8_2d
.depth24_stencil8_2d_array
.stencil_index8_2d
.stencil_index8_2d_array
.depth32f_stencil8_draw
.depth24_stencil8_draw
dEQP-GLES31.functional.texture.border_clamp.formats
.stencil_index8.nearest_size_npot
.depth24_stencil8_sample_stencil.nearest_size_npot
.depth32f_stencil8_sample_stencil.nearest_size_npot
dEQP-GLES31.functional.texture.border_clamp.per_axis_wrap_mode.texture_2d
.uint_stencil.nearest.s_clamp_to_edge_t_clamp_to_border_npot
.uint_stencil.nearest.s_repeat_t_clamp_to_border_npot
.uint_stencil.nearest.s_mirrored_repeat_t_clamp_to_border_npot
piglits:
arb_framebuffer_object-depth-stencil-blit *stencil*
framebuffer-blit-levels draw stencil
arb_texture_stencil8/
texwrap formats offset/gl_stencil_index8, npot
texwrap formats/gl_stencil_index8, npot
ext_framebuffer_multisample
accuracy all_samples stencil_resolve small depthstencil
unaligned-blit * stencil downsample
ext_texture_array/fbo-depth-array *stencil
egl_khr_gl_renderbuffer_image-clear-shared-image gl_depth_component24
v2: use util_is_power_of_two_or_zero (Marek)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32169>
This requirement is currently satisfied by the usage in panfrost and
lima.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32084>
Because we are doing perspective division before clipping, small
gl_Position.w values will give Inf for positions and interpolated
varyings. Before this change, primitives containing a vertex with w=0
were invisible.
This is only used in panfrost and lima.
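The problem described above can be reproduced with a small C sketch, assuming IEEE 754 float arithmetic. The vec4 type and function names are hypothetical, and the sketch only shows the failure mode, not the fix.

```c
#include <float.h>
#include <stdbool.h>

typedef struct {
   float x, y, z, w;
} vec4;

/* Perspective division as done before clipping: every component is
 * multiplied by 1/w, and 1/w is kept in the w slot. */
static inline vec4
perspective_divide(vec4 clip)
{
   float rcp_w = 1.0f / clip.w;

   return (vec4){clip.x * rcp_w, clip.y * rcp_w, clip.z * rcp_w, rcp_w};
}

/* True for ordinary finite values, false for Inf and NaN. */
static inline bool
is_finite_f(float v)
{
   return v >= -FLT_MAX && v <= FLT_MAX;
}
```

With w=0 the reciprocal is Inf, so the divided position is no longer finite, and a zero component multiplied by Inf even yields NaN.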
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32084>
I don't know of any case of Apple's driver using this, but it seems to work. The
stream link bit is identical to VDM, so that was easy. The tricky part was the
return, but I brute-forced the encoding space and this is the (only) thing that
worked. So add the XML.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32320>