This is overkill, but hey, 4 bits per channel is hardly something to
care about.
(Suggestions welcome for a better version).
Fixes:
dEQP-GLES2.functional.fbo.render.*rgba4*
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12001>
This will let me incrementally fix nir-to-tgsi against virgl without
having to carry around the whole "remove TGSI from mesa/st" MR.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12800>
virgl makes one array of UBOs starting from the first non-CB0 UBO used,
and does dynamic indexing off of that. It requires that the dynamic
indexing be CONST[ADDR[0]+base], rather than having the base be loaded in
addr0.
If we had a nir_intrinsic_base() on load_ubo, this would be easy. As we
don't, emit a subtract at address deref time.
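As a rough illustration of the subtract (not the actual nir_to_tgsi
code; both helper names below are hypothetical), the value loaded into
the address register is the shader's UBO index minus the first declared
UBO, so that CONST[ADDR[0]+base] lands back on the intended UBO:

```c
#include <assert.h>

/* Hypothetical helper, not Mesa code: given the absolute UBO index the
 * shader computed and the first non-CB0 UBO in use, return the value
 * that would be loaded into ADDR[0]. */
static inline int
ubo_addr_reg_value(int ubo_index, int first_ubo)
{
   assert(ubo_index >= first_ubo);
   return ubo_index - first_ubo;
}

/* Hypothetical model of how a CONST[ADDR[0]+base] access resolves. */
static inline int
ubo_effective_index(int addr0, int base)
{
   return addr0 + base;
}
```

With first_ubo = 2 and a dynamic index of 5, ADDR[0] holds 3 and the
access resolves to UBO 5 again.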
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12800>
This is used for var->data.sample inputs, which are already declared to be
TGSI_INTERPOLATE_LOC_SAMPLE, so we can just use the interpolated inputs.
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12800>
Just like outputs, virglrenderer needs its inputs sorted. Should be
harmless for other TGSI producers, and makes the declarations more
readable.
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12800>
There's no need to reserve the bottom 9 VARYING_SLOT_PATCH*, since
VARYING_SLOT_TEXCOORD won't be mapped there. This helps us match up with
nir_to_tgsi, which wasn't shifting down by 9 for patch.
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12800>
Recent ncnn benchmarks showed a slowdown, and the thread pool rework
seemed the most likely cause.
Batching the main workload across threads is fine; however, the
remainder doesn't get spread out and can bottleneck in one thread.
Switch to a model where the initial work is batched, but the
remainder is iterated over one by one.
Brings ncnn benchmarks back in line with previous results.
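A minimal sketch of that model, assuming the remainder is simply
total % num_threads and is handed out through a shared counter. The
struct and function names are hypothetical and this is not the llvmpipe
implementation:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one dispatch. */
struct work_split {
   unsigned batch_size;        /* items each thread handles up front */
   unsigned remainder_start;   /* first item not covered by a batch */
   unsigned total;             /* total number of work items */
   atomic_uint next_remainder; /* next unclaimed remainder item */
};

/* num_threads must be non-zero. */
static inline void
work_split_init(struct work_split *ws, unsigned total, unsigned num_threads)
{
   ws->batch_size = total / num_threads;
   ws->remainder_start = ws->batch_size * num_threads;
   ws->total = total;
   atomic_init(&ws->next_remainder, ws->remainder_start);
}

/* Thread `thread` owns items [first, first + count) as its batch. */
static inline void
work_split_batch(const struct work_split *ws, unsigned thread,
                 unsigned *first, unsigned *count)
{
   *first = thread * ws->batch_size;
   *count = ws->batch_size;
}

/* After finishing its batch, a thread claims remainder items one at a
 * time until none are left, so no single thread gets all of them. */
static inline bool
work_split_claim_remainder(struct work_split *ws, unsigned *item)
{
   unsigned i = atomic_fetch_add(&ws->next_remainder, 1);
   if (i >= ws->total)
      return false;
   *item = i;
   return true;
}
```

With 10 items and 4 threads, each thread gets a batch of 2 and items 8
and 9 are claimed individually by whichever threads finish first.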
Fixes: 69109e0b19 ("llvmpipe/cs: rework thread pool for avoid mtx locking")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13210>
v2 (Jason Ekstrand):
- Switch the order of arguments to be device, image, other stuff
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13199>
This is similar to a patch from Lionel, except that it works in terms of
aspects rather than bindings. This makes it easy to use from the Android
code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13199>
No shader-db changes on any Intel platform.
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 144380118 -> 143692823 (-0.5%)
SENDs in all programs: 6920822 -> 6920822 (+0.0%)
Loops in all programs: 38299 -> 38299 (+0.0%)
Cycles in all programs: 8434782176 -> 8423078994 (-0.1%)
Spills in all programs: 206830 -> 204469 (-1.1%)
Fills in all programs: 318737 -> 313660 (-1.6%)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12320>
No shader-db or fossil-db changes on any Intel platform.
v2: Keep the flt <-> fge switcharoo local to the SpvOpFUnordLessThan,
etc. handling. Add a comment explaining why the suboptimal
SpvOpFUnordEqual implementation is used here. Suggested by Caio.
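For reference, the flt <-> fge swap relies on an unordered comparison
being the logical negation of the opposite ordered one. A small sketch
in plain C floats (function names are made up for illustration, and
this assumes IEEE semantics without fast-math):

```c
#include <math.h>
#include <stdbool.h>

/* Reference definition of an unordered less-than: true if either
 * operand is NaN, or if a < b. */
static inline bool
unord_less_than_ref(float a, float b)
{
   return isnan(a) || isnan(b) || a < b;
}

/* Same result using only an ordered >= and a logical not: the ordered
 * comparison is false whenever a NaN is involved, so negating it
 * yields true in exactly the unordered cases. */
static inline bool
unord_less_than_via_fge(float a, float b)
{
   return !(a >= b);
}
```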
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12320>
Add derivative opcodes fddx_must_abs_mali/fddy_must_abs_mali satisfying:
fabs(fdd*_must_abs_mali(v)) = fabs(fdd*(v))
The sign of their result is undefined.
On Bifrost and Valhall, these unsigned derivatives can be implemented
more efficiently than the correctly-signed counterparts, since the sign
fixup requires extra ALU instructions. On backends where this is the
case, it is useful to optimize fabs(fdd*(v)) to
fabs(fdd*_must_abs_mali(v)). This pattern comes up with the GLSL builtin
`fwidth`.
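To illustrate why the sign does not matter for fwidth, here is a CPU
sketch over a 2x2 quad of values. The quad layout and all function
names are assumptions for the example, and the "must_abs" stand-ins
simply return the derivative with the opposite sign:

```c
#include <math.h>

/* Assumed quad layout: [0] top-left, [1] top-right, [2] bottom-left. */
static inline float fddx(const float quad[4]) { return quad[1] - quad[0]; }
static inline float fddy(const float quad[4]) { return quad[2] - quad[0]; }

/* Stand-ins for the *_must_abs_mali opcodes: the magnitude is right,
 * the sign is arbitrary (deliberately flipped here). */
static inline float fddx_must_abs(const float quad[4]) { return quad[0] - quad[1]; }
static inline float fddy_must_abs(const float quad[4]) { return quad[0] - quad[2]; }

/* GLSL fwidth(v) = abs(dFdx(v)) + abs(dFdy(v)). */
static inline float
fwidth_ref(const float quad[4])
{
   return fabsf(fddx(quad)) + fabsf(fddy(quad));
}

/* Same value computed from the unsigned derivatives. */
static inline float
fwidth_opt(const float quad[4])
{
   return fabsf(fddx_must_abs(quad)) + fabsf(fddy_must_abs(quad));
}
```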
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12332>
DCC_IND_BLK is not hooked up for this to work in the kernel in any released version, and it would be unsafe to do so even if it were, because it doesn't check the modifiers.
There's no reason to change the legacy non-modifier path to be more performant at the expense of breaking backwards compatibility with older versions of Mesa.
Fixes: 0f6251b3 ("ac/surface: use DCC compatible with image stores for < 4K resolutions")
Closes: #5422
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13122>
KHR-GL32.packed_pixels.pbo_rectangle.r16i on zink on lavapipe
ends up using a pbo that does an SINT image write. This was producing
truncated rather than clamped values.
Fix the calculations for 8/16-bit signed ints to clamp rather than
truncate.
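The difference between the two behaviours, sketched in scalar C rather
than gallivm IR (helper names are hypothetical; bits is assumed to be 8
or 16):

```c
#include <stdint.h>

/* Clamp v to the representable range of a signed `bits`-wide integer. */
static inline int32_t
clamp_sint(int32_t v, unsigned bits)
{
   const int32_t max = (1 << (bits - 1)) - 1;
   const int32_t min = -max - 1;
   return v < min ? min : (v > max ? max : v);
}

/* Truncate v to its low `bits` bits and sign-extend the result, which
 * wraps out-of-range values instead of saturating them. */
static inline int32_t
truncate_sint(int32_t v, unsigned bits)
{
   const uint32_t mask = (1u << bits) - 1;
   const uint32_t sign = 1u << (bits - 1);
   const uint32_t low = (uint32_t)v & mask;
   return (int32_t)(low ^ sign) - (int32_t)sign;
}
```

For a 16-bit target, 40000 clamps to 32767 but truncates to -25536.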
Fixes: 13e5f331db ("gallivm/nir: fix image store conversions")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13187>
Fixes leaks (release) or assertion failures (debug) on allocating small
scanout resources, when falling through to the non-scanout-specific layout
code, which became more common as of ad50b47a14 ("gbm: assume
USE_SCANOUT in create_with_modifiers").
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13202>
We need to ignore the "sync" in this case, or we'll crash with
"incomplete job" since we never submitted the work. Fixes the Piglit
test intel_blackhole-draw_gles2 when run in CI.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13070>
We already set HALF_INTEGER, which is what the compiler actually does.
If we also set PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_INTEGER, we get
incorrect lowering. Only set the CAP we respect.
On Bifrost, this convention is arbitrary. We should consider moving the
Bifrost lowering into NIR to optimize this better...
Fixes Piglit glsl-arb-fragment-coord-conventions.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13070>
The number of varying records we need to reserve in the worst case is
greater than the number of source-level varyings we advertise, since
non-source-level varyings (gl_Position, gl_PointSize...) need records
too.
We advertise MAX_VARYINGS source level varyings, which means anywhere we
manipulate varyings we need up to (MAX_VARYINGS + max non-source level
varyings) records. Add a PAN_MAX_VARYINGS define for this and use it
throughout.
Fixes a buffer overflow in Piglit glsl-max-varyings, which now passes
instead of crashing.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13070>
We want to assert that the number of varyings (the count) is at most the
maximum count. This is <=, not <, with the assertion previously
failing for exactly the maximum.
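A small sketch of the bound in question, using a made-up limit and
hypothetical names rather than the panfrost definitions. A count equal
to the maximum means every record is in use, which is still valid:

```c
#include <assert.h>
#include <stdbool.h>

#define EXAMPLE_MAX_VARYINGS 8 /* made-up value for illustration */

struct example_varyings {
   unsigned count;                    /* number of records in use */
   int records[EXAMPLE_MAX_VARYINGS]; /* valid indices: 0 .. count - 1 */
};

/* count is a number of elements, not an index, so count == MAX is
 * allowed; only count > MAX would overflow the array. */
static inline bool
example_count_is_valid(unsigned count)
{
   return count <= EXAMPLE_MAX_VARYINGS;
}

static inline void
example_set_count(struct example_varyings *v, unsigned count)
{
   assert(example_count_is_valid(count));
   v->count = count;
}
```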
Fixes: 2c2cf0ecfe ("panfrost: Streamline varying linking code")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13070>