This is mostly for variables that are only used in asserts and cause
unused-but-set-variable warnings in release builds. Could just use
UNUSED directly, but MAYBE_UNUSED should be less confusing and is
similar to what the Linux kernel has.
And yes __attribute__((unused)) can be used on variables on both GCC 4.2
(oldest supported by mesa) and clang 3.0 (just some random old version,
not sure what's the minimum for mesa).
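A minimal sketch of the intended use, assuming a GCC- or clang-compatible
compiler. The macro definition shown here is illustrative and may not match
the one in the tree; count_set_bits() and low_bits_set() are hypothetical
names:

```c
#include <assert.h>

/* Illustrative definition; assumes GCC or clang. */
#define MAYBE_UNUSED __attribute__((unused))

/* Hypothetical helper: counts the bits set in mask. */
static int
count_set_bits(unsigned mask)
{
   int n = 0;
   while (mask) {
      n += mask & 1;
      mask >>= 1;
   }
   return n;
}

/* Hypothetical caller: counts the bits set in the low byte of mask. */
int
low_bits_set(unsigned mask)
{
   /* total is only read by the assert. When NDEBUG compiles the assert
    * out, the variable is set but never used, which is the case the
    * annotation is meant to silence. */
   MAYBE_UNUSED int total;

   total = count_set_bits(mask);
   assert(total >= 0);

   return count_set_bits(mask & 0xffu);
}
```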
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
combineLd/St would combine, e.g.:
st u32 # g[$r2+0x0] $r2
st u32 # g[$r2+0x4] $r3
into:
st u64 # g[$r2+0x0] $r2d
But this is only valid if r2 contains an 8 byte aligned address,
which is not guaranteed for compute shaders.
This commit checks for src0 dim 0 not being indirect when combining
loads / stores as combining indirect loads / stores may break alignment
rules.
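The check can be sketched roughly as follows. This is a simplified C model,
not the codegen code itself: struct mem_access, its fields and can_combine()
are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the address operand of a 32-bit load/store. */
struct mem_access {
   bool indirect;    /* address is based on a register (e.g. $r2) */
   uint32_t offset;  /* constant byte offset */
};

/* Returns true if two adjacent 32-bit accesses may be merged into one
 * 64-bit access. With an indirect address the register value is not
 * known at compile time, so 8-byte alignment cannot be proven and the
 * accesses are left alone. */
bool
can_combine(const struct mem_access *lo, const struct mem_access *hi)
{
   if (lo->indirect || hi->indirect)
      return false;
   if (hi->offset != lo->offset + 4)
      return false;
   return (lo->offset % 8) == 0;
}
```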
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Previously we were too restrictive, and would insert an output move in
cases like: MOV OUT[0], IN[0].xyzw
Loosen the restriction to allow the current instruction to appear in the
neighbor list, but only at its current position.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Normally the offset in the group would be the same, but not always. For
example, in a sam(w) which only writes the 4th component.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This *seems* like a hw bug, and maybe only applies to certain a4xx
variants/revisions. But setting the SRGB bit in sampler view state
(texconst0) causes invalid alpha for ASTC textures. Work around this by
setting up a second texture state and using that to sample alpha
separately.
This way, srgb->linear conversion happens in hw *prior* to
interpolation.
This fixes 546 dEQP tests: dEQP-GLES3.functional.texture.*astc*srgb*
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Allows the build to work when the python3 binary is not "python3".
v2: remove x bit from the script at Emil's suggestion
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When uploading a linear, void-extent, ASTC LDR block on Skylake, we are
required to flush to zero the UNORM16 channel values that would be
denormalized. This is specifically required for the values: 1, 2, and 3.
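A minimal sketch of the flush itself, assuming the caller has already
identified the block as a linear void-extent LDR block and located its four
UNORM16 color channels. flush_denorm_channels() is a hypothetical name, and
the block decoding is deliberately not shown:

```c
#include <stdint.h>

/* Hypothetical helper: flush each of the four UNORM16 color channels
 * to zero when it holds 1, 2, or 3. Larger values are left untouched. */
void
flush_denorm_channels(uint16_t color[4])
{
   for (int i = 0; i < 4; i++) {
      if (color[i] < 4)
         color[i] = 0;
   }
}
```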
Fixes the 14 failing tests in:
dEQP-GLES3.functional.texture.compressed.astc.void_extent_ldr.*
v2: Split out flushing function (Kristian Høgsberg)
v3: Map with READ instead of INVALIDATE (Kenneth Graunke)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This is the equivalent of 73b01e2711
for blorp.
v2 (Ken): No need to call _mesa_format_has_color_component() now
that the number of components is gotten from
_mesa_base_format_component_count().
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In case blorp needs to configure it, the result will be just as if the
render or compute pipeline had configured it.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In the past, BLORP has clobbered all BRW_NEW_* state flags, to trigger
re-emission of the entire 3D pipeline on the next draw. However, there
are some packets BLORP simply leaves alone, so there's no need to
re-emit them. Trying to reduce the set of dirty bits flagged after
BLORP runs is tricky.
Instead, we introduce a BRW_NEW_BLORP flag. This should be set on any
atom which emits a packet that BLORP also emits. When BLORP runs, it
will flag BRW_NEW_BLORP, causing those packets to get re-emitted.
This also makes it easy to avoid re-emitting specific atoms - we can
simply drop the BRW_NEW_BLORP flag on those.
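The mechanism can be modelled roughly like this. It is a simplified sketch,
not the driver code: the bit values, BRW_NEW_EXAMPLE, struct state_atom and
both functions are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the real flags are defined by the driver. */
#define BRW_NEW_BLORP    (1ull << 0)
#define BRW_NEW_EXAMPLE  (1ull << 1)  /* hypothetical other flag */

/* Hypothetical, simplified state atom: only the flags it listens to. */
struct state_atom {
   uint64_t dirty;
};

/* An atom is re-emitted if any flag it listens to is currently set. */
bool
atom_needs_emit(const struct state_atom *atom, uint64_t state_dirty)
{
   return (atom->dirty & state_dirty) != 0;
}

/* In this model, a BLORP operation flags BRW_NEW_BLORP when it is done
 * instead of clobbering every flag. */
uint64_t
dirty_after_blorp(uint64_t state_dirty)
{
   return state_dirty | BRW_NEW_BLORP;
}
```

An atom whose packet BLORP never touches simply omits BRW_NEW_BLORP from its
dirty mask and is then skipped after a BLORP operation.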
To start, we assume that all packets need to be re-emitted. This is the
safest approach and closest to the existing code's behavior. Many of
these are obviously not required, and can be dropped in subsequent
patches.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is identical to the blorp version, which differs only when the
fragment shader isn't used. In that case blorp would reset the batch
buffer address to zero.
This is not really needed, and having blorp use a base state
address setup that is compatible with normal upload allows one to
skip resetting it.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're trying to move to more of the new style intrinsics, which include
the correct target name and map directly to ISA instructions.
v2:
- Only do this with LLVM 3.8 and newer.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The range metadata tells LLVM the range of expected values for this intrinsic,
so it can do some additional optimizations on the result.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Although Gen9 samples from most HDR ASTC surfaces correctly,
there are currently no software workarounds to fix the incorrect
sampling that occurs in others with certain color endpoint modes.
With this change, we are no longer failing the 14 tests from:
dEQP-GLES3.functional.texture.compressed.astc.endpoint_value_hdr_cem_15.*
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Switch boolean template arguments to typename template arguments of type
std::integral_constant<bool, VALUE>.
This allows the template argument unroller to easily be extended to enums.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Part of fixing piglit EXT_framebuffer_multisample/sample-coverage inverted
(there is also a bug with RCL tiled blits)
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
There's no reason we couldn't do non-MSAA full resolution tile buffer
load/stores, but we would have claimed buffer overflow was being
attempted. Nothing does this currently.
This was a bug from the MSAA enabling. Tests for surfaces with
nr_samples==1 instead of 0 (generally GL renderbuffers) would incorrectly
fail out.
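As a rough illustration of the distinction, assuming the usual convention
that nr_samples of 0 and 1 both describe a single-sampled surface,
surface_is_msaa() is a hypothetical helper and not the function changed here:

```c
#include <stdbool.h>

/* Hypothetical helper: both 0 and 1 mean "not multisampled", so only
 * values above 1 are treated as MSAA. */
bool
surface_is_msaa(unsigned nr_samples)
{
   return nr_samples > 1;
}
```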
Fixes the ARB_framebuffer_sRGB piglit tests other than srgb_conformance.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
I had made the previous blit fix non-MSAA only because I was thinking
about how the hardware infers stride from the RENDERING_CONFIG packet.
However, I'm also inferring the stride for both MSAA src and dst in
vc4_render_cl.c from the width argument in the ioctl.
Fixes 15 EXT_framebuffer_multisample piglit tests.
On Broadwell, I get the following shader-db statistics:
Tessellation Control Shaders:
total instructions in shared programs: 57327 -> 57012 (-0.55%)
instructions in affected programs: 27334 -> 27019 (-1.15%)
helped: 45
HURT: 0
total cycles in shared programs: 265692 -> 255188 (-3.95%)
cycles in affected programs: 263122 -> 252618 (-3.99%)
helped: 184
HURT: 26
Tessellation Evaluation Shaders:
total instructions in shared programs: 23236 -> 23157 (-0.34%)
instructions in affected programs: 2791 -> 2712 (-2.83%)
helped: 27
HURT: 0
total cycles in shared programs: 151858 -> 149704 (-1.42%)
cycles in affected programs: 151858 -> 149704 (-1.42%)
helped: 101
HURT: 114
Geometry Shaders:
Orbital Explorer goes from 6442 -> 6356 instructions.
Two Shadow of Mordor shaders increase by a single instruction.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>