The NIR-to-LLVM pass already does this; now the same fix covers
radeonsi as well.
Fixes various tests of
dEQP-GLES31.functional.texture.filtering.cube_array.combinations.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is the same workaround that radv already applied in commit
3ece76f03d ("radv/ac: gather4 cube workaround integer").
Fixes dEQP-GLES31.functional.texture.gather.basic.cube.rgba8i/ui.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It can't *really* happen since we don't use subroutines.
CID: 1417491
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-By: Gert Wollny <gw.fossdev@gmail.com>
Fixes a regression introduced with b96313c0e1, which removed
BRW_NEW_BLORP for a bunch of SURFACE_STATE setup code, including render
targets, on the basis that blorp invalidates binding tables but not
surface states, however, at least on Broadwell, this caused a regression
in a CTS test, which Ken and Jason tracked down to the fact that we
are not uploading new render target surface states after allocating
new CCS_D surfaces for fast clears (which allocation is deferred until
an actual clear occurs).
The reason this only fails in BDW is that on SKL+ we use CCS_E which
is allocated up front so it exists in the initial surface state, the
problem can be reproduced in these platforms too if we use
INTEL_DEBUG=norcb to force the CCS_D path.
This patch, together with the ones preceding it, fixes the regression
by ensuring that we track and flag as dirty all aux state changes.
Credit goes to Jason and Ken for figuring out the reason for the
regression.
Fixes:
KHR-GL45.transform_feedback.draw_xfb_test
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We want to use this flag to signal changes to the aux surfaces,
so let's not make it about fast clearing only. Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Adding gbm_device_get_format_modifier_plane_count made the
test gbm-symbols-check fail, this patch adds the according
function name to the test.
Fixes: 8824141b8d
(gbm: Add a gbm_device_get_format_modifier_plane_count function)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Jason and I use this for debugging all the time. Recompiling the driver
to enable it is kind of annoying. It's a great thing to try along with
always_flush_batch=true and always_flush_cache=true to detect a class of
problems - namely, atoms listening to an insufficient set of dirty bits.
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: pass llvm context reference instead of a pointer
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Only on GFX9 we implement them as 2D images.
This fixes:
dEQP-VK.image.image_size.1d_array.readonly_12x34
dEQP-VK.image.image_size.1d_array.readonly_1x1
dEQP-VK.image.image_size.1d_array.readonly_32x32
dEQP-VK.image.image_size.1d_array.readonly_7x1
dEQP-VK.image.image_size.1d_array.readonly_writeonly_12x34
dEQP-VK.image.image_size.1d_array.readonly_writeonly_1x1
dEQP-VK.image.image_size.1d_array.readonly_writeonly_32x32
dEQP-VK.image.image_size.1d_array.readonly_writeonly_7x1
dEQP-VK.image.image_size.1d_array.writeonly_12x34
dEQP-VK.image.image_size.1d_array.writeonly_1x1
dEQP-VK.image.image_size.1d_array.writeonly_32x32
dEQP-VK.image.image_size.1d_array.writeonly_7x1
Fixes: 1bcb953e16 "radv: handle GFX9 1D textures"
Reviewed-by: Dave Airlie <airlied@redhat.com>
It's nearly the same so there's no good reason why it can't be in a
common function. The one difference is that _mesa_store_teximage
calls AllocTextureImageBuffer for us, while _mesa_store_texsubimage
doesn't, but we don't need that anyway - intelTexImage already does it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
It is set to false in both callers. It isn't needed for glTexImage
because intelTexImage calls AllocTextureImageBuffer before calling
texsubimage_tiled_memcpy.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
These two paths are basically the same. There's no good reason to have
them in different files.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This fixes a crash on Haswell when we try to upload a stencil texture
with blorp. It would also be a problem if someone tried to texture from
stencil after glBlitFramebuffers.
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Needed for 32-bit PowerPC.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Fixes: a6a38a038b ("util/u_atomic: provide 64bit atomics where
they're missing")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Platforms without particular atomic operations require the
implementations in u_atomic.c
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Fixes: a6a38a038b ("util/u_atomic: provide 64bit atomics where
they're missing")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Include src/gallium/Automake.inc, correct the build flags accordingly.
Force -std=c++11 (extensively used by the test) as otherwise it gets
defined only when building against llvm >= 3.9.
Fixes: 7be6d8fe12 ("mesa/st: glsl_to_tgsi: add tests for the new
temporary lifetime tracker")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102665
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Otherwise it will be missing from the tarball, leadin to build failure.
Fixes: d4d777317b ("radv: move shaders related code to radv_shader.c")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
fixes following warning:
warning: format specifies type 'long' but the argument has type 'uint64_t' (aka 'unsigned long long')
cast is needed to avoid this change turning in to another warning:
warning: format specifies type 'unsigned long long' but the argument has type 'uint64_t' (aka 'unsigned long')
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Also, it's useless to set the error code twice. Though, we
should probably skip the next commands when the command buffer
is considered invalid.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Similar to RadeonSI renderer string.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The virgl protocol version of tgsi doesn't handle this yet,
transform it back to the old ways.
Thanks to Nicolai Hähnle <nicolai.haehnle@amd.com>
for also writing nearly the same patch.
Fixes: 41e342d5 tgsi/ureg: always emit constants (and their decls) as 2D
Tested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Otherwise we end up using a 32-bit comparison which didn't end well.
Timothy caught this while playing around with some opt passes.
Fixes: 278580729a (st/glsl_to_tgsi: add support for 64-bit integers)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It's nice to have this information. While we're at it, tweak the
formatting to try and vertically align numbers in the common case.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
We now flush the batch when either the batchbuffer or statebuffer
reaches the original intended batch size, instead of when the sum of
the two reaches a certain size (which makes no sense now that they're
separate buffers).
With this change, we also need to update our "are we near the end?"
estimate to require separate batch and state buffer space. I obtained
these estimates by looking at the size of draw calls in the Unreal 4
Elemental Demo (using INTEL_DEBUG=flush and always_flush_batch=true).
This will significantly impact the size of our batches. I've adjusted
both down to try and be roughly similar to what we had been doing. On
various benchmarks, a 20kB batch and 16kB statebuffer seemed to about
right, but we may need to adjust this further. I tried a 16kB batch,
but that regressed Synmark OglMultithread performance by a fair bit.
32kB for both would have significantly increased our batch sizes.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Now that we can grow the batchbuffer if we absolutely need the extra
space, we don't need to reserve space for the final do-or-die ending
commands.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to set brw->no_batch_wrap to actually avoid flushing in the
middle of our BLORP operation, and instead grow the batchbuffer.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Previously, we would just assert fail and die in this case. The only
safeguard is the "estimated max prim size" checks when starting a draw
(or compute dispatch or BLORP operation)...which are woefully broken.
Growing is fairly straightforward:
1. Allocate a new larger BO.
2. memcpy the existing contents over to the new buffer
3. Set the new BO to the same GTT offset as the old BO. When emitting
relocations, we write the presumed GTT offset of the target BO. If
we changed it, we'd have to update all the existing values (by
walking the relocation list and looking at offsets), which is more
expensive. With the old BO freed, ideally the kernel could simply
place the new BO at that offset anyway.
4. Update the validation list to contain the new BO.
5. Update the relocation list to have the GEM handle for the new BO
(which we can skip if using I915_EXEC_HANDLE_LUT).
v2: Update to handle malloc'd shadow buffers.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Previously, we emitted GPU commands and indirect state into the same
buffer, using a stack/heap like system where we filled in commands from
the start of the buffer, and state from the end of the buffer. We then
flushed before the two met in the middle.
Meeting in the middle is fatal, so you have to be certain that you
reserve the correct amount of space before emitting commands or state
for a draw. Currently, we will assert !no_batch_wrap and die if the
estimate is ever too small. This has been mercifully obscure, but has
happened on a number of occasions, and could in theory happen to any
application that issues a large draw at just the wrong time.
Estimating the amount of batch space required is painful - it's hard to
get right, and getting it right involves a lot of code that would burn
CPU time, and also be painful to maintain. Rolling back to a saved
state and retrying is also painful - failing to save/restore all the
required state will break things, and redoing state emission burns a
lot of CPU. memcpy'ing to a new batch and continuing is painful,
because commands we issue for a draw depend on earlier commands as well
(such as STATE_BASE_ADDRESS, or the GPU being in a pirtacular state).
The best plan is to never run out of space, which is totally doable but
pretty wasteful - a pessimal draw requires a huge amount of space, and
rarely occurs. Instead, we'd like to grow the batch buffer if we need
more space and can't safely flush.
We can't grow with a meet in the middle approach - we'd have to move the
state to the end, which would mean updating every offset from dynamic
state base address. Using separate batch and state buffers, where both
fill starting at the beginning, makes it easy to grow either as needed.
This patch separates the two concepts. We create a separate state
buffer, with a second relocation list, and use that for brw_state_batch.
However, this patch tries to retain the original flushing behavior - it
adds the amount of batch and state space together, as if they were still
co-existing in a single buffer. The hope is to flush at the same time
as before. This is necessary to avoid provoking bugs caused by broken
batch wrap handling (which we'll fix shortly). It also avoids suddenly
increasing the size of the batch (due to state not taking up space),
which could have a significant performance impact. We'll tune it later.
v2:
- Mark the statebuffer with EXEC_OBJECT_CAPTURE when supported (caught
by Chris). Unfortunately, we lose the ability to capture state data
on older kernels.
- Continue to support the malloc'd shadow buffers.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
This will let us access screen->kernel_features in the next patch.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
We'll need to read from both buffers when decoding state.
This also drops the "failed to map" fallback - it's completely useless
on LLC systems where we write directly to the mapped BO. It's not that
useful on non-LLC systems either.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
brw_batch_reloc emits a relocation from the batchbuffer to elsewhere.
brw_state_reloc emits a relocation from the statebuffer to elsewhere.
For now, they do the same thing, but when we actually split the two
buffers, we'll change brw_state_reloc to use the state buffer.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
I'm planning on splitting batch and state into separate buffers, at
which point we'll need two relocation lists. In preparation for that,
this patch refactors the relocation stuff into a structure we can
replicate...which looks a lot like anv_reloc_list.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>