These instructions let us write directly to the phys regfile, instead of
just R4. That lets us avoid moving out of R4 to avoid conflicting with
other SFU results, and to avoid conflicting with thread switches.
There is still an extra instruction of latency, which is not represented
in the scheduler at the moment. If you use the result before it's ready,
the QPU will just stall, unlike the magic R4 mode where you'd read the
previous value. That means that the following shader-db results aren't
quite representative (since we now cause some stalls instead of emitting
nops), but they're impressive enough that I'm happy with the change.
total instructions in shared programs: 95669 -> 91275 (-4.59%)
instructions in affected programs: 82590 -> 78196 (-5.32%)
Similarly to VC4's implementation, by not picking r0 immediately upon
freeing it, we give the scheduler more of a chance to fit later writes in
earlier. I'm not clear on whether there's any real cost to picking phys
over accumulators, so keep that behavior for now.
shader-db:
total instructions in shared programs: 96831 -> 95669 (-1.20%)
instructions in affected programs: 77254 -> 76092 (-1.50%)
This restriction existed in V3D 2.x, but lifting it was a major change in
3.x.
shader-db results:
total instructions in shared programs: 98117 -> 96831 (-1.31%)
instructions in affected programs: 48520 -> 47234 (-2.65%)
It's actually also a bit safer, since now the compiler will warn if
the string is larger than the `.name` array.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
According to the spec, these should apply to all read/write access
types (so would be equivalent to specifying all other access types
individually). Currently, they were doing nothing.
v2: Handle VK_ACCESS_MEMORY_WRITE_BIT in dstAccessMask.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The IOCTLs doesn't pass this along, so computing them in the first
place is kinda pointless.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Users shouldn't use this debugging option except when we
ask them to do!
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
modules[i] can be NULL for merged shaders but we have to
free the NIR code. radv_can_dump_shader_stats() already handles
if modules[i] is NULL, no need to check it twice.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The first fix attempt contained a nasty typo which somehow didn't get
caught in review. It also didn't work as intended because the sRGB
conversion was happening but then throwing away all but the red channel
because it dind't know it was RGB. Really, it's my fault for trying to
fix a bug without first writing tests. I've now written tests and they
pass with this change. :)
Fixes: 11712b9ca1 "intel/blorp: Fix blits to R8G8B8_UNORM_SRGB"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We've had several broadwell hangs that have come down to this bit just
not working correctly. Most recently, we've had a pile of hangs
reported with apps running under DXVK:
https://github.com/doitsujin/dxvk/issues/469
Instead, use the bit that doesn't try to imply weird D3D coherency
things and just force-enables the PS like we want.
cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We support mipmapped and arrayed linear images so we need to support
vkGetImageSubresourceLayout on them. Fortunately, it's just a trivial
call into ISL.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This allows NIR to CSE more operations. LLVM does this also so the
impact is limited, however doing this in NIR allows other opts to
make progress. For example some loops in Civilization Beyond Earth
shaders are unrolled.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Note that the use of ICMS_INNER_CONSERVATIVE disagrees with the GL driver.
Perhaps it's more performant than ICMS_NORMAL and is otherwise permitted?
Not sure, so I left it as-is.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Allow the capability to be exposed, and convert the new execution mode
into fs state.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This patch applies the necessary changes in Android.common.mk
as per automake rules, to avoid following building error:
external/mesa/src/gallium/drivers/nouveau/nouveau_screen.c:159:8:
error: implicit declaration of function 'disk_cache_get_function_timestamp'
is invalid in C99 [-Werror,-Wimplicit-function-declaration]
if (disk_cache_get_function_timestamp(nouveau_disk_cache_create,
^
1 error generated.
(v2) -DENABLE_SHADER_CACHE Android cflag is kept, to leave the AS-IS capability enabled
Fixes: cc10b34 ("util/disk_cache: Fix disk_cache_get_function_timestamp with disabled cache.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The commit 76dfed8ae2 changed nir_intrinsics.h to be a generated
header, but the corresponding dependency was not updated for Android.
It causes the error:
[ 0% 19/4336] target C: libmesa_pipe_radeonsi <= external/mesa/src/gallium/drivers/radeonsi/si_debug.c
...
In file included from external/mesa/src/gallium/drivers/radeonsi/si_debug.c:25:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.h:28:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_shader.h:140:
In file included from external/mesa/src/amd/common/ac_llvm_build.h:30:
external/mesa/src/compiler/nir/nir.h:966:10: fatal error: 'nir_intrinsics.h' file not found
^~~~~~~~~~~~~~~~~~
1 error generated.
Fixes: 76dfed8ae2 ("nir: mako all the intrinsics")
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
There always is a continue block, so let us just do unreachable.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 8cacf38f52 "nir: Do not use continue block after removing it."
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107312
After optimization passes and many trasfromations most of memory
NIR holds is a garbage which was being freed only after shader deletion.
Freeing it at the end of linking will save memory which would be useful
in case there are a lot of complex shaders being compiled.
The common case for this issue is 32bit game running under Wine.
The cost of the optimization is around ~3-5% of compilation speed
with complex shaders.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Nothing in EGL_KHR_gl_image.txt seems to let us deny creation based on
formats, and doing so causes many failures in
dEQP-EGL.functional.image.api.*
The NONE value we were protecting from only gets looked at in the
__DRI_IMAGE_ATTRIB_FORMAT and __DRI_IMAGE_ATTRIB_FOURCC queries, which are
used from wayland and gbm (which throw an error cleanly on unknown format)
and DMABUF export.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The EGL CTS expects that you can make images from all sorts of things,
including things like z16 and s8, which we don't have DRM fourccs for.
Just return an error when trying to export one of those.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Recreating our context's syncobj with ALREADY_SIGNALED meant that if you
created two fences in a row, then waiting on the second would succeed
immediately. Instead, export a sync file in the gallium fence (since we
don't have a syncobj clone ioctl), and just create a new syncobj to wait
on whenever we need to.
Noticed while debugging
dEQP-GLES3.functional.fence_sync.client_wait_sync_finish
It tends to return >0 in the success case (I think the value is something
like "how much of the timeout remained"). Fixes
dEQP-GLES3.functional.fence_sync.client_wait_sync_finish
When requesting a texture of the internal format GL_RGB32F Gallium will
try to allocate a renderable texture and returns RGBA32F or RGBX32F, but
when one requests GL_RGB32I or GL_RGB32UI the according 3-component
texture will be returned. This leads to problems later, when one wants
to use glCopyImageSubData to copy data between these textures that should
be compatible, but given the way virgl and Gallium handle this the latter
fails with an assertion, because the per-texel bit size is different.
By allowing the GL_RGB32* only for texture buffers these problems are avoided
without losing the ARB_tbo_rgb32 extension (thanks Ilia Mirkin).
v2: Correct spelling (Gurchetan Singh)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
When running gdb, make sure to pass the LD_PRELOAD variable only to
the executed program, not the debugger. Otherwise the debugger will
run the preloaded constructor/destructor too and bad things will
happen.
Suggested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The problem with passing the configuration of the dump lib through a
file descriptor is that it can be read only once. But under gdb you
might want to rerun your program multiple times.
This change hands the configuration through a temporary file that is
deleted once the command line passes to intel_dump_gpu has exited.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
That shouldn't be needed because the DB state is invalid.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The array index needs to be corrected and it must be insured that it is
rounded and its value is non-negative before it is combined with the
face id.
v5: Use RNDNE instead of ADD 0.5 and FLOOR (Ilia Mirkin)
v6: Fix type (Roland Scheidegger)
Fixes 182 from android/cts/master/gles31-master.txt:
dEQP-GLES31.functional.texture.filtering.cube_array.formats.*
dEQP-GLES31.functional.texture.filtering.cube_array.sizes.*
dEQP-GLES31.functional.texture.filtering.cube_array.combinations.nearest_mipmap_*
dEQP-GLES31.functional.texture.filtering.cube_array.combinations.linear_mipmap_*
dEQP-GLES31.functional.texture.filtering.cube_array.no_edges_visible.*
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Correct the array index for TEXTURE_*1D_ARRAY, and TEXTURE_*2D_ARRAY
The standard says the array index is evaluated according to
floor(z + 0.5)
but RNDNE is sufficient also for the test cases were z is close to 1.5
and it is likely to hit 1.5, the corner case were RNDNE gives a result
different from above formula.
v5: - Use RNDNE instead of ADD 0.5 and FLOOR (Ilia Mirkin)
- update commit message
Fixes 325 tests from android/cts/master/gles3-master.txt:
dEQP-GLES3.functional.shaders.texture_functions.texture.*sampler2darray*
dEQP-GLES3.functional.shaders.texture_functions.textureoffset.*sampler2darray*
dEQP-GLES3.functional.shaders.texture_functions.texturelod.sampler2darray*
dEQP-GLES3.functional.shaders.texture_functions.texturelodoffset.*sampler2darray*
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.*sampler2darray*
dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.*sampler2darray*
dEQP-GLES3.functional.texture.filtering.2d_array.formats.*
dEQP-GLES3.functional.texture.filtering.2d_array.sizes.*
dEQP-GLES3.functional.texture.filtering.2d_array.combinations.*
dEQP-GLES3.functional.texture.shadow.2d_array.*
dEQP-GLES3.functional.texture.vertex.2d_array.*
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Gradients used in texture lookups and the offsets must reside in the
same fetch clause (the first is imposed by the hardware and the second
is expected by sb). In order to ensure that no ALU clause is inserted
between emission and use of these, delay the emission of these
instructions until the texture instruction using them is also emitted.
This is needed in preparation for the correction of the texture array
indices.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
radv always needs it, so just check the header instead. Also
do not declare the function if the variable is not set, so we
get a nice compile error instead of failing to open a device
at runtime.
Fixes: b87ef9e606 "util: fix MSVC build issue in disk_cache.h"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reinserting code directly before a jump means the block gets split
and merged, removing the original block and replacing it in the
process.
Hence keeping a pointer to the continue block over a reinsert
causes issues.
This code changes nir_opt_if to simply look for the new continue
block.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107275
CC: 18.1 <mesa-stable@lists.freedesktop.org>
We already check that in radv_cmd_buffer_resolve_subpass().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The goal is to use radv_barrier()/radv_subpass_barrier() as
much as possible for further optimizations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
total instructions in shared programs : 5480808 -> 5472107 (-0.16%)
total gprs used in shared programs : 647530 -> 647532 (0.00%)
total shared used in shared programs : 389120 -> 389120 (0.00%)
total local used in shared programs : 21064 -> 21064 (0.00%)
total bytes used in shared programs : 58551648 -> 58459352 (-0.16%)
local shared gpr inst bytes
helped 0 0 73 2609 2609
hurt 0 0 71 34 34
An alternative solution to the problem fixed in
0bd83d0 ("nv50/ir: move LateAlgebraicOpt to the very end").
total instructions in shared programs : 5481195 -> 5480808 (-0.01%)
total gprs used in shared programs : 647535 -> 647530 (-0.00%)
total shared used in shared programs : 389120 -> 389120 (0.00%)
total local used in shared programs : 21064 -> 21064 (0.00%)
total bytes used in shared programs : 58555784 -> 58551648 (-0.01%)
local shared gpr inst bytes
helped 0 0 2 34 34
hurt 0 0 0 0 0