GuC offers a mechanism for KMD/UMD to provide workload hints and one of
that strategy is low latency hint. We can utilize this hint when the
workload is more latency sensitive like compute usecases.
Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28282>
812b3415 added rules for upcasts with comparisons with a variety of
types. The float & unsigned rules should be ok, but the signed integer rules are
unsound as currently implemented. This can cause end-to-end miscompiles.
I originally hit this issue while debugging a large real world OpenCL kernel. I
found the bug symptoms changed when disabling loop unrolling, which tipped me
off to a compiler bug. I've reduced it to a minimal test case. Imagine my
surprise when I find out the NIR my backend ingested was already constant folded
to be wrong.
In the minimal test case, during optimization we have NIR:
32 %6 = ....
64 %9 = i2i64 %6
64 %44 = load_const (0x0000000000000001)
1 %45 = ilt %9, %44 (0x1)
This is a simple check (int64_t)%6 < 1.
nir_opt_algebraic turns this into:
32 %6 = ...
64 %9 = i2i64 %6
64 %44 = load_const (0x0000000000000001)
64 %55 = load_const (0x0000000080000000 = 2147483648)
1 %56 = ilt %55 (0x80000000), %44 (0x1)
64 %57 = load_const (0x000000007fffffff = 2147483647)
1 %58 = ilt %57 (0x7fffffff), %44 (0x1)
32 %59 = i2i32 %44 (0x1)
1 %60 = ilt %6, %59
1 %61 = ior %58, %60
1 %62 = iand %56, %61
This pile of math constant-folds to an unconditional "false"! The problem is
%56. At first glance, INT32_MIN < 1 is true so %56 should be true. Indeed, it
should. But here's the kicker: both constants are 64-bit here, so the ilt
operation is a 64-bit comparison -- that left-hand side is INT32_MIN
zero-extended to 64-bit for the signed comparison at 64-bit. So in fact, it
evaluates to false, causing the whole expression to go false. If we're going to
do a 64-bit comparison for %56, then we need to sign-extend the bound. So we'll
just adjust the Python and be on our way, right?
Unfortunately the issue is deeper. According to the comment in the generated
nir_opt_algebraic.c file, the guilty algebraic rule is:
('ilt', ('i2i64', 'a@32'), '#b') =>
('iand', ('ilt', -2147483648, 'b'), ('ior', ('ilt', 2147483647, 'b'), ('ilt', 'a', ('i2i32', 'b'))))
From a Python perspective? That rule is correct. -2147483648 < 1 is a true
statement. Adjusting the Python rule is not the appropriate solution here, since
the issue is more fundamental and might affect other rules. The real problem is
the translation of that Python replacement tree into C, incorrectly
zero-extending -2147483648 into 0x0000000080000000 instead of sign-extending to
0xffffffff80000000.
Crawling down the rabbit hole of the generated algebraic file, we see the
constant encoded as:
{ .constant = {
{ nir_search_value_constant, 64 },
nir_type_int, { -0x80000000 /* -2147483648 */ },
} },
NIR correctly translates the negative constant to a C level negate operation of
its absolute value. This maps to the correct sign-extension...
...for all constants except for INT_MIN. Because that constant lacks a ULL
suffix, it is a 32-bit integer. And for this integer (only), negating it hits
signed integer overflow (UB!) and then we end up with an effective
zero-extension when going to 64-bit.
This patch fixes the end-to-end miscompile.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Closes: #11402
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29952>
The shared flag vmw_svga_winsys_surface was used to create shareable surfaces
and these surfaces are not discarded.
Since all surfaces created right now are shareable, there is no need for this
flag except to mark surfaces which should not be discarded. Renaming it to
nodiscard accordingly.
This also simplifies surface creation.
Signed-off-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
Reviewed-by: Martin Krastev <martin.krastev@broadcom.com>
Reviewed-by: Ian Forbes <ian.forbes@broadcom.com>
Reviewed-by: Jose Fonseca <jose.fonseca@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29948>
This fixes spec@!opengl 1.0@gl-1.0-polygon-line-aa
spec@!opengl 1.1@clipflat and multiple piglit tests
failures on VGPU9 device
Fixes: 76725452 ("gallium: move vertex stride to CSO")
Reviewed-by: Brian Paul <brian.paul@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29947>
Stop using the option in the generic pass
nir_lower_compute_system_values and use the same code as brw uses for
compute instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29828>
Reorganize the code to make clearer all the lowering cases:
(a) Single invocation workgroup. Index and IDs are all zero.
(b) Local ID provided by hardware.
(c) Local Index provided by the hardware. Depending on the case this
might not be the final local index, e.g. heuristics for tile.
(d) Neither provided by the hardware.
Case (c) is new and supported by Mesh/Task shaders. At the moment the
nir_lower_compute_system_values handle lowering of LocalID for
Task/Mesh, but a later patch will flip that on ANV.
This will make the Task/Mesh use the same lowering as Compute shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29828>
It takes significant amount of time at va context creation, and most
of the time the postproc pipelines are not used anyway.
This reduces total time it takes to run all libva-utils tests on my machine
from 38s to 28s.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29936>
In case that some drivers might make decision depending on it,
it is better to tell drivers about usage of resource just like
in `blit_to_staging()` and `st_TexSubImage()` etc before going
to blit.
Signed-off-by: Luc Ma <luc@sietium.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29833>
No games are using task shaders with DGC at the moment but this is
supposed to work.
This fixes test_amplification_shader_execute_indirect from vkd3d.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29935>
We need to make sure that the DGC ACE IB will wait for the DGC
prepare shader before the execution starts. When DGC is preprocessed
the synchronization is already correct.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29935>
When the DGC prepare shader is conditionally executed on the graphics
queue, the generated IBs might be uninitialized. It's fine for the
DGC GFX IB because the INDIRECT_PACKET would also be conditionally
skipped but it's not possible to do that for the DGC ACE IB
(ie. no IB2 on compute).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29935>
Looks like some of the tests uses the bias which does not fit into half
float parameter, so it's better to use float param for sample_b.
If we have cube arrays, we anyway combine BIAS and array index properly
so we don't have to worry about the first parameter.
This fixes: GTF-GL46.gtf21.GL3Tests.texture_lod_bias.texture_lod_bias_clamp_m_g_M
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29533>
This has no impact on the generated shaders, but does have a small
(positive) impact on the amount of time spent in shader compilation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29126>
Following https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957,
some Xe2 code paths started triggering asserts.
In the cases fixed by this patch, it was because of the assert added
to brw_type_larger_of() in cf8ed9925f ("intel/brw: Make a helper for
finding the largest of two types"), and then brw_type_larger_of() is
used in 674e89953f. (For example, the assert was triggering when the
SHL types differed between D and UD.)
Fixes: 674e89953f ("intel/brw: Use new builder helpers that allocate a VGRF destination")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29925>