Make sure to cast with the deref type that contains more information
than the returned type because it's valid in SPIR-V.
This fixes dEQP-VK.binding_model.descriptor_heap.graphics.*_vectors and
also the PositiveGpuAV.HeapWithUntypedPointers VVL test.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41469>
We were previously assuming that potentially stale divergence data was
valid. On some paths the register pressure estimator would recalculate
this, but, as is obvious from the results, not always.
v2: Add an assertion in brw_from_nir_emit_impl to ensure we don't end
up in this situation again.
v3: Call nir_divergence_analysis from
brw_nir_lower_deferred_urb_writes. This fixes assertion failures (the
assertion added in v2) in basically every graphics shader. The
alternative was to call it from brw_compile_vs, brw_compile_gs, and
brw_compile_tes.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17050403 -> 17054033 (0.02%)
instructions in affected programs: 296344 -> 299974 (1.22%)
helped: 0 / HURT: 376
total cycles in shared programs: 876063126 -> 875817316 (-0.03%)
cycles in affected programs: 78627328 -> 78381518 (-0.31%)
helped: 91 / HURT: 276
LOST: 1
GAINED: 10
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 913770429 -> 916075391 (+0.25%); split: -0.00%, +0.26%
CodeSize: 14647414640 -> 14726176320 (+0.54%); split: -0.02%, +0.56%
Cycle count: 102308091527 -> 102290664775 (-0.02%); split: -0.26%, +0.24%
Spill count: 3469632 -> 3469124 (-0.01%); split: -0.08%, +0.07%
Fill count: 5007038 -> 4998674 (-0.17%); split: -0.51%, +0.34%
Max live registers: 192568853 -> 192595355 (+0.01%); split: -0.00%, +0.02%
Max dispatch width: 48713168 -> 48712880 (-0.00%); split: +0.00%, -0.00%
Non SSA regs after NIR: 140252767 -> 140253718 (+0.00%)
Totals from 223099 (11.11% of 2007586) affected shaders:
Instrs: 314077245 -> 316382207 (+0.73%); split: -0.01%, +0.75%
CodeSize: 5335583824 -> 5414345504 (+1.48%); split: -0.06%, +1.54%
Cycle count: 45868025821 -> 45850599069 (-0.04%); split: -0.58%, +0.54%
Spill count: 2062649 -> 2062141 (-0.02%); split: -0.14%, +0.11%
Fill count: 3343019 -> 3334655 (-0.25%); split: -0.76%, +0.51%
Max live registers: 36762498 -> 36789000 (+0.07%); split: -0.02%, +0.09%
Max dispatch width: 5542224 -> 5541936 (-0.01%); split: +0.03%, -0.03%
Non SSA regs after NIR: 43727142 -> 43728093 (+0.00%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> [v1]
Fixes: 1bff4f93ca ("brw: Basic infrastructure to store convergent values as scalars")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41370>
In Gfx9 the enum value was changed to mean SIMD8 double precision, so
drop the old unused enum. At least on Gfx9 there is an extension bit
that selects the old SIMD4x2 mode, so we can recover it if we ever need
it in the future.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41457>
FullyCovered will need to know if conservative rasterization is enabled,
so pass it on to the shader.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
Add a new intrinsic to read the raw shading rate provided to the FS
payload, and lower load_frag_shading_rate in NIR using it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
We'll need the raw coverage mask provided to the fragment shader in a
future patch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
We create driver param instructions once we encounter their first use
and cache them for further uses. This creates problems when the first
use occurs in a block that doesn't dominate all further uses. This was
hit in practice with a driver param that was used both in the preamble
and in the main shader.
Fix this by simply not caching driver params. Since they are simply movs
from const regs, ir3_cp or ir3_cse should clean up most cases of
multiple uses.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 8b0b81339b ("freedreno/ir3: add NIR compiler")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15418
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41484>
Add shader-based global blending and pre-multiplied alpha support to
YUV compositing, allowing for transparent overlays and alpha-channel
based transparency with RGBA overlays.
Handle pre-multiplied alpha images by un-multiplying the pre-multiplied
colours, so that straight-alpha blending (which is easier to implement)
can be applied.
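For illustration only, the un-multiply step followed by a straight-alpha
blend could be sketched as below. The struct, the function names and the
global_alpha parameter are hypothetical and are not taken from the
compositing shader itself.

```c
#include <assert.h>

/* Hypothetical colour type, for illustration only. */
struct rgba {
   float r, g, b, a;
};

/* Convert a pre-multiplied colour back to straight alpha by dividing
 * the colour channels by alpha. Fully transparent pixels are left
 * untouched to avoid dividing by zero. */
static struct rgba
unpremultiply(struct rgba c)
{
   assert(c.a >= 0.0f && c.a <= 1.0f);

   if (c.a > 0.0f) {
      c.r /= c.a;
      c.g /= c.a;
      c.b /= c.a;
   }
   return c;
}

/* Straight-alpha "over" blend of one channel, with a global alpha
 * factor applied on top of the per-pixel alpha. */
static float
blend_channel(float src, float dst, float src_alpha, float global_alpha)
{
   const float a = src_alpha * global_alpha;

   return src * a + dst * (1.0f - a);
}
```

Once the colour is back in straight-alpha form, the same blend can serve
both kinds of input, which is presumably why the straight-alpha path was
the simpler one to implement.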
Thanks nyanmisaka for the help, and for pointing out the difference
between pre-multiplied alpha and straight alpha.
Thanks David Rosca and Benjamin Cheng for improvements to the code and
spotting errors.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/12977
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41090>
Fix typos in the sizes of proj and chroma_proj in the GLSL pseudo-code
comment in cs_create_shader.
Thanks Benjamin Cheng <benjamin.cheng@amd.com> for finding it.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41090>
Instead of expecting just 1 address bit to be flipped by 1 coordinate bit,
expect any address bits to be flipped by 1 coordinate bit. If multiple
coordinate bits flip the same address bit, that means all those coordinate
bits are XOR'd.
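A minimal sketch of the idea, assuming the address function is linear
over XOR. toy_addr, derive_equation and the bit counts below are
hypothetical stand-ins, not the actual tool code:

```c
#include <assert.h>
#include <stdint.h>

#define COORD_BITS 4 /* coordinate bits probed per axis (toy value) */
#define ADDR_BITS  8 /* address bits inspected (toy value) */

typedef uint32_t (*addr_fn)(uint32_t x, uint32_t y);

/* Toy swizzle: bit0 = x0, bit1 = y0, bit2 = x1 ^ y0, bit3 = y1. */
static uint32_t
toy_addr(uint32_t x, uint32_t y)
{
   return (x & 1) |
          ((y & 1) << 1) |
          ((((x >> 1) ^ y) & 1) << 2) |
          (((y >> 1) & 1) << 3);
}

/* For every address bit, record which X and Y coordinate bits flip it.
 * All coordinate bits recorded for one address bit are XOR'd together
 * in the resulting equation. */
static void
derive_equation(addr_fn fn, uint32_t x_mask[ADDR_BITS],
                uint32_t y_mask[ADDR_BITS])
{
   assert(fn != 0);

   const uint32_t base = fn(0, 0);

   for (unsigned b = 0; b < ADDR_BITS; b++)
      x_mask[b] = y_mask[b] = 0;

   for (unsigned c = 0; c < COORD_BITS; c++) {
      /* Address bits that change when only this coordinate bit is set. */
      const uint32_t flipped_x = fn(1u << c, 0) ^ base;
      const uint32_t flipped_y = fn(0, 1u << c) ^ base;

      for (unsigned b = 0; b < ADDR_BITS; b++) {
         if (flipped_x & (1u << b))
            x_mask[b] |= 1u << c;
         if (flipped_y & (1u << b))
            y_mask[b] |= 1u << c;
      }
   }
}
```

With the toy function, flipping y0 changes address bits 1 and 2, and
address bit 2 ends up described as x1 ^ y0.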
v2: also print 128bpp
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41431>
It may have been accidentally left in the code.
If there is any doubt about this, then the reason is the same
as accepting screen=NULL in context_create or any other function.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41429>
The sample mask should only be limited to the current sample bit when
using sample rate shading, and the sample shading flag in the
multisample state should be taken into account.
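As a rough sketch of the intended behaviour, assuming sample rate
shading is active when either the shader runs per sample or the
multisample state requests it (the helper and its parameter names are
hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper; names are illustrative, not the driver's. */
static uint32_t
effective_sample_mask(uint32_t sample_mask, unsigned sample_id,
                      bool shader_runs_per_sample,
                      bool ms_sample_shading_enable)
{
   assert(sample_id < 32);

   /* Assumption: either source enables sample rate shading. */
   const bool sample_rate =
      shader_runs_per_sample || ms_sample_shading_enable;

   /* Without sample rate shading the mask is passed through as is. */
   if (!sample_rate)
      return sample_mask;

   /* With sample rate shading, keep only the current sample's bit. */
   return sample_mask & (1u << sample_id);
}
```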
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41283>
Instead of dirtying the root buffer and re-uploading the whole thing for each
draw where a per-draw value like the draw ID is changed, use a smaller
secondary buffer for per-draw data. We can also skip flushing state for every
individual batched draw and just flush once for the whole draw command.
This may also be useful in the future for handling how sized index buffers from
maintenance5 and null index buffers from maintenance6 work with robustness2,
allowing us to pass through indexed draw parameters and lower the index buffer
read into the shader with bounds checks.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41399>
`si_init_gfx_screen` already initializes screen state functions, so
avoid doing it twice. This was regressed by d1c57f742e.
Detected by LSan when applications using vaapi exit.
Fixes: d1c57f742e ("radeonsi/gfx: add si_gfx_screen.c")
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: llyyr <llyyr.public@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41442>
List primitives would be handled by geometry unroll as if restart is always
enabled, telling the unroll shader to restart the primitive at the usual
restart index. This would produce invalid geometry for list primitives where
restart is disabled and the restart index is used as a valid index.
Instead, always force the restart index for unrolling to UINT32_MAX when
restart is disabled, and refactor the index promotion logic accordingly.
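The selection could look roughly like the sketch below. It assumes that
indices are promoted to 32 bits before unrolling and that the usual
restart index is the all-ones value for the index size. The function
name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pick the restart index handed to the unroll
 * shader, given indices that are promoted to 32 bits. */
static uint32_t
unroll_restart_index(bool restart_enabled, unsigned index_size_bytes)
{
   assert(index_size_bytes == 1 || index_size_bytes == 2 ||
          index_size_bytes == 4);

   /* With restart disabled, a promoted 8-bit or 16-bit index can never
    * equal UINT32_MAX, so 0xFF / 0xFFFF stay ordinary indices. */
   if (!restart_enabled)
      return UINT32_MAX;

   switch (index_size_bytes) {
   case 1:
      return 0xFFu;
   case 2:
      return 0xFFFFu;
   default:
      return UINT32_MAX;
   }
}
```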
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41333>
simd_ballot/quad_any/quad_all (and probably simd_any/simd_all) appear to
generally be broken within conditional blocks, not just with simd_is_first.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41186>
lower_boolean_reduce only works if the number of components is 1, and even
asserts on this in its prologue. Otherwise, given a boolean vector type, it
may produce output using ballot/vote with a boolean vector input.
Acked-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41186>