Commit graph

215339 commits

Author SHA1 Message Date
Yonggang Luo
5ab8148f23 util: Update os_get_option* comments to match os_set_option
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Antonio Ospite <antonio.ospite@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38640>
2025-11-27 18:22:32 +00:00
Yonggang Luo
2771eb39fd util: Add function os_unset_option/os_set_option for later use
They will be used to replace SetEnvironmentVariableA and putenv on Windows,
and putenv and setenv on non-Windows platforms.

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Antonio Ospite <antonio.ospite@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38640>
2025-11-27 18:22:32 +00:00
Yonggang Luo
123a66fc43 util,asahi,vulkan,panfrost: Replace the remaining usage of getenv with os_get_option
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Antonio Ospite <antonio.ospite@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38640>
2025-11-27 18:22:32 +00:00
Tapani Pälli
95938823f4 compiler/glsl: validate input blocks with opaque/booleans
This commit adds a check for boolean/opaque types inside interfaces;
a check already exists for "regular varyings".

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14338
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38613>
2025-11-27 17:40:15 +00:00
Caterina Shablia
a338694c50 panvk: report support for sparseResidencyImage2D
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37483>
2025-11-27 17:05:43 +00:00
Caterina Shablia
5326c45174 panvk/csf: implement sparse image non-opaque binds
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37483>
2025-11-27 17:05:43 +00:00
Caterina Shablia
c87bdde596 panvk: align rows and layers of sparse resident images
When laying out a sparse partially-resident image we need to align
rows of ordered blocks to a mapping granularity in bytes (i.e. the
page size) and array layers to a multiple of sparse block size.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37483>
2025-11-27 17:05:43 +00:00
Caterina Shablia
7421b38521 panvk: sparse partially-resident image -related queries
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37483>
2025-11-27 17:05:43 +00:00
Caterina Shablia
bd9aeeec0a pan/lib: introduce row_align_B and array_align_B constraints
To implement sparse partially-resident images, we need to be able
to express mapping in terms of rectangles of texel blocks.

With row_align_B we can constrain the rows of ordered blocks to
start at mapping boundary (i.e. page size) and using array_align_B
we can ensure that each subresource starts at a multiple of
whatever sparse block size we decide to use.

Not setting each of these fields is the same as setting them to 1.
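
As a rough illustration of the two constraints (a sketch only, not the pan/lib code), the helper below rounds a row stride and a layer stride up to their alignments. Every name other than row_align_B and array_align_B is hypothetical, and storing an unset field as 0 is an assumption.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return -(-value // alignment) * alignment


def layout_strides(blocks_per_row, rows, block_size_B,
                   row_align_B=0, array_align_B=0):
    """Return (row_stride_B, layer_stride_B) for one array layer.

    Assumption: an unset constraint is stored as 0 and behaves like 1,
    i.e. it adds no padding.
    """
    row_align_B = row_align_B or 1
    array_align_B = array_align_B or 1
    # Each row of blocks starts on a row_align_B boundary (e.g. a page).
    row_stride_B = align_up(blocks_per_row * block_size_B, row_align_B)
    # Each layer starts on an array_align_B boundary (e.g. a sparse block).
    layer_stride_B = align_up(row_stride_B * rows, array_align_B)
    return row_stride_B, layer_stride_B
```

With a 4096-byte row_align_B, a 1600-byte row of blocks is padded out to 4096 bytes; with both fields unset the strides stay tightly packed.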

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37483>
2025-11-27 17:05:42 +00:00
Caterina Shablia
dbf20eb49f panvk: move sparse blackhole stuff to panvk_sparse.{c,h}
While we're at it also add the SPDX header to panvk_sparse.c
because I forgot to do that when it was first being added.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37483>
2025-11-27 17:05:42 +00:00
Lionel Landwerlin
515d8f8e3a brw: fix sample mask flag emission
It's also used for testing helper invocations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e3328dfa2f ("brw: only initialize sample mask flag if needed")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38699>
2025-11-27 15:59:35 +00:00
Pierre-Eric Pelloux-Prayer
671e943c9b mesa: fix function prototype
Replace void* by GLvoid* and add GLAPIENTRY to match the gl_API.xml
version.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14164
Fixes: ae75b59cb5 ("glthread, tc: Fix buffer release with glthread and tc")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38029>
2025-11-27 16:22:45 +01:00
Leon Perianu
bff723e50c pvr: pvr_pds_fragment_program_create fix allocation callback usage
The staging buffer is persistent until the destruction of the pvr_pipeline
object, so we should set the allocation scope to PVR_ALLOC_SCOPE_OBJECT instead
of PVR_ALLOC_SCOPE_COMMAND.

The same change was made to the staging buffer in pvr_pds_coeff_program_create_and_upload,
because that buffer is also destroyed at pipeline destruction.

Fixes dEQP-VK.api.object_management.single_alloc_callbacks.graphics_pipeline.

Signed-off-by: Leon Perianu <leon.perianu@imgtec.com>
Reviewed-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Tested-by: Icenowy Zheng <uwu@icenowy.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38662>
2025-11-27 13:18:31 +00:00
Juan A. Suarez Romero
b9b9c676e1 v3d/ci: update expected results
Some failures in OpenCL tests were fixed by commit a643681d.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38694>
2025-11-27 11:59:39 +00:00
Danylo Piliaiev
297c5b5de3 freedreno: Update A7XX_RB_UNKNOWN_8E09 to be in line with blob
All A7XX GPUs seem to have A7XX_RB_UNKNOWN_8E09=0x7 according to
blob v819.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38680>
2025-11-27 11:27:03 +00:00
Job Noorman
bcd81c8172 freedreno/computerator: add option to print raw disassembly
It is sometimes useful to see the raw hex values of what instructions
are assembled to, similar to the output of shaders in cffdump. Add an
option for this to computerator.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37595>
2025-11-27 10:27:27 +00:00
Job Noorman
e413615d55 ir3: add ir3_disasm_options struct
We want to add some disassembly options in the future. Add new
ir3_shader_disasm_options function that takes options from a new
ir3_disasm_options struct in which we can add options later. The
original ir3_shader_disasm becomes a wrapper for the new function to not
have to update all call sites now.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37595>
2025-11-27 10:27:27 +00:00
Marek Olšák
166afc592b gallium/hud: don't fclose stdout for GALLIUM_HUD=...,stdout
This fixes printf doing nothing after the context is destroyed and recreated.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38601>
2025-11-27 03:21:12 +00:00
Yonggang Luo
6356efc4e0 gfxstream: Use os_get_option_dup(VK_DRIVER_FILES)
The return value of os_get_option should be consumed immediately, so use the duplicating variant instead.

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38687>
2025-11-27 10:20:52 +08:00
Yonggang Luo
d668c0ad42 gfxstream: Use VK_DRIVER_FILES instead of VK_ICD_FILENAMES as VK_ICD_FILENAMES has been deprecated for a while.
This prepares for removing VK_ICD_FILENAMES from the source tree.

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38687>
2025-11-27 10:20:48 +08:00
Calder Young
09e8a54087 anv: Fix ray query shadow stack buffer size
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38685>
2025-11-26 22:49:52 +00:00
Sagar Ghuge
d8447fd392 vulkan/runtime: Account for pipeline libraries stage count
Don't exclude stages coming from pipeline libraries. This caused valid
group indices referring to library stages to be dropped, leading to
mismatched stage_count.

Fixes: e05a9b77b6 ("vulkan/runtime: split rt shaders hashing from compile")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38669>
2025-11-26 22:17:57 +00:00
Marek Olšák
e47be4f37b st/mesa: call nir_opt_intrinsics slightly later
It makes more progress after nir_lower_atomics_to_ssbo.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38598>
2025-11-26 16:24:06 -05:00
Marek Olšák
2ea30edc70 st/mesa: call nir_opt_intrinsics for the GL_SELECT shader
radeonsi may assert that this pass makes no progress. This is one place
that should call the pass.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38598>
2025-11-26 16:24:04 -05:00
Marek Olšák
eea5959a22 nir/lower_io_passes: call nir_opt_undef to eliminate undef output stores
If we do it here, we won't have to call nir_recompute_io_bases later again.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38598>
2025-11-26 16:23:49 -05:00
Roland Scheidegger
88ae1f8533 llvmpipe: optimize the centroid implementation
Nothing related to selecting the position when no sample is covered
actually depends on the fragment shader loop iteration. In fact it does
not even depend on the shader invocation, only on the sample mask
(which comes from the jit context, not from the shader key; otherwise
we could just precalculate all of it). And there is certainly no need
for all the extra per-sample selects.
Just calculate it once at interpolation context init. LLVM should be able
to easily toss out (as with the previous version) all extra code done at
interpolation init if centroid interpolation isn't actually used.
(Although the code didn't turn out as simple as I hoped...)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38664>
2025-11-26 19:12:17 +00:00
Roland Scheidegger
9fb4b1e6dc llvmpipe: implement strict d3d11 rules for centroid interpolation
D3D11 is pretty strict about how to do centroid interpolation.
In particular, llvmpipe didn't honor these rules when no sample was
covered for a pixel (relevant for helper pixels), in this case llvmpipe
selected the position of the sample with the highest index (just due to
initialization, not really by choice).
Given that helper pixels are only really used for derivative calculations,
and derivatives are generally sketchy with centroid interpolation, this
seems quite a lot of work, but I suppose it could be useful if the state
sample mask has only 1 sample set (since these d3d11 rules then guarantee
that even with centroid the derivatives are actually useful as the
interpolation will be done at the position defined by the sample specified
in the sample mask, regardless if that sample is covered by the primitive
or not).
Other APIs might technically not need this (they tend to not even define
at which position centroid interpolation is done, other than it must be
inside the primitive), but it shouldn't really hurt them either.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38664>
2025-11-26 19:12:17 +00:00
Samuel Pitoiset
930cab7702 radv: fix fbfetch output with ESO
This fixes a real issue when ESO uses fbfetch output because this
was determined after instead of before.

This solution isn't the most elegant one but binding graphics shaders
earlier would require more work. Let's just handle this specific corner
case for now.

This fixes
dEQP-VK.renderpasses.dynamic_rendering.primary_cmd_buff.custom_resolve.shader_objects.fragment_region*
on some GPUs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38617>
2025-11-26 17:47:07 +00:00
Samuel Pitoiset
6569acbdf2 radv: make sure to reset uses_fbfetch_output for NULL fragment shaders
To prevent useless decompression passes if a previously bound FS was
using fbfetch output.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38617>
2025-11-26 17:47:07 +00:00
Ian Romanick
0c089a5c32 brw: Eliminate duplicate fills
When the register allocator decides to spill a value, all reads of that
value are filled. This can result in cases where the same value is
filled many times in a single block. In those cases, the result of an
earlier fill may still be available when a later fill occurs.

This optimization replaces the later fill with a move from the result of
the earlier fill.
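
The idea can be sketched on a toy instruction list (an illustration only, not the brw pass). The tuple encoding and the function name are hypothetical, and the sketch assumes whole-register operands, so it ignores the partial-overlap and FIXED_GRF details this commit handles.

```python
def eliminate_duplicate_fills(block):
    """Replace a fill with a move when an earlier fill of the same
    scratch offset is still held in a register.

    Toy encoding (hypothetical):
      ("fill", dst, offset)   load scratch[offset] into dst
      ("spill", src, offset)  store src to scratch[offset]
      (name, dst, *srcs)      any other instruction; overwrites dst
    """
    available = {}  # scratch offset -> register still holding that value
    out = []

    def clobber(reg):
        # Forget every cached fill whose result lived in reg.
        for off in [o for o, r in available.items() if r == reg]:
            del available[off]

    for inst in block:
        op = inst[0]
        if op == "fill":
            _, dst, offset = inst
            src = available.get(offset)
            if src is not None:
                out.append(("mov", dst, src))
                if dst != src:
                    clobber(dst)
            else:
                out.append(inst)
                clobber(dst)
                available[offset] = dst
        elif op == "spill":
            # An intervening spill to the same location invalidates the value.
            available.pop(inst[2], None)
            out.append(inst)
        else:
            clobber(inst[1])
            out.append(inst)
    return out
```

The cache is local to one block, which matches the "many times in a single block" case described above; a real pass also has to reason about register ranges rather than register names.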

v2: Use FIXED_GRF for register overlap tests. Since this is after
register allocation, the VGRF values will not tell the whole truth.

v3: Use brw_transform_inst. Suggested by Caio. Add
brw_scratch_inst::offset instead of storing it as a source. Suggested by
Lionel.

v4: An intervening spill to the same location also invalidates the
value. 🤦

v5: Don't eliminate a fill if its destination partially overlaps the
preceding fill destination. Fixes failures in cooperative matrix CTS.

shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17249903 -> 17249653 (<.01%)
instructions in affected programs: 35550 -> 35300 (-0.70%)
helped: 20 / HURT: 0

total cycles in shared programs: 893092398 -> 893101836 (<.01%)
cycles in affected programs: 2501720 -> 2511158 (0.38%)
helped: 6 / HURT: 14

total fills in shared programs: 1901 -> 1776 (-6.58%)
fills in affected programs: 1757 -> 1632 (-7.11%)
helped: 20 / HURT: 0

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 929949528 -> 926770338 (-0.34%)
Cycle count: 105126671329 -> 104851299099 (-0.26%); split: -0.28%, +0.02%
Fill count: 6520785 -> 5021518 (-22.99%)

Totals from 54281 (2.69% of 2018922) affected shaders:
Instrs: 239616289 -> 236437099 (-1.33%)
Cycle count: 22051883404 -> 21776511174 (-1.25%); split: -1.33%, +0.08%
Fill count: 6406295 -> 4907028 (-23.40%)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:13 +00:00
Ian Romanick
d2e3707ecc brw: Eliminate redundant fills and spills
When the register allocator decides to spill a value, all writes to that
value are spilled and all reads are filled. In regions where there is
not high register pressure, a spill of a value may be followed by a fill
of that same value while the spilled register is still live. This
optimization pass finds these cases, and it converts the fill to a move
from the still-live register.

The restriction that the spill and the fill must have matching NoMask
really hampers this optimization. With the restriction removed, the pass
was more than 2x helpful.

v2: Require force_writemask_all to be the same for the spill and the fill.

v3: Use FIXED_GRF for register overlap tests. Since this is after
register allocation, the VGRF values will not tell the whole truth.

v4: Use brw_transform_inst. Suggested by Caio. This allows two of the
loops to be merged. Add brw_scratch_inst::offset instead of storing it
as a source. Suggested by Lionel.

v5: Add no-fill-opt debug option to disable optimizations. Suggested by
Lionel.

v6: Move a calculation outside a loop. Suggested by Lionel.

v7: Check that spill ranges overlap instead of just checking initial
offset. Zero shaders in fossil-db were affected, but some CTS with
spill_fs were fixed (e.g.,
dEQP-VK.subgroups.arithmetic.compute.subgroupmin_uint64_t_requiredsubgroupsize).
Suggested by Lionel.

v8: Add DEBUG_NO_FILL_OPT to debug_bits in
brw_get_compiler_config_value(). Noticed by Lionel.

shader-db:

Lunar Lake
total instructions in shared programs: 17249907 -> 17249903 (<.01%)
instructions in affected programs: 10684 -> 10680 (-0.04%)
helped: 2 / HURT: 0

total cycles in shared programs: 893092630 -> 893092398 (<.01%)
cycles in affected programs: 237320 -> 237088 (-0.10%)
helped: 2 / HURT: 0

total fills in shared programs: 1903 -> 1901 (-0.11%)
fills in affected programs: 110 -> 108 (-1.82%)
helped: 2 / HURT: 0

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19968898 -> 19968778 (<.01%)
instructions in affected programs: 33020 -> 32900 (-0.36%)
helped: 10 / HURT: 0

total cycles in shared programs: 885157211 -> 884925015 (-0.03%)
cycles in affected programs: 39944544 -> 39712348 (-0.58%)
helped: 8 / HURT: 2

total fills in shared programs: 4454 -> 4394 (-1.35%)
fills in affected programs: 2678 -> 2618 (-2.24%)
helped: 10 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 930445228 -> 929949528 (-0.05%)
Cycle count: 105195579417 -> 105126671329 (-0.07%); split: -0.07%, +0.00%
Spill count: 3495279 -> 3494400 (-0.03%)
Fill count: 6767063 -> 6520785 (-3.64%)

Totals from 43844 (2.17% of 2018922) affected shaders:
Instrs: 212614840 -> 212119140 (-0.23%)
Cycle count: 19151130510 -> 19082222422 (-0.36%); split: -0.39%, +0.03%
Spill count: 2831100 -> 2830221 (-0.03%)
Fill count: 6128316 -> 5882038 (-4.02%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 1001375893 -> 1001113407 (-0.03%)
Cycle count: 92746180943 -> 92679877883 (-0.07%); split: -0.08%, +0.01%
Spill count: 3729157 -> 3728585 (-0.02%)
Fill count: 6697296 -> 6566874 (-1.95%)

Totals from 35062 (1.53% of 2284674) affected shaders:
Instrs: 179819265 -> 179556779 (-0.15%)
Cycle count: 18111194752 -> 18044891692 (-0.37%); split: -0.41%, +0.04%
Spill count: 2453752 -> 2453180 (-0.02%)
Fill count: 5279259 -> 5148837 (-2.47%)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:13 +00:00
Ian Romanick
b7f5285ad3 brw: Add fill and spill opcodes for LSC platforms
These opcodes are emitted during register allocation instead of the
scratch reads and writes that were previously emitted. These
instructions contain additional information (i.e., the instruction
encodes the scratch offset) that enables optimizations to be added
later.

The fill and spill opcodes are lowered to scratch reads and writes
shortly after register allocation. Eventually this lowering may have some
optimizations (e.g., reuse previous address calculations for successive
spills).

v2: Add brw_scratch_inst::offset instead of storing it as a
source. Suggested by Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:12 +00:00
Ian Romanick
2215003d95 brw: Add OPT macro to brw_shader.cpp like brw_opt.cpp
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:11 +00:00
Ian Romanick
1f42ff530c brw: Return the new register from brw_lower_vgrf_to_fixed_grf
...and make the function public.

v2: s/struct brw_reg/brw_reg/. Suggested by Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:11 +00:00
Ian Romanick
243a3a4ca7 brw: Don't pass compressed to brw_lower_vgrf_to_fixed_grf
The parameter is never used. It's recalculated in the function.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:10 +00:00
Ian Romanick
1fc2f52d36 brw: Force allow_spilling when spill_all is set
This ensures that g0 is reserved for spilling since there is going to be
spilling.

Fixes: 8bca7e520c ("intel/brw: Only force g0's liveness to be the whole program if spilling")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:09 +00:00
Ian Romanick
042417a72e brw: Don't spill_all on internal shaders
Basically all of the internal shaders (e.g., from blorp) will fail
assertions if there is any scratch space used.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
2025-11-26 17:20:09 +00:00
Alyssa Rosenzweig
e3328dfa2f brw: only initialize sample mask flag if needed
This is a refinement of 7c129d9365 ("intel/brw/xe2+: Keep PS sample mask in the
f1.0 register whether or not kill is used."). Rather than always insert this
move, do so only when we'll actually read the register: for memory writes and
for discards. This deletes an instruction from piles of fragment shaders.

shader-db on LNL:

total instructions in shared programs: 17134031 -> 17042706 (-0.53%)
instructions in affected programs: 9065743 -> 8974418 (-1.01%)
helped: 65045
HURT: 0
helped stats (abs) min: 1.0 max: 3.0 x̄: 1.40 x̃: 1
helped stats (rel) min: <.01% max: 50.00% x̄: 3.06% x̃: 1.64%
95% mean confidence interval for instructions value: -1.41 -1.40
95% mean confidence interval for instructions %-change: -3.10% -3.03%
Instructions are helped.

total cycles in shared programs: 885172098 -> 884835306 (-0.04%)
cycles in affected programs: 590294230 -> 589957438 (-0.06%)
helped: 53636
HURT: 4500
helped stats (abs) min: 2.0 max: 1126.0 x̄: 8.02 x̃: 4
helped stats (rel) min: <.01% max: 50.00% x̄: 1.24% x̃: 0.24%
HURT stats (abs)   min: 2.0 max: 7706.0 x̄: 20.77 x̃: 6
HURT stats (rel)   min: <.01% max: 82.06% x̄: 1.09% x̃: 0.54%
95% mean confidence interval for cycles value: -6.15 -5.43
95% mean confidence interval for cycles %-change: -1.10% -1.02%
Cycles are helped.

LOST:   385
GAINED: 47

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38665>
2025-11-26 16:53:36 +00:00
Connor Abbott
aa9435f5d1 tu: Set 8E09 once
This was set the same for GMEM and sysmem render passes. Set it in the
beginning instead. Following the blob, only set it for BR.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38581>
2025-11-26 16:30:34 +00:00
Connor Abbott
76c5fb50ac tu: Set GRAS_MODE_CNTL once
Don't set it before the render pass, that's unnecessary. In the future
we may want to move this to the FS state object, as the blob does, but
for now don't set it unnecessarily.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38581>
2025-11-26 16:30:34 +00:00
Connor Abbott
d63581a246 tu: Stop setting GRAS_LRZ_CB_CNTL before GMEM render passes
This register is now written later when setting up the LRZ image.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38581>
2025-11-26 16:30:31 +00:00
Connor Abbott
e173c3d8f3 tu: Stop setting RB_CCU_DBG_ECO_CNTL to 0 for GMEM passes
I can't find any evidence that the blob ever did this on a740 or a750.
Doing this breaks subsequent sysmem render passes and would force an
otherwise-unnecessary WFI with custom_resolve.

Fixes: 062e90f19b ("freedreno: Move RB_CCU_DBG_ECO_CNTL to raw_magic_regs")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38581>
2025-11-26 16:30:26 +00:00
Lionel Landwerlin
5324712952 anv: remove errors on format queries
It's pretty spammy and since the whole purpose of queries is to report
support, why bother with errors?

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38661>
2025-11-26 16:06:57 +00:00
David Rosca
15e02eb6ab frontends/va: Use util_dynarray for decode slice data buffers
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38241>
2025-11-26 15:49:59 +00:00
Corentin Noël
3b086706fe ci: Uprev crosvm and virglrenderer
Update both to their latest commits as of this change.

Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38570>
2025-11-26 15:04:25 +00:00
Dmitry Osipenko
25881c701a virgl: Support new resource-layout command
Support new vrend command that queries layout of a backing GBM buffer
for a given vrend resource. Use it for querying stride/modifier of a
PIPE_SHARED resource, passing this info down to WSI for exported resources.
Now venus is able to import vrend resources, making gamescope work in KMS
mode on QEMU. Virgl doesn't use stride/modifier info of winsys when it
imports classic vrend resources, hence this change only affects venus
context when it imports virgl WSI buffers.

Based on initial version of resource-layout command from Daniel Stone.

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Yiwei Zhang <zzyiwei@gmail.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37646>
2025-11-26 14:43:12 +00:00
Dmitry Osipenko
29b64d6636 virgl: Implement resource_create_with_modifiers
The .resource_create_with_modifiers() callback became required after
7d1a32fafd for venus to work in KMS mode. This fixes GBM buffer
allocation failure for vkmark-kms and fixes implicit modifier not
working on host when using Intel i915 driver for running Steam with
gamescope-kms on guest. Note that KMS support for venus on QEMU never
worked before, hence this is not a regression fix.

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37646>
2025-11-26 14:43:12 +00:00
Karol Herbst
d06aff2243 nak/cmat: use movm
Sadly I don't see an obvious way to use it for int8 matrices, therefore
the code is a bit of a mess right now.

It allows us to vectorize load/stores more often as we can simply
transpose row/col major matrices when needed.

And the movm optimization is also only enabled for 16 bit types, even
though we _could_ do it for 32 bit. It's not clear yet if using it for 32
bit types is an overall advantage or not.

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37998>
2025-11-26 14:09:37 +00:00
Karol Herbst
626c6b35f0 nak: add Movm
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37998>
2025-11-26 14:09:37 +00:00
Karol Herbst
c4f07f3d79 nir: mark cmat_load_shared_nv as CAN_ELIMINATE
It's just a special load shared and has no side effects.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37998>
2025-11-26 14:09:35 +00:00