This will save us the trouble of faking constant folding for the BVH level and
trace ray control values when we lower this intrinsic in the new backends.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42006>
We don't use anything from that header. We call
nir_format_pack_r9g9b9e5(), which comes from nir_format_convert.h,
which we already include.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41976>
The key is only used inside that file. Make it like we do with the
keys in blorp_clear.c.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41976>
When you use designated initializers, anything that is not explicitly
set is set to zero. When you do something like:
struct blorp_blit_prog_key key = {
.base = BLORP_BASE_KEY_INIT(BLORP_SHADER_TYPE_BLIT),
.base.shader_pipeline = BLORP_SHADER_PIPELINE_RENDER,
};
the second initialization is the only one that does something: it sets
shader_pipeline to the desired value, and all the other fields in
"base" are set to 0. This is easily verifiable by just examining the
contents of all the blorp keys we initialize this way: name and
shader_type are always zero.
This means that if two blorp shaders of different types have the
same key size, the shader cache could confuse them. Still, I don't
think this is happening in the real world.
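A minimal sketch of one way to sidestep the problem, not necessarily what
this patch does. The types and the BASE_KEY_INIT macro below are simplified,
hypothetical stand-ins for the real blorp definitions. The idea is to
initialize "base" exactly once and set the extra field in a separate
assignment, so nothing depends on how overlapping initializers are merged:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the blorp key types (not the Mesa definitions). */
struct base_key {
   char name[8];
   uint32_t shader_type;
   uint32_t shader_pipeline;
};

struct blit_key {
   struct base_key base;
   uint32_t other_state;
};

/* Hypothetical equivalent of BLORP_BASE_KEY_INIT(). */
#define BASE_KEY_INIT(type) \
   (struct base_key) { .name = "blorp", .shader_type = (type) }

static struct blit_key make_blit_key(void)
{
   /* "base" is initialized by a single initializer... */
   struct blit_key key = {
      .base = BASE_KEY_INIT(3),
   };

   /* ...and the extra field is a plain assignment afterwards, so name and
    * shader_type keep the values set by the macro.
    */
   key.base.shader_pipeline = 2;

   return key;
}
```

With this shape, name and shader_type survive, so two keys of the same size
but different shader types no longer compare equal.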
Fixes: 22ecb4a10f ("intel/blorp: Support compute for slow clears")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/11690
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41976>
If we fail to compile a kernel, don't fail silently: call mesa_loge()
so we at least know it happened. On debug builds, just assert(),
so if it ever happens in CI, we'll know.
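A rough sketch of the pattern, assuming a hypothetical check_kernel() helper
and using fprintf() as a stand-in for mesa_loge() so the snippet builds
outside of Mesa:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for mesa_loge(); the real macro comes from Mesa's util/log.h. */
#define log_error(...) fprintf(stderr, __VA_ARGS__)

/* Hypothetical helper: passes the kernel through, reporting a NULL one. */
static const void *check_kernel(const void *kernel, const char *name)
{
   if (kernel == NULL) {
      /* Always leave a trace in the log... */
      log_error("failed to compile kernel %s\n", name);
      /* ...and stop right here when asserts are enabled (debug builds). */
      assert(!"kernel compilation failed");
   }

   return kernel;
}
```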
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41976>
A future patch will add more parameters to fill_inline_param(), so let's reduce
the number of parameters by passing a struct to this function instead.
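For illustration only, the general shape of such a change could look like
the sketch below. The struct, its fields and fill_inline_param_sketch() are
hypothetical and do not reflect the real fill_inline_param() signature:

```c
#include <stdint.h>

/* Hypothetical parameter bundle; field names are illustrative only. */
struct inline_param_args {
   uint32_t *dst;              /* destination inline data */
   uint32_t dst_dwords;        /* capacity of dst, in dwords */
   const uint32_t *push_data;  /* source push constant data */
   uint32_t push_dwords;       /* size of push_data, in dwords */
};

/* Copies as many dwords as fit and returns how many were written. New
 * inputs can later be added as struct fields without touching callers
 * that do not care about them.
 */
static uint32_t fill_inline_param_sketch(const struct inline_param_args *args)
{
   uint32_t count = args->push_dwords < args->dst_dwords ?
                    args->push_dwords : args->dst_dwords;

   for (uint32_t i = 0; i < count; i++)
      args->dst[i] = args->push_data[i];

   return count;
}
```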
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41351>
The push constant size limit only applies to stages that don't use inline
params, so I had to add stage_has_inline_param() and call it first.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41351>
these must be dynamically uniform but can be GPR. fixes validation on
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_tessellation_evaluation,
and probably real bugs doing indirect loads in divergent control flow
(when lane 0 is masked off).
no fossil-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42056>
This optimization mostly helped BRW because 3-src instructions can't take
immediates, and BRW can't allocate scalars without wasting an entire GRF unit
per scalar. Jay has a better RA that can pack many scalars into a single GRF
unit, so allocating temporary registers for the immediates is far less likely
to lead to as much spilling as it does on BRW.
SIMD16:
Totals from 1331 (50.28% of 2647) affected shaders:
Instrs: 1665848 -> 1665514 (-0.02%); split: -0.16%, +0.14%
CodeSize: 23192072 -> 23215672 (+0.10%); split: -0.30%, +0.40%
SIMD32:
Totals from 1114 (42.09% of 2647) affected shaders:
Instrs: 1959968 -> 1960548 (+0.03%); split: -0.30%, +0.33%
CodeSize: 28004460 -> 28023468 (+0.07%); split: -0.39%, +0.46%
Number of spill instructions: 31157 -> 31161 (+0.01%); split: -0.01%, +0.03%
Number of fill instructions: 32138 -> 32130 (-0.02%); split: -0.05%, +0.02%
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42056>
The idea here is to eliminate the flag used for the select condition,
not eliminate other flag sources.
Previously, if we had an instruction like:
gpr = SEL <not in flag> 0 <already in flag>
we would process source 0 and try to rewrite_without_flags(). Because
it's not in a flag, we think eliminating flags would be useful, so we
rewrite it. But this only eliminates the source 2 selection flag, not
the source 0 flag. It's valid to do so (but debatably useful).
However, we thought we were done, and skipped the setup that ensures
source 0's value was actually loaded into a flag.
Instead, we should just perform this optimization when processing the
selection flag (source 2). By that point, we will have properly set
up any flags for sources 0 and 1. And if source 2 is not in a flag,
we can decide to rewrite without it. Or, if it's already in a flag,
we can keep it as-is.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42056>
delete_solo_discard was removing unconditional discards in the case
where the entire program had been optimized away. However, we can
do better: unconditional discards in the end block can be removed if
1. All render target writes after the discard have been eliminated
2. No intrinsics with side-effects (e.g. image stores) come after
See
dEQP-VK.fragment_operations.early_fragment.discard_early_fragment_tests_depth
where there's a discard at the end of the program which can be removed.
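The two conditions can be sketched as a simple scan over what follows the
discard. The op_kind enum and discard_is_removable() are hypothetical and
operate on a flat array rather than the real IR:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened view of the instructions in the end block. */
enum op_kind {
   OP_ALU,
   OP_DISCARD,
   OP_RT_WRITE,     /* render target write */
   OP_SIDE_EFFECT,  /* e.g. an image store */
};

/* Returns true if nothing after the discard at discard_idx is a render
 * target write or an intrinsic with side effects.
 */
static bool discard_is_removable(const enum op_kind *ops, size_t count,
                                 size_t discard_idx)
{
   for (size_t i = discard_idx + 1; i < count; i++) {
      if (ops[i] == OP_RT_WRITE || ops[i] == OP_SIDE_EFFECT)
         return false;
   }

   return true;
}
```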
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42056>
opt_unconditional_discards may eliminate all render target stores
due to all pixels being discarded. In that case, it tries to add
one back with a Null RT and no colour/depth/stencil outputs, just
to end the thread. When that happens, we don't want to predicate it on
helper invocations - we just need a basic message to end the thread.
In particular, we already lowered nir_intrinsic_is_helper_invocation
so we don't want to emit it again, as nothing would lower it afterwards.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42056>
This covers some drivers which expose KHR_display and EXT_present_timing.
Based on Emma Anholt's work from 2025, rebased on current Mesa 26.2-devel,
tiny compile fixes and docs/features updates by Mario Kleiner.
See MR 38472 for reference of Emma's work, based on Keith's work.
Tested locally on AMD Polaris for radv, Intel Kabylake for anv, and on
Mesa CI's VK-CTS VK_GOOGLE_display_timing test case for AMD radv,
Intel anv, Qualcomm Adreno tu.
Original code of Emma is
Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Update of docs/features.txt + new_features.txt is
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
prev_refs currently has 8 entries, matching the number of reference frames,
but it needs to match the full size of the DPB, which is 9.
Also, prev_refs is now indexed by DPB slot and holds the last intra frame
written to that slot.
This fixes visible artifacts on AV1 streams that mix super-res and
non-super-res frames in a hierarchical reference structure.
Closes: mesa/mesa#15503
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41846>
This change splits the algorithm in two steps: first we have the
logical decision of which caches to bypass based on the needs of the
send operation, and then we have the code that picks the caching modes
based on which caches to bypass.
This should make it significantly easier for us to add new workarounds
without the risk of breaking existing cases.
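A sketch of the two-step structure follows. All types, names and the bypass
policy itself (atomics skip L1, volatile accesses skip both) are assumptions
made up for illustration; the real rules live in the backend:

```c
#include <stdbool.h>

/* Hypothetical description of what the send operation needs. */
struct send_needs {
   bool is_atomic;
   bool is_volatile;
};

/* Step 1 output: which caches must be bypassed. */
struct cache_bypass {
   bool l1;
   bool l3;
};

/* Hypothetical caching modes. */
enum cache_mode_sketch {
   CACHE_L1_AND_L3,
   CACHE_L3_ONLY,
   CACHE_L1_ONLY,
   CACHE_NONE,
};

/* Step 1: purely logical decision. A new workaround only adds a rule here. */
static struct cache_bypass decide_bypass(const struct send_needs *needs)
{
   struct cache_bypass bypass = { .l1 = false, .l3 = false };

   if (needs->is_atomic)
      bypass.l1 = true;

   if (needs->is_volatile) {
      bypass.l1 = true;
      bypass.l3 = true;
   }

   return bypass;
}

/* Step 2: map the bypass decision to a caching mode. */
static enum cache_mode_sketch pick_cache_mode(struct cache_bypass bypass)
{
   if (bypass.l1 && bypass.l3)
      return CACHE_NONE;
   if (bypass.l1)
      return CACHE_L3_ONLY;
   if (bypass.l3)
      return CACHE_L1_ONLY;
   return CACHE_L1_AND_L3;
}
```

Keeping step 2 a pure function of the bypass decision is what lets new
rules be added in step 1 without revisiting the mode selection.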
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41319>
Instead of having an if ladder followed by another if that overwrites
the previous result, have a single if ladder.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41319>
This is the next - but not final - step toward making this function more
organized: split cache_mode into atomic, load and store versions, then
pick the version at the end.
v2: Initialize {load,store}_cache_mode (Sagar).
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41319>
We have code to choose cache_mode before send->sfid is assigned, but
after it we have more code to choose cache_mode that relies on
send->sfid. Move everything to after the selection of send->sfid so
the code to pick cache_mode is all together. I plan to simplify this
further in the next commits; the goal of this patch is to make the next
diff easier to read.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41319>
Multiview often involves a loop over view indexes, and our output
handling assumes that everything is constant-indexed. Unrolling
the loops takes care of this. (brw already does this.)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
It's mildly tempting to reuse the src0_alpha source for color1 since
the two features should never overlap, but for now we add an extra
optional source.
We require SIMD16 for now as we only have SIMD16 messages. Eventually,
we're likely to want to support SIMD32 with 2x16 sends, but this gets
us going for now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>