Coverity notices we're reading off the end of the array here, which is
true. We also intend to do that because we want to read the next field as
well. Cast to a `void *` to help Coverity out.
CID: 1649593
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38437>
This utilizes the RGBX format faking logic from e8cd7a30 to enable
PIPE_FORMAT_R10G10B10X2_UNORM renderer support using swizzling.
This format is needed for better HDR rendering support in the iris driver, to
support the Proton / Wine DXGI implementation, which requires an RGBA ordered
renderer for its Vulkan implementation. This in turn requires the Wayland
display to support both alpha and opaque formats. The check currently fails,
since only PIPE_FORMAT_R10G10B10A2_UNORM is exposed when Gallium (iris) is
the DRI Wayland renderer.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38616>
No shader-db changes on any Intel platform. Both Iris and Crocus use
st_nir_opts, which calls nir_lower_flrp before brw_nir_optimize. The
call still needs to exist for hasvk, but I don't collect fossil-db data
for hasvk.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12526>
Dynamic descriptors are mapped an array of offsets provided through
vkCmdBindDescriptorSets*() commands.
When pipelines are compiled with independent sets layouts, the
implementation might have to do additional runtime calculation to
figure out what offset in the contiguous array maps to what dynamic
descriptor in the pipeline layout.
For graphics pipelines you can always compute that information when
binding the shaders. There is always a limited amount of shaders (5
max).
For ray tracing pipelines, there could be lots of shaders to process
at every pipeline binding call. Besides there is no interface from the
runtime to the driver to list all the shaders used at the moment.
So do that tracking in the runtime and pass the information down to
the driver through the cmd_set_rt_state() vfunc.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 69a04151db ("vulkan/runtime: add ray tracing pipeline support")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38678>
Weird that CTS did not catch that ...
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: 11195eb8de ("vulkan: Add KHR_swapchain_maintenance1 promotions.")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38728>
The subslice IDs provided by the SR0.0 EU register are not adjusted to account
for fusing, so the upper bound max_scratch_ids can vary from device to device
depending on what specific slices were fused during manufacturing.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38689>
The last holdouts of the var options are gone so we can just emit the
system values. This is overall simpler as it confines all the sysval to
var logic to nir_lower_sysvals_to_varyings().
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38562>
It's also used for testing helper invocations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e3328dfa2f ("brw: only initialize sample mask flag if needed")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38699>
When the register allocator decides to spill a value, all reads of that
value are filled. This can result in cases where the same value is
filled many times in a single block. In those cases, the result of an
earlier fill may still be available when a later fill occurs.
This optimization replaces the later fill with a move from the result of
the earlier fill.
v2: Use FIXED_GRF for register overlap tests. Since this is after
register allocation, the VGRF values will not tell the whole truth.
v3: Use brw_transform_inst. Suggested by Caio. Add
brw_scratch_inst::offset instead of storing it as a source. Suggested by
Lionel.
v4: In intervening spill to the same location also invalidates the
value. 🤦
v5: Don't eliminate a fill if its destination partially overlaps the
preceeding fill destination. Fixes failures in cooperative matrix CTS.
shader-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17249903 -> 17249653 (<.01%)
instructions in affected programs: 35550 -> 35300 (-0.70%)
helped: 20 / HURT: 0
total cycles in shared programs: 893092398 -> 893101836 (<.01%)
cycles in affected programs: 2501720 -> 2511158 (0.38%)
helped: 6 / HURT: 14
total fills in shared programs: 1901 -> 1776 (-6.58%)
fills in affected programs: 1757 -> 1632 (-7.11%)
helped: 20 / HURT: 0
fossil-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 929949528 -> 926770338 (-0.34%)
Cycle count: 105126671329 -> 104851299099 (-0.26%); split: -0.28%, +0.02%
Fill count: 6520785 -> 5021518 (-22.99%)
Totals from 54281 (2.69% of 2018922) affected shaders:
Instrs: 239616289 -> 236437099 (-1.33%)
Cycle count: 22051883404 -> 21776511174 (-1.25%); split: -1.33%, +0.08%
Fill count: 6406295 -> 4907028 (-23.40%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
When the register allocator decides to spill a value, all writes to that
value are spilled and all reads are filled. In regions where there is
not high register pressure, a spill of a value may be followed by a fill
of that same file while the spilled register is still live. This
optimization pass finds these cases, and it converts the fill to a move
from the still-live register.
The restriction that the spill and the fill must have matching NoMask
really hampers this optimization. With the restriction removed, the pass
was more than 2x helpful.
v2: Require force_writemask_all to be the same for the spill and the fill.
v3: Use FIXED_GRF for register overlap tests. Since this is after
register allocation, the VGRF values will not tell the whole truth.
v4: Use brw_transform_inst. Suggested by Caio. The allows two of the
loops to be merged. Add brw_scratch_inst::offset instead of storing it
as a source. Suggested by Lionel.
v5: Add no-fill-opt debug option to disable optimizations. Suggested by
Lionel.
v6: Move a calculation outside a loop. Suggested by Lionel.
v7: Check that spill ranges overlap instead of just checking initial
offset. Zero shaders in fossil-db were affected, but some CTS with
spill_fs were fixed (e.g.,
dEQP-VK.subgroups.arithmetic.compute.subgroupmin_uint64_t_requiredsubgroupsize).
Suggested by Lionel.
v8: Add DEBUG_NO_FILL_OPT to debug_bits in
brw_get_compiler_config_value(). Noticed by Lionel.
shader-db:
Lunar Lake
total instructions in shared programs: 17249907 -> 17249903 (<.01%)
instructions in affected programs: 10684 -> 10680 (-0.04%)
helped: 2 / HURT: 0
total cycles in shared programs: 893092630 -> 893092398 (<.01%)
cycles in affected programs: 237320 -> 237088 (-0.10%)
helped: 2 / HURT: 0
total fills in shared programs: 1903 -> 1901 (-0.11%)
fills in affected programs: 110 -> 108 (-1.82%)
helped: 2 / HURT: 0
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19968898 -> 19968778 (<.01%)
instructions in affected programs: 33020 -> 32900 (-0.36%)
helped: 10 / HURT: 0
total cycles in shared programs: 885157211 -> 884925015 (-0.03%)
cycles in affected programs: 39944544 -> 39712348 (-0.58%)
helped: 8 / HURT: 2
total fills in shared programs: 4454 -> 4394 (-1.35%)
fills in affected programs: 2678 -> 2618 (-2.24%)
helped: 10 / HURT: 0
fossil-db:
Lunar Lake
Totals:
Instrs: 930445228 -> 929949528 (-0.05%)
Cycle count: 105195579417 -> 105126671329 (-0.07%); split: -0.07%, +0.00%
Spill count: 3495279 -> 3494400 (-0.03%)
Fill count: 6767063 -> 6520785 (-3.64%)
Totals from 43844 (2.17% of 2018922) affected shaders:
Instrs: 212614840 -> 212119140 (-0.23%)
Cycle count: 19151130510 -> 19082222422 (-0.36%); split: -0.39%, +0.03%
Spill count: 2831100 -> 2830221 (-0.03%)
Fill count: 6128316 -> 5882038 (-4.02%)
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 1001375893 -> 1001113407 (-0.03%)
Cycle count: 92746180943 -> 92679877883 (-0.07%); split: -0.08%, +0.01%
Spill count: 3729157 -> 3728585 (-0.02%)
Fill count: 6697296 -> 6566874 (-1.95%)
Totals from 35062 (1.53% of 2284674) affected shaders:
Instrs: 179819265 -> 179556779 (-0.15%)
Cycle count: 18111194752 -> 18044891692 (-0.37%); split: -0.41%, +0.04%
Spill count: 2453752 -> 2453180 (-0.02%)
Fill count: 5279259 -> 5148837 (-2.47%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
These opcodes are emitted during register allocation instead of the
scratch reads and writes that were previously emitted. These
instructions contain additional information (i.e., the instruction
encodes the scratch offset) that enable optimizations to be added
later.
The fill and spill opcodes are lowered to scratch reads and writes
shortly after register allocation. Eventually this lower may have some
optimizations (e.g., reuse previous address calculations for successive
spills).
v2: Add brw_scratch_inst::offset instead of storing it as a
source. Suggested by Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
This ensures that g0 is reserved for spilling since there is going to be
spilling.
Fixes: 8bca7e520c ("intel/brw: Only force g0's liveness to be the whole program if spilling")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
It's pretty spammy and since the whole purpose of queries is to report
support, why bother with errors?
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38661>
Otherwise we get a lot of individual x/y/z stores to tesslevels when
we should really just be storing the whole thing at once.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
The newly rewritten remap_tess_levels_legacy will have already lowered
anything it cares about to URB intrinsics. So the generic remapping
pass won't see them, as it operates on generic input/output intrinsics.
This also drops some of the callback boilerplate we needed temporarily.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
This unifies the dynamic (SSO) and fixed (linked together) versions.
We emit piles of NIR as if we were doing the dynamic version, but
replace the tess config field access with constant values. It all
should optimize away back to something reasonable. We lower these
directly to URB read/write intrinsics.
It also rewrites the dynamic version to directly read/write the URB
rather than going through temporaries. The old version was broken
in that tessellation control shader invocations can technically use
the shared output area for cross-invocation data sharing with barriers,
although doing so using the built-in tesslevel patch outputs is very
unlikely.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
We were using this for indirect loads of the shader input thread
payload, but there's no reason we can't use it for constant access
too. In this case we can just MOV from the ATTR file directly
without a special opcode that turns into MOV_INDIRECT later.
We also allow it to load multiple components now. This is useful
for say, returning vec4 pushed inputs. And, we allow it in more
stages than just the fragment stage.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
We're going to change the intrinsic to a load(...) which puts "load" in
the name. Also, it's just more consistent with our usual terminology.
We also rename the corresponding backend opcode so they remain matched.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>