This is done in preparation for enabling compression. It avoids allocating a VA
and then deleting it later, because compressed images use the memory object VA
and not the image plane VA. This guarantees we never put the image in an invalid
state and saves us an alloc and free in the case of compressed images.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
This lays the groundwork for enabling compression by adding a way to pass in
whether the image will be compressed or not from NVK to NIL.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
Previously, the alignment provided by the caller and the minimum alignment we
have (4KiB) were mixed up. Additionally, some data that was already passed in
aligned was redundantly aligned again. This didn't matter because we were always
using 4K pages anyway due to kernel limitations. However, this now needs fixing
to allow for larger page support.
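A minimal sketch of the intended separation, not the actual NVK code: pick one
effective alignment (the caller's request, clamped to the minimum) and align the
size exactly once. The names MIN_ALIGN_B, effective_align and aligned_alloc_size
are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: the 4KiB minimum alignment mentioned above. */
#define MIN_ALIGN_B UINT64_C(4096)

/* Round v up to a multiple of a, where a must be a power of two. */
static inline uint64_t
align_up_pot(uint64_t v, uint64_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

/* One alignment to rule the allocation: the caller's request, but never
 * below the minimum. */
static inline uint64_t
effective_align(uint64_t caller_align_B)
{
   return caller_align_B > MIN_ALIGN_B ? caller_align_B : MIN_ALIGN_B;
}

/* Align the size a single time, using the effective alignment. */
static inline uint64_t
aligned_alloc_size(uint64_t size_B, uint64_t caller_align_B)
{
   return align_up_pot(size_B, effective_align(caller_align_B));
}
```

With a 64KiB caller alignment, a 4097-byte request rounds up to 65536 bytes,
while a request with no caller alignment falls back to the 4KiB minimum.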
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
Previously, for imports we wouldn't carry over the PTE kind with the import,
which worked fine up until now. However, compression depends on the PTE kind
being correct; otherwise there will be a mismatch between both sides.
The GEM info object we get from the kernel already has the PTE kind embedded in
the tile flags object, so all we have to do is retrieve it and store it in the
bo object, and then the lower layers can retrieve the kind from the bo directly.
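As a rough illustration of the idea, and not the real nvkmd code, the import
path could decode the kind once and cache it on the bo. The bit position used
here (kind in bits 8..15 of the tile flags) is an assumption, and example_bo
and both functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: the PTE kind occupies bits 8..15 of the tile flags word. */
#define EXAMPLE_TILE_FLAGS_KIND_SHIFT 8
#define EXAMPLE_TILE_FLAGS_KIND_MASK  0xffu

/* Hypothetical stand-in for the driver's bo object. */
struct example_bo {
   uint64_t size_B;
   uint8_t pte_kind;
};

/* Extract the PTE kind from the tile flags reported for a GEM object. */
static inline uint8_t
example_pte_kind_from_tile_flags(uint32_t tile_flags)
{
   return (uint8_t)((tile_flags >> EXAMPLE_TILE_FLAGS_KIND_SHIFT) &
                    EXAMPLE_TILE_FLAGS_KIND_MASK);
}

/* On import, store the kind in the bo so lower layers can read it
 * directly instead of querying the kernel again. */
static inline void
example_bo_init_from_import(struct example_bo *bo, uint64_t size_B,
                            uint32_t tile_flags)
{
   assert(bo);
   bo->size_B = size_B;
   bo->pte_kind = example_pte_kind_from_tile_flags(tile_flags);
}
```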
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
This is so we can enable features needing kernel support based on whether the
detected kernel driver supports them or not by checking for the version in
nvkmd.
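A version gate of this kind might look like the sketch below. The struct and
function names are hypothetical, and the version numbers in the usage note are
placeholders, not the versions any real feature requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the kernel driver version detected at probe. */
struct example_kmd_version {
   uint32_t major;
   uint32_t minor;
   uint32_t patch;
};

/* True if the detected version is at least major.minor. */
static inline bool
example_kmd_version_at_least(const struct example_kmd_version *have,
                             uint32_t major, uint32_t minor)
{
   assert(have);
   if (have->major != major)
      return have->major > major;
   return have->minor >= minor;
}
```

A feature flag would then be set with something like
`has_feature = example_kmd_version_at_least(&ver, 1, 4)`.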
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
In bccb9fe091 ("nvk/nvkmd: nouveau uses the OS page size"),
the alignment size was narrowed to the OS page size in
nvkmd_nouveau_alloc_tiled_mem. This makes the same change
for nvk_AllocateMemory.
This is being done in preparation for large page support, which will
be more picky about alignments.
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
Currently, GPU state is reset immediately after each flush and during
context creation, even when the next command might be a simple BLT/RS
operation that doesn't require the full GPU rendering pipeline.
This patch introduces lazy GPU state reset by:
- Adding a needs_gpu_state_reset flag to track when reset is needed
- Setting the flag to true after flush instead of immediately resetting
- Only performing the actual reset in etna_draw_vbo() when rendering
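The pattern above can be sketched as follows. Only needs_gpu_state_reset comes
from the patch description; the context struct, the reset counter and the
function names are hypothetical and exist just to make the behaviour visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down context. */
struct example_ctx {
   bool needs_gpu_state_reset;
   unsigned reset_count; /* how many full resets were actually emitted */
};

/* The expensive part: only ever called when a reset is pending. */
static inline void
example_reset_gpu_state(struct example_ctx *ctx)
{
   assert(ctx->needs_gpu_state_reset);
   ctx->reset_count++;
   ctx->needs_gpu_state_reset = false;
}

/* Context creation only marks the reset as pending. */
static inline void
example_context_create(struct example_ctx *ctx)
{
   ctx->reset_count = 0;
   ctx->needs_gpu_state_reset = true;
}

/* Flush marks the reset as pending instead of performing it. */
static inline void
example_flush(struct example_ctx *ctx)
{
   ctx->needs_gpu_state_reset = true;
}

/* BLT/RS style work never touches the 3D state, so no reset here. */
static inline void
example_blit(struct example_ctx *ctx)
{
   (void)ctx;
}

/* Draws are the only place that pays for the reset, and only once. */
static inline void
example_draw(struct example_ctx *ctx)
{
   if (ctx->needs_gpu_state_reset)
      example_reset_gpu_state(ctx);
}
```

A sequence of create, blit, flush, blit emits no reset at all; the first draw
afterwards emits exactly one.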
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36565>
Move RS_SINGLE_BUFFER from global context initialization to individual
RS operations, enabling it before each operation and disabling it
immediately after. The same pattern is seen in traces from the binary
blob driver.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36565>
There's nothing for the driver to do; it's all handled in spirv_to_nir.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38574>
It will be used to replace SetEnvironmentVariableA and putenv on Windows,
and putenv and setenv on non-Windows platforms.
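A portable wrapper along these lines could look like the sketch below. The name
example_set_env and its exact semantics (a NULL value unsets the variable) are
assumptions for illustration and may differ from the helper this commit adds.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L /* for setenv() and unsetenv() */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Set, or with value == NULL unset, an environment variable.
 * Returns true on success. When overwrite is false, an existing
 * value is left untouched. */
static inline bool
example_set_env(const char *name, const char *value, bool overwrite)
{
   assert(name != NULL);
#ifdef _WIN32
   /* In the Microsoft CRT, an empty value removes the variable. */
   if (value == NULL)
      return _putenv_s(name, "") == 0;
   if (!overwrite && getenv(name) != NULL)
      return true;
   return _putenv_s(name, value) == 0;
#else
   if (value == NULL)
      return unsetenv(name) == 0;
   return setenv(name, value, overwrite ? 1 : 0) == 0;
#endif
}
```

Having a single entry point means call sites no longer need their own
platform #ifdefs.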
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Antonio Ospite <antonio.ospite@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38640>
When laying out a sparse partially-resident image we need to align
rows of ordered blocks to a mapping granularity in bytes (i.e. the
page size) and array layers to a multiple of sparse block size.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37483>
To implement sparse partially-resident images, we need to be able
to express mapping in terms of rectangles of texel blocks.
With row_align_B we can constrain the rows of ordered blocks to
start at mapping boundary (i.e. page size) and using array_align_B
we can ensure that each subresource starts at a multiple of
whatever sparse block size we decide to use.
Not setting each of these fields is the same as setting them to 1.
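To make the effect of the two fields concrete, here is a simplified sketch of
how they could feed into stride computation. Only row_align_B and array_align_B
come from the description above; the struct, the helpers and the assumption of
a plain row-major layout are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the two constraints. A value of 0 means
 * "not set" and behaves like 1, i.e. no extra alignment. */
struct example_layout_constraints {
   uint64_t row_align_B;
   uint64_t array_align_B;
};

/* Round v up to a multiple of a; 0 and 1 leave v unchanged. */
static inline uint64_t
example_align_up(uint64_t v, uint64_t a)
{
   if (a <= 1)
      return v;
   return ((v + a - 1) / a) * a;
}

/* Stride between rows of blocks: each row starts on a mapping boundary. */
static inline uint64_t
example_row_stride_B(uint32_t width_blocks, uint32_t block_size_B,
                     const struct example_layout_constraints *c)
{
   assert(c);
   return example_align_up((uint64_t)width_blocks * block_size_B,
                           c->row_align_B);
}

/* Stride between array layers: each layer starts on a multiple of the
 * sparse block size. */
static inline uint64_t
example_array_stride_B(uint32_t width_blocks, uint32_t height_blocks,
                       uint32_t block_size_B,
                       const struct example_layout_constraints *c)
{
   uint64_t surface_B =
      example_row_stride_B(width_blocks, block_size_B, c) * height_blocks;
   return example_align_up(surface_B, c->array_align_B);
}
```

With a 4096-byte row alignment and a 65536-byte array alignment, a row of 100
16-byte blocks takes 4096 bytes instead of 1600, and ten such rows occupy one
65536-byte layer.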
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37483>
While we're at it, also add the SPDX header to panvk_sparse.c
because I forgot to do that when it was first being added.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37483>
It's also used for testing helper invocations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e3328dfa2f ("brw: only initialize sample mask flag if needed")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38699>
The staging buffer is persistent until the destruction of the pvr_pipeline
object, so we should set the allocation scope to PVR_ALLOC_SCOPE_OBJECT instead
of PVR_ALLOC_SCOPE_COMMAND.
Also make the same change in pvr_pds_coeff_program_create_and_upload for its
staging buffer, because that buffer is also destroyed at pipeline destruction.
Fixes dEQP-VK.api.object_management.single_alloc_callbacks.graphics_pipeline.
Signed-off-by: Leon Perianu <leon.perianu@imgtec.com>
Reviewed-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Tested-by: Icenowy Zheng <uwu@icenowy.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38662>
It is sometimes useful to see the raw hex values of what instructions
are assembled to, similar to the output of shaders in cffdump. Add an
option for this to computerator.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37595>
We want to add some disassembly options in the future. Add new
ir3_shader_disasm_options function that takes options from a new
ir3_disasm_options struct in which we can add options later. The
original ir3_shader_disasm becomes a wrapper for the new function to not
have to update all call sites now.
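The wrapper pattern described here can be sketched as below. This is not the
ir3 API: the function signatures, the show_raw_hex field and the placeholder
"(instr)" text are all hypothetical, and only illustrate how an options struct
lets new flags be added without touching existing call sites.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical options struct; new fields can be appended later. */
struct example_disasm_options {
   bool show_raw_hex;
};

/* New entry point: formats one instruction according to opts and
 * returns the number of characters snprintf would have written. */
static inline int
example_disasm_with_options(uint64_t instr,
                            const struct example_disasm_options *opts,
                            char *buf, size_t len)
{
   assert(opts != NULL && buf != NULL);
   if (opts->show_raw_hex)
      return snprintf(buf, len, "%016" PRIx64 " (instr)", instr);
   return snprintf(buf, len, "(instr)");
}

/* Original entry point, kept as a thin wrapper that passes default
 * options, so existing callers compile and behave as before. */
static inline int
example_disasm(uint64_t instr, char *buf, size_t len)
{
   const struct example_disasm_options defaults = { .show_raw_hex = false };
   return example_disasm_with_options(instr, &defaults, buf, len);
}
```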
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37595>
Don't exclude stages coming from pipeline libraries. This caused valid
group indices referring to library stages to be dropped, leading to
mismatched stage_count.
Fixes: e05a9b77b6 ("vulkan/runtime: split rt shaders hashing from compile")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38669>
Nothing related to selecting the position when no sample is covered
actually depends on the fragment shader loop iteration. In fact,
it doesn't even depend on the shader invocation, only on the sample mask
(which comes from the jit context, not from the shader key, otherwise we
could just precalculate all of it). And certainly there's no need for all
the extra per-sample selects.
Just calculate it once at interpolation context init. LLVM should be able
to easily toss out (as with the previous version) all extra code done at
interpolation init if centroid interpolation isn't actually used.
(Although the code didn't turn out as simple as I hoped...)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38664>
D3D11 is pretty strict about how to do centroid interpolation.
In particular, llvmpipe didn't honor these rules when no sample was
covered for a pixel (relevant for helper pixels). In this case llvmpipe
selected the position of the sample with the highest index (just due to
initialization, not really by choice).
Given that helper pixels are only really used for derivative calculations,
and derivatives are generally sketchy with centroid interpolation, this
seems quite a lot of work, but I suppose it could be useful if the state
sample mask has only 1 sample set (since these d3d11 rules then guarantee
that even with centroid the derivatives are actually useful as the
interpolation will be done at the position defined by the sample specified
in the sample mask, regardless if that sample is covered by the primitive
or not).
Other APIs might technically not need this (they tend to not even define
at which position centroid interpolation is done, other than it must be
inside the primitive), but it shouldn't really hurt them either.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38664>
This fixes a real issue when ESO uses fbfetch output because this
was determined after instead of before.
This solution isn't the most elegant one but binding graphics shaders
earlier would require more work. Let's just handle this specific corner
case for now.
This fixes
dEQP-VK.renderpasses.dynamic_rendering.primary_cmd_buff.custom_resolve.shader_objects.fragment_region*
on some GPUs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38617>
When the register allocator decides to spill a value, all reads of that
value are filled. This can result in cases where the same value is
filled many times in a single block. In those cases, the result of an
earlier fill may still be available when a later fill occurs.
This optimization replaces the later fill with a move from the result of
the earlier fill.
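A heavily simplified model of the idea, not the brw pass itself: walk one block,
remember which register currently holds the value of each scratch slot, and turn
a later fill of that slot into a move. Whole registers and small slot indices
are assumed here, whereas the real pass has to deal with partial register
overlap. All ff_ names are hypothetical.

```c
#include <assert.h>

enum ff_op { FF_OTHER, FF_SPILL, FF_FILL, FF_MOV };

struct ff_inst {
   enum ff_op op;
   int dst;    /* register written, or -1 */
   int src;    /* register read by SPILL/MOV, or -1 */
   int offset; /* scratch slot used by SPILL/FILL */
};

#define FF_MAX_SLOTS 16

/* Returns the number of fills converted to moves within one block. */
static inline unsigned
ff_opt_fill_after_fill(struct ff_inst *insts, unsigned count)
{
   int avail[FF_MAX_SLOTS]; /* register holding each slot's value, or -1 */
   unsigned progress = 0;

   for (int s = 0; s < FF_MAX_SLOTS; s++)
      avail[s] = -1;

   for (unsigned i = 0; i < count; i++) {
      struct ff_inst *inst = &insts[i];
      int filled_slot = -1;

      if (inst->op == FF_FILL) {
         assert(inst->offset >= 0 && inst->offset < FF_MAX_SLOTS);
         filled_slot = inst->offset;
         if (avail[filled_slot] >= 0 && avail[filled_slot] != inst->dst) {
            /* An earlier fill result is still intact: copy it. */
            inst->op = FF_MOV;
            inst->src = avail[filled_slot];
            progress++;
         }
      } else if (inst->op == FF_SPILL) {
         assert(inst->offset >= 0 && inst->offset < FF_MAX_SLOTS);
         /* An intervening spill changes what the slot contains. */
         avail[inst->offset] = -1;
      }

      /* Any write to a register destroys a fill result stored in it. */
      if (inst->dst >= 0) {
         for (int s = 0; s < FF_MAX_SLOTS; s++) {
            if (avail[s] == inst->dst)
               avail[s] = -1;
         }
      }

      /* The destination of a fill (converted or not) now holds the slot. */
      if (filled_slot >= 0)
         avail[filled_slot] = inst->dst;
   }

   return progress;
}
```

In this model a second fill of the same slot becomes a move, while a fill that
follows a spill to that slot, or follows an overwrite of the earlier fill's
destination, is left alone.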
v2: Use FIXED_GRF for register overlap tests. Since this is after
register allocation, the VGRF values will not tell the whole truth.
v3: Use brw_transform_inst. Suggested by Caio. Add
brw_scratch_inst::offset instead of storing it as a source. Suggested by
Lionel.
v4: An intervening spill to the same location also invalidates the
value. 🤦
v5: Don't eliminate a fill if its destination partially overlaps the
preceding fill destination. Fixes failures in cooperative matrix CTS.
shader-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17249903 -> 17249653 (<.01%)
instructions in affected programs: 35550 -> 35300 (-0.70%)
helped: 20 / HURT: 0
total cycles in shared programs: 893092398 -> 893101836 (<.01%)
cycles in affected programs: 2501720 -> 2511158 (0.38%)
helped: 6 / HURT: 14
total fills in shared programs: 1901 -> 1776 (-6.58%)
fills in affected programs: 1757 -> 1632 (-7.11%)
helped: 20 / HURT: 0
fossil-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 929949528 -> 926770338 (-0.34%)
Cycle count: 105126671329 -> 104851299099 (-0.26%); split: -0.28%, +0.02%
Fill count: 6520785 -> 5021518 (-22.99%)
Totals from 54281 (2.69% of 2018922) affected shaders:
Instrs: 239616289 -> 236437099 (-1.33%)
Cycle count: 22051883404 -> 21776511174 (-1.25%); split: -1.33%, +0.08%
Fill count: 6406295 -> 4907028 (-23.40%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
When the register allocator decides to spill a value, all writes to that
value are spilled and all reads are filled. In regions where there is
not high register pressure, a spill of a value may be followed by a fill
of that same value while the spilled register is still live. This
optimization pass finds these cases, and it converts the fill to a move
from the still-live register.
The restriction that the spill and the fill must have matching NoMask
really hampers this optimization. With the restriction removed, the pass
was more than 2x as helpful.
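A simplified model of this pass, not the brw implementation: remember which
register was spilled to each scratch slot, and while that register has not been
overwritten, rewrite a fill of the slot as a move. The sketch keeps the
matching force_writemask_all requirement described in v2. Whole registers and
small slot indices are assumed, and all sf_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum sf_op { SF_OTHER, SF_SPILL, SF_FILL, SF_MOV };

struct sf_inst {
   enum sf_op op;
   int dst;    /* register written, or -1 */
   int src;    /* register read by SPILL/MOV, or -1 */
   int offset; /* scratch slot used by SPILL/FILL */
   bool force_writemask_all;
};

#define SF_MAX_SLOTS 16

struct sf_live_spill {
   int reg; /* register still holding the spilled value, or -1 */
   bool force_writemask_all;
};

/* Returns the number of fills converted to moves within one block. */
static inline unsigned
sf_opt_fill_after_spill(struct sf_inst *insts, unsigned count)
{
   struct sf_live_spill live[SF_MAX_SLOTS];
   unsigned progress = 0;

   for (int s = 0; s < SF_MAX_SLOTS; s++)
      live[s].reg = -1;

   for (unsigned i = 0; i < count; i++) {
      struct sf_inst *inst = &insts[i];

      if (inst->op == SF_FILL) {
         assert(inst->offset >= 0 && inst->offset < SF_MAX_SLOTS);
         const struct sf_live_spill *l = &live[inst->offset];
         if (l->reg >= 0 &&
             l->force_writemask_all == inst->force_writemask_all) {
            /* The spilled register is still live: copy from it. */
            inst->op = SF_MOV;
            inst->src = l->reg;
            progress++;
         }
      }

      /* A write to a register ends the life of any spilled copy in it. */
      if (inst->dst >= 0) {
         for (int s = 0; s < SF_MAX_SLOTS; s++) {
            if (live[s].reg == inst->dst)
               live[s].reg = -1;
         }
      }

      /* A spill makes its source register the live copy of the slot. */
      if (inst->op == SF_SPILL) {
         assert(inst->offset >= 0 && inst->offset < SF_MAX_SLOTS);
         live[inst->offset].reg = inst->src;
         live[inst->offset].force_writemask_all = inst->force_writemask_all;
      }
   }

   return progress;
}
```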
v2: Require force_writemask_all to be the same for the spill and the fill.
v3: Use FIXED_GRF for register overlap tests. Since this is after
register allocation, the VGRF values will not tell the whole truth.
v4: Use brw_transform_inst. Suggested by Caio. This allows two of the
loops to be merged. Add brw_scratch_inst::offset instead of storing it
as a source. Suggested by Lionel.
v5: Add no-fill-opt debug option to disable optimizations. Suggested by
Lionel.
v6: Move a calculation outside a loop. Suggested by Lionel.
v7: Check that spill ranges overlap instead of just checking initial
offset. Zero shaders in fossil-db were affected, but some CTS with
spill_fs were fixed (e.g.,
dEQP-VK.subgroups.arithmetic.compute.subgroupmin_uint64_t_requiredsubgroupsize).
Suggested by Lionel.
v8: Add DEBUG_NO_FILL_OPT to debug_bits in
brw_get_compiler_config_value(). Noticed by Lionel.
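The v7 change can be illustrated with a generic interval test. This is only a
sketch of comparing scratch ranges rather than start offsets; the function name
and the byte-based parameters are hypothetical, not taken from the pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the half-open byte ranges [a_offset, a_offset + a_size) and
 * [b_offset, b_offset + b_size) share at least one byte. Comparing only
 * a_offset == b_offset would miss accesses that start at different
 * offsets but still touch the same scratch memory. */
static inline bool
example_scratch_ranges_overlap(uint64_t a_offset, uint64_t a_size,
                               uint64_t b_offset, uint64_t b_size)
{
   assert(a_size > 0 && b_size > 0);
   return a_offset < b_offset + b_size && b_offset < a_offset + a_size;
}
```

For example, a 64-byte access at offset 0 and a 64-byte access at offset 32 have
different start offsets but do overlap, while two adjacent 32-byte accesses at
offsets 0 and 32 do not.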
shader-db:
Lunar Lake
total instructions in shared programs: 17249907 -> 17249903 (<.01%)
instructions in affected programs: 10684 -> 10680 (-0.04%)
helped: 2 / HURT: 0
total cycles in shared programs: 893092630 -> 893092398 (<.01%)
cycles in affected programs: 237320 -> 237088 (-0.10%)
helped: 2 / HURT: 0
total fills in shared programs: 1903 -> 1901 (-0.11%)
fills in affected programs: 110 -> 108 (-1.82%)
helped: 2 / HURT: 0
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19968898 -> 19968778 (<.01%)
instructions in affected programs: 33020 -> 32900 (-0.36%)
helped: 10 / HURT: 0
total cycles in shared programs: 885157211 -> 884925015 (-0.03%)
cycles in affected programs: 39944544 -> 39712348 (-0.58%)
helped: 8 / HURT: 2
total fills in shared programs: 4454 -> 4394 (-1.35%)
fills in affected programs: 2678 -> 2618 (-2.24%)
helped: 10 / HURT: 0
fossil-db:
Lunar Lake
Totals:
Instrs: 930445228 -> 929949528 (-0.05%)
Cycle count: 105195579417 -> 105126671329 (-0.07%); split: -0.07%, +0.00%
Spill count: 3495279 -> 3494400 (-0.03%)
Fill count: 6767063 -> 6520785 (-3.64%)
Totals from 43844 (2.17% of 2018922) affected shaders:
Instrs: 212614840 -> 212119140 (-0.23%)
Cycle count: 19151130510 -> 19082222422 (-0.36%); split: -0.39%, +0.03%
Spill count: 2831100 -> 2830221 (-0.03%)
Fill count: 6128316 -> 5882038 (-4.02%)
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 1001375893 -> 1001113407 (-0.03%)
Cycle count: 92746180943 -> 92679877883 (-0.07%); split: -0.08%, +0.01%
Spill count: 3729157 -> 3728585 (-0.02%)
Fill count: 6697296 -> 6566874 (-1.95%)
Totals from 35062 (1.53% of 2284674) affected shaders:
Instrs: 179819265 -> 179556779 (-0.15%)
Cycle count: 18111194752 -> 18044891692 (-0.37%); split: -0.41%, +0.04%
Spill count: 2453752 -> 2453180 (-0.02%)
Fill count: 5279259 -> 5148837 (-2.47%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>