Move options were being built by bitwise OR-ing values from the wrong
enum, causing undefined behaviour when the number of intrinsics changed.
Replace them with the previously working values from the correct
nir_move_options enum. (These need further refinement after extensive
testing.)
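To illustrate the failure mode, here is a minimal sketch using two hypothetical enums (neither is the real NIR definition; all names are invented for the example). One enum holds sequential IDs, the other holds single-bit flags. OR-ing sequential IDs produces a mask whose meaning depends on the numbering, so it shifts whenever an entry is added:

```c
#include <stdint.h>

/* Hypothetical stand-in for a sequentially numbered enum (e.g. intrinsic
 * IDs). Inserting a new entry renumbers everything after it. */
enum example_intrinsic {
   EXAMPLE_INTRINSIC_LOAD_UBO,   /* 0 */
   EXAMPLE_INTRINSIC_LOAD_INPUT, /* 1 */
   EXAMPLE_INTRINSIC_LOAD_SSBO,  /* 2 */
};

/* Hypothetical stand-in for a flags enum: exactly one bit per option. */
enum example_move_options {
   EXAMPLE_MOVE_LOAD_UBO = 1u << 0,
   EXAMPLE_MOVE_LOAD_INPUT = 1u << 1,
   EXAMPLE_MOVE_LOAD_SSBO = 1u << 2,
};

/* Wrong: sequential IDs are not bit flags, so the result (0 | 2) happens to
 * select only the "load input" flag bit. */
uint32_t example_wrong_options(void)
{
   return EXAMPLE_INTRINSIC_LOAD_UBO | EXAMPLE_INTRINSIC_LOAD_SSBO;
}

/* Right: flags from the options enum combine as intended. */
uint32_t example_right_options(void)
{
   return EXAMPLE_MOVE_LOAD_UBO | EXAMPLE_MOVE_LOAD_SSBO;
}
```

In this sketch the wrong variant silently selects an unrelated option, and the selection would change again if the ID enum were renumbered.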
Fixes: f1b24267d2 ("pco: rework nir processing and passes")
Signed-off-by: Radu Costas <radu.costas@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40568>
The old calculation depended on the sample count and gave subpar
results for 8x MSAA with standard sample locations. The new calculation
is based on the Intel pass, with the constants adjusted so that the
sample count is always proportional to alpha for 2x MSAA and 4x MSAA,
and with the sample mask rotated based on the pixel.
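The general idea can be sketched as follows. This is only an illustration under stated assumptions: the function name, the rounding rule and the per-pixel rotation amount are all hypothetical and are not the constants used by the actual pass.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derive a coverage mask whose number of set bits is
 * proportional to alpha, then rotate it by a per-pixel amount so that
 * neighbouring pixels do not all cover the same samples. */
uint32_t example_alpha_to_coverage(float alpha, unsigned samples,
                                   unsigned x, unsigned y)
{
   assert(samples == 1 || samples == 2 || samples == 4 || samples == 8);

   if (alpha < 0.0f)
      alpha = 0.0f;
   if (alpha > 1.0f)
      alpha = 1.0f;

   /* Number of covered samples, rounded to nearest (assumed rule). */
   unsigned count = (unsigned)(alpha * (float)samples + 0.5f);

   uint32_t all = (1u << samples) - 1u;
   uint32_t mask = (1u << count) - 1u;

   /* Rotate left within the low `samples` bits (assumed rotation amount). */
   unsigned rot = (x + 2u * y) % samples;
   return ((mask << rot) | (mask >> (samples - rot))) & all;
}
```

With 4 samples and alpha = 0.5, every pixel gets two covered samples; only which two varies with the pixel position.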
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39335>
Use the valid/input coverage masks for tile buffer store coverage masks
when running single/multi-sampled fragment shaders respectively.
Fixes: 297a0c269a ("pvr, pco: tile buffer support")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reported-by: Nick Hamilton <nick.hamilton@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40456>
Replace loop with macros
Rewrite channel op to multi channel select to avoid extra swizzle
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Signed-off-by: Radu Costas <radu.costas@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40320>
Add checks for the integer coordinates and array indexing HW features.
These features require HW support and the PCO_DEBUG env var to contain
the adv_smp entry.
Integer coordinates are supported for images and textures without an LOD
setting.
Array indexing is not supported and will trigger an abort.
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Signed-off-by: Radu Costas <radu.costas@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40139>
Format extraction moved to separate function
Removed some magic numbers
Signed-off-by: Radu Costas <radu.costas@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40067>
Add support for using vertex input registers as additional general
purpose registers; previously only temporary registers could be used
this way. This is not available for fragment shaders.
Vertex input registers are similar to temporary registers. The only
difference is that vertex input registers can contain pre-initialised
data when the shader starts.
By default, the number of vertex input registers used for register
allocation is the number of vertex input registers used for their
pre-initialised data rounded up to the nearest multiple of 4, as vertex
input registers are allocated in blocks of 4.
If PCO_DEBUG=alloc_extra_vtxins is used, a minimum of 12 vertex input
registers are available for register allocation.
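The counting rule described above can be sketched as below. The function and macro names are hypothetical, and treating the debug option as "at least 12" is an assumption based on the wording of this message, not a copy of the driver code.

```c
#include <assert.h>
#include <stdbool.h>

#define EXAMPLE_VTXIN_BLOCK_SIZE 4u /* vertex inputs are allocated in blocks of 4 */
#define EXAMPLE_VTXIN_EXTRA_MIN 12u /* minimum with alloc_extra_vtxins set */

/* Hypothetical helper: number of vertex input registers handed to the
 * register allocator, given how many hold pre-initialised data. */
unsigned example_vtxin_alloc_count(unsigned preinit_vtxins,
                                   bool alloc_extra_vtxins)
{
   /* Round up to the next multiple of the block size. */
   unsigned count = (preinit_vtxins + EXAMPLE_VTXIN_BLOCK_SIZE - 1u) /
                    EXAMPLE_VTXIN_BLOCK_SIZE * EXAMPLE_VTXIN_BLOCK_SIZE;

   if (alloc_extra_vtxins && count < EXAMPLE_VTXIN_EXTRA_MIN)
      count = EXAMPLE_VTXIN_EXTRA_MIN;

   assert(count % EXAMPLE_VTXIN_BLOCK_SIZE == 0);
   return count;
}
```

For example, a shader with 5 pre-initialised vertex inputs would get 8 registers by default and 12 with the debug option in this sketch.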
Signed-off-by: Duncan Brawley <duncan.brawley@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39886>
When sampling a subpass input attachment while multiview is in use,
the view index is required.
This becomes an extra output from the vertex shader, which is then
iterated into the fragment shader input, where it is used as the
array layer index when sampling the subpass input attachment.
However, this extra output did not have its interpolation mode
configured correctly, so incorrect instructions were added,
causing the view index to always be zero and the subpass input
attachment to be sampled incorrectly.
The fix is to make sure the view index interpolation mode is set to flat.
Fix:
dEQP-VK.multiview.input_attachments.no_queries.1_2_4_8_16_32
dEQP-VK.multiview.input_attachments.no_queries.1_2_4_8
dEQP-VK.multiview.input_attachments.no_queries.15_15_15_15
dEQP-VK.multiview.input_attachments.no_queries.15
dEQP-VK.multiview.input_attachments.no_queries.5_10_5_10
dEQP-VK.multiview.input_attachments.no_queries.8_1_1_8
dEQP-VK.multiview.input_attachments.no_queries.8
dEQP-VK.multiview.input_attachments.no_queries.max_multi_view_view_count
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.1_2_4_8_16_32
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.1_2_4_8
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.15_15_15_15
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.15
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.5_10_5_10
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.8_1_1_8
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.8
dEQP-VK.multiview.renderpass2.input_attachments.no_queries.max_multi_view_view_count
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39715>
The comment claims this was to unroll loops, but nir_opt_loop doesn't do that.
Whatever issue the AGX code was originally working around, it doesn't apply now
(I confirmed we produce similar code with or without the pass). In the meantime,
Panfrost and PowerVR cargo-culted the same broken logic. Drop it all.
Closes: #14732
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39588>
Within the driver, buffers are treated as 2D, since sampling them as 1D
would run into HW restrictions on max size.
The compiler does the same; however, for atomic image ops the address
is calculated manually, and doing this via the 2D path leads to
incorrect offsets.
The fix is to treat buffers as 1D for atomic ops, which yields
the correct offsets for the operations.
Fix deqp:
dEQP-VK.image.atomic_operations.add.buffer.*
dEQP-VK.image.atomic_operations.and.buffer.*
dEQP-VK.image.atomic_operations.compare_exchange.buffer.*
dEQP-VK.image.atomic_operations.dec.buffer.*
dEQP-VK.image.atomic_operations.exchange.buffer.*
dEQP-VK.image.atomic_operations.inc.buffer.*
dEQP-VK.image.atomic_operations.max.buffer.*
dEQP-VK.image.atomic_operations.min.buffer.*
dEQP-VK.image.atomic_operations.or.buffer.*
dEQP-VK.image.atomic_operations.sub.buffer.*
dEQP-VK.image.atomic_operations.xor.buffer.*
Fixes: 6dc5e1e109 ("pco: fully support Vulkan 1.2 image atomics")
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39521>
Mesa now has a statistics framework. This adds support for emitting
additional statistics about PowerVR shaders for the Rogue architecture.
Add support for emitting the following statistics: Code size, scratch
size, spill count, temp count, loop count, number of inst groups, number
of main inst groups, number of bitwise inst groups and number of control
inst groups.
Add support for a new PCO_DEBUG_PRINT option "stats" to emit shader stats.
Signed-off-by: Duncan Brawley <duncan.brawley@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39523>
The skip check should only be checking the format rather than the entire
packed word.
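As a sketch of the difference, assume a hypothetical packed word with the format in its low 8 bits and unrelated state above it (the layout, mask and names are invented for the example). Comparing the whole word only matches when every other field is zero, whereas masking first compares just the format:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bits [7:0] hold the format, upper bits hold other
 * state. */
#define EXAMPLE_FORMAT_MASK 0xffu

/* Wrong: compares the entire packed word against a format value. */
bool example_format_matches_whole_word(uint32_t packed, uint32_t format)
{
   return packed == format;
}

/* Right: extract the format field before comparing. */
bool example_format_matches(uint32_t packed, uint32_t format)
{
   assert((format & ~EXAMPLE_FORMAT_MASK) == 0);
   return (packed & EXAMPLE_FORMAT_MASK) == format;
}
```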
Fixes: 52ddc40a75 ("pco: restrict shadow sampler comparator clamping to unorm formats")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39428>
Only clamp shadow sampler comparators for "unsigned normalized
fixed-point format[s]" as per the Vulkan spec.
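A minimal sketch of the intended behaviour, with a hypothetical helper name and a boolean standing in for the real format query:

```c
#include <stdbool.h>

/* Hypothetical helper: clamp the shadow comparator (reference value) to
 * [0, 1] only when the sampled format is unsigned normalized fixed-point;
 * other formats compare against the unmodified value. */
float example_clamp_comparator(float ref, bool format_is_unorm)
{
   if (!format_is_unorm)
      return ref;

   if (ref < 0.0f)
      return 0.0f;
   if (ref > 1.0f)
      return 1.0f;
   return ref;
}
```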
Fixes: 69a56d33de ("pco: Fix for shadow sampler comparison not clamping the compare value")
Reported-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39389>
PCO has global_atomic{,_swap}_pco intrinsics that are different from
the generic 2x32 ones, but the 2x32 ones are emitted by
nir_lower_explicit_io.
Add code to lower the generic 2x32 intrinsics to the PCO variants.
Passes Vulkan CTS dEQP-VK.glsl.atomic_operations.* and fixes a crash in
dEQP-VK.glsl.atomic_operations.*_reference.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39054>
When doing sample comparisons on shadow images, the compare value
should be clamped to [0,1].
Fix:
dEQP-VK.glsl.texture_functions.texture.samplercubearrayshadow_fragment
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39327>
When calculating the relative offset for a branch, the pco_first_igrp
function is used to find the first instruction of a block.
However, if the block is empty, the function does not return NULL as its
description implies, but returns a pointer to the list head, which is not
a valid node. Using this leads to a garbage relative offset being
calculated, which leads to unexpected behaviour.
The fix is to add a check for the list being empty and return NULL (the
same issue also exists in pco_last_igrp). This leads to the calling
function, pco_cf_node_offset, searching for the next non-empty block,
which is the expected behaviour.
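The underlying pitfall is common to circular intrusive lists in general. The sketch below uses a small hypothetical list type (not Mesa's list implementation or the real PCO structures) to show why an emptiness check is needed before converting the first link into its containing node:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list; an empty list points at itself. */
struct example_list {
   struct example_list *prev, *next;
};

/* Hypothetical node type embedding a list link. */
struct example_igrp {
   int id;
   struct example_list link;
};

#define example_container_of(ptr, type, member) \
   ((type *)((char *)(ptr) - offsetof(type, member)))

void example_list_init(struct example_list *head)
{
   head->prev = head->next = head;
}

void example_list_addtail(struct example_list *item, struct example_list *head)
{
   item->next = head;
   item->prev = head->prev;
   head->prev->next = item;
   head->prev = item;
}

bool example_list_is_empty(const struct example_list *head)
{
   return head->next == head;
}

/* Unchecked: on an empty list head->next is the head itself, so this
 * returns a pointer derived from the head rather than from a real node. */
struct example_igrp *example_first_igrp_unchecked(struct example_list *head)
{
   return example_container_of(head->next, struct example_igrp, link);
}

/* Checked: report an empty list explicitly so callers can skip the block. */
struct example_igrp *example_first_igrp(struct example_list *head)
{
   if (example_list_is_empty(head))
      return NULL;

   return example_container_of(head->next, struct example_igrp, link);
}
```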
Fix deqp:
dEQP-VK.graphicsfuzz.cov-two-nested-loops-switch-case-matrix-array-increment
dEQP-VK.graphicsfuzz.stable-binarysearch-tree-false-if-discard-loop
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39287>
The sample mask test instructions were only being added if the sample
mask was an input or no other discard test was being done.
In the failing test cases, instructions to handle alpha to one and/or
alpha to coverage were being added and the program was marked as having
a discard test. As a result, the instructions for the sample mask test
were not added.
The fix is to specifically check whether the sample mask instructions
have been added, instead of checking for any discard test.
Fix deqp:
dEQP-VK.pipeline.monolithic.extended_dynamic_state.after_pipelines.multi_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.after_pipelines.single_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.before_draw.large_static_rasterization_samples_off
dEQP-VK.pipeline.monolithic.extended_dynamic_state.before_draw.multi_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.before_draw.single_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.before_good_static.multi_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.before_good_static.single_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.between_pipelines.multi_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.between_pipelines.single_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.cmd_buffer_start.large_static_rasterization_samples_off
dEQP-VK.pipeline.monolithic.extended_dynamic_state.cmd_buffer_start.multi_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.cmd_buffer_start.single_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.three_draws_dynamic.multi_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.three_draws_dynamic.single_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.two_draws_dynamic.multi_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.two_draws_dynamic.single_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.two_draws_static.multi_sample_sample_mask_disable
dEQP-VK.pipeline.monolithic.extended_dynamic_state.two_draws_static.single_sample_sample_mask_disable
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Tested-by: Icenowy Zheng <uwu@icenowy.me>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39040>
Rather than adding another boolean to optionally lower PLS vars, pass
the types we want to lower through a nir_variable_mode bitmask.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37110>
When we do multiarch, we want to be able to refer to the headers
separately from the sources here, so let's split this dependency in two.
Reviewed-by: Ashish Chauhan <ashish.chauhan@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38423>
We add a bunch of new helpers to avoid the need to touch ->parent_instr,
including the full set of:
* nir_def_is_*
* nir_def_as_*_or_null
* nir_def_as_* [assumes the right instr type]
* nir_src_is_*
* nir_src_as_*
* nir_scalar_is_*
* nir_scalar_as_*
Plus nir_def_instr() where there's no more suitable helper.
Also an existing helper is renamed to unify all the names, while we're
churning the tree:
* nir_src_as_alu_instr -> nir_src_as_alu
...and then we port the tree to use the helpers as much as possible, using
nir_def_instr() where that does not work.
Acked-by: Marek Olšák <maraeo@gmail.com>
---
To eliminate nir_def::parent_instr we need to churn the tree anyway, so I'm
taking this opportunity to clean up a lot of NIR patterns.
Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>
On Mali, we need to not only clamp but also convert to float16 on Valhall+.
We could have a separate pass for this but it fits in nicely with the
rest of nir_lower_point_size() so we might as well put it there.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38379>
Most of the time, we can infer the type to append in
util_dynarray_append using __typeof__, which is standardized in C23 and
supported in Jesse's MSMSVCV. This patch drops the type argument most of
the time, making util_dynarray a little more ergonomic to use.
This is done in four steps.
First, rename util_dynarray_append -> util_dynarray_append_typed
    bash -c "find . -type f -exec sed -i -e 's/util_dynarray_append(/util_dynarray_append_typed(/g' \{} \;"
Then, add a new append that infers the type. This is much more ergonomic
for what you want most of the time.
Next, use type-inferred append as much as possible, via Coccinelle
patch (plus manual fixup):
    @@
    expression dynarray, element;
    type type;
    @@
    -util_dynarray_append_typed(dynarray, type, element);
    +util_dynarray_append(dynarray, element);
Finally, hand fixup cases that Coccinelle missed or incorrectly
translated, of which there were several because we can't use the
untyped append with a literal (since the sizeof won't do what you want).
All four steps are squashed to produce a single patch changing every
util_dynarray_append call site in tree to either drop a type parameter
(if possible) or insert a _typed suffix (if we can't infer). As such,
the final patch is best reviewed by hand even though it was
tool-assisted.
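The typed and type-inferred variants can be sketched with a tiny stand-alone dynamic array. Everything prefixed example_ is hypothetical and is not the util_dynarray implementation; __typeof__ is used as a GCC/Clang extension here. The sketch also shows why literals still need the typed form: the inferred type of 1 is int, not the element type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical byte-based growable array. */
struct example_dynarray {
   void *data;
   size_t size;     /* bytes in use */
   size_t capacity; /* bytes allocated */
};

/* Reserve `bytes` at the end and return a pointer to them (NULL on OOM). */
void *example_dynarray_grow(struct example_dynarray *arr, size_t bytes)
{
   if (arr->size + bytes > arr->capacity) {
      size_t cap = arr->capacity ? arr->capacity * 2 : 64;
      while (cap < arr->size + bytes)
         cap *= 2;

      void *data = realloc(arr->data, cap);
      if (!data)
         return NULL;

      arr->data = data;
      arr->capacity = cap;
   }

   void *out = (char *)arr->data + arr->size;
   arr->size += bytes;
   return out;
}

/* Typed append: the caller names the element type explicitly. */
#define example_dynarray_append_typed(arr, type, value)          \
   do {                                                          \
      type *dst_ = example_dynarray_grow((arr), sizeof(type));   \
      assert(dst_);                                              \
      *dst_ = (value);                                           \
   } while (0)

/* Inferred append: the element type is taken from the value expression. */
#define example_dynarray_append(arr, value) \
   example_dynarray_append_typed((arr), __typeof__(value), (value))
```

Appending a uint16_t variable with the inferred form grows the array by two bytes, while appending the literal 1 grows it by sizeof(int), which is why such call sites keep the _typed suffix.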
No Long Linguine Meals were involved in the making of this patch.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38038>
Introduces an opt pass that attempts to optimize
load_barycentric_at_{sample,offset} with simpler load_barycentric_*
equivalents where possible, and optionally lowers
load_barycentric_at_sample to load_barycentric_at_offset with a position
derived from the sample ID instead.
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37658>
In release builds, the assertion checking whether spilling failed isn't
evaluated, which can result in shader compilation continuing despite the
failure.
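A minimal sketch of the difference, using hypothetical function names and a stand-in spill routine: with NDEBUG defined, assert() expands to nothing, so only an explicit check stops compilation on failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spilling step that can fail. */
bool example_try_spill(int pressure, int limit)
{
   return pressure <= limit;
}

/* Fragile: in release builds (NDEBUG) the assert vanishes and the function
 * reports success even if spilling failed. */
bool example_compile_assert_only(int pressure, int limit)
{
   bool spilled = example_try_spill(pressure, limit);
   assert(spilled);
   (void)spilled; /* avoid an unused-variable warning under NDEBUG */
   return true;
}

/* Robust: the failure is checked and propagated in every build type. */
bool example_compile_checked(int pressure, int limit)
{
   if (!example_try_spill(pressure, limit))
      return false;

   return true;
}
```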
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37872>
This is more in line with similar opcodes like umul_32x16.
Also change its const expr: the masking based on bit size was
unnecessary as it is only defined for 32 bits. Use simple casts instead.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37863>
Ensures early algebraic passes aren't called again following late
algebraic passes, so that the latter's opts aren't undone (e.g.
unfusing ffmas).
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37724>
The rounding behaviour on [iu]2f32 ops needs to be explicitly set in
order to match the implicit behaviour described in the
KHR_shader_float_controls properties.
Fixes: e306abc6e6 ("pvr: implement KHR_shader_float_controls")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37724>
This is so that passes and backends can tell if a coherent load/store is
atomic or not, instead of having to assume it could be either.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
Set the has_f2i32_rtne shader compiler option to indicate hardware
support for it. This enables NIR's late algebraic optimization pass to
generate more efficient code for float-to-int conversions.
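For reference, a float-to-int conversion with round-to-nearest-even behaves as in the sketch below. This assumes the semantics suggested by the option name; the function is a hypothetical software model, not NIR or driver code, and it requires an input that is finite and within the int32_t range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of a float -> int32 conversion that rounds
 * to nearest, with ties going to the even integer. */
int32_t example_f2i32_rtne(float x)
{
   assert(x >= -2147483648.0f && x < 2147483648.0f);

   int32_t truncated = (int32_t)x;       /* rounds toward zero */
   float frac = x - (float)truncated;    /* in (-1, 1) */

   if (frac > 0.5f || (frac == 0.5f && (truncated & 1)))
      return truncated + 1;
   if (frac < -0.5f || (frac == -0.5f && (truncated & 1)))
      return truncated - 1;

   return truncated;
}
```

Ties such as 2.5 and 3.5 round to 2 and 4 respectively, unlike the truncation performed by a plain C cast.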
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37680>