One of the two SHUFFLE implementations wasn't taking into account the
destination stride at all, and the other (more commonly used) one was
taking it into account incorrectly since brw_reg::hstride represents
the stride logarithmically, so we need to use a left-shift operator
instead of product. Found by inspection.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
This fixes a bug in the handcrafted SIMD lowering done by the SHUFFLE
code generation, which wasn't taking into account the source and
destination region strides while deciding whether it needs to split an
instruction.
v2: Use new element_sz() helper instead of left shift. (Lionel)
Fixes: 90c9f29518 ("i965/fs: Add support for nir_intrinsic_shuffle")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
This adds some generic infrastructure that allows splitting any
instruction into a number of instructions of a smaller legal execution
type. This is meant to replace several instances of handcrafted 64bit
type lowering done manually in the code generator, which is rather
error-prone, prevents scheduling of the lowered instructions, and
makes them invisible to the SWSB pass on Gfx12+ platforms, which will
become especially problematic on Gfx12.5+ since the EUs introduce
multiple asynchronous execution pipelines which the SWSB pass needs to
be able to synchronize to one another, so it's critical for the real
execution type of the instruction to be visible to the SWSB pass.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
Right now the execution type lowering functionality of this pass
assumes that an integer type of the original bit size is always
acceptable, however we'll want more complex behavior than that in
order to leverage this pass to automate the lowering of unsupported
64-bit operations into multiple 32-bit operations.
In order to do that calculate the closest legal execution type from a
new helper function, and take advantage of that function from the
has_invalid_exec_type() helper, along the lines of other
lower_regioning() helpers structured as a pair of has_invalid_foo() +
required_foo() functions.
This shouldn't have any functional changes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
Previously the software scoreboard structure would drop previous
dependencies for a given register and replace them with the most
recent one for the same register when a new instruction (or set of
instructions) is processed. This worked correctly on the Gfx12LP
platforms this code was originally designed for, because a repeated
dependency on the same register would either require the second
instruction to synchronize against the first (so the first dependency
could be disregarded from that point on) *or* require the dependency
to be RaR and in-order, which allows the synchronization to be
optimized out (the first dependency could still be disregarded as
well, since the pipeline is in-order). However the latter assumption
will break on upcoming Gfx12HP platforms, because they have multiple
asynchronous FPU pipelines, so whenever we hit a RaR dependency we
need to propagate forward both dependencies, since the order in which
both reads will complete is not guaranteed by the hardware in cases
where they occur from different asynchronous pipelines.
Note that this dependency propagation change requires us to change the
definition of dependency::done as well, since that constant is defined
to discard any previous dependency information when used as argument
for shadow().
This has been reported to fix the following conformance failures on DG2:
KHR-GL46.shaders.uniform_block.random.all_per_block_buffers.19
dEQP-GLES3.functional.shaders.derivate.fwidth.*
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5670
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
As explained at the header of the lowering:
"Once this pass is done, the color write will either have one
component (for single sample) with packed argb8888, or 4 components
with the per-sample argb8888 result."
So in several cases the lowering was updating the number of
components, so we need to update the writemask too.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14708>
We reserved space for chaining by subtracting 4 words from max_dw, but
then the new alignment code in radv_amdgpu_cs_finalize ended up running
all over that. That resulted in going over buffer size when chaining.
When lucky you'd get a crash, and when unlucky other stuff might happen.
This always adds the 4 words at the end, but initializes with NOP by
default. That way we still adhere to the alignment rules.
Fixes: 1f36f6b83f ("radv/winsys: use same IBs padding as the kernel")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14644>
We can't report conformance for an older major API version and this is
required to pass dEQP-VK.api.driver_properties.conformance_version.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14707>
It seems the vulkan common runtime code exposes VK_EXT_tooling but
doesn't (yet) have a fallback if the backend doesn't enable this
extension. Implement it as a no-op for a temporary workaround.
This fixes crashes with dEQP-VK.api.tooling_info.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14707>
To fix a mismatch with
dEQP-VK.api.info.get_physical_device_properties2.features.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14707>
The spec states that descriptor set layouts can be destroyed almost
at any time:
"VkDescriptorSetLayout objects may be accessed by commands that operate
on descriptor sets allocated using that layout, and those descriptor
sets must not be updated with vkUpdateDescriptorSets after the descriptor
set layout has been destroyed. Otherwise, a VkDescriptorSetLayout object
passed as a parameter to create another object is not further accessed
by that object after the duration of the command it is passed into."
Copied mostly from ANV.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5893
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14622>
We tightened the requirements for multisampling on Gfx7 but didn't
format that at the Vulkan level.
This will break more conformance tests on Gfx7, but we weren't
conformant anyway.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 531b1b7511 ("intel/isl: Strengthen MCS SINT format restriction")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14679>
Allow to vectorize operations from a smaller bit-size into
scalar operations of a larger bit-size. This allows us to
turn 2x8-bit into a equivalent scalar 16-bit load/store.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
This generalizes the support we added for 16-bit to also handle
8-bit loads via ldunifa. The story is the same: we align the address
to 32-bit downwards and we skip any bytes that are not of interest.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Just like with 16-bit, this mode only supports scalar access, but
we are already lowering all non 32-bit accesses to scalar.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Since ldunif is a 32-bit instruction we need to demote these to
UBO loads, like we do for indirect indexing, with the exception
of scalar 16bit uniforms with an offset that is 32-bit aligned.
For the exception where we can use lfdunif we read a 32-bit slot
from memory where the uniform data is in the lower 16-bit and we
will read garbage in the upper 16-bit which we won't use anyway.
It should be noted that by using ldunif, we are consuming
32-bit from the uniform stream, but this is fine because
if there is valid uniform data in the upper 16-bit (i.e.
we had a ivec2 uniform aligned to a 32-bit address), since
we scalarize 16-bit loads, we would see another load uniform
with an unaligned offset for the second component, which we
will demote to UBO.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
These are required by VK_KHR_16bit_storage. Our hardware, however,
doesn't provide any mechanism to decide on the rounding mode of
the conversion and it seems to be using RTE, so we implement
RTZ in software.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
If we know we have a load with a constant offset, then even if it
is not aligned to 32-bit we can still produce an aligned offset
and then skip over the bytes we don't need.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Even though ldunifa is strictly 32-bit we may be able to use it
to load 16-bit values that sit at 32-bit aligned addresses.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
The vectorization pass can inject 32_2x16 (un)packing opcodes
upon successful vectorization of 16-bit operations into 32-bit
counterparts, so make sure we lower these to something our
backend can handle.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
This allows us to implement 16-bit access on uniform and
storage buffers.
Notice that V3D hardware can only do general access on scalar
16-bit elements, which we currently enforce by running a lowering
pass during shader compile.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
V3D hardware doesn't support vector access for general TMU load/store
operations like the ones we use for UBO and SSBO, so we need to split
these to scalar operations.
It should be noted that we also have a vectorization pass (which runs
later, during optimization), that may reconstruct some of these into
32-bit operations when possible (i.e. when the resulting operation
is 32-bit aligned).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Program resource queries provide equivalent code, gl_resource_name
introduced by commit dea558cbd2 takes care of ARB_gl_spirv special
case where name information is not available.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14636>
This matches how _mesa_get_program_resourceiv was done and this
makes it possible to skip some validation and shader program lookup
when calling it from glGetProgramiv.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14636>
I noticed the inefficiency in NIR-to-TGSI output while trying to debug a
failure handling some arrays in r600. While this makes reading CTS
shaders easier, the effect in the real world is pretty limited. From
softpipe shader-db:
total instructions in shared programs: 2929840 -> 2929836 (<.01%)
instructions in affected programs: 118 -> 114 (-3.39%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14321>
This allows the exported fds to be mapped for writing. My use case is
for virtio-gpu blob resources where the fds are mapped rw and mappings
are added to the guests using KVM_SET_USER_MEMORY_REGION.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14699>
With upstream mesa PIPE_CAP_IMAGE_STORE_FORMATTED needs to be set to enable
ARB_shader_image_load_store extension. This will reenable GL43 support for svga GL43 capable
device
Fixes: 3b81d2d30d ('mesa/st: do not expose ARB_shader_image_load_store if not fully implemented')
Tested with glretrace
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14688>
I missed updating this code to check res->aux.clear_color_unknown when
I added it a while back. While we're here, also refactor this code into
a helper function - I'll want to use it in another place shortly.
Fixes: e83da2d8e3 ("iris: Don't try to CPU read imported clear color BOs")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>