We'll use this to apply ray tracing operations in our trivial return
shader based on the stage we're in.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
In the future we'll want to reuse this intrinsic to deal with ray
queries. Ray queries will use a different global pointer and
programmatically change the control/level arguments of the trace send
instruction.
v2: Comment on barrier after sync trace instruction (Caio)
Generalize lsc helper (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
We're replacing a generic instruction by an intel specific one, we
need to remove the previous instruction.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c5a42e4010 ("intel/fs: fix shader call lowering pass")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
This index will be used for accessing ray query data in memory.
v2: Drop a MOV (Caio)
v3: Rework back code emission (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
We'll want different types of IDs based on topology. Let's make this
more flexible and also move the bit shifting code a layer above where
it's easier to do bitshifting operations, especially if you need to
stash things into temporary registers.
v2: Keep previous comment.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
This will allow to reuse the same intrinsic for various topology based
ID.
v2: fix intrinsic comment (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
This option only applies to relative shuffles (up/down/xor), and in a
moment we're going to add an option to lower normal shuffles, so rename
it.
While we're here, rename lower_shuffle() to lower_to_shuffle() for
similar reasons.
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14412>
Replace load_mesh_global_arg_addr_intel with a more general intrinsic
load_mesh_inline_data_intel, since inline data now hold both
a pointer descriptor information and the first few push constants.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14788>
On platforms where we don't support 64 bit instructions we shouldn't
pass such instructions for the code generator to lower into supported
instructions, because this makes their execution pipeline
unpredictable to the scoreboard lowering pass on XeHP+ platforms.
We really should be reducing all these 64 bit instructions before code
generation, so here we add an assert to help us catch and fix these
cases more easily.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
[ Francisco Jerez: Also allow has_integer_dword_mul. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
This fixes a bug in the CLUSTER_BROADCAST code generation that causes
the original IR region to be ignored, this will be a problem when we
start lowering 64-bit CLUSTER_BROADCAST instructions at the IR level,
since it will lead to instructions with non-trivial regioning.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
One of the two SHUFFLE implementations wasn't taking into account the
destination stride at all, and the other (more commonly used) one was
taking it into account incorrectly since brw_reg::hstride represents
the stride logarithmically, so we need to use a left-shift operator
instead of product. Found by inspection.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
This fixes a bug in the handcrafted SIMD lowering done by the SHUFFLE
code generation, which wasn't taking into account the source and
destination region strides while deciding whether it needs to split an
instruction.
v2: Use new element_sz() helper instead of left shift. (Lionel)
Fixes: 90c9f29518 ("i965/fs: Add support for nir_intrinsic_shuffle")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
This adds some generic infrastructure that allows splitting any
instruction into a number of instructions of a smaller legal execution
type. This is meant to replace several instances of handcrafted 64bit
type lowering done manually in the code generator, which is rather
error-prone, prevents scheduling of the lowered instructions, and
makes them invisible to the SWSB pass on Gfx12+ platforms, which will
become especially problematic on Gfx12.5+ since the EUs introduce
multiple asynchronous execution pipelines which the SWSB pass needs to
be able to synchronize to one another, so it's critical for the real
execution type of the instruction to be visible to the SWSB pass.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
Right now the execution type lowering functionality of this pass
assumes that an integer type of the original bit size is always
acceptable, however we'll want more complex behavior than that in
order to leverage this pass to automate the lowering of unsupported
64-bit operations into multiple 32-bit operations.
In order to do that calculate the closest legal execution type from a
new helper function, and take advantage of that function from the
has_invalid_exec_type() helper, along the lines of other
lower_regioning() helpers structured as a pair of has_invalid_foo() +
required_foo() functions.
This shouldn't have any functional changes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
Previously the software scoreboard structure would drop previous
dependencies for a given register and replace them with the most
recent one for the same register when a new instruction (or set of
instructions) is processed. This worked correctly on the Gfx12LP
platforms this code was originally designed for, because a repeated
dependency on the same register would either require the second
instruction to synchronize against the first (so the first dependency
could be disregarded from that point on) *or* require the dependency
to be RaR and in-order, which allows the synchronization to be
optimized out (the first dependency could still be disregarded as
well, since the pipeline is in-order). However the latter assumption
will break on upcoming Gfx12HP platforms, because they have multiple
asynchronous FPU pipelines, so whenever we hit a RaR dependency we
need to propagate forward both dependencies, since the order in which
both reads will complete is not guaranteed by the hardware in cases
where they occur from different asynchronous pipelines.
Note that this dependency propagation change requires us to change the
definition of dependency::done as well, since that constant is defined
to discard any previous dependency information when used as argument
for shadow().
This has been reported to fix the following conformance failures on DG2:
KHR-GL46.shaders.uniform_block.random.all_per_block_buffers.19
dEQP-GLES3.functional.shaders.derivate.fwidth.*
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5670
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
It's not obvious why the (gl_FrontFacing ? -1.0 : 1.0) case was handled
different for Gfx12+ than for previous generations, and it's not
correct. It tries to negate the result as an integer, and it does this
before the mask operation that clears the other bits in the value.
When we eventually support dual-SIMD8 dispatch, the other front-facing
bit is in g1.6 at bit 15, so similar code should be possible there.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: c92fb60007 ("intel/fs/gen12: Implement gl_FrontFacing on gen12+.")
Closes: #5876
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14625>
This creates an internal shader_prim enum, I've fixed up most
users to use it instead of GL types.
don't store the enum in shader_info as it changes size, and confuses
other things.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
To avoid dragging gl.h into places it has no business being,
defined tessellation primitive mode to an enum.
This has a lot of fallout all over the place.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
VARYING_SLOT_{VIEWPORT,LAYER,PSIZ} all live in the same VUE header slot,
and the FS is already set up to read the x/y/z/w component of that vec4.
However, we were setting up the SBE to pass each of those items as a
separate FS input, so hypothetically if a shader read all three, we
would burn 3 FS inputs with redundant data. Not only was this passing
extra data to the FS, but it would count as extra input slots for the
"Do we have 16 or fewer attributes?" check for using SBE swizzling to
rearrange them in a convenient manner.
Now we make them share a single FS attribute and only count them once.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14210>
This will be used by crocus and iris to clamp pointsizes only
on the last stage of the shader compile.
Fixes: 3077d96856 ("crocus: Clamp VS point sizes to the HW limits as required.")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14359>
As indicated by
VkPhysicalDeviceFragmentShadingRatePropertiesKHR::fragmentShadingRateWithShaderSampleMask
our implementation will clamp to 1x1 when reading samplemask or
writing to samplemask.
This fixes vkd3d-proton tests test_sample_mask_dxbc & test_sample_mask_dxil
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b6332fc4a8 ("intel/compiler: handle coarse pixel in render target writes descriptors")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14553>
Because 0 is no longer a recognizable value (it's NEVER, which isn't a
good default), we add an emit_alpha_test bool to tell the back-end when
to bother alpha testing. This lets us only touch crocus with the
change.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14157>
The current packed dispatch assumptions for fragment shaders seem to
be the reason that the fs-readFirstInvocation-uint-loop Piglit
test-case for the ARB_shader_ballot extension fails on DG2 in
combination with the patches in this series that enable pixel pipe
hashing (thanks Jordan for reporting the regression). I've confirmed
that the brw_fs_test_dispatch_packing() test fails on DG2 hardware for
fragment shaders, while it succeeds for other shader stages,
indicating that the PSD hardware no longer guarantees packed dispatch.
Disable it.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
Adreno GPUs has native instruction for unsigned and mixed dot_4x8 but
not signed dot product.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>
When we lower SPIR-V to NIR for textures in vtn_handle_texture, we only
bump the number of coordinate components when the op is not a lod query.
Update the assert to take this into account.
This fixes:
- dEQP-VK.robustness.robustness2.bind.template.r32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
Fixes: 231337a1 ("intel/fs/xehp: Assert that the compiler is sending all 3 coords for cubemaps.")
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13925>
Without this change, test_fs_scoreboard.cpp does not compile on GCC 7
due to the use of C99 initializers in a C++ source file.
Fixes: c847bfaaf5 ("intel/fs/gen12: Add tests for scoreboard pass")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14349>