Most of the time, this doesn't matter. On the versions with _sat, if
the destination type is incorrect, the clamping will not happen
correctly.
Fixes the following CTS tests:
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_uu_v4i8_out32
v2: Update anv-tgl-fails.txt.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Fixes: 0f809dbf40 ("intel/compiler: Basic support for DP4A instruction")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15417>
For any fence greater than local scope, always set flush type to at
least invalidate so that fence goes on properly.
v2: Fixup condition to trigger workaround (Lionel)
v3: Simplify workaround (Curro)
v4: Don't drop the existing WA (Curro)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: 22.0 <mesa-stable>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14947>
We only have a single prog_data::total_scratch for all shader variants
(SIMD 8, 16, 32). Therefore we should always max the total_scratch on
top of existing variant.
We probably haven't run into that issue before because we compile by
increasing SIMD size and higher SIMD size is more likely to spill. But
for bindless shaders with return shaders, if the last return part
doesn't spill, we completely ignore the previous parts' scratch
computation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15193>
Also change the code to preserve certain metadata: control flow is not changed
so both block indices and dominance information is preserved.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15206>
If we say that per-primitive attributes are flat (which is communicated by
3DSTATE_SBE.ConstantInterpolationEnable), GPU freaks out and applies it
to other (non-flat) attributes.
Fixes: be89ea3231 ("intel/compiler: Handle per-primitive inputs in FS")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15169>
Bit is flipped compared to all the other packets.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 705395344d ("intel/fs: Add support for compiling bindless shaders with resume shaders")
Fixes: c3ac9afca3 ("anv: Create and return ray-tracing pipeline SBT handles")
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15078>
Commit 38800b38 changed nir_opcodes.py, but that doesn't seem to have
triggered nir_opt_algebraic.py. The change in 75ef5991 depends on
opt_algebraic lowering 16-bit versions of slt, but if opt_algebraic is
not rebuilt, this may not happen. This resulted in some people seeing
assertion failures in, for example,
dEQP-VK.spirv_assembly.instruction.compute.float16.arithmetic_3.step,
due to the backend seeing nir_op_slt that it didn't know how to handle.
v2: Add nir_opcodes.py to nir_algebraic_py so that all the per-driver
algebraic passes pick up the dependency too. Rename it to
nir_algebraic_depends. Suggested by Emma.
Closes: #6047
Fixes: d1992255bb ("meson: Add build Intel "anv" vulkan driver")
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15050>
This bit in the compiler looks like it was added by accident on one of
the latest versions of the original commit, but it clearly doesn't
belong there.
Fixes: 03e1e19246 ("anv: Refactor descriptor copy")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15016>
We really need offsets to be in dwords, not in vec4s.
The bug manifests as random failure of func.mesh.clipdistance.5 crucible
test, where stores to gl_MeshVerticesNV[x].gl_ClipDistance[4+n] actually write to
gl_MeshVerticesNV[x].gl_ClipDistance[1+n].
Fixes: 1f438eb033 ("intel/compiler: Implement Mesh Output")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14997>
This (sort of) matches the behavior of nir_opt_algebraic. This ensures
that subnormal values are properly flushed to zero.
With the aid of "nir/search: Float sources of texture instructions are
float users" and "nir/search: Transitively apply is_only_used_as_float",
there would have been no shader-db regressions on Intel platforms.
However, those caused a significant increase in compile time. Since the
instruction regressions were so small, I just dropped those commits
rather than improve them.
All Haswell and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20125042 -> 20125094 (<.01%)
instructions in affected programs: 7184 -> 7236 (0.72%)
helped: 0
HURT: 32
HURT stats (abs) min: 1 max: 4 x̄: 1.62 x̃: 2
HURT stats (rel) min: 0.11% max: 1.49% x̄: 0.85% x̃: 0.78%
95% mean confidence interval for instructions value: 1.39 1.86
95% mean confidence interval for instructions %-change: 0.74% 0.96%
Instructions are HURT.
total cycles in shared programs: 862745586 -> 862746551 (<.01%)
cycles in affected programs: 109872 -> 110837 (0.88%)
helped: 12
HURT: 23
helped stats (abs) min: 2 max: 774 x̄: 90.83 x̃: 19
helped stats (rel) min: 0.07% max: 25.23% x̄: 3.06% x̃: 0.40%
HURT stats (abs) min: 2 max: 1106 x̄: 89.35 x̃: 12
HURT stats (rel) min: 0.08% max: 45.40% x̄: 3.01% x̃: 0.47%
95% mean confidence interval for cycles value: -60.09 115.23
95% mean confidence interval for cycles %-change: -2.21% 4.07%
Inconclusive result (value mean confidence interval includes 0).
All of the shaders hurt are in either UE4 shooter-game or shooter_demo.
Tiger Lake
Instructions in all programs: 159893213 -> 159893290 (+0.0%)
SENDs in all programs: 6936431 -> 6936431 (+0.0%)
Loops in all programs: 38385 -> 38385 (+0.0%)
Cycles in all programs: 7019259514 -> 7019260087 (+0.0%)
Spills in all programs: 101389 -> 101389 (+0.0%)
Fills in all programs: 131532 -> 131532 (+0.0%)
Ice Lake
Instructions in all programs: 143624164 -> 143624235 (+0.0%)
SENDs in all programs: 6980289 -> 6980289 (+0.0%)
Loops in all programs: 38383 -> 38383 (+0.0%)
Cycles in all programs: 8440082767 -> 8440083238 (+0.0%)
Spills in all programs: 102246 -> 102246 (+0.0%)
Fills in all programs: 131908 -> 131908 (+0.0%)
Skylake
Instructions in all programs: 134185424 -> 134185495 (+0.0%)
SENDs in all programs: 6938790 -> 6938790 (+0.0%)
Loops in all programs: 38356 -> 38356 (+0.0%)
Cycles in all programs: 8222366529 -> 8222366923 (+0.0%)
Spills in all programs: 98821 -> 98821 (+0.0%)
Fills in all programs: 125218 -> 125218 (+0.0%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: f5dd6dfe01 ("anv: enable VK_KHR_shader_float_controls and SPV_KHR_float_controls")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
Refactor descriptor copies to use the existing helper functions instead
of rolling our own. In order to facilitate this, we need to store the
appropriate buffer views for the relevant descriptors internally and
reuse them in the helpers.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14909>
v2: Add helper for acceleration->root_node computation (Caio)
v3: Update comment on "done" bit (Caio)
Remove progress bool value for impl function (Caio)
Don't use nir_shader_instructions_pass to search the shader (Caio)
v4: Rename variable for if/else block (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
Since we need to be able to perform ray queries in helper invocations,
we need to have all the helpers properly tag their load/store
operations so that they operate in helper lanes.
v2: Switch from macros to inline functions (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
We'll use this to apply ray tracing operations in our trivial return
shader based on the stage we're in.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
In the future we'll want to reuse this intrinsic to deal with ray
queries. Ray queries will use a different global pointer and
programmatically change the control/level arguments of the trace send
instruction.
v2: Comment on barrier after sync trace instruction (Caio)
Generalize lsc helper (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
We're replacing a generic instruction by an intel specific one, we
need to remove the previous instruction.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c5a42e4010 ("intel/fs: fix shader call lowering pass")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
This index will be used for accessing ray query data in memory.
v2: Drop a MOV (Caio)
v3: Rework back code emission (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
We'll want different types of IDs based on topology. Let's make this
more flexible and also move the bit shifting code a layer above where
it's easier to do bitshifting operations, especially if you need to
stash things into temporary registers.
v2: Keep previous comment.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
This will allow to reuse the same intrinsic for various topology based
ID.
v2: fix intrinsic comment (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>