Based on Rafael's:
* "nir/lower_tex: Add option to lower offset for tg4 too."
* "intel/compiler: Lower offsets for tg4 on gen9+."
* "WIP: Do not lower basic offsets."
* "WIP: intel/compiler: Enable lowering offsets restriction."
But, with these changes:
* Fixed range checking to be signed 4 bits
* Converted to filter
* Apply only to gfx12.5+
* Use nir_src_is_const / nir_src_comp_as_int (s-b Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14142>
We were setting anv_pipeline::sample_shading_enable based on
sampleShadingEnable without looking at minSampleShading. We would then
pass this value into nir_lower_wpos_center which would add sample_pos to
frag_coord. Then the back-end compiler picks up on the existence of
sample_pos and forces persample dispatch. This leads to doing
per-sample dispatch whenever sampleShadingEnable = VK_TRUE regardless of
the value of minSampleShading. This is almost certainly costing us
perf somewhere.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14022>
The alignment and scale down values have changed on this platform.
To support drivers that won't use a CCS surface on this platform, this
patch computes the CCS fast clear rectangle using the main surface.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13555>
According to Bspec 2424, "Render Target Resolve":
The Resolve Rectangle size is same as Clear Rectangle size from SKL+.
Use get_fast_clear_rect in blorp_ccs_resolve for SKL+.
Note that the Bspec differs from Vol7 of the Sky Lake PRM, which only
specifies aligning by the scaledown factors.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13555>
The comment states that 64K alignment of surfaces is required when an
aux map is present on the platform. However, the code checks for GFX12
instead of dev->info->has_aux_map. Update the code to match the comment.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13555>
Previously a predicated BREAK that appeared immediately before the WHILE
would get merged into the WHILE. This doesn't work if other flow
control (e.g., a CONT) can transfer directly to the WHILE.
On Intel platforms, this fixes the CTS test
dEQP-VK.graphicsfuzz.stable-binarysearch-tree-nested-if-and-conditional.
No shader-db changes on any Intel platform.
When this commit was first created (over a month before it is going to
land), there were some regressions that were prevented by other commits
in MR !13095. That does not appear to be the case now, so I don't know
what changed. Basically, the treatment of discard as a combination of
demote and terminate causes additional continues in some loops, and
those continues trigger this bug. The other commits from that MR
prevent those continues from being generated in the first place.
All Intel platforms had simlar fossil-db results. (Ice Lake shown)
Instructions in all programs: 144419989 -> 144419995 (+0.0%)
SENDs in all programs: 6947332 -> 6947332 (+0.0%)
Loops in all programs: 38277 -> 38277 (+0.0%)
Spills in all programs: 204075 -> 204075 (+0.0%)
Fills in all programs: 319480 -> 319480 (+0.0%)
A few shaders in Doom 2016 were hurt by one instruction each. It seems
likely that these shaders would have experienced at least some
mis-rendering.
Closes: #4213
Fixes: d13bcdb3a9 ("i965/fs: Extend predicated break pass to predicate WHILE.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14128>
This virtual instruction is translated into multiple half float
physical instructions, even though its destination is typically of
integer type, which prevents the software scoreboard pass from
inferring the correct execution pipeline for the virtual instruction
on XeHP+ platforms. Teach the SWSB lowering pass about this
inconsistency between the IR and physical instruction types.
Fixes among other tests:
dEQP-GLES31.functional.shaders.builtin_functions.pack_unpack.packhalf2x16_compute
Fixes: d4537770bb ("intel/fs: Add helper functions inferring sync and exec pipeline of an instruction.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5685
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14002>
include it explicitly in the correct places
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14104>
This adds a bunch of other headers in, and adds mtypes.h to iris
for perf query object.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14104>
This file needs futexes so make an explicit include, so it doesn't
come via the compiler
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14104>
These checks are done in isl_format_supports_ccs_*. Since
isl_surf_supports_ccs calls these functions, it doesn't need to check
them itself.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14082>
These formats are used when creating surfaces with the
I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS modifier.
Makes iris pass the out-of-tree piglit test,
ext_image_dma_buf_import-intel-modifiers.
Fixes: 1433fe7860 ("intel/isl: Unify fmt checks in isl_surf_supports_ccs")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14082>
On XeHP, the XY_BLOCK_COPY_BLT command has a number of fields that
describe the layout of the surface, much like SURFACE_STATE does.
Several of them are encoded in such a similar manner that we really
would like to reuse the isl helpers for emitting those. This commit
moves them into a new isl_genX_helpers.h file which I can include
from the BLORP code. (The alternative would be to add XY_BLOCK_COPY_BLT
filling commands to isl, but that...seems more like a BLORP feature.)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14094>
Those defines exist in the packing headers too and some parts of the
code (like mi_builder.h) include both.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13954>
Those names are a bit too common and sometimes clash variables.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13954>
Use the same URB access helpers that were added for Task Output. The
Arrayed I/O (per-primitive and per-vertex) is handled by applying the
pitch from the MUE layout into the NIR intrinsics and including the
non-arrayed offset on top of it. After that, the index src can be
used directly for lowering.
Because we keep around the non-arrayed offset AND the pitch is
aligned, we can identify cases where the access is indirect but
guaranteed to be aligned, and dispatch a single message. Added a TODO
to explore that later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
Implement the output written by the task *workgroup* and available to
all the mesh *workgroups* dispatched from that task. We currently
ignore any layout annotations (since they are not really testable) and
produce a (packed) layout ourselves.
The URB messages are only SIMD8, so for larger SIMDs, the functions
will produce multiple messages. Making this lowering here instead of
the generic lower_simd_width() since it is not just a matter of
zip/unzip, e.g. the offset must be adjusted.
Indirect writes/reads are implemented by handling one component at a
time and using the PER_SLOT variant of the messages.
Note that VK_NV_mesh_shader allows reading outputs, so add support for
that as well.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
Task/Mesh stages are CS-like stages, and include many
builtins (e.g. workgroup ID/index) and intrinsics (e.g. workgroup
memory primitives) originally present only in CS.
This commit add two new stages (task and mesh) that 'inherit' from CS
by embedding a brw_cs_prog_data in their own prog_data structure, so
that CS functionality can be easily reused. They also currently use
the same helpers to select the SIMD variant to use -- that was
recently added for CS.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
Allows to assert its existence for per-primitive variables and will
later be useful to implement the "more than 16 attributes" case for
Mesh.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>