This maps directly to what Intel's thread payload gives us, allowing us to
optimize out frcp's in some cases. Jay will use this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
These lowered versions map to what Jay can deal with. The hardware is more
flexible but we're not due to data model restrictions. We choose to lower to get
us off the ground, we can revisit later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
This will let us share a common scratch swizzling between brw and jay.
Changes by Ken:
- Use an immediate SIMD width when known so we don't need to re-lower
- Switch to load_simd_width_intel because it may not match
info->api_subgroup_size on Vulkan without VK_EXT_subgroup_size_control
- Stop using DWord Scattered Write messages for scratch. These take an
offset in DWords, and our offsets are now always in bytes. This also
means that we no longer create MEMORY_OPCODE_* IR with inconsistent
units of either bytes or dwords. Yikes. We use byte scattered
messages now.
fossil-db stats on Battlemage:
Instrs: 500477504 -> 500450056 (-0.01%); split: -0.01%, +0.00%
CodeSize: 7807432368 -> 7806786192 (-0.01%); split: -0.01%, +0.00%
Cycle count: 62404008370 -> 62398437734 (-0.01%); split: -0.01%, +0.00%
Fill count: 546690 -> 546695 (+0.00%); split: -0.00%, +0.00%
Max live registers: 141257956 -> 141258100 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 72350283 -> 72336544 (-0.02%)
Totals from 99 (0.01% of 1581969) affected shaders:
Instrs: 366593 -> 339145 (-7.49%); split: -7.58%, +0.09%
CodeSize: 6425936 -> 5779760 (-10.06%); split: -10.06%, +0.00%
Cycle count: 2412009876 -> 2406439240 (-0.23%); split: -0.26%, +0.03%
Fill count: 19675 -> 19680 (+0.03%); split: -0.02%, +0.04%
Max live registers: 17600 -> 17744 (+0.82%); split: -0.09%, +0.91%
Non SSA regs after NIR: 37894 -> 24155 (-36.26%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40843>
Just a bit cleaner, and we can unify point size too.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40677>
First, rename them to make them a bit more clear. They act on global
memory so they should be _global and they map to ld/st_cvt so so _cvt is
nice and obvious. Second, they don't need IO semantics as they're not
IO. But they do need ACCESS so that we can better control things like
CAN_REORDER. Third, add a src_type to store_global_cvt even though it
won't be used just yet because we'll want it for lowering VS stores.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
Unlike load[_interpolated]_input, which has to deal with all sorts of
ABI nonsense between driver and compiler, these new intrinsics are
dumber than bricks. They're literally just the HW ops as NIR
intrinsics. These will allow us do the lowering in NIR and put the
driver in total control over what goes down what path. Among other
things, a driver could choose to lower some things to ld_var and others
to ld_var_buf.
Co-authored-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
Move lowering to nir_lower_subgroups. At some point Intel
backend might want to skip that and lower at the backend IR
boundary, but for now lowering always applies.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40376>
Adds a new intrinsic allowing to do raw write in the various ISBE spaces
where attributes are stored.
This also adapt isberd_nv to map to what we have since SM70+.
This will be used to support mesh shaders.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39716>
Tessellation evaluation shaders have a single convergent URB handle
(for the common patch data) used by all lanes. Every other stage's
IO handles have separate handles in each lane.
Thanks to Alyssa Rosenzweig for catching this bug.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40280>
This address is delivered on Gfx12.5+ in compute/mesh/task shaders
from the command stream instruction.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40174>
It's pretty much the same mechanism, except it's a different register
location.
With this change we gain indirect loading support.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39405>
Prefer to be explicit when handling it, like is done for regular
nir_instr_type_call.
Even though functions called by cmat_call have restrictions on them ("no
tangled instructions" for example), which could allow a couple of passes
to treat them differently, there's no tracking of what functions are
used only in such cases, so being conservative here should be safe.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39903>
Add load_texture_scale to the list of intrinsics whose divergence
depends on their sources. This is needed to support running divergence
analysis on VC4.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39768>
This started out as dynamic configuration for MSAA related state, but
has since expanded to cover many dynamic fragment shader options.
We rename it to intel_fs_config, similar to intel_tess_config, to
better indicate its purpose.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
For mesh/task shaders, the thread payload provides a local invocation
index, but it's always linear so it doesn't give the correct value when
quad derivatives are in use.
The lowering pass where all of this is done correctly for compute
shaders assumes load_local_invocation_index will be lowered in the
backend for mesh/task, calculates the values for the quads correctly but
then avoid replacing the original intrinsic and we remain with the wrong
results.
Add an intel specific intrinsic and always lower the generic one to that
(or whatever else was calculated) to avoid ambiguities and fix the value
for quad derivatives.
Fixes future CTS tests using mesh/task shaders under:
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.*
Fixes: d89bfb1ff7 ("intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39276>
Instead of making it explicitly about outputs, this switchies it to
being a NIR version of LD_TILE. It means we have to do a bit of work in
NIR and add a builder helper but the end result is something much more
versatile.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39367>
We don't need the shader_info fields anymore. sample and centroid fields
are unused. The interp field is already available from
si_shader_info::color_interpolate.
The loads don't need to be sysvals. Add also the _amd suffix.
Don't handle it in st_nir_lower_drawpixels either because the intrinsics
are created much later in compilation now.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38802>
We used load_frag_coord_unscaled_ir3 for loading the fragment coord for
input attachments in GMEM, where the normal scaling for gl_FragCoord
shouldn't be used. However with custom resolve a different scaling will
apply to attachments in GMEM. Separate "unscaled" from "gmem" and rename
the NIR options, in preparation for this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38451>
We're going to change the intrinsic to a load(...) which puts "load" in
the name. Also, it's just more consistent with our usual terminology.
We also rename the corresponding backend opcode so they remain matched.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
We add several new intrinsics for accessing URB handles:
- load_urb_output_handle_intel
- load_urb_input_handle_intel
- load_urb_input_handle_intel_indexed
The latter is used by stages like TCS and GS where each input control
point has a unique handle. The index is which ICP to read from. The
others are for most stages, where all inputs or outputs are accessed
via a single handle.
Then we have URB load and store operations, split for Xe2+ (URB via LSC)
and earlier (HDC OWord messages):
- load_urb_vec4_intel
- load_urb_lsc_intel
- store_urb_vec4_intel
- store_urb_lsc_intel
The legacy vec4 variants take a handle and a 128-bit OWord offset as
sources. Additionally, stores take a set of channel enables to mask
off and avoid writing vec4 components. We don't use the WRITE_MASK
const-index as our channel enables are not required to be constant.
The Xe2+ LSC variants are simpler. Handles are byte offsets into the
URB memory region, and offsets are expressed in bytes. So we simply
add them into a single "address" source. We don't support writemasks
here, as they aren't really necessary with the better addressability.
(Plus, the store_cmask operations work significantly differently than
the previous HDC OWord messages). We will lower disjoint writemasks
to multiple stores.
Based on earlier code by Lionel Landwerlin.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>