Jay will use this to lower & optimize subgroup shuffles. This is closer to
how Intel hardware works but still much higher level than the hardware
primitive. This gets us NIR optimizations on the multiply however.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
This maps directly to what Intel's thread payload gives us, allowing us to
optimize out frcp's in some cases. Jay will use this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
These lowered versions map to what Jay can deal with. The hardware is more
flexible but we're not due to data model restrictions. We choose to lower to get
us off the ground, we can revisit later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
This will let us share a common scratch swizzling between brw and jay.
Changes by Ken:
- Use an immediate SIMD width when known so we don't need to re-lower
- Switch to load_simd_width_intel because it may not match
info->api_subgroup_size on Vulkan without VK_EXT_subgroup_size_control
- Stop using DWord Scattered Write messages for scratch. These take an
offset in DWords, and our offsets are now always in bytes. This also
means that we no longer create MEMORY_OPCODE_* IR with inconsistent
units of either bytes or dwords. Yikes. We use byte scattered
messages now.
fossil-db stats on Battlemage:
Instrs: 500477504 -> 500450056 (-0.01%); split: -0.01%, +0.00%
CodeSize: 7807432368 -> 7806786192 (-0.01%); split: -0.01%, +0.00%
Cycle count: 62404008370 -> 62398437734 (-0.01%); split: -0.01%, +0.00%
Fill count: 546690 -> 546695 (+0.00%); split: -0.00%, +0.00%
Max live registers: 141257956 -> 141258100 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 72350283 -> 72336544 (-0.02%)
Totals from 99 (0.01% of 1581969) affected shaders:
Instrs: 366593 -> 339145 (-7.49%); split: -7.58%, +0.09%
CodeSize: 6425936 -> 5779760 (-10.06%); split: -10.06%, +0.00%
Cycle count: 2412009876 -> 2406439240 (-0.23%); split: -0.26%, +0.03%
Fill count: 19675 -> 19680 (+0.03%); split: -0.02%, +0.04%
Max live registers: 17600 -> 17744 (+0.82%); split: -0.09%, +0.91%
Non SSA regs after NIR: 37894 -> 24155 (-36.26%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40843>
Just a bit cleaner, and we can unify point size too.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40677>
First, rename them to make them a bit more clear. They act on global
memory so they should be _global and they map to ld/st_cvt so so _cvt is
nice and obvious. Second, they don't need IO semantics as they're not
IO. But they do need ACCESS so that we can better control things like
CAN_REORDER. Third, add a src_type to store_global_cvt even though it
won't be used just yet because we'll want it for lowering VS stores.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
Unlike load[_interpolated]_input, which has to deal with all sorts of
ABI nonsense between driver and compiler, these new intrinsics are
dumber than bricks. They're literally just the HW ops as NIR
intrinsics. These will allow us do the lowering in NIR and put the
driver in total control over what goes down what path. Among other
things, a driver could choose to lower some things to ld_var and others
to ld_var_buf.
Co-authored-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
Move lowering to nir_lower_subgroups. At some point Intel
backend might want to skip that and lower at the backend IR
boundary, but for now lowering always applies.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40376>
Adds a new intrinsic allowing to do raw write in the various ISBE spaces
where attributes are stored.
This also adapt isberd_nv to map to what we have since SM70+.
This will be used to support mesh shaders.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39716>
Tessellation evaluation shaders have a single convergent URB handle
(for the common patch data) used by all lanes. Every other stage's
IO handles have separate handles in each lane.
Thanks to Alyssa Rosenzweig for catching this bug.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40280>
This address is delivered on Gfx12.5+ in compute/mesh/task shaders
from the command stream instruction.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40174>
It's pretty much the same mechanism, except it's a different register
location.
With this change we gain indirect loading support.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39405>
Prefer to be explicit when handling it, like is done for regular
nir_instr_type_call.
Even though functions called by cmat_call have restrictions on them ("no
tangled instructions" for example), which could allow a couple of passes
to treat them differently, there's no tracking of what functions are
used only in such cases, so being conservative here should be safe.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39903>
Add load_texture_scale to the list of intrinsics whose divergence
depends on their sources. This is needed to support running divergence
analysis on VC4.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39768>
This started out as dynamic configuration for MSAA related state, but
has since expanded to cover many dynamic fragment shader options.
We rename it to intel_fs_config, similar to intel_tess_config, to
better indicate its purpose.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
For mesh/task shaders, the thread payload provides a local invocation
index, but it's always linear so it doesn't give the correct value when
quad derivatives are in use.
The lowering pass where all of this is done correctly for compute
shaders assumes load_local_invocation_index will be lowered in the
backend for mesh/task, calculates the values for the quads correctly but
then avoid replacing the original intrinsic and we remain with the wrong
results.
Add an intel specific intrinsic and always lower the generic one to that
(or whatever else was calculated) to avoid ambiguities and fix the value
for quad derivatives.
Fixes future CTS tests using mesh/task shaders under:
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.*
Fixes: d89bfb1ff7 ("intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39276>
Instead of making it explicitly about outputs, this switchies it to
being a NIR version of LD_TILE. It means we have to do a bit of work in
NIR and add a builder helper but the end result is something much more
versatile.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39367>
We don't need the shader_info fields anymore. sample and centroid fields
are unused. The interp field is already available from
si_shader_info::color_interpolate.
The loads don't need to be sysvals. Add also the _amd suffix.
Don't handle it in st_nir_lower_drawpixels either because the intrinsics
are created much later in compilation now.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38802>
We used load_frag_coord_unscaled_ir3 for loading the fragment coord for
input attachments in GMEM, where the normal scaling for gl_FragCoord
shouldn't be used. However with custom resolve a different scaling will
apply to attachments in GMEM. Separate "unscaled" from "gmem" and rename
the NIR options, in preparation for this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38451>
We're going to change the intrinsic to a load(...) which puts "load" in
the name. Also, it's just more consistent with our usual terminology.
We also rename the corresponding backend opcode so they remain matched.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>