Valhall removed Bifrost's memory segments and added memory access in
their place. Those were bolted onto reserved bits as "pseudo-segments",
and the emitter would catch these and emit the right memory access.
This commit cleans that up a bit by making memory_access available
directly and exposing it to NIR (this will be useful later).
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
This removes the previous hack that searched for the psiz write by
looking for 16-bit stores with the correct pseudo segment. We also add a
new intrinsic that mimics global stores but tags psiz writes; this will
be used later in the series.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
First, rename them to make them a bit clearer. They act on global
memory so they should be _global, and they map to ld/st_cvt so _cvt is
nice and obvious. Second, they don't need IO semantics as they're not
IO. But they do need ACCESS so that we can better control things like
CAN_REORDER. Third, add a src_type to store_global_cvt even though it
won't be used just yet because we'll want it for lowering VS stores.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
Unlike load[_interpolated]_input, which has to deal with all sorts of
ABI nonsense between driver and compiler, these new intrinsics are
dumber than bricks. They're literally just the HW ops as NIR
intrinsics. These will allow us to do the lowering in NIR and put the
driver in total control over what goes down what path. Among other
things, a driver could choose to lower some things to ld_var and others
to ld_var_buf.
Co-authored-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
Allows a uniform name to be passed to force_explicit_uniform_loc_zero,
which lets us set that uniform to an explicit location of zero.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40448>
Move lowering to nir_lower_subgroups. At some point the Intel
backend might want to skip that and lower at the backend IR
boundary, but for now lowering always applies.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40376>
This codepath has had a bug (always setting `elems[0]`) since it was
last reworked, but there's no subgroup instruction that uses this helper
and supports Composites, so it can be replaced with an assert.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40356>
We can produce a transposed value sometimes, and we have to make sure
that val->transposed is also updated when that happens.
Noticed by inspection after the previous commit.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40017>
This comes up in lowered load_ubo sequences (observed in OpenCL test
test_api min_max_parameter_size). Hopefully the pack gets coalesced;
it's like nir_op_vec2 on most backends, so it should usually be ok to
sink even though the register pressure heuristic will reject it.
Allowing it to sink allows the UBO load to sink.
Intel's backend scheduler can optimize the relevant sequences locally
but there should still be a win here for global load sinking.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40267>
ministat (nir_analyze_fp_class):
Difference at 95.0% confidence
-201983 +/- 1064.87
-9.31575% +/- 0.0468505%
(Student's t, pooled s = 1257.67)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40346>
ministat (nir_analyze_fp_class):
Difference at 95.0% confidence
-4484.55 +/- 1288.68
-0.205419% +/- 0.0589514%
(Student's t, pooled s = 1521.99)
This should also use less memory.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40346>
The fixed function vertex shader is currently built as an io_lowered
shader; however, the gl_nir_add_point_size() function expects the
original shader to not be io_lowered, and this function is called to
lower the fixed function vertex shader.
Add support for adding point size store_output intrinsics for io_lowered
shaders.
This fixes fixed function rendering on Zink with a Vulkan driver without
VK_KHR_maintenance5.
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40373>
This means we don't run into undefined behavior when testing nan/inf inputs.
Also make sure that patterns using is_only_used_as_float are signed-zero correct.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
These can never come from the API but there are a few cases where panvk
wants them.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38681>
When masking out-of-bounds image loads, we previously returned a vector
of all zeros. However, for robustImageAccess2, depending on the format,
some components, such as the alpha channel in an RGB format,
should evaluate to 1.
This corrects the replacement value based on the format swizzle.
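As a rough illustration of the idea only (not the code in this change),
the replacement vector can be derived from a per-component swizzle. The
`enum swz` encoding and the `oob_replacement` helper are hypothetical
names made up for this sketch, and it assumes a float format; an integer
format would want an integer 1 instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical swizzle encoding, for this sketch only. */
enum swz { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_0, SWZ_1 };

/* Fill the value returned by an out-of-bounds load: components that the
 * format swizzle hard-wires to one read back as 1, all others as 0.
 */
static void
oob_replacement(const enum swz swizzle[4], float out[4])
{
   assert(swizzle != NULL && out != NULL);

   for (int i = 0; i < 4; i++)
      out[i] = (swizzle[i] == SWZ_1) ? 1.0f : 0.0f;
}
```

With an RGB-style swizzle of { X, Y, Z, 1 }, this yields (0, 0, 0, 1)
rather than the all-zero vector used before.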
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39430>
Code-motion should not move back upconversions without any other
instruction. That would only increase memory pressure without any
significant performance benefit (conversions are usually cheap).
This should also help lowering mediump varyings early by not reversing
their work.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40273>
On SM86+, we can use a 16-bit unsigned offset alongside the register
for it.
This adds a new base index that will be used for it, integration with
nir_opt_offsets, and a lowering pass to get rid of the base on
unsupported generations.
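A minimal sketch of what the lowering amounts to, modelled on plain
values instead of NIR instructions. `struct access`, `base_fits` and
`lower_base` are hypothetical names for illustration; the real pass
would emit an add on the offset source rather than mutate a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an access: effective offset = reg_offset + base. */
struct access {
   uint32_t reg_offset; /* stands in for the value of the offset register */
   uint32_t base;       /* constant base carried on the instruction */
};

/* The immediate is described above as a 16-bit unsigned field. */
static bool
base_fits(uint32_t base)
{
   return base <= UINT16_MAX;
}

/* Keep the base where the hardware can encode it; otherwise fold it back
 * into the register offset so the effective offset is unchanged.
 */
static void
lower_base(struct access *a, unsigned sm)
{
   assert(a != NULL);

   if (sm >= 86 && base_fits(a->base))
      return;

   a->reg_offset += a->base;
   a->base = 0;
}
```

Either way the sum of the two fields is preserved; only where the
constant lives changes.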
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39716>