Some of us want to lower all TXD with min_lod regardless of whether or
not it's shadow or cube or whatever.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25000>
Extracted from nir_lower_to_source_mods, this is useful for back-ends
which don't want to rely on NIR source and destination modifiers.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25000>
Right now some drivers are doing dsign lowering in GLSL and haven't had to
have a NIR path due to there not being a corresponding vulkan driver. We
want this in NIR now so that we can retire that batch of GLSL lowering
code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25777>
i915g and r300-r400 don't have if statements, and discards are all
nir_intrinsic_discard_if. We can flatten those discards here, saving a
separate GLSL pass to try to do so.
i915g:
GAINED: shaders/closed/xcom-enemy-unknown/413.shader_test FS
rv370:
GAINED: shaders/closed/xcom-enemy-unknown/12.shader_test FS
GAINED: shaders/closed/xcom-enemy-unknown/122.shader_test FS
GAINED: shaders/closed/xcom-enemy-unknown/132.shader_test FS
GAINED: shaders/closed/xcom-enemy-unknown/145.shader_test FS
GAINED: shaders/closed/xcom-enemy-unknown/146.shader_test FS
GAINED: shaders/closed/xcom-enemy-unknown/19.shader_test FS
GAINED: shaders/closed/xcom-enemy-unknown/413.shader_test FS
GAINED: shaders/closed/xcom-enemy-unknown/415.shader_test FS
Closes: #9918
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24763>
Shifts always need 32 bit for their second source.
Fixes: c70d94a889 ("nir_lower_mem_access_bit_sizes: Support unaligned stores via a pair of atomics")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25740>
Map __constant with a 64-bit address format to load_global_constant instead of
load_global. This notably allows nir_opt_preamble to hoist the load.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25625>
In general, it is unsafe to speculatively hoist conditionally executed loads
into the preamble. For example, if the shader does:
if (ptr is valid) {
foo(*ptr)
}
we cannot dereference ptr in the preamble without knowing that the pointer is
valid (which may not be determinable, since it might not be uniform).
nir_opt_preamble needs to stop speculating in this case, or otherwise using
preambles can cause faults on legal shaders.
However, some platforms may be able to speculate loads safely. For example,
Apple hardware is able to suppress MMU faults, making speculation safe. This is
controlled global register to control this behaviour, set at boot-time by the
kernel. (macOS suppresses these faults unconditionally, this feature may be
used in their implementation of sparse textures. Currently Linux does not
suppress any faults but this may change later.)
Since nir_opt_preamble should work soundly and optimally on a variety of
platforms, we need to respect the ACCESS flag.
Thanks to the if-else hoisting implemented earlier in the series, this isn't too
terrible of a band-aid on Asahi:
total instructions in shared programs: 1499674 -> 1507699 (0.54%)
instructions in affected programs: 78865 -> 86890 (10.18%)
helped: 0
HURT: 337
Instructions are HURT.
total bytes in shared programs: 10238284 -> 10279308 (0.40%)
bytes in affected programs: 554504 -> 595528 (7.40%)
helped: 3
HURT: 334
Bytes are HURT.
total halfregs in shared programs: 452049 -> 454015 (0.43%)
halfregs in affected programs: 7569 -> 9535 (25.97%)
helped: 7
HURT: 150
Halfregs are HURT.
There are no shader-db changes on ir3 as expected, since ir3 can safely
speculate all instructions in my shader-db.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24011>
Add infrastructure to reconstruct if's. Later in the series, this will let us
hoist loads from inside uniform if's without speculating. For now, it lets us
handle phi's in nir_opt_preamble in a straightforward way.
Results on AGX are good:
total instructions in shared programs: 1504730 -> 1499674 (-0.34%)
instructions in affected programs: 153673 -> 148617 (-3.29%)
helped: 496
HURT: 0
Instructions are helped.
total bytes in shared programs: 10287768 -> 10238284 (-0.48%)
bytes in affected programs: 1113724 -> 1064240 (-4.44%)
helped: 496
HURT: 0
Bytes are helped.
total halfregs in shared programs: 452669 -> 452049 (-0.14%)
halfregs in affected programs: 14825 -> 14205 (-4.18%)
helped: 152
HURT: 99
Halfregs are helped.
total threads in shared programs: 16469504 -> 16470784 (<.01%)
threads in affected programs: 8960 -> 10240 (14.29%)
helped: 10
HURT: 0
Threads are helped.
Results on ir3 is a bit more of a wash but still should be a win overall: The
regression in moves seems scary, but the cost model already accounts for them as
evidenced by instruction count coming out ahead.
total instructions in shared programs: 3108750 -> 3105993 (-0.09%)
instructions in affected programs: 317367 -> 314610 (-0.87%)
helped: 675
HURT: 242
Instructions are helped.
total nops in shared programs: 673152 -> 675048 (0.28%)
nops in affected programs: 74551 -> 76447 (2.54%)
helped: 353
HURT: 347
Inconclusive result (%-change mean confidence interval includes 0).
total non-nops in shared programs: 2435598 -> 2430945 (-0.19%)
non-nops in affected programs: 232664 -> 228011 (-2.00%)
helped: 816
HURT: 38
Non-nops are helped.
total mov in shared programs: 78201 -> 84011 (7.43%)
mov in affected programs: 10726 -> 16536 (54.17%)
helped: 60
HURT: 781
Mov are HURT.
total cov in shared programs: 74964 -> 74906 (-0.08%)
cov in affected programs: 273 -> 215 (-21.25%)
helped: 17
HURT: 0
Cov are helped.
total dwords in shared programs: 6716814 -> 6748726 (0.48%)
dwords in affected programs: 879778 -> 911690 (3.63%)
helped: 12
HURT: 948
Dwords are HURT.
total full in shared programs: 193210 -> 193212 (<.01%)
full in affected programs: 278 -> 280 (0.72%)
helped: 12
HURT: 22
Inconclusive result (value mean confidence interval includes 0).
total constlen in shared programs: 493632 -> 494816 (0.24%)
constlen in affected programs: 19904 -> 21088 (5.95%)
helped: 9
HURT: 306
Constlen are HURT.
total cat0 in shared programs: 742476 -> 745046 (0.35%)
cat0 in affected programs: 84455 -> 87025 (3.04%)
helped: 277
HURT: 489
Cat0 are HURT.
total cat1 in shared programs: 153303 -> 159059 (3.75%)
cat1 in affected programs: 17810 -> 23566 (32.32%)
helped: 69
HURT: 780
Cat1 are HURT.
total cat2 in shared programs: 1144508 -> 1140731 (-0.33%)
cat2 in affected programs: 121284 -> 117507 (-3.11%)
helped: 841
HURT: 0
Cat2 are helped.
total cat3 in shared programs: 942098 -> 934804 (-0.77%)
cat3 in affected programs: 87140 -> 79846 (-8.37%)
helped: 855
HURT: 1
Cat3 are helped.
total cat4 in shared programs: 65261 -> 65249 (-0.02%)
cat4 in affected programs: 42 -> 30 (-28.57%)
helped: 12
HURT: 0
Cat4 are helped.
total sstall in shared programs: 237311 -> 241281 (1.67%)
sstall in affected programs: 33755 -> 37725 (11.76%)
helped: 179
HURT: 493
Sstall are HURT.
total (ss) in shared programs: 58166 -> 58795 (1.08%)
(ss) in affected programs: 4535 -> 5164 (13.87%)
helped: 35
HURT: 664
(ss) are HURT.
total systall in shared programs: 503784 -> 503805 (<.01%)
systall in affected programs: 3170 -> 3191 (0.66%)
helped: 16
HURT: 13
Inconclusive result (value mean confidence interval includes 0).
total (sy) in shared programs: 27261 -> 27259 (<.01%)
(sy) in affected programs: 76 -> 74 (-2.63%)
helped: 8
HURT: 5
Inconclusive result (value mean confidence interval includes 0).
total waves in shared programs: 439848 -> 439872 (<.01%)
waves in affected programs: 160 -> 184 (15.00%)
helped: 12
HURT: 0
Waves are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24011>
The way backends walk NIR when translating. This will make it easy to filter
can_move based on the parent control flow.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24011>
Determining whether it is safe to hoist a load instruction out of control flow
depends on complex hardware and driver details. Rather than encoding this as
knobs in every NIR pass that wants to do so (notably nir_opt_preamble and
nir_opt_peephole_select), add a per-load ACCESS flag for backends to set.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24011>
This allows us to pack the is_if boolean into the bottom bit of the parent
pointer, eliminating the boolean and hence shrinking the nir_src by 8 bytes (due
to the extra 63 bits of padding incurred in the old layout).
Because all access is forced through helpers now, this is a local change.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>
It is undefined behaviour in C to read a different member of a union than was
written. Nothing in-tree should be using this behaviour with the nir_src union:
nir_if should never be read as nir_instr and vice versa. Assert this.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>
First, we need to give the parent_instr field a unique name to be able to
replace with a helper. We have parent_instr fields for both nir_src and
nir_def, so let's rename nir_src::parent_instr in preparation for rework.
This was done with a combination of sed and manual fix-ups.
Then we use semantic patches plus manual fixups:
@@
expression s;
@@
-s->renamed_parent_instr
+nir_src_parent_instr(s)
@@
expression s;
@@
-s.renamed_parent_instr
+nir_src_parent_instr(&s)
@@
expression s;
@@
-s->parent_if
+nir_src_parent_if(s)
@@
expression s;
@@
-s.renamed_parent_if
+nir_src_parent_if(&s)
@@
expression s;
@@
-s->is_if
+nir_src_is_if(s)
@@
expression s;
@@
-s.is_if
+nir_src_is_if(&s)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>
These will become nontrivial later in the series. For now these have no smarts
in them, in order to make the conversion completely mechanical.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>
As noted in the previous commit, the intermediate cast to float from
double can produce wrong results.
Fixes upcoming Vulkan CTS tests:
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_sconst_conv_from_fp64_up
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_sconst_conv_from_fp64_up_nostorage
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_sconst_conv_from_fp64_up_vert
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_sconst_conv_from_fp64_up_nostorage_vert
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_sconst_conv_from_fp64_up_frag
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_sconst_conv_from_fp64_up_nostorage_frag
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25281>
Appendix A: Vulkan environemtn for SPIR-V says:
Operations described as “correctly rounded” will return the infinitely
precise result, x, rounded so as to be representable in
floating-point. The rounding mode is not specified, unless the entry
point is declared with the RoundingModeRTE or the RoundingModeRTZ
Execution Mode.
Conversion between types are classified as correctly rounded, so let's
do rounding correctly.
v2: check rounding mode for destination bit size (Georg)
Fixes upcoming Vulkan CTS tests:
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp32.input_args.rounding_rtz_conv_from_uint64_up
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp32.input_args.rounding_rtz_conv_from_int64_up
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp32.input_args.rounding_rtz_conv_from_uint64_up_vert
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp32.input_args.rounding_rtz_conv_from_int64_up_vert
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp32.input_args.rounding_rtz_conv_from_uint64_up_frag
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp32.input_args.rounding_rtz_conv_from_int64_up_frag
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25281>
This patch fixes the following issues that lead to crashes in some cases:
* an instruction is inserted to get texture lod that depends on a
texture instruction that hasn't been inserted yet.
* this code tries to read channel 1 of the lod, but lod is scalar
* the code assumed there would only be 2 srcs, this isn't the case when
bindless is used.
Fixes: b154a4154b ("nir/lower_tex: rewrite tex/txb -> txd/txl before saturating srcs")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25621>
This is necessary to allow optimizing VS inputs after nir_lower_io, which
is currently impossible because the loss of dual-slot information in NIR
would break VS inputs. With this, driver locations can be recomputed by
calling nir_recompute_io_bases. It's a prerequisite for optimizing varyings
with lowered IO.
When this is used, we will be able to eliminate unused dual-slot VS inputs
as well as unused low and high halves of dual-slot VS inputs for the first
time, which can happen due to optimizations of varyings. Without this,
st/mesa binds vertex buffers for dual-slot inputs that are fully or
partially unused in the shader.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25394>
New push consts loading consist of:
- Push consts are set for the entire pipeline via HLSQ_SHARED_CONSTS_IMM
array which could fit up to 256b of push consts.
- For each shader stage that uses push consts READ_IMM_SHARED_CONSTS
should be set in HLSQ_*_CNTL, otherwise push consts may get overwritten
by new push consts that are set after the draw.
- Push consts are loaded into consts reg file in a shader preamble via
stsc at the very start of the preamble.
OPC_PUSH_CONSTS_LOAD_MACRO is used instead of directly translating NIR
intrinsic into stsc because: we don't want to teach legalize pass how
to set (ss) between stores and loads of consts reg file, don't want for
stsc to be reordered, etc.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25086>