fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 05:08:06 +02:00

Author	SHA1	Message	Date
Jason Ekstrand	d1eae6f36b	nir: Properly clean up nir_src/dest indirects Now that they're no longer ralloc'd, we have to be much more careful about indirects. We have to make sure every time a source or destination is overwritten, its indirect (if any) is freed. We also have to choose a memory ownership convention for the rewrite functions. Assuming that they will be called with the source from some other instruction, we choose to always make a copy of the indirect (if any). It's the responsibility of the caller to ensure its copy of the indirect is freed. Unfortunately, all this extra logic is going to make nir_instr_rewrite/move_src/dest more expensive because they now have all the logic of nir_src/dest_copy instead of a simple struct assignment. Fortunately, the vast majority of rewrite calls are done by nir_ssa_def_rewrite_uses which is an SSA-only fast-path. Fixes: `879a569884` "nir: Switch from ralloc to malloc for NIR instructions." Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12884>	2021-09-16 11:28:36 +00:00
Emma Anholt	aed4c0b5a9	nir: Drop the unused instr arg for src/dest copy functions. Now that we don't use ralloc, we don't need this arg to get at the right ralloc ctx. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:06 +00:00
Emma Anholt	879a569884	nir: Switch from ralloc to malloc for NIR instructions. By replacing the 48-byte ralloc header with our exec_node gc_node (16 bytes), runtime of shader-db on my system across this series drops -4.21738% +/- 1.47757% (n=5). Inspired by discussion on #5034. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:06 +00:00
Emma Anholt	feee5e6974	nir/tests: Fix transmuting an SSA dest to be non-SSA With the de-ralloc changes, leaving the register dest's .reg uninitialized caused crashes. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:06 +00:00
Emma Anholt	1edff520e2	nir/lower_phis_to_scalar: Use nir_instr_free() to free instrs. Preparation for de-rallocing instrs. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:06 +00:00
Emma Anholt	d1a2870f78	nir: Add all allocated instructions to a GC list. Right now we're using ralloc to GC our NIR instructions, but ralloc has significant overhead for its recursive nature so it would be nice to use a simpler mechanism for GCing instructions. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:06 +00:00
Emma Anholt	22788d68eb	nir: Consistently pass the instr to nir_src_copy(). The arg says it's supposed to be the instr, not the shader. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:05 +00:00
Emma Anholt	5e37cfb7fe	nir: Consistently pass the shader to the shader arg of instr creation. We were using the ralloc parent in some places, which should work out to be the shader I think, but to de-ralloc the instrs we should just pass the existing shader pointer in. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:05 +00:00
Emma Anholt	7a4bbe60c1	nir/from_ssa: Use nir_instr_free() to free instrs instead of ralloc. This code was being tricky with passing a mem_ctx instead of the shader, then freeing the mem_ctx when the pass was done and all the parallel copies had been removed from the shader. Use the right type for instr creation and do a bit of manual list management to prepare the way for non-ralloc NIR instrs. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:05 +00:00
Emma Anholt	b99efb8af0	nir: Pull the instr list free function out to a helper. With the de-rallocing, we're going to have some more places that free a list of instrs. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:05 +00:00
Emma Anholt	36d9bdca0b	nir: Add a nir_instr_free() to replace ralloc_free(instr). This will gain another step shortly. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:05 +00:00
Ian Romanick	7956a701d8	nir/lower_gs_intrinsics: Make nir_lower_gs_intrinsics be idempotent Calling this lower pass twice in a row would cause spurious set_vertex_and_primitive_count(0, undef) intrinsics after the proper set_vertex_and_primitive_count intrinsic. This pretty much turns any geometry shader into garbage. Fix this by treating nir_intrinsic_emit_vertex_with_counter and nir_intrinsic_end_primitive_with_counter just like the non-_with_counter versions. If no blocks would need set_vertex_and_primitive_count intrinsics added, exit the pass before doing any work. This prevents the need for DCE to do extra clean up later. Since this pass is potentially called multiple times via multiple invocations of a finalize_nir callback, it is (hypothetically?) possible that control flow could be changed to add new blocks that need this intrinsic. The check implemented in this commit should be robust against that possibility. v2: Add a_block_needs_set_vertex_and_primitive_count. Suggested by Timur. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12802>	2021-09-14 09:13:07 -07:00
Ian Romanick	edf357b233	nir/lower_gs_intrinsics: Return progress if append_set_vertex_and_primitive_count makes progress Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Fixes: `542d40d698` ("nir: Add new GS intrinsics that maintain a count of emitted vertices.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12802>	2021-09-14 09:12:47 -07:00
Bas Nieuwenhuizen	b05cd10b8e	nir: Avoid visiting instructions multiple times in nir_instr_free_and_dce. Sadly, we need to poke a bit into the src internals to avoid using yet another heap-allocated data structure. Fixes: `5251548572` ("nir: Add a nir_instr_remove that recursively removes dead code.") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5323 Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12726>	2021-09-09 21:35:03 +00:00
Rhys Perry	c1f724b2b9	nir: fix serialization of loop/if control Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Fixes: `e76ae39ae2` ("nir: add support for user defined select control") Fixes: `b56451f82c` ("nir: add support for user defined loop control") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12778>	2021-09-09 10:32:30 +00:00
Qiang Yu	7054c1b7fd	nir/linker: pack varyings with different interpolation qualifier Drivers like radeonsi load varyings in a scalar manner, so they prefer to pack varyings with different interpolation qualifiers into the same slot to save space. Drivers like panfrost/bifrost, however, can load varyings in a vector manner, so they prefer to pack varyings with the same interpolation qualifier. A driver can list the interpolation qualifiers that may be packed into the same varying slot in the pack_varying_options NIR option. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12537>	2021-09-09 06:00:58 +00:00
Qiang Yu	5a24aed1ac	nir/lower_io_to_vector: check centroid & sample when merge variable These qualifiers should be respected, since they result in different varying load code generation. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12537>	2021-09-09 06:00:58 +00:00
Rob Clark	b8b475ad4e	nir/lower_amul: Fix usage of nir_foreach_src() nir_foreach_src() bails after cb returns false for any src, which isn't the behavior we were looking for. Move the progress flag to the state struct instead, so we don't skip visiting some sources. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12732>	2021-09-06 15:58:05 +00:00
Rob Clark	5800fde1bb	nir/lower_amul: Handle load/store_global These need more than 24b. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12732>	2021-09-06 15:58:05 +00:00
Enrico Galli	9461fe5cf1	nir: Add CAN_REORDER to load_ubo_dxil Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12707>	2021-09-03 16:21:03 +00:00
Rhys Perry	41ecef7855	nir: add sdot_2x16 and udot_2x16 opcodes Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>	2021-09-03 13:21:27 +00:00
Rhys Perry	ae00f5af61	nir: separate lower_add_sat Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>	2021-09-03 13:21:27 +00:00
Timur Kristóf	33630090a2	nir: Add comment to explain the sad_u8x4 opcode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12649>	2021-09-01 08:42:03 +00:00
Emma Anholt	33182c555f	nir/nir_lower_uniforms_to_ubo: Set the explicit stride of the UBO 0 uniform. Normal UBOs have explicit strides on them, make our lowered one behave the same. Reviewed-by: Adam Jackson <ajax@redhat.com> Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12175>	2021-08-31 20:12:16 +00:00
Emma Anholt	01759d3fb2	nir: Set .driver_location for GLSL UBO/SSBOs when we lower to block indices. Without this, there's no way to match the UBO nir_variable declarations to the load_ubo intrinsics referencing their data. Reviewed-by: Adam Jackson <ajax@redhat.com> Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12175>	2021-08-31 20:12:16 +00:00
Timur Kristóf	548b383310	nir: Fix local_invocation_index upper bound for non-compute-like stages. The lowered LS and NGG stages use local_invocation_index and they can benefit from the unsigned upper bound because they can emit a less expensive integer multiplication instruction. This was working in the past, but accidentally borked by a refactor. Fossil DB changes on Sienna Cichlid: Totals from 956 (0.74% of 128647) affected shaders: CodeSize: 2354172 -> 2344712 (-0.40%) Instrs: 434359 -> 434327 (-0.01%) Latency: 1883949 -> 1876814 (-0.38%) InvThroughput: 762638 -> 757405 (-0.69%) Fossil DB changes on Sienna Cichlid (with NGGC enabled): Totals from 57873 (44.99% of 128647) affected shaders: CodeSize: 155844192 -> 155607064 (-0.15%) Instrs: 29799184 -> 29799152 (-0.00%) Latency: 130959764 -> 130814224 (-0.11%); split: -0.11%, +0.00% InvThroughput: 21100300 -> 20928635 (-0.81%); split: -0.81%, +0.00% Fixes: `8af6766062` Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>	2021-08-30 14:05:33 +00:00
Timur Kristóf	a25fd1787a	nir: Add unsigned upper bound for extract opcodes. This helps with some cases of extract, such as: - Emitting more optimal integer multiplications - Better address calculation - Possibly others Fossil DB results on Sienna Cichlid: Totals from 4064 (3.16% of 128647) affected shaders: VGPRs: 262040 -> 262032 (-0.00%) CodeSize: 28856648 -> 28811892 (-0.16%); split: -0.18%, +0.02% Instrs: 5370279 -> 5367827 (-0.05%); split: -0.08%, +0.04% Latency: 74230112 -> 74016671 (-0.29%); split: -0.29%, +0.01% InvThroughput: 12082532 -> 12036365 (-0.38%); split: -0.39%, +0.01% VClause: 108506 -> 108721 (+0.20%); split: -0.03%, +0.22% SClause: 217731 -> 216602 (-0.52%); split: -0.67%, +0.15% Copies: 265689 -> 270811 (+1.93%); split: -0.26%, +2.19% PreSGPRs: 201982 -> 204907 (+1.45%); split: -0.01%, +1.46% PreVGPRs: 236099 -> 236079 (-0.01%) Fossil DB results on Sienna Cichlid with NGGC enabled: Totals from 60375 (46.93% of 128647) affected shaders: VGPRs: 2212576 -> 2212568 (-0.00%) CodeSize: 180870420 -> 179684816 (-0.66%); split: -0.66%, +0.00% Instrs: 34386715 -> 34213682 (-0.50%); split: -0.51%, +0.01% Latency: 199676290 -> 198987998 (-0.34%); split: -0.35%, +0.00% InvThroughput: 32288299 -> 31736433 (-1.71%); split: -1.71%, +0.00% VClause: 621521 -> 621743 (+0.04%); split: -0.00%, +0.04% SClause: 900447 -> 899392 (-0.12%); split: -0.16%, +0.04% Copies: 3439529 -> 3445305 (+0.17%); split: -0.02%, +0.19% PreSGPRs: 2216297 -> 2219220 (+0.13%); split: -0.00%, +0.13% PreVGPRs: 1842887 -> 1842867 (-0.00%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>	2021-08-30 14:05:33 +00:00
Caio Marcelo de Oliveira Filho	10a03e30cf	nir: Allow Task/Mesh to lower compute system values Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>	2021-08-28 03:56:43 +00:00
Caio Marcelo de Oliveira Filho	4f52681a2d	nir: Don't lower Task/Mesh I/O to temporaries These won't work since a workgroup can span more than one thread, and the temporaries are not shared memory. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>	2021-08-28 03:56:43 +00:00
Caio Marcelo de Oliveira Filho	27697d5eb8	nir/divergence_analysis: Handle Task/Mesh shaders Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>	2021-08-28 03:56:42 +00:00
Caio Marcelo de Oliveira Filho	bf5f6add01	nir/lower_io: Identify Mesh output as arrayed Mesh shader outputs are either: - non-array builtins - array builtins that are either per-primitive or per-vertex - user-defined outputs that must be either per-primitive or per-vertex So we can identify any array output as "arrayed" for the purposes of I/O lowering. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>	2021-08-28 03:56:42 +00:00
Caio Marcelo de Oliveira Filho	cd394017c8	nir: Add per-primitive I/O intrinsics Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>	2021-08-28 03:56:42 +00:00
Caio Marcelo de Oliveira Filho	f95daad3a2	nir: Add a way to identify per-primitive variables Per-primitive is similar to per-vertex attributes, but applies to all fragments of the primitive without any interpolation involved. Because they are regular inputs and outputs, keep track in shader_info of which I/O is per-primitive so we can distinguish them after deref lowering. These fields can be used in combination with the regular `inputs_read`, `outputs_written` and `outputs_read`. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>	2021-08-28 03:56:42 +00:00
Caio Marcelo de Oliveira Filho	927584fa67	nir: Update documentation for location to mention Task/Mesh Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>	2021-08-28 03:56:42 +00:00
Filip Gawin	46f3582c6f	nir: fix ifind_msb_rev by using appropriate type As you can see, the comparison "x < 0" doesn't make sense if x is unsigned. Fixes: `a5747f8a` ("nir: add opcodes for *find_msb_rev and lowering ") Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12548>	2021-08-26 18:35:31 +00:00
Filip Gawin	9083e9a483	nir: fix shadowed variable in nir_lower_bit_size.c Fixes: `6d79298992` ("nir/lower_bit_size: fix lowering of {imul,umul}_high") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12527>	2021-08-26 18:04:22 +00:00
Lionel Landwerlin	a13e79843e	nir: prevent peephole from generating invalid NIR We can't append instructions following a return/halt instruction because the control flow helpers will modify the successor of the block containing the return/halt. And the NIR validator enforces that the return/halt must have the end of the function as successor. This tends to happen following lower_shader_calls lowering which inserts halts. This probably doesn't prevent the optimization, it'll just happen in one of the return shaders after the halt has been removed. v2: Move prev block ending check earlier in the function (Daniel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12506>	2021-08-25 11:38:21 +00:00
Samuel Pitoiset	cff106c4b6	nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(fabs(b), -a) and fmin(-fmax(b, a), b) -> fmin(-fabs(b), -a). fossils-db (Sienna Cichlid): Totals from 34 (0.02% of 150170) affected shaders: CodeSize: 388540 -> 387748 (-0.20%) Instrs: 74621 -> 74423 (-0.27%) Latency: 1039407 -> 1039011 (-0.04%) InvThroughput: 208364 -> 208150 (-0.10%) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12519>	2021-08-25 07:18:24 +02:00
Ian Romanick	a6db40605e	nir/algebraic: Add some extract optimizations These help quite a bit when vectored versions of SpvOpSDotKHR and friends are emitted as packed versions and then lowered. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Ian Romanick	839495efc6	nir/algebraic: Add lowering for dot_4x8 instructions v2: Fix copy-and-paste bugs in lowering patterns. v3: Add has_sudot_4x8 flag. Requested by Rhys. v4: Since the names of the opcodes changed from dp4 to dot_4x8, also change the names of the lowering helpers. Suggested by Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Ian Romanick	806cd2341c	nir/algebraic: Basic patterns for dot_4x8 v2: Add and modify patterns to let constant folding do better. v3: Remove '(is_not_zero)' from the patterns that try to combine addends. I honestly don't know why I had it there in the first place, and nothing in my deep git logs could help clue me in. Noticed by Alyssa. Remove patterns that detect open-coded udot_4x8. Suggested by Alyssa and Jason. Add missing sudot_4x8 patterns. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Ian Romanick	6c18a3b497	nir/opcodes: Add integer dot-product opcodes Six opcodes are added: sdot_4x8_iadd, udot_4x8_uadd, sudot_4x8_iadd, sdot_4x8_iadd_sat, udot_4x8_uadd_sat, and sudot_4x8_iadd_sat. These represent the combinations of integer dot-product and add that operate on packed source vectors. That is, the four 8-bit values for each vector are stored in a single 32-bit integer. Some hardware may prefer to operate on unpacked byte vectors. When such hardware comes to Mesa, we'll have to figure out how to name things. v2: Add nir_op_iudp4a and nir_op_iudp4a_sat instructions. These opcodes are not 2-source commutative. v3: Rename all opcodes to be more like some existing 4x8 opcodes. Suggested by Timur. Change type of packed vector sources to uint32, change types of constant folding variables to have explicit size, and delete some extra casts. All suggested by Jason. v4: Fix typo previously noticed by Alyssa but missed in v2. v5: Add has_sudot_4x8 flag. Requested by Rhys. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Ian Romanick	7d8bf7c167	nir/lower_bit_size: Support add_sat and sub_sat Without this, lowered saturating ALU instructions would only clamp to the range of the new type instead of the range of the old type. v2: Use nir_iclamp. Suggested by Jason. Use new u_{int,uint}N_{min,max}() helpers. Fixes: `090e282407` ("nir: Add a saturated unsigned integer add opcode") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Rhys Perry	3d228b6926	nir/gcm: pin some instructions which require uniform sources fossil-db (Sienna Cichlid, GCM enabled): Totals from 6192 (4.12% of 150170) affected shaders: VGPRs: 548392 -> 542040 (-1.16%) SpillSGPRs: 3702 -> 3990 (+7.78%); split: -0.54%, +8.32% CodeSize: 62418488 -> 62481516 (+0.10%); split: -0.07%, +0.17% MaxWaves: 70582 -> 71718 (+1.61%) Instrs: 11768497 -> 11795079 (+0.23%); split: -0.07%, +0.30% Latency: 445891848 -> 523561297 (+17.42%); split: -0.07%, +17.49% InvThroughput: 115675481 -> 121494913 (+5.03%); split: -0.09%, +5.12% VClause: 164914 -> 164934 (+0.01%); split: -0.05%, +0.06% SClause: 405991 -> 395302 (-2.63%); split: -2.64%, +0.00% Copies: 907216 -> 926429 (+2.12%); split: -1.11%, +3.23% Branches: 456373 -> 457478 (+0.24%); split: -0.13%, +0.38% PreSGPRs: 648030 -> 642953 (-0.78%); split: -0.88%, +0.10% PreVGPRs: 522425 -> 516355 (-1.16%); split: -1.16%, +0.00% Seems to affect Detroit: Become Human and Cyberpunk 2077. The Cyberpunk 2077 changes look like a fixed bug. At least some of the Detroit: Become Human changes could probably be removed with better divergence analysis. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12444>	2021-08-24 16:52:31 +00:00
Rhys Perry	884ac52eaa	nir: consider push constant loads as always dynamically uniform Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12444>	2021-08-24 16:52:31 +00:00
Daniel Schürmann	2cf164feb9	nir/opt_algebraic: optimize flrp(fadd, fadd, x) only if fadd are used_once Totals from 201 (0.13% of 150170) affected shaders: (GFX10.3) VGPRs: 13880 -> 13856 (-0.17%) CodeSize: 1517328 -> 1518124 (+0.05%); split: -0.04%, +0.10% MaxWaves: 3184 -> 3192 (+0.25%) Instrs: 285487 -> 285569 (+0.03%); split: -0.06%, +0.08% Latency: 7774066 -> 7780877 (+0.09%); split: -0.10%, +0.19% InvThroughput: 1936341 -> 1935287 (-0.05%); split: -0.07%, +0.02% SClause: 11446 -> 11448 (+0.02%); split: -0.01%, +0.03% Copies: 17500 -> 17506 (+0.03%); split: -0.51%, +0.55% Branches: 8174 -> 8180 (+0.07%); split: -0.13%, +0.21% PreVGPRs: 12507 -> 12427 (-0.64%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12061>	2021-08-24 16:10:30 +00:00
Daniel Schürmann	89a842b2b6	nir/loop_analyze: consider instruction cost of nir_op_flrp Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12061>	2021-08-24 16:10:30 +00:00
Rhys Perry	aeb1b4c30c	nir/lower_io: use nir_vector_insert_imm() This creates a single nir_op_vecn instead of a nir_op_vecn and several copies. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12469>	2021-08-24 10:35:19 +00:00
Samuel Pitoiset	f4b858e746	Revert "nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(b, -a)" This is wrong for negative values. This reverts commit `07cd30ca29`. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12515>	2021-08-24 08:58:38 +00:00
Samuel Pitoiset	07cd30ca29	nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(b, -a) Found with Cyberpunk 2077. fossils-db (GFX10.3): Totals from 128 (2.34% of 5465) affected shaders: CodeSize: 769720 -> 767656 (-0.27%); split: -0.27%, +0.00% Instrs: 145748 -> 145229 (-0.36%) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11604>	2021-08-23 17:53:38 +00:00
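The ifind_msb_rev fix in 46f3582c6f comes down to C typing. On an unsigned operand, "x < 0" is always false, so the negative-input branch is dead. The sketch below shows this with a hypothetical, non-reversed signed find-MSB. It assumes the usual signed semantics: return the highest bit that differs from the sign bit, or -1 for 0 and -1. It is not Mesa's constant-folding code.

```c
#include <stdint.h>

/* Hypothetical reference: highest bit index that differs from the sign bit,
 * or -1 when x is 0 or -1. */
static int ifind_msb_ref(int32_t x)
{
   /* For negative inputs, invert so the search looks for the highest 0 bit. */
   uint32_t v = x < 0 ? ~(uint32_t)x : (uint32_t)x;
   for (int bit = 31; bit >= 0; bit--) {
      if (v & (UINT32_C(1) << bit))
         return bit;
   }
   return -1;
}

/* The same logic with the source typed as unsigned: "x < 0" can never be
 * true, so negative inputs are treated as large positive values. */
static int ifind_msb_unsigned_bug(uint32_t x)
{
   uint32_t v = x < 0 ? ~x : x; /* compilers typically warn: always false */
   for (int bit = 31; bit >= 0; bit--) {
      if (v & (UINT32_C(1) << bit))
         return bit;
   }
   return -1;
}
```

For -2 (bit pattern 0xFFFFFFFE) the signed version returns 0. The unsigned version returns 31.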
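Commit 6c18a3b497 describes opcodes that take two packed 32-bit sources, four 8-bit lanes each, and add a 32-bit accumulator. The reference functions below are a sketch of that behavior as the commit message describes it. The function names are hypothetical and are not NIR's constant-folding implementation. The assumptions are:

- Lane 0 is the least significant byte.
- The non-saturating forms wrap modulo 2^32.
- The `_sat` form clamps only the final add, since the four products always fit in 32 bits.

The sdot_2x16/udot_2x16 opcodes from 41ecef7855 presumably follow the same idea with two 16-bit halves.

```c
#include <stdint.h>

/* Byte i (0 = least significant) of a packed vector, as signed or unsigned. */
static int32_t sbyte(uint32_t v, int i) { return (int8_t)(uint8_t)(v >> (8 * i)); }
static int32_t ubyte(uint32_t v, int i) { return (uint8_t)(v >> (8 * i)); }

/* Signed x signed dot product plus accumulator; wraps on overflow. */
static int32_t sdot_4x8_iadd_ref(uint32_t a, uint32_t b, int32_t c)
{
   uint32_t sum = (uint32_t)c;
   for (int i = 0; i < 4; i++)
      sum += (uint32_t)(sbyte(a, i) * sbyte(b, i));
   return (int32_t)sum;
}

/* Unsigned x unsigned dot product plus accumulator; wraps on overflow. */
static uint32_t udot_4x8_uadd_ref(uint32_t a, uint32_t b, uint32_t c)
{
   uint32_t sum = c;
   for (int i = 0; i < 4; i++)
      sum += (uint32_t)(ubyte(a, i) * ubyte(b, i));
   return sum;
}

/* Signed lanes of a times unsigned lanes of b (not commutative). */
static int32_t sudot_4x8_iadd_ref(uint32_t a, uint32_t b, int32_t c)
{
   uint32_t sum = (uint32_t)c;
   for (int i = 0; i < 4; i++)
      sum += (uint32_t)(sbyte(a, i) * ubyte(b, i));
   return (int32_t)sum;
}

/* Saturating form: accumulate in 64 bits, then clamp to the int32 range. */
static int32_t sdot_4x8_iadd_sat_ref(uint32_t a, uint32_t b, int32_t c)
{
   int64_t sum = c;
   for (int i = 0; i < 4; i++)
      sum += sbyte(a, i) * sbyte(b, i);
   if (sum > INT32_MAX)
      return INT32_MAX;
   if (sum < INT32_MIN)
      return INT32_MIN;
   return (int32_t)sum;
}
```

The sudot variant shows why these opcodes are not 2-source commutative. Swapping a and b changes which source's lanes are read as signed.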
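The failure fixed in 7d8bf7c167 can be reproduced in scalar C. The example assumes an 8-bit saturating add that has been widened to 32 bits. The 32-bit saturating add never saturates for 8-bit inputs, and converting the result back to 8 bits wraps, so the clamp must use the original type's range. This is a hypothetical illustration, not the nir_lower_bit_size code.

```c
#include <stdint.h>

/* 8-bit iadd_sat widened to 32 bits WITHOUT clamping to the old range:
 * the wide sum is exact, and narrowing back to 8 bits wraps around. */
static int8_t iadd_sat8_unclamped(int8_t a, int8_t b)
{
   int32_t wide = (int32_t)a + (int32_t)b;
   return (int8_t)wide;
}

/* Widened and then clamped to the range of the ORIGINAL 8-bit type. */
static int8_t iadd_sat8_lowered(int8_t a, int8_t b)
{
   int32_t wide = (int32_t)a + (int32_t)b;
   if (wide > INT8_MAX)
      wide = INT8_MAX;
   if (wide < INT8_MIN)
      wide = INT8_MIN;
   return (int8_t)wide;
}

/* Unsigned case: only the upper bound of the old type can be exceeded. */
static uint8_t uadd_sat8_lowered(uint8_t a, uint8_t b)
{
   uint32_t wide = (uint32_t)a + (uint32_t)b;
   return (uint8_t)(wide > UINT8_MAX ? UINT8_MAX : wide);
}
```

With inputs 100 and 100, the unclamped version returns -56 on two's-complement targets. The clamped version returns 127.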
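The nir_lower_amul fix in b8b475ad4e concerns an iterator that stops as soon as its callback returns false. The sketch below copies only that contract, using a hypothetical `foreach_src_sketch` over an array of ints rather than NIR's real `nir_foreach_src()`. It contrasts returning "made progress" from the callback with recording progress in a state struct.

```c
#include <stdbool.h>

typedef bool (*src_cb)(int *src, void *state);

/* Hypothetical iterator with the contract described in the commit:
 * stop visiting as soon as the callback returns false. */
static bool foreach_src_sketch(int *srcs, int count, src_cb cb, void *state)
{
   for (int i = 0; i < count; i++) {
      if (!cb(&srcs[i], state))
         return false;
   }
   return true;
}

struct lower_state {
   bool progress;
   int visited;
};

/* Buggy pattern: returning "did I change this src" ends the walk at the
 * first src that needs no change, so later srcs are never visited. */
static bool lower_src_buggy(int *src, void *data)
{
   struct lower_state *s = data;
   s->visited++;
   if (*src > 0) {
      *src = -*src;
      return true;
   }
   return false;
}

/* Fixed pattern: record progress in the state and always keep iterating. */
static bool lower_src_fixed(int *src, void *data)
{
   struct lower_state *s = data;
   s->visited++;
   if (*src > 0) {
      *src = -*src;
      s->progress = true;
   }
   return true;
}
```

With sources {0, 5, 7}, the buggy callback stops after the first element. The fixed one visits and rewrites all three.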
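Commits d1a2870f78 and 879a569884 replace ralloc's per-allocation header with a single intrusive list node per instruction. This sketch shows the general technique: malloc each object, link it into a list, unlink it on an individual free, and sweep the list at the end. The structs and helpers here are hypothetical stand-ins, not Mesa's `exec_node` or `nir_instr` API. A node of two pointers is 16 bytes on a typical 64-bit target, which matches the size the commit quotes for gc_node.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical intrusive list node: two pointers (16 bytes on LP64). */
struct gc_node {
   struct gc_node *prev, *next;
};

/* Circular list with a sentinel head. */
struct gc_list {
   struct gc_node head;
};

struct instr {
   struct gc_node gc_node;
   int opcode;
};

static void gc_list_init(struct gc_list *l)
{
   l->head.prev = l->head.next = &l->head;
}

/* malloc an instr and append it to the GC list. */
static struct instr *instr_create(struct gc_list *l, int opcode)
{
   struct instr *in = malloc(sizeof(*in));
   if (!in)
      return NULL;
   in->opcode = opcode;
   in->gc_node.prev = l->head.prev;
   in->gc_node.next = &l->head;
   l->head.prev->next = &in->gc_node;
   l->head.prev = &in->gc_node;
   return in;
}

/* Free one instr early: unlink it so the final sweep won't free it twice. */
static void instr_free(struct instr *in)
{
   in->gc_node.prev->next = in->gc_node.next;
   in->gc_node.next->prev = in->gc_node.prev;
   free(in);
}

/* Free every instr still on the list; returns how many were freed. */
static int gc_list_free_all(struct gc_list *l)
{
   int freed = 0;
   struct gc_node *n = l->head.next;
   while (n != &l->head) {
      struct gc_node *next = n->next;
      free((struct instr *)((char *)n - offsetof(struct instr, gc_node)));
      n = next;
      freed++;
   }
   gc_list_init(l);
   return freed;
}
```

Unlike ralloc, this scheme does no recursive parent/child bookkeeping. That is why an explicit `instr_free`-style call is needed wherever an instruction used to be released with `ralloc_free`.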
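Commit d1eae6f36b sets an ownership convention for indirects once they are no longer ralloc'd. A rewrite frees whatever the overwritten source owned, then stores a deep copy of the new source. The caller keeps, and must free, its own copy. The sketch uses a hypothetical `struct src` with an optional heap-allocated indirect. It is not the real `nir_src` layout or API.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical source: an index plus an optional indirect source. */
struct src {
   int index;
   struct src *indirect; /* heap-allocated and owned by this src, or NULL */
};

/* Release everything this src owns (recursively). */
static void src_free_indirect(struct src *s)
{
   if (s->indirect) {
      src_free_indirect(s->indirect);
      free(s->indirect);
      s->indirect = NULL;
   }
}

/* Deep copy: dest receives its own copy of any indirect chain. */
static bool src_copy(struct src *dest, const struct src *from)
{
   dest->index = from->index;
   dest->indirect = NULL;
   if (from->indirect) {
      dest->indirect = malloc(sizeof(*dest->indirect));
      if (!dest->indirect)
         return false;
      if (!src_copy(dest->indirect, from->indirect)) {
         free(dest->indirect);
         dest->indirect = NULL;
         return false;
      }
   }
   return true;
}

/* Rewrite: free what the old src owned, then copy the new one in.
 * The caller still owns new_src and is responsible for freeing it. */
static bool src_rewrite(struct src *slot, const struct src *new_src)
{
   src_free_indirect(slot);
   return src_copy(slot, new_src);
}
```

The copy makes a rewrite more expensive than a plain struct assignment. The commit accepts that cost because most rewrites go through the SSA-only fast path.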


3331 commits