fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-16 14:08:07 +02:00

Author	SHA1	Message	Date
Caio Oliveira	0dd5378ffe	intel/compiler: Make scheduler classes take an external mem_ctx Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	04aa2df461	intel/compiler: Separate schedule_node temporary data Some fields in schedule_node will need to be reset each time they are used. The `cand_generation` needs to be back to zero, and both `unblocked_time` and `parent_count` need to be back to their initial values, which were pre-calculated. Rename the initial data fields and add new ones for the temporary data. Note the helper function is `per node` to allow it to "tag along" with existing loops. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	81594d0db1	intel/compiler: Move earlier scheduler code that is not mode-specific This will be useful later on when we reuse the same scheduler for multiple modes. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	73d4e4118a	intel/compiler: Tidy up code in scheduler related to reads_remaining - Just assert in functions we expect it to exist - Predicate usage with `!post_reg_alloc` to avoid suggesting there are more combinations. - Reuse an existing loop to call the count function. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	4f246cf4e7	intel/compiler: Merge child/latency arrays in schedule_node Values are used together, saves one pointer in schedule_node, reduces amount of reallocations when children count grows. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	e59a054203	intel/compiler: Move FS specific fields to fs_instruction_scheduler Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	a6297d05ca	intel/compiler: Remove virtual calls from scheduler Pull run() and schedule_instructions() for fs, and pull a very simplified version of those into a run() for vec4. Because of the previous patches the duplication is small. Since we are touching these, change run() implementations to use the cfg from the existing reference to the visitor/shader instead of taking one as argument. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	d76d58cf50	intel/compiler: Cache issue_time information Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	ecd7ffcf78	intel/compiler: Extract scheduling related basic functions Those will be used in multiple places later. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	8a8dd2db0c	intel/compiler: Add only available instructions to scheduling list The list was used for iterating through all instructions and then later also to track the available ones. Now that the array iteration is used, change how we fill it and rename it to reflect its only job. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	ddff6428c5	intel/compiler: Use array to iterate the scheduler nodes For all the preparation data collection before the scheduling actually happens, it is possible to walk the schedule nodes in order by iterating on the range of the array dedicated to a given block. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	fe6ac5a184	intel/compiler: Allocate all schedule_nodes at once Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	be012055da	intel/compiler: Remove reference to brw_isa_info from schedule_node It is always the same for all nodes, so use the one available in the scheduler itself. Also, per Matt's suggestion, collect is_haswell from devinfo instead of from a function argument. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	6987571737	intel/compiler: Use linear allocator in parts of brw_schedule_instructions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	fcd025c1ce	intel/compiler: Remove is_tex() The current name doesn't cover all the tex related instructions and in all usages, we already have a switch statement to dispatch per instruction type, so it is more natural to list the instructions we care about there. In fs::is_send_from_grf() we can simply ignore them since the instructions are either lowered directly to SEND (Gfx7+) or use MRF (Gfx6-). With this change, the fs_inst::size_read() generated code gets simplified (the "tex" entries get added to the switch jump table in gcc) and the default case loses the conditional handling tex. This reduces shader compilation time, as illustrated by replaying fossils (tested on my TGL laptop): ``` // Rise of the Tomb Raider (N=13) Difference at 95.0% confidence -1.32231 +/- 0.0170138 -4.37605% +/- 0.0563054% (Student's t, pooled s = 0.0210159) // Cyberpunk 2077 (N=7) Difference at 95.0% confidence -3.64 +/- 0.114993 -2.95188% +/- 0.0932544% (Student's t, pooled s = 0.09873) ``` Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25721>	2023-11-10 15:43:31 +00:00
Francisco Jerez	073b876539	intel/fs/xe2+: Don't special case SEL_EXEC in inferred_exec_pipe(). This is lowered to 32-bit integer execution type by the regioning lowering pass now, so the existing special casing is redundant for Gfx12 and buggy for Xe2+, since SEL_EXEC is now emitted without lowering for 64-bit integers. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>	2023-11-08 23:17:42 -08:00
Francisco Jerez	23e14a6c27	intel/eu/xe2+: Add definition for size of GRF space on Xe2. And use it in various places in the compiler that require knowledge about the size of the register file. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>	2023-11-08 23:17:24 -08:00
Francisco Jerez	ff3814abdd	intel/fs/xe2+: Handle extended math instructions as in-order in SWSB pass. Extended math instructions are now synchronized as in-order instructions like other ALU operations, which is more efficient than the out-of-order tracking we had to do in previous generations, and avoids false dependencies introduced due to SBID aliasing. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>	2023-11-08 23:17:12 -08:00
Francisco Jerez	5fb6760f11	intel/fs/xe2+: Teach SWSB pass about the behavior of double precision instructions. Xe2 hardware has a "long" EU pipeline specifically for FP64 instructions, so these are handled as in-order instructions which require RegDist synchronization. 64-bit integer instructions are now handled by the normal integer pipeline, so the existing special-casing inherited from ATS needs to be disabled. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>	2023-11-08 23:17:03 -08:00
Francisco Jerez	9e446c9282	intel/fs/xe2+: Add comment reminding us to take advantage of the 32 SBID tokens. The additional SBID tokens will be useful when large GRF mode is implemented. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>	2023-11-08 23:16:54 -08:00
Francisco Jerez	15d6c6ab11	intel/eu/xe2+: Add support for 10-bit SWSB representation on Xe2+ platforms. This implements the extended 10-bit encoding of the software scoreboard information used by Xe2 platforms. The new encoding is different enough that there are few opportunities for sharing code during translation to machine code, but the high-level tgl_swsb representation remains roughly the same. Among other changes the 10-bit SWSB format provides 5 bits worth of SBID tokens (though they're only usable in large GRF mode) instead of 4 bits, the extended math pipeline is handled as an in-order (RegDist) pipeline instead of as an out-of-order one, and the dual-argument encodings support additional combinations of RegDist and SBID synchronization modes. A new encoding is introduced for preventing the accumulator hardware scoreboard from being updated, but this is currently not needed. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>	2023-11-08 23:12:32 -08:00
Caio Oliveira	40416850f1	intel/compiler: Re-enable opt_zero_samples() in many cases for Gfx12.5 The workaround applies specifically to Cube and Cube Arrays, so we can still apply the optimization for the others. Ideally we would like to pull opt_zero_samples logic into the lowering sends -- to avoid adding a bit to communicate between passes. However the texture coordinates for the LOGICAL backend instructions, which are a common target for the optimization, are combined into offsets over a single VGRF, so we can't easily identify the constant cases. The copy-prop pass makes this more visible for opt_zero_samples. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>	2023-11-09 03:56:28 +00:00
Caio Oliveira	daeab51a62	intel/compiler: Re-enable opt_zero_samples() for Gfx7+ Inadvertently, because of a sequence of changes elsewhere, this pass ended up not having any effect: - Before Gfx5 the optimization is not applicable. - On Gfx5-6 it doesn't apply because sampler operations don't currently use LOAD_PAYLOAD, but write the MOVs directly. Not clear to me whether they ever did. - On Gfx7+ it doesn't apply anymore because the logical sampler operations are now lowered directly to SENDs, and the is_tex() check would skip them. Since the LOAD_PAYLOAD implementation applies for Gfx7+ only, rework the pass to work again by handling SEND instructions. To make the pass easier, the optimization will happen before opt_split_sends() so only one LOAD_PAYLOAD needs to be cared for. Update the code to accept BAD_FILE sources in addition to zeros, these are added in some cases as padding and effectively are don't care values, so we can assume them zeros. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>	2023-11-09 03:56:28 +00:00
Caio Oliveira	ef8553082e	intel/compiler: Rework opt_split_sends to not rely/modify LOAD_PAYLOAD This is a preparation to (re-)enable opt_zero_samples(), which will reduce a SEND mlen before we split it. When that happens, opt_split_sends() won't be able to rely on the fact that mlen covers the entire LOAD_PAYLOAD. Since we are changing that, take the opportunity to also not modify the existing LOAD_PAYLOAD, just create two new ones with the exact set of sources. This allows the pass to be further simplified by iterating forward and not require live_variables analysis. The helper function was added so it can be used later for opt_zero_samples(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>	2023-11-09 03:56:28 +00:00
Caio Oliveira	e017bcae59	intel/compiler: Clarify the asserts in nir_load_workgroup_id lowering For Task/Mesh WorkgroupID is now lowered to WorkgroupIndex by the generic NIR pass, so we shouldn't hit this. We can now simplify the asserting code in emit_work_group_id_setup(). Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25977>	2023-11-08 17:18:36 -08:00
Caio Oliveira	f4601d82c1	intel/compiler: Remove unused parameter from brw_nir_analyze_ubo_ranges() This parameter was used by i965 driver that is now gone. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25986>	2023-11-08 18:10:31 +00:00
Caio Oliveira	d2125dac85	intel/compiler: Take more precise params in brw_nir_optimize() Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25986>	2023-11-08 18:10:31 +00:00
Caio Oliveira	c4be90b4ba	intel/compiler: Remove unused parameter from brw_nir_adjust_payload() Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25986>	2023-11-08 18:10:31 +00:00
Alyssa Rosenzweig	cc3f20ca6c	nir: Also gather decomposed primitive count Simple extension. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26056>	2023-11-07 00:05:54 +00:00
Rohan Garg	2444a3cd46	intel/compiler: migrate WA 14013672992 to use WA framework Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26006>	2023-11-02 16:39:25 +00:00
Kenneth Graunke	fddad4d5f9	intel/compiler: Assert that FS_OPCODE_[REP_]FB_WRITE is for pre-Gfx7 We use SHADER_OPCODE_SEND directly instead of FS_OPCODE_FB_WRITE (for a while now) and FS_OPCODE_REP_FB_WRITE (since the previous commit). Assert that it isn't used on Gfx7+. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>	2023-10-30 23:03:23 +00:00
Kenneth Graunke	48f60f4c4b	intel/compiler: Convert the repclear shader to use send-from-GRF Sandybridge uses this code and needs MRFs, but all other platforms send from GRFs. Do that directly rather than relying on the MRF hack. Ivybridge and later also use SHADER_OPCODE_SEND directly rather than a virtual opcode that's handled in the generator, so we follow suit. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>	2023-10-30 23:03:23 +00:00
Kenneth Graunke	ef7d1b5f44	intel/compiler: Drop unused saturate handling in repclear shader We never set key->clamp_fragment_color when compiling the BLORP fast clear shaders. Besides, we were setting saturate on an FB write opcode, which...isn't even a thing. We would need it on the MOV, and weren't setting it there. So it wouldn't have even worked. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>	2023-10-30 23:03:23 +00:00
Kenneth Graunke	e6d9267d4f	intel/compiler: Delete repclear shader's special case for 1 color target This is basically just once through the loop but copy and pasted. One difference is that the single render target case used a headerless message, and the multiple render target case always used headers. Now we use headerless messages for the first render target, even in the multiple render target case. While we already have it set up for the other RTs, it's still 2 fewer registers to send. Minor improvement. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>	2023-10-30 23:03:23 +00:00
Kenneth Graunke	e6460fe66b	intel/compiler: Delete unused repclear shader uniform handling A long time ago, we used a uniform for the clear color. Back in 2014, we added support for using a flat input instead, as this was easier for Vulkan, but we left the option of using a uniform for OpenGL. Eventually nobody used the uniform approach anymore, but the compiler code to handle it remained. Drop the dead code. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>	2023-10-30 23:03:23 +00:00
Kenneth Graunke	b35f1fc910	intel/compiler: Delete unused emit_dummy_fs() This code is compiled out, but has been left in place in case we wanted to use it for debugging something. In the olden days, we'd use it for platform enabling. I can't think of the last time we did that, though. I also used to use it for debugging. If something was misrendering, I'd iterate through shaders 0..N, replacing them with "draw hot pink" until whatever shader was drawing the bad stuff was brightly illuminated. Once it was identified, I'd start investigating that shader. These days, we have frameretrace and renderdoc which are like, actual tools that let you highlight draws and replace shaders. So we don't need to resort to iterative driver hacks anymore. Again, I can't think of the last time I actually did that. So, this code is basically just dead. And it's using legacy MRF paths, which we could update...or we could just delete it. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>	2023-10-30 23:03:23 +00:00
Lionel Landwerlin	7cff4cc9c8	intel/fs: Xe2 fix for ExBSO on UGM Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> BSpec: 56890 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25506>	2023-10-27 10:58:12 +03:00
Yonggang Luo	43715516fc	treewide: Merge num_mesh_vertices_per_primitive and u_vertices_per_prim into mesa_vertices_per_prim Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25880>	2023-10-26 09:35:04 +00:00
Lionel Landwerlin	439b0e8688	intel/fs: fix dynamic interpolation mode selection We can end up in a situation where we are dispatched with a multisample framebuffer but not at per-sample. In this case we would request the at_sample value with the wrong message configuration. Relying on the BRW_WM_MSAA_FLAG_MULTISAMPLE_FBO flag supersedes BRW_WM_MSAA_FLAG_PERSAMPLE_DISPATCH. Fixes piglit tests : spec@arb_gpu_shader5@arb_gpu_shader5-interpolateatsample* With Zink on Anv Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `68027bd38e` ("intel/fs: implement dynamic interpolation mode for dynamic persample shaders") Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25854>	2023-10-25 21:15:48 +00:00
Caio Oliveira	b91ed68fa0	intel/compiler: Don't emit calls to validate() in release build While the fs_visitor::validate() implementation is empty in release build, we still emit calls to it since it is defined in a separate compilation unit than its callers. To fix this, just expose an inline empty function in the header for the release mode. Fossil run time differences in TGL laptop (difference at 95.0% confidence): ``` Rise of the Tomb Raider (Native) [n=7] -0.482857 +/- 0.010932 -1.60608% +/- 0.0363621% Cyberpunk 2077 (DXVK) [n=7] -0.987143 +/- 0.0904516 -0.82996% +/- 0.076049% Batman Arkham City (DXVK) [n=7] -7.74857 +/- 0.329561 -1.46298% +/- 0.0622231% ``` Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25847>	2023-10-24 21:10:35 +00:00
Dave Airlie	d6613deed9	intel-clc: avoid using spirv-linker. There is no real need to use the spirv-linker here at all, we can just read all the CL C files into one buffer, then compile that buffer in a single pass. This works around an issue seen with llvm17 and opaque pointers and the spirv linker. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25759>	2023-10-17 13:53:52 +10:00
Lionel Landwerlin	3f973a4f45	Revert "intel/fs: limit register flag interaction of FIND_LIVE_CHANNEL" This reverts commit `c9739e8912`. We don't have a full understanding of what is going on but reverting definitely fixes a hang. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `c9739e8912` ("intel/fs: limit register flag interaction of FIND_LIVE_CHANNEL") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9868 Tested-By: Valentin Geyer <trayshar@t-online.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25563>	2023-10-13 08:37:28 +03:00
Alyssa Rosenzweig	c39896b17b	nir: Use getters for nir_src::parent_* First, we need to give the parent_instr field a unique name to be able to replace with a helper. We have parent_instr fields for both nir_src and nir_def, so let's rename nir_src::parent_instr in preparation for rework. This was done with a combination of sed and manual fix-ups. Then we use semantic patches plus manual fixups: @@ expression s; @@ -s->renamed_parent_instr +nir_src_parent_instr(s) @@ expression s; @@ -s.renamed_parent_instr +nir_src_parent_instr(&s) @@ expression s; @@ -s->parent_if +nir_src_parent_if(s) @@ expression s; @@ -s.renamed_parent_if +nir_src_parent_if(&s) @@ expression s; @@ -s->is_if +nir_src_is_if(s) @@ expression s; @@ -s.is_if +nir_src_is_if(&s) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>	2023-10-10 04:58:05 -04:00
Iván Briano	54498937c5	intel/compiler: round f2f16 correctly for RTNE case v2: bcsel -> b2i32 (Ian) Fixes upcoming Vulkan CTS tests: dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_nostorage dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_vert dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_nostorage_vert dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_frag dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_nostorage_frag Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25281>	2023-10-09 23:37:52 +00:00
Ian Romanick	bac10ef4aa	intel/fs: Add DP4A to get_lowered_simd_width While working on cooperative matrix support, I noticed some invalid DP4A instructions being generated. dp4a(32) g33<1>UD g21<8,8,1>UD g1.0<0,1,0>UD g9<1,1,1>UD This violates the constraint that the destination or a source can only access two consecutive GRFs. I'm a little surprised that validation didn't catch this. Perhaps because it's a 3 source instruction? Either way, it seems like a bigger project to fix that. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Fixes: `0f809dbf40` ("intel/compiler: Basic support for DP4A instruction") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25554>	2023-10-07 02:27:53 +00:00
Caio Oliveira	81bc09bf97	intel/fs: Tweak default case of fs_inst::size_read() In the default case, there's a special case with a few conditions. Prefer the cheapest conditions first, so we can take advantage of short-circuiting. Effect is a small but still significant reduce in shader compilation times, as can be seen by: - Fossil replay for Rise of the Tomb Raider ``` Difference at 95.0% confidence -0.433333 +/- 0.028609 -1.42556% +/- 0.0941163% (Student's t, pooled s = 0.0337886) ``` - Fossil replay for Batman Arkham City ``` Difference at 95.0% confidence -8.84 +/- 0.146083 -1.65932% +/- 0.0274207% (Student's t, pooled s = 0.125423) ``` Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25549>	2023-10-06 09:16:56 +00:00
Sviatoslav Peleshko	8361cd4c4c	intel/eu/validate: Validate "packed word exception" stricter Fixes: `75b7f5a2` ("i965: Validate "Region Alignment Rules"") Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25378>	2023-10-05 01:41:42 +00:00
Sviatoslav Peleshko	8f23b45252	intel/fs: Fix "packed word exception" condition for register regioning Fixes: `a6bf5f88` ("i965/fs: Enforce common regioning restrictions by SIMD splitting.") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9432 Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25378>	2023-10-05 01:41:42 +00:00
Lionel Landwerlin	a25f96c00c	intel/fs: switch from SIMD 1 to 8 instructions surface/sampler rematerialization SIMD1 instructions are problematic because they are considered partial writes. This increases the liveness of the destination register written by those instructions. To workaround this we use UNDEF instructions to bound the liveness of the register. But this causes other issues like in this case : undef(1) vgrf2 mov(1) vgrf2, u4.0 add(1) vgrf3, vgrf2.0, 64UD In this case the copy propagation pass is unable to see that vgrf2 in the add() instruction can be replaced with the uniform u4.0. To fix this problem, we switch to NoMask SIMD8 instructions that cover the entire register. We can drop the UNDEF instructions and now copy propagation can do its job. Good results on 2 apps : Cyberpunk 2077 : Totals from 7258 (68.80% of 10549) affected shaders: Instrs: 6332210 -> 6073833 (-4.08%); split: -4.11%, +0.03% Cycles: 130667501 -> 127351268 (-2.54%); split: -3.12%, +0.58% Subgroup size: 90320 -> 90400 (+0.09%) Spill count: 90 -> 68 (-24.44%) Fill count: 82 -> 64 (-21.95%) Scratch Memory Size: 8192 -> 6144 (-25.00%) Max live registers: 385464 -> 375152 (-2.68%) Max dispatch width: 64336 -> 64424 (+0.14%); split: +0.96%, -0.82% Gaining 60 SIMD16/SIMD32 shaders, losing 33 Strange Brigade : Totals from 2137 (53.12% of 4023) affected shaders: Instrs: 1544031 -> 1457544 (-5.60%); split: -5.60%, +0.00% Cycles: 22292564 -> 21868978 (-1.90%); split: -2.43%, +0.53% Subgroup size: 25328 -> 25344 (+0.06%) Max live registers: 113716 -> 111214 (-2.20%) Max dispatch width: 17232 -> 18608 (+7.99%); split: +8.36%, -0.37% Gaining 138 SIMD16/SIMD32 shaders, losing 4 One app slightly negatively affected : Dota2 : Totals from 232 (14.73% of 1575) affected shaders: Instrs: 30029 -> 28194 (-6.11%) Cycles: 385155 -> 371422 (-3.57%); split: -3.59%, +0.02% Max live registers: 6792 -> 6780 (-0.18%) Max dispatch width: 2256 -> 2160 (-4.26%) Losing 6 SIMD32 shaders Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24554>	2023-09-29 10:46:47 +00:00
Lionel Landwerlin	d28f42f85d	intel/fs: handle add3 in surface/sampler rematerialization Some recent NIR changes started generating those instructions. We need to handle them to be able to rematerialize. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24554>	2023-09-29 10:46:47 +00:00


2830 commits