fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 22:20:14 +01:00

Author	SHA1	Message	Date
Sviatoslav Peleshko	8f8cde4c60	intel/fs: Don't optimize DW1 MUL if it stores value to the accumulator Fixes: `a8b86459` ("i965/fs: Optimize a 1.0 -> a.") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9570 Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25710>	2023-12-19 13:32:23 +00:00
Kenneth Graunke	49b8ccbcdc	intel/fs: Drop opt_register_renaming() In the past, multiple writes to a single register were pretty common, but since we've transitioned to NIR, and leave the IR in SSA form for everything not captured in a phi-web, the pattern of generating new temporary registers at each step is a lot more common. This pass isn't nearly as useful now. Across fossil-db on Alchemist, this affects only 0.55% of shaders, which fall into two cases: - Coarse pixel shading pixel-X/Y setup. There are a few cases where we write a partial calculation into a register, then have a second instruction read that as a source and overwrite it as a destination. While we could use a temporary here, it doesn't actually help with register pressure at all, since there's the same amount of values live at both instructions regardless. So while this pass kicks in, it doesn't do anything useful. - Geometry shader control data bits (5 shaders total). We track masks for handling EndPrimitive in a single register across the program, and apparently in some cases can split the live range. However, it's a single register...only in geometry shaders...which use EndPrimitive. None of them appear to be in danger of spilling, either. So this tiny benefit doesn't seem to justify the cost of running the pass. So, just throw it out. It's not worth keeping. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26343>	2023-12-19 11:07:18 +00:00
Caio Oliveira	bfc953add7	intel/compiler: Use C helpers to access builtin types Remove usage of C++ static members as they are going to be removed. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26658>	2023-12-15 03:09:19 +00:00
Caio Oliveira	a8b2426419	intel/compiler: Use reference instead of pointer for fs_visitor Per Ian suggestion. Also clear up a few unnecessary casts around the code and use `s` for fs_visitor ("shader"). Note to include a reference in ntf we need to set it during initialization, so create an explicit mem_ctx for it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>	2023-12-12 19:36:14 +00:00
Caio Oliveira	4e5fcccd01	intel/compiler: Create and use nir_to_brw() function Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>	2023-12-12 19:36:14 +00:00
Caio Oliveira	38a42e5aa1	intel/compiler: Add ctor to fs_builder that just takes the shader Uses the dispatch_width from the shader (fs_visitor). This was not possible before because the dispatch_width was not part of backend_shader. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>	2023-12-12 19:36:14 +00:00
Caio Oliveira	cf730adc58	intel/compiler: Make fs_builder include fs_visitor and not the other way This will allow fs_builder have a reference to an fs_visitor (a "fs_shader" really), instead of a reference to a backend_shader. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>	2023-12-12 19:36:14 +00:00
Caio Oliveira	f5032c4d52	intel/compiler: Make fs_visitor not depend on fs_builder At this point this is more a header dependency due to inline functions, so shuffle them around. The end goal is to allow fs_builder have a reference to a fs_visitor (really a fs_shader). Note the header is still included, a later patch will move the includes to the call-sites. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>	2023-12-12 19:36:14 +00:00
Caio Oliveira	5b8ec015f2	intel/compiler: Don't use fs_visitor::bld in remaining places The remaining users can simply create a new builder at_end() if needed. In many places a new builder object is already being constructed, so just give more specific instructions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>	2023-12-12 19:36:14 +00:00
Caio Oliveira	c12460b01e	intel/compiler: Move NIR emission code to brw_fs_nir.cpp This is a preparation to reorganize NIR emission code. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>	2023-12-12 19:36:13 +00:00
Daniel Schürmann	1179d83a89	nir: remove info.fs.needs_all_helper_invocations Use info.uses_wide_subgroup_intrinsics instead. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26026>	2023-11-22 11:31:11 +01:00
Lionel Landwerlin	295734bf88	intel/fs: fix residency handling on Xe2 We're missing a few reg_unit() scaling when dealing with residency data. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26208>	2023-11-15 20:06:12 +00:00
Caio Oliveira	a9f95bf687	intel/compiler: Reuse same scheduler for all pre-RA scheduling modes Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	fcd025c1ce	intel/compiler: Remove is_tex() The current name doesn't cover all the tex related instructions and in all usages, we already have a switch statement to dispatch per instruction type, so is more natural to list the instructions we care there. In fs::is_send_from_grf() we can simply ignore them since the instructions are either lowered directly to SEND (Gfx7+) or use MRF (Gfx6-). With this change, the fs_inst::size_read() generated code gets simplified (the "tex" entries get added to the switch jump table in gcc) and the default case loses the conditional handling tex. This reduces shader compilation time, as illustrated by replaying fossils (tested on my TGL laptop): ``` // Rise of the Tomb Raider (N=13) Difference at 95.0% confidence -1.32231 +/- 0.0170138 -4.37605% +/- 0.0563054% (Student's t, pooled s = 0.0210159) // Cyberpunk 2077 (N=7) Difference at 95.0% confidence -3.64 +/- 0.114993 -2.95188% +/- 0.0932544% (Student's t, pooled s = 0.09873) ``` Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25721>	2023-11-10 15:43:31 +00:00
Caio Oliveira	40416850f1	intel/compiler: Re-enable opt_zero_samples() in many cases for Gfx12.5 The workaround applies specifically to Cube and Cube Arrays, so we can still apply the optimization for the others. Ideally we would like to pull opt_zero_samples logic into the lowering sends -- to avoid adding a bit to communicate between passes. However the texture coordinates for the LOGICAL backend instructions, which are a common target for the optimization, are combined into offsets over a single VGRF, so we can't easily identify the constant cases. The copy-prop pass make this more visible for opt_zero_samples. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>	2023-11-09 03:56:28 +00:00
Caio Oliveira	daeab51a62	intel/compiler: Re-enable opt_zero_samples() for Gfx7+ Inadvertently, because of a sequence of changes elsewhere, this pass ended up not having any effect: - Before Gfx5 the optimization is not applicable. - On Gfx5-6 it doesn't apply because sampler operations don't currently use LOAD_PAYLOAD, but write the MOVs directly. Not clear to me whether they ever did. - On Gfx7+ it doesn't apply anymore because the logical sampler operations are now lowered directly to SENDs, and the is_tex() check would skip them. Since the LOAD_PAYLOAD implementation applies for Gfx7+ only, rework the pass to work again by handling SEND instructions. To make the pass easier, the optimization will happen before opt_split_sends() so only one LOAD_PAYLOAD needs to be cared for. Update the code to accept BAD_FILE sources in addition to zeros; these are added in some cases as padding and effectively are don't care values, so we can assume they are zeros. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>	2023-11-09 03:56:28 +00:00
Caio Oliveira	ef8553082e	intel/compiler: Rework opt_split_sends to not rely/modify LOAD_PAYLOAD This is a preparation to (re-)enable opt_zero_samples(), which will reduce a SEND mlen before we split it. When that happens, opt_split_sends() won't be able to rely on the fact that mlen covers the entire LOAD_PAYLOAD. Since we are changing that, take the opportunity to also not modify the existing LOAD_PAYLOAD, just create two new ones with the exact set of sources. This allows the pass to be further simplified by iterating forward and not require live_variables analysis. The helper function was added so it can be used later for opt_zero_samples(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>	2023-11-09 03:56:28 +00:00
Caio Oliveira	e017bcae59	intel/compiler: Clarify the asserts in nir_load_workgroup_id lowering For Task/Mesh WorkgroupID is now lowered to WorkgroupIndex by the generic NIR pass, so we shouldn't hit this. We can now simplify the asserting code in emit_work_group_id_setup(). Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25977>	2023-11-08 17:18:36 -08:00
Kenneth Graunke	48f60f4c4b	intel/compiler: Convert the repclear shader to use send-from-GRF Sandybridge uses this code and needs MRFs, but all other platforms send from GRFs. Do that directly rather than relying on the MRF hack. Ivybridge and later also use SHADER_OPCODE_SEND directly rather than a virtual opcode that's handled in the generator, so we follow suit. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>	2023-10-30 23:03:23 +00:00
Kenneth Graunke	ef7d1b5f44	intel/compiler: Drop unused saturate handling in repclear shader We never set key->clamp_fragment_color when compiling the BLORP fast clear shaders. Besides, we were setting saturate on an FB write opcode, which...isn't even a thing. We would need it on the MOV, and weren't setting it there. So it wouldn't have even worked. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>	2023-10-30 23:03:23 +00:00
Kenneth Graunke	e6d9267d4f	intel/compiler: Delete repclear shader's special case for 1 color target This is basically just once through the loop but copy and pasted. One difference is that the single render target case used a headerless message, and the multiple render target case always used headers. Now we use headerless messages for the first render target, even in the multiple render target case. While we already have it set up for the other RTs, it's still 2 fewer registers to send. Minor improvement. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>	2023-10-30 23:03:23 +00:00
Kenneth Graunke	e6460fe66b	intel/compiler: Delete unused repclear shader uniform handling A long time ago, we used a uniform for the clear color. Back in 2014, we added support for using a flat input instead, as this was easier for Vulkan, but we left the option of using a uniform for OpenGL. Eventually nobody used the uniform approach anymore, but the compiler code to handle it remained. Drop the dead code. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>	2023-10-30 23:03:23 +00:00
Kenneth Graunke	b35f1fc910	intel/compiler: Delete unused emit_dummy_fs() This code is compiled out, but has been left in place in case we wanted to use it for debugging something. In the olden days, we'd use it for platform enabling. I can't think of the last time we did that, though. I also used to use it for debugging. If something was misrendering, I'd iterate through shaders 0..N, replacing them with "draw hot pink" until whatever shader was drawing the bad stuff was brightly illuminated. Once it was identified, I'd start investigating that shader. These days, we have frameretrace and renderdoc which are like, actual tools that let you highlight draws and replace shaders. So we don't need to resort to iterative driver hacks anymore. Again, I can't think of the last time I actually did that. So, this code is basically just dead. And it's using legacy MRF paths, which we could update...or we could just delete it. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>	2023-10-30 23:03:23 +00:00
Lionel Landwerlin	3f973a4f45	Revert "intel/fs: limit register flag interaction of FIND_LIVE_CHANNEL" This reverts commit `c9739e8912`. We don't have a full understanding of what is going on but reverting definitely fixes a hang. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `c9739e8912` ("intel/fs: limit register flag interaction of FIND_LIVE_CHANNEL") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9868 Tested-By: Valentin Geyer <trayshar@t-online.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25563>	2023-10-13 08:37:28 +03:00
Alyssa Rosenzweig	c39896b17b	nir: Use getters for nir_src::parent_* First, we need to give the parent_instr field a unique name to be able to replace with a helper. We have parent_instr fields for both nir_src and nir_def, so let's rename nir_src::parent_instr in preparation for rework. This was done with a combination of sed and manual fix-ups. Then we use semantic patches plus manual fixups: @@ expression s; @@ -s->renamed_parent_instr +nir_src_parent_instr(s) @@ expression s; @@ -s.renamed_parent_instr +nir_src_parent_instr(&s) @@ expression s; @@ -s->parent_if +nir_src_parent_if(s) @@ expression s; @@ -s.renamed_parent_if +nir_src_parent_if(&s) @@ expression s; @@ -s->is_if +nir_src_is_if(s) @@ expression s; @@ -s.is_if +nir_src_is_if(&s) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>	2023-10-10 04:58:05 -04:00
Ian Romanick	bac10ef4aa	intel/fs: Add DP4A to get_lowered_simd_width While working on cooperative matrix support, I noticed some invalid DP4A instructions being generated. dp4a(32) g33<1>UD g21<8,8,1>UD g1.0<0,1,0>UD g9<1,1,1>UD This violates the constraint that the destination or a source can only access two consecutive GRFs. I'm a little surprised that validation didn't catch this. Perhaps because it's a 3 source instruction? Either way, it seems like a bigger project to fix that. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Fixes: `0f809dbf40` ("intel/compiler: Basic support for DP4A instruction") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25554>	2023-10-07 02:27:53 +00:00
Caio Oliveira	81bc09bf97	intel/fs: Tweak default case of fs_inst::size_read() In the default case, there's a special case with a few conditions. Prefer the cheapest conditions first, so we can take advantage of short-circuiting. Effect is a small but still significant reduction in shader compilation times, as can be seen by: - Fossil replay for Rise of the Tomb Raider ``` Difference at 95.0% confidence -0.433333 +/- 0.028609 -1.42556% +/- 0.0941163% (Student's t, pooled s = 0.0337886) ``` - Fossil replay for Batman Arkham City ``` Difference at 95.0% confidence -8.84 +/- 0.146083 -1.65932% +/- 0.0274207% (Student's t, pooled s = 0.125423) ``` Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25549>	2023-10-06 09:16:56 +00:00
Sviatoslav Peleshko	8f23b45252	intel/fs: Fix "packed word exception" condition for register regioning Fixes: `a6bf5f88` ("i965/fs: Enforce common regioning restrictions by SIMD splitting.") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9432 Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25378>	2023-10-05 01:41:42 +00:00
Francisco Jerez	53d1d793cb	intel/fs: Delete manual 'inst->mlen' calculations from all uses of logical URB writes. Rework: * Marcin: update emit_urb_indirect_vec4_write Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>	2023-09-27 23:57:25 +00:00
Francisco Jerez	34a2c9ce35	intel/fs: Specify number of data components of logical URB writes via control immediate. This is what most logical SEND messages do when they take a variable number of components. 'inst->mlen' is expected to be zero for logical SEND opcodes, which are expected to behave like plain arithmetic operations, so certain automated transformations (like SIMD lowering) can manipulate them without opcode-specific special-casing. Guessing the number of components from 'inst->mlen' has other disadvantages, because it requires duplicating the logic that infers the message payload size in every use of the instruction -- Instead we can just do the computation once during logical send lowering. In addition on LNL platform this causes the 'inst->mlen' field of URB writes to have units inconsistent with every other SEND instruction, which is likely to lead to confusion and bugs down the road. Rework: * Marcin: update emit_urb_indirect_vec4_write Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>	2023-09-27 23:57:25 +00:00
Jordan Justen	c28539a2fe	intel/compiler: Use enum xe2_lsc_cache_load on xe2 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>	2023-09-27 23:57:25 +00:00
Ian Romanick	feec9166cd	intel/compiler/xe2: Handle new URB write messages Rework: * idr v1: Fix compilation error. * idr v2: Add support for per-channel offsets. * idr v3: get_lowered_simd_width is 16 on Xe2+. * idr v4: Add disassembly support. Add validation support. * Squashed in changes from Marcin Ślusarz's patches: * "intel/compiler: skip adding 0 to payload address" * "intel/compiler/xe2: drop masking off top 8 bits of URB handle" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>	2023-09-27 23:57:25 +00:00
Caio Oliveira	c487ba26ca	intel/compiler: Don't store stage name and abbrev Those are used in the failure paths and are easily retrievable from the stage itself, so no need to store them. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25367>	2023-09-26 18:12:53 -07:00
Caio Oliveira	1cdc4be14b	intel/compiler: Don't allocate memory for SIMD select error handling The position in the error array already indicate the SIMD in question, so take off all the formatted printing from the errors -- which in some cases were just not needed. We lose a little bit of extra context but it is all easily derivable from the message and the SIMD. This also will remove the overhead when SIMD selection is being used to just to find the selected dispatch width -- at a point where the shaders were already compiled -- and the errors are not used at all. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9849 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25336>	2023-09-22 16:23:02 +00:00
Jordan Justen	9846dd798b	intel/compiler: Update opt_split_sends() for Xe2 reg size Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 23:06:04 -07:00
Jordan Justen	727ab2c11d	intel/compiler/fs: Support Xe2 reg size in assign_curb_setup Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Caio Oliveira	8944ac7d6c	intel/fs/xe2+: Update BS payload setup for Xe2 reg size. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Francisco Jerez	14e1b9ee69	intel/fs/xe2+: Update TES payload setup for Xe2 reg size. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Ian Romanick	0b23df3951	intel/compiler/xe2: Update fs_visitor::setup_vs_payload to account for Xe2 reg size [ Francisco Jerez: Simplify. ] Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Francisco Jerez	a573531785	intel/compiler/xe2+: Represent dispatch_grf_start_reg in native GRF units. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Francisco Jerez	17ef5e7ead	intel/fs/xe2+: Allow increased SIMD width for various get_fpu_lowered_simd_width() restrictions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Francisco Jerez	421d43fe62	intel/fs/xe2+: Fixes for increased accumulator register width. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Francisco Jerez	bd98df5d8e	intel/compiler: Make MAX_VGRF_SIZE macro depend on devinfo and update it for Xe2. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Iván Briano	4eddeea7bf	intel/fs: handle URB setup for fast linked mesh pipelines Up until now, the mesh pipeline assumed it would be always linked to the fragment shader, and so the calculated MUE map would always be available. That is not the case for fast linked pipeline libraries, so the URB setup needs to account for this. We do this by replicating what's done for non-mesh pipelines, defining the URB based on the FS inputs, and always assuming they will be laid out in order of varying number, except that we also account for per-primitive attributes. Fixes all GPL using tests under dEQP-VK.mesh_shader.ext.smoke.* Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25047>	2023-09-12 02:51:31 +00:00
Lionel Landwerlin	c9739e8912	intel/fs: limit register flag interaction of FIND_*LIVE_CHANNEL Those instructions do not access the flag registers on Gfx8+. Removing the interaction enables CSE to remove more of those instructions. Results are a bit mixed (DG2 vulkan fossils): ACO: Totals from 127 (5.97% of 2128) affected shaders: Instrs: 139966 -> 138972 (-0.71%); split: -0.85%, +0.14% Cycles: 1685747 -> 1667480 (-1.08%); split: -2.35%, +1.26% Max live registers: 10582 -> 10544 (-0.36%) Max dispatch width: 1048 -> 1040 (-0.76%) Cyberpunk 2077: Totals from 2879 (27.95% of 10301) affected shaders: Instrs: 4264789 -> 4225666 (-0.92%); split: -1.01%, +0.09% Cycles: 72380209 -> 71619521 (-1.05%); split: -1.63%, +0.58% Subgroup size: 30624 -> 30632 (+0.03%) Spill count: 98 -> 101 (+3.06%) Fill count: 90 -> 93 (+3.33%) Scratch Memory Size: 8192 -> 9216 (+12.50%) Max live registers: 217807 -> 217098 (-0.33%); split: -0.59%, +0.26% Max dispatch width: 23792 -> 24112 (+1.34%) Gaining 40 SIMD16 shaders Rise Of The Tomb Raider: Totals from 622 (5.06% of 12289) affected shaders: Instrs: 437380 -> 434760 (-0.60%); split: -0.72%, +0.12% Cycles: 261843085 -> 261580703 (-0.10%); split: -0.73%, +0.63% Max live registers: 27731 -> 27766 (+0.13%); split: -1.01%, +1.14% Max dispatch width: 5832 -> 5432 (-6.86%); split: +0.27%, -7.13% Losing 26 SIMD32 shaders Strange Brigade: Totals from 1298 (31.48% of 4123) affected shaders: Instrs: 1504408 -> 1487968 (-1.09%); split: -1.17%, +0.08% Cycles: 20735976 -> 20443216 (-1.41%); split: -1.60%, +0.19% Max live registers: 89911 -> 89957 (+0.05%) DG2 shader-db run: total instructions in shared programs: 23130895 -> 23130036 (<.01%) instructions in affected programs: 260956 -> 260097 (-0.33%) helped: 234 HURT: 101 helped stats (abs) min: 1 max: 54 x̄: 6.36 x̃: 4 helped stats (rel) min: 0.05% max: 8.16% x̄: 2.01% x̃: 1.90% HURT stats (abs) min: 1 max: 37 x̄: 6.23 x̃: 3 HURT stats (rel) min: 0.02% max: 5.67% x̄: 0.89% x̃: 0.55% 95% mean confidence interval for instructions value: -3.62 -1.51 95% mean confidence interval for instructions %-change: -1.33% -0.94% Instructions are helped. total loops in shared programs: 6071 -> 6071 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 898610645 -> 898557166 (<.01%) cycles in affected programs: 18308201 -> 18254722 (-0.29%) helped: 315 HURT: 48 helped stats (abs) min: 1 max: 19312 x̄: 404.23 x̃: 128 helped stats (rel) min: 0.02% max: 28.98% x̄: 3.92% x̃: 2.65% HURT stats (abs) min: 2 max: 14478 x̄: 1538.60 x̃: 409 HURT stats (rel) min: <.01% max: 23.24% x̄: 3.34% x̃: 0.41% 95% mean confidence interval for cycles value: -333.68 39.03 95% mean confidence interval for cycles %-change: -3.51% -2.41% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 5964 -> 5964 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 6909 -> 6909 (0.00%) fills in affected programs: 0 -> 0 helped: 0 HURT: 0 total sends in shared programs: 1040266 -> 1040266 (0.00%) sends in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 3 GAINED: 1 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24553>	2023-09-06 14:47:40 +00:00
Kenneth Graunke	08fc4603dd	intel/fs: Dump IR for pre-RA scheduler modes in DEBUG_OPTIMIZER This lets us more easily compare and contrast the various scheduling options that the compiler considered. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>	2023-08-23 21:34:38 +00:00
Kenneth Graunke	07f2ad32e4	intel/fs: Pick the lowest register pressure schedule when spilling We try various pre-RA scheduler modes and see if any of them allow us to register allocate without spilling. If all of them spill, however, we left it on the last mode: LIFO. This is unfortunately sometimes significantly worse than other modes (such as "none"). This patch makes us instead select the pre-RA scheduling mode that gives the lowest register pressure estimate, if none of them manage to avoid spilling. The hope is that this scheduling will spill the least out of all of them. fossil-db stats (on Alchemist) speak for themselves: Totals: Instrs: 197297092 -> 195326552 (-1.00%); split: -1.02%, +0.03% Cycles: 14291286956 -> 14303502596 (+0.09%); split: -0.55%, +0.64% Spill count: 190886 -> 129204 (-32.31%); split: -33.01%, +0.70% Fill count: 361408 -> 225038 (-37.73%); split: -39.17%, +1.43% Scratch Memory Size: 12935168 -> 10868736 (-15.98%); split: -16.08%, +0.10% Totals from 1791 (0.27% of 668386) affected shaders: Instrs: 7628929 -> 5658389 (-25.83%); split: -26.50%, +0.67% Cycles: 719326691 -> 731542331 (+1.70%); split: -10.95%, +12.65% Spill count: 110627 -> 48945 (-55.76%); split: -56.96%, +1.20% Fill count: 221560 -> 85190 (-61.55%); split: -63.89%, +2.34% Scratch Memory Size: 4471808 -> 2405376 (-46.21%); split: -46.51%, +0.30% Improves performance when using XeSS in Cyberpunk 2077 by 90% on A770. Improves performance of Borderlands 3 by 1.54% on A770. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>	2023-08-23 21:34:38 +00:00
Kenneth Graunke	158ac265df	intel/fs: Make helpers for saving/restoring instruction order This moves a bit of code out of a large function, but also lets us reuse it a few extra places in the next commit. I opted to stop using ralloc here since this is short-lived data that doesn't need to stick around for the rest of the compile, and it's easy enough to free. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>	2023-08-23 21:34:38 +00:00
Kenneth Graunke	2dd56921c9	intel/fs: Index scheduler mode string table by mode enum pre_modes[] is an array with the modes ordered in our desired preference. scheduler_mode_name[] was also in that order, and the two had to be kept in sync. This is a little silly; we should just have a mode enum -> string table and look it up via the enum. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>	2023-08-23 21:34:38 +00:00
Kenneth Graunke	7eba19245d	intel/compiler: Move SCHEDULE_NONE handling into schedule_instructions() I'm going to introduce another call site for this function, and just handling SCHEDULE_NONE in the scheduler itself makes more sense than duplicating the logic. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>	2023-08-23 21:34:38 +00:00
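
Illustrative sketch for 2dd56921c9 and 07f2ad32e4 (not the code from those commits): the first describes looking up scheduler mode names through a table indexed by the mode enum, and the second describes falling back to the mode with the lowest register pressure estimate when every pre-RA mode spills. The enum values, the `Attempt` struct, and the function names below are hypothetical, and the preference order is assumed to be simply the order of the array passed in.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical scheduler modes; names are illustrative, not Mesa's.
enum class SchedMode { TopDown, NonLifo, None, Lifo, Count };

// Name table indexed directly by the enum value, so it does not have to be
// kept in sync with whatever order the modes are tried in.
static const char *sched_mode_name(SchedMode m)
{
   static const char *const names[] = { "top-down", "non-lifo", "none", "lifo" };
   const std::size_t i = static_cast<std::size_t>(m);
   assert(i < static_cast<std::size_t>(SchedMode::Count));
   return names[i];
}

// Result of trying one scheduling mode (hypothetical structure).
struct Attempt {
   SchedMode mode;
   unsigned pressure; // estimated register pressure for this schedule
   bool spills;       // whether register allocation would spill
};

// Attempts are given in preference order. Return the first mode that
// allocates without spilling; if all of them spill, return the mode with
// the lowest pressure estimate instead of the last one tried.
template <std::size_t N>
SchedMode pick_mode(const std::array<Attempt, N> &attempts)
{
   static_assert(N > 0, "need at least one attempt");
   SchedMode best = attempts[0].mode;
   unsigned best_pressure = std::numeric_limits<unsigned>::max();
   for (const Attempt &a : attempts) {
      if (!a.spills)
         return a.mode;
      if (a.pressure < best_pressure) {
         best_pressure = a.pressure;
         best = a.mode;
      }
   }
   return best;
}
```

With three spilling attempts whose estimates are 140, 120 and 150, `pick_mode` returns the second one rather than the last; `sched_mode_name` can then label it in debug output.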


663 commits