fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 18:00:13 +01:00

Author	SHA1	Message	Date
Ian Romanick	91f7e262e1	intel/fs: Silence unused parameter warning in filter_simd src/intel/compiler/brw_fs.cpp: In function ‘bool filter_simd(const nir_instr*, const void*)’: src/intel/compiler/brw_fs.cpp:8870:50: warning: unused parameter ‘_options’ [-Wunused-parameter] 8870 | filter_simd(const nir_instr *instr, const void *_options) | ~~~~~~~~~~~~~^~~~~~~~ Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7650>	2020-11-19 21:23:53 +00:00
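The warning above fires when a callback has to match a fixed signature but ignores one of its arguments. The sketch below shows two common ways to silence `-Wunused-parameter` in C; the commit message does not say which one Mesa's fix used, and `struct instr` and `instr_filter_cb` are hypothetical stand-ins for `nir_instr` and the NIR filter callback type. In C++ (brw_fs.cpp is C++), leaving the parameter unnamed also works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for nir_instr and a filter callback type. */
struct instr {
   int opcode;
};

typedef bool (*instr_filter_cb)(const struct instr *, const void *);

/* Option 1: cast the ignored parameter to void. */
static bool
filter_cast_void(const struct instr *instr, const void *options)
{
   (void)options; /* intentionally unused; silences -Wunused-parameter */
   return instr->opcode == 1;
}

/* Option 2: mark the parameter with the GCC/Clang "unused" attribute. */
static bool
filter_attr(const struct instr *instr,
            const void *options __attribute__((unused)))
{
   return instr->opcode == 1;
}
```

Both functions still match `instr_filter_cb`, so they can be passed wherever the fixed callback signature is required.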
Kenneth Graunke	31290f9806	intel/fs: Fix sampler message headers on Gen11+ when using scratch Icelake's sampler message header introduces a field in m0.3 bit 0 which controls whether the sampler state pointer should be relative to bindless sampler state base address or dynamic state base address. g0.3 bit 0 is part of the per-thread scratch space field. On older hardware, we were able to copy that along because the sampler ignored bits 4:0. Now, however, we need to mask them out. Fixes various textureGatherOffsets piglit tests when forcing the FS to run with 2048 bytes of per-thread scratch space (which is a per-thread scratch space encoding of 1, meaning bit 0 will be set). Cc: mesa-stable Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6735>	2020-11-18 23:32:09 +00:00
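A minimal sketch of the masking described above, assuming (as the message implies) that the sampler state pointer lives above bit 4 of DWord 3, so clearing bits 4:0 of the copied g0.3 value is enough to keep per-thread scratch bits out of the Gen11+ base-address selector in bit 0. The helper name and mask macro are hypothetical, not Mesa's code.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 4:0 of DWord 3. Per the commit message, g0.3 carries per-thread
 * scratch space bits here, while on Gen11+ m0.3 bit 0 selects which base
 * address the sampler state pointer is relative to. */
#define HDR_DW3_LOW_BITS_MASK 0x1fu

/* Hypothetical helper: derive m0.3 of a sampler header from g0.3. */
static uint32_t
sampler_header_dw3_from_g0(uint32_t g0_dw3)
{
   return g0_dw3 & ~HDR_DW3_LOW_BITS_MASK;
}
```

With 2048 bytes of per-thread scratch (encoding 1), g0.3 bit 0 is set; after masking, bit 0 of the header is clear again.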
Caio Marcelo de Oliveira Filho	b3daf341d4	intel/fs: Add assert on the brw_STAGE_prog_data downcasts Motivation is to detect earlier certain bugs that can occur when missing a check for the stage before using the downcast. Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7540>	2020-11-16 12:40:59 -09:00
Jason Ekstrand	e9caba6ce5	intel/fs: Fix use of undefined value in fixup_nomask_control_flow Fixes: `a8ac0bd759` "intel/fs/gen12: Workaround unwanted SEND execution..." Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7536>	2020-11-11 17:42:47 +00:00
Caio Marcelo de Oliveira Filho	d372abe397	intel/fs: Add surface OWORD BLOCK opcodes Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7448>	2020-11-04 20:24:48 +00:00
Caio Marcelo de Oliveira Filho	d3d2b73fa3	intel/fs: Add A64 OWORD BLOCK opcodes Based on a patch for OWORD BLOCK READ from Jason Ekstrand. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7448>	2020-11-04 20:24:48 +00:00
Caio Marcelo de Oliveira Filho	e7e24d5039	intel/fs: Handle nir_intrinsic_terminate For terminate operation, jump the invocation without predicating on the rest of the quad being disabled -- which is what is done for demote and discard. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7150>	2020-10-15 21:40:09 +00:00
Kenneth Graunke	341f5bffb7	intel/compiler, anv: Delete cs_prog_data->slm_size cs_prog_data->slm_size is basically redundant with prog_data->total_shared, which is the field that we actually use for controlling the shared local memory size in all drivers. We were still using it in one place for VK_EXT_pipeline_executable_properties, but we should just fix that and delete the field. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7152>	2020-10-14 23:13:41 +00:00
Jason Ekstrand	06ebf23283	intel/fs: Add a SCRATCH_HEADER opcode This opcode is responsible for setting up the buffer base address and per-thread scratch space fields of a scratch message header. For the most part, it's a copy of g0 but some messages need us to zero out g0.2 and the bottom bits of g0.5. This may actually fix a bug when nir_load/store_scratch is used. The docs say that the DWORD scattered messages respect the per-thread scratch size specified in gN.3[3:0] in the message header but we've been leaving it zero. This may mean that we've been ignoring any scratch reads/writes from a load/store_scratch intrinsic above the 1KB mark. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>	2020-10-13 21:59:27 +00:00
Jason Ekstrand	24b64c8408	intel/fs: Copy the PTSS from g0 for scratch reads/writes In theory, this fixes a bug where we were dropping the PTSS bound on the floor. The hardware docs claim that the A32 DWORD and BYTE scattered read/write messages do a PTSS bounds check. However, in practice, it seems that the hardware ignores the bounds check so this doesn't actually matter. I verified this with the following couple of piglit tests: https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/399 In practice, this prevents the next commit from making a subtle behavioral change. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>	2020-10-13 21:59:27 +00:00
Jason Ekstrand	3d22de05ca	intel/fs: Add an option to use dataport messages for UBOs Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3932>	2020-10-08 01:17:06 -05:00
Jason Ekstrand	0d462dbee5	intel/fs: Add an alignment to VARYING_PULL_CONSTANT_LOAD_LOGICAL Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3932>	2020-10-08 01:14:46 -05:00
Marcin Ślusarz	9c25689287	intel: drop likely/unlikely around INTEL_DEBUG It's included in declaration of INTEL_DEBUG. Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6732>	2020-10-06 18:43:07 +00:00
Danylo Piliaiev	77486db867	intel/fs: Disable sample mask predication for scratch stores Scratch stores are lowered to instructions with side effects; however, they should be enabled in FS helper invocations, since they are produced from operations which don't imply side effects. To fix this, we move the decision of whether sample mask predication is enabled to the point where logical brw instructions are created. GLSL example of the issue: int tmp[1024]; ... do { // changes to tmp } while (some_condition(tmp)) If `tmp` is lowered to scratch memory, `some_condition` would be undefined if the scratch write is predicated on the sample mask, making it possible for the while loop to become infinite and hang the GPU. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3256 Fixes: `53bfcdeecf` Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6056>	2020-09-25 09:48:06 +00:00
Marcin Ślusarz	64b0b7c274	intel/compiler: fix typo in a comment Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6602>	2020-09-04 17:38:25 +00:00
Marcin Ślusarz	95ce619680	intel/compiler: print dispatch width when shader fails to compile Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6602>	2020-09-04 17:38:25 +00:00
Marcin Ślusarz	d4c6e3f196	intel/compiler: use the same name for nir shaders in brw_compile_* functions Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6602>	2020-09-04 17:38:25 +00:00
Jason Ekstrand	91becd84ae	intel/fs: Add support for a new load_reloc_const intrinsic Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>	2020-09-02 19:48:44 +00:00
Jason Ekstrand	90b6745bc8	intel/fs,vec4: Stuff the constant data from NIR in the end of the program Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>	2020-09-02 19:48:44 +00:00
Karol Herbst	70cbddc4a7	nir: use enum operator helper for nir_variable_mode and nir_metadata those are used quite a bit Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6520>	2020-09-01 17:45:08 +00:00
Jason Ekstrand	4d18e71fea	nir: Rename num_shared to shared_size This one is always a size in bytes. Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6524>	2020-09-01 17:30:51 +00:00
Jason Ekstrand	003b04e266	intel/compiler: Allow MESA_SHADER_KERNEL Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6280>	2020-08-12 10:11:06 +00:00
Mark Janes	cf52b40fb0	intel/fs: work around gen12 lower-precision source modifier limitation GEN:BUG:1604601757 prevents source modifiers for multiplication of lower precision integers. lower_mul_dword_inst() splits 32x32 multiplication into 32x16, and needs to eliminate source modifiers in this case. Closes: #3329 Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2020-08-10 13:30:45 -07:00
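The split mentioned above rests on a simple identity: writing `b = (b_hi << 16) | b_lo`, the low 32 bits of `a * b` equal `a * b_lo + ((a * b_hi) << 16)` modulo 2^32. The sketch below checks that arithmetic in plain C; it is not Mesa's `lower_mul_dword_inst()`. The negate helper models one hypothetical way to drop a source modifier, by resolving it into a temporary before splitting; the commit does not say exactly how the modifier is eliminated.

```c
#include <assert.h>
#include <stdint.h>

/* 32x16 multiply: full 32-bit a times a 16-bit b, low 32 bits kept. */
static uint32_t
mul_32x16(uint32_t a, uint16_t b)
{
   return a * (uint32_t)b;
}

/* Low 32 bits of a 32x32 product from two 32x16 multiplies:
 *   a * b == a * b_lo + ((a * b_hi) << 16)   (mod 2^32) */
static uint32_t
mul_32x32_lo(uint32_t a, uint32_t b)
{
   const uint16_t b_lo = (uint16_t)(b & 0xffffu);
   const uint16_t b_hi = (uint16_t)(b >> 16);
   return mul_32x16(a, b_lo) + (mul_32x16(a, b_hi) << 16);
}

/* Hypothetical: a negate modifier on src0 is resolved into a temporary
 * first (a MOV in real code), so neither split MUL carries a modifier. */
static uint32_t
mul_neg_src0_lo(uint32_t a, uint32_t b)
{
   const uint32_t neg_a = 0u - a;
   return mul_32x32_lo(neg_a, b);
}
```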
Jason Ekstrand	2956d53400	nir: Add nir_foreach_shader_in/out_variable helpers Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>	2020-07-29 17:38:57 +00:00
Jordan Justen	8dfa072ed8	intel/compiler/fs: Still attempt simd32 when INTEL_DEBUG=no16 is used If INTEL_DEBUG=no16 is used, then simd16 will not be attempted. This, in turn, prevents simd32 from running, because we attempt to skip simd32 when simd16 fails to compile. This change more accurately recognizes when we attempted simd16 but simd16 failed. One easy way to cause an issue is to set both no8 and no16. Before this change, we would be left with no FS program, even though simd32 could still be generated in some cases. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5269>	2020-07-09 15:44:57 -07:00
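A simplified, hypothetical model of the distinction this commit draws: SIMD32 should be skipped only when SIMD16 was *attempted and failed*, not merely when no SIMD16 program exists. The `ok8`/`ok16`/`ok32` inputs stand in for real compile attempts, and the model ignores the SIMD32 performance heuristic, `do32`, and hardware restrictions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the INTEL_DEBUG width flags. */
struct simd_flags {
   bool no8, no16, no32;
};

struct simd_result {
   bool built8, built16, built32;
};

/* ok8/ok16/ok32 stand in for "a compile at this width would succeed". */
static struct simd_result
pick_fs_widths(struct simd_flags f, bool ok8, bool ok16, bool ok32)
{
   struct simd_result r = { false, false, false };

   if (!f.no8)
      r.built8 = ok8;

   const bool tried16 = !f.no16;
   if (tried16)
      r.built16 = ok16;

   /* Skip SIMD32 only if SIMD16 was attempted and failed. Testing just
    * "!r.built16" would also skip it when SIMD16 was never attempted. */
   const bool simd16_failed = tried16 && !r.built16;
   if (!f.no32 && !simd16_failed)
      r.built32 = ok32;

   return r;
}
```

With `no8` and `no16` both set, this model still produces a SIMD32 program, which is the case the commit message calls out.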
Jordan Justen	1a4a2f563b	intel/compiler/cs: Allow simd32 in some more cases with no8 and/or no16 If no16 was specified, and the shader can't run in simd8 due to the local_size, then we need to generate a simd32 program. If both no8 and no16 are specified, then we need to generate a simd32 program. Rework: * Drop update of `if` that would have changed `do32` to try simd32 even if simd16 spilled registers. (Caio) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5269>	2020-07-09 15:44:34 -07:00
Jason Ekstrand	479797e130	intel/fs: Move more prog_data setup into populate_wm_prog_data Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>	2020-06-23 17:43:53 +00:00
Jason Ekstrand	fc519cad57	intel/fs: Break wm_prog_data setup into a helper Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>	2020-06-23 17:43:53 +00:00
Jason Ekstrand	2687ec5ee6	intel/fs: Expose a couple of NIR lowering helpers Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>	2020-06-23 17:43:53 +00:00
Arcady Goldmints-Orlov	04f77595f0	intel/compiler: Always apply sample mask on Vulkan. With OpenGL, shader writes to the sample mask are ignored when not rendering to a multisample render target. However, on Vulkan, writes to the sample mask still have their effect in that case. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3016 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5156>	2020-06-19 20:24:11 -05:00
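The API difference described above reduces to a single condition. This is a hypothetical helper for illustration only; the enum and function names are not Mesa's.

```c
#include <assert.h>
#include <stdbool.h>

enum api { API_OPENGL, API_VULKAN };

/* Hypothetical: does a shader write to the sample mask take effect?
 * OpenGL ignores it unless the render target is multisampled;
 * Vulkan always honors it. */
static bool
apply_shader_sample_mask(enum api api, bool multisample_target)
{
   return api == API_VULKAN || multisample_target;
}
```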
Sagar Ghuge	a0ef4971d0	intel/compiler: Remove unnecessary optimization for MUL Two-source instructions only support an immediate in the src1 operand, so there is no point in adding an optimization condition for the src0 operand. v2: - Update commit message and don't remove ADD optimization (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5341>	2020-06-16 17:11:32 -07:00
Sagar Ghuge	d4f3f9390f	intel/compiler: Optimize integer add with 0 into mov Kaby Lake total instructions in shared programs: 326560 -> 323616 (-0.90%) instructions in affected programs: 178062 -> 175118 (-1.65%) helped: 129 HURT: 0 helped stats (abs) min: 1 max: 118 x̄: 22.82 x̃: 8 helped stats (rel) min: 0.35% max: 6.56% x̄: 2.57% x̃: 2.47% 95% mean confidence interval for instructions value: -27.71 -17.93 95% mean confidence interval for instructions %-change: -2.81% -2.32% Instructions are helped. total cycles in shared programs: 43741127 -> 45397851 (3.79%) cycles in affected programs: 40880261 -> 42536985 (4.05%) helped: 94 HURT: 34 helped stats (abs) min: 5 max: 6160 x̄: 598.91 x̃: 45 helped stats (rel) min: 0.20% max: 34.86% x̄: 2.52% x̃: 1.09% HURT stats (abs) min: 1 max: 76198 x̄: 50383.00 x̃: 69677 HURT stats (rel) min: 0.07% max: 48.41% x̄: 15.65% x̃: 6.49% 95% mean confidence interval for cycles value: 8023.10 17863.21 95% mean confidence interval for cycles %-change: <.01% 4.60% Cycles are HURT. total spills in shared programs: 1086 -> 978 (-9.94%) spills in affected programs: 897 -> 789 (-12.04%) helped: 24 HURT: 0 total fills in shared programs: 1686 -> 1584 (-6.05%) fills in affected programs: 1371 -> 1269 (-7.44%) helped: 24 HURT: 0 v2: - Use brw_reg_type_is_integer (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5341>	2020-06-16 16:54:27 -07:00
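A minimal sketch of the "integer add with 0 into mov" rewrite on a hypothetical, heavily simplified IR (not brw's `fs_inst`). Following the MUL commit above, the immediate is assumed to sit in src1. Restricting the rewrite to integer types matters: in IEEE floating point, `-0.0 + 0.0` is `+0.0`, so `x + 0.0` is not an identity for every `x`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified IR: just enough to show the rewrite. */
enum opcode { OP_MOV, OP_ADD, OP_MUL };
enum reg_type { TYPE_D, TYPE_UD, TYPE_F };

struct src {
   bool is_imm;
   int  imm;   /* meaningful only when is_imm */
   int  reg;   /* register number when !is_imm */
};

struct inst {
   enum opcode op;
   enum reg_type type;
   struct src src[2];
};

static bool
type_is_integer(enum reg_type t)
{
   return t == TYPE_D || t == TYPE_UD;
}

/* Rewrite "ADD dst, x, 0" into "MOV dst, x" for integer types only.
 * Returns true if the instruction was changed. */
static bool
opt_add_zero(struct inst *inst)
{
   if (inst->op != OP_ADD || !type_is_integer(inst->type))
      return false;
   if (!inst->src[1].is_imm || inst->src[1].imm != 0)
      return false;
   inst->op = OP_MOV;
   inst->src[1] = (struct src){ .is_imm = false, .imm = 0, .reg = 0 };
   return true;
}
```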
Matt Turner	66111bc95a	intel/compiler: Drop opt_sampler_eot() Gen9 and Cherryview have the ability to mark texture instructions with the End-of-thread bit under some conditions, which allows the texture result to be written to the render target directly, rather than returning to the EU. In order to handle overlapping primitives correctly, we have to use the 'sendc' instruction which stalls until other threads potentially writing to the same locations in the render target are retired. Unfortunately, this stall happens before the texture is sampled (rather than in parallel with the stall), so for some literal edge cases (like the diagonal edge between two triangles forming a rectangle) there can be a performance penalty. As a result, it's probably not a good idea to use this optimization in general. I had planned to leave it enabled only for BLORP, where we use rectangle primitives and are typically clearing/blitting an entire render target without any overlapping primitives, but I noticed that the optimization wasn't applied in some normal cases anyway. For example, in the piglit test tests/shaders/glsl-fs-texture2d-bias.shader_test it is applied to one BLORP-blit shader but not another due to some kind of mishandling of register types (the destination register type of the texture operation is UD while the color source of the render target write is F). Additionally, the instruction scheduler assumed that the combined texture and render target write operation took 0 cycles, leading to cycle estimates that are wildly inaccurate. Since the optimization was not implemented for SIMD32 and our decision whether to use the SIMD32 program is made by comparing the estimated performance with that of the SIMD16 shader, we wrongly threw out a bunch of SIMD32 programs that are likely profitable. 
total cycles in shared programs: 472807891 -> 473784245 (0.21%) cycles in affected programs: 108277 -> 1084631 (901.72%) helped: 0 HURT: 1290 total sends in shared programs: 998955 -> 1000245 (0.13%) sends in affected programs: 1400 -> 2690 (92.14%) helped: 0 HURT: 1290 LOST: 0 GAINED: 33 This patch shows no performance changes in Intel's Mesa performance CI. Given the problems, the lack of evidence that the pass improves performance, and the fact that the hardware feature was removed from subsequent GPU generations, I think that the pass is not valuable and should be removed. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5412>	2020-06-12 19:01:26 +00:00
Jason Ekstrand	94aa7997e4	intel/fs: Fix unused texture coordinate zeroing on Gen4-5 We were inserting the right number of MOVs but, thanks to the way we advanced msg_end earlier in the function, were often writing the zeros past the end of where we actually read in the register file. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5243>	2020-05-30 01:08:50 -05:00
Caio Marcelo de Oliveira Filho	90ec26a800	intel/fs: Generate multiple CS SIMD variants for variable group size This will make the GL drivers pick the right SIMD variant for a given group size set during dispatch. The heuristic implemented in brw_cs_simd_size_for_group_size() is the same as in brw_compile_cs(). The cs_prog_data::simd_size field was removed. The generated SIMD sizes are marked in a bitmask, which is already used via brw_cs_simd_size_for_group_size() by the drivers. When using a variable group size, it is OK if the larger SIMD shaders spill, since we need them for the cases where the smaller ones can't hold all the invocations. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5142>	2020-05-27 18:16:31 -07:00
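A simplified sketch of dispatch-time selection from a bitmask of compiled variants, assuming the rule "pick the narrowest compiled width whose `width * max_threads` covers the group". The bit layout and `pick_cs_simd_size()` are hypothetical; the real `brw_cs_simd_size_for_group_size()` mirrors `brw_compile_cs()` and may weigh other factors, such as spilling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout for the mask of compiled SIMD variants. */
#define SIMD8_BIT  (1u << 0)
#define SIMD16_BIT (1u << 1)
#define SIMD32_BIT (1u << 2)

/* Narrowest compiled variant that can hold the whole group in at most
 * max_threads hardware threads; 0 if none can. */
static unsigned
pick_cs_simd_size(uint32_t prog_mask, unsigned group_size,
                  unsigned max_threads)
{
   static const struct { uint32_t bit; unsigned width; } variants[] = {
      { SIMD8_BIT, 8 }, { SIMD16_BIT, 16 }, { SIMD32_BIT, 32 },
   };

   assert(max_threads > 0);
   for (unsigned i = 0; i < 3; i++) {
      if ((prog_mask & variants[i].bit) &&
          group_size <= variants[i].width * max_threads)
         return variants[i].width;
   }
   return 0;
}
```

This is why the wider variants must exist even if they spill: for large groups they are the only ones that fit.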
Caio Marcelo de Oliveira Filho	cb26d9c311	intel/fs: Add helper to get prog_offset and simd_size This indirection will be used by the variable group size case in a later change. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5142>	2020-05-27 18:16:31 -07:00
Caio Marcelo de Oliveira Filho	5b5e77caa7	intel/fs: Support INTEL_DEBUG=no8,no32 in compute shaders The "no32" flag will have precedence over "do32", as is done for FS. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5142>	2020-05-27 18:16:31 -07:00
Caio Marcelo de Oliveira Filho	10d0f39beb	intel/fs: Remove min_dispatch_width spilling decision from RA Move the decision one level up: let the brw_compile_*() functions use the spilling information to decide whether or not a certain width compilation can spill (passed via the run_*() functions). The min_dispatch_width was used to compare with the dispatch_width and decide whether "a previous shader is already available, so don't accept spill". This is replaced by: - Not calling run_*() functions if it is known beforehand that a smaller width already spilled, since the larger width will spill and fail; - Explicitly passing whether or not a shader is allowed to spill. For the cases where the smaller width is available and hasn't spilled, the larger width will be compiled but is only useful if it won't spill. Moving the decision to this level will be useful later for variable group size, which is a case where we want all the widths to be allowed to spill. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5142>	2020-05-27 18:16:31 -07:00
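The rules listed above can be summarized as a small decision function. This is a hypothetical model with invented names (`struct width_plan`, `plan_wider_width()`), not the actual `brw_compile_*()` code; the variable-group-size branch reflects the "later" use case the message anticipates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of planning one (wider) SIMD width. */
struct width_plan {
   bool compile;      /* call the run_*() step at all? */
   bool allow_spill;  /* may register allocation spill? */
};

static struct width_plan
plan_wider_width(bool have_narrower, bool narrower_spilled,
                 bool variable_group_size)
{
   /* Variable group size: any width may be needed at dispatch time,
    * so every width is compiled and allowed to spill. */
   if (variable_group_size)
      return (struct width_plan){ true, true };

   /* A narrower width already spilled; a wider one would spill too and
    * be rejected, so skip it entirely. */
   if (have_narrower && narrower_spilled)
      return (struct width_plan){ false, false };

   /* With a spill-free narrower shader available, the wider one is only
    * useful if it does not spill. */
   return (struct width_plan){ true, !have_narrower };
}
```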
Caio Marcelo de Oliveira Filho	8cc7711924	intel/fs: Remove redundant assert() This is covered by the two previous similar asserts. Each time `v` is assigned this is asserted. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5213>	2020-05-26 20:35:03 +00:00
Caio Marcelo de Oliveira Filho	462bc408fe	intel/fs: Early return when can't satisfy explicit group size Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5213>	2020-05-26 20:35:03 +00:00
Eric Engestrom	444138d6d9	tree-wide: fix deprecated GitLab URLs They will stop working in the next GitLab release, so let's update them ASAP to make sure things are propagated to everyone by then. See: https://about.gitlab.com/releases/2020/05/06/gitlab-com-13-0-breaking-changes/#removal-of-deprecated-project-paths Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Eric Engestrom <eric@engestrom.ch> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5111>	2020-05-23 15:33:50 +00:00
Caio Marcelo de Oliveira Filho	6a6c36e977	intel/fs: Use writes_memory from shader_info Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4815>	2020-05-18 21:09:17 +00:00
Caio Marcelo de Oliveira Filho	e645bc6939	intel: Let drivers call brw_nir_lower_cs_intrinsics() The motivating factor is: this lowering may cause nir_intrinsic_load_local_group_size intrinsics to be added to the shader, and by moving this around we make it possible for the drivers to lower that intrinsic by themselves. Iris will do just that in a later patch for implementing variable group size. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>	2020-05-01 12:50:37 -07:00
Caio Marcelo de Oliveira Filho	2663759af0	intel/fs: Add and use a new load_simd_width_intel intrinsic Add an intrinsic to get the SIMD width, which is not always the same as the subgroup size. Starting with a small scope (Intel), but we can rename it later to generalize if this turns out to be useful for other drivers. Change brw_nir_lower_cs_intrinsics() to use this intrinsic instead of a width passed as an argument. The pass also used to optimize load_subgroup_id for the case where the workgroup fits into a single thread (it is then constant zero). This optimization moved together with the lowering of the SIMD width. This is a preparation for letting the drivers call it before the brw_compile_cs() step. No shader-db changes in BDW, SKL, ICL and TGL. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>	2020-05-01 12:50:37 -07:00
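The `load_subgroup_id` optimization mentioned above follows from how invocations are packed into threads. This sketch assumes invocations are assigned to SIMD threads in local-invocation-index order, so the thread (subgroup) index is `index / simd_width`. Both helpers are hypothetical illustrations, not the NIR pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed packing: invocation i runs in thread i / simd_width. */
static unsigned
subgroup_id(unsigned local_invocation_index, unsigned simd_width)
{
   assert(simd_width > 0);
   return local_invocation_index / simd_width;
}

/* If the whole workgroup fits in one thread, every invocation sees
 * subgroup_id() == 0, so the intrinsic can be folded to a constant. */
static bool
subgroup_id_is_const_zero(unsigned workgroup_size, unsigned simd_width)
{
   return workgroup_size <= simd_width;
}
```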
Caio Marcelo de Oliveira Filho	0edb58a84e	intel/fs: Clean up variable group size handling in backend Just use the information from NIR shader_info. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>	2020-05-01 12:50:28 -07:00
Francisco Jerez	6579f562c3	intel/ir: Use brw::performance object instead of CFG cycle counts for codegen stats. These should be more accurate than the current cycle counts, since among other things they consider the effect of post-scheduling passes like the software scoreboard on TGL. In addition it will enable us to clean up some of the now redundant cycle-count estimation functionality in the instruction scheduler. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	65342be3ae	intel/fs: Add INTEL_DEBUG=no32 debugging flag. This is useful in order to identify codegen issues caused by SIMD32. It doesn't currently have any effect on compute shaders since SIMD32 dispatch is only enabled for CS when it's strictly necessary to do so in order to support the workgroup size requested for the shader -- That might change in the future though when we hook up the SIMD32 heuristic to CS compilation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	14f0a5cf64	intel/fs: Implement performance analysis-based SIMD32 heuristic for fragment shaders. The heuristic enables the SIMD32 fragment shader based on whether the IR performance modeling pass predicts it to have greater throughput than the SIMD16 and SIMD8 variants of the same shader. It would be straightforward to do the same thing in order to control whether SIMD16 dispatch is enabled, but it's pending additional performance evaluation. The INTEL_DEBUG=do32 option is left around in order to force the SIMD32 shader to be used regardless of the result of the heuristic, since it's useful as a debugging aid e.g. in order to identify SIMD32-specific codegen issues which may be masked by the SIMD32 heuristic, or cases where the heuristic is incorrectly disabling SIMD32 shaders that offer a performance advantage. Currently this is only enabled on Gen6+, since SIMD32 codegen support is incomplete on earlier platforms. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
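A sketch of the heuristic's shape: enable SIMD32 only when its predicted throughput beats both narrower variants, let `do32` force it, and (per the compute-shader commit above, which says FS behaves the same way) let `no32` take precedence over `do32`. The assumption here is that throughput is approximated as invocations per cycle, `width / estimated cycles`; `struct variant_perf` and both functions are hypothetical, not the `brw::performance` API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-variant estimate from a performance model. */
struct variant_perf {
   unsigned width;       /* SIMD width: 8, 16 or 32 */
   double   est_cycles;  /* estimated cycles for one thread */
};

/* Assumed metric: invocations processed per cycle. */
static double
throughput(struct variant_perf v)
{
   assert(v.est_cycles > 0.0);
   return v.width / v.est_cycles;
}

static bool
use_simd32(struct variant_perf s8, struct variant_perf s16,
           struct variant_perf s32, bool no32, bool do32)
{
   if (no32)   /* no32 takes precedence over do32 */
      return false;
   if (do32)   /* debugging aid: force SIMD32 past the heuristic */
      return true;
   const double t32 = throughput(s32);
   return t32 > throughput(s8) && t32 > throughput(s16);
}
```

For example, with SIMD8 at 100 cycles (0.08 invocations/cycle) and SIMD16 at 150 cycles (about 0.107), a SIMD32 estimate of 400 cycles (0.08) is rejected while 250 cycles (0.128) is accepted.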
Francisco Jerez	d6aa0c261f	intel/fs: Heap-allocate fs_visitors in brw_compile_fs(). This makes brw_compile_fs() look a bit more similar to brw_compile_cs(). It saves us three v*_shader_stats local variables, and will save us additional triplicated declarations as we start tracking IR performance analysis results. The triplicated cfg pointers are left around because they're set to NULL to mark specific dispatch modes as disabled (e.g. in order to enforce hardware restrictions). Doing the same thing with the visitor pointers would cause data leaks. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	6310a05f68	intel/fs: Rename half() helpers to quarter(), allow index up to 3. Makes more sense considering SIMD32. Relaxing the assertion in brw_ir_fs.h will be required in order to avoid assertion failures on SNB with SIMD32 fragment shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00


487 commits