fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Author	SHA1	Message	Date
Caio Marcelo de Oliveira Filho	9f3d5e99ea	compiler: Use util/bitset.h for system_values_read It is currently a bitset on top of a uint64_t but there are already more than 64 values. Change to use BITSET to cover all the SYSTEM_VALUE_MAX bits. Cc: mesa-stable Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8585>	2021-01-26 20:20:47 +00:00
Lionel Landwerlin	65f7b93435	intel: silence unused var warnings in release builds v2: Use ASSERTED Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4162 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8681>	2021-01-25 09:04:32 +00:00
Jason Ekstrand	44571c6a68	intel/fs: Properly lower 64-bit MUL on 64-bit-incapable platforms There are two problems this commit solves: First, is that the 64x64 MUL lowering generates a Q MOV which, because of how late it runs in the compile pipeline, it never gets removed. Second, it generates 32x32 MULs and we have to run it a second time to lower those. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>	2021-01-22 18:38:38 +00:00
Jason Ekstrand	69a3559efd	intel/reg,fs: Handle immediates properly in subscript() Just returning the original type isn't what we want in basically any case. Mask and shift the immediate as needed. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>	2021-01-22 18:38:37 +00:00
Jason Ekstrand	369eab9420	intel/fs: Emit code for Gen12-HP indirect compute data Reworks: * Jordan: Apply to gen > 12 * Jordan: Adjust comment about loading constants Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8342>	2021-01-13 13:10:28 -08:00
Jason Ekstrand	b4ffbf1521	intel/fs: Allow compute dispatch without a pushed subgroup ID on Gen12-HP Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8342>	2021-01-13 13:10:27 -08:00
Jason Ekstrand	6992d2f625	intel/fs: Emit HALT_TARGET in emit_nir_code() Instead of making it a fragment-specific thing based on uses_kill, track whether or not we need one in fs_visitor and emit HALT_TARGET at the end of emit_nir_code() if needed. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>	2020-12-01 16:19:14 -06:00
Jason Ekstrand	4a7f0aa2e0	intel/fs: Remove unnecessary HALT_TARGET in opt_redundant_halt() This means the pass has to walk all the instructions but it was doing that in a bunch of cases anyway when it didn't have a HALT_TARGET. However, removing HALT_TARGET frees up the scheduler a bit because HALT_TARGET is considered a scheduling barrier. The shader-db results are kind-of a wash but we're about to add HALT_TARGET unconditionally so we want to be able to get rid of it. Shader-db results on Ice Lake: total instructions in shared programs: 19935623 -> 19935623 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 976758472 -> 976766135 (<.01%) cycles in affected programs: 11097707 -> 11105370 (0.07%) helped: 1750 HURT: 875 helped stats (abs) min: 1 max: 866 x̄: 26.39 x̃: 4 helped stats (rel) min: <.01% max: 39.24% x̄: 1.25% x̃: 0.46% HURT stats (abs) min: 1 max: 1678 x̄: 61.54 x̃: 10 HURT stats (rel) min: <.01% max: 65.69% x̄: 1.86% x̃: 0.42% 95% mean confidence interval for cycles value: -2.48 8.32 95% mean confidence interval for cycles %-change: -0.40% -0.03% Inconclusive result (value mean confidence interval includes 0). LOST: 62 GAINED: 46 All of the lost/gained programs are SIMD32 fragment shaders. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>	2020-12-01 16:19:10 -06:00
Jason Ekstrand	f9d549b2bf	intel/fs: Use BRW_OPCODE_HALT for discards We're about to start using it to implement nir_jump_halt which has nothing inherently to do with fragment shaders or discards. May as well name it for the HW instruction it generates. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>	2020-12-01 16:19:08 -06:00
Jason Ekstrand	e76e359007	intel/fs: Rename PLACEHOLDER_HALT to HALT_TARGET It's a bit more explicit and will play more nicely with what we're about to do. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>	2020-12-01 16:18:50 -06:00
Jason Ekstrand	75209d5bd1	intel/fs: Add and implement intel-specific ray-tracing intrinsics Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>	2020-11-25 05:37:10 +00:00
Jason Ekstrand	7280b0911d	intel/compiler: Add support for bindless shaders The Intel bindless thread dispatch model is very simple. When a compute shader is to be used for bindless dispatch, it can request a set of stack IDs. These are allocated per-dual-subslice by the hardware and recycled automatically when the stack ID is returned. Passed to the bindless dispatch are a global argument address, a stack ID, and an address of the BINDLESS_SHADER_RECORD to invoke. When the bindless shader is dispatched, it is passed its stack ID as well as the global and local argument pointers. The local argument pointer is the address of the BINDLESS_SHADER_RECORD plus some offset which is specified as part of the BINDLESS_SHADER_RECORD. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>	2020-11-25 05:37:09 +00:00
Ian Romanick	50fef61fa5	intel/fs: Add support for printing half-float immediate values v2: Remove offensive, extraneous 0 in hex constant. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7650>	2020-11-19 21:23:53 +00:00
Ian Romanick	91f7e262e1	intel/fs: Silence unused parameter warning in filter_simd src/intel/compiler/brw_fs.cpp: In function ‘bool filter_simd(const nir_instr, const void)’: src/intel/compiler/brw_fs.cpp:8870:50: warning: unused parameter ‘_options’ [-Wunused-parameter] 8870 \| filter_simd(const nir_instr instr, const void _options) \| ~~~~~~~~~~~~~^~~~~~~~ Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7650>	2020-11-19 21:23:53 +00:00
Kenneth Graunke	31290f9806	intel/fs: Fix sampler message headers on Gen11+ when using scratch Icelake's sampler message header introduces a field in m0.3 bit 0 which controls whether the sampler state pointer should be relative to bindless sampler state base address or dynamic state base address. g0.3 bit 0 is part of the per-thread scratch space field. On older hardware, we were able to copy that along because the sampler ignored bits 4:0. Now, however, we need to mask them out. Fixes various textureGatherOffsets piglit tests when forcing the FS to run with 2048 bytes of per-thread scratch space (which is a per-thread scratch space encoding of 1, meaning bit 0 will be set). Cc: mesa-stable Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6735>	2020-11-18 23:32:09 +00:00
Caio Marcelo de Oliveira Filho	b3daf341d4	intel/fs: Add assert on the brw_STAGE_prog_data downcasts Motivation is to detect earlier certain bugs that can occur when missing a check for the stage before using the downcast. Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7540>	2020-11-16 12:40:59 -09:00
Jason Ekstrand	e9caba6ce5	intel/fs: Fix use of undefined value in fixup_nomask_control_flow Fixes: `a8ac0bd759` "intel/fs/gen12: Workaround unwanted SEND execution..." Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7536>	2020-11-11 17:42:47 +00:00
Caio Marcelo de Oliveira Filho	d372abe397	intel/fs: Add surface OWORD BLOCK opcodes Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7448>	2020-11-04 20:24:48 +00:00
Caio Marcelo de Oliveira Filho	d3d2b73fa3	intel/fs: Add A64 OWORD BLOCK opcodes Based on a patch for OWORD BLOCK READ from Jason Ekstrand. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7448>	2020-11-04 20:24:48 +00:00
Caio Marcelo de Oliveira Filho	e7e24d5039	intel/fs: Handle nir_intrinsic_terminate For terminate operation, jump the invocation without predicating on the rest of the quad being disabled -- which is what is done for demote and discard. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7150>	2020-10-15 21:40:09 +00:00
Kenneth Graunke	341f5bffb7	intel/compiler, anv: Delete cs_prog_data->slm_size cs_prog_data->slm_size is basically redundant with prog_data->total_shared, which is the field that we actually use for controlling the shared local memory size in all drivers. We were still using it in one place for VK_EXT_pipeline_executable_properties, but we should just fix that and delete the field. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7152>	2020-10-14 23:13:41 +00:00
Jason Ekstrand	06ebf23283	intel/fs: Add a SCRATCH_HEADER opcode This opcode is responsible for setting up the buffer base address and per-thread scratch space fields of a scratch message header. For the most part, it's a copy of g0 but some messages need us to zero out g0.2 and the bottom bits of g0.5. This may actually fix a bug when nir_load/store_scratch is used. The docs say that the DWORD scattered messages respect the per-thread scratch size specified in gN.3[3:0] in the message header but we've been leaving it zero. This may mean that we've been ignoring any scratch reads/writes from a load/store_scratch intrinsic above the 1KB mark. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>	2020-10-13 21:59:27 +00:00
Jason Ekstrand	24b64c8408	intel/fs: Copy the PTSS from g0 for scratch reads/writes In theory, this fixes a bug where we were dropping the PTSS bound on the floor. The hardware docs claim that the A32 DWORD and BYTE scattered read/write messages do a PTSS bounds check. However, in practice, it seems that the hardware ignores the bounds check so this doesn't actually matter. I verified this with the following couple of piglit tests: https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/399 In practice, this prevents the next commit from making a subtle behavioral change. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>	2020-10-13 21:59:27 +00:00
Jason Ekstrand	3d22de05ca	intel/fs: Add an option to use dataport messages for UBOs Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3932>	2020-10-08 01:17:06 -05:00
Jason Ekstrand	0d462dbee5	intel/fs: Add an alignment to VARYING_PULL_CONSTANT_LOAD_LOGICAL Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3932>	2020-10-08 01:14:46 -05:00
Marcin Ślusarz	9c25689287	intel: drop likely/unlikely around INTEL_DEBUG It's included in declaration of INTEL_DEBUG. Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6732>	2020-10-06 18:43:07 +00:00
Danylo Piliaiev	77486db867	intel/fs: Disable sample mask predication for scratch stores Scratch stores are being lowered to the instructions with side-effects, however they should be enabled in fs helper invocations, since they are produced from operations which don't imply side-effects. To fix this - we move the decision of whether the sample mask predication is enabled to the point where logical brw instructions are created. GLSL example of the issue: int tmp[1024]; ... do { // changes to tmp } while (some_condition(tmp)) If `tmp` is lowered to scratch memory, `some_condition` would be undefined if scratch write is predicated on sample mask, making it possible for the while loop to become infinite and hang the GPU. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3256 Fixes: `53bfcdeecf` Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6056>	2020-09-25 09:48:06 +00:00
Marcin Ślusarz	64b0b7c274	intel/compiler: fix typo in a comment Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6602>	2020-09-04 17:38:25 +00:00
Marcin Ślusarz	95ce619680	intel/compiler: print dispatch width when shader fails to compile Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6602>	2020-09-04 17:38:25 +00:00
Marcin Ślusarz	d4c6e3f196	intel/compiler: use the same name for nir shaders in brw_compile_* functions Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6602>	2020-09-04 17:38:25 +00:00
Jason Ekstrand	91becd84ae	intel/fs: Add support for a new load_reloc_const intrinsic Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>	2020-09-02 19:48:44 +00:00
Jason Ekstrand	90b6745bc8	intel/fs,vec4: Stuff the constant data from NIR in the end of the program Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>	2020-09-02 19:48:44 +00:00
Karol Herbst	70cbddc4a7	nir: use enum operator helper for nir_variable_mode and nir_metadata those are used quite a bit Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6520>	2020-09-01 17:45:08 +00:00
Jason Ekstrand	4d18e71fea	nir: Rename num_shared to shared_size This one is always a size in bytes. Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6524>	2020-09-01 17:30:51 +00:00
Jason Ekstrand	003b04e266	intel/compiler: Allow MESA_SHADER_KERNEL Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6280>	2020-08-12 10:11:06 +00:00
Mark Janes	cf52b40fb0	intel/fs: work around gen12 lower-precision source modifier limitation GEN:BUG:1604601757 prevents source modifiers for multiplication of lower precision integers. lower_mul_dword_inst() splits 32x32 multiplication into 32x16, and needs to eliminate source modifiers in this case. Closes: #3329 Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2020-08-10 13:30:45 -07:00
Jason Ekstrand	2956d53400	nir: Add nir_foreach_shader_in/out_variable helpers Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>	2020-07-29 17:38:57 +00:00
Jordan Justen	8dfa072ed8	intel/compiler/fs: Still attempt simd32 when INTEL_DEBUG=no16 is used If INTEL_DEBUG=no16 is used, then simd16 will not be attempted. This, in turn prevents simd32 from running, because we attempt to skip simd32 when simd16 fails to compile. This change more accurately recognizes when we attempted simd16, but simd16 failed. One easy way to cause an issue is to set both no8 and no16. Before this change, we would be left with no FS program, even though simd32 could still be generated in some cases. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5269>	2020-07-09 15:44:57 -07:00
Jordan Justen	1a4a2f563b	intel/compiler/cs: Allow simd32 in some more cases with no8 and/or no16 If no16 was specified, and the shader can't run in simd8 due to the local_size, then we need to generate a simd32 program. If both no8 and no16 are specified, then we need to generate a simd32 program. Rework: * Drop update of `if` that would have changed `do32` to try simd32 even if simd16 spilled registers. (Caio) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5269>	2020-07-09 15:44:34 -07:00
Jason Ekstrand	479797e130	intel/fs: Move more prog_data setup into populate_wm_prog_data Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>	2020-06-23 17:43:53 +00:00
Jason Ekstrand	fc519cad57	intel/fs: Break wm_prog_data setup into a helper Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>	2020-06-23 17:43:53 +00:00
Jason Ekstrand	2687ec5ee6	intel/fs: Expose a couple of NIR lowering helpers Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>	2020-06-23 17:43:53 +00:00
Arcady Goldmints-Orlov	04f77595f0	intel/compiler: Always apply sample mask on Vulkan. With OpenGL, shader writes to the sample mask are ignored when not rendering to a multisample render target. However, on Vulkan, writes to the sample mask still have their effect in that case. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3016 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5156>	2020-06-19 20:24:11 -05:00
Sagar Ghuge	a0ef4971d0	intel/compiler: Remove unnecessary optimization for MUL 2 source instruction only support immediate for src1 operand, so no point in adding optimization condition for src0 operand. v2: - Update commit message and don't remove ADD optimization (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5341>	2020-06-16 17:11:32 -07:00
Sagar Ghuge	d4f3f9390f	intel/compiler: Optimize integer add with 0 into mov Kaby Lake total instructions in shared programs: 326560 -> 323616 (-0.90%) instructions in affected programs: 178062 -> 175118 (-1.65%) helped: 129 HURT: 0 helped stats (abs) min: 1 max: 118 x̄: 22.82 x̃: 8 helped stats (rel) min: 0.35% max: 6.56% x̄: 2.57% x̃: 2.47% 95% mean confidence interval for instructions value: -27.71 -17.93 95% mean confidence interval for instructions %-change: -2.81% -2.32% Instructions are helped. total cycles in shared programs: 43741127 -> 45397851 (3.79%) cycles in affected programs: 40880261 -> 42536985 (4.05%) helped: 94 HURT: 34 helped stats (abs) min: 5 max: 6160 x̄: 598.91 x̃: 45 helped stats (rel) min: 0.20% max: 34.86% x̄: 2.52% x̃: 1.09% HURT stats (abs) min: 1 max: 76198 x̄: 50383.00 x̃: 69677 HURT stats (rel) min: 0.07% max: 48.41% x̄: 15.65% x̃: 6.49% 95% mean confidence interval for cycles value: 8023.10 17863.21 95% mean confidence interval for cycles %-change: <.01% 4.60% Cycles are HURT. total spills in shared programs: 1086 -> 978 (-9.94%) spills in affected programs: 897 -> 789 (-12.04%) helped: 24 HURT: 0 total fills in shared programs: 1686 -> 1584 (-6.05%) fills in affected programs: 1371 -> 1269 (-7.44%) helped: 24 HURT: 0 v2: - Use brw_reg_type_is_integer (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5341>	2020-06-16 16:54:27 -07:00
Matt Turner	66111bc95a	intel/compiler: Drop opt_sampler_eot() Gen9 and Cherryview have the ability to mark texture instructions with the End-of-thread bit under some conditions, which allows the texture result to be written to the render target directly, rather than returning to the EU. In order to handle overlapping primitives correctly, we have to use the 'sendc' instruction which stalls until other threads potentially writing to the same locations in the render target are retired. Unfortunately, this stall happens before the texture is sampled (rather than in parallel with stall), so for some literal edge cases (like the diagonal edge between two triangles forming a rectangle) there can be a performance penalty. As a result, it's probably not a good idea to use this optimization in general. I had planned to leave it enabled only for BLORP, where we use rectangle primitives and are typically clearing/blitting an entire render target without any overlapping primitives, but I noticed that the optimization wasn't applied in some normal cases anyway. For example, in the piglit test tests/shaders/glsl-fs-texture2d-bias.shader_test it is applied to one BLORP-blit shader but not another due to some kind of mishandling of register types (the destination register type of the texture operation is UD while the color source of the render target write is F). Additionally the instruction scheduler assumed that the combined texture and render target write operation took 0 cycles, leading to cycle estimates that are wildly inaccurate. Since the optimization was not implemented for SIMD32 and our decision whether to use the SIMD32 program is made by comparing the estimated performance with that of the SIMD16 shader, we wrongly threw out a bunch of SIMD32 programs that are likely profitable. total cycles in shared programs: 472807891 -> 473784245 (0.21%) cycles in affected programs: 108277 -> 1084631 (901.72%) helped: 0 HURT: 1290 total sends in shared programs: 998955 -> 1000245 (0.13%) sends in affected programs: 1400 -> 2690 (92.14%) helped: 0 HURT: 1290 LOST: 0 GAINED: 33 This patch shows no performance changes in Intel's Mesa performance CI. Given the problems, the lack of evidence that the pass improves performance, and the fact that the hardware feature was removed from subsequent GPU generations, I think that the pass is not valuable and should be removed. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5412>	2020-06-12 19:01:26 +00:00
Jason Ekstrand	94aa7997e4	intel/fs: Fix unused texture coordinate zeroing on Gen4-5 We were inserting the right number of MOVs but, thanks to the way we advanced msg_end earlier in the function, were often writing the zeros past the end of where we actually read in the register file. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5243>	2020-05-30 01:08:50 -05:00
Caio Marcelo de Oliveira Filho	90ec26a800	intel/fs: Generate multiple CS SIMD variants for variable group size This will make the GL drivers pick the right SIMD variant for a given group size set during dispatch. The heuristic implemented in brw_cs_simd_size_for_group_size() is the same as in brw_compile_cs(). The cs_prog_data::simd_size field was removed. The generated SIMD sizes are marked in a bitmask, which is already used via brw_cs_simd_size_for_group_size() by the drivers. When in variable group size, it is OK if larger SIMD shader spill, since we'd need it for the cases where the smaller one can't hold all the invocations. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5142>	2020-05-27 18:16:31 -07:00
Caio Marcelo de Oliveira Filho	cb26d9c311	intel/fs: Add helper to get prog_offset and simd_size This indirection will be used by the variable group size case in a later change. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5142>	2020-05-27 18:16:31 -07:00
Caio Marcelo de Oliveira Filho	5b5e77caa7	intel/fs: Support INTEL_DEBUG=no8,no32 in compute shaders The "no32" flag will have precedence over "do32", like is done for FS. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5142>	2020-05-27 18:16:31 -07:00
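
Commit 9f3d5e99ea above replaces a uint64_t mask with a multi-word bitset because there are more than 64 system values. The following is a minimal sketch of that idea, not the contents of util/bitset.h: every name here (`sketch_bitset`, `SKETCH_VALUE_MAX`, the helper functions) is hypothetical, and the value count of 100 is an arbitrary stand-in for SYSTEM_VALUE_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SYSTEM_VALUE_MAX; any count above 64 works. */
#define SKETCH_VALUE_MAX 100u

#define SKETCH_WORD_BITS 32u
#define SKETCH_WORDS \
    ((SKETCH_VALUE_MAX + SKETCH_WORD_BITS - 1u) / SKETCH_WORD_BITS)

/* An array of words instead of one uint64_t, so bit indices >= 64 fit. */
typedef struct {
    uint32_t words[SKETCH_WORDS];
} sketch_bitset;

static void sketch_bitset_zero(sketch_bitset *s)
{
    memset(s->words, 0, sizeof(s->words));
}

static void sketch_bitset_set(sketch_bitset *s, unsigned bit)
{
    assert(bit < SKETCH_VALUE_MAX);
    /* Pick the word, then the bit inside that word. */
    s->words[bit / SKETCH_WORD_BITS] |=
        UINT32_C(1) << (bit % SKETCH_WORD_BITS);
}

static bool sketch_bitset_test(const sketch_bitset *s, unsigned bit)
{
    assert(bit < SKETCH_VALUE_MAX);
    return (s->words[bit / SKETCH_WORD_BITS] >> (bit % SKETCH_WORD_BITS)) & 1u;
}
```

With a single uint64_t, a shift by 70 is undefined behaviour in C, and code that reduced the index modulo 64 would alias value 70 onto value 6. The array form keeps the two bits distinct.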


350 commits
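
Commit 44571c6a68 in the list above concerns lowering a 64-bit MUL on platforms without 64-bit integer support, where the 64x64 multiply is broken into 32-bit pieces. The arithmetic behind such a lowering can be sketched in plain C as below. This is an illustration of the decomposition only, not the compiler pass; the function name `mul64_via_32` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the low 64 bits of a * b using only 32-bit halves.
 *
 *   a * b = a_lo*b_lo + ((a_lo*b_hi + a_hi*b_lo) << 32)   (mod 2^64)
 *
 * The a_hi*b_hi term is shifted by 64 bits and drops out entirely.
 */
static uint64_t mul64_via_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Full 32x32 -> 64 product of the low halves. */
    uint64_t low = (uint64_t)a_lo * b_lo;

    /* Only the low 32 bits of each cross product reach the result. */
    uint32_t cross = (uint32_t)((uint64_t)a_lo * b_hi) +
                     (uint32_t)((uint64_t)a_hi * b_lo);

    /* High word: carry out of the low product plus the cross terms. */
    uint32_t high = (uint32_t)(low >> 32) + cross;

    return ((uint64_t)high << 32) | (uint32_t)low;
}
```

The result can be checked against the native 64-bit multiply, which wraps modulo 2^64 in the same way.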