fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-24 19:40:10 +01:00

Author	SHA1	Message	Date
Kenneth Graunke	26bb81f3f6	intel/compiler: Fix uncompaction of signed word immediates on Tigerlake This expression accidentally performs a 32-bit sign-extension when processing the second half of the expression (the low 16 bits). Consider -7W, which is represented as 0xfff9fff9 in our encoding (the 16-bit word is replicated to both halves of the 32-bit dword). Tigerlake's compaction stores the low 11-bits of an immediate as-is, and replicates the 12th bit. So here, compacted_imm will be 0xff9. ( (int)(0xff9 << 20) >> 4) \| ((short)(0xff9 << 4) >> 4)) 0xfff90000 \| (0xff90 >> 4) 0xfff90000 \| 0xfffffff9 ...oops... 0xfffffff9 By casting the second line of the expression to unsigned short, we prevent the sign-extension when it combines both parts, so we get: 0xfff90000 \| 0x0000fff9 0xfff9fff9 Fixes: `12d3b11908` ("intel/compiler: Add instruction compaction support on Gen12") Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16833>	2022-06-02 13:59:38 -07:00
Jason Ekstrand	dfedeccc13	intel: Only set VectorMaskEnable when needed For cases with lots of very small primitives, this may improve performance because we're not executing those dead channels all the time. Shader-db reports no instruction or cycle-count changes. However, by hacking up the driver to report when this optimization triggers, it appears to affect about 10% of shader-db. v2 (Kenneth Graunke): Always enable VMask prior to XeHP for now, because using VMask on those platforms allows us to perform the eliminate_find_live_channel() optimization. However, XeHP doesn't seem to have packed fragment shader dispatch, so we lose that optimization regardless, and there's no reason not to avoid vmask. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1054>	2022-05-27 21:52:48 +00:00
Jason Ekstrand	1b9248e761	intel/fs: Copy color_outputs_valid into wm_prog_data Fixes: `36ee2fd61c` ("anv: Implement the basic form of VK_EXT_transform_feedback") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>	2022-05-27 14:33:53 +00:00
Jason Ekstrand	8379993223	intel/fs: Drop fs_visitor::emit_alpha_to_coverage_workaround() It no longer exists. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>	2022-05-27 14:33:53 +00:00
Lionel Landwerlin	e666089082	intel/disasm: add missing handling of <1;1,0> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `7cd9adeb41` ("intel/compiler: In XeHP prefer <1;1,0> regions before compacting") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16704>	2022-05-26 06:42:16 +00:00
Kenneth Graunke	9886615958	intel/compiler: Move spill/fill tracking to the register allocator Originally, we had virtual opcodes for scratch access, and let the generator count spills/fills separately from other sends. Later, we started using the generic SHADER_OPCODE_SEND for spills/fills on some generations of hardware, and simply detected stateless messages there. But then we started using stateless messages for other things: - anv uses stateless messages for the buffer device address feature. - nir_opt_large_constants generates stateless messages. - XeHP curbe setup can generate stateless messages. So counting stateless messages is not accurate. Instead, we move the spill/fill accounting to the register allocator, as it generates such things, as well as the load/store_scratch intrinsic handling, as those are basically spill/fills, just at a higher level. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16691>	2022-05-25 06:56:01 +00:00
Kenneth Graunke	59bfc9c6cb	intel: Fix analysis invalidation in eliminate_find_live_channel If we saw a HALT instruction, we would forget to invalidate our analysis pass information before returning progress. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16677>	2022-05-24 22:36:39 +00:00
Timothy Arceri	d7a071a28f	gallium/drivers: set force_indirect_unrolling_sampler for all required drivers This is set to true for all drivers that have a GLSL level of support lower than 4.00. This matches the rule for setting the GLSL IR option EmitNoIndirectSampler. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16543>	2022-05-17 02:12:21 +00:00
Marcin Ślusarz	9acb30c8c4	intel/compiler: implement primitive shading rate for mesh Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16030>	2022-05-13 13:05:51 +00:00
Marcin Ślusarz	29a778fa6b	intel/compiler: print name of the unhandled intrinsic Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16493>	2022-05-13 09:43:02 +00:00
Marcin Ślusarz	65ff6932dc	intel/compiler: handle gl_Viewport and gl_Layer in FS URB setup Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16493>	2022-05-13 09:43:02 +00:00
Marcin Ślusarz	040062df41	intel/compiler: handle VARYING_SLOT_CULL_PRIMITIVE in mesh It's needed for gl_MeshPerPrimitiveNV[].gl_ViewportMask Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16493>	2022-05-13 09:43:02 +00:00
Vadym Shovkoplias	55c71217ec	driconf: Add a limit_trig_input_range option With this option enabled range of input values for fsin and fcos is limited to [-2pi : 2pi] by calculating the reminder after 2*pi modulo division. This helps to improve calculation precision for large input arguments on Intel. -v2: Add limit_trig_input_range option to prog_key to update shader cache (Lionel) Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16388>	2022-05-13 06:47:53 +00:00
Jason Ekstrand	352e32e5ba	nir/builder: Add a nir_trim_vector helper This pattern pops up a bunch and the semantics of nir_channels() aren't very convenient much of the time. Let's add a nir_trim_vector() which matches nir_pad_vector(). Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16309>	2022-05-11 14:47:33 +00:00
Karol Herbst	9c5fd100cc	nir: add a nir_remove_non_entrypoints helper This code just got duplicated a lot. There is still more, but the remaining instances do a bit more than just removing other functions. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16348>	2022-05-10 03:37:44 +00:00
Emma Anholt	3a42e92a4f	glsl: Drop the dead MOD_TO_FLOOR path. It's now called lower_fmod in NIR. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8044>	2022-05-05 22:25:03 +00:00
Caio Oliveira	7cd9adeb41	intel/compiler: In XeHP prefer <1;1,0> regions before compacting Ken performed some tests with shader-db to evaluate the effects ``` Across all 145,848 shaders generated, the results were: Total bytes compacted before: 3,326,224 Total bytes compacted after: 60,963,280 ``` Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15399>	2022-05-02 18:03:01 +00:00
Emma Anholt	536c8ee96d	nir/lower_tex: Make the adding a 0 LOD to nir_op_tex in the VS optional. This controls the whole lowering of "make tex ops with implicit derivatives on non-implicit-derivative stages be tex ops with an explicit lod of 0 instead", but it's really hard to describe that in a git commit summary. All existing callers get it added except: - nir_to_tgsi which didn't want it. - nouveau, which didn't want it (fixes regressions in shadowcube and shadow2darray with NIR, since the shading languages don't expose txl of those sampler types and thus it's not supported in HW) - optional lowering passes in mesa/st (lower_rect, YUV lowering, etc) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>	2022-04-28 21:26:08 +00:00
Lionel Landwerlin	8ef8e72aac	intel/fs: tidy up lower of ray queries We already expect a single function. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15946>	2022-04-19 12:56:06 +00:00
Marcin Ślusarz	5dace41c10	intel/compiler: invalidate metadata in brw_nir_initialize_mue New "if" blocks may have been inserted. Fixes: `bc4f8c073a` ("intel/compiler: inject MUE initialization") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15924>	2022-04-19 11:43:55 +00:00
Marcin Ślusarz	4fddef33d5	intel/compiler: invalidate all metadata in brw_nir_lower_intersection_shader New "if" blocks were inserted. Fixes: `303378e1dd` ("intel/rt: Add lowering for combined intersection/any-hit shaders") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15924>	2022-04-19 11:43:55 +00:00
Alexey Bozhenko	2d7d907ad1	intel/compiler: fix singleton pointer coverity warning fix brw_kernel::stats member that was declared as a variable but used as a pointer to array of 3 elements CID: 1503279 Signed-off-by: Bozhenko Alexey <oleksii.bozhenko@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15975>	2022-04-19 12:36:10 +03:00
Lionel Landwerlin	04bd007757	intel/fs: require memory fence commit bit on Gfx9 Fixes a hang on Gfx9 GT1 : dEQP-VK.compute.zero_initialize_workgroup_memory.max_workgroup_memory.128 Tested-by: Mark Janes <markjanes@swizzler.org> Acked-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15596>	2022-04-17 21:24:17 +00:00
Jason Ekstrand	ad0dc8e4ab	intel/compiler: Set lower_fisnormal Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15985>	2022-04-16 00:26:43 +00:00
Lionel Landwerlin	e11bedb9f5	intel/fs: add a note on possible optimization of root node address Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15910>	2022-04-13 11:24:49 +00:00
Lionel Landwerlin	9c0805ef91	intel/fs: fix metadata preserve on trace_ray intrinsic `c78be5da30` ("intel/fs: lower ray query intrinsics") introduced a helper function using nir_(push\|pop)_if which invalidated dominance & block_index for the replacement of nir_intrinsic_rt_trace_ray. We can still keep dominance/block_index metadata for the lowering of nir_intrinsic_rt_execute_callable though. This change uses 2 different lowering function with correct metadata preservation. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `c78be5da30` ("intel/fs: lower ray query intrinsics") Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15910>	2022-04-13 11:24:49 +00:00
Jason Ekstrand	69b5424ea4	intel/nir: Lower 8 and 16-bit bitwise unops Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15829>	2022-04-12 23:19:38 +00:00
Jason Ekstrand	a482877c70	intel/fs: Implement 16-bit [ui]mul_high Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15829>	2022-04-12 23:19:38 +00:00
Mykhailo Skorokhodov	9c7e750ffe	intel/fs: Enable b2f(inot(a)) and b2i(inot(a)) optimization for Gfx12+ The commit enables the optimization for Intel Gfx12+ graphics. Tigerlake ``` total instructions in shared programs: 1289326 -> 1289015 (-0.02%) instructions in affected programs: 37841 -> 37530 (-0.82%) helped: 78 HURT: 9 helped stats (abs) min: 1 max: 26 x̄: 4.69 x̃: 3 helped stats (rel) min: 0.10% max: 12.50% x̄: 2.07% x̃: 1.21% HURT stats (abs) min: 1 max: 18 x̄: 6.11 x̃: 4 HURT stats (rel) min: 0.16% max: 1.95% x̄: 0.94% x̃: 0.61% 95% mean confidence interval for instructions value: -4.95 -2.20 95% mean confidence interval for instructions %-change: -2.34% -1.18% Instructions are helped. total cycles in shared programs: 105606388 -> 105606442 (<.01%) cycles in affected programs: 620119 -> 620173 (<.01%) helped: 49 HURT: 28 helped stats (abs) min: 2 max: 3618 x̄: 228.63 x̃: 12 helped stats (rel) min: 0.02% max: 23.31% x̄: 4.60% x̃: 1.11% HURT stats (abs) min: 1 max: 2142 x̄: 402.04 x̃: 29 HURT stats (rel) min: 0.01% max: 36.42% x̄: 5.01% x̃: 0.46% 95% mean confidence interval for cycles value: -151.80 153.20 95% mean confidence interval for cycles %-change: -3.00% 0.79% Inconclusive result (value mean confidence interval includes 0). ``` Related-to: `7725d60938` Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14017>	2022-04-12 10:55:05 +00:00
Ian Romanick	b5fa43952a	intel/fs: Better handle constant sources of FS_OPCODE_PACK_HALF_2x16_SPLIT I noticed that a LOT of fragment shaders in Shadow of the Tomb Raider, for instance, end up with a sequence of NIR like: vec1 32 ssa_2 = load_const (0x00000000 = 0.000000) ... vec1 32 ssa_191 = pack_half_2x16_split ssa_188, ssa_2 vec1 32 ssa_192 = pack_half_2x16_split ssa_189, ssa_2 vec1 32 ssa_193 = pack_half_2x16_split ssa_190, ssa_2 This results in an assembly sequence like: mov(8) g28<1>UD 0x00000000UD mov(8) g21<2>HF g28<8,8,1>F shl(8) g21<1>UD g21<8,8,1>UD 0x00000010UD mov(8) g21<2>HF g25<8,8,1>F mov(8) g19<2>HF g28<8,8,1>F shl(8) g19<1>UD g19<8,8,1>UD 0x00000010UD mov(8) g19<2>HF g23<8,8,1>F mov(8) g20<2>HF g28<8,8,1>F shl(8) g20<1>UD g20<8,8,1>UD 0x00000010UD mov(8) g20<2>HF g24<8,8,1>F After this commit, the generated assembly is: mov(8) g21<1>UD 0x00000000UD mov(8) g21<2>HF g23<8,8,1>F mov(8) g19<1>UD 0x00000000UD mov(8) g19<2>HF g17<8,8,1>F mov(8) g20<1>UD 0x00000000UD mov(8) g20<2>HF g18<8,8,1>F Tiger Lake, Ice Lake, Skylake, and Haswell had similar results. (Ice Lake shown) total instructions in shared programs: 20119086 -> 20119034 (<.01%) instructions in affected programs: 9056 -> 9004 (-0.57%) helped: 8 HURT: 0 helped stats (abs) min: 2 max: 16 x̄: 6.50 x̃: 4 helped stats (rel) min: 0.29% max: 1.75% x̄: 1.00% x̃: 0.98% 95% mean confidence interval for instructions value: -11.01 -1.99 95% mean confidence interval for instructions %-change: -1.56% -0.44% Instructions are helped. total cycles in shared programs: 861019414 -> 861021044 (<.01%) cycles in affected programs: 279862 -> 281492 (0.58%) helped: 4 HURT: 2 helped stats (abs) min: 6 max: 936 x̄: 239.00 x̃: 7 helped stats (rel) min: 0.03% max: 8.13% x̄: 2.09% x̃: 0.09% HURT stats (abs) min: 18 max: 2568 x̄: 1293.00 x̃: 1293 HURT stats (rel) min: 0.36% max: 1.14% x̄: 0.75% x̃: 0.75% 95% mean confidence interval for cycles value: -972.56 1515.89 95% mean confidence interval for cycles %-change: -4.77% 2.49% Inconclusive result (value mean confidence interval includes 0). Broadwell total instructions in shared programs: 17812327 -> 17812263 (<.01%) instructions in affected programs: 9867 -> 9803 (-0.65%) helped: 8 HURT: 0 helped stats (abs) min: 2 max: 28 x̄: 8.00 x̃: 4 helped stats (rel) min: 0.32% max: 1.80% x̄: 1.00% x̃: 0.95% 95% mean confidence interval for instructions value: -15.46 -0.54 95% mean confidence interval for instructions %-change: -1.54% -0.47% Instructions are helped. total cycles in shared programs: 904768620 -> 904773291 (<.01%) cycles in affected programs: 454799 -> 459470 (1.03%) helped: 4 HURT: 4 helped stats (abs) min: 36 max: 586 x̄: 344.50 x̃: 378 helped stats (rel) min: 0.47% max: 4.04% x̄: 2.01% x̃: 1.77% HURT stats (abs) min: 1 max: 5572 x̄: 1512.25 x̃: 238 HURT stats (rel) min: <.01% max: 2.77% x̄: 1.46% x̃: 1.53% 95% mean confidence interval for cycles value: -1122.40 2290.15 95% mean confidence interval for cycles %-change: -2.26% 1.71% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 18581 -> 18579 (-0.01%) spills in affected programs: 323 -> 321 (-0.62%) helped: 1 HURT: 0 total fills in shared programs: 24985 -> 24981 (-0.02%) fills in affected programs: 1348 -> 1344 (-0.30%) helped: 1 HURT: 0 Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown) Instructions in all programs: 143585431 -> 143513657 (-0.0%) Instructions helped: 14403 Cycles in all programs: 8439312778 -> 8439371578 (+0.0%) Cycles helped: 10570 Cycles hurt: 3290 Gained: 146 Lost: 74 All of the lost and gained fossil-db shaders are SIMD32 fragment shaders. 14,247 of the affected shaders are from Shadow of the Tomb Raider. 154 are from Batman Arkham Origins, and the remaining two are from Octopath Traveler. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15089>	2022-04-07 18:26:23 +00:00
Ian Romanick	c08302670b	intel/compiler: Fix sample_d messages on DG2 DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. The maximum number of gradient components and coordinate components should be 2. In spite of this limitation, the Bspec lists a mysterious R component before the min_lod, so the maximum coordinate components is 3. Fixes the following Vulkan CTS failures on DG2: dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1d_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2d_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_fixed_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_float_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_fixed_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_float_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1d_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2d_fragment The Fixes: tag below is a bit misleading. This commit fixes some test cases similar to ones fixed by the Fixes: commit. I just want to make sure this commit gets applied everywhere that commit was also applied. Fixes: `635ed58e52` ("intel/compiler: Lower txd for 3D samplers on XeHP.") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15781>	2022-04-07 17:09:28 +00:00
Lionel Landwerlin	b5031bd6f7	intel/nir: don't report progress on rayqueries if no queries Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `c78be5da30` ("intel/fs: lower ray query intrinsics") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15769>	2022-04-07 08:24:19 +00:00
Ian Romanick	7fd1955412	nir: intel/compiler: Lower TXD on array surfaces on DG2+ DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. Cube maps and 3D surfaces were already handled, but 1D array and 2D array surfaces were not. Fixes the following Vulkan CTS failures on DG2: dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1darray_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2darray_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_fixed_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_float_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_fixed_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_float_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1darray_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2darray_fragment The Fixes: tag below is a bit misleading. This commit adds another lowering, similar to the one in the Fixes: commit, that probably should have been added at the same time. I just want to make sure this commit gets applied everywhere that commit was also applied. Fixes: `635ed58e52` ("intel/compiler: Lower txd for 3D samplers on XeHP.") Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15681>	2022-03-31 12:59:18 -07:00
Lionel Landwerlin	684a4ea30c	intel/clc: fix missing pointer write Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `346a7f14fb` ("intel/compiler: Add code for compiling CL-style SPIR-V kernels") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15611>	2022-03-30 07:56:25 +00:00
Kenneth Graunke	1967fd3b10	intel/compiler: Call inst->resize_sources before setting the sources You should probably resize the sources array before accessing entries that might be out of bounds. inst->resize_sources() always allocates enough space for at least 3 sources, so this is really only an issue when there are 4+ sources. Fixes: `a920979d4f` ("intel/fs: Use split sends for surface writes on gen9+") Fixes: `4f86a70599` ("intel/fs: Lower DW untyped r/w messages to LSC when available") Fixes: `d372abe397` ("intel/fs: Add surface OWORD BLOCK opcodes") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15632>	2022-03-29 13:06:17 -07:00
Georg Lehmann	922916bf64	nir: Move lower_usub_sat64 to nir_lower_int64_options. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421>	2022-03-28 20:02:52 +00:00
Kenneth Graunke	823745dc27	intel/compiler: Use nir_opt_uniform_atomics() In general, an atomic intrinsic may perform separate atomics for every enabled SIMD channel, as each channel may operate on different memory. However, an extremely common case is for all channels to access the same memory location. In this case, we can simply perform a reduction/scan across the subgroup, and perform one atomic for the whole subgroup, rather than one per channel. For example, if an intrinsic says to take the minimum value of the existing memory and the value in each channel, we can do a thread-local minimum of all enabled channels, then do a single atomic to take the minimum of that and the existing memory. Our hardware doesn't optimize the case where multiple channels ask for atomics on the same memory location; it assumes the compiler will do so. nir_opt_uniform_atomics() uses divergence analysis to detect this case, adds the necessary subgroup operations, and moves the atomic inside a conditional that disables all but a single invocation. It even detects cases where the shader code already performs this kind of optimization, and avoids doing it a second time. This may not be the optimal solution for us. In the backend, we could detect this case and emit send(1) instructions with NoMask, rather than generating if...send(16)...endif, and a lot of unnecessary ALU ops. But it's simple to do, reuses the same path as ACO, and still provides most of the benefit by cutting up to 16x atomics down to a single atomic, which is more merciful to the memory bus. Improves performance of Shadow of the Tomb Raider by 5.5% on XeHP. Improves performance of a customer-internal benchmark on XeHP at 3840x2160 and low settings by approximately 30%. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>	2022-03-26 00:28:19 +00:00
Kenneth Graunke	49ef23f4a6	intel/compiler: Convert to LCSSA and use divergence analysis. We'll use this more shortly. For now, enable it to separately in case anything bisects to this. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>	2022-03-26 00:28:19 +00:00
Kenneth Graunke	b3942beecf	intel/compiler: Set divergence analysis options Although we don't use divergence analysis yet, we've had several work-in-progress series that make use of it. We may as well set our options so that those series can assume they're in place. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>	2022-03-26 00:28:19 +00:00
Kenneth Graunke	6fa66ac228	intel/compiler: Implement nir_intrinsic_last_invocation We haven't exposed this intrinsic as it doesn't directly correspond to anything in SPIR-V. However, it's used internally by some NIR passes, namely nir_opt_uniform_atomics(). We reuse most of the infrastructure in brw_find_live_channel, but with LZD/ADD instead of FBL. A new SHADER_OPCODE_FIND_LAST_LIVE_CHANNEL is like SHADER_OPCODE_FIND_LIVE_CHANNEL but from the other side. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>	2022-03-26 00:28:19 +00:00
Caio Oliveira	c32d386ce2	intel/compiler: Inline TUE map computation into TUE Input lowering Refactor since the TUE compute function is simpler now and the comments make sense being near the lowering. Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15022>	2022-03-25 23:29:19 +00:00
Caio Oliveira	c36ae42e4c	intel/compiler: Use nir_var_mem_task_payload Instead of reusing the in/out slot mechanism, use a separated NIR variable mode. This will make easier later to implement staging the output in shared memory (and storing all at the end to the URB). Note to get 64-bit type support we currently rely on the brw_nir_lower_mem_access_bit_sizes() pass. Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15022>	2022-03-25 23:29:19 +00:00
Caio Oliveira	f82731d0d7	intel/fs: Fix IsHelperInvocation for the case no discard/demote are used Use emit_predicate_on_sample_mask() helper that does check where to get the correct mask depending on whether discard/demote was used or not. Fixes: `45f5db5a84` ("intel/fs: Implement "demote to helper invocation"") Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15400>	2022-03-25 08:20:27 +00:00
Caio Oliveira	bb311c22df	intel/fs: Initialize the sample mask in flags register when using demote Without this change, a check for "is helper invocation" could read uninitialized values. Fixes: `45f5db5a84` ("intel/fs: Implement "demote to helper invocation"") Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15400>	2022-03-25 08:20:27 +00:00
Mark Janes	85e314db5d	Revert "intel/fs: handle interpolation modes for at_sample and at_offset too" This reverts commit `5afbb0e730`. Closes: #6198 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15534>	2022-03-23 11:03:47 -07:00
Daniel Schürmann	832d67e99d	nir: rename nir_src_is_dynamically_uniform to nir_src_is_always_uniform As this function doesn't check for any control-flow dependence, it only returns true for statically (or globally) uniform values. The same holds true for is_binding_dynamically_uniform() in nir_opt_gcm(). Rename to better reflect that property. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14994>	2022-03-23 14:02:08 +00:00
Lionel Landwerlin	df059c6781	intel/clc: deal with SPIRV-Tools linker new behavior We're now required to provide all modules to link at the same SPIRV version. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15486>	2022-03-23 10:24:31 +00:00
Lionel Landwerlin	21aa1d3de1	intel/clc: fixup shared memory offsets We're running the io lowering twice so need to reset some fields so the offset don't go over what is really needed. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15486>	2022-03-23 10:24:31 +00:00
Lionel Landwerlin	de9c2312ea	intel/clc: compile fix Fixes: `c15bf88f01` ("intel: Add a little OpenCL C compiler binary") Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15486>	2022-03-23 10:24:31 +00:00
Lionel Landwerlin	a7f264f33a	intel/clc: add option to printout kernel prog_data Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15486>	2022-03-23 10:24:31 +00:00

... 2 3 4 5 6 ...

2233 commits