fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 18:00:13 +01:00

Author	SHA1	Message	Date
Jason Ekstrand	d25e5310bc	intel/nir: Lower barycentrics to per-sample in a dedicated pass This is more similar to what we do for single-sample and it should be more clear going forward once our lowering gets more complex. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>	2023-02-06 09:12:17 +00:00
Kenneth Graunke	90a2137cd5	intel/compiler: Use LSC opcode enum rather than legacy BRW_AOPs This gets our logical atomic messages using the lsc_opcode enum rather than the legacy BRW_AOP_* defines. We have to translate one way or another, and using the modern set makes sense going forward. One advantage is that the lsc_opcode encoding has opcodes for both integer and floating point atomics in the same enum, whereas the legacy encoding used overlapping values (BRW_AOP_AND == 1 == BRW_AOP_FMAX), which made it impossible to handle both sensibly in common code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Lionel Landwerlin	94bb4a13fa	intel/fs: make Wa_1806565034 conditional to non robust access Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20280>	2022-12-13 18:05:19 +00:00
Kenneth Graunke	8c2448d4e6	intel/compiler: Delete sampler key handling for planar format stuff i965 used these, but Gallium drivers do this lowering via a separate nir_lower_tex call from st/mesa. Vulkan drivers don't use these at all. Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20223>	2022-12-09 10:18:25 +00:00
Jason Ekstrand	b4dd3df227	intel/nir: Set has_base_workgroup_id for lower_compute_system_values This option didn't exist half a decade ago when I first implemented base workgroup support in ANV. It's cleaner to just have split system values like all the other zero_base+base things do. We currently only do this for COMPUTE and not KERNEL because it lets us avoid changing intel_clc for now. We can add KERNEL later if needed. We also don't do this lowering for task/mesh. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20068>	2022-12-01 04:56:48 +00:00
Lionel Landwerlin	6f2dbe6da1	anv: enable lower_shader_calls vectorizing On Q2RTX RT shaders : Totals from 7 (22.58% of 31) affected shaders: Instrs: 15453 -> 14418 (-6.70%) Cycles: 232647 -> 224959 (-3.30%) Send messages: 574 -> 481 (-16.20%) Spill count: 118 -> 106 (-10.17%) Fill count: 156 -> 140 (-10.26%) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20058>	2022-11-30 07:23:30 +00:00
Caio Oliveira	fbe40720e0	intel/compiler: Remove redundant argument from brw_nir_create_passthrough_tcs Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19831>	2022-11-19 00:35:56 +00:00
Lionel Landwerlin	bdf680cd3f	intel/fs: use nir_opt_ray_query_ranges Results on DG2 q2rtx shaders: Totals from 6 (12.24% of 49) affected shaders: Instrs: 88927 -> 54088 (-39.18%) Cycles: 4115088 -> 2536902 (-38.35%) Send messages: 2639 -> 1609 (-39.03%) Spill count: 1321 -> 613 (-53.60%) Fill count: 3130 -> 1104 (-64.73%) Scratch Memory Size: 22528 -> 18432 (-18.18%) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16593>	2022-11-11 15:17:08 +00:00
Ian Romanick	351b8c6aec	intel/fs: Enable nir_op_imul_32x16 and nir_op_umul_32x16 on pre-Gfx7 Even though Intel's CI doesn't test these old platforms anymore, the validation added in "intel/eu/validate: Validate integer multiplication source size restrictions" combined with full shader-db runs gives me confidence in the changes. Sandy Bridge total instructions in shared programs: 13902341 -> 13902167 (<.01%) instructions in affected programs: 30771 -> 30597 (-0.57%) helped: 66 / HURT: 0 total cycles in shared programs: 741795500 -> 741791931 (<.01%) cycles in affected programs: 987602 -> 984033 (-0.36%) helped: 28 / HURT: 5 Iron Lake total instructions in shared programs: 8365806 -> 8365754 (<.01%) instructions in affected programs: 1766 -> 1714 (-2.94%) helped: 10 / HURT: 0 total cycles in shared programs: 248542694 -> 248542378 (<.01%) cycles in affected programs: 29836 -> 29520 (-1.06%) helped: 9 / HURT: 0 GM45 total instructions in shared programs: 5187127 -> 5187101 (<.01%) instructions in affected programs: 891 -> 865 (-2.92%) helped: 5 / HURT: 0 total cycles in shared programs: 163643914 -> 163643750 (<.01%) cycles in affected programs: 22206 -> 22042 (-0.74%) helped: 5 / HURT: 0 Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>	2022-11-09 21:34:26 +00:00
Ian Romanick	f90d71055b	intel/compiler: Add and use a pass to generate imul_32x16 instructions Gfx8 and Gfx9 platforms are helped for cycles because now many instructions like mul(8) g12<1>D g10<8,8,1>D 6D become mul(8) g12<1>D g10<8,8,1>D 6W It is the same number of instructions, but the 32x16 multiply is a little faster. v2: Fix transposed hi and lo in "(hi >= INT16_MIN && lo <= INT16_MAX)". Noticed by Caio. Use nir_src_is_const instead of open coding it. Suggested by Caio. Broadwell and Skylake had similar results. (Skylake shown) total cycles in shared programs: 845748380 -> 845145547 (-0.07%) cycles in affected programs: 446346348 -> 445743515 (-0.14%) helped: 6017 HURT: 0 helped stats (abs) min: 2 max: 7380 x̄: 100.19 x̃: 8 helped stats (rel) min: <.01% max: 3.72% x̄: 0.41% x̃: 0.39% 95% mean confidence interval for cycles value: -113.37 -87.00 95% mean confidence interval for cycles %-change: -0.42% -0.41% Cycles are helped. Skylake Cycles in all programs: 8844820715 -> 8828897462 (-0.2%) Cycles helped: 47914 Cycles hurt: 1 No shader-db or fossil-db changes on any other Intel platform. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>	2022-11-08 00:02:16 +00:00
Francisco Jerez	5d4df3ac23	intel/compiler: Run extra fp64 lowering pass on devices that don't support int64. In some cases nir_lower_int64 will emit fp64 operations which aren't natively supported on any Intel hardware (e.g. ftrunc, frem). An extra pass of nir_opt_algebraic (for frem) and nir_lower_doubles is required in order to take care of them. This fixes several int64 test-cases on MTL hardware. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19390>	2022-11-07 07:35:22 +00:00
Kenneth Graunke	88756cee8d	intel/compiler: Run nir_opt_large_constants before scalarizing consts nir_opt_large_constants balks at seeing a store_deref of a variable where the source is a vecN operation of multiple load_consts, and thinks that isn't a constant, so it should not bother promoting it. Unfortunately, we were running nir_lower_load_const_to_scalar before nir_opt_large_constants, so this prevented a ton of constant promotion. This commit /used to help/ some shaders in shader-db. Presumably since !16770 landed, those shaders were already helped. Currently there are no shader-db changes on any Intel platform. Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141998227 -> 141421756 (-0.4%) Instructions helped: 12515 Instructions hurt: 237 SENDs in all programs: 7437925 -> 7468033 (+0.4%) SENDs hurt: 12806 Cycles in all programs: 9161655753 -> 9132869800 (-0.3%) Cycles helped: 10163 Cycles hurt: 2637 Spills in all programs: 19977 -> 18678 (-6.5%) Spills helped: 384 Spills hurt: 40 Fills in all programs: 32863 -> 31396 (-4.5%) Fills helped: 385 Fills hurt: 42 Lost: 1 Lots of Shadow of the Tomb Raider fragment shaders and Batman Arkham Origins vertex shaders were hurt for SENDs in this commit. A couple Aztec Ruins compute shaders and Spaceship shaders (multiple stages) were also hurt. All of the shaders hurt for spills or fills were Spaceship compute shaders. Nearly all of the shaders helped were Shadow of the Tomb Raider fragment shaders. One Spaceship shader was really, REALLY helped: Spills helped fossils/fossil-db/Spaceship.run.9f90a2a226fcc57f.1.foz/0b507d3abe2e3c28/compute: 321 -> 13 (-96.0%) Fills helped fossils/fossil-db/Spaceship.run.9f90a2a226fcc57f.1.foz/0b507d3abe2e3c28/compute: 279 -> 21 (-92.5%) Overall this seems like an improvement, but we may want to actually run these few benchmarks before landing.
Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16539>	2022-11-01 14:55:21 -07:00
Alyssa Rosenzweig	941c37c085	nir/lower_idiv: Remove imprecise_32bit_lowering NIR has two implementations of lower_idiv, keyed on the imprecise_32bit_lowering flag. This flag is misleading: the results when setting this flag aren't just "imprecise", they're completely wrong for some values. If a backend has a native implementation of umul_high, the correct path isn't that much more expensive. If it doesn't, it's substantially slower for highp integer division... but in practice, non-constant highp integer division is pretty rare. After a painful migration of the tree, this code path has no more users. Remove it so nobody else gets the bright idea of using it again. Closes: #6555 Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19303>	2022-10-27 19:37:14 +00:00
Tapani Pälli	1e51383258	intel/compiler: run nir_opt_idiv_const before nir_lower_idiv Integer div lowering can potentially create a lot of code that is not removed later on. Running const lowering pass first can be used to eliminate that code. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19157>	2022-10-20 15:35:48 +03:00
Kenneth Graunke	2dfab687ec	intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes [v2] Setting the NIR options takes care of iris thanks to the common st/mesa linking code, and updating brw_nir_link_shaders should handle anv. The main effort here is updating remap_tess_levels, which needs to handle vector stores, writemasking, and swizzling. Unfortunately, we also need to continue handling the existing single-component access because it's used for TES inputs, which we don't vectorize. We could try to vectorize TES inputs too, but they're all pushed anyway, so it wouldn't buy us much other than deleting this code. Also, we do have opt_combine_stores, but not one for loads. One limitation of using nir_vectorize_tess_levels is that it works on variables, and so isn't able to combine outer/inner writes that happen to live in the same vec4 slot (for triangle domains). That said, it's still better than before. For writes, we allow the intrinsics to supply up to the full size of the variable (vec4 for outer, vec2 for inner) even if the domain only requires a subset of those components (i.e. triangles needs 3). shader-db results on Icelake: total instructions in shared programs: 19600314 -> 19597528 (-0.01%) instructions in affected programs: 65338 -> 62552 (-4.26%) helped: 271 / HURT: 0 helped stats (abs) min: 6 max: 24 x̄: 10.28 x̃: 12 helped stats (rel) min: 1.30% max: 18.18% x̄: 5.80% x̃: 7.59% 95% mean confidence interval for instructions value: -10.71 -9.85 95% mean confidence interval for instructions %-change: -6.17% -5.43% Instructions are helped. total cycles in shared programs: 851842332 -> 851808165 (<.01%) cycles in affected programs: 618577 -> 584410 (-5.52%) helped: 271 / HURT: 0 helped stats (abs) min: 64 max: 540 x̄: 126.08 x̃: 111 helped stats (rel) min: 2.57% max: 37.97% x̄: 6.12% x̃: 5.06% 95% mean confidence interval for cycles value: -135.35 -116.80 95% mean confidence interval for cycles %-change: -6.67% -5.57% Cycles are helped. 
total sends in shared programs: 1025238 -> 1024308 (-0.09%) sends in affected programs: 6454 -> 5524 (-14.41%) helped: 271 / HURT: 0 helped stats (abs) min: 2 max: 8 x̄: 3.43 x̃: 4 helped stats (rel) min: 5.71% max: 25.00% x̄: 14.98% x̃: 17.39% 95% mean confidence interval for sends value: -3.57 -3.29 95% mean confidence interval for sends %-change: -15.42% -14.54% Sends are helped. According to Felix DeGrood, this results in a 10% improvement in the draw call time for certain draw calls from Strange Brigade. v2: Fix assertions about number of components and add more of them. Combine the quads and triangles handling as it's nearly identical. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19061>	2022-10-13 11:38:21 -07:00
Kenneth Graunke	b61b1d5a4c	Revert "intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes" This reverts commit `abba55382f`. The assertions I added late in the process broke shader-db, and my quick fix broke CI, so let's just revert it for now and I'll resubmit this later when it's working better. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7385 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18895>	2022-09-29 17:39:18 -07:00
Lionel Landwerlin	23c7142cd6	anv: disable SIMD16 for RT shaders Since divergence is a lot more likely in RT than compute, it makes sense to limit ourselves to SIMD8. The trampoline shader defaults to SIMD16 since this one is uniform. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:37 +00:00
Lionel Landwerlin	8fc7a98e31	intel/fs: disable split_array_vars on opencl kernels Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:36 +00:00
Kenneth Graunke	abba55382f	intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes Setting the NIR options takes care of iris thanks to the common st/mesa linking code, and updating brw_nir_link_shaders should handle anv. The main effort here is updating remap_tess_levels, which needs to handle vector stores, writemasking, and swizzling. Unfortunately, we also need to continue handling the existing single-component access because it's used for TES inputs, which we don't vectorize. We could try to vectorize TES inputs too, but they're all pushed anyway, so it wouldn't buy us much other than deleting this code. Also, we do have opt_combine_stores, but not one for loads. One limitation of using nir_vectorize_tess_levels is that it works on variables, and so isn't able to combine outer/inner writes that happen to live in the same vec4 slot (for triangle domains). That said, it's still better than before. For writes, we allow the intrinsics to supply up to the full size of the variable (vec4 for outer, vec2 for inner) even if the domain only requires a subset of those components (i.e. triangles needs 3). shader-db results on Icelake: total instructions in shared programs: 19605070 -> 19602284 (-0.01%) instructions in affected programs: 65338 -> 62552 (-4.26%) helped: 271 / HURT: 0 helped stats (abs) min: 6 max: 24 x̄: 10.28 x̃: 12 helped stats (rel) min: 1.30% max: 18.18% x̄: 5.80% x̃: 7.59% 95% mean confidence interval for instructions value: -10.71 -9.85 95% mean confidence interval for instructions %-change: -6.17% -5.43% Instructions are helped. total cycles in shared programs: 851854659 -> 851820320 (<.01%) cycles in affected programs: 618749 -> 584410 (-5.55%) helped: 271 / HURT: 0 helped stats (abs) min: 69 max: 540 x̄: 126.71 x̃: 108 helped stats (rel) min: 2.57% max: 37.97% x̄: 6.17% x̃: 5.06% 95% mean confidence interval for cycles value: -135.89 -117.54 95% mean confidence interval for cycles %-change: -6.72% -5.63% Cycles are helped. 
total sends in shared programs: 1025285 -> 1024355 (-0.09%) sends in affected programs: 6454 -> 5524 (-14.41%) helped: 271 / HURT: 0 helped stats (abs) min: 2 max: 8 x̄: 3.43 x̃: 4 helped stats (rel) min: 5.71% max: 25.00% x̄: 14.98% x̃: 17.39% 95% mean confidence interval for sends value: -3.57 -3.29 95% mean confidence interval for sends %-change: -15.42% -14.54% Sends are helped. According to Felix DeGrood, this results in a 10% improvement in the draw call time for certain draw calls from Strange Brigade. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>	2022-09-27 18:17:56 -07:00
Marcin Ślusarz	3c96959bbc	intel/compiler: print shader after successful brw_nir_lower_shading_rate_output Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18702>	2022-09-20 17:23:45 +00:00
Pierre-Eric Pelloux-Prayer	70891edd97	nir: add a nir_opt_if_options enum And don't enable nir_opt_if_optimize_phi_true_false on radeonsi with LLVM 14 because it crashes Blender. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6976 Cc: mesa-stable Reviewed-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17949>	2022-08-10 12:55:39 +00:00
Jason Ekstrand	87ab287436	vulkan: Call lower_clip_cull_distance_arrays in vk_spirv_to_nir Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17644>	2022-07-21 21:18:48 +00:00
Jason Ekstrand	fd17aaf430	intel/fs: Use nir_lower_single_sampled This lets us drop demote_sample_qualifiers as well as a back-end check for key->multisample_fbo. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>	2022-07-13 20:28:42 +00:00
Marcin Ślusarz	585d81e3ec	intel/compiler: print shaders after nir_remove_unused_varyings Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17516>	2022-07-13 15:50:02 +00:00
Jason Ekstrand	530de844ef	intel,anv,iris,crocus: Drop subgroup size from the shader key Use nir->info.subgroup_size instead. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17337>	2022-07-08 22:47:22 +00:00
Lionel Landwerlin	f1dd487531	intel/nir: temporarily disable opt_uniform_atomics for RT/CL I have not had time to investigate what is going on, but it's definitely a contributor to failures. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16104>	2022-07-07 10:21:48 +00:00
Marcin Ślusarz	008163f382	intel/compiler: vectorize task payload loads/stores Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17000>	2022-06-20 17:38:20 +00:00
Jason Ekstrand	844a70f439	intel/compiler: Use NIR_PASS(_, ...) I don't know when this was added but it's really neat and we should use it instead of NIR_PASS_V since NIR_DEBUG=print and a few validation things will work better. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17014>	2022-06-13 22:31:25 +00:00
Vadym Shovkoplias	55c71217ec	driconf: Add a limit_trig_input_range option With this option enabled, the range of input values for fsin and fcos is limited to [-2pi : 2pi] by calculating the remainder after 2*pi modulo division. This helps to improve calculation precision for large input arguments on Intel. -v2: Add limit_trig_input_range option to prog_key to update shader cache (Lionel) Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16388>	2022-05-13 06:47:53 +00:00
Emma Anholt	536c8ee96d	nir/lower_tex: Make the adding a 0 LOD to nir_op_tex in the VS optional. This controls the whole lowering of "make tex ops with implicit derivatives on non-implicit-derivative stages be tex ops with an explicit lod of 0 instead", but it's really hard to describe that in a git commit summary. All existing callers get it added except: - nir_to_tgsi which didn't want it. - nouveau, which didn't want it (fixes regressions in shadowcube and shadow2darray with NIR, since the shading languages don't expose txl of those sampler types and thus it's not supported in HW) - optional lowering passes in mesa/st (lower_rect, YUV lowering, etc) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>	2022-04-28 21:26:08 +00:00
Jason Ekstrand	69b5424ea4	intel/nir: Lower 8 and 16-bit bitwise unops Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15829>	2022-04-12 23:19:38 +00:00
Ian Romanick	7fd1955412	nir: intel/compiler: Lower TXD on array surfaces on DG2+ DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. Cube maps and 3D surfaces were already handled, but 1D array and 2D array surfaces were not. Fixes the following Vulkan CTS failures on DG2: dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1darray_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2darray_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_fixed_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_float_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_fixed_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_float_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1darray_fragment dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2darray_fragment The Fixes: tag below is a bit misleading. This commit adds another lowering, similar to the one in the Fixes: commit, that probably should have been added at the same time. I just want to make sure this commit gets applied everywhere that commit was also applied. Fixes: `635ed58e52` ("intel/compiler: Lower txd for 3D samplers on XeHP.") Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15681>	2022-03-31 12:59:18 -07:00
Kenneth Graunke	823745dc27	intel/compiler: Use nir_opt_uniform_atomics() In general, an atomic intrinsic may perform separate atomics for every enabled SIMD channel, as each channel may operate on different memory. However, an extremely common case is for all channels to access the same memory location. In this case, we can simply perform a reduction/scan across the subgroup, and perform one atomic for the whole subgroup, rather than one per channel. For example, if an intrinsic says to take the minimum value of the existing memory and the value in each channel, we can do a thread-local minimum of all enabled channels, then do a single atomic to take the minimum of that and the existing memory. Our hardware doesn't optimize the case where multiple channels ask for atomics on the same memory location; it assumes the compiler will do so. nir_opt_uniform_atomics() uses divergence analysis to detect this case, adds the necessary subgroup operations, and moves the atomic inside a conditional that disables all but a single invocation. It even detects cases where the shader code already performs this kind of optimization, and avoids doing it a second time. This may not be the optimal solution for us. In the backend, we could detect this case and emit send(1) instructions with NoMask, rather than generating if...send(16)...endif, and a lot of unnecessary ALU ops. But it's simple to do, reuses the same path as ACO, and still provides most of the benefit by cutting up to 16x atomics down to a single atomic, which is more merciful to the memory bus. Improves performance of Shadow of the Tomb Raider by 5.5% on XeHP. Improves performance of a customer-internal benchmark on XeHP at 3840x2160 and low settings by approximately 30%. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>	2022-03-26 00:28:19 +00:00
Kenneth Graunke	49ef23f4a6	intel/compiler: Convert to LCSSA and use divergence analysis. We'll use this more shortly. For now, enable it separately in case anything bisects to this. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>	2022-03-26 00:28:19 +00:00
Jason Ekstrand	d1bddfba6b	intel/nir: Add optimizations to help OpenCL-style kernels Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13171>	2022-03-21 11:26:44 +00:00
Marcin Ślusarz	81df66bfff	intel/compiler: mark some variables as per-primitive in FS if they come from MS Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15303>	2022-03-09 16:52:59 +00:00
Daniel Schürmann	2a92452a0e	nir/opt_shrink_vectors: Remove shrinking of store intrinsics data source This is done via nir_opt_shrink_stores. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14480>	2022-02-11 11:50:47 +01:00
Lionel Landwerlin	c78be5da30	intel/fs: lower ray query intrinsics v2: Add helper for acceleration->root_node computation (Caio) v3: Update comment on "done" bit (Caio) Remove progress bool value for impl function (Caio) Don't use nir_shader_instructions_pass to search the shader (Caio) v4: Rename variable for if/else block (Caio) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>	2022-02-08 12:55:25 +00:00
Lionel Landwerlin	cebf284ac1	intel/compiler: add a new pass to lower shading rate into HW format Rework: * Jason: Modernize brw_nir_lower_shading_rate_output: 1. Use nir_shader_instructions_pass() 2. Use *_imm builder helpers. 3. Use nir_intrinsic_base() instead of ->const_index[0] v2: Also lower loads (Caio) v3: Update stage check to trigger lowering (Caio) v4: Assert on != MESH (Caio) v5: Fixup instruction insertion (Caio) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13739>	2022-02-02 17:09:46 +00:00
Connor Abbott	913bec10c4	nir/lower_subgroups: Rename lower_shuffle to lower_relative_shuffle This option only applies to relative shuffles (up/down/xor), and in a moment we're going to add an option to lower normal shuffles, so rename it. While we're here, rename lower_shuffle() to lower_to_shuffle() for similar reasons. Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14412>	2022-02-01 16:27:45 +00:00
Dave Airlie	d54c07b4c4	mesa/*: use an internal enum for tessellation primitive types. To avoid dragging gl.h into places it has no business being, defined tessellation primitive mode to an enum. This has a lot of fallout all over the place. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>	2022-01-19 21:54:58 +00:00
Dave Airlie	2692a5f8db	intel/compiler: don't lower swizzles in backend. These are lowered by crocus in the frontend, the key entries are still used. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14202>	2021-12-22 21:37:55 +00:00
Jordan Justen	52a55f097f	intel/compiler: Use nir_lower_tex_options::lower_offset_filter for tg4 on XeHP Based on Rafael's: * "nir/lower_tex: Add option to lower offset for tg4 too." * "intel/compiler: Lower offsets for tg4 on gen9+." * "WIP: Do not lower basic offsets." * "WIP: intel/compiler: Enable lowering offsets restriction." But, with these changes: * Fixed range checking to be signed 4 bits * Converted to filter * Apply only to gfx12.5+ * Use nir_src_is_const / nir_src_comp_as_int (s-b Jason) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14142>	2021-12-13 16:59:37 -08:00
Marcin Ślusarz	28e0c63a4c	intel/compiler: extract brw_nir_load_global_const out of rt code Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>	2021-12-04 00:41:46 +00:00
Caio Oliveira	fcc1ccf541	intel/compiler: Don't lower Mesh/Task I/O to temporaries Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>	2021-12-04 00:41:46 +00:00
Timur Kristóf	5aa39253cb	nir: Rename nir_get_io_vertex_index_src and include per-primitive I/O. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>	2021-11-16 07:46:55 +00:00
Lionel Landwerlin	361b3fee3c	intel: move away from booleans to identify platforms v2: Drop changes around GFX_VERx10 == 75 (Luis) v3: Replace (GFX_VERx10 < 75 && devinfo->platform != INTEL_PLATFORM_BYT) by (devinfo->platform == INTEL_PLATFORM_IVB) Replace (devinfo->ver >= 5 || devinfo->platform == INTEL_PLATFORM_G4X) by (devinfo->verx10 >= 45) Replace (devinfo->platform != INTEL_PLATFORM_G4X) by (devinfo->verx10 != 45) v4: Fix crocus typo v5: Rebase v6: Add GFX3, ILK & I965 platforms (Jordan) Move ifdef to code expressions (Jordan) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12981>	2021-11-08 16:48:06 +00:00
Caio Oliveira	858424bd2e	intel/compiler: Use gl_shader_stage_uses_workgroup() helpers Instead of checking for MESA_SHADER_COMPUTE (and KERNEL). Where appropriate, also use gl_shader_stage_is_compute(). This allows most of the workgroup-related lowering to be applied to Task and Mesh shaders. These will be added later and "inherit" from cs_prog_data structure. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13629>	2021-11-03 11:09:48 -07:00
Jason Ekstrand	7b21def9c2	intel/fs: Add support for atomic_fadd Rework: - Enable float32 atomic add with LSC (Sagar) - disassemble new opcode (Caio) Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12566>	2021-09-09 23:34:33 +00:00
Ian Romanick	5ce3bfcdf3	intel/compiler: Lower 8-bit ops to 16-bit in NIR on all platforms This fixes the Crucible func.shader.shift.int8_t test on Gen8 and Gen9. See https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/76. With the previous optimizations in place, this change seems to improve the quality of the generated code. Comparing a couple Vulkan CTS tests on Skylake had the following results. dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag: SIMD8 shader: 36 instructions. 1 loops. 3822 cycles. 0:0 spills:fills, 5 sends SIMD8 shader: 27 instructions. 1 loops. 2742 cycles. 0:0 spills:fills, 5 sends dEQP-VK.spirv_assembly.type.vec3.i8.max_frag: SIMD8 shader: 39 instructions. 1 loops. 3922 cycles. 0:0 spills:fills, 5 sends SIMD8 shader: 37 instructions. 1 loops. 3682 cycles. 0:0 spills:fills, 5 sends Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>	2021-08-18 22:03:37 +00:00
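
Note on 823745dc27 (intel/compiler: Use nir_opt_uniform_atomics()): the message describes replacing one atomic per enabled SIMD channel with a subgroup reduction followed by a single atomic when every channel targets the same address. The following is a minimal sketch of that idea in plain C, not Mesa or NIR code. The names SUBGROUP_SIZE, fake_atomic_umin, per_channel_umin and uniform_umin are hypothetical, the "atomic" is an ordinary read-modify-write that only counts how many messages would be issued, and the per-channel return values that the real pass would also need to reconstruct are ignored here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subgroup width, chosen to match the "up to 16x" in the message. */
#define SUBGROUP_SIZE 16

/* Counts how many simulated atomic messages were issued. */
static unsigned atomic_ops_issued;

/* Stand-in for one atomic unsigned-min on memory (not a real atomic). */
static uint32_t fake_atomic_umin(uint32_t *mem, uint32_t value)
{
   uint32_t old = *mem;
   if (value < old)
      *mem = value;
   atomic_ops_issued++;
   return old;
}

/* Unoptimized form: every enabled channel issues its own atomic. */
static void per_channel_umin(uint32_t *mem, const uint32_t *values,
                             const bool *enabled)
{
   for (int ch = 0; ch < SUBGROUP_SIZE; ch++) {
      if (enabled[ch])
         fake_atomic_umin(mem, values[ch]);
   }
}

/* Uniform-address form: reduce across enabled channels first, then let a
 * single invocation issue one atomic for the whole subgroup.
 */
static void uniform_umin(uint32_t *mem, const uint32_t *values,
                         const bool *enabled)
{
   bool any_enabled = false;
   uint32_t reduced = UINT32_MAX;

   for (int ch = 0; ch < SUBGROUP_SIZE; ch++) {
      if (!enabled[ch])
         continue;
      any_enabled = true;
      if (values[ch] < reduced)
         reduced = values[ch];
   }

   if (any_enabled)
      fake_atomic_umin(mem, reduced);
}
```

Both functions leave the same value in memory, since min is associative and commutative; the second issues at most one simulated atomic per subgroup instead of one per enabled channel.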


353 commits