fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 04:48:07 +02:00

Author	SHA1	Message	Date
Andres Gomez	42351c21bb	glsl/linker: always validate explicit locations for first and last interfaces Until now, we were only doing this when linking a SSO program. However, nothing avoids linking a non SSO program which doesn't have both a VS and FS. In those cases, we also need to report the usual linking errors, if happening. v2: Use a better name for the renamed function (Timothy). Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-15 22:34:50 +00:00
Dylan Baker	95aefc94a9	Delete autotools Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Matt Turner <mattst88@gmail.com>	2019-04-15 13:44:29 -07:00
Rhys Perry	082d180a22	mesa, glsl: add support for EXT_shader_image_load_formatted v3: rebase Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2) Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-04-15 16:18:07 -04:00
Rhys Perry	8671cfe2a2	nir,ac/nir: fix cube_face_coord Seems it was missing the "/ ma + 0.5" and the order was swapped. Fixes: `a1a2a8dfda` ('nir: add AMD_gcn_shader extended instructions') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-15 17:22:47 +01:00
Timothy Arceri	8f74a60c43	nir: fix packing components with arrays When gathering info for unmovable types we need to handle arrays. While we dont support packing/moving arrays we do support packing scalar components with these arrays. Fixes piglit: tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-interleave-range.shader_test Fixes: `5eb17506e1` ("nir: do not pack varying with different types") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-15 19:25:12 +10:00
Samuel Pitoiset	bbe8febd93	spirv: add SpvCapabilityFloat16 support Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-15 10:43:52 +02:00
Jason Ekstrand	47709ca146	nir/validate: Require unused bits of nir_const_value to be zero Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-14 22:25:56 +02:00
Jason Ekstrand	c4b28d1730	nir/load_const_to_scalar: Get rid of a bit size switch statement Now that nir_const_value is a scalar, we don't need the switch on bit size in order to pluck off components properly. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-14 22:25:56 +02:00
Jason Ekstrand	893dd34702	spirv: Drop some unneeded bit size switch statements Now that nir_const_value is a scalar, we don't need the switch on bit size in order copy components around properly. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-14 22:25:56 +02:00
Jason Ekstrand	b8197a01a9	nir/constant_folding: Get rid of a bit size switch statement Now that nir_const_value is a scalar, we don't need the switch on bit size in order to swizzle them properly. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-14 22:25:56 +02:00
Karol Herbst	14531d676b	nir: make nir_const_value scalar v2: remove & operator in a couple of memsets add some memsets v3: fixup lima Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2019-04-14 22:25:56 +02:00
Karol Herbst	73d883037d	spirv: reduce array size in vtn_handle_constant we already assert above that there are no more than 3 sources, so it doesn't make sense to use an array of 4 sources Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-14 22:25:56 +02:00
Karol Herbst	e72beacb95	nir/loop_analyze: use nir_const_value.b for boolean results, not u32 Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-14 22:25:56 +02:00
Jason Ekstrand	10602db78c	nir/print: Use nir_src_as_int for array indices Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-14 22:25:56 +02:00
Jason Ekstrand	9b1e4bab6b	nir/builder: Add a nir_imm_zero helper v2: replace nir_zero_vec with nir_imm_zero (Karol Herbst) Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-14 22:25:56 +02:00
Karol Herbst	daaf777376	nir/builder: Move nir_imm_vec2 from blorp into the builder While we're here, fix a typo which caused it to actually return a vec4 with the third and fourth components zero. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-14 22:25:56 +02:00
Alyssa Rosenzweig	2ce4adefa5	nir: Add nir_lower_viewport_transform On Mali hardware (supported by Panfrost and Lima), the fixed-function transformation from world-space to screen-space coordinates is done in the vertex shader prior to writing out the gl_Position varying, rather than in dedicated hardware. This commit adds a shared NIR pass for implementing coordinate transformation and lowering gl_Position writes into screen-space gl_Position writes. v2: Run directly on derefs before io/vars are lowered to cleanup the code substantially. Thank you to Qiang for this suggestion! v3: Bikeshed continues. v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment. Ian and Qiang's reviews are from v3, but no real functional changes from v4. Rob's review is from v4. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Suggested-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-04-14 19:15:13 +00:00
Christian Gmeiner	b6bed115a5	nir: add lower_ftrunc Port TGSI TRUNC lowering to nir Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-13 17:54:48 +00:00
Jason Ekstrand	18ed82b084	nir: Add a pass for selectively lowering variables to scratch space This commit adds new nir_load/store_scratch opcodes which read and write a virtual scratch space. It's up to the back-end to figure out what to do with it and where to put the actual scratch data. v2: Drop const_index comments (by anholt) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-12 15:59:31 -07:00
Eric Anholt	b88ef3bd76	nir: Add a comment about how intrinsic definitions work. I was thinking about a refactor, and needed to read this first. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-12 15:56:12 -07:00
Eric Anholt	35355b4860	nir: Drop remaining references to const_index in favor of the call to use. Please don't make me read a const_index[] expression ever again. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-12 15:56:04 -07:00
Eric Anholt	6e4d3d0a2f	nir: Drop comments about the constant_index slots for load/stores. The constant_index slots are named right there in the intrinsic definition, and the comment is just a chance to get out of sync. Noticed while reviewing the lower_to_scratch changes that copy-and-pasted wrong comments, and load_ubo and load_per_vertex_output had incorrect comments currently. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-12 15:55:55 -07:00
Kenneth Graunke	9e0c744f07	glsl: Set location on structure-split sampler uniform variables gl_nir_lower_samplers_as_deref splits structure uniform variables, creating new variables for individual fields. As part of that, it calculates a new location. It then never set this on the new variables. Thanks to Michael Fiano for finding this bug. Fixes crashes on i965 with Piglit's new tests/spec/glsl-1.10/execution/samplers/uniform-struct test, which was reduced from the failing case in Michael's app. Fixes: `f003859f97` nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-12 10:35:08 -07:00
Marek Olšák	bd2995c8b7	glsl: allow the #extension directive within code blocks for the dri option for Viewperf 13 Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-12 11:34:39 -04:00
Karol Herbst	4a3c04a11f	glsl/nir: add support for lowering bindless images_derefs v2: handle atomics as well make use of nir_rewrite_image_intrinsic v3: remove call to nir_remove_dead_derefs v4: (Timothy Arceri) dont actually call lowering yet Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Karol Herbst	0b2e8d9e17	glsl/nir: fetch the type for images from the deref instruction fixes retrieving the sampler type for bindless images stored inside structs. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Karol Herbst	d7bbb3caf1	glsl_to_nir: handle bindless textures v2: add support for AMD Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Timothy Arceri	035759b61b	nir/i965/freedreno/vc4: add a bindless bool to type size functions This required to calculate sizes correctly when we have bindless samplers/images. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Karol Herbst	3b2a9ffd60	nir: move brw_nir_rewrite_image_intrinsic into common code Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Timothy Arceri	9e3740c47f	nir: initialise some variables in opt_if_loop_last_continue() Fixes a couple of Coverity warnings CID 1444626. Fixes: `e30804c602` ("nir/radv: remove restrictions on opt_if_loop_last_continue()") Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-04-11 20:38:03 +10:00
Juan A. Suarez Romero	83f1b0e95b	nir/xfb: do not use bare interface type In commit `3b3653c4cf` we decided not to use bare types; hence do not use bare type when comparing with interface type to find out if the xfb variable is an array block. This fixes dEQP-VK.transform_feedback.* tests. Fixes: `3b3653c4cf` ("nir/spirv: don't use bare types, remove assert in split vars for testing") CC: Dave Airlie <airlied@redhat.com> CC: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-11 11:52:45 +02:00
Marek Olšák	53f715fafb	Revert "glsl: fix shader_storage_blocks_write_access for SSBO block arrays" This reverts commit `b7ca074cc0`. It broke a lot of tests.	2019-04-10 10:48:56 -04:00
Karol Herbst	0c4706563a	glsl/standalone: add GLES3.1 and GLES3.2 compatibility also set some constants for SSBOs. With that it can compile the shader from: dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.18 Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-10 16:16:36 +02:00
Bas Nieuwenhuizen	282bacab4a	nir: Add access qualifiers on load_ubo intrinsic. Otherwise nir_lower_non_uniform_access crashes when it tries to get the access of a load_ubo. Fixes: `8ed583fe52` "spirv: Handle the NonUniformEXT decoration" Fixes: `e50ab2c0f2` "nir: Add access flags to deref and SSBO atomics" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-10 02:04:04 +02:00
Marek Olšák	b7ca074cc0	glsl: fix shader_storage_blocks_write_access for SSBO block arrays CTS: GL45-CTS.compute_shader.resources-max Fixes: `4e1e8f684b` "glsl: remember which SSBOs are not read-only and pass it to gallium" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-09 19:25:35 -04:00
Andres Gomez	75a3dd97aa	glsl/linker: location aliasing requires types to have the same width From the OpenGL 4.60.5 spec, section 4.4.1 Input Layout Qualifiers, Page 67, (Location aliasing): " Further, when location aliasing, the aliases sharing the location must have the same underlying numerical type and bit width (floating-point or integer, 32-bit versus 64-bit, etc.) and the same auxiliary storage and interpolation qualification." Additionally, we have improved the linker error descriptions. Specifically, when taking structs into account we were producing a linker error because we assumed that all components in each location were used and that would cause component aliasing. This is not accurate of the actual problem. Now, the failure specifies that the underlying numerical type incompatibility is the cause for the failure. Fixes the following piglit test: tests/spec/arb_enhanced_layouts/linker/component-layout/vs-to-fs-width-mismatch-double-float.shader_test v2: - Do not assert if we see invalid numerical types. These come straight from shader code, so we should produce linker errors if shaders attempt to do location aliasing on variables that are not numerical such as records. - While we are at it, improve error reporting for the case of numerical type mismatch to include the shader stage. v3: - Allow location aliasing of images and samplers. If we get these it means bindless support is active and they should be handled as 64-bit integers (Ilia) - Make sure we produce link errors for any non-numerical type for which we attempt location aliasing, not just structs. v4: - Rebased with minor fixes (Andres). - Added fixing tag to the commit log (Andres). v5: - Remove the helper function and check individually for the underlying numerical type and bit width (Timothy). - Implicitly, assume that any non-treated type which is checked for its underlying numerical type is either integer or float and has a defined bit width (Timothy). - Implicitly, assume that structs are the only non-treated non-numerical type (Timothy). - Improve the linker error descriptions and commit log (Andres). Fixes: `13652e7516` ("glsl/linker: Fix type checks for location aliasing") Cc: Ilia Mirkin <imirkin@alum.mit.edu> Cc: Timothy Arceri <tarceri@itsqueeze.com> Cc: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-09 12:56:50 +02:00
Jason Ekstrand	6279074de1	nir: Get rid of global registers We have a pass to lower global registers to locals and many drivers dutifully call it. However, no one ever creates a global register ever so it's all dead code. It's time we bury it. Acked-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 00:29:36 -05:00
Jason Ekstrand	b28bad89b9	nir: Get rid of nir_register::is_packed All we ever do is initialize it to zero, clone it, print it, and validate it. No one ever sets or uses it. Acked-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 00:29:36 -05:00
Caio Marcelo de Oliveira Filho	bd73531677	spirv: Add support for DerivativeGroup capabilities As defined in SPV_NV_compute_shader_derivatives. These control how the invocations are arranged in a CS when doing derivative and related operations (which are also enabled by the extension). Since we expect valid SPIR-V, we don't need to do more work at SPIR-V level to enable the derivative and related operations to be called. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	fcbc5ccaae	nir: Don't set LOD=0 for compute shader that has derivative group When using NV_compute_shader_derivatives to set a derivative group, a compute shader supports texture with implicit LOD calculation, so don't set an explicit LOD. Note if the extension is used but the derivative group is not specified, it will default to LOD=0 as before. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	d08a74d2bf	nir/algebraic: Lower CS derivatives to zero when no group defined In compute shaders if no derivative group is defined, the derivatives will always be zero. Specified in NV_compute_shader_derivatives. To make the check more convenient, add a "info" local variable to the generated code so we can refer to it in the Python rules. (Jason) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	3c5ddaeacd	glsl: Parse and propagate derivative_group to shader_info NV_compute_shader_derivatives allow selecting between two possible arrangements (quads and linear) when calculating derivatives and certain subgroup operations in case of Vulkan. So parse and propagate those up to shader_info.h. v2: Do not fail when ARB_compute_variable_group_size is being used, since we are still clarifying what is the right thing to do here. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	ca60f0b7ba	glsl: Enable texture builtins for NV_compute_shader_derivatives Renamed a few predicates from "fs_only" to be "derivative_only" (or similar pairs). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	09a3273fe7	glsl: Enable derivative builtins for NV_compute_shader_derivatives Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	289478ea89	glsl: Remove redundant conditions when asserting in_qualifier As the code evolved, we ended up with a redundant conditions. Clean this up. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	163655b33e	mesa: Extension boilerplate for NV_compute_shader_derivatives Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Timothy Arceri	e30804c602	nir/radv: remove restrictions on opt_if_loop_last_continue() When I implemented opt_if_loop_last_continue() I had restricted this pass from moving other if-statements inside the branch opposite the continue. At the time it was causing a bunch of spilling in shader-db for i965. However Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered. 28717 shaders in 14931 tests Totals: SGPRS: 1267317 -> 1267549 (0.02 %) VGPRS: 896876 -> 895920 (-0.11 %) Spilled SGPRs: 24701 -> 26367 (6.74 %) Code Size: 48379452 -> 48507880 (0.27 %) bytes Max Waves: 241159 -> 241190 (0.01 %) Totals from affected shaders: SGPRS: 23584 -> 23816 (0.98 %) VGPRS: 25908 -> 24952 (-3.69 %) Spilled SGPRs: 503 -> 2169 (331.21 %) Code Size: 2471392 -> 2599820 (5.20 %) bytes Max Waves: 586 -> 617 (5.29 %) The codesize increases is related to Wolfenstein II it seems largely due to an increase in phis rather than the existing jumps. This gives +10% FPS with Doom on my Vega56. Rhys Perry also benchmarked Doom on his VEGA64: Before: 72.53 FPS After: 80.77 FPS v2: disable pass on non-AMD drivers Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1) Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-09 11:29:41 +10:00
Jason Ekstrand	50f3535d1f	nir/search: Search for all combinations of commutative ops Consider the following search expression and NIR sequence: ('iadd', ('imul', a, b), b) ssa_2 = imul ssa_0, ssa_1 ssa_3 = iadd ssa_2, ssa_0 The current algorithm is greedy and, the moment the imul finds a match, it commits those variable names and returns success. In the above example, it maps a -> ssa_0 and b -> ssa_1. When we then try to match the iadd, it sees that ssa_0 is not b and fails to match. The iadd match will attempt to flip itself and try again (which won't work) but it cannot ask the imul to try a flipped match. This commit instead counts the number of commutative ops in each expression and assigns an index to each. It then does a loop and loops over the full combinatorial matrix of commutative operations. In order to keep things sane, we limit it to at most 4 commutative operations (16 combinations). There is only one optimization in opt_algebraic that goes over this limit and it's the bitfieldReverse detection for some UE4 demo. Shader-db results on Kaby Lake: total instructions in shared programs: 15310125 -> 15302469 (-0.05%) instructions in affected programs: 1797123 -> 1789467 (-0.43%) helped: 6751 HURT: 2264 total cycles in shared programs: 357346617 -> 357202526 (-0.04%) cycles in affected programs: 15931005 -> 15786914 (-0.90%) helped: 6024 HURT: 3436 total loops in shared programs: 4360 -> 4360 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 23675 -> 23666 (-0.04%) spills in affected programs: 235 -> 226 (-3.83%) helped: 5 HURT: 1 total fills in shared programs: 32040 -> 32032 (-0.02%) fills in affected programs: 190 -> 182 (-4.21%) helped: 6 HURT: 2 LOST: 18 GAINED: 5 Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-04-08 21:38:48 +00:00
Jason Ekstrand	ad8c145658	nir/algebraic: Add some logical OR and AND patterns The new OR pattern has been seen in the wild and can end up being generated by GLSLang. Not sure about the other two new patterns but we may as well throw them in for completeness. While we're here, we can drop the '@bool' specifier from the one pattern because specifying True already implies 1-bit which basically implies boolean. Shader-db results on Kaby Lake: total instructions in shared programs: 15321227 -> 15321129 (<.01%) instructions in affected programs: 3594 -> 3496 (-2.73%) helped: 6 HURT: 0 total cycles in shared programs: 357481321 -> 357479725 (<.01%) cycles in affected programs: 44109 -> 42513 (-3.62%) helped: 6 HURT: 0 VkPipeline-DB results on Kaby Lake: total instructions in shared programs: 3770504 -> 3769734 (-0.02%) instructions in affected programs: 19058 -> 18288 (-4.04%) helped: 163 HURT: 0 total cycles in shared programs: 1417583701 -> 1417569727 (<.01%) cycles in affected programs: 750958 -> 736984 (-1.86%) helped: 158 HURT: 1 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-05 18:39:06 -05:00
Jason Ekstrand	03a72d96d8	nir/algebraic: Drop some @bool specifiers Now that we have one-bit booleans, we don't need to rely on looking at parent instructions in order to figure out if a value is a Boolean most of the time. We can drop these specifiers and now the optimizations will apply more generally. Shader-DB results on Kaby Lake: total instructions in shared programs: 15321168 -> 15321227 (<.01%) instructions in affected programs: 8836 -> 8895 (0.67%) helped: 1 HURT: 31 total cycles in shared programs: 357481781 -> 357481321 (<.01%) cycles in affected programs: 146524 -> 146064 (-0.31%) helped: 22 HURT: 10 total spills in shared programs: 23675 -> 23673 (<.01%) spills in affected programs: 11 -> 9 (-18.18%) helped: 1 HURT: 0 total fills in shared programs: 32040 -> 32036 (-0.01%) fills in affected programs: 27 -> 23 (-14.81%) helped: 1 HURT: 0 No change in VkPipeline-DB Looking at the instructions hurt, a bunch of them seem to be a case where doing exactly the right thing in NIR ends up doing the wrong-ish thing in the back-end because flags are dumb. In particular, there's a case where we have a MUL followed by a CMP followed by a SEL and when we turn that SEL into an OR, it uses the GRF result of the CMP rather than the flag result so the CMP can't be merged with the MUL. Those shaders appear to schedule better according to the cycle estimates so I guess it's a win? Also it helps spilling in one Car Chase compute shader. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-05 18:39:00 -05:00
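
The matching strategy described in commit 50f3535d1f (try every flip combination of the commutative ops instead of committing greedily) can be sketched roughly as below. This is an illustrative Python model only, not Mesa's C implementation: the names `COMMUTATIVE`, `MAX_COMMUTATIVE`, `count_commutative`, `match`, and `search`, the nested-tuple pattern form, and the dictionary of SSA definitions are all assumptions made for the example.

```python
# Hypothetical model of "search all combinations of commutative ops".
# Patterns are nested tuples such as ('iadd', ('imul', 'a', 'b'), 'b');
# plain strings in a pattern are variables to bind.

COMMUTATIVE = {"iadd", "imul"}   # assumed set of commutative opcodes
MAX_COMMUTATIVE = 4              # the commit caps this at 4 ops (16 combinations)


def count_commutative(pattern):
    """Count the commutative operations in a nested-tuple pattern."""
    if not isinstance(pattern, tuple):
        return 0
    op, *srcs = pattern
    return int(op in COMMUTATIVE) + sum(count_commutative(s) for s in srcs)


def match(pattern, value, defs, bindings, flips, counter):
    """Match `pattern` against SSA name `value` for one flip bitmask.

    `counter` is a one-element list holding the index of the next
    commutative op; bit `index` of `flips` says whether that op's
    operands are swapped before matching.
    """
    if not isinstance(pattern, tuple):
        # Variable: bind on first use, otherwise require the same value.
        if pattern in bindings:
            return bindings[pattern] == value
        bindings[pattern] = value
        return True

    op, *srcs = pattern
    instr = defs.get(value)
    if instr is None or instr[0] != op or len(instr) - 1 != len(srcs):
        return False

    operands = list(instr[1:])
    if op in COMMUTATIVE:
        index = counter[0]
        counter[0] += 1
        if (flips >> index) & 1:
            operands.reverse()

    return all(match(p, v, defs, bindings, flips, counter)
               for p, v in zip(srcs, operands))


def search(pattern, value, defs):
    """Try each flip combination; return the variable bindings or None."""
    n = min(count_commutative(pattern), MAX_COMMUTATIVE)
    for flips in range(1 << n):
        bindings = {}
        if match(pattern, value, defs, bindings, flips, [0]):
            return bindings
    return None


# The example from the commit message:
#   ssa_2 = imul ssa_0, ssa_1
#   ssa_3 = iadd ssa_2, ssa_0
defs = {
    "ssa_2": ("imul", "ssa_0", "ssa_1"),
    "ssa_3": ("iadd", "ssa_2", "ssa_0"),
}
pattern = ("iadd", ("imul", "a", "b"), "b")
result = search(pattern, "ssa_3", defs)
print(result)
```

With no flips the imul binds a to ssa_0 and b to ssa_1, so the outer b fails against ssa_0; the match succeeds only for the combination that flips the imul, which a greedy matcher never tries.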


3572 commits