fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-24 06:40:11 +01:00

Author	SHA1	Message	Date
Karol Herbst	71c66c254b	nir: add support for gather offsets Values inside the offsets parameter of textureGatherOffsets are required to be constants in the range of [GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET, GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET]. As this range is never outside [-32, 31] for all existing drivers inside mesa, we can simply store the offsets as a int8_t[4][2] array inside nir_tex_instr. Right now only Nvidia hardware supports this in hardware, so we can turn this on inside Nouveau for the NIR path as it is already enabled with the TGSI one. v2: use memcpy instead of for loops add missing bits to nir_instr_set don't show offsets if they are all 0 v3: default offsets aren't all 0 v4: rename offsets -> tg4_offsets rename nir_tex_instr_has_explicit_offsets -> nir_tex_instr_has_explicit_tg4_offsets Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-21 02:58:41 +00:00
Dave Airlie	b95b33a5c7	nir/deref: remove casts of casts which are likely redundant (v3) Not sure how ptr_stride should be taken into account if at all here v2: reorder check to avoid src walking (Jason) v3: remove is_cast_cast checks, keep going afterwards (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-21 10:58:06 +10:00
Dave Airlie	3b3653c4cf	nir/spirv: don't use bare types, remove assert in split vars for testing For OpenCL we never want to strip the info from the types, and it makes type comparisons easier in later stages. We might later need a nir pass to strip this for GLSL, but so far the only regression is the assert and Jason said removing that is fine. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-03-21 10:25:40 +10:00
Juan A. Suarez Romero	efcf9c9f9f	nir: deref only for OpTypePointer Fixes dEQP-VK.binding_model.buffer_device_address.* and dEQP-VK.ssbo.phys.layout* Vulkan CTS tests. v2: set val->type->stride in the section below (Jason) v3: restore val->type->type to original place (Jason) Fixes: `d0ba326f23` ("nir/spirv: support physical pointers") CC: Karol Herbst <kherbst@redhat.com> CC: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-20 19:26:32 +00:00
Jason Ekstrand	0b7e5bdbd4	nir: Constant values are per-column not per-component Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-03-20 09:26:56 -05:00
Andres Gomez	ab28dca033	Revert "glsl: relax input->output validation for SSO programs" This reverts commit `1aa5738e66`. This patch incorrectly assumed that for SSOs no inner interface matching check was needed. From the ARB_separate_shader_objects spec v.25: " With separable program objects, interfaces between shader stages may involve the outputs from one program object and the inputs from a second program object. For such interfaces, it is not possible to detect mismatches at link time, because the programs are linked separately. When each such program is linked, all inputs or outputs interfacing with another program stage are treated as active. The linker will generate an executable that assumes the presence of a compatible program on the other side of the interface. If a mismatch between programs occurs, no GL error will be generated, but some or all of the inputs on the interface will be undefined." This completes the fix from commit: `3be05dd267` ("glsl/linker: don't fail non static used inputs without matching outputs") Fixes: `1aa5738e66` ("glsl: relax input->output validation for SSO programs") Cc: Tapani Pälli <tapani.palli@intel.com> Cc: Timothy Arceri <tarceri@itsqueeze.com> Cc: Ilia Mirkin <imirkin@alum.mit.edu> Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-19 17:36:20 +02:00
Andres Gomez	422882e78f	glsl/linker: simplify xfb_offset vs xfb_stride overflow check The current implementation uses a complicated calculation which relies on an implicit conversion to check the integral part of 2 division results. However, the calculation actually checks that the xfb_offset is smaller than or a multiple of the xfb_stride. For example, while this is expected to fail, it actually succeeds: " ... layout(xfb_buffer = 2, xfb_stride = 12) out block3 { layout(xfb_offset = 0) vec3 c; layout(xfb_offset = 12) vec3 d; // ERROR, requires stride of 24 }; ... " Fixes: `2fab85aaea` ("glsl: add xfb_stride link time validation") Cc: Timothy Arceri <tarceri@itsqueeze.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-19 17:23:27 +02:00
Andres Gomez	3be05dd267	glsl/linker: don't fail non static used inputs without matching outputs If there is no Static Use of an input variable, the linker shouldn't fail whenever there is no defined matching output variable in the previous stage. From page 47 (page 51 of the PDF) of the GLSL 4.60 v.5 spec: " Only the input variables that are statically read need to be written by the previous stage; it is allowed to have superfluous declarations of input variables." Now, we complete this exception whenever the input variable has an explicit location. Previously, `18004c338f` ("glsl: fail when a shader's input var has not an equivalent out var in previous") took care of the cases in which the input variable didn't have an explicit location. v2: do the location-based interface matching check regardless of whether it is a separable program or not (Ilia). Fixes: `1aa5738e66` ("glsl: relax input->output validation for SSO programs") Cc: Timothy Arceri <tarceri@itsqueeze.com> Cc: Iago Toral Quiroga <itoral@igalia.com> Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: Tapani Pälli <tapani.palli@intel.com> Cc: Ian Romanick <ian.d.romanick@intel.com> Cc: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-19 17:23:27 +02:00
Andres Gomez	de1bc2d19a	glsl/linker: always validate explicit location among inputs Outputs are always validated when having explicit locations and we were trusting its outcome to catch similar problems with the inputs since, in case of having undefined outputs for existing inputs, we would be already reporting a linker error. However, consider this case: " Shader stage n: --------------- ... layout(location = 0) out float a; ... Shader stage n+1: ----------------- ... layout(location = 0) in float b; layout(location = 0) in float c; ... " Currently, this won't report a linker error even though location aliasing is happening for the inputs. Therefore, we also need to validate the inputs independently from the outcome of the outputs validation. Cc: Timothy Arceri <tarceri@itsqueeze.com> Cc: Iago Toral Quiroga <itoral@igalia.com> Cc: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-19 17:23:27 +02:00
Andres Gomez	a96093136b	glsl: correctly validate component layout qualifier for dvec{3,4} From page 62 (page 68 of the PDF) of the GLSL 4.50 v.7 spec: " A dvec3 or dvec4 can only be declared without specifying a component." Therefore, using the "component" qualifier with a dvec3 or dvec4 should result in a compiling error. v2: enhance the error message (Timothy). Fixes: `94438578d2` ("glsl: validate and store component layout qualifier in GLSL IR") Cc: Timothy Arceri <tarceri@itsqueeze.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-19 17:23:27 +02:00
Jason Ekstrand	cbfe31ccbe	Revert "nir: const `nir_call_instr::callee`" This reverts commit `db57db5317`. When building IR, nothing is really immutable and, since C has no concept of constness propagating beyond the first pointer, we have to be very careful with how we use it. To just throw const into a function like this is a lie. Instead, we should just drop the unneeded const in spirv_to_nir, which this commit does along with the revert.	2019-03-19 10:19:42 -05:00
Eric Engestrom	db57db5317	nir: const `nir_call_instr::callee` Fixes: `c95afe56a8` "nir/spirv: handle kernel function parameters" Cc: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Karol Herbst <kherbst@redhat.com>	2019-03-19 12:51:53 +00:00
Karol Herbst	d0ba326f23	nir/spirv: support physical pointers v2: add load_kernel_input Signed-off-by: Karol Herbst <kherbst@redhat.com> squash! nir/spirv: support physical pointers	2019-03-19 04:08:07 +00:00
Karol Herbst	c95afe56a8	nir/spirv: handle kernel function parameters The idea here is to generate an entry point stub function wrapping around the actual kernel function and turn all parameters into shader inputs with byte addressing instead of vec4. This gives us several advantages: 1. calling kernel functions doesn't differ from calling any other function 2. CL inputs match uniforms in most ways and we can just take advantage of most of nir_lower_io v2: move code into a separate function v3: verify the entry point got a name fix minor typo v4: make vtn_emit_kernel_entry_point_wrapper take the old entry point as an arg Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-19 04:08:07 +00:00
Karol Herbst	0ccdf23a57	nir/lower_locals_to_regs: cast array index to 32 bit Local memory is too small to require 64-bit pointers, so cast the array index to a 32-bit value to save on 64-bit operations. Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-19 04:08:07 +00:00
Karol Herbst	44d32e62fb	glsl: add cl_size and cl_alignment Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-03-19 04:08:07 +00:00
Karol Herbst	659f333b3a	glsl: add packed for struct types We need this for OpenCL kernels because we have to apply C rules for alignment and padding inside structs and for this we also have to know if a struct is packed or not. v2: fix for kernel params Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-03-19 04:08:07 +00:00
Jason Ekstrand	35b8f6f40b	nir: Add a new pass to lower array dereferences on vectors This pass was originally written for lowering TCS output reads and writes, but it is also applicable to just about anything, including UBOs, SSBOs, and shared variables. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 23:10:27 -05:00
Jason Ekstrand	fe9a6c0f14	nir/builder: Add a vector extract helper This one's a tiny bit better than what we had in spirv_to_nir because it emits a binary tree rather than a linear walk. It also doesn't leave around unneeded bcsel instructions for a constant index and returns an undef for constant OOB access. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 23:10:26 -05:00
Alejandro Piñeiro	34b3b92bbe	nir/xfb: move varyings info out of nir_xfb_info When varyings were added, we moved to using dynamically allocated pointers instead of allocating just one block for everything. That breaks some assumptions of some Vulkan drivers (like anv) that make serialization and copying easier. At the same time, varyings are not needed for Vulkan, so this commit moves them out. Although it seems like a little overkill, fixing the anv side would require similar or larger changes, so in the end it comes down to deciding where we want to put our effort. v2: (from Jason's review) * Don't use a temp variable in the _create methods; just return the result of rzalloc_size * Wrap some overly long lines. Fixes: `cf0b2ad486` ("nir/xfb: adding varyings on nir_xfb_info and gather_info") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-15 11:59:32 +01:00
Jason Ekstrand	810dde2a6b	glsl/nir: Add a pass to lower UBO and SSBO access Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	77e5ec394e	glsl/nir: Handle unlowered SSBO atomic and array_length intrinsics We didn't have any of these before because all NIR consumers always called lower_ubo_references. Soon, we want to pass the derefs straight through to NIR so we need to handle these intrinsics directly. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	76ba225184	glsl/nir: Set explicit types on UBO/SSBO variables We want to be able to use variables and derefs for UBO/SSBO access in NIR. In order to do this, the rest of NIR needs to know the type layout information. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	8f3ab8aa78	glsl: Don't lower vector derefs for SSBOs, UBOs, and shared All of these are backed by some sort of memory so if you have multiple threads writing to different components of the same vector at the same time, the load-vec-store pattern that GLSL IR emits won't work. This shouldn't affect any drivers today as they all call GLSL IR lowering which lowers access to these variables to index+offset intrinsics before we get to this point. However, NIR will start handling the derefs itself and won't want the lowering. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	3c11fc7654	nir/lower_io: Add a new buffer_array_length intrinsic and lowering Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	c8d42c8cf6	nir: Rename nir_address_format_vk_index_offset to not be vk It's just a 32-bit index and offset. We're going to want to use it in GL as well so stop talking about Vulkan. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	60af3a93e9	nir/deref: Consider COHERENT decorated var derefs as aliasing If we get to two deref_var paths with different variables, we usually know they don't alias. However, if both of the paths are marked coherent, we have to assume they may alias. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	8b073832ff	compiler/types: Add helpers to get explicit types for standard layouts We also need to modify the current size/align helpers to not blow up when they encounter an explicitly laid out type. Previously we considered using the size/align helpers mutually exclusive with standard layouts but now we just assert that they match. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	5b2b144566	compiler/types: Add a C wrapper to get full struct field data Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	ef4ca44780	compiler/types: Add a new is_interface C wrapper Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	b315f6f82b	nir/validate: Allow 32-bit boolean load/store intrinsics With UBOs and SSBOs we have boolean types but they're actually 32-bit values. Make the validator a little less strict so that we can do a 32-bit load/store on boolean types. We're about to add a lowering pass called gl_nir_lower_buffers which will lower boolean load/store operations to 32-bit and insert i2b and b2i instructions to convert to/from 1-bit booleans. We want that to be legal. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	5d26f2d3d5	nir/validate: Only require bare types to match for copy_deref If we want to be able to use copy_deref instructions on explicitly laid out types, we have to be a little more flexible about what types we allow. Instead of requiring the types to match exactly, only require the bare types to match. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	2b76de9b5d	nir/algebraic: Add a couple optimizations for iabs and ishr Shader-db results on Kaby Lake: total instructions in shared programs: 15225213 -> 15222365 (-0.02%) instructions in affected programs: 43524 -> 40676 (-6.54%) helped: 203 HURT: 0 Lots of shaders in Shadow Warrior had this pattern along with Deus Ex, Civ, Shadow of Mordor, and several others. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-03-15 01:02:19 +00:00
Eduardo Lima Mitev	6ff50a488a	nir: Add ir3-specific version of most SSBO intrinsics These are ir3 specific versions of SSBO intrinsics that add an extra source to hold the element offset (dword), which is what the backend instructions need. The original byte-offset source provided by NIR is not replaced because on a4xx and a5xx the backend still needs it. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-13 21:19:44 +01:00
Caio Marcelo de Oliveira Filho	822a8865e4	nir: Add a pass to combine store_derefs to same vector v2: (all from Jason) Reuse existing function for the end of the block combinations. Check the SSA values are coming from the right place in tests. Document the case when the store to array_deref is reused. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-13 08:39:16 -07:00
Jason Ekstrand	bd17bdc56b	glsl/lower_vector_derefs: Don't use a temporary for TCS outputs Tessellation control shader outputs act as if they have memory backing them and you can have multiple writes to different components of the same vector in-flight at the same time. When this happens, the load vec store pattern that gets used by ir_triop_vector_insert doesn't yield the correct results. Instead, just emit a sequence of conditional assignments. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Cc: mesa-stable@lists.freedesktop.org	2019-03-13 02:10:31 +00:00
Jason Ekstrand	20c4578c55	glsl/list: Add a list variant of insert_after Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-13 02:10:31 +00:00
Jason Ekstrand	83fdefc062	nir/loop_unroll: Fix out-of-bounds access handling The previous code was completely broken when it came to constructing the undef values. I'm not sure how it ever worked. For the case of a copy that reads an undefined value, we can just delete the copy because the destination is a valid undefined value. This saves us the effort of trying to construct a value for an arbitrary copy_deref intrinsic. Fixes: `e8a8937a04` "nir: add partial loop unrolling support" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-12 21:06:39 -05:00
Jason Ekstrand	5ef2b8f1f2	nir: Add a pass for lowering IO back to vector when possible This pass tries to turn scalar and array-of-scalar IO variables into vector IO variables whenever possible. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Cc: "19.0" <mesa-stable@lists.freedesktop.org>	2019-03-12 15:34:06 +00:00
Connor Abbott	5b2ec9c81e	nir: Add a stripping pass for improved cacheability Often, various NIR shaders will be the same, or almost the same, after lowering. For example, this can happen when the same shader is linked with different shaders to form different pipelines and cross-stage optimizations don't kick in to change it. We want to avoid running the backend twice on these shaders. We were already doing this with radeonsi, but we were storing a few extra pieces of information that made this much less effective compared to TGSI. The worst offender by far was the program name, which caused most of the cache misses. This pass strips out these pieces of information, controlled by the NIR_STRIP debug env variable. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-12 10:49:48 +01:00
Brian Paul	02c2863df5	nir: silence a couple new compiler warnings [33/630] Compiling C object 'src/compiler/nir/nir@sta/nir_loop_analyze.c.o'. ../src/compiler/nir/nir_loop_analyze.c: In function ‘try_find_trip_count_vars_in_iand’: ../src/compiler/nir/nir_loop_analyze.c:846:29: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses] if (ind == NULL || ind && (ind)->type != basic_induction || ^ [85/630] Compiling C object 'src/compiler/nir/nir@sta/nir_opt_loop_unroll.c.o'. ../src/compiler/nir/nir_opt_loop_unroll.c: In function ‘complex_unroll_single_terminator’: ../src/compiler/nir/nir_opt_loop_unroll.c:494:17: warning: unused variable ‘unroll_loc’ [-Wunused-variable] nir_cf_node unroll_loc = ^ Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-12 14:34:51 +11:00
Timothy Arceri	3235a942c1	nir: find induction/limit vars in iand instructions This will be used to help find the trip count of loops that look like the following: while (a < x && i < 8) { ... i++; } Where the NIR will end up looking something like this: vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */) loop { ... vec1 1 ssa_12 = ilt ssa_225, ssa_11 vec1 1 ssa_17 = ilt ssa_226, ssa_1 vec1 1 ssa_18 = iand ssa_12, ssa_17 vec1 1 ssa_19 = inot ssa_18 if ssa_19 { ... break } else { ... } } On RADV this unrolls a bunch of loops in F1-2017 shaders. Totals from affected shaders: SGPRS: 4112 -> 4136 (0.58 %) VGPRS: 4132 -> 4052 (-1.94 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 515444 -> 587720 (14.02 %) bytes LDS: 2 -> 2 (0.00 %) blocks Max Waves: 194 -> 196 (1.03 %) Wait states: 0 -> 0 (0.00 %) It also unrolls a couple of loops in shader-db on radeonsi. Totals from affected shaders: SGPRS: 128 -> 128 (0.00 %) VGPRS: 64 -> 64 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 6880 -> 9504 (38.14 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 16 -> 16 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	67c3478482	nir: pass nir_op to calculate_iterations() Passing the op in, rather than getting it from the alu instruction, gives us some flexibility. In the following patch we instead pass the inverse op. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	11e8f8a166	nir: add get_induction_and_limit_vars() helper to loop analysis This helps make find_trip_count() a little easier to follow but will also be used by a following patch. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	f219f6114d	nir: add helper to return inversion op of a comparison This will be used to help find the trip count of loops that look like the following: while (a < x && i < 8) { ... i++; } Where the NIR will end up looking something like this: vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */) loop { ... vec1 1 ssa_12 = ilt ssa_225, ssa_11 vec1 1 ssa_17 = ilt ssa_226, ssa_1 vec1 1 ssa_18 = iand ssa_12, ssa_17 vec1 1 ssa_19 = inot ssa_18 if ssa_19 { ... break } else { ... } } So in order to find the trip count we need to find the inverse of ilt. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	090feaacdc	nir: simplify the loop analysis trip count code a little Here we create a helper is_supported_terminator_condition() and use that rather than embedding all the trip count code inside a switch. The new helper will also be used in a following patch. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	7571de8eaa	nir: unroll some loops with a variable limit Some loops have a single terminator, but the exact trip count is still unknown. For example: for (int i = 0; i < imin(x, 4); i++) ... Shader-db results radeonsi (all affected are from Tropico 5): Totals from affected shaders: SGPRS: 144 -> 152 (5.56 %) VGPRS: 124 -> 108 (-12.90 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 5180 -> 6640 (28.19 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 17 -> 21 (23.53 %) Wait states: 0 -> 0 (0.00 %) Shader-db results i965 (SKL): total loops in shared programs: 3808 -> 3802 (-0.16%) loops in affected programs: 6 -> 0 helped: 6 HURT: 0 vkpipeline-db results RADV (Unrolls some Skyrim VR shaders): Totals from affected shaders: SGPRS: 304 -> 304 (0.00 %) VGPRS: 296 -> 292 (-1.35 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 15756 -> 25884 (64.28 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 29 -> 29 (0.00 %) Wait states: 0 -> 0 (0.00 %) v2: fix bug where last iteration would get optimised away by mistake. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	68ce0ec222	nir: calculate trip count for more loops This adds support to loop analysis for loops where the induction variable is compared to the result of min(variable, constant). For example: for (int i = 0; i < imin(x, 4); i++) ... We add a new bool to the loop terminator struct in order to differentiate terminators with this exit condition. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	e8a8937a04	nir: add partial loop unrolling support This adds partial loop unrolling support and makes use of a guessed trip count based on array access. The code is written so that we could use partial unrolling more generally, but for now it's only used when we have guessed the trip count. We use partial unrolling for this guessed trip count because it's possible any out-of-bounds array access doesn't otherwise affect the shader, e.g. the stores/loads to/from the array are unused. So we insert a copy of the loop in the innermost continue branch of the unrolled loop. Later on it's possible for nir_opt_dead_cf() to then remove the loop in some cases. A Renderdoc capture from the Rise of the Tomb Raider benchmark reports the following change in an affected compute shader: GPU duration: 350 -> 325 microseconds shader-db results radeonsi VEGA (NIR backend): SGPRS: 1008 -> 816 (-19.05 %) VGPRS: 684 -> 432 (-36.84 %) Spilled SGPRs: 539 -> 0 (-100.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 39708 -> 45812 (15.37 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 105 -> 144 (37.14 %) Wait states: 0 -> 0 (0.00 %) shader-db results i965 SKL: total instructions in shared programs: 13098265 -> 13103359 (0.04%) instructions in affected programs: 5126 -> 10220 (99.38%) helped: 0 HURT: 21 total cycles in shared programs: 332039949 -> 331985622 (-0.02%) cycles in affected programs: 289252 -> 234925 (-18.78%) helped: 12 HURT: 9 vkpipeline-db results VEGA: Totals from affected shaders: SGPRS: 184 -> 184 (0.00 %) VGPRS: 448 -> 448 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 26076 -> 24428 (-6.32 %) bytes LDS: 6 -> 6 (0.00 %) blocks Max Waves: 5 -> 5 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	fba5d275db	nir: add new partially_unrolled bool to nir_loop In order to stop repeatedly partially unrolling the same loop, we add the bool partially_unrolled to nir_loop. We add it here rather than in nir_loop_info because nir_loop_info is only set via loop analysis and is intended to be cleared before each analysis. Also, nir_loop_info is never cloned. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
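
Commit 71c66c254b argues that textureGatherOffsets() offsets fit in an int8_t[4][2] array because every Mesa driver's allowed range lies within [-32, 31]. A minimal C sketch of that validate-then-pack step follows. struct tex_sketch and both helpers are hypothetical stand-ins, not Mesa's nir_tex_instr API, and the caller-supplied default table reflects only the v3 note that the default offsets are not all zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed range: the commit says no Mesa driver exceeds [-32, 31]. */
#define TG4_MIN_OFFSET (-32)
#define TG4_MAX_OFFSET 31

/* Hypothetical stand-in for the texture instruction. */
struct tex_sketch {
   int8_t tg4_offsets[4][2]; /* one (x, y) pair per gathered texel */
};

/* Validates constant offsets and packs them into int8_t storage.
 * Returns false, leaving tex unchanged, if any value is out of range. */
static bool
tex_sketch_set_tg4_offsets(struct tex_sketch *tex, const int offsets[4][2])
{
   assert(tex && offsets);
   int8_t packed[4][2];
   for (int i = 0; i < 4; i++) {
      for (int j = 0; j < 2; j++) {
         if (offsets[i][j] < TG4_MIN_OFFSET || offsets[i][j] > TG4_MAX_OFFSET)
            return false;
         packed[i][j] = (int8_t)offsets[i][j];
      }
   }
   /* Same-typed arrays, so one memcpy copies all eight bytes. */
   memcpy(tex->tg4_offsets, packed, sizeof(tex->tg4_offsets));
   return true;
}

/* Offsets count as explicit when they differ from the defaults, which the
 * caller supplies because they are not necessarily all zero. */
static bool
tex_sketch_has_explicit_tg4_offsets(const struct tex_sketch *tex,
                                    const int8_t defaults[4][2])
{
   return memcmp(tex->tg4_offsets, defaults, sizeof(tex->tg4_offsets)) != 0;
}
```

Copying the packed array with a single memcpy, rather than nested loops, follows the v2 note in the commit message.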
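
Commit fe9a6c0f14 describes a vector-extract helper that emits a binary tree of selects instead of a linear walk, skips the selects for a constant index, and yields undef for a constant out-of-bounds index. The scalar C model below sketches that shape. It is not the nir_builder code, the helper names are hypothetical, and a ternary stands in for bcsel (which, unlike the ternary, evaluates both operands).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each level compares the index against the midpoint, like
 * bcsel(ilt(idx, mid), low_half, high_half), so the tree depth is
 * ceil(log2(n)) rather than the n - 1 levels of a linear walk. */
static uint32_t
select_from_array(const uint32_t *comps, unsigned start, unsigned end,
                  unsigned idx)
{
   assert(start < end);
   if (end - start == 1)
      return comps[start];
   unsigned mid = start + (end - start) / 2;
   return idx < mid ? select_from_array(comps, start, mid, idx)
                    : select_from_array(comps, mid, end, idx);
}

/* Depth of the tree built by select_from_array() for n components;
 * the larger half after a split has n - n / 2 elements. */
static unsigned
select_tree_depth(unsigned n)
{
   return n <= 1 ? 0 : 1 + select_tree_depth(n - n / 2);
}

/* Constant index: no selects at all. An out-of-bounds constant is
 * reported as undefined through *is_undef (the returned 0 is arbitrary). */
static uint32_t
extract_const_index(const uint32_t *comps, unsigned n, unsigned idx,
                    bool *is_undef)
{
   *is_undef = idx >= n;
   return *is_undef ? 0 : comps[idx];
}
```

Both forms use n - 1 selects for a dynamic index; the tree only shortens the dependency chain.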
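
The overflow rule behind commit 422882e78f can be stated directly: a captured member fits only if its byte offset plus its size does not exceed the buffer's xfb_stride. The function below is a hypothetical sketch of that rule, not the linker code. It reproduces the commit's example, where a vec3 (12 bytes) at xfb_offset = 12 needs a stride of at least 24.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper; offsets, sizes and strides are all in bytes. */
static bool
xfb_member_fits_stride(unsigned xfb_offset, unsigned size_bytes,
                       unsigned xfb_stride)
{
   assert(size_bytes > 0);
   /* The member's last byte must lie within one stride. */
   return xfb_offset + size_bytes <= xfb_stride;
}
```

With xfb_stride = 12, member c (offset 0) passes while member d (offset 12) fails, which is the error the commit expects.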
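
Commit f219f6114d needs the inverse of a comparison because the break condition in its example is inot(iand(ilt, ilt)). A sketch of such a helper follows. enum cmp_op is a hypothetical stand-in named after NIR opcodes rather than Mesa's nir_op, and eval_icmp() exists only to check the integer cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a subset of NIR's comparison opcodes. */
enum cmp_op {
   CMP_FLT, CMP_FGE, CMP_FEQ, CMP_FNE,
   CMP_ILT, CMP_IGE, CMP_IEQ, CMP_INE,
   CMP_ULT, CMP_UGE,
};

/* Returns the op whose result is the logical NOT of op's result.
 * For flt/fge this ignores NaN: !(a < b) is true when an operand is NaN,
 * but a >= b is false. */
static enum cmp_op
inverse_comparison(enum cmp_op op)
{
   switch (op) {
   case CMP_FLT: return CMP_FGE;
   case CMP_FGE: return CMP_FLT;
   case CMP_FEQ: return CMP_FNE;
   case CMP_FNE: return CMP_FEQ;
   case CMP_ILT: return CMP_IGE;
   case CMP_IGE: return CMP_ILT;
   case CMP_IEQ: return CMP_INE;
   case CMP_INE: return CMP_IEQ;
   case CMP_ULT: return CMP_UGE;
   case CMP_UGE: return CMP_ULT;
   }
   assert(!"unsupported comparison");
   return op;
}

/* Evaluates the integer comparisons so the inversion can be checked. */
static bool
eval_icmp(enum cmp_op op, int a, int b)
{
   switch (op) {
   case CMP_ILT: return a < b;
   case CMP_IGE: return a >= b;
   case CMP_IEQ: return a == b;
   case CMP_INE: return a != b;
   case CMP_ULT: return (unsigned)a < (unsigned)b;
   case CMP_UGE: return (unsigned)a >= (unsigned)b;
   default: assert(!"not an integer comparison"); return false;
   }
}
```

With such a helper, an exit test of the form inot(ilt(a, b)) can be analyzed as ige(a, b).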
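
Commits 68ce0ec222 and 7571de8eaa handle loops such as for (int i = 0; i < imin(x, 4); i++), where the constant operand of imin() bounds the trip count even though the exact count depends on x. The C sketch below shows the resulting shape by hand: the body is copied four times and each copy keeps its exit test. The function names are hypothetical, and the code models the transformation rather than reproducing nir_opt_loop_unroll.

```c
#include <assert.h>

static int imin(int a, int b) { return a < b ? a : b; }

/* Original loop with a variable limit. */
static int
sum_loop(int x)
{
   int sum = 0;
   for (int i = 0; i < imin(x, 4); i++)
      sum += i;
   return sum;
}

/* Hand-unrolled model: at most 4 iterations, each guarded by the
 * original exit test because the exact count still depends on x. */
static int
sum_unrolled(int x)
{
   int sum = 0, i = 0;
   const int limit = imin(x, 4); /* x is loop-invariant */
   assert(limit <= 4);           /* the bound the unrolling relies on */
   if (!(i < limit)) return sum;
   sum += i; i++;
   if (!(i < limit)) return sum;
   sum += i; i++;
   if (!(i < limit)) return sum;
   sum += i; i++;
   if (!(i < limit)) return sum;
   sum += i; i++;
   return sum; /* i == 4 here, so a fifth test could never pass */
}
```

Keeping every exit test is what separates this case from a loop with an exact trip count, where the tests could be dropped entirely.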
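
Commit 822a8865e4 combines several store_derefs to the same vector into one store. A hypothetical C model of the bookkeeping accumulates each store's components under a write mask, lets a later store to a component replace an earlier one, and ends with a single store carrying the merged mask. This is an illustration of the idea, not the pass itself; the model ignores intervening loads and aliasing, which a real pass must account for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pending store to one vec4 variable. */
struct combined_store {
   uint32_t value[4];
   unsigned write_mask; /* bit i set: component i will be written */
};

/* Folds one masked store into the pending combined store. */
static void
combine_store(struct combined_store *c, const uint32_t value[4],
              unsigned write_mask)
{
   assert((write_mask & ~0xfu) == 0);
   for (unsigned i = 0; i < 4; i++) {
      if (write_mask & (1u << i))
         c->value[i] = value[i]; /* a later store to i replaces an earlier one */
   }
   c->write_mask |= write_mask;
}
```

Storing .x, then .zw, then .x again leaves one pending store with mask 0xd whose x component comes from the last store.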

3470 commits