Values inside the offsets parameter of textureGatherOffsets are required to be
constants in the range of [GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET,
GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET].
As this range is never outside [-32, 31] for any existing driver inside
mesa, we can simply store the offsets as an int8_t[4][2] array inside
nir_tex_instr (sketched below).
Right now only Nvidia supports this in hardware, so we turn it on inside
Nouveau for the NIR path, as it is already enabled with the TGSI one.
v2: use memcpy instead of for loops
add missing bits to nir_instr_set
don't show offsets if they are all 0
v3: default offsets aren't all 0
v4: rename offsets -> tg4_offsets
rename nir_tex_instr_has_explicit_offsets -> nir_tex_instr_has_explicit_tg4_offsets
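A rough sketch of the resulting storage (names per the v4 rename; the
real definition lives in nir.h):

   /* inside nir_tex_instr, other members elided */
   int8_t tg4_offsets[4][2];   /* one (x, y) pair per gather offset */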
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Not sure how ptr_stride should be taken into account, if at all, here.
v2: reorder check to avoid src walking (Jason)
v3: remove is_cast_cast checks, keep going afterwards (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For OpenCL we never want to strip the info from the types, and it makes
type comparisons easier in later stages. We might later need a nir pass to
strip this for GLSL, but so far the only regression is the assert, and
Jason said removing that is fine.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit db57db5317. When
building IR, nothing is really immutable and, since C has no concept of
constness propagating beyond the first pointer, we have to be very
careful with how we use it. To just throw const into a function like
this is a lie.
Instead, we should just drop the unneeded const in spirv_to_nir which
this commit does along with the revert.
Local memory is too small to require 64-bit pointers, so cast the array
index to a 32-bit value to save on 64-bit operations (sketched below).
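A minimal sketch of the cast in nir_builder terms (assuming the 64-bit
index is already in hand as an SSA def):

   /* local memory is small, so 32 bits of index are plenty */
   nir_ssa_def *idx32 = nir_u2u32(&b, idx64);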
Signed-off-by: Karol Herbst <kherbst@redhat.com>
This pass was originally written for lowering TCS output reads and
writes, but it is also applicable to just about anything, including UBOs,
SSBOs, and shared variables.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This one's a tiny bit better than what we had in spirv_to_nir because it
emits a binary tree rather than a linear walk. It also doesn't leave
around unneeded bcsel instructions for a constant index and returns an
undef for constant OOB access.
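A sketch of the binary-tree idea (a hypothetical helper, not the actual
mesa code): split the index range in half at each step and select
between the halves with bcsel, giving log2(n) depth instead of a linear
chain.

   static nir_ssa_def *
   select_channel(nir_builder *b, nir_ssa_def *vec, nir_ssa_def *index,
                  unsigned start, unsigned end)
   {
      if (start == end - 1)
         return nir_channel(b, vec, start);

      const unsigned mid = start + (end - start) / 2;
      return nir_bcsel(b, nir_ilt(b, index, nir_imm_int(b, mid)),
                       select_channel(b, vec, index, start, mid),
                       select_channel(b, vec, index, mid, end));
   }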
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When varyings were added we moved to using dynamically allocated
pointers instead of allocating just one block for everything. That
breaks an assumption that some Vulkan drivers (like anv) rely on to
make serialization and copying easier. And at the same time, varyings
are not needed for Vulkan.
So this commit moves them out. Although it seems a little overkill,
fixing the anv side would require similar, or more, changes, so in
the end it comes down to deciding where we want to put our effort.
v2: (from Jason review)
* Don't use a temp variable on the _create methods, just return
result of rzalloc_size
* Wrap some lines that were too long.
Fixes: cf0b2ad486 ("nir/xfb: adding varyings on nir_xfb_info and gather_info")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It's just a 32-bit index and offset. We're going to want to use it in
GL as well, so stop talking about Vulkan.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
If we get to two deref_var paths with different variables, we usually
know they don't alias. However, if both of the paths are marked
coherent, we don't have to worry about it.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
With UBOs and SSBOs we have boolean types but they're actually 32-bit
values. Make the validator a little less strict so that we can do a
32-bit load/store on boolean types. We're about to add a lowering pass
called gl_nir_lower_buffers which will lower boolean load/store
operations to 32-bit and insert i2b and b2i instructions to convert
to/from 1-bit booleans. We want that to be legal.
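A sketch of the conversions the lowering pass will insert (builder
helper names assumed from the i2b/b2i opcodes):

   /* load side: 32-bit value from memory -> 1-bit boolean */
   nir_ssa_def *b1 = nir_i2b1(&b, loaded32);
   /* store side: 1-bit boolean -> 32-bit value for memory */
   nir_ssa_def *b32 = nir_b2i32(&b, b1);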
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
If we want to be able to use copy_deref instructions on explicitly laid
out types, we have to be a little more flexible about what types we
allow. Instead of requiring the types to match exactly, only require
the bare types to match (sketched below).
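A sketch of the relaxed check, assuming the glsl_get_bare_type helper
from nir_types, which strips the explicit layout information:

   /* allow copies between a type and its explicitly laid out variant */
   assert(glsl_get_bare_type(dst->type) ==
          glsl_get_bare_type(src->type));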
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Shader-db results on Kaby Lake:
total instructions in shared programs: 15225213 -> 15222365 (-0.02%)
instructions in affected programs: 43524 -> 40676 (-6.54%)
helped: 203
HURT: 0
Lots of shaders in Shadow Warrior had this pattern along with Deus Ex,
Civ, Shadow of Mordor, and several others.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
These are ir3-specific versions of the SSBO intrinsics that add an
extra source to hold the element offset (in dwords), which is what the
backend instructions need.
The original byte-offset source provided by NIR is not replaced
because on a4xx and a5xx the backend still needs it.
Reviewed-by: Rob Clark <robdclark@gmail.com>
v2: (all from Jason)
Reuse existing function for the end of the block combinations.
Check the SSA values are coming from the right place in tests.
Document the case when the store to array_deref is reused.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The previous code was completely broken when it came to constructing the
undef values. I'm not sure how it ever worked. For the case of a copy
that reads an undefined value, we can just delete the copy because the
destination is a valid undefined value. This saves us the effort of
trying to construct a value for an arbitrary copy_deref intrinsic.
Fixes: e8a8937a04 "nir: add partial loop unrolling support"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Oftentimes various nir shaders after lowering will be the same, or
almost the same. For example, this can happen when the same shader is
linked with different shaders to form different pipelines and
cross-stage optimizations don't kick in to change it. We want to avoid
running the backend twice on these shaders. We were already doing this
with radeonsi, but we were storing a few extra pieces of information
that made this much less effective compared to TGSI. The worst offender
by far was the program name, which caused most of the cache misses. This
pass strips out these pieces of information, controlled by the NIR_STRIP
debug env variable.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rather than getting this from the ALU instruction, this allows us
some flexibility. In a following patch we instead pass the
inverse op.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This helps make find_trip_count() a little easier to follow but
will also be used by a following patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used to help find the trip count of loops that look
like the following:
   while (a < x && i < 8) {
      ...
      i++;
   }
Where the NIR will end up looking something like this:
   vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
   loop {
      ...
      vec1 1 ssa_12 = ilt ssa_225, ssa_11
      vec1 1 ssa_17 = ilt ssa_226, ssa_1
      vec1 1 ssa_18 = iand ssa_12, ssa_17
      vec1 1 ssa_19 = inot ssa_18
      if ssa_19 {
         ...
         break
      } else {
         ...
      }
   }
So in order to find the trip count we need to find the inverse of
ilt, i.e. ige.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Here we create a helper is_supported_terminator_condition()
and use that rather than embedding all the trip count code
inside a switch.
The new helper will also be used in a following patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds support to loop analysis for loops where the induction
variable is compared to the result of min(variable, constant).
For example:
   for (int i = 0; i < imin(x, 4); i++)
      ...
We add a new bool to the loop terminator struct in order to
differentiate terminators with this exit condition.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds partial loop unrolling support and makes use of a
guessed trip count based on array access.
The code is written so that we could use partial unrolling
more generally, but for now it's only used when we have guessed
the trip count.
We use partial unrolling for this guessed trip count because it's
possible that any out-of-bounds array access doesn't otherwise affect
the shader, e.g. the stores/loads to/from the array are unused. So
we insert a copy of the loop in the innermost continue branch of
the unrolled loop (sketched below). Later on it's possible for
nir_opt_dead_cf() to then remove the loop in some cases.
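A sketch of the shape of the transform (C-like pseudocode, two unrolled
iterations for brevity):

   /* before: trip count only guessed from array bounds */
   while (true) {
      if (i >= guess)
         break;
      body(i);
      i++;
   }

   /* after: unrolled copies, with a copy of the original loop kept in
    * the innermost continue path for any remaining iterations */
   if (i < guess) {
      body(i);
      i++;
      if (i < guess) {
         body(i);
         i++;
         while (true) {
            if (i >= guess)
               break;
            body(i);
            i++;
         }
      }
   }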
A Renderdoc capture from the Rise of the Tomb Raider benchmark,
reports the following change in an affected compute shader:
GPU duration: 350 -> 325 microseconds
shader-db results radeonsi VEGA (NIR backend):
SGPRS: 1008 -> 816 (-19.05 %)
VGPRS: 684 -> 432 (-36.84 %)
Spilled SGPRs: 539 -> 0 (-100.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 39708 -> 45812 (15.37 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 105 -> 144 (37.14 %)
Wait states: 0 -> 0 (0.00 %)
shader-db results i965 SKL:
total instructions in shared programs: 13098265 -> 13103359 (0.04%)
instructions in affected programs: 5126 -> 10220 (99.38%)
helped: 0
HURT: 21
total cycles in shared programs: 332039949 -> 331985622 (-0.02%)
cycles in affected programs: 289252 -> 234925 (-18.78%)
helped: 12
HURT: 9
vkpipeline-db results VEGA:
Totals from affected shaders:
SGPRS: 184 -> 184 (0.00 %)
VGPRS: 448 -> 448 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 26076 -> 24428 (-6.32 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 5 -> 5 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In order to stop continuously partially unrolling the same loop,
we add the bool partially_unrolled to nir_loop. We add it here
rather than in nir_loop_info because nir_loop_info is only set
via loop analysis and is intended to be cleared before each
analysis. Also, nir_loop_info is never cloned.
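A rough sketch of the change in nir.h (surrounding members elided):

   typedef struct {
      nir_cf_node cf_node;
      struct exec_list body;    /* list of cf_nodes */
      nir_loop_info *info;
      bool partially_unrolled;  /* new: don't partially unroll twice */
   } nir_loop;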
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This detects an induction variable used as an array index to guess
the trip count of the loop. This enables us to do a partial
unroll of the loop, which can eventually result in the loop being
eliminated.
v2: check if the induction var is used to index more than a single
array and if so get the size of the smallest array.
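The pattern being detected, roughly (C-like pseudocode):

   int arr[4];
   /* no useful constant bound in the exit condition ... */
   for (int i = 0; i < x; i++)
      arr[i] = f(i);   /* ... but arr has 4 elements, so guess 4 */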
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The nir_state_slot struct had some padding that was never initialized.
Serializing the individual parts of the struct is more robust and avoids
the overhead of zeroing it at creation, so just do that.
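A sketch of the member-by-member serialization, assuming the struct's
tokens/swizzle members and the util/blob helpers:

   for (unsigned i = 0; i < var->num_state_slots; i++) {
      /* write each member separately so compiler-inserted padding
       * never reaches the blob */
      blob_write_bytes(blob, var->state_slots[i].tokens,
                       sizeof(var->state_slots[i].tokens));
      blob_write_uint32(blob, var->state_slots[i].swizzle);
   }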
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This broke piles of image load store tests (179 failures on CI,
mesa_master build #15546, previous build right before this landed
was green). I'd rather not leave the tree on fire over the weekend,
so let's revert for now, and we can figure out what happened next week.
No shader-db changes on any Intel platform.
v2: Use a loop to generate patterns. Suggested by Jason.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
No shader-db changes on any Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
In OpenGL, an array of a simple type adds just one varying. So the
gl_transform_feedback_varying_info struct defined at mtypes.h includes
the parameters Type (base_type) and Size (number of elements).
This commit checks this when the recursive add_var_xfb_outputs call
handles arrays, to ensure that just one is added.
We also need to take AoA into account here.
v2: use glsl_type_is_leaf from nir_types (Timothy Arceri)
v3: simplified the AoA check, without the need of using glsl_type_is_leaf,
using glsl_type_is_struct (Timothy Arceri)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Right now we are only re-sorting outputs, but it is better to sort
varyings too, as the linker expects them to be sorted (as was done on
GLSL). For varyings, and to make it easier to compute buffer_index, we
also sort by buffer. We could do the same for outputs, but we lack a
reason for that, so we leave them as they are (sorted just by offset).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
In order to be used for OpenGL (right now for ARB_gl_spirv).
This commit adds two new structures (both sketched below):
* nir_xfb_varying_info: identifies each individual varying. For
each one, we need to know the type, buffer and xfb_offset
* nir_xfb_buffer_info: now, for each buffer, in addition to the
stride, we need to know how many varyings are assigned to it.
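A rough sketch of the shape of both structures (field names inferred
from the description above; the real definitions live in nir_xfb_info.h):

   typedef struct {
      const struct glsl_type *type;
      unsigned buffer;
      unsigned offset;          /* xfb_offset */
   } nir_xfb_varying_info;

   typedef struct {
      unsigned stride;
      unsigned varying_count;   /* how many varyings use this buffer */
   } nir_xfb_buffer_info;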
For this patch, the only case where num_outputs != num_varyings is
the case of doubles, as dvec3/4 could require more than one
output. There are more cases though (like AoA) that will be handled
in following patches.
v2: updated after new nir general XFB support introduced for "anv: Add
support for VK_EXT_transform_feedback"
v3: compute num_varyings beforehand for allocating, instead of relying
on num_outputs as approximate value (Timothy Arceri)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
component_offset here is the offset used when accessing components of
a packed variable; in other words, location_frac in nir.h. Different
places in mesa use different names for it.
Technically a nir_xfb_info consumer could derive the same from the
component_mask, but it seems somewhat forced to make it compute that
instead of just providing it.
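For reference, what a consumer would otherwise have to compute (a
sketch, assuming a component_mask bitfield on each output):

   #include <strings.h>
   /* the lowest set bit of the mask is the first used component */
   unsigned comp_offset = ffs(output->component_mask) - 1;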
v2: rename the local location_frac to comp_offset, which is more
similar to the intended use (Timothy Arceri)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Unlike most of the cases in which we do this by hand, the new helper
properly handles non-32-bit pointers.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
There's no guarantee when build_deref_follower is called that the two
derefs have the same destination bit size. Insert a cast on the array
index in case we have differing bit sizes. While we're here, insert
some asserts in build_deref_array and build_deref_ptr_as_array. The
validator will catch violations here but they're easier to debug if we
catch them while building.
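A sketch of the inserted cast, assuming the nir_i2i convenience helper:

   /* make the array index's bit size match the parent deref before
    * building the follower deref */
   nir_ssa_def *index = nir_i2i(b, orig_index,
                                parent->dest.ssa.bit_size);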
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Because we already know the immediate right-hand parameter, we can
potentially save the optimizer a bit of work.
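A sketch of the kind of shortcut this enables (hypothetical helper
body, not the actual mesa code):

   static inline nir_ssa_def *
   imul_imm_sketch(nir_builder *b, nir_ssa_def *x, uint64_t y)
   {
      if (y == 0)
         return nir_imm_intN_t(b, 0, x->bit_size);
      if (y == 1)
         return x;   /* nothing for the optimizer to clean up later */
      return nir_imul(b, x, nir_imm_intN_t(b, y, x->bit_size));
   }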
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes a leak:
==7576== 320 (48 direct, 272 indirect) bytes in 1 blocks are definitely lost in loss record 26 of 26
==7576== at 0x4C2EE3B: malloc (vg_replace_malloc.c:309)
==7576== by 0x53EF0E4: ralloc_size (ralloc.c:119)
==7576== by 0x53EF0C2: ralloc_context (ralloc.c:113)
==7576== by 0x5471F64: nir_split_per_member_structs (nir_split_per_member_structs.c:176)
==7576== by 0x51288CF: anv_shader_compile_to_nir (anv_pipeline.c:216)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ourselves. This means there are no more nasty functions lying around
that the caller needs to worry about cleaning up.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This pulls the guts of function inlining into a builder helper so that
it can be used elsewhere. The rest of the infrastructure is still
needed for most inlining cases to ensure that everything gets inlined
and only ever once. However, there are use-cases where you just want to
inline one little thing. This new helper also has a neat trick where it
can seamlessly inline a function from one nir_shader into another.
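A hedged usage sketch (helper name and signature assumed from the
description; see the actual inlining code for the real API):

   /* inline a single callee at the builder's current cursor; params
    * holds SSA defs for the callee's parameters */
   nir_inline_function_impl(&b, callee->impl, params);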
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>