fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Author	SHA1	Message	Date
Jason Ekstrand	80e8dfe9de	nir: Rename Boolean-related opcodes to include 32 in the name This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Ian Romanick	e639d39faf	i965/fs: Implement nir_op_uadd_sat Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-13 17:49:48 +00:00
Jason Ekstrand	cb98e0755f	intel/fs: Support min_lod parameters on texture instructions We have to lower some shadow instructions because they don't exist in hardware and we have to lower txb+offset+clamp because the message gets too big and we run into the sampler message length limit of 11 regs. Acked-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-11 21:26:23 -06:00
Jason Ekstrand	dca6cd9ce6	nir: Make boolean conversions sized just like the others Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one if 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:07 -06:00
Matt Turner	017199d2d2	mesa: Revert INTEL_fragment_shader_ordering support This extension is not properly tested (testing for GL_ARB_fragment_shader_interlock is not sufficient), and since this was noted in review on August 28th no tests have been sent. Revert "i965: Add INTEL_fragment_shader_ordering support." Revert "mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering" This reverts commit `03ecec9ed2`. This reverts commit `119435c877`. Cc: mesa-stable@lists.freedesktop.org Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Eric Anholt <eric@anholt.net>	2018-12-03 15:37:37 -08:00
Jason Ekstrand	dca35c598d	intel/fs,vec4: Fix a compiler warning ../src/intel/compiler/brw_fs_nir.cpp:3534:46: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare] assert(nir_intrinsic_write_mask(instr) == ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ (1 << instr->num_components) - 1); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This was caused by `6339aba775` which added these completely valid checks. However clang likes to complain about signedness mismatches. Fixes: `6339aba775` "intel/compiler: Lower SSBO and shared..." Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2018-11-19 09:57:41 -06:00
Jason Ekstrand	6339aba775	intel/compiler: Lower SSBO and shared loads/stores in NIR We have a bunch of code to do this in the back-end compiler but it's fairly specific to typed surface messages and the way we emit them. This breaks it out into NIR were it's easier to do things a bit more generally. It also means we can easily share the code between the vec4 and FS back-ends if we wish. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-15 19:59:49 -06:00
Jason Ekstrand	d28bc35ece	intel/fs: Add an assert to optimize_frontfacing_ternary Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:25 -06:00
Jason Ekstrand	344cfe6980	intel/fs: Use the new nir_src_is_const and friends As of this commit, all uses of const sources either go through a nir_src_as_<type> helper which handles bit sizes correctly or else are accompanied by a nir_src_bit_size() == 32 assertion to assert that we have the size we think we have. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:20 -06:00
Jason Ekstrand	6b2918709a	intel/fs,vec4: Clean up a repeated pattern with SSBOs Everywhere we handle SSBO intrinsics, we have exactly the same pattern for computing the index so we may as well make a helper for it. We also add a get_nir_src_imm to vec4 and use it for SSBO offsets. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:06 -06:00
Jason Ekstrand	d3a0d8b750	intel/compiler: Stop assuming the entrypoint is called "main" This isn't true for Vulkan so we have to whack it to "main" in anv which is silly. Instead of walking the list of functions and asserting that everything is named "main" and hoping there's only one function named "main", just use the nir_shader_get_entrypoint() helper which has better assertions anyway. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-30 20:14:52 -05:00
Jason Ekstrand	497675c21e	intel/fs: Fix nir_op_b2[fi] with 64-bit result on Gen8 LP and Gen9 LP Several of the Atom GPUs have additional restrictions on alignment when moving < 64-bit source to a 64-bit destination. All of the nir_op_264 code generation paths respected this, but nir_op_b2[fi] did not. Previous to commit `a68dd47b91` it was not possible to generate such an instruction from the GLSL path. It may have been possible from SPIR-V, but it's not clear. The aforementioned patch converts a 64-bit nir_op_fsign into a sequence of operations including a nir_op_b2f with a 64-bit result. This "just works" everywhere except these Atom parts. This problem was not detected during normal CI testing because the Atom parts are not included in developer builds. v2 (idr): Make the patch compile, and make some cosmetic changes. Add a commit message. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108319 Fixes: `a68dd47b91` "nir/algebraic: Simplify fsat of fsign" Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-11 15:21:19 -05:00
Ian Romanick	b44c9292b7	intel/compiler: Don't handle fsign.sat No shader-db or CI changes on any Intel platform. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-10-09 13:56:42 -07:00
Dylan Baker	8396043f30	Replace uses of _mesa_bitcount with util_bitcount and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem in nir for platforms that don't have popcount or popcountll, such as 32bit msvc. v2: - Fix additional uses of _mesa_bitcount added after this was originally written Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1) Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-09-07 10:21:26 -07:00
Eric Engestrom	07ff56791d	intel/compiler: remove unused get_image_base_type() Unused since `09f1de97a7` "anv,i965: Lower away image derefs in the driver". Cc: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Eric Engestrom <eric@engestrom.ch> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-06 15:22:24 +01:00
Alejandro Piñeiro	4e1f8d82c2	i965/fs: include multisamplers on image_intrinsic_coord_components This is the second patch needed to fix the following piglit tests: tests/spec/arb_gl_spirv/linker/uniform/multisampler.shader_test tests/spec/arb_gl_spirv/linker/uniform/multisampler-array.shader_test Although in this case it doesn't affect so many borrowed tests, as there aren't too many tests using multisamplers on Intel. It is worth to note that this patch is also needed when those tests are run on GLSL mode (using the --glsl option). Although most Intel drivers would not be able to run/execute tests using multisamplers, as GL_MAX_IMAGE_SAMPLES is zero, technically those tests are expected to link correctly, so linking tests should pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-05 17:02:28 +02:00
Jason Ekstrand	3cbc02e469	intel: Use TXS for image_size when we have a typed surface Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:03 -05:00
Jason Ekstrand	09f1de97a7	anv,i965: Lower away image derefs in the driver Previously, the back-end compiler turn image access into magic uniform reads and there was a complex contract between back-end compiler and driver about setting up and filling out those params. As of this commit, both drivers now lower image_deref_load_param_intel intrinsics to load_uniform intrinsics controlled by the driver and lower the other image_deref_* intrinsics to image_* intrinsics which take an actual binding table index. There are still "magic" uniforms but they are now added and controlled entirely by the driver and that contract no longer spans components. This also has the side-effect of making most image use compile-time binding table indices. Previously, all image access pulled the binding table index from a uniform. Part of the reason for this was that the magic uniforms made it difficult to decouple binding table indices from the uniforms and, since they are indexed completely differently (especially in Vulkan), it was hard to pull them apart. Now that the driver is handling both, it's trivial to decouple the two and provide actual binding table indices. Shader-db results on Kaby Lake: total instructions in shared programs: 15166872 -> 15164293 (-0.02%) instructions in affected programs: 115834 -> 113255 (-2.23%) helped: 191 HURT: 0 total cycles in shared programs: 571311495 -> 571196465 (-0.02%) cycles in affected programs: 4757115 -> 4642085 (-2.42%) helped: 73 HURT: 67 total spills in shared programs: 10951 -> 10926 (-0.23%) spills in affected programs: 742 -> 717 (-3.37%) helped: 7 HURT: 0 total fills in shared programs: 22226 -> 22201 (-0.11%) fills in affected programs: 1146 -> 1121 (-2.18%) helped: 7 HURT: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:03 -05:00
Jason Ekstrand	37f7983bcc	intel/compiler: Do image load/store lowering to NIR This commit moves our storage image format conversion codegen into NIR instead of doing it in the back-end. This has the advantage of letting us run it through NIR's optimizer which is pretty effective at shrinking things down. In the common case of rgba8, the number of instructions emitted after NIR is done with it is half of what it was with the lowering happening in the back-end. On the downside, the back-end's lowering is able to directly use predicates and the NIR lowering has to use IFs. Shader-db results on Kaby Lake: total instructions in shared programs: 15166910 -> 15166872 (<.01%) instructions in affected programs: 5895 -> 5857 (-0.64%) helped: 15 HURT: 0 Clearly, we don't have that much image_load_store happening in the shaders in shader-db.... Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Ian Romanick	c856403868	i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1 v2: Refactor selection of atomic opcode to a separate function. Suggested by Jason. No changes on any other Intel platforms. Skylake total instructions in shared programs: 14304261 -> 14304241 (<.01%) instructions in affected programs: 1625 -> 1605 (-1.23%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5 helped stats (rel) min: 1.01% max: 14.29% x̄: 5.86% x̃: 4.07% 95% mean confidence interval for instructions value: -10.66 0.66 95% mean confidence interval for instructions %-change: -15.91% 4.19% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 527531226 -> 527531194 (<.01%) cycles in affected programs: 92204 -> 92172 (-0.03%) helped: 2 HURT: 0 Haswell and Broadwell had similar results. (Broadwell shown) total instructions in shared programs: 14615730 -> 14615710 (<.01%) instructions in affected programs: 1838 -> 1818 (-1.09%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5 helped stats (rel) min: 0.89% max: 13.04% x̄: 5.37% x̃: 3.78% 95% mean confidence interval for instructions value: -10.66 0.66 95% mean confidence interval for instructions %-change: -14.59% 3.85% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-28 15:35:46 -07:00
Ian Romanick	b6e247cf0e	i965/fs: Refactor image atomics to be a bit more like other atomics This greatly simplifies the next patch. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-28 15:35:46 -07:00
Ian Romanick	fabe3ead57	i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1 Funny story... a single shader was hurt for instructions, spills, fills. That same shader was also the most helped for cycles. #GPUsAreWeird No changes on any other Intel platform. v2: Refactor selection of atomic opcode to a separate function. Suggested by Jason. Haswell, Broadwell, and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14304116 -> 14304261 (<.01%) instructions in affected programs: 12776 -> 12921 (1.13%) helped: 19 HURT: 1 helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1 helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55% HURT stats (abs) min: 189 max: 189 x̄: 189.00 x̃: 189 HURT stats (rel) min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87% 95% mean confidence interval for instructions value: -12.83 27.33 95% mean confidence interval for instructions %-change: -1.57% 0.31% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 527552861 -> 527531226 (<.01%) cycles in affected programs: 1459195 -> 1437560 (-1.48%) helped: 16 HURT: 2 helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6 helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03% HURT stats (abs) min: 12 max: 12 x̄: 12.00 x̃: 12 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -3699.81 1295.92 95% mean confidence interval for cycles %-change: -0.94% 0.30% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 8025 -> 8033 (0.10%) spills in affected programs: 208 -> 216 (3.85%) helped: 1 HURT: 1 total fills in shared programs: 10989 -> 11040 (0.46%) fills in affected programs: 444 -> 495 (11.49%) helped: 1 HURT: 1 Ivy Bridge total instructions in shared programs: 11709181 -> 11709153 (<.01%) instructions in affected programs: 3505 -> 3477 (-0.80%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4 helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61% total cycles in shared programs: 254741126 -> 254738801 (<.01%) cycles in affected programs: 919067 -> 916742 (-0.25%) helped: 3 HURT: 0 helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160 helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03% total spills in shared programs: 4536 -> 4533 (-0.07%) spills in affected programs: 40 -> 37 (-7.50%) helped: 1 HURT: 0 total fills in shared programs: 4819 -> 4813 (-0.12%) fills in affected programs: 94 -> 88 (-6.38%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-28 15:35:38 -07:00
Kevin Rogovin	03ecec9ed2	i965: Add INTEL_fragment_shader_ordering support. Adds suppport for INTEL_fragment_shader_ordering. We achieve the fragment ordering by using the same instruction as for beginInvocationInterlockARB() which is by issuing a memory fence via sendc. Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>	2018-08-28 17:15:10 +03:00
Ian Romanick	d515c75463	intel/compiler: Implement untyped atomic float min, max, and compare-swap dataport messages v2: Split changes to the message type field to another patch. Suggested by Caio. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	d628642a34	intel/compiler: Silence unused parameter warnings src/intel/compiler/brw_disasm_info.c: In function ‘nir_print_instr’: src/intel/compiler/brw_disasm_info.c:30:61: warning: unused parameter ‘instr’ [-Wunused-parameter] __attribute__((weak)) void nir_print_instr(const nir_instr instr, FILE fp) {} ^~~~~ src/intel/compiler/brw_disasm_info.c:30:74: warning: unused parameter ‘fp’ [-Wunused-parameter] __attribute__((weak)) void nir_print_instr(const nir_instr instr, FILE fp) {} ^~ src/intel/compiler/brw_disasm.c: In function ‘src_ia1’: src/intel/compiler/brw_disasm.c:850:18: warning: unused parameter ‘_reg_file’ [-Wunused-parameter] unsigned _reg_file, ^~~~~~~~~ src/intel/compiler/brw_fs_surface_builder.cpp: In function ‘void brw::surface_access::emit_byte_scattered_write(const brw::fs_builder&, const fs_reg&, const fs_reg&, const fs_reg&, unsigned int, unsigned int, unsigned int, brw_predicate)’: src/intel/compiler/brw_fs_surface_builder.cpp:193:57: warning: unused parameter ‘size’ [-Wunused-parameter] unsigned dims, unsigned size, ^~~~ v2: Update commit message. brw_fs_generator.cpp warnings were already fixed by another patch. Noticed by Caio. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Iago Toral Quiroga	471bce5689	intel/compiler: implement 8-bit constant load Fixes VK-GL-CTS CL#2567 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-01 08:08:15 +02:00
Iago Toral Quiroga	7e6c8b0cb7	intel/compiler: add setup_imm_(u)b helpers The hardware doesn't support byte immediates, so similar to setup_imm_df() for doubles, these helpers work by loading the constant value into a VGRF. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-01 08:08:15 +02:00
Kenneth Graunke	8794fe3e30	intel/compiler: Delete dead VS intrinsic handling. These are lowered by brw_nir_lower_vs_inputs(). If they weren't, we would have already hit the unreachable() in emit_system_values_block(). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-07-26 11:45:34 -07:00
Karol Herbst	7f95564a22	nir: rename f2f16_undef to f2f16 we need rounding modes on other conversions involving floats and it is easier to rename f2f16_undef than renaming all the other ones. v2: rebased on master Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-07-24 20:40:05 +02:00
Iago Toral Quiroga	213491600a	intel/compiler: emit actual barriers for working-group level barriers Until now we have assumed that we could skip emitting these barriers in the general case based on empirical testing and a few assumptions detailed in a comment in the driver code, however, recent CTS tests have showed that we actually need them to produce correct behavior. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-07-10 07:46:34 +02:00
Jose Maria Casanova Crespo	cd0afab99b	i965/fs: Enable store_ssbo for 8-bit types. v2: Update comment according to this patch. (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-07-10 00:14:50 +02:00
Jose Maria Casanova Crespo	87fc9af3fc	i965/fs: Enable conversions to 8-bit integers Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-07-10 00:14:49 +02:00
Jose Maria Casanova Crespo	030472c1f0	i965: Support for 8-bit base types in helper functions Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-07-10 00:14:49 +02:00
Ian Romanick	88bd37c010	i965/fs: Properly handle sign(-abs(x)) Fixes new piglit tests: - glsl-1.10/execution/fs-sign-neg-abs.shader_test - glsl-1.10/execution/fs-sign-sat-neg-abs.shader_test - glsl-1.10/execution/vs-sign-neg-abs.shader_test - glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-07-06 16:20:04 -07:00
Neil Roberts	2d5ddbe960	i965: Fix output register sizes when variable ranges are interleaved In `6f5abf3146` this code was fixed to calculate the maximum size of an attribute in a seperate pass and then allocate the registers to that size. However this wasn’t taking into account ranges that overlap but don’t have the same starting location. For example: layout(location = 0, component = 0) out float a[4]; layout(location = 2, component = 1) out float b[4]; Previously, if ‘a’ was processed first then it would allocate a register of size 4 for location 0 and it wouldn’t allocate another register for location 2 because it would already be covered by the range of 0. Then if something tries to write to b[2] it would try to write past the end of the register allocated for ‘a’ and it would hit an assert. This patch changes it to scan for any overlapping ranges that start within each range to calculate the maximum extent and allocate that instead. Fixed Piglit’s arb_enhanced_layouts/execution/component-layout/ vs-fs-array-interleave-range.shader_test Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Fixes: `6f5abf3146` "i965: Fix output register sizes when multiple variables share a slot."	2018-07-04 10:57:51 +02:00
Francisco Jerez	bcbc7d3a17	intel/fs: Fix nir_intrinsic_load_helper_invocation for SIMD32. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	73d60455e9	intel/fs: Rework INTERPOLATE_AT_PER_SLOT_OFFSET This reworks INTERPOLATE_AT_PER_SLOT_OFFSET to work more like an ALU operation and less like a send. This is less code over-all and, as a side-effect, it now properly handles execution groups and lowering so SIMD32 support just falls out. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	1650442026	intel/fs: Disable SIMD32 dispatch for fragment shaders with discard. Current discard handling requires dedicating the second flag register to discard. However, control-flow in SIMD32 requires both flag registers so it's incompatible with the current discard handling. Just don't support SIMD32+discard for now. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	1811cbdc25	intel/fs: Disable SIMD32 dispatch on Gen4-6 with control flow The hardware's control flow logic is 16-wide so we're out of luck here. We could, in theory, support SIMD32 if we know the control-flow is uniform but we don't have that information at this point. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	71cd9ebed9	intel/fs: Use image_deref intrinsics instead of image_var Since we had to rewrite the deref walking loop anyway, I took the opportunity to make it a bit clearer and more efficient. In particular, in the AoA case, we will now emit one minmax instead of one per array level. Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 20:54:00 -07:00
Jose Maria Casanova Crespo	b8e099e7d5	intel/fs: shuffle_64bit_data_for_32bit_write is not used anymore Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	a4965842d6	intel/fs: Use new shuffle_32bit_write for all 64-bit storage writes Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	a4d445b93c	intel/fs: shuffle_32bit_load_result_to_64bit_data is not used anymore Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	71b319a285	intel/fs: Use shuffle_from_32bit_read for 64-bit FS load_input As the previous use of shuffle_32bit_load_result_to_64bit_data had a source/destination overlap for 64-bit. Now a temporary destination is used for 64-bit cases to use shuffle_from_32bit_read that doesn't handle src/dst overlaps. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	8003ae87f4	intel/fs: shuffle_from_32bit_read at load_per_vertex_input at TCS/TES Previously, the shuffle function had a source/destination overlap that needs to be avoided to use shuffle_from_32bit_read. As we can use for the shuffle destination the destination of removed MOVs. This change also avoids the internal MOVs done by the previous shuffle to deal with possible overlaps. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	5565630f85	intel/fs: Use shuffle_from_32bit_read at VS load_input shuffle_from_32bit_read manages 32-bit reads to 32-bit destination in the same way that the previous loop so now we just call the new function for all bitsizes, simplifying also the 64-bit load_input. v2: Add comment about future 16-bit support (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	152bffb69b	intel/fs: Use shuffle_from_32bit_read for 64-bit gs_input_load This implementation avoids two unneeded MOVs for each 64-bit component. One was done in the old shuffle, to avoid cases of src/dst overlap but this is not the case. And the removed MOV was already being being done in the shuffle. Copy propagation wasn't able to remove them because shuffle destination values are defined with partial writes because they have stride == 2. v2: Reword commit log summary (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	8b26a2d96d	intel/fs: shuffle_from_32bit_read for 64-bit do_untyped_vector_read do_untyped_vector_read is used at load_ssbo and load_shared. The previous MOVs are removed because shuffle_from_32bit_read can handle storing the shuffle results in the expected destination just using the proper offset. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	c2297bdf19	intel/fs: Remove old 16-bit shuffle/unshuffle functions Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	fd3d8a8f79	intel/fs: Use shuffle_for_32bit_write for 16-bits store_ssbo Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00

... 4 5 6 7 8

394 commits