fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-22 04:28:10 +02:00

Author	SHA1	Message	Date
Jason Ekstrand	82d9a37a59	glsl/nir: Add a shared helper for building float64 shaders Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	9314084237	nir: Teach loop unrolling about 64-bit instruction lowering The lowering we do for 64-bit instructions can cause a single NIR ALU instruction to blow up into hundreds or thousands of instructions potentially with control flow. If loop unrolling isn't aware of this, it can unroll a loop 20 times which contains a nir_op_fsqrt which we then lower to a full software implementation based on integer math. Those 20 invocations suddenly get a lot more expensive than NIR loop unrolling currently expects. By giving it an approximate estimate function, we can prevent loop unrolling from going to town when it shouldn't. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	ebb3695376	nir: Expose double and int64 op_to_options_mask helpers We already have one internally for int64 but we don't have a similar one for doubles so we'll have to make one. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Iago Toral Quiroga	ca2b5e9069	compiler/nir: add an is_conversion field to nir_op_info This is set to True only for numeric conversion opcodes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Timothy Arceri	54522d0506	nir: rename glsl_type_is_struct() -> glsl_type_is_struct_or_ifc() Replace done using: find ./src -type f -exec sed -i -- \ 's/glsl_type_is_struct(/glsl_type_is_struct_or_ifc(/g' {} \; Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 13:10:02 +11:00
Timothy Arceri	e16a27fcf8	glsl: rename record_types -> struct_types Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 13:10:02 +11:00
Timothy Arceri	8294295dbd	glsl: rename record_location_offset() -> struct_location_offset() Replace done using: find ./src -type f -exec sed -i -- \ 's/record_location_offset(/struct_location_offset(/g' {} \; Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 13:10:02 +11:00
Timothy Arceri	88d8c4e290	glsl: rename get_record_instance() -> get_struct_instance() Replace done using: find ./src -type f -exec sed -i -- \ 's/get_record_instance(/get_struct_instance(/g' {} \; Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 13:10:02 +11:00
Timothy Arceri	81ee2cd8ba	glsl: rename is_record() -> is_struct() Replace was done using: find ./src -type f -exec sed -i -- \ 's/is_record(/is_struct(/g' {} \; Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 13:10:02 +11:00
Karol Herbst	272e927d0e	nir/spirv: initial handling of OpenCL.std extension opcodes Not complete, mostly just adding things as I encounter them in CTS. But not getting far enough yet to hit most of the OpenCL.std instructions. Anyway, this is better than nothing and covers the most common builtins. v2: add hadd proof from Jason; move some of the lowering into opt_algebraic and create new nir opcodes; simplify nextafter lowering; fix normalize lowering for inf; rework upsample to use nir_pack_bits; add missing files to build systems. v3: split lines of iadd/sub_sat expressions Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 22:28:29 +01:00
Karol Herbst	d0b47ec4df	nir/vtn: add support for SpvBuiltInGlobalLinearId v2: use formula with fewer operations Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-05 22:28:29 +01:00
Karol Herbst	f48c672965	nir: add support for address bit sized system values v2: add assert in else clause; make local group intrinsics 32 bit wide v3: always use 32 bit constant for local_size v4: add comment by Jason Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 22:28:29 +01:00
Karol Herbst	5f8257fb0b	nir/spirv: improve parsing of the memory model v2: add some vtn_fail_ifs Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 22:28:29 +01:00
Karol Herbst	5d48359a2c	nir: replace magic numbers with M_PI we define it inside 'include/c99_math.h' so it is safe to use. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 22:28:29 +01:00
Timur Kristóf	6684e039eb	nir: Add multiplier argument to nir_lower_uniforms_to_ubo. Note that locations can be set in different units, and the multiplier argument supports these different units. For example, st_glsl_to_nir uses dwords (4 bytes), so the multiplier should be 4, while tgsi_to_nir uses vec4 slots (16 bytes), so the multiplier should be 16. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Timur Kristóf	909d1f50f3	nir: Move nir_lower_uniforms_to_ubo to compiler/nir. The nir_lower_uniforms_to_ubo function is useful outside of mesa/state_tracker, and in fact is needed to produce NIR for drivers that have the PIPE_CAP_PACKED_UNIFORMS capability. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Timur Kristóf	317f10bf40	nir: Add ability for shaders to use window space coordinates. This patch adds a shader_info field that tells the driver to use window space coordinates for a given vertex shader. It also enables this feature in radeonsi (the only NIR-capable driver that supported it in TGSI), and makes tgsi_to_nir aware of it. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Eric Anholt	2780a99ff8	v3d: Move the stores for fixed function VS output reads into NIR. This lets us emit the VPM_WRITEs directly from nir_intrinsic_store_output() (useful once NIR scheduling is in place so that we can reduce register pressure), and lets future NIR scheduling schedule the math to generate them. Even in the meantime, it looks like this lets NIR DCE some more code and make better decisions. total instructions in shared programs: 6429246 -> 6412976 (-0.25%) total threads in shared programs: 153924 -> 153934 (<.01%) total loops in shared programs: 486 -> 483 (-0.62%) total uniforms in shared programs: 2385436 -> 2388195 (0.12%) Acked-by: Ian Romanick <ian.d.romanick@intel.com> (nir)	2019-03-05 10:59:40 -08:00
Eric Anholt	a4f612b4cf	nir: Improve printing of load_input/store_output variable names. We were printing only when the channel was exactly the start channel, so scalarized loads/stores would be missing the name on the rest. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-05 10:59:40 -08:00
Jason Ekstrand	61e009d2c4	spirv: Use the same types for resource indices as pointers We need more space than just a 32-bit scalar and we have to burn all that space anyway so we may as well expose it to the driver. This also fixes a subtle bug when UBOs and SSBOs have different pointer types. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	9f7ee4f8e5	spirv: Use the generic dereference function for OpArrayLength With the new deref changes, the old pointer_offset version may not be the right one to call. Just call the generic one and let it sort it out. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	f1dbc7e97d	spirv: Pull offset/stride from the pointer for OpArrayLength We can't pull it from the variable type because it might be an array of blocks and not just the one block. While we're here, throw in some error checking. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org	2019-03-05 10:06:50 -06:00
Jason Ekstrand	5c96120b5c	intel,nir: Lower TXD with min_lod when the sampler index is not < 16 When we have a larger sampler index, we get into the "high sampler" scenario and need an instruction header. Even in SIMD8, this pushes the instruction over the sampler message size maximum of 11 registers. Instead, we have to lower TXD to TXL. Fixes: `cb98e0755f` "intel/fs: Support min_lod parameters on texture..." Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-04 23:56:39 +00:00
Jason Ekstrand	ca295ddbfb	spirv: OpImageQueryLod requires a sampler No idea how this fell through the cracks besides the fact that the sampler bound at 0 almost always works and the CTS isn't amazing. In any case, this appears to have been broken for almost forever. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org	2019-03-04 23:56:39 +00:00
Sagar Ghuge	58bcebd987	spirv: Allow [i/u]mulExtended to use new nir opcode Use new nir opcode nir_[i/u]mul_2x32_64 and extract lower and higher 32 bits as needed instead of emitting mul and mul_high. v2: Surround the switch case with curly braces (Jason Ekstrand) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-04 15:50:25 -08:00
Sagar Ghuge	47ec9bdc60	nir/algebraic: Optimize low 32 bit extraction Optimize a situation where we only need lower 32 bits from 64 bit result. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-04 15:50:25 -08:00
Sagar Ghuge	1d8994a63b	glsl: [u/i]mulExtended optimization for GLSL Optimize mulExtended to use 32x32->64 multiplication. Drivers that are not based on NIR can set the MUL64_TO_MUL_AND_MUL_HIGH lowering flag to keep the old behavior. v2: Add missing condition check (Jason Ekstrand) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-04 15:50:25 -08:00
Sagar Ghuge	e551040c60	nir/glsl: Add another way of doing lower_imul64 for gen8+ On Gen 8 and 9, the "mul" instruction supports a 64-bit destination type. We can reduce our 64x64 int multiplication from 4 instructions to 3. Also, instead of emitting two mul instructions, we can emit a single mul instruction and extract the low/high 32 bits from the 64-bit result for [i/u]mulExtended. v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand) 2) Add lower_mul_2x32_64 flag (Matt Turner) 3) Remove associative property as bit size is different (Connor Abbott) v3: Fix indentation and variable naming convention (Jason Ekstrand) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-04 15:50:25 -08:00
Ilia Mirkin	4eec3a2a36	glsl: fix recording of variables for XFB in TCS shaders This is purely for conformance, since it's not actually possible to do XFB on TCS output varyings. However we do have to make sure we record the names correctly, and this removes an extra level of array-ness from the names in question. Fixes KHR-GL45.tessellation_shader.single.xfb_captures_data_from_correct_stage v2: Add comment to the new program_resource_visitor::process function. (Ilia Mirkin) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108457 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 19.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-04 01:55:00 +01:00
Jose Maria Casanova Crespo	bf1f49482d	glsl: TCS outputs cannot be transform feedback candidates on GLES Avoids a regression on KHR-GLES*.core.tessellation_shader.single.xfb_captures_data_from_correct_stage that is uncovered by the following patch: "glsl: fix recording of variables for XFB in TCS shaders" v2: Rebased over glsl: fix recording of variables for XFB in TCS shaders v3: Move this patch before "glsl: fix recording of variables for XFB in TCS shaders" to avoid temporary regressions. (Ilia Mirkin) Cc: 19.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-04 01:55:00 +01:00
Jose Maria Casanova Crespo	cc7173b438	glsl: fix typos in comments "transfor" -> "transform" Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-04 01:55:00 +01:00
Gert Wollny	3214f20914	mesa: Expose EXT_texture_query_lod and add support for its use in shaders EXT_texture_query_lod provides the same functionality for GLES as the ARB extension of the same name does for GL. v2: Set ES 3.0 as minimum GLES version as required by the extension Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-03 21:50:42 +01:00
Jordan Justen	7de056e1a9	scons: Generate float64_glsl.h for glsl_to_nir fp64 lowering Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-02 14:33:44 -08:00
Jordan Justen	31b35916dd	nir: Add int64/doubles options into nir_shader_compiler_options This will allow the options to be visible under nir_shader->options, which will allow the gallium state_tracker to use the driver preferred settings during glsl_to_nir. Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-02 14:33:41 -08:00
Ian Romanick	bae0c36751	nir/algebraic: Optimize away an fsat of a b2f The b2f can only produce 0.0 or 1.0, so the fsat does nothing. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-02 13:58:56 -08:00
Ian Romanick	ecc9ffa778	nir/algebraic: Replace a-fract(a) with floor(a) I noticed this while looking at a shader that was affected by Tim's "more loop unrolling" series. In review, Tim Arceri asked: > Why the hurt on Gen6+ is this something that should be in the late > optimisations pass? As far as I can tell, it's just because our scheduler is terrible. In all the fragment shaders that I looked at (some hurt shaders were from other stages), only one of the SIMD8 or SIMD16 versions would be hurt. In many of those cases, the other SIMD width is improved (e.g., shaders/closed/steam/brutal-legend/3990.shader_test). Often it looks like the scheduler decides to schedule a SEND that occurs somewhere early in the shader differently. Once that happens, everything is different. I looked at one vertex shader that was hurt (from Goat Simulator). In that case, both the floor and fract are used. The optimization eliminates the add, and it should allow better scheduling. In the area of the FRC and RNDD instructions, the scheduler does the right thing. However, later in the shader a MAD and an ADD get scheduled differently, and that makes it slightly worse. In light of this, I tried adding some "is_used_once" mark-up, and that did not fix all the cycle regressions. It also did a lot more harm than good on SKL (helped 82 vs. hurt 241). All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15437001 -> 15435259 (-0.01%) instructions in affected programs: 213651 -> 211909 (-0.82%) helped: 988 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1 helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59% 95% mean confidence interval for instructions value: -1.89 -1.63 95% mean confidence interval for instructions %-change: -1.23% -1.05% Instructions are helped.
total cycles in shared programs: 383007378 -> 382997063 (<.01%) cycles in affected programs: 1650825 -> 1640510 (-0.62%) helped: 679 HURT: 302 helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14 helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98% HURT stats (abs) min: 1 max: 250 x̄: 18.43 x̃: 7 HURT stats (rel) min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53% 95% mean confidence interval for cycles value: -13.05 -7.98 95% mean confidence interval for cycles %-change: -0.86% -0.50% Cycles are helped. Iron Lake and GM45 had similar results. (GM45 shown) total instructions in shared programs: 5043616 -> 5043010 (-0.01%) instructions in affected programs: 119691 -> 119085 (-0.51%) helped: 432 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1 helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39% 95% mean confidence interval for instructions value: -1.58 -1.23 95% mean confidence interval for instructions %-change: -0.72% -0.59% Instructions are helped. total cycles in shared programs: 128139812 -> 128135762 (<.01%) cycles in affected programs: 3829724 -> 3825674 (-0.11%) helped: 602 HURT: 0 helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6 helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10% 95% mean confidence interval for cycles value: -8.40 -5.05 95% mean confidence interval for cycles %-change: -0.22% -0.16% Cycles are helped. Reviewed-by: Elie Tournier <tournier.elie@gmail.com>	2019-03-01 12:43:25 -08:00
Ian Romanick	d40640efe8	nir/algebraic: Replace a bcsel of b2f sources with a b2f(!(a || b)) I have not investigated the result of doing this during code generation. That should be possible, but it would be a bit more effort. All Gen6+ platforms had nearly identical results. (Skylake shown) total cycles in shared programs: 370961508 -> 370961367 (<.01%) cycles in affected programs: 5174 -> 5033 (-2.73%) helped: 2 HURT: 0 Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8206587 -> 8206589 (<.01%) instructions in affected programs: 1325 -> 1327 (0.15%) helped: 0 HURT: 2 total cycles in shared programs: 187657422 -> 187657428 (<.01%) cycles in affected programs: 11566 -> 11572 (0.05%) helped: 0 HURT: 2 This change has almost no effect right now. However, removing this patch (but leaving the patch "intel/fs: Generate if instructions with inverted conditions") after adding a patch that removes !(a < b) -> (a >= b) optimizations (like https://patchwork.freedesktop.org/patch/264787/) has the following results on Skylake: Skylake total instructions in shared programs: 15071804 -> 15071806 (<.01%) instructions in affected programs: 640 -> 642 (0.31%) helped: 0 HURT: 2 total cycles in shared programs: 369914348 -> 369916569 (<.01%) cycles in affected programs: 27900 -> 30121 (7.96%) helped: 4 HURT: 15 helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3 helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40% HURT stats (abs) min: 2 max: 758 x̄: 156.07 x̃: 81 HURT stats (rel) min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91% 95% mean confidence interval for cycles value: 12.68 221.11 95% mean confidence interval for cycles %-change: 3.09% 21.23% Cycles are HURT. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	eae19f5f19	nir/algebraic: Replace i2b used by bcsel or if-statement with comparison All of the helped shaders are in Deus Ex. I looked at a couple shaders, and they have a pattern like: vec1 32 ssa_373 = i2b32 ssa_345.w vec1 32 ssa_374 = bcsel ssa_373, ssa_20, ssa_0 ... vec1 32 ssa_377 = ine ssa_345.w, ssa_0 if ssa_377 { ... vec1 32 ssa_416 = i2b32 ssa_385.w vec1 32 ssa_417 = bcsel ssa_416, ssa_386, ssa_374 ... } The massive help occurs because the i2b32 is removed, then other passes determine that ssa_374 must be ssa_20 inside the if-statement allowing the first bcsel to also be deleted. v2: Rebase on 1-bit Boolean changes. v3: Fix i2b32 vs ine problem in if-statement replacement. Noticed by Bas. Skylake total instructions in shared programs: 15241394 -> 15186287 (-0.36%) instructions in affected programs: 890583 -> 835476 (-6.19%) helped: 355 HURT: 0 helped stats (abs) min: 1 max: 497 x̄: 155.23 x̃: 149 helped stats (rel) min: 0.09% max: 16.49% x̄: 6.10% x̃: 6.59% 95% mean confidence interval for instructions value: -165.07 -145.39 95% mean confidence interval for instructions %-change: -6.42% -5.77% Instructions are helped. total cycles in shared programs: 373846583 -> 371023357 (-0.76%) cycles in affected programs: 118972102 -> 116148876 (-2.37%) helped: 343 HURT: 14 helped stats (abs) min: 45 max: 118284 x̄: 8332.32 x̃: 6089 helped stats (rel) min: 0.03% max: 38.19% x̄: 2.48% x̃: 1.77% HURT stats (abs) min: 120 max: 4126 x̄: 2482.79 x̃: 3019 HURT stats (rel) min: 0.16% max: 17.37% x̄: 2.13% x̃: 1.11% 95% mean confidence interval for cycles value: -8723.28 -7093.12 95% mean confidence interval for cycles %-change: -2.57% -2.02% Cycles are helped. total spills in shared programs: 32401 -> 23465 (-27.58%) spills in affected programs: 24457 -> 15521 (-36.54%) helped: 343 HURT: 0 total fills in shared programs: 37866 -> 31765 (-16.11%) fills in affected programs: 18889 -> 12788 (-32.30%) helped: 343 HURT: 0 Broadwell and Haswell had similar results. 
(Haswell shown) Haswell total instructions in shared programs: 13764783 -> 13750679 (-0.10%) instructions in affected programs: 1176256 -> 1162152 (-1.20%) helped: 334 HURT: 21 helped stats (abs) min: 1 max: 358 x̄: 42.59 x̃: 47 helped stats (rel) min: 0.09% max: 11.81% x̄: 1.30% x̃: 1.37% HURT stats (abs) min: 1 max: 61 x̄: 5.76 x̃: 1 HURT stats (rel) min: 0.03% max: 1.84% x̄: 0.17% x̃: 0.03% 95% mean confidence interval for instructions value: -43.99 -35.47 95% mean confidence interval for instructions %-change: -1.35% -1.08% Instructions are helped. total cycles in shared programs: 386511910 -> 385402528 (-0.29%) cycles in affected programs: 143831110 -> 142721728 (-0.77%) helped: 327 HURT: 39 helped stats (abs) min: 16 max: 25219 x̄: 3519.74 x̃: 3570 helped stats (rel) min: <.01% max: 10.26% x̄: 0.95% x̃: 0.96% HURT stats (abs) min: 16 max: 4881 x̄: 1065.95 x̃: 997 HURT stats (rel) min: <.01% max: 16.67% x̄: 0.70% x̃: 0.24% 95% mean confidence interval for cycles value: -3375.59 -2686.60 95% mean confidence interval for cycles %-change: -0.92% -0.64% Cycles are helped. total spills in shared programs: 100480 -> 97846 (-2.62%) spills in affected programs: 84702 -> 82068 (-3.11%) helped: 316 HURT: 21 total fills in shared programs: 96877 -> 94369 (-2.59%) fills in affected programs: 69167 -> 66659 (-3.63%) helped: 316 HURT: 9 No changes on Ivy Bridge or earlier platforms. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Caio Marcelo de Oliveira Filho	1458aa1f78	nir/copy_prop_vars: handle indirect vector elements Unlike the direct case, indirect array derefs of vectors are handled like regular derefs, with the exception that we ignore any vector entry that has SSA values when performing a load. Such SSA values don't help with loading the indirect element unless we emit an if-ladder. Copy_derefs are supported for indirects. Also enable two tests that now pass. v2: Remove unnecessary temporaries. Be clearer when identifying the case where copy_entry doesn't help when we are dealing with an indirect array_deref (of a vector). (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho	6c0de78cc2	nir/copy_prop_vars: prefer using entries from equal derefs When looking up an entry to use, always prefer an equal match, as it is more likely to contain reusable SSA values or derefs to propagate. This will be necessary when adding entries with array derefs of vectors, because we don't want the vector if the equal entry (an array deref of that vector) is present. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho	61965afd00	nir/copy_prop_vars: add tests for indirect array deref Both on an actual array and on a vector, and an extra test on a vector mixing direct and indirect access. The vector tests are disabled and will be enabled by a later commit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho	96c32d7776	nir/copy_prop_vars: handle load/store of vector elements When a direct array deref is used on a vector type (for loads and stores), copy_prop_vars is now smart enough to propagate values it knows about. Given a 'vec4 v', storing to v[3] will update the copy entry for v and is equivalent to a write to v.w. Loading from v[1] will first check whether there's a known value for v.y and, if so, drop the load. The copy entries still always refer to the entire vectors, so the operations happen on the parent deref (the 'vector') and the values are fixed accordingly. It might now be the case that certain entries not only have different SSA defs in each element, but also hold values that come from components other than the ones they are set to, because stores to individual elements always come from an SSA definition with a single component. Tests related to these cases are now enabled. v2: Instead of asserting on invalid indices, "load" an undef and remove the store. (Jason) v3: Merge code path for the cases of is_array_deref_of_vector into the regular code path. Add a base_index parameter to value_set_from_value. (code changes by Jason) v4: Removed the get_entry_for_deref helper, now being used only once. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:50:05 -08:00
Caio Marcelo de Oliveira Filho	33dafdc024	nir/copy_prop_vars: use NIR_MAX_VEC_COMPONENTS Also replace uses of 0xf with the appropriate full mask created from the number of components. Note that an increase of MAX might make us change how the data is stored later on, but for now at least we make sure the pass is not hardcoded. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:50:05 -08:00
Caio Marcelo de Oliveira Filho	e84c841fb0	nir/copy_prop_vars: rename/refactor store_to_entry helper The name reflected this function's role back when the pass also did dead write elimination. So rename it to what it does now, which is setting a value using another value, and narrow the argument list. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:50:05 -08:00
Juan A. Suarez Romero	b43b55d461	nir/spirv: return after emitting a branch in block When emitting a branch in a block, it does not make sense to continue processing further instructions, as they will not be reachable. This fixes a nasty case: a loop containing a conditional branch whose then-part and else-part both exit the loop: %1 = OpLabel OpLoopMerge %2 %3 None OpBranchConditional %false %2 %2 %3 = OpLabel OpBranch %1 %2 = OpLabel [...] We know that block %1 will always branch to block %2, which is the merge block for the loop, and thus a break is emitted. If we continued processing further instructions, we would process the branch conditional and emit the corresponding NIR conditional, which leads to instructions after the break. This fixes dEQP-VK.graphicsfuzz.continue-and-merge. CC: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 09:47:06 +01:00
Timothy Arceri	7536af670b	glsl: fix shader cache for packed param list Some types of params such as some builtins are always padded. We need to keep track of this so we can restore the list correctly. Here we also remove a couple of cache entries that are not actually required as they get rebuilt by the _mesa_add_parameter() calls. This patch fixes a bunch of arb_texture_multisample and arb_sample_shading piglit tests for the radeonsi NIR backend. Fixes: `edded12376` ("mesa: rework ParameterList to allow packing") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-02-28 11:47:37 +11:00
Gert Wollny	b7201a468d	nir: Add possibility to not lower to source mod 'abs' for ops with three sources This is useful for r600, since there the abs source modifier is not supported for ops with three sources. v2: Use correct logic to enable lowering to abs source mod (Eric Anholt) Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-27 11:04:06 +00:00
Kasireddy, Vivek	78fb3fd17e	nir/lower_tex: Add support for XYUV lowering The memory layout associated with this format would be: byte 0 = V, byte 1 = U, byte 2 = Y, byte 3 = X. Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-26 13:08:51 +00:00
Tapani Pälli	22267feff1	nir: initialize value in copy_prop_vars_block Fixes following valgrind warning: ==27561== Conditional jump or move depends on uninitialised value(s) ==27561== at 0x667856B: value_set_ssa_components (nir_opt_copy_prop_vars.c:78) ==27561== by 0x667A1C4: copy_prop_vars_block (nir_opt_copy_prop_vars.c:797) Fixes: `62332d139c` "nir: Add a local variable-based copy propagation pass" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-26 08:56:25 +02:00
Eric Anholt	7c1bf075f3	nir: Just return when asked to rewrite uses of an SSA def to itself. The nir_builder swizzling improvement to not emit extra MOVs resulted in nir_lower_tex() trying to rewrite an SSA def to itself, triggering the assert on all texturing in v3d. There's no work to be done in this case, so just stop asserting. Fixes: `743700be1f` ("nir/builder: Don't emit no-op swizzles") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-25 21:25:24 +00:00

3393 commits
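Commit d0b47ec4df maps SpvBuiltInGlobalLinearId using "a formula with fewer operations", but the log does not show that formula. The C sketch below assumes it is the OpenCL definition of the global linear ID with all global offsets equal to zero, evaluated in Horner form. The function name global_linear_id is hypothetical and is not part of NIR or vtn.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: flattens a 3D global invocation ID into a linear ID
 * using the OpenCL definition, assuming zero global work offsets:
 *     id.x + id.y * size.x + id.z * size.x * size.y
 * Written in Horner form, it needs two multiplies and two adds instead of
 * three multiplies and two adds. size[] is the global size per dimension. */
static uint64_t global_linear_id(const uint64_t id[3], const uint64_t size[3])
{
    return id[0] + size[0] * (id[1] + size[1] * id[2]);
}
```

With a global size of 4x3x2, IDs run from 0 for (0, 0, 0) up to 23 for (3, 2, 1).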
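Commit 6684e039eb adds a multiplier argument to nir_lower_uniforms_to_ubo. A minimal sketch of the unit conversion, assuming the multiplier is the size in bytes of one location unit, so that location * multiplier gives a byte offset into the UBO. The helper uniform_byte_offset is hypothetical and not part of NIR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a uniform location, expressed in some
 * driver-specific unit, into a byte offset. The multiplier is the size of
 * one unit in bytes (4 for dword units, 16 for vec4 slots). */
static uint32_t uniform_byte_offset(uint32_t location, uint32_t multiplier)
{
    return location * multiplier;
}
```

The same location number therefore lands at different byte offsets depending on which frontend produced it, which is why the multiplier has to be supplied by the caller.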
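Commits 58bcebd987, 47ec9bdc60 and 1d8994a63b replace the mul/mul_high pair used for [i/u]mulExtended with a single 32x32->64 multiply whose halves are then extracted. A C sketch of that arithmetic follows; the helpers only model what nir_umul_2x32_64 and nir_imul_2x32_64 compute and are hypothetical, not NIR code.

```c
#include <assert.h>
#include <stdint.h>

/* umulExtended: one unsigned 32x32->64 multiply, then split the product
 * into its high (msb) and low (lsb) 32-bit halves. */
static void umul_extended(uint32_t x, uint32_t y, uint32_t *msb, uint32_t *lsb)
{
    uint64_t product = (uint64_t)x * y;
    *msb = (uint32_t)(product >> 32);
    *lsb = (uint32_t)product;
}

/* imulExtended: same idea with a signed 32x32->64 multiply. The halves are
 * returned as raw 32-bit patterns to avoid implementation-defined
 * signed conversions in C. */
static void imul_extended(int32_t x, int32_t y, uint32_t *msb, uint32_t *lsb)
{
    uint64_t product = (uint64_t)((int64_t)x * y);
    *msb = (uint32_t)(product >> 32);
    *lsb = (uint32_t)product;
}
```

The low half always equals an ordinary wrapping 32-bit multiply, which is the property that lets extraction of only the low 32 bits of such a product be reduced to a plain multiply (47ec9bdc60).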
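Commit e551040c60 cuts a 64x64 integer multiply from four instructions to three when the hardware offers a 32x32->64 multiply. The C sketch below shows one decomposition that needs exactly three multiplies; it illustrates the arithmetic under that assumption and is not the NIR lowering itself, which may order the operations differently.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of a 64x64 multiply built from 32-bit halves:
 *   a * b mod 2^64 = lo(a)*lo(b) + ((lo(a)*hi(b) + hi(a)*lo(b)) << 32)
 * The hi(a)*hi(b) term is shifted out entirely, and only the low 32 bits of
 * each cross product survive the shift, so three multiplies suffice:
 * one 32x32->64 multiply and two plain 32-bit multiplies. */
static uint64_t imul64_lowered(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t lo_product = (uint64_t)a_lo * b_lo;  /* 32x32->64 multiply */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;   /* wraps mod 2^32 */
    return lo_product + ((uint64_t)cross << 32);
}
```

The result matches C's native wrapping uint64_t multiply, including for signed inputs reinterpreted as unsigned, since the low 64 bits of a product do not depend on signedness.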
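The identity behind bae0c36751 (an fsat of a b2f does nothing) can be checked exhaustively, since b2f has only two possible outputs. The functions below are hypothetical scalar models of the NIR ops, not NIR code.

```c
#include <assert.h>
#include <stdbool.h>

/* b2f turns a Boolean into 0.0 or 1.0. */
static float b2f(bool b) { return b ? 1.0f : 0.0f; }

/* fsat clamps to [0.0, 1.0] (NaN handling is omitted in this model). */
static float fsat(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}
```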
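The title of d40640efe8 names only the replacement, b2f(!(a || b)). The C sketch below assumes the matched pattern is bcsel(a, 0.0, b2f(!b)), one form for which that replacement holds, and checks the identity over all four Boolean combinations. The helper functions are hypothetical scalar models, not the pass's actual rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scalar models of the NIR ops. */
static float b2f(bool b) { return b ? 1.0f : 0.0f; }
static float bcsel(bool cond, float x, float y) { return cond ? x : y; }

/* Assumed pattern: select 0.0 when a is true, otherwise b2f(!b). */
static float before_rewrite(bool a, bool b) { return bcsel(a, 0.0f, b2f(!b)); }

/* Replacement named in the commit title. */
static float after_rewrite(bool a, bool b) { return b2f(!(a || b)); }
```

The rewrite trades a select for a Boolean or, which the commit notes matters mainly in combination with other changes to how negated comparisons are handled.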
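Commit 96c32d7776 treats a store to v[3] of a 'vec4 v' as a write to v.w, and lets a load from v[1] reuse a known value for v.y. A heavily simplified C model of that bookkeeping follows; struct copy_entry, store_element and load_element are hypothetical and much simpler than the pass's real data structures, and plain ints stand in for SSA defs.

```c
#include <assert.h>
#include <stdbool.h>

#define VEC_COMPONENTS 4 /* stands in for NIR_MAX_VEC_COMPONENTS here */

/* One entry covers the whole vector: a value per component plus a mask of
 * which components currently hold a known value. */
struct copy_entry {
    int values[VEC_COMPONENTS];
    unsigned known_mask;
};

/* A store to v[index] updates one component of the whole-vector entry.
 * Out-of-range stores are dropped, as the v2 note removes such stores. */
static void store_element(struct copy_entry *e, unsigned index, int value)
{
    if (index >= VEC_COMPONENTS)
        return;
    e->values[index] = value;
    e->known_mask |= 1u << index;
}

/* A load from v[index] can be replaced when that component is known.
 * This sketch just reports "unknown" for out-of-range indices; per the v2
 * note, the real pass loads an undef instead. */
static bool load_element(const struct copy_entry *e, unsigned index, int *out)
{
    if (index >= VEC_COMPONENTS || !(e->known_mask & (1u << index)))
        return false;
    *out = e->values[index];
    return true;
}
```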
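Commit 78fb3fd17e gives the XYUV byte layout as V, U, Y, X for bytes 0 to 3. A C sketch of reading one texel under that layout follows; the struct and function names are hypothetical, and treating X as an unused padding byte is an assumption based on the usual meaning of X in format names.

```c
#include <assert.h>
#include <stdint.h>

struct yuv {
    uint8_t y, u, v;
};

/* Reads one 4-byte XYUV texel: byte 0 = V, byte 1 = U, byte 2 = Y,
 * byte 3 = X (assumed to be padding, so it is ignored). */
static struct yuv unpack_xyuv(const uint8_t texel[4])
{
    struct yuv out = { .y = texel[2], .u = texel[1], .v = texel[0] };
    return out;
}
```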