fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 07:00:12 +01:00

Author	SHA1	Message	Date
Francisco Jerez	2d7d652d5c	intel/fs: Disable opt_sampler_eot() in 32-wide dispatch. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	db6ca13efc	intel/fs: Emit LINE+MAC for LINTERP with unaligned coordinates On g4x through Sandy Bridge, src1 (the coordinates) of the PLN instruction is required to be an even register number. When it's odd (which can happen with SIMD32), we have to emit a LINE+MAC combination instead. Unfortunately, we can't just fall through to the gen4 case because the input registers are still set up for PLN which lays out the four src1 registers differently in SIMD16 than LINE. v2 (Jason Ekstrand): - Take advantage of both accumulators and emit LINE LINE MAC MAC (Based on a patch from Francisco Jerez) - Unify the gen4 and gen4x-6 cases using a loop v3 (Jason Ekstrand): - Don't unify gen4 with gen4x-6 as this turns out to be more fragile than first thought without reworking the gen4 barycentric coordinate layout. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	566e6abd6d	intel/fs: Mark LINTERP opcode as writing accumulator on platforms without PLN When we don't have PLN (gen4 and gen11+), we implement LINTERP as either LINE+MAC or a pair of MADs. In both cases, the accumulator is written by the first of the two instructions and read by the second. Even though the accumulator value isn't actually ever used from a logical instruction perspective, it is trashed so we need to make the scheduler aware. Otherwise, the scheduler could end up re-ordering instructions and putting a LINTERP between another an instruction which writes the accumulator and another which tries to use that result. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	73d60455e9	intel/fs: Rework INTERPOLATE_AT_PER_SLOT_OFFSET This reworks INTERPOLATE_AT_PER_SLOT_OFFSET to work more like an ALU operation and less like a send. This is less code over-all and, as a side-effect, it now properly handles execution groups and lowering so SIMD32 support just falls out. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	74b477039d	intel/fs: Add the group to the flag subreg number on SNB and older We want consistent behavior in the meaning of the flag_subreg field between SNB and IVB+. v2 (Jason Ekstrand): - Add some extra commentary Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	2aefa5e19f	intel/fs: Fix FB read header setup for SIMD32. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	e06f5b30cc	intel/fs: Fix logical FB write lowering for SIMD32 Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	ce370902d4	intel/fs: Fix FB write message control codegen for SIMD32. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	8b788069fb	intel/fs: Don't enable dual source blend if no outputs are written This prevents a crash in some arb_enhanced_layouts tests that would be caused by the next commit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	48241c780a	intel/fs: Fix codegen of FS_OPCODE_SET_SAMPLE_ID for SIMD32. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	789d20df36	intel/eu: Fix pixel interpolator queries for SIMD32. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	1650442026	intel/fs: Disable SIMD32 dispatch for fragment shaders with discard. Current discard handling requires dedicating the second flag register to discard. However, control-flow in SIMD32 requires both flag registers so it's incompatible with the current discard handling. Just don't support SIMD32+discard for now. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	1811cbdc25	intel/fs: Disable SIMD32 dispatch on Gen4-6 with control flow The hardware's control flow logic is 16-wide so we're out of luck here. We could, in theory, support SIMD32 if we know the control-flow is uniform but we don't have that information at this point. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	d5b617a28e	intel/fs: Split instructions low to high in lower_simd_width Commit `0d905597f` fixed an issue with the placement of the zip and unzip instructions. However, as a side-effect, it reversed the order in which we were emitting the split instructions so that they went from high group to low instead of low to high. This is fine for most things like texture instructions and the like but certain render target writes really want to be emitted low to high. This commit just switches the order back around to be low to high. Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `0d905597f` "intel/fs: Be more explicit about our placement of [un]zip"	2018-06-28 13:19:38 -07:00
Jason Ekstrand	0b830081f0	intel/fs: Rework KSP data to be SIMD width-based Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	9d78abbef8	intel/compiler: Add and use helpers for working with KSP indices The pixel shader dispatch table is kind-of a confusing mess. This adds some helpers for dealing with it and for easily extracting the correct data from wm_prog_data. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	5b6e91dd35	intel/fs: Remove program key argument from generator. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	a14fb0184a	intel/fs: Set up FB write message headers in the visitor Doing instruction header setup in the generator is awful for a number of reasons. For one, we can't schedule the header setup at all. For another, it means lots of implied writes which the instruction scheduler and other passes can't properly read about. The second isn't a huge problem for FB writes since they always happen at the end. We made a similar change to sampler handling in `ff4726077d`. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	dda31a7bbc	intel/fs: Fix implied_mrf_writes() for headerless FB writes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	90643689aa	intel/fs: Fix fs_inst::flags_written() for Gen4-5 FB writes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	ed09e78023	intel/eu: Return new instruction to caller from brw_fb_WRITE(). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	c0a1c248b8	intel/fs: Pull FB write implied headers from src[0] Now that we have the implied header in src[0] for tracking purposes, we may as well use it in the generator. This makes things a tiny bit more general. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	b1cc9a9ae1	intel/fs: Properly track implied header regs read by FB writes The FB write opcode on gen4-5 does implied copies from g0 and g1 to the message payload. With this commit, we start tracking that as part of the IR by having the FB write read from g0-1. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	d91fa20655	intel/fs: FS_OPCODE_REP_FB_WRITE has side effects It doesn't matter since we don't ever run replicated write shaders through the optimizer but it's good to be complete. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Andrii Simiklit	232c5d75ea	i965/gen6/gs: Handle case where a GS doesn't allocate VUE We can not use the VUE Dereference flags combination for EOT message under ILK and SNB because the threads are not initialized there with initial VUE handle unlike Pre-IL. So to avoid GPU hangs on SNB and ILK we need to avoid usage of the VUE Dereference flags combination. (Was tested only on SNB but according to the specification SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6 the ILK must behave itself in the similar way) v2: Approach to fix this issue was changed. Instead of different EOT flags in the program end we will create VUE every time even if GS produces no output. v3: Clean up the patch. Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399 CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Tested-by: Mark Janes <mark.a.janes@intel.com>	2018-06-26 08:18:55 +02:00
Jason Ekstrand	5a02ffb733	nir: Rework lower_locals_to_regs to use deref instructions This completely reworks the pass to support deref instructions and delete support for old deref chains Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 20:54:00 -07:00
Jason Ekstrand	2fa7a4a541	intel,ir3: Re-enable nir_opt_copy_prop_vars Now that it's rewritten for deref instructions, we can turn it back on. Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 20:54:00 -07:00
Jason Ekstrand	606eb56ab9	intel/nir: Only lower load/store derefs Everything else should already be handled. Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 20:54:00 -07:00
Jason Ekstrand	71cd9ebed9	intel/fs: Use image_deref intrinsics instead of image_var Since we had to rewrite the deref walking loop anyway, I took the opportunity to make it a bit clearer and more efficient. In particular, in the AoA case, we will now emit one minmax instead of one per array level. Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 20:54:00 -07:00
Jason Ekstrand	152057b138	i965: Move nir_lower_deref_instrs to right before locals_to_regs Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 20:15:57 -07:00
Jason Ekstrand	d7d5aab45b	intel,ir3: Disable nir_opt_copy_prop_vars This pass doesn't handle deref instructions yet. Making it handle both legacy derefs and deref instructions would be painful. Since it's not important for correctness, just disable it for now. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 20:15:56 -07:00
Jose Maria Casanova Crespo	b8e099e7d5	intel/fs: shuffle_64bit_data_for_32bit_write is not used anymore Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	a4965842d6	intel/fs: Use new shuffle_32bit_write for all 64-bit storage writes Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	a4d445b93c	intel/fs: shuffle_32bit_load_result_to_64bit_data is not used anymore Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	71b319a285	intel/fs: Use shuffle_from_32bit_read for 64-bit FS load_input As the previous use of shuffle_32bit_load_result_to_64bit_data had a source/destination overlap for 64-bit. Now a temporary destination is used for 64-bit cases to use shuffle_from_32bit_read that doesn't handle src/dst overlaps. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	8003ae87f4	intel/fs: shuffle_from_32bit_read at load_per_vertex_input at TCS/TES Previously, the shuffle function had a source/destination overlap that needs to be avoided to use shuffle_from_32bit_read. As we can use for the shuffle destination the destination of removed MOVs. This change also avoids the internal MOVs done by the previous shuffle to deal with possible overlaps. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	5565630f85	intel/fs: Use shuffle_from_32bit_read at VS load_input shuffle_from_32bit_read manages 32-bit reads to 32-bit destination in the same way that the previous loop so now we just call the new function for all bitsizes, simplifying also the 64-bit load_input. v2: Add comment about future 16-bit support (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	152bffb69b	intel/fs: Use shuffle_from_32bit_read for 64-bit gs_input_load This implementation avoids two unneeded MOVs for each 64-bit component. One was done in the old shuffle, to avoid cases of src/dst overlap but this is not the case. And the removed MOV was already being being done in the shuffle. Copy propagation wasn't able to remove them because shuffle destination values are defined with partial writes because they have stride == 2. v2: Reword commit log summary (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	8b26a2d96d	intel/fs: shuffle_from_32bit_read for 64-bit do_untyped_vector_read do_untyped_vector_read is used at load_ssbo and load_shared. The previous MOVs are removed because shuffle_from_32bit_read can handle storing the shuffle results in the expected destination just using the proper offset. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	c2297bdf19	intel/fs: Remove old 16-bit shuffle/unshuffle functions Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	fd3d8a8f79	intel/fs: Use shuffle_for_32bit_write for 16-bits store_ssbo Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	20e4732f7d	intel/fs: Use shuffle_from_32bit_read to read 16-bit SSBO Using shuffle_from_32bit_read instead of 16-bit shuffle functions avoids the need of retype. At the same time new function are ready for 8-bit type SSBO reads. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	a0891eabca	intel/fs: Use shuffle_from_32bit_read at VARYING_PULL_CONSTANT_LOAD shuffle_from_32bit_read can manage the shuffle/unshuffle needed for different 8/16/32/64 bit-sizes at VARYING PULL CONSTANT LOAD. To get the specific component the first_component parameter is used. In the case of the previous 16-bit shuffle, the shuffle operation was generating not needed MOVs where its results where never used. This behaviour passed unnoticed on SIMD16 because dead_code_eliminate pass removed the generated instructions but for SIMD8 they cound't be removed because of being partial writes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	22c654941b	intel/fs: New shuffle_for_32bit_write and shuffle_from_32bit_read These new shuffle functions deal with the shuffle/unshuffle operations needed for read/write operations using 32-bit components when the read/written components have a different bit-size (8, 16, 64-bits). Shuffle from 32-bit to 32-bit becomes a simple MOV. shuffle_src_to_dst takes care of doing a shuffle when source type is smaller than destination type and an unshuffle when source type is bigger than destination. So this new read/write functions just need to call shuffle_src_to_dst assuming that writes use a 32-bit destination and reads use a 32-bit source. As shuffle_for_32bit_write/from_32bit_read components take components in unit of source/destination types and shuffle_src_to_dst takes units of the smallest type component, we adjust components and first_component parameters. To enable this new functions it is needed than there is no source/destination overlap in the case of shuffle_from_32bit_read. That never happens on shuffle_for_32bit_write as it allocates a new destination register as it was at shuffle_64bit_data_for_32bit_write. v2: Reword commit log and add comments to explain why first_component and components parameters are adjusted. (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Jose Maria Casanova Crespo	a5665056e5	intel/fs: general 8/16/32/64-bit shuffle_src_to_dst function This new function takes care of shuffle/unshuffle components of a particular bit-size in components with a different bit-size. If source type size is smaller than destination type size the operation needed is a component shuffle. The opposite case would be an unshuffle. Component units are measured in terms of the smaller type between source and destination. As we are un/shuffling the smaller components from/into a bigger one. The operation allows to skip first_component number of components from the source. Shuffle MOVs are retyped using integer types avoiding problems with denorms and float types if source and destination bitsize is different. This allows to simplify uses of shuffle functions that are dealing with these retypes individually. Now there is a new restriction so source and destination can not overlap anymore when calling this shuffle function. Following patches that migrate to use this new function will take care individually of avoiding source and destination overlaps. v2: (Jason Ekstrand) - Rewrite overlap asserts. - Manage type_sz(src.type) == type_sz(dst.type) case using MOVs from source to dest. This works for 64-bit to 64-bits operation that on Gen7 as it doesn't support Q registers. - Explain that components units are based in the smallest type. v3: - Fix unshuffle overlap assert (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-16 22:39:08 +02:00
Ian Romanick	4467040cb6	i965/fs: Propagate conditional modifiers from not instructions Skylake total instructions in shared programs: 14399081 -> 14399010 (<.01%) instructions in affected programs: 26961 -> 26890 (-0.26%) helped: 57 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 1.25 x̃: 1 helped stats (rel) min: 0.16% max: 0.80% x̄: 0.30% x̃: 0.18% 95% mean confidence interval for instructions value: -1.50 -0.99 95% mean confidence interval for instructions %-change: -0.35% -0.25% Instructions are helped. total cycles in shared programs: 532978307 -> 532976050 (<.01%) cycles in affected programs: 468629 -> 466372 (-0.48%) helped: 33 HURT: 20 helped stats (abs) min: 3 max: 360 x̄: 116.52 x̃: 98 helped stats (rel) min: 0.06% max: 3.63% x̄: 1.66% x̃: 1.27% HURT stats (abs) min: 2 max: 172 x̄: 79.40 x̃: 43 HURT stats (rel) min: 0.04% max: 3.02% x̄: 1.48% x̃: 0.44% 95% mean confidence interval for cycles value: -81.29 -3.88 95% mean confidence interval for cycles %-change: -1.07% 0.12% Inconclusive result (%-change mean confidence interval includes 0). All Gen6+ platforms, except Ivy Bridge, had similar results. (Haswell shown) total instructions in shared programs: 12973897 -> 12973838 (<.01%) instructions in affected programs: 25970 -> 25911 (-0.23%) helped: 55 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.07 x̃: 1 helped stats (rel) min: 0.16% max: 0.62% x̄: 0.28% x̃: 0.18% 95% mean confidence interval for instructions value: -1.14 -1.00 95% mean confidence interval for instructions %-change: -0.32% -0.24% Instructions are helped. total cycles in shared programs: 410355841 -> 410352067 (<.01%) cycles in affected programs: 578454 -> 574680 (-0.65%) helped: 47 HURT: 5 helped stats (abs) min: 3 max: 360 x̄: 85.74 x̃: 18 helped stats (rel) min: 0.05% max: 3.68% x̄: 1.18% x̃: 0.38% HURT stats (abs) min: 2 max: 242 x̄: 51.20 x̃: 4 HURT stats (rel) min: <.01% max: 0.45% x̄: 0.15% x̃: 0.11% 95% mean confidence interval for cycles value: -104.89 -40.27 95% mean confidence interval for cycles %-change: -1.45% -0.66% Cycles are helped. Ivy Bridge total instructions in shared programs: 11679351 -> 11679301 (<.01%) instructions in affected programs: 28208 -> 28158 (-0.18%) helped: 50 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.12% max: 0.54% x̄: 0.23% x̃: 0.16% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.27% -0.19% Instructions are helped. total cycles in shared programs: 257445362 -> 257444662 (<.01%) cycles in affected programs: 419338 -> 418638 (-0.17%) helped: 40 HURT: 3 helped stats (abs) min: 1 max: 170 x̄: 65.05 x̃: 24 helped stats (rel) min: 0.02% max: 3.51% x̄: 1.26% x̃: 0.41% HURT stats (abs) min: 2 max: 1588 x̄: 634.00 x̃: 312 HURT stats (rel) min: 0.05% max: 2.97% x̄: 1.21% x̃: 0.62% 95% mean confidence interval for cycles value: -97.96 65.41 95% mean confidence interval for cycles %-change: -1.56% -0.62% Inconclusive result (value mean confidence interval includes 0). No changes on Iron Lake or GM45. v2: Move 'if (cond != BRW_CONDITIONAL_Z && cond != BRW_CONDITIONAL_NZ)' check outside the loop. Suggested by Iago. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>	2018-06-15 17:22:27 -07:00
Ian Romanick	f2d8bb7a7b	i965/fs: Rearrange code to remove most of the gotos Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>	2018-06-15 17:22:27 -07:00
Ian Romanick	77f269bb56	i965/fs: Refactor propagation of conditional modifiers from compares to adds Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>	2018-06-15 17:22:27 -07:00
Ian Romanick	22f9fbc0d9	i965/vec4: Optimize OR with 0 into a MOV All of the affected shaders are geometry shaders... the same ones from the similar fs changes. The "No changes on any other platforms" comment below is not quite right. Without the previous change to register coalescing, this optimization caused quite a few regressions in tests that either used gl_ClipVertex or used different interpolation modes. I observed that with both patches applied, glsl-1.10/execution/interpolation/interpolation-none-gl_BackSecondaryColor-smooth-vertex.shader_test was one instruction shorter. I suspect other shaders would be similarly affected. Since this is all based on NOS, shader-db does not reflect it. Haswell total instructions in shared programs: 12954955 -> 12954918 (<.01%) instructions in affected programs: 3603 -> 3566 (-1.03%) helped: 37 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.21% max: 2.50% x̄: 1.99% x̃: 2.50% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.30% -1.69% Instructions are helped. total cycles in shared programs: 410012108 -> 410012098 (<.01%) cycles in affected programs: 3540 -> 3530 (-0.28%) helped: 5 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28% 95% mean confidence interval for cycles value: -2.00 -2.00 95% mean confidence interval for cycles %-change: -0.28% -0.28% Cycles are helped. Ivy Bridge total instructions in shared programs: 11679387 -> 11679351 (<.01%) instructions in affected programs: 3292 -> 3256 (-1.09%) helped: 36 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.21% max: 2.50% x̄: 2.04% x̃: 2.50% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.34% -1.74% Instructions are helped. No changes on any other platforms. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>	2018-06-15 17:22:27 -07:00
Ian Romanick	e6a9bd97b9	i965/vec4: Don't register coalesce into source of VS_OPCODE_UNPACK_FLAGS_SIMD4X2 This prevents regressions in a bunch of clipping and interpolation tests caused by the next patch (i965/vec4: Optimize OR with 0 into a MOV). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>	2018-06-15 17:22:27 -07:00

... 16 17 18 19 20 ...

1355 commits