fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 18:00:13 +01:00

Author	SHA1	Message	Date
Francisco Jerez	008f95a043	intel/fs: Add virtual instruction to load mask of live channels into flag register. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Cc: 20.0 <mesa-stable@lists.freedesktop.org>	2020-02-14 14:31:48 -08:00
Jason Ekstrand	f93dfb509c	intel/fs: Write the address register with NoMask for MOV_INDIRECT This fixes a hang in the following Vulkan CTS test on TGL-LP: dEQP-VK.descriptor_indexing.storage_buffer_dynamic_in_loop Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3642> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3642>	2020-01-31 17:23:39 +00:00
Sagar Ghuge	a27542c5dd	intel/compiler: Clear accumulator register before EOT v2: (Francisco Jerez) - Drop vec4 changes. - Handle explicit acc0 operand and implicit one. - Make sure instruction is SIMD16, prediction is off and default mask control set to true. v3: (Francisco Jerez) - Clear accumulator only when it's written. - Use BRW_MASK_DISABLE instead of true. - Use correct width for brw_acc_reg(). - Fix last_inst_offset. v4: (Francisco Jerez) - Don't check for last instruction for accumulator write. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3376> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3376>	2020-01-27 19:48:11 +00:00
Matt Turner	49c21802cb	intel/compiler: Split has_64bit_types into float/int Gen7 has 64-bit floats but not 64-bit ints. Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>	2020-01-22 00:19:20 +00:00
Caio Marcelo de Oliveira Filho	18e72ee210	intel/fs: Add FS_OPCODE_SCHEDULING_FENCE Like a SHADER_OPCODE_MEMORY_FENCE but doesn't generate any assembly code. Will be used when the compiler shouldn't reorder certain instructions but there's no need to generate code for the HW to do it -- as the ordering will be guaranteed by other means. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>	2020-01-21 23:41:35 +00:00
Jason Ekstrand	53bfcdeecf	intel/fs: Implement the new load/store_scratch intrinsics This commit fills in a number of different pieces: 1. We add support to brw_nir_lower_mem_access_bit_sizes to handle the new intrinsics. This involves simple plumbing work as well as a tiny bit of extra logic to always scalarize scratch intrinsics 2. Add code to brw_fs_nir.cpp to turn nir_load/store_scratch intrinsics into byte/dword scattered read/write messages which use the A32 stateless model. 3. Add code to lower_surface_logical_send to handle dword scattered messages and the A32 stateless model. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-11-11 17:17:02 +00:00
Kenneth Graunke	f192741ddd	intel/compiler: Report the number of non-spill/fill SEND messages This can be useful to measure whether memory access optimizations are having the desired effect. For example, we might see a reduction in image loads/stores, or constant buffer loads. We can already see this in cycle estimates to some extent, but this is a more direct approach, minus a lot of the noise of random scheduler shuffling. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-10-17 20:44:00 -07:00
Francisco Jerez	ceb123befa	intel/fs/gen11+: Fix CS_OPCODE_CS_TERMINATE codegen. Apparently the ts_request_type and ts_resource_select thread spawner message descriptor bits were removed from the hardware at least since ICL. Drop them in order to avoid assertion failures on Gen12+ platforms which don't have any encoding for this. On Gen9+ these are probably just ignored by the hardware, so this is unlikely to have had any functional implications prior to Gen12. v2: Mark TS message fields as non-existing in brw_inst.h on ICL. (Caio) Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-10-11 12:24:16 -07:00
Francisco Jerez	a5efb0eae8	intel/fs/gen12: Fix barrier codegen. The WAIT instruction has been removed, but SYNC.bar can be used instead to wait for a notification on n0.0. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-11 12:24:16 -07:00
Francisco Jerez	e0b8d7953e	intel/fs/gen12: Add scheduling information to the IR. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-10-11 12:24:16 -07:00
Francisco Jerez	15e3a0d9d2	intel/eu/gen12: Set SWSB annotations in hand-crafted assembly. Reviewers are encouraged to audit the code generation pass independently for the case I missed some potential data hazard or new code has been added in the meantime. v2: Add SYNC instruction to cr0 workaround in brw_float_controls_mode(). v3: Drop likely redundant (and potentially harmful) RegDist SWSB annotation from ce0 read in brw_find_live_channel() (Caio). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-10-11 12:24:16 -07:00
Francisco Jerez	c22db5e188	intel/fs/gen12: Add codegen support for the SYNC instruction. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-11 12:24:16 -07:00
Francisco Jerez	a66ea33991	intel/eu/gen12: Don't set DD control, it's gone. A future lowering pass will simulate the same behavior originally provided by NoDDChk/NoDDClr at the IR level by using appropriate SWSB annotations. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-11 12:24:16 -07:00
Francisco Jerez	8a5fad0d92	intel/eu/gen12: Use SEND instruction for split sends. The new SEND instruction behaves like the former SENDS instruction. The original single-payload SEND instruction is gone. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-11 12:24:16 -07:00
Francisco Jerez	6634ede7aa	intel/eu/gen12: Codegen SEND descriptor regions correctly. The SEND instruction is now four-source. The descriptor is no longer part of source 1, so avoid touching it to avoid corruption while initializing the descriptor. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-11 12:24:16 -07:00
Jason Ekstrand	651725f7a1	intel/fs: Allow CLUSTER_BROADCAST to do type conversion We can't really handle it in the little-core 64-bit case but it's not really needed there. Where we really want this is for when we need to do 16 -> 8-bit conversions. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-20 18:02:15 +00:00
Paulo Zanoni	10532c6831	intel/fs: don't forget the stride at generate_shuffle During generate_shuffle(), when we use byte sized registers we end up with a destination stride of 2. We don't take the stride into consideration when selecting the group offset for the last MOV operation, which means we end up moving things to the wrong place, leaving the last few channels untouched. Take the destination stride in consideration so we don't miss the last channels. v2: Assert this is not necessary for the IVB special case (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-20 10:57:05 -07:00
Paulo Zanoni	8e614c7a29	intel/fs: fix SHADER_OPCODE_CLUSTER_BROADCAST for SIMD32 The current code can create functions with a width of 32, which is not supported by our hardware. Add some code to simplify how we express what we want and prevent such cases. For some unknown reason, all the tests I could run seem to work even with these unsupported MOVs. Fixes: `b0858c1cc6` "intel/fs: Add a couple of simple helper opcodes" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-19 02:48:27 +00:00
Paulo Zanoni	c99df52873	intel/fs: the maximum supported stride width is 16 There are cases where we try to generate registers with a stride of 32, while the hardware maximum is just 16. This happens, for example, when using 8 bit integers on SIMD32. This results in a crash because the variable 'width' has a value of 32: ../../src/intel/compiler/brw_reg.h:550: brw_reg brw_vecn_reg(unsigned int, brw_reg_file, unsigned int, unsigned int): Assertion `!"Invalid register width"' failed. This change prevents the crash and makes the tests pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-19 02:48:27 +00:00
Samuel Iglesias Gonsálvez	8a6507b6fe	i965/fs/generator: add new opcode to set float controls modes in control register Before this commit, we had only FPRoundingMode decoration (the per instruction one) that is applied during the SPIR-V handling. In vtn_alu we find out the rounding mode, and generate the code accordingly that later will be used to look for the respective nir_op_f2f16_{rtz,rtne}. Per-instruction gets prioritized because we make them explicit conversions (with RTZ or RTNE nir opcodes) and they will override the default execution mode defined with float controls. However, we need to come back to the mode defined by float controls after the execution of the FP Rounding instruction. Therefore, the new SHADER_OPCODE_FLOAT_CONTROL_MODE opcode will be used to set the default rounding mode and denorms treatment in the whole shader while the pre-existent SHADER_OPCODE_RND_MODE, will be used as prioritized rounding mode in a per-instruction basis. v2: - Fix bug in defining BRW_CR0_FP_MODE_MASK. v3: - Update comment (Caio). v4: - Split the patch into the helper and the new opcode (this one) (Caio). v5: - Add an explanation on the actual purpose and priority of the newly introduced opcode in the commit log (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez	28da9558f5	i965/fs/generator: refactor rounding mode helper in preparation for float controls v2: - Fix bug in defining BRW_CR0_FP_MODE_MASK. v3: - Update comment (Caio). v4: - Split the patch into the helper (this one) and the new opcode (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 23:39:19 +03:00
Jason Ekstrand	d15fe8ca82	Revert "intel/fs: Move the scalar-region conversion to the generator." This reverts commit `c0504569ea`. Now that we're doing interpolation lowering in NIR, we can continue to stride the FS input registers directly in the brw_fs_nir code like we did before. This fixes SIMD32 fragment shaders which broke because lower_simd_width depended on the 0 stride to split PLN instructions correctly. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-09-06 03:58:09 +00:00
Kenneth Graunke	86a63b1098	intel/compiler: Refactor FB write message control setup into a helper. This will be used by visitor code to convert directly to SEND in a bit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-27 14:20:07 -07:00
Jason Ekstrand	134607760a	intel/compiler: Fill a compiler statistics struct This commit is all annoying plumbing work which just adds support for a new brw_compile_stats struct. This struct provides a binary driver readable form of the same statistics we dump out to stderr when INTEL_DEBUG is set with a shader stage. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Danylo Piliaiev	04a9951580	intel/compiler: add ability to override shader's assembly When dumping shader's assembly with INTEL_DEBUG=vs,tcs,... sha1 of the resulting assembly is also printed, having environment variable INTEL_SHADER_ASM_READ_PATH present driver will try to load a "%sha1%.bin" file from the path and substitute current assembly with the one from the file. Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-05 17:19:09 +00:00
Jason Ekstrand	499d760c6e	intel/fs: Use ALIGN16 instructions for all derivatives on gen <= 7 The issue here was discovered by a set of Vulkan CTS tests: dEQP-VK.glsl.derivate..dynamic_ These tests use ballot ops to construct a branch condition that takes the same path for each 2x2 quad but may not be uniform across the whole subgroup. They then test that derivatives work and give the correct value even when executed inside such a branch. Because the derivative isn't executed in uniform control-flow and the values coming into the derivative aren't smooth (or worse, linear), they nicely catch bugs that aren't uncovered by simpler derivative tests. Unfortunately, these tests require Vulkan and the equivalent GL test would require the GL_ARB_shader_ballot extension which requires int64. Because the requirements for these tests are so high, it's not easy to test on older hardware and the bug is only proven to exist on gen7; gen4-6 are a conjecture. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-30 22:38:19 +00:00
Matt Turner	46a3ea06be	i965/fs: Print the scheduler mode. Line wrap some awfully long lines while we are here. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-07-30 14:35:43 -07:00
Matt Turner	dabb5d4bee	i965/fs: Add a shader_stats struct. It'll grow further, and we'd like to avoid adding an additional parameter to fs_generator() for each new piece of data. v2 (idr): Rebase on 17 months. Track a visitor instead of a cfg. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-30 14:35:43 -07:00
Caio Marcelo de Oliveira Filho	b390ff3517	intel/fs: Add support for SLM fence in Gen11 Gen11 SLM is not on L3 anymore, so now the hardware has two separate fences. Add a way to control which fence types to use. At this time, we don't have enough information in NIR to control the visibility of the memory being fenced, so for now be conservative and assume that fences will need a stall. With more information later we'll be able to reduce those. Fixes Vulkan CTS tests in ICL: dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.workgroup.guard_local.buffer.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.buffer.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.image.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.buffer.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.image.guard_nonlocal.workgroup.comp The whole set of supported tests in dEQP-VK.memory_model.* group should be passing in ICL now. v2: Pass BTI around instead of having an enum. (Jason) Emit two SHADER_OPCODE_MEMORY_FENCE instead of one that gets transformed into two. (Jason) List tests fixed. (Lionel) v3: For clarity, split the decision of which fences to emit from the emission code. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-11 08:29:32 -07:00
Jason Ekstrand	fa869f45c8	intel/fs: Use nir_lower_interpolation on gen11+ On gen11, they removed the PLN instruction so we have to emit a pile of MAD to emulate it. We may as well do that in NIR so we can optimize and later schedule it. Shader-db results on Ice Lake: total instructions in shared programs: 17145644 -> 16556440 (-3.44%) instructions in affected programs: 11507454 -> 10918250 (-5.12%) helped: 35763 HURT: 42085 helped stats (abs) min: 1 max: 140 x̄: 19.09 x̃: 18 helped stats (rel) min: 0.04% max: 37.93% x̄: 15.40% x̃: 14.49% HURT stats (abs) min: 1 max: 248 x̄: 2.22 x̃: 2 HURT stats (rel) min: 0.05% max: 50.00% x̄: 5.00% x̃: 2.47% 95% mean confidence interval for instructions value: -7.67 -7.47 95% mean confidence interval for instructions %-change: -4.46% -4.29% Instructions are helped. total loops in shared programs: 4370 -> 4370 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 360624645 -> 368220857 (2.11%) cycles in affected programs: 269631244 -> 277227456 (2.82%) helped: 15583 HURT: 65874 helped stats (abs) min: 1 max: 28561 x̄: 78.45 x̃: 32 helped stats (rel) min: <.01% max: 67.81% x̄: 5.38% x̃: 2.44% HURT stats (abs) min: 1 max: 238638 x̄: 133.87 x̃: 20 HURT stats (rel) min: <.01% max: 306.25% x̄: 5.81% x̃: 3.97% 95% mean confidence interval for cycles value: 67.42 119.09 95% mean confidence interval for cycles %-change: 3.61% 3.73% Cycles are HURT. total spills in shared programs: 8943 -> 8981 (0.42%) spills in affected programs: 1925 -> 1963 (1.97%) helped: 44 HURT: 14 total fills in shared programs: 21815 -> 21925 (0.50%) fills in affected programs: 3511 -> 3621 (3.13%) helped: 41 HURT: 18 LOST: 70 GAINED: 14 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-02 16:15:25 +00:00
Sagar Ghuge	83fdec0f0d	intel/compiler: Enable the emission of ROR/ROL instructions v2: 1) Drop changes for vec4 backend as on Gen11+ we don't support align16 mode (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Lionel Landwerlin	836225840c	intel/compiler: fix derivative on y axis implementation This rewrites the ddy in EXECUTE_4 mode with a loop to make it more obvious what is going on and also sets the group each of the 4 threads in the groups are supposed to execute. Fixes the following CTS tests : dEQP-VK.glsl.derivate.dfdyfine.dynamic_* Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Co-Authored-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `2134ea3800` ("intel/compiler/fs: Implement ddy without using align16 for Gen11+")	2019-06-27 18:14:58 +00:00
Jason Ekstrand	f4ef34f207	intel/fs: Add an UNDEF instruction to avoid excess live ranges With 8 and 16-bit types and anything where we have to use non-trivial register strides to deal with restrictions, we end up with things that look like partial writes even though we don't care about any values in the register except those written by that instruction. This is particularly important when dealing with loops because liveness sees is_partial_write and the fact that an old version from a previous loop iteration may be valid at that point and extends all purely partially written values to the entire loop. This commit adds a new UNDEF instruction which does nothing (the generator doesn't emit anything) but which does a fake write to the register. This informs liveness that we don't care about any values before that point so it won't consider those registers to be falsely live. We can safely emit UNDEF instructions for all SSA values that come in from NIR and nearly all temporaries generated by various stages of the compiler. In particular, we need to insert UNDEF instructions when we handle region restrictions because the newly allocated registers are almost guaranteed to be partially written. No shader-db changes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110432 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-04 14:27:30 -05:00
Jason Ekstrand	9e403dc56e	intel/fs: Do a stalling MFENCE in endInvocationInterlock() Fixes: `939312702e` "i965: Add ARB_fragment_shader_interlock support" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-30 14:00:26 +00:00
Jason Ekstrand	859de4a748	intel/fs,vec4: Use g0 as the header for MFENCE We set header_present but then pass it some random garbage. Give it g0 instead. I'm not actually sure this does anything but g0 is the usual header data and this is what the windows driver does so it seems like a good idea. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-30 14:00:26 +00:00
Matt Turner	e8c74a1e16	intel/compiler: Unset flag reg when FB write is not predicated In the FS IR we pretend that the instruction is predicated with (+f0.1) just for flag dependency tracking purposes. Since the instruction doesn't support predication before Haswell, we unset the predicate so we should also unset the flag register so that we can round-trip the disassembly. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-07 14:33:48 -07:00
Rafael Antognolli	70e03e220c	intel/fs: Remove fs_generator::generate_linterp from gen11+. We now have a lowering pass that will do this at the fs_visitor level, so we can remove this code from gen11+. v2: Reduce size of the "i" array from 4 to 2 (Matt). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-22 16:54:00 -07:00
Rafael Antognolli	c0504569ea	intel/fs: Move the scalar-region conversion to the generator. Move the scalar-region conversion from the IR to the generator, so it doesn't affect the Gen11 path. We need the non-scalar regioning for a later lowering pass that we are adding. v2: Better commit message (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-22 16:54:00 -07:00
Iago Toral Quiroga	aaae24179f	intel/compiler: fix ddy for half-float in Broadwell Broadwell has restrictions that apply to Align16 half-float that make the Align16 implementation of this invalid for this platform. Use the gen11 path for this instead, which uses Align1 mode. The restriction is not present in cherryview, gen9 or gen10, where the Align16 implementation seems to work just fine. v2: - Rework the comment in the code, move the PRM citation from the commit message to the comment in the code (Matt) - Cherryview isn't affected, only Broadwell (Matt) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	60c7c6d3ba	intel/compiler: fix ddx and ddy for 16-bit float We were assuming 32-bit elements. Also, in SIMD8 we pack 2 vector components in a single SIMD register, so for example, component Y of a 16-bit vec2 starts at byte offset 16B. This means that when we compute the offset of the elements to be differentiated we should not stomp whatever base offset we have, but instead add to it. v2 - Use byte_offset() helper (Jason) - Merge the fix for SIMD8: using byte_offset() fixes that too. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Plamena Manolova	19ab082001	i965: Disable ARB_fragment_shader_interlock for platforms prior to GEN9 ARB_fragment_shader_interlock depends on memory fences to ensure fragment ordering and this ordering guarantee is only supported from GEN9 onwards. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109980 Fixes: `939312702e` "i965: Add ARB_fragment_shader_interlock support." Signed-off-by: Plamena Manolova <plamena.n.manolova@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-14 13:04:12 +00:00
Jason Ekstrand	c4fb6b0c81	intel/eu: Add an EOT parameter to send_indirect_[split]_message For split indirect sends we have to put the EOT parameter in the extended descriptor as well as the instruction itself so just calling brw_inst_set_eot is insufficient. Moving the EOT handling into the send_indirect_[split]_message helper lets us handle it properly.	2019-02-25 11:35:12 -06:00
Francisco Jerez	e03be78252	intel/fs: Implement extended strides greater than 4 for IR source regions. Strides up to 32B can be implemented for the source regions of most instructions by leveraging either the vertical or the horizontal stride of the hardware Align1 region. The main motivation for this is that currently the lower_integer_multiplication() pass will happily double the stride of one of the 32-bit sources, which can blow up if the stride of the original source was already the maximum value allowed by the hardware. An alternative would be to use the regioning legalization pass in order to lower such strides into the composition of multiple legal strides, but that would be somewhat less efficient. This showed up as a regression from my commit `cbea91eb57` in Vulkan 1.1 CTS tests on CHV/BXT platforms, however it was really a pre-existing problem that had affected conformance on other platforms without native support for integer multiplication. CHV/BXT were getting around it because the code I removed in that commit had the "fortunate" side effect of emitting narrower regions that didn't hit the hardware stride limit after lowering. Beyond fixing the regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on ICL (that's why this patch is marked for inclusion in mesa-stable even though the original regressing patch was not). According to Jason, a nearly equivalent change had been committed previously as `e8c9e65185` and then (mistakenly?) reverted as `a31d038208`. Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328 Reported-by: Mark Janes <mark.a.janes@intel.com> Tested-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-21 14:07:25 -08:00
Jason Ekstrand	5064464931	intel/fs: Silence a compiler warning Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-14 16:04:47 -06:00
Jason Ekstrand	eab1c55590	intel/fs: Support SENDS in SHADER_OPCODE_SEND Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	b284d222db	intel/fs: Use SHADER_OPCODE_SEND for varying UBO pulls on gen7+ Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	8514eba693	intel/fs: Use SHADER_OPCODE_SEND for texturing on gen7+ Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	d2d3e04501	intel/fs: Use SHADER_OPCODE_SEND for surface messages Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	7f1cf046cd	intel/fs: Add a generic SEND opcode Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Kenneth Graunke	04c2f12ab2	i965: Drop mark_surface_used mechanism. The original idea was that the backend compiler could eliminate surfaces, so we would have it mark which ones are actually used, then shrink the binding table accordingly. Unfortunately, it's a pretty blunt mechanism - it can only prune things from the end, not the middle - since we decide the layout before we even start the backend compiler, and only limit the size. It also basically gives up if it sees indirect array access. Besides, we do the vast majority of our surface elimination in NIR anyway, not the backend - and I don't see that trend changing any time soon. Vulkan abandoned this plan a long time ago, and I don't use it in Iris, but it's still been kicking around in i965. I hacked shader-db to print the binding table size in bytes, and observed no changes with this patch. So, this code appears to do nothing useful. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-13 09:35:32 -08:00

120 commits