fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-22 04:10:40 +01:00

Author	SHA1	Message	Date
Dave Airlie	d98d6e6269	egl/dri3: don't crash on no context. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94925 Pointed out by Karol Herbst on irc. Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-30 11:30:04 +10:00
Dave Airlie	e2791b38b4	mesa/program_interface_query: fix transform feedback varyings. The spec says gl_NextBuffer and gl_SkipComponents need to be returned to userspace in the program interface queries. We currently throw those away. This requires a complete piglit run to make sure no drivers fall over due to the extra varyings. This fixes: GL45-CTS.program_interface_query.transform-feedback-built-in Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-05-30 11:26:50 +10:00
Dave Airlie	6effdce92e	glsl/ast: subroutineTypes can't be returned from functions. These types can't be returned. This fixes: GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types for the return type case. Reviewed-by: Chris Forbes <chrisforbes@google.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-05-30 11:25:30 +10:00
Timothy Arceri	db2a35193f	glsl: use has_double() helper Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2016-05-30 11:01:40 +10:00
Timothy Arceri	8f4ac20b6f	glsl: fix explicit uniform block alignment This stops the offset being bumped again when an explicit alignment has already been applied. Fixes alignment issues in: GL44-CTS.enhanced_layouts.uniform_block_alignment Note the test still fails due to unrelated issues with doubles. Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2016-05-30 11:01:32 +10:00
Jordan Justen	7398a32c50	i965: Shrink stage_prog_data param array length It appears we were over-allocating these arrays. Previously we would use nir->num_uniforms directly for scalar programs, and multiply it by 4 for vec4 programs. Instead we should have been dividing by 4 in both cases to convert from bytes to a gl_constant_value count. The size of gl_constant_value is 4 bytes. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-29 09:59:55 -07:00
Ilia Mirkin	160063b110	nv50,nvc0: fix the max_vertices=0 case This is apparently legal. Drop any emit/restarts, and pass a 1 to the hardware. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-05-29 09:34:03 -04:00
Ilia Mirkin	f2e7268a55	st/mesa: fix setting of point_size_per_vertex in ES contexts GL ES 2.0+ does not have a GL_PROGRAM_POINT_SIZE enable, unlike desktop GL. So we have to go and check the last pre-rasterizer stage to see whether it outputs a point size or not. This fixes a number of dEQP tests that use a geometry or tessellation shader to emit points primitives. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>	2016-05-29 09:34:03 -04:00
Marek Olšák	04a78068ff	mesa: skip level checking for FramebufferTexture*D if texture is zero From the OpenGL 4.5 core spec: "An INVALID_VALUE error is generated if texture is not zero and level is not a supported texture level for textarget, as described above." Other FramebufferTexture functions already do the right thing. This fixes the main menu in F1 2015. Cc: 11.1 11.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Dave Airlie <airlied@redhat.com>	2016-05-29 14:24:23 +02:00
Ilia Mirkin	60341ddd5c	st/mesa: expose OES_shader_io_blocks when we have enough for ES 3.1 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-05-28 20:58:12 -04:00
Vinson Lee	884ac61722	swr: [rasterizer] Do not define _mm256_storeu2_m128i with icc. Fix build error with icc. CXX libswrAVX_la-swr_clear.lo icpc: command line warning #10006: ignoring unknown option '-Wdelete-non-virtual-dtor' In file included from ./rasterizer/jitter/jit_api.h(31), from swr_context.h(30), from swr_clear.cpp(24): ./rasterizer/common/os.h(135): error: expected an identifier void _mm256_storeu2_m128i(__m128i hi, __m128i lo, __m256i a) ^ Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-05-28 14:26:54 -07:00
Thomas Hindoe Paaboel Andersen	df210ff24d	i965: add missing return in if statement Re-add the "return false" that was removed in `0c02d7002d` It seems that something went wrong when merging the patch. The patch sent to the mailing list does not directly match what was committed. https://lists.freedesktop.org/archives/mesa-dev/2016-May/118198.html Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-28 11:26:33 -07:00
Ilia Mirkin	c7731a0740	gk110/ir: fix unspilling of predicates from registers Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96258 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.2 11.1" <mesa-stable@lists.freedesktop.org>	2016-05-28 13:14:19 -04:00
Samuel Pitoiset	697237b71e	nvc0: remove outdated surfaces validation code for GK104 This code was used for validating surfaces with compute but now we use pipe_image_view instead. Anyway, surfaces support should be re-introduced properly once OpenCL happens. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-05-28 15:50:07 +02:00
Samuel Pitoiset	f07ade6881	nvc0: do not always invalidate 3D CBs when using compute Constant buffers are aliased between 3D and CP on Fermi, but we should only invalidate them when a compute shader actually uses CBs and not all the time after launching a grid. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-05-28 15:50:03 +02:00
Francisco Jerez	357495b94d	i965: Update compute workgroup size limit calculation for SIMD32. This should have the side effect of enabling the ARB_compute_shader extension on Gen8+ hardware and all Gen7 platforms that didn't previously expose it (VLV and IVB GT1) due to the number of hardware threads per subslice being insufficient in SIMD16 mode. v2: Bump workgroup size limit for GLES too (Jordan). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2016-05-27 23:29:06 -07:00
Francisco Jerez	46ce93ed22	i965: Add do32 debug option. The do32 INTEL_DEBUG option causes the back-end to try to generate a SIMD32 program when compiling a compute shader regardless of the specified compute shader workgroup size, which will be useful for testing SIMD32 code generation in the most common case in which the workgroup size doesn't exceed the SIMD16 limit so SIMD32 codegen wouldn't be automatically enabled. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:06 -07:00
Francisco Jerez	864737ce6c	i965/fs: Build 32-wide compute shader when needed. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:06 -07:00
Francisco Jerez	37fd13ee2d	i965/fs: Extend back-end interface for limiting the shader dispatch width. This replaces the current fs_visitor::no16() interface with fs_visitor::limit_dispatch_width(), which takes an additional parameter allowing the caller to specify the maximum dispatch width a shader can be compiled with. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:06 -07:00
Francisco Jerez	2d288cb9ea	i965/fs: Implement SIMD32 register allocation support. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:06 -07:00
Francisco Jerez	7f10d3983b	i965/fs: Remove pre-Gen7 register allocation class micro-optimization. This was trying to save some one-time init on pre-Gen7 hardware under the assumption that one would only ever need 1, 2, 4 and 8-wide registers on those platforms. However nothing guarantees that those will be the only VGRF sizes used after lowering and optimization. In some cases we may end up with a temporary of different size being allocated (e.g. by SIMD lowering to zip or unzip a multi-component register region of a logical send instruction), and there is no guarantee that they will be optimized away before register allocation (especially since the compute_to_mrf coalescing pass is rather... lacking...). Instead just allocate classes for all possible VGRF sizes up to MAX_VGRF_SIZE to avoid a crash in pq_test() when we encounter a variable of any other size. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:06 -07:00
Francisco Jerez	1d5bf46ad1	i965/fs: Don't mutate multi-component arguments in sampler payload set-up. The Gen5+ sampler message payload construction code steps through the coordinate and derivative components by induction like 'coordinate = offset(coordinate, bld, 1)', the problem is that while doing that it may step one past the end of the coordinate vector causing an assertion failure in offset() if it happens to be a (single component) immediate. Right now coordinates and derivatives are typically passed as actual registers but that will no longer be the case when we start propagating constants into logical messages. Instead express coordinate components in closed form like 'offset(coordinate, bld, i)' -- The end result seems slightly more readable that way and it allows passing the coordinate and derivative registers by const reference instead of by value, so it seems like a clean-up in its own right. v2: Fold a few post-increment operators into the last MOV statement. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:06 -07:00
Francisco Jerez	ad8f66ed33	i965/fs: Fix multiple ACP interference during copy propagation. This is more fallout from `cf375a3333`. It's possible for multiple ACP entries to interfere with a given VGRF write, so we need to continue iterating even if an overlapping entry has already been found. Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:06 -07:00
Francisco Jerez	c88b52745c	i965/fs: Fix cmod propagation not to propagate non-identity cmod into CMP(N). The conditional mod of these instructions determines the semantics of the comparison itself (rather than being evaluated based on the result of the instruction as is usually the case for most other instructions that allow conditional mods), so it's in general not legal to propagate a conditional mod into a CMP instruction. This prevents cmod propagation from (mis)optimizing: cmp.z.f0 tmp, ... mov.z.f0 null, tmp into: cmp.z.f0 tmp, ... which gives the negation of the flag result of the original sequence. I could reproduce this easily with SIMD32 but I don't see any reason why the problem would be SIMD32-specific, it was most likely working by luck. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:06 -07:00
Francisco Jerez	8476233ae2	i965/fs: Estimate number of registers written correctly in opt_register_renaming. The current estimate is incorrect for non-32b types. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:05 -07:00
Francisco Jerez	437e65f9d9	i965/fs: Add (sub)reg_offset asserts to brw_reg_from_fs_reg. These are completely ignored by the conversion to brw_reg, so they better be zero. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:05 -07:00
Francisco Jerez	51dd6a60f5	i965/fs: Reset reg_offset of the original destination to zero in compute_to_mrf(). Prevents an assertion failure in the following commit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:05 -07:00
Francisco Jerez	b9eab911ba	i965/fs: Skip remove_duplicate_mrf_writes() during SIMD32 runs. The pass is disabled in SIMD16 dispatch mode for the same reason, it cannot handle instructions that write multiple MRF registers at once. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:05 -07:00
Francisco Jerez	796238d9e6	i965/fs: Use SIMD8 SSBO GET_BUFFER_SIZE message regardless of the dispatch width. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:05 -07:00
Francisco Jerez	29e4717251	i965/fs: Don't emit duplicated SSBO GET_BUFFER_SIZE instruction unnecessarily. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:05 -07:00
Francisco Jerez	a55452530f	i965/fs: Emit fixed width memory fence opcode regardless of the dispatch width. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:05 -07:00
Francisco Jerez	ae730049c6	i965/fs: Return 32 bit mask from fs_builder::sample_mask(). This doesn't actually handle the FS case, just add an assertion for the moment so I don't forget to update it later on for SIMD32 fragment shader dispatch. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:05 -07:00
Francisco Jerez	8b6edee679	i965/fs: Emit fixed-width null register regardless of the dispatch width. brw_null_vec() cannot handle widths over 16 but it doesn't really matter what width we specify for null registers because destination regions have no width field at the hardware level. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:05 -07:00
Francisco Jerez	298320280f	i965/fs: Fix half() to handle more exotic register files. horiz_offset() is able to deal with a superset of the register files currently special-cased in half(). Just call horiz_offset() in all cases. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:05 -07:00
Francisco Jerez	8c9601ef7b	i965/fs: Fix horiz_offset() to handle ARF and HW GRF register files. We'll hit these in some cases during SIMD lowering in 32-wide programs. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:05 -07:00
Francisco Jerez	7d430fc05e	i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:05 -07:00
Francisco Jerez	ecd7a7255a	i965/fs: Keep track of flag dependencies with byte granularity during scheduling. This prevents false dependencies from being created between instructions that write disjoint 8-bit portions of the flag register and OTOH should make sure that the scheduler considers dependencies between instructions that write or read multiple flag subregisters at once (e.g. 32-wide predication or conditional mods). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:04 -07:00
Francisco Jerez	0fec265373	i965/fs: Track flag register liveness with byte granularity. This is required for correctness in presence of multiple 8-wide flag writes (e.g. 8-wide instructions with a conditional mod set) which update a different portion of the same 16-bit flag subregister. Right now we keep track of flag dataflow with 16-bit granularity and consider flag writes to have killed any previous definition of the same subregister even if the write was less than 16 channels wide, which can cause live flag register updates to be dead code-eliminated incorrectly. Additionally this makes sure that we handle 32-wide flag writes and reads which may span multiple flag subregisters so the current approach of just setting/testing a single bit from the live set wouldn't have worked. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:04 -07:00
Francisco Jerez	df1aec763e	i965/fs: Define methods to calculate the flag subset read or written by an fs_inst. v2: Codestyle fixes (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:04 -07:00
Francisco Jerez	ece41df247	i965/fs: Expose arbitrary channel execution groups to the IR. This generalizes the current fs_inst::force_sechalf flag to allow specifying channel enable groups other than 0 or 8. At some point it will likely make sense to fix the vec4 generator to support arbitrary execution groups and then move the definition of fs_inst::group into backend_instruction (e.g. so we can do FP64 in the VEC4 back-end). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:04 -07:00
Francisco Jerez	81bc6de8c0	i965/ir: Make BROADCAST emit an unmasked single-channel move. Alternatively we could have extended the current semantics to 32-wide mode by changing brw_broadcast() to emit multiple indexed MOV instructions in the generator copying the selected value to all destination registers, but it seemed rather silly to waste EU cycles unnecessarily copying the exact same value 32 times in the GRF. The vstride change in the Align16 path is required to avoid assertions in validate_reg() since the change causes the execution size of the MOV and SEL instructions to be equal to the source region width. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:04 -07:00
Francisco Jerez	41562eb8f3	i965/fs: Allow specifying arbitrary quarter control to FIND_LIVE_CHANNEL. This makes FIND_LIVE_CHANNEL behave like a normal instruction for non-zero quarter control. On Gen8+ we just leave the quarter control field of the emitted FBL instruction set to the default value so the hardware applies the expected shift to the execution mask signals. On Gen7 we apply the offset manually by specifying a non-zero subregister offset in the source region of the FBL instruction. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:04 -07:00
Francisco Jerez	a5a0810960	i965/fs: Allow specifying arbitrary execution sizes up to 32 to FIND_LIVE_CHANNEL. Due to a Gen7-specific hardware bug native 32-wide instructions get the lower 16 bits of the execution mask applied incorrectly to both halves of the instruction, so the MOV trick we currently use wouldn't work. Instead emit multiple 16-wide MOV instructions in 32-wide mode in order to cover the whole execution mask. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:04 -07:00
Francisco Jerez	1e3c58ffaf	i965/fs: Lower 32-wide scratch writes in the generator. The hardware has messages that can write 32 32bit components at once but the channel enable mask gets messed up. We need to split them into several 16-wide scratch writes for the channel enables to be applied correctly. The SIMD lowering pass cannot be used for this because scratch writes are emitted rather late during register allocation long after SIMD lowering has been done. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:29:02 -07:00
Francisco Jerez	a7d319c00b	i965/fs: Implement scratch reads and writes of 4 GRFs at a time. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:28:59 -07:00
Francisco Jerez	fe5cdde2f9	i965/eu: Fix Gen7+ DP scratch message size calculation on Gen7. Gen7 hardware expects the block size field in the message descriptor to be the number of registers minus one instead of the log2 of the number of registers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:28:59 -07:00
Francisco Jerez	fc7107de1d	i965/eu: Set execution size explicitly for memory fence send message. We don't want to emit a 32-wide send message in 32-wide programs. The memory fence message should have the same effect regardless of the execution size (as long as it's valid) so just set it to one. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:28:59 -07:00
Francisco Jerez	5c887326c5	i965/eu: Consider QtrCtrl 3Q-4Q in typed surface message descriptor setup. In SIMD32 programs the compiler is responsible for providing the appropriate half of the sample mask in the message header, so the first and third quarters both map to the first slot group of the provided 16-bit half, while the second and fourth quarters map to the second slot group -- IOW they should be equivalent to 1Q and 2Q modulo two. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:28:59 -07:00
Francisco Jerez	448340d31f	i965/fs: Clean up remaining uses of dispatch_width in the generator. Most of these are bugs because the intended execution size of an instruction and the dispatch width of the shader aren't necessarily the same (especially in SIMD32 programs). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:28:59 -07:00
Francisco Jerez	7f28ad8c4d	i965/eu: Remove brw_codegen::compressed and ::compressed_stack. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-27 23:28:59 -07:00


85652 commits