fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 00:48:07 +02:00

Author	SHA1	Message	Date
Kenneth Graunke	447afbf31b	i965: Don't grow batch/state buffer on every emit after an overflow. Once we reach the intended size of the buffer (BATCH_SZ or STATE_SZ), we try and flush. If we're not allowed to flush, we resort to growing the buffer so that there's space for the data we need to emit. We accidentally got the threshold wrong. The first non-wrappable call beyond (e.g.) STATE_SZ would grow the buffer to floor(1.5 * STATE_SZ), The next call would see we were beyond STATE_SZ and think we needed to grow a second time - when the buffer was already large enough. We still want to flush when we hit STATE_SZ, but for growing, we should use the actual size of the buffer as the threshold. This way, we only grow when actually necessary. v2: Simplify the control flow (suggested by Jordan) Fixes: `2dfc119f22` "i965: Grow the batch/state buffers if we need space and can't flush." Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (cherry picked from commit `ca43616586`)	2017-12-01 17:09:03 +00:00
Kenneth Graunke	09f6bd5ef2	i965: Preserve EXEC_OBJECT_CAPTURE when growing the BO. The original state buffer was marked with EXEC_OBJECT_CAPTURE. When growing it, we want to preserve that flag so we continue to capture it in GPU hang reports. Fixes: `2dfc119f22` "i965: Grow the batch/state buffers if we need space and can't flush." Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (cherry picked from commit `52d32917e1`)	2017-12-01 17:08:55 +00:00
Kenneth Graunke	a49b70d2ec	i965: Use old_bo->align when growing batch/state buffer instead of 4096. The intention here is make the new BO use the same alignment as the old BO. This isn't strictly necessary, but we would have to update the 'alignment' field in the validation list when swapping it out, and we don't bother today. The batch and state buffers use an alignment of 4096, so this should be equivalent - it's just clearer than cut and pasting a magic constant. Fixes: `2dfc119f22` "i965: Grow the batch/state buffers if we need space and can't flush." Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (cherry picked from commit `2af7085460`)	2017-12-01 17:08:50 +00:00
Kenneth Graunke	f1050f0435	i965: Program the dynamic state heap size to MAX_STATE_SIZE. STATE_BASE_ADDRESS specifies a maximum size of the dynamic state section, beyond which data supposedly reads back as 0. On Gen8+, we were programming it to the size of the buffer. This worked fine until we started growing the state buffer in commit `2dfc119f22`. When the state buffer grows, the value in STATE_BASE_ADDRESS becomes too small, and our state beyond STATE_SZ bytes would read back as 0. To avoid having to update the value, we program it to MAX_STATE_SIZE. We used to program the upper bound to the maximum on older hardware anyway, so programming it too large isn't a big deal. Bogus SURFACE_STATE can easily lead to GPU hangs and misrendering. DiRT Rally was hitting the statebuffer growth path, and suffered from bad texture corruption and GPU hangs (usually around the same time). This patch fixes both issues. Fixes: `2dfc119f22` "i965: Grow the batch/state buffers if we need space and can't flush." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103101 Tested-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (cherry picked from commit `cfc5af588c`)	2017-12-01 17:08:03 +00:00
Marek Olšák	14e528b2db	radeonsi/gfx9: fix importing shared textures with DCC VI has 11 dwords at least. GFX9 has 10 dwords. Cc: 17.2 17.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit `ed4780383c`) [Emil Velikov: s\|radeon/r600_texture.c\|radeonsi/si_state.c\|] Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Conflicts: src/gallium/drivers/radeon/r600_texture.c	2017-12-01 17:07:20 +00:00
Frank Richter	c846d72523	gallium/wgl: fix default pixel format issue When creating a context without SetPixelFormat() don't blindly take the pixel format reported by GDI. Instead, look for our own closest pixel format. Minor clean-ups added by Brian Paul. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103412 Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Brian Paul <brianp@vmware.com> (cherry picked from commit `bf41b2b262`)	2017-11-29 19:46:17 +00:00
Roland Scheidegger	56993f4b8a	r600: set DX10_CLAMP for compute shader too I really intended to set this for all shader stages by `3835009796` but missed it for compute shaders (because it's in a different source file...). Reviewed-by: Dave Airlie <airlied@redhat.com> (cherry picked from commit `71e630753e`)	2017-11-29 19:45:15 +00:00
Roland Scheidegger	9b2c27a39e	r600: use DX10_CLAMP bit in shader setup The docs are not very concise in what this really does, however both Alex Deucher and Nicolai Hähnle suggested this only really affects instructions using the CLAMP output modifier, and I've confirmed that with the newly changed piglit isinf_and_isnan test. So, with this bit set, if an instruction has the CLAMP modifier bit (which clamps to [0,1]) set, then NaNs will be converted to zero, otherwise the result will be NaN. D3D10 would require this, glsl doesn't have modifiers (with mesa clamp(x,0,1) would get converted to such a modifier) coupled with a whatever-floats-your-boat specified NaN behavior, but the clamp behavior should probably always be used (this also matches what a decomposition into min(1.0, max(x, 0.0)) would do, if min/max also adhere to the ieee spec of picking the non-nan result). Some apps may in fact rely on this, as this prevents misrenderings in This War of Mine since using ieee muls (`ce7a045fee`), without having to use clamped rcp opcode, which would also fix this bug there. radeonsi also seems to set this bit nowadays if I see that righ (albeit the llvm amdgpu code comment now says "Make clamp modifier on NaN input returns 0" instead of "Do not clamp NAN to 0" since it was changed, which also looks a bit misleading). v2: set it in all shader stages. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103544 Reviewed-by: Dave Airlie <airlied@redhat.com> (cherry picked from commit `3835009796`)	2017-11-29 19:45:12 +00:00
Roland Scheidegger	6954eb1a2a	r600: use min_dx10/max_dx10 instead of min/max I believe this is the safe thing to do, especially ever since the driver actually generates NaNs for muls too. The ISA docs are not very helpful here, however the dx10 versions will pick a non-nan result over a NaN one (this is also the ieee754 behavior), whereas the non-dx10 ones will pick the NaN (verified by newly changed piglit isinf-and-isnan test). Other "modern" drivers will most likely do the same. This was shown to make some difference for bug 103544, albeit it is not required to fix it. Reviewed-by: Dave Airlie <airlied@redhat.com> (cherry picked from commit `aab0bfc648`)	2017-11-29 19:44:58 +00:00
Nicolai Hähnle	b79e15b086	glsl: fix interpolateAtXxx(some_vec[idx], ...) with dynamic idx The dynamic index of a vector (not array!) is lowered to a sequence of conditional assignments. However, the interpolate_at_* expressions require that the interpolant is an l-value of a shader input. So instead of doing conditional assignments of parts of the shader input and then interpolating that (which is nonsensical), we interpolate the entire shader input and then do conditional assignments of the interpolated result. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (cherry picked from commit `ca63a5ed3e`)	2017-11-29 19:42:26 +00:00
Nicolai Hähnle	77cba992c3	glsl: allow any l-value of an input variable as interpolant in interpolateAt* The intended rule has been clarified in GLSL 4.60, Section 8.13.2 (Interpolation Functions): "For all of the interpolation functions, interpolant must be an l-value from an in declaration; this can include a variable, a block or structure member, an array element, or some combination of these. Component selection operators (e.g., .xy) may be used when specifying interpolant." For members of interface blocks, var->data.must_be_shader_input must be determined on-the-fly after lowering interface blocks, since we don't want to disable varying packing for an entire block just because one input in it is used in interpolateAt. v2: keep setting must_be_shader_input in ast_function (Ian) v3: follow the relaxed rule of GLSL 4.60 v4: only apply the relaxed rules to desktop GL (the ES WG decided that the relaxed rules may apply in a future version but not retroactively; see also dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101378 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (cherry picked from commit `4f42450b86`)	2017-11-29 19:42:24 +00:00
Kenneth Graunke	88fd81d3a3	i965: Fix Smooth Point Enables. We want to program the 3DSTATE_RASTER field to the gl_context value, not the other way around. Fixes: `13ac46557a` (i965: Port Gen8+ 3DSTATE_RASTER state to genxml.) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (cherry picked from commit `760e0156df`)	2017-11-29 19:37:02 +00:00
Nicolai Hähnle	f768744970	st_glsl_to_tgsi: check for the tail sentinel in merge_two_dsts This fixes yet another case where DFRACEXP has only one destination. Found by address sanitizer. Fixes tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-frexp-dvec4-only-mantissa.shader_test Fixes: `3b666aa747` ("st/glsl_to_tgsi: fix DFRACEXP with only one destination") Acked-by: Marek Olšák <marek.olsak@amd.com> (cherry picked from commit `7e35bdad1c`)	2017-11-29 19:36:24 +00:00
Marek Olšák	1e908f5035	radeonsi: fix layered DCC fast clear Cc: 17.2 17.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit `6863651bbd`)	2017-11-29 19:35:16 +00:00
Dave Airlie	9777d08e57	r600/sb: handle jump after target to end of program. (v2) This fixes hangs on cayman with tests/spec/arb_tessellation_shader/execution/trivial-tess-gs_no-gs-inputs.shader_test This has a single if/else in it, and when this peephole activated, it would set the jump target to NULL if there was no instruction after the final POP. This adds a NOP if we get a jump in this case, and seems to fix the hangs, so we have a valid target for the ELSE instruction to go to, instead of 0 (which causes infinite loops). v2: update last_cf correctly. (I had some other patches hide this) Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com> (cherry picked from commit `579ec9c311`)	2017-11-29 19:34:54 +00:00
Tapani Pälli	a34ad6f363	mesa/gles: adjust internal format in glTexSubImage2D error checks When floating point textures are created on OpenGL ES 2.0, driver is free to choose used internal format. Mesa makes this decision in adjust_for_oes_float_texture. Error checking for glTexImage2D properly checks that sized formats are not used. We use same error checking path for glTexSubImage2D (since there is lot of overlap), however since those checks include internalFormat checks, we need to pass original internalFormat passed by the client. Patch adds oes_float_internal_format that does reverse adjust_for_oes_float_texture to get that format. Fixes following test failure: ES2-CTS.gtf.GL2ExtensionTests.texture_float.texture_float (when running test with MESA_GLES_VERSION_OVERRIDE=2.0) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103227 Cc: "17.3" <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com> (cherry picked from commit `1e508e10d9`)	2017-11-29 19:33:45 +00:00
Emil Velikov	4bbc0f366a	gl_table.py: add extern C guard for the generated glapitable.h The header can be included from C++, hence contents should have appropriate notation. Cc: mesa-stable@lists.freedesktop.org Cc: Dylan Baker <dylan@pnwbakers.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (cherry picked from commit `c7616ac069`)	2017-11-27 19:26:16 +00:00
Eduardo Lima Mitev	86b35a9901	glsl/linker: Check that re-declared, inter-shader built-in blocks match >From GLSL 4.5 spec, section "7.1 Built-In Language Variables", page 130 of the PDF states: "If multiple shaders using members of a built-in block belonging to the same interface are linked together in the same program, they must all redeclare the built-in block in the same way, as described in section 4.3.9 “Interface Blocks” for interface-block matching, or a link-time error will result." Fixes: * GL45-CTS.CommonBugs.CommonBug_PerVertexValidation v2 (Neil Roberts): Explicitly look for gl_PerVertex in the symbol tables instead of waiting to find a variable in the interface. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102677 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Eduardo Lima Mitev <elima@igalia.com> Signed-off-by: Neil Roberts <nroberts@igalia.com> (cherry picked from commit `f9de7f5596`)	2017-11-27 19:21:15 +00:00
Eduardo Lima Mitev	f34c7ba4e1	glsl: Use the utility function to copy symbols between symbol tables This effectively factorizes a couple of similar routines. v2 (Neil Roberts): Non-trivial rebase on master Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Eduardo Lima Mitev <elima@igalia.com> Signed-off-by: Neil Roberts <nroberts@igalia.com> (cherry picked from commit `f5fe99ac85`)	2017-11-27 19:21:04 +00:00
Eduardo Lima Mitev	ebb7ccb306	glsl_parser_extra: Add utility to copy symbols between symbol tables Some symbols gathered in the symbols table during parsing are needed later for the compile and link stages, so they are moved along the process. Currently, only functions and non-temporary variables are copied between symbol tables. However, the built-in gl_PerVertex interface blocks are also needed during the linking stage (the last step), to match re-declared blocks of inter-stage shaders. This patch adds a new utility function that will factorize current code that copies functions and variables between two symbol tables, and in addition will copy explicitly declared gl_PerVertex blocks too. The function will be used in a subsequent patch. v2 (Neil Roberts): Allow the src symbol table to be NULL and explicitly copy the gl_PerVertex symbols in case they are not referenced in the exec_list. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Eduardo Lima Mitev <elima@igalia.com> Signed-off-by: Neil Roberts <nroberts@igalia.com> (cherry picked from commit `4c62a270a9`)	2017-11-27 19:20:47 +00:00
Matt Turner	e4d964670a	util: Fix disk_cache index calculation on big endian The cache-test test program attempts to create a collision (using key_a and key_a_collide) by making the first two bytes identical. The idea is fine -- the shader cache wants to use the first four characters of a SHA1 hex digest as the index. The following program unsigned char array[4] = {1, 2, 3, 4}; int ptr = (int )array; for (int i = 0; i < 4; i++) { printf("%02x", array[i]); } printf("\n"); printf("%08x\n", *ptr); prints 01020304 04030201 on little endian, and 01020304 01020304 on big endian. On big endian platforms reading the character array back as an int (as is done in disk_cache.c) does not yield the same results as reading the byte array. To get the first four characters of the SHA1 hex digest when we mask with CACHE_INDEX_KEY_MASK, we need to byte swap the int on big endian platforms. Bugzilla: https://bugs.freedesktop.org/103668 Bugzilla: https://bugs.gentoo.org/637060 Bugzilla: https://bugs.gentoo.org/636326 Fixes: `87ab26b2ab` ("glsl: Add initial functions to implement an on-disk cache") Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `c690a7a8cd`)	2017-11-27 18:33:40 +00:00
Matt Turner	bb8431aa3e	util: Fix SHA1 implementation on big endian The code defines a macro blk0(i) based on the preprocessor condition BYTE_ORDER == LITTLE_ENDIAN. If true, blk0(i) is defined as a byte swap operation. Unfortunately, if the preprocessor macros used in the test are no defined, then the comparison becomes 0 == 0 and it evaluates as true. Fixes: `d1efa09d34` ("util: import sha1 implementation from OpenBSD") Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `532674303a`)	2017-11-27 18:33:27 +00:00
Matt Turner	a05879c982	i965/fs: Handle negating immediates on MADs when propagating saturates MADs don't take immediate sources, but we allow them in the IR since it simplifies a lot of things. I neglected to consider that case. Fixes: `4009a9ead4` ("i965/fs: Allow saturate propagation to propagate negations into MADs.") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103616 Reported-and-Tested-by: Ruslan Kabatsayev <b7.10110111@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (cherry picked from commit `a05af1f7b8`)	2017-11-24 18:48:33 +00:00
Nicolai Hähnle	3e639156b8	ddebug: fix use-after-free of streamout targets Fixes: `b47727a83a` ("ddebug: implement pipelined hang detection mode") Reviewed-by: Marek Olšák <marek.olsak@amd.com> (cherry picked from commit `16f8da2997`)	2017-11-24 18:44:48 +00:00
Nicolai Hähnle	e7904e1275	radeonsi/gfx9: fix VM fault with fetched instance divisors We need to account for SGPR locations in merged shaders. This case is exercised by KHR-GL45.enhanced_layouts.vertex_attrib_locations Fixes: `79c2e7388c` ("radeonsi/gfx9: use SPI_SHADER_USER_DATA_COMMON") Reviewed-by: Marek Olšák <marek.olsak@amd.com> (cherry picked from commit `df5ebe0c26`)	2017-11-24 18:44:31 +00:00
George Barrett	210bbf948e	glsl: Catch subscripted calls to undeclared subroutines generate_array_index fails to check whether the target of a subroutine call exists in the AST, potentially passing around null ir_rvalue pointers eventuating in abort/segfault. Fixes: `fd01840c0b` ("glsl: add AoA support to subroutines") Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100438 (cherry picked from commit `f09c2cefdd`)	2017-11-24 18:43:33 +00:00
Gert Wollny	9ffe450dab	r600: Emit EOP for more CF instruction types So far on pre-cayman chipsets the CF instructions CF_OP_LOOP_END, CF_OP_CALL_FS, CF_OP_POP, and CF_OP_GDS an extra CF_NOP instruction was added to add the EOP flag, even though this is not actually needed, because all these instrutions support the EOP flag. This patch removes the fixup code, adds setting the EOP flag for the according instructions as well as others like CF_OP_TEX and CF_OP_VTX, and adds writing out EOP for this type of instruction in the disassembler. This also fixes a bug where shaders were created that didn't actually have the EOP flag set in the last CF instruction, which might have resulted in GPU lockups. [airlied: cleaned up a little] Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com> (cherry picked from commit `1d076aafbc`)	2017-11-24 18:42:10 +00:00
Jason Ekstrand	2859a8f298	i965: Mark BOs as external when we export their handle Almost all of our BO export paths were already properly marked the BO as external and added it to the handle table. Most export use-cases go through a prime fd or flink where we have a brw_bo export helper that does the right thing. The one missing one happens when you call queryImage and ask for __DRI_IMAGE_ATTRIB_HANDLE. We just grabbed the gem handle out of the BO (because it's really easy to do that) and handed it off to the client; what could go wrong? As it turns out, this path is used by basically every compositor that wants to turn around and call drmModeAddFB2 on it so it can hand it off to display. The result, as of `4b1e70cc57`, is that we no longer set MOCS_PTE on those surfaces and the kernel's attempts to disable caching fail and we scanout gets corruption. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103759 Fixes: `4b1e70cc57` Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit `0a6a137eb2`)	2017-11-24 18:40:14 +00:00
Jason Ekstrand	0904becf94	i965/bufmgr: Add a helper to mark a BO as external Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit `344252a27f`)	2017-11-24 18:39:20 +00:00
Kenneth Graunke	7bc213a644	i965: Revert Gen8 aspect of VF PIPE_CONTROL workaround. This apparently causes hangs on Broadwell, so let's back it out for now. I think there are other PIPE_CONTROL workarounds that we're missing. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103787 (cherry picked from commit `a01ba366e0`)	2017-11-18 00:42:15 +00:00
Jason Ekstrand	093ae29b3c	anv/cmd_buffer: Take bo_offset into account in fast clear state addresses Otherwise, if the image is not bound to the start of the buffer, we're going to be reading and writing its fast clear state in the wrong spot. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit `a07f7b2619`)	2017-11-17 22:52:52 +00:00
Jason Ekstrand	d2d5439412	anv/cmd_buffer: Advance the address when initializing clear colors Found by inspection Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit `a6cc361e5f`)	2017-11-17 22:52:52 +00:00
Anuj Phogat	b3bc46f1c7	i965/gen8+: Fix the number of dwords programmed in MI_FLUSH_DW Number of dwords in MI_FLUSH_DW changed from 4 to 5 in gen8+. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Cc: <mesa-stable@lists.freedesktop.org> (cherry picked from commit `1dc45d75bb`) [Emil Velikov: trivial conflicts] Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Conflicts: src/mesa/drivers/dri/i965/intel_blit.c	2017-11-17 22:52:52 +00:00
Anuj Phogat	bf0c7200bd	i965: Program DWord Length in MI_FLUSH_DW Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Cc: <mesa-stable@lists.freedesktop.org> (cherry picked from commit `6165fda59b`) Squashed with: i965: Remove DWord length from MI_FLUSH_DW definition Fixes: `6165fda59b` ("i965: Program DWord Length in MI_FLUSH_DW") Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (cherry picked from commit `822fd2341d`)	2017-11-17 22:52:40 +00:00
Matt Turner	55c4921326	Revert "intel/fs: Use a pure vertical stride for large register strides" This reverts commit `e8c9e65185`. With the actual bug fixed (by commit `6ac2d16901`), this is not necessary. I'm doubtful of its correctness in any case. (cherry picked from commit `a31d038208`)	2017-11-17 19:24:29 +00:00
Matt Turner	78a7e2a2d4	i965/fs: Split all 32->64-bit MOVs on CHV, BXT, GLK Fixes the following tests on CHV, BXT, and GLK: KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115 (cherry picked from commit `cfcfa0b9cd`)	2017-11-17 19:24:29 +00:00
Matt Turner	3be7bb6741	i965/fs: Fix extract_i8/u8 to a 64-bit destination The MOV instruction can extract bytes to words/double words, and words/double words to quadwords, but not byte to quadwords. For unsigned byte to quadword, we can read them as words and AND off the high byte and extract to quadword in one instruction. For signed bytes, we need to first sign extend to word and the sign extend that word to a quadword. Fixes the following test on CHV, BXT, and GLK: KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (cherry picked from commit `6ac2d16901`)	2017-11-17 19:24:29 +00:00
Nicolai Hähnle	f539ea0e8b	tgsi/exec: fix LDEXP in softpipe Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103128 Fixes: `cad959d901` ("gallium: add LDEXP TGSI instruction and corresponding cap") Reviewed-by: Brian Paul <brianp@vmware.com> (cherry picked from commit `f3fa3b0d95`)	2017-11-17 19:24:29 +00:00
Derek Foreman	e4f186d3ae	egl/wayland: Add a fallback when fourcc query isn't supported When queryImage doesn't support __DRI_IMAGE_ATTRIB_FOURCC wayland clients will die with a NULL derefence in wl_proxy_add_listener. Attempt to provide a simple fallback to keep ancient systems working. Fixes: `6595c69951` ("egl/wayland: Remove more surface specifics from create_wl_buffer") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103519 Signed-off-by: Derek Foreman <derekf@osg.samsung.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (cherry picked from commit `0db36caa19`) Squashed with: egl: fix var type queryImage() takes an `int*`; compiler is warning about the signed<->unsigned pointer mismatch. Fixes: `0db36caa19` "egl/wayland: Add a fallback when fourcc query isn't supported" Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Frank Binns <frank.binns@imgtec.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Derek Foreman <derekf@osg.samsung.com> (cherry picked from commit `ca95d7ad4e`)	2017-11-17 19:24:29 +00:00
Bas Nieuwenhuizen	8269b7ec4b	radv: Free temporary syncobj after waiting on it. Otherwise we leak it. Fixes: `eaa56eab6d` "radv: initial support for shared semaphores (v2)" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (cherry picked from commit `7c25578863`)	2017-11-17 19:24:29 +00:00
Bas Nieuwenhuizen	577af89bd1	radv: Free syncobj with multiple imports. Otherwise we can leak the old syncobj. Fixes: `eaa56eab6d` "radv: initial support for shared semaphores (v2)" Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (cherry picked from commit `917d3b43f2`)	2017-11-17 19:24:29 +00:00
Thomas Hellstrom	040c0df11d	loader/dri3: Improve dri3 thread-safety It turned out that with recent changes that call into dri3 from glFinish(), it appears like different thread end up waiting for X events simultaneously, causing deadlocks since they steal events from eachoter and update the dri3 counters behind eachothers backs. This patch intends to improve on that. It allows at most one thread at a time to wait on events for a single drawable. If another thread intends to do the same, it's put to sleep until the first thread finishes waiting, and then it rechecks counters and optionally retries the waiting. Threads that poll for X events never pulls X events off the event queue if there are other threads waiting for events on that drawable. Counters in the dri3 drawable structure are protected by a mutex. Finally, the mutex we introduce is never held while waiting for the X server to avoid unnecessary stalls. This does not make dri3 drawables completely thread-safe but at least it's a first step. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102358 Fixes: `d5ba75f888` "st/dri2 Plumb the flush_swapbuffer functionality through to dri3" Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit `54a58b2856`)	2017-11-17 19:24:29 +00:00
Kenneth Graunke	699ff16e54	intel/tools: Fix detection of enabled shader stages. We renamed "Function Enable" to "Enable", which broke our detection of whether shaders are enabled or not. So, we'd see a bunch of HS/DS packets with program offsets of 0, and think that was a valid TCS/TES. Fixes: `c032cae9ff` (genxml: Rename "Function Enable" to "Enable".) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (cherry picked from commit `9a0465b3a3`)	2017-11-17 19:24:29 +00:00
Kenneth Graunke	c2d020336c	i965: Upload invariant state once at the start of the batch on Gen4-5. We want to emit invariant state at the start of a render batch. In the past, this more or less happened: a new batch flagged BRW_NEW_CONTEXT (because we don't have hardware contexts), which triggered the brw_invariant_state atom. So, it would be emitted before any 3D drawing. (Technically, there might be some BLT commands in the batch because Gen4-5 have a single combined render/BLT ring, but that should be harmless). With the advent of BLORP, this broke. The first item in a batch might be a BLORP operation, which bypasses the normal draw upload path. So, we need to ensure invariant state happens first. To do that, we just upload it when creating a new batch. On Gen6+ we'd need to worry about whether it's a RENDER or BLT batch, but because we have a combined ring, this approach should work fine on Gen4-5. Seems to fix GPU hangs when playing hardware accelerated video with mpv -hwdec=vaapi on Ironlake. Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103529 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (cherry picked from commit `8f91aa35a5`)	2017-11-17 19:24:29 +00:00
Kenneth Graunke	8ed01c0a57	i965: Implement another VF cache invalidate workaround on Gen8+. ...and provide a better citation for the existing one. v2: - Apply the workaround to Gen8 too, as intended (caught by Topi). - Restructure to add bits instead of an extra flush (based on a similar patch by Rafael Antognolli). Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> (cherry picked from commit `8d48671492`)	2017-11-17 19:24:29 +00:00
Tim Rowley	957c66de1c	swr/rast: Faster emulated simd16 permute Speed up simd16 frontend (default) on avx/avx2 platforms; fixes performance regression caused by switch to simdlib. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit `d8489517a5`)	2017-11-17 19:24:29 +00:00
Tim Rowley	c798200543	swr/rast: Use gather instruction for i32gather_ps on simd16/avx512 Speed up avx512 platforms; fixes performance regression caused by swithc to simdlib. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit `439904847e`)	2017-11-17 19:24:29 +00:00
Jason Ekstrand	f3caa303cf	i965: Add stencil buffers to cache set regardless of stencil texturing We may access them as a texture using blorp regardless of whether or not stencil texturing is enabled. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit `6830ba0d3b`)	2017-11-17 19:24:29 +00:00
Jason Ekstrand	fdd99c97ec	i965: Use PTE MOCS for all external buffers We were already using PTE for all render targets in case one happened to get scanned out. However, this still wasn't 100% correct because there are still possibly cases where we may want to texture from an external buffer even though we don't know the caching mode. This can happen, for instance, on buffers imported from another GPU via prime. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101691 Cc: "17.3" <mesa-stable@lists.freedesktop.org> Tested-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (cherry picked from commit `d7a19d69eb`)	2017-11-17 19:24:29 +00:00
Jason Ekstrand	a9bc277482	intel/blorp: Make the MOCS setting part of blorp_address This makes our MOCS settings significantly more flexible. Cc: "17.3" <mesa-stable@lists.freedesktop.org> Tested-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (cherry picked from commit `bc933d0e84`)	2017-11-17 19:24:29 +00:00

1 2 3 4 5 ...

89311 commits