fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-05 09:38:07 +02:00

Author	SHA1	Message	Date
Roland Scheidegger	65123ee62c	r600: set the number type correctly for float rts in cb setup Float rts were always set as unorm instead of float. Not sure of the consequences, but at least it looks like the blend clamp would have been enabled, which is against the rules (only eg really bothered to even attempt to specify this correctly, r600 always used clamp anyway). Albeit r600 (not r700) setup still looks bugged to me due to never setting BLEND_FLOAT32 which must be set according to docs... Not sure if the hw really cares, no piglit change (on eg/juniper). Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-11-15 03:13:46 +01:00
Roland Scheidegger	570d5b7992	r600: use ieee version of rsq Both r600 and evergreen used the clamped version, whereas cayman used the ieee one. I don't think there's a valid reason for this discrepancy, so let's switch to the ieee version for r600 and evergreen too, since we generally want to stick to ieee arithmetic. With this, behavior for both rcp and rsq should now be the same for all of r600, eg, cm, all using ieee versions (albeit note rsq retains the abs behavior for everybody, which may not be a good idea ultimately). Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-11-15 03:13:46 +01:00
Roland Scheidegger	1c8d57a008	r600: use ieee version of rcp r600 used the clamped version for rcp, whereas both evergreen and cayman used the ieee version. I don't know why that discrepancy exists (it does so since day 1) but there does not seem to be a valid reason for this, so make it consistent. This seems now safer than before the previous commit (using the dx10 clamp bit). Note that rsq still uses clamped version (as before even though the table may have suggested otherwise for evergreen) for r600/eg, but not for cayman. Will be changed separately for better regression tracking... Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-11-15 03:13:46 +01:00
Roland Scheidegger	3835009796	r600: use DX10_CLAMP bit in shader setup The docs are not very concise in what this really does, however both Alex Deucher and Nicolai Hähnle suggested this only really affects instructions using the CLAMP output modifier, and I've confirmed that with the newly changed piglit isinf_and_isnan test. So, with this bit set, if an instruction has the CLAMP modifier bit (which clamps to [0,1]) set, then NaNs will be converted to zero, otherwise the result will be NaN. D3D10 would require this, glsl doesn't have modifiers (with mesa clamp(x,0,1) would get converted to such a modifier) coupled with a whatever-floats-your-boat specified NaN behavior, but the clamp behavior should probably always be used (this also matches what a decomposition into min(1.0, max(x, 0.0)) would do, if min/max also adhere to the ieee spec of picking the non-nan result). Some apps may in fact rely on this, as this prevents misrenderings in This War of Mine since using ieee muls (`ce7a045fee`), without having to use clamped rcp opcode, which would also fix this bug there. radeonsi also seems to set this bit nowadays if I see that right (albeit the llvm amdgpu code comment now says "Make clamp modifier on NaN input returns 0" instead of "Do not clamp NAN to 0" since it was changed, which also looks a bit misleading). v2: set it in all shader stages. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103544 Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-11-15 03:13:46 +01:00
Roland Scheidegger	aab0bfc648	r600: use min_dx10/max_dx10 instead of min/max I believe this is the safe thing to do, especially ever since the driver actually generates NaNs for muls too. The ISA docs are not very helpful here, however the dx10 versions will pick a non-nan result over a NaN one (this is also the ieee754 behavior), whereas the non-dx10 ones will pick the NaN (verified by newly changed piglit isinf-and-isnan test). Other "modern" drivers will most likely do the same. This was shown to make some difference for bug 103544, albeit it is not required to fix it. Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-11-15 03:13:46 +01:00
Dave Airlie	3ceee04a4f	r600: fix cubemap arrays A lot of cubemap array piglits fail, port the texture type picking code from radeonsi which seems to fix most of them. For images I will port the rest of the code. Fixes: getteximage-depth gl_texture_cube_map_array-* fbo-generatemipmap-cubemap array getteximage-targets cube_array amongst others. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-11-15 11:26:11 +10:00
Rob Clark	7676e71113	freedreno/a5xx: small comment fix Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-11-14 18:12:47 -05:00
Rob Clark	d27318bdd0	freedreno/a5xx: indirect draw support A couple failures in piglit tests w/ TF or gl_VertexID + indirect draws. OTOH all the deqp tests pass (although they don't test those combinations). I suspect this could be fixed by a firmware update, but I don't think there is much we can do in mesa for that. Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-11-14 18:10:58 -05:00
Rob Clark	f383cf9d41	freedreno/a5xx: split out helper for pipeline stalls We need a similar thing for indirect draws. Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-11-14 18:10:51 -05:00
Rob Clark	d74029bddc	freedreno: update generated headers Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-11-14 18:10:43 -05:00
Timothy Arceri	5041ea96a0	gallium/radeon: disable the cache when nir backend enabled Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-11-15 08:47:31 +11:00
Timothy Arceri	7273e9820e	st/glsl_to_tgsi: use tgsi_get_gl_varying_semantic() for gs/tes outputs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-15 08:26:34 +11:00
Timothy Arceri	bc308122cc	gallium/tgsi: add tess output support to tgsi_get_gl_varying_semantic() Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-15 08:26:34 +11:00
Timothy Arceri	4ae9f0b580	st/glsl_to_tgsi: make use of tgsi_get_gl_varying_semantic() Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-15 08:26:34 +11:00
Timothy Arceri	3d21eb3b7d	gallium/tgsi: add prim id to tgsi_get_gl_varying_semantic() Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-15 08:26:34 +11:00
Anuj Phogat	fc59546e9a	i965: Make use of brw_load_register_imm32() helper function Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Cc: Nanley Chery <nanley.g.chery@intel.com>	2017-11-14 13:23:18 -08:00
Anuj Phogat	1dc45d75bb	i965/gen8+: Fix the number of dwords programmed in MI_FLUSH_DW Number of dwords in MI_FLUSH_DW changed from 4 to 5 in gen8+. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Cc: <mesa-stable@lists.freedesktop.org>	2017-11-14 13:23:18 -08:00
Anuj Phogat	6165fda59b	i965: Program DWord Length in MI_FLUSH_DW Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Cc: <mesa-stable@lists.freedesktop.org>	2017-11-14 13:23:18 -08:00
Anuj Phogat	5d8164c428	anv/gen10: Enable float blend optimization Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2017-11-14 13:23:18 -08:00
Anuj Phogat	72a239266b	intel/genxml: Add Cache Mode SubSlice Register to gen10.xml Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2017-11-14 13:23:18 -08:00
Anuj Phogat	aacf1943c0	anv/gen10: Implement WaSampleOffsetIZ workaround We already have this workaround in OpenGL driver. See Mesa commit `3cf4fe2219`. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Cc: Nanley Chery <nanley.g.chery@intel.com> Cc: Rafael Antognolli <rafael.antognolli@intel.com>	2017-11-14 13:23:18 -08:00
Andres Rodriguez	20e8dfcca9	mesa/st: add missing copyright headers to memoryobjects files Signed-off-by: Andres Rodriguez <andresx7@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-14 11:32:44 -08:00
Andres Rodriguez	60baf1a962	mesa: minor tidy up for memory object error strings Signed-off-by: Andres Rodriguez <andresx7@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-14 11:31:49 -08:00
Andres Rodriguez	f7580e7204	broadcom/vc4: fix indentation in vc4_screen.c Stumbled into this when adding a new PIPE_CAP. Signed-off-by: Andres Rodriguez <andresx7@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2017-11-14 11:31:36 -08:00
Matt Turner	a31d038208	Revert "intel/fs: Use a pure vertical stride for large register strides" This reverts commit `e8c9e65185`. With the actual bug fixed (by commit `6ac2d16901`), this is not necessary. I'm doubtful of its correctness in any case.	2017-11-14 11:24:08 -08:00
Matt Turner	6ac2d16901	i965/fs: Fix extract_i8/u8 to a 64-bit destination The MOV instruction can extract bytes to words/double words, and words/double words to quadwords, but not byte to quadwords. For unsigned byte to quadword, we can read them as words and AND off the high byte and extract to quadword in one instruction. For signed bytes, we need to first sign extend to word and then sign extend that word to a quadword. Fixes the following test on CHV, BXT, and GLK: KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-11-14 10:56:18 -08:00
Matt Turner	cfcfa0b9cd	i965/fs: Split all 32->64-bit MOVs on CHV, BXT, GLK Fixes the following tests on CHV, BXT, and GLK: KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115	2017-11-14 10:56:18 -08:00
Tim Rowley	d8489517a5	swr/rast: Faster emulated simd16 permute Speed up simd16 frontend (default) on avx/avx2 platforms; fixes performance regression caused by switch to simdlib. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-14 11:40:19 -06:00
Tim Rowley	439904847e	swr/rast: Use gather instruction for i32gather_ps on simd16/avx512 Speed up avx512 platforms; fixes performance regression caused by switch to simdlib. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-14 11:39:02 -06:00
Derek Foreman	0db36caa19	egl/wayland: Add a fallback when fourcc query isn't supported When queryImage doesn't support __DRI_IMAGE_ATTRIB_FOURCC wayland clients will die with a NULL dereference in wl_proxy_add_listener. Attempt to provide a simple fallback to keep ancient systems working. Fixes: `6595c69951` ("egl/wayland: Remove more surface specifics from create_wl_buffer") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103519 Signed-off-by: Derek Foreman <derekf@osg.samsung.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2017-11-14 15:38:43 +00:00
Marek Olšák	89e669d2fd	radeonsi: remove has_cp_dma, has_streamout flags (v2) v2: remove r600_can_dma_copy_buffer	2017-11-14 15:24:50 +01:00
Julien Isorce	b904ad7d21	i965: implement (un)mapImage Already implemented for Gallium drivers. Useful for gbm_bo_(un)map. Tests: By porting wayland/weston/clients/simple-dmabuf-drm.c to GBM. kmscube --mode=rgba kmscube --mode=nv12-1img kmscube --mode=nv12-2img piglit ext_image_dma_buf_import-refcount -auto piglit ext_image_dma_buf_import-transcode-nv12-as-r8-gr88 -auto piglit ext_image_dma_buf_import-sample_rgb -fmt=XR24 -alpha-one -auto piglit ext_image_dma_buf_import-sample_rgb -fmt=AR24 -auto piglit ext_image_dma_buf_import-sample_yuv -fmt=NV12 -auto piglit ext_image_dma_buf_import-sample_yuv -fmt=YU12 -auto piglit ext_image_dma_buf_import-sample_yuv -fmt=YV12 -auto v2: add early return if (flag & MAP_INTERNAL_MASK) v3: take input rect into account and test with kmscube and piglit. v4: handle wraparound and bo reference. v5: indent, exclude 0 width and height on the boundary, map bo independently of the image. Signed-off-by: Julien Isorce <jisorce@oblong.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-11-14 14:23:13 +00:00
Samuel Pitoiset	8a7d4092d2	radv: force enable LLVM sisched for The Talos Principle It seems safe and it improves performance by +4% (73->76). A drirc based solution is not what we want for now, keep it simple and improve later if it's really needed. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-11-14 15:21:50 +01:00
Samuel Pitoiset	ecabe2280c	radv: add nosisched debug option Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-11-14 15:21:48 +01:00
Alejandro Piñeiro	b498172d0e	spirv: fix typo on DO NOT EDIT header Introduced on commit `157c9a1341` Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-14 13:07:36 +01:00
Jon Turney	7df9a3609a	meson: if dep_dl is an empty list, it's not a dependency object It's ok to use an empty list for dependencies:, but it's not ok to try to use the found() method of it. See also https://github.com/mesonbuild/meson/issues/2324 Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2017-11-14 12:00:25 +00:00
Bas Nieuwenhuizen	7c25578863	radv: Free temporary syncobj after waiting on it. Otherwise we leak it. Fixes: `eaa56eab6d` "radv: initial support for shared semaphores (v2)" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-11-14 10:03:02 +01:00
Bas Nieuwenhuizen	917d3b43f2	radv: Free syncobj with multiple imports. Otherwise we can leak the old syncobj. Fixes: `eaa56eab6d` "radv: initial support for shared semaphores (v2)" Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-11-14 10:03:02 +01:00
Jason Ekstrand	fb0e9b5197	i965: Track the depth and render caches separately Previously, we just had one hash set for tracking depth and render caches called brw_context::render_cache. This is less than ideal because the depth and render caches are separate and we can't track moves between the depth and the render caches. This limitation led to some unnecessary flushing around the depth cache. There are cases (mostly with BLORP) where we can end up touching a depth or stencil buffer through the render cache. To guard against this, blorp would unconditionally do a render_cache_set_check_flush on its destination which meant that if you did any rendering (including a BLORP operation) to a given surface and then used it as a blorp destination, you would end up flushing it out of the render cache before rendering into it. Things get worse when you dig into the depth/stencil state code for regular GL draw calls. Because we may end up rendering to a depth or stencil buffer via BLORP, we did a render_cache_set_check_flush on all depth and stencil buffers in brw_emit_depthbuffer to ensure that they got flushed out of the render cache prior to using them for depth or stencil testing. However, because we also need to track dirtiness for depth and stencil so that we can implement depth and stencil texturing correctly, we were adding all depth and stencil buffers to the render cache set in brw_postdraw_set_buffers_need_resolve. This meant that, if anything caused 3DSTATE_DEPTH_BUFFER to get re-emitted (currently _NEW_BUFFERS, BRW_NEW_BATCH, and BRW_NEW_BLORP), we would almost always do a full pipeline stall and render/depth cache flush. The root cause of both of these problems is that we can't tell the difference between the render and depth caches in our tracking. This commit splits our cache tracking into two sets, one for render and one for depth, and properly handles transitioning between the two. We still flush all the caches whenever anything needs to be flushed.
The idea is that if we're going to take the hit of a flush and stall, we may as well flush everything in the hopes that we can avoid a flush by something else later. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-13 21:51:59 -08:00
Jason Ekstrand	d6d0ac95d5	i965/blorp: Add more destination flushing Right now we just always flush the destination for render and aren't particularly careful about depth or stencil. Soon, flush_for_render isn't going to do the same thing as flush_for_depth and we may be doing a good deal less depth flushing so we should be a bit more precise. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-13 21:51:59 -08:00
Jason Ekstrand	4a09070295	i965: Add more precise cache tracking helpers In theory, this will let us track the depth and render caches separately. Right now, they're just wrappers around brw_render_cache_set_* Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-13 21:51:59 -08:00
Jason Ekstrand	6830ba0d3b	i965: Add stencil buffers to cache set regardless of stencil texturing We may access them as a texture using blorp regardless of whether or not stencil texturing is enabled. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org	2017-11-13 21:51:59 -08:00
Jason Ekstrand	4b1e70cc57	i965: Switch over to fully external-or-not MOCS scheme Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-13 21:35:52 -08:00
Jason Ekstrand	d7a19d69eb	i965: Use PTE MOCS for all external buffers We were already using PTE for all render targets in case one happened to get scanned out. However, this still wasn't 100% correct because there are still possibly cases where we may want to texture from an external buffer even though we don't know the caching mode. This can happen, for instance, on buffers imported from another GPU via prime. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101691 Cc: "17.3" <mesa-stable@lists.freedesktop.org> Tested-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-13 21:35:44 -08:00
Jason Ekstrand	bc933d0e84	intel/blorp: Make the MOCS setting part of blorp_address This makes our MOCS settings significantly more flexible. Cc: "17.3" <mesa-stable@lists.freedesktop.org> Tested-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-13 19:40:10 -08:00
Jason Ekstrand	deec84fd77	anv/blorp: Add a device parameter to blorp_surf_for_anv_image Cc: "17.3" <mesa-stable@lists.freedesktop.org> Tested-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-13 19:40:09 -08:00
Jason Ekstrand	4639cc716e	intel/blorp: Use mocs.tex for depth stencil Cc: "17.3" <mesa-stable@lists.freedesktop.org> Tested-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-13 19:39:57 -08:00
Kenneth Graunke	866158b4b6	intel/tools/error: Decode compute shaders. This is a bit more annoying than your average shader - we need to look at MEDIA_INTERFACE_DESCRIPTOR_LOAD in the batch buffer, then hop over to the dynamic state buffer to read the INTERFACE_DESCRIPTOR_DATA, then hop over to the instruction buffer to decode the program. Now that we store all the buffers before decoding, we can actually do this fairly easily. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-13 17:11:02 -08:00
Kenneth Graunke	7049c38655	intel/tools/error: Use do-while for field iterator loops. while loops skip the first field of the instruction/structure, which is not what the code intended. It works out because the field we're looking for doesn't happen to be first, but we ought to do it right regardless. Found while writing the next patch, where Kernel Start Pointer is the first field of INTERFACE_DESCRIPTOR_DATA. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-13 17:11:02 -08:00
Kenneth Graunke	8b749ee0ea	intel/tools/error: Decode shaders while decoding batch commands. This makes aubinator_error_decode's shader dumping work like aubinator. Instead of printing them after the fact, it prints them right inside the 3DSTATE_VS/HS/DS/GS/PS packet that references them. This saves you the effort of cross-referencing things and jumping back and forth. It also reduces a bunch of book-keeping, and eliminates the limitation that we could only handle 4096 programs. That code was also broken and failed to print any shaders if there were under 4096 programs. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-13 17:11:02 -08:00
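
The NaN handling described in the r600 commits above (`aab0bfc648` for min_dx10/max_dx10 and `3835009796` for DX10_CLAMP) can be sketched roughly as follows. This is only a software model of the behavior the commit messages describe, not the hardware's or the driver's implementation; the function names `min_dx10`, `max_dx10`, `min_legacy`, and `saturate_dx10` are hypothetical and chosen for illustration.

```python
import math


def min_dx10(a: float, b: float) -> float:
    """Model of the dx10-style min: prefer the non-NaN operand."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b


def max_dx10(a: float, b: float) -> float:
    """Model of the dx10-style max: prefer the non-NaN operand."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b


def min_legacy(a: float, b: float) -> float:
    """Model of the non-dx10 min as described: a NaN input wins."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a < b else b


def saturate_dx10(x: float) -> float:
    """Clamp to [0, 1] via min(1.0, max(x, 0.0)); NaN ends up as 0.0."""
    return min_dx10(1.0, max_dx10(x, 0.0))
```

With this model, `saturate_dx10(float("nan"))` yields 0.0 because the inner max discards the NaN in favor of 0.0, which matches the decomposition argument made in the DX10_CLAMP commit message.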

97745 commits