fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-06-01 07:08:18 +02:00

Author	SHA1	Message	Date
Samuel Pitoiset	4aaacd6dd0	gm107/ir: add emission for SUSTx and SULDx Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-20 11:11:21 +02:00
Samuel Pitoiset	e14cb05ce1	gm107/ra: fix constraints for surface operations Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-20 11:11:16 +02:00
Samuel Pitoiset	c68989b2c8	gm107/ir: lower surface operations Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-20 11:11:12 +02:00
Samuel Pitoiset	2ae4b5d622	nvc0: bind images for 3d/cp shaders on GM107+ On Maxwell, images binding is slightly different (and much better) regarding Fermi and Kepler because a texture view needs to be uploaded for each image and this is going to simplify the thing a lot. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-20 11:11:03 +02:00
Samuel Pitoiset	1da704a94c	nvc0: increase the tex handles area size in the driver cb Currently, we can store 32 tex handles, each a 32-bit integer, and that fits perfectly with the underlying hardware except on GM107+, which requires uploading a texture view for each image. This patch increases the number of storable texture handles in the driver constant buffer from 32 to 40 because we expose 8 images. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-20 11:10:56 +02:00
Marek Olšák	0ab47146c9	winsys/amdgpu: use pb_cache buckets for fewer pb_cache misses Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-19 23:45:06 +02:00
Marek Olšák	dea6fdadca	winsys/radeon: use pb_cache buckets for fewer pb_cache misses This makes Bioshock Infinite with deferred flushing 2.2% faster. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-19 23:45:06 +02:00
Marek Olšák	8d5944199d	gallium/pb_cache: reduce the number of pointer dereferences Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-19 23:45:06 +02:00
Marek Olšák	3cdc0e133f	gallium/pb_cache: divide the cache into buckets for reducing cache misses Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-19 23:45:06 +02:00
Marek Olšák	fec7f74129	gallium/pb_cache: check parameters that are more likely to fail first This makes Bioshock Infinite with deferred flushing 2% faster. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-19 23:45:06 +02:00
Marek Olšák	2596ae2b6e	radeonsi: emit PS exports last This effectively removes s_waitcnt instructions after FP16 exports. Before: v_cvt_pkrtz_f16_f32_e32 v0, v0, v1 ; 5E000300 v_cvt_pkrtz_f16_f32_e32 v1, v2, v3 ; 5E020702 exp 15, 0, 1, 0, 0, v0, v1, v0, v0 ; F800040F 00000100 s_waitcnt expcnt(0) ; BF8C0F0F v_cvt_pkrtz_f16_f32_e32 v0, v4, v5 ; 5E000B04 v_cvt_pkrtz_f16_f32_e32 v1, v6, v7 ; 5E020F06 exp 15, 1, 1, 0, 0, v0, v1, v0, v0 ; F800041F 00000100 s_waitcnt expcnt(0) ; BF8C0F0F v_cvt_pkrtz_f16_f32_e32 v0, v8, v9 ; 5E001308 v_cvt_pkrtz_f16_f32_e32 v1, v10, v11 ; 5E02170A exp 15, 2, 1, 0, 0, v0, v1, v0, v0 ; F800042F 00000100 s_waitcnt expcnt(0) ; BF8C0F0F v_cvt_pkrtz_f16_f32_e32 v0, v12, v13 ; 5E001B0C v_cvt_pkrtz_f16_f32_e32 v1, v14, v15 ; 5E021F0E exp 15, 3, 1, 1, 1, v0, v1, v0, v0 ; F8001C3F 00000100 s_endpgm ; BF810000 After: v_cvt_pkrtz_f16_f32_e32 v0, v0, v1 ; 5E000300 v_cvt_pkrtz_f16_f32_e32 v1, v2, v3 ; 5E020702 v_cvt_pkrtz_f16_f32_e32 v2, v4, v5 ; 5E040B04 v_cvt_pkrtz_f16_f32_e32 v3, v6, v7 ; 5E060F06 exp 15, 0, 1, 0, 0, v0, v1, v0, v0 ; F800040F 00000100 v_cvt_pkrtz_f16_f32_e32 v4, v8, v9 ; 5E081308 v_cvt_pkrtz_f16_f32_e32 v5, v10, v11 ; 5E0A170A exp 15, 1, 1, 0, 0, v2, v3, v0, v0 ; F800041F 00000302 v_cvt_pkrtz_f16_f32_e32 v6, v12, v13 ; 5E0C1B0C v_cvt_pkrtz_f16_f32_e32 v7, v14, v15 ; 5E0E1F0E exp 15, 2, 1, 0, 0, v4, v5, v0, v0 ; F800042F 00000504 exp 15, 3, 1, 1, 1, v6, v7, v0, v0 ; F8001C3F 00000706 s_endpgm ; BF810000 Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-19 23:45:06 +02:00
Marek Olšák	b2b45cecef	radeonsi: set optimal settings in COMPUTE_RESOURCE_LIMITS ported from Vulkan Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-19 23:45:06 +02:00
Marek Olšák	ad70c3954b	radeonsi: really wait for the second EOP event and not the first one Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-19 23:45:06 +02:00
Marek Olšák	1a1cc67edd	gallium/radeon: remove RADEON_FLUSH_KEEP_TILING_FLAGS flag always set Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-19 23:45:06 +02:00
Samuel Pitoiset	9c63224540	gm107/ir: make use of ADD32I for all immediates ADD only allows to emit 19-bits immediates. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: <mesa-stable@lists.freedesktop.org>	2016-07-19 18:07:15 +02:00
Samuel Pitoiset	0904a2ba97	gm107/ir: add missing NEG modifier for IADD32I Like FADD32I, the NEG modifier of src0 is at position 56. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2016-07-19 18:07:10 +02:00
Andreas Boll	c482decd4d	ddebug: Fix trivial typo in stderr message Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>	2016-07-19 16:04:40 +02:00
Eric Engestrom	8ba46fbd9e	vl: fix memory leak CovID: 1363008 Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2016-07-19 12:41:00 +02:00
Boyuan Zhang	60c7450f16	vl: add entry point Add entrypoint to distinguish H.264 decode and encode. For example, in patch 5/11, when calling "VaCreateContext", "pps" and "sps" shouldn't be allocated for H.264 encoding. So we need to use the entry_point to determine whether this is H.264 decode or H.264 encode. We can use config to determine the entrypoint since config_id is passed to us for the VaCreateContext call. However, for the VaDestroyContext call, only context_id is passed to us. So we need to know the entrypoint in order to not free the pps/sps in the encoding case. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2016-07-19 12:36:46 +02:00
Ilia Mirkin	ed9dd3bcd9	nv50,nvc0: srgb rendering is only available for rgba/bgra Mark both L8_SRGB and L8A8_SRGB as non-renderable (the latter already didn't have the bind flags). This makes the state tracker pick a different format when rendering is required, or mark the fb as incomplete. This fixes: bin/getteximage-formats init-by-clear-and-render -auto -fbo bin/getteximage-formats init-by-rendering -auto -fbo which previously ran into srgb-encoding differences. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2016-07-18 20:04:17 -04:00
Ilia Mirkin	8e7893eb53	nvc0: add support for BGRA8 images This is useful for pbo downloads, which are now accelerated with images. BGRA8 is a moderately common format to do that in. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-07-18 20:04:17 -04:00
Christian König	3e1ad846f9	radeon/uvd: add session context buffer for polaris 10/11 v2 This way we have unlimited UVD sessions. v2: only enable it when kernel supports it as well. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2016-07-18 17:13:17 +02:00
Leo Liu	134d6e4e4f	vl/dri3: fix a memory leak from front buffer Inspired by the fix for the memory leak in vdpau interop: resource_from_handle sets the texture reference count, which needs to be decremented and released. There is a similar case for DRI3: with the VA-API glx extension there is a temporary TFP (texture from pixmap), which we target through dma-buf. The leak happens when the reference is not counted down. Checked with the mpv vo=opengl case: there is only one static TFP, so the leak happens once. The totem player using gstreamer VA-API glx has a dynamic TFP for each frame, so it leaks quite a bit. This fixes the memory leak for mpv and totem. Signed-off-by: Leo Liu <leo.liu@amd.com> Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-07-18 09:20:40 -04:00
Kenneth Graunke	ac1181ffbe	compiler: Rename INTERP_QUALIFIER_* to INTERP_MODE_*. Likewise, rename the enum type to glsl_interp_mode. Beyond the GLSL front-end, talking about "interpolation modes" seems more natural than "interpolation qualifiers" - in the IR, we're removed from how exactly the source language specifies how to interpolate an input. Also, SPIR-V calls these "decorations" rather than "qualifiers". Generated by: $ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \ -e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \ -e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \; Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Dave Airlie <airlied@redhat.com>	2016-07-17 19:26:48 -07:00
Dave Airlie	e7d96e7685	virgl: drop pointless leftover init of virgl_transfer_inline_write. Pointed out by Marek. Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-07-17 06:20:53 +10:00
Ilia Mirkin	062c6b8e54	nv50: fix alphatest for non-blendable formats The hardware can only do alphatest when using a blendable format. This means that the various *16 norm formats didn't work with alphatest. It appears that Talos Principle uses such formats, as well as alpha tests, for some internal renders, which made them be incorrect. However this does not appear to affect the final renders, but in a different game it easily could. The approach we take is that when alphatests are enabled and a suitable format is used (which we anticipate is the vast minority of the time), we insert code into the shader to perform the comparison and discard. Once inserted, that code lives in the shader forever, and we re-upload it each time the function changes with a fixed-up compare. To avoid re-uploading too often, if we switch back to a blendable format, the test is (effectively) disabled and the hw alphatest functionality is used. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-16 11:45:30 -04:00
Rob Clark	44bbfedbd9	gallium/u_queue: add optional cleanup callback Adds a second optional cleanup callback, called after the fence is signaled. This is needed if, for example, the queue has the last reference to the object that embeds the util_queue_fence. In this case we cannot drop the ref in the main callback, since that would result in the fence being destroyed before it is signaled. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-16 10:00:04 -04:00
Nicolai Hähnle	6f73c7595f	radeonsi: remove the DRAW_PREAMBLE packet According to firmware guys, the new sequence that we added for Polaris should work on all CIK parts, and should actually be faster on some parts. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-07-16 13:02:37 +02:00
Eric Anholt	3bcd0f1912	vc4: Speed up glGenerateMipmaps by avoiding shadow baselevel. To support general GL_TEXTURE_BASE_LEVEL we have to copy to a temporary miptree. However, if a single level is being selected, we can use the existing miptree and force all the sampling to be from that particular level. This avoids a ton of software fallbacks in glGenerateMipmaps(), which uses base levels in the blit implementation in gallium. Improves "glmark2 -b terrain" from 2 fps to 3 (perhaps some more precision would be useful?), and cuts its CPU usage during the benchmarking from ~30% to ~10% (total CPU time from 8.8s to 7.6s).	2016-07-15 13:54:00 -07:00
Eric Anholt	88152d7dc0	vc4: Drop VC4_DIRTY_TEXSTATE in favor of the per-stage flags. The compiler uses the per-stage flags already, so it didn't need this. vc4_uniforms was using it, so just replace it with both of the stage flags for now.	2016-07-15 13:54:00 -07:00
Eric Anholt	5db82e0c89	vc4: Remove dead dirty_samplers field. We use a big VC4_DIRTY_FRAGTEX/VC4_DIRTY_VERTEX on the stage, instead.	2016-07-15 13:54:00 -07:00
Eric Anholt	219b75deb9	vc4: Turn on control flow support in the simulator environment. We can't merge the non-simulator support until we merge the kernel side and get a new libdrm release.	2016-07-15 13:54:00 -07:00
Charmaine Lee	6b7923ee46	svga: avoid unbinding render targets that have already been unbound Fixed the remaining redundant SetRenderTargets command emission. Tested with lightsMark2008, Heaven, mtt piglit, glretrace, conform. Reviewed-by: Brian Paul <brianp@vmware.com>	2016-07-15 14:24:34 -06:00
Neha Bhende	4f633d110a	svga: dump code for GenMips. Reviewed-by: Brian Paul <brianp@vmware.com>	2016-07-15 14:24:33 -06:00
Yaakov Selkowitz	5d303867f5	Use correct names for dlopen()ed files on Cygwin Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com> Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk>	2016-07-15 19:46:54 +01:00
Brian Paul	50a669de4e	svga: handle mismatched number of samplers, sampler views in svga_init_shader_key_common(). Since the CSO module only tracks sampler views for fragment shaders, the number of samplers and sampler views can be mismatched for other types of shaders. This situation triggered an assertion in Chrome with maps.google.com This patch adds defensive code to handle that situation. Fixes VMware bug 1694027 Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2016-07-15 11:05:18 -06:00
Leo Liu	b9d10e79c8	st/omx/enc: check uninitialized list from task release The uninitialized list should be checked and returned. Thank Julien for the notification and suggested fix. Signed-off-by: Leo Liu <leo.liu@amd.com> Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-07-15 09:17:36 -04:00
Samuel Pitoiset	ea6b236ab1	nv50/ir: add missing string for SV_WORK_DIM Fixes: `2aa1197` ("nouveau: Add support for SV_WORK_DIM") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Hans de Goede <hdegoede@redhat.com>	2016-07-14 22:28:39 +02:00
Marek Olšák	f84e9d749f	Revert "radeon/llvm: Use alloca instructions for larger arrays" This reverts commit `513fccdfb6`. Bioshock Infinite hangs with that.	2016-07-14 22:15:08 +02:00
Jan Vesely	489bb5473b	r600,compute: Reserve vtx 3 for kernel arguments Using vtx 0 does not work for dynamic offsets. v2: add explanatory comment Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2016-07-14 16:04:50 -04:00
Marek Olšák	33eddde4a7	radeon/uvd: fail to create a decoder if RUVD_MSG_CREATE submission fails This is the bare minimum for reporting the error to the user. Reviewed-by: Christian König <christian.koenig@amd.com>	2016-07-14 22:00:54 +02:00
Marek Olšák	85388652f9	winsys/amdgpu: return an error on IB submission failures Reviewed-by: Christian König <christian.koenig@amd.com>	2016-07-14 22:00:54 +02:00
Marek Olšák	a7d84f7731	gallium/radeon: add a return value to cs_flush Required by our UVD code. Reviewed-by: Christian König <christian.koenig@amd.com>	2016-07-14 22:00:54 +02:00
francians@gmail.com	3db7f3458f	freedreno/a4xx: Fix sign compare warnings Signed-off-by: Rob Clark <robdclark@gmail.com>	2016-07-14 09:55:02 -04:00
francians@gmail.com	948822018f	freedreno/a3xx: Fix sign compare warnings Signed-off-by: Rob Clark <robdclark@gmail.com>	2016-07-14 09:55:02 -04:00
francians@gmail.com	cf2f345356	freedreno/a2xx: Fix sign compare warnings Signed-off-by: Rob Clark <robdclark@gmail.com>	2016-07-14 09:55:02 -04:00
Boyuan Zhang	23c5e8bc58	radeon/vce: handle newly added parameters Replace the previous hardcoded value with newly defined parameters Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2016-07-14 09:49:21 +02:00
Boyuan Zhang	5490068fb1	st/omx: assign previous values to new structure Assign previously hardcoded values for OMX to newly defined structure. As a result, OMX behaviour will not change at all. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2016-07-14 09:49:14 +02:00
Boyuan Zhang	b86bf4b568	vl: add parameters for VAAPI encode Allow to specify more parameters in the encoding interface which previously just hardcoded in the encoder Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2016-07-14 09:49:07 +02:00
Eric Anholt	9194473dd2	vc4: Emit resets of the uniform stream at the starts of blocks. If a block might be entered from multiple locations, then the uniform stream will (probably) be at different points, and we need to make sure that it's pointing where we expect it to be. The kernel also enforces that any block reading a uniform resets uniforms, to prevent reading outside of the uniform stream by using looping.	2016-07-13 23:54:15 -07:00


28082 commits