fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-25 12:38:11 +02:00

Author	SHA1	Message	Date
Tim Rowley	ae2412dbbd	swr/rast: Remove hardcoded clip/cull slot from clipper Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-09-13 10:09:18 -05:00
Tim Rowley	5471f65976	swr/rast: Start to remove hardcoded clipcull_dist vertex attrib slot Add new field in SWR_BACKEND_STATE::vertexClipCullOffset to specify the start of the clip/cull section of the vertex header. Removed use of hardcoded slot from binner. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-09-13 10:09:11 -05:00
Tim Rowley	9669972692	swr/rast: Move clip/cull enables in API Moved from SWR_RASTSTATE to SWR_BACKEND_STATE. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-09-13 10:09:04 -05:00
Tim Rowley	f5031fb952	swr/rast: Add new API SwrStallBE SwrStallBE stalls the backend threads until all work submitted before the stall has finished. The frontend threads can continue to make forward progress. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-09-13 10:08:46 -05:00
Marek Olšák	4ba20c9473	Revert "winsys/amdgpu: disable local BOs on Raven" This reverts commit `1cda9a2fee`. It works now.	2017-09-12 22:44:02 +02:00
Marek Olšák	6eade342eb	radeonsi: optimize TCS epilog when invocation 0 writes tess factors This removes the barrier and LDS stores and loads for tess factors when it's possible. The removal of the barrier seems more important to me though. In one shader, it removes 17 * 4 bytes from the shader binary. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-11 19:02:02 +02:00
Marek Olšák	386d165d8d	tgsi/scan: add a new pass that analyzes tess factor writes (v2) The pass tries to deduce whether tess factors are always written by all shader invocations. The implication for radeonsi is that it doesn't have to use a barrier near the end of TCS, and doesn't have to use LDS for passing the tess factors to the epilog. v2: Handle barriers and do the analysis pass for each code segment surrounded by barriers separately, and AND results from all such segments writing tess factors. The change is trivial in the main switch statement. Also, the result is renamed to "tessfactors_are_def_in_all_invocs" to make the name accurate. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-11 19:02:02 +02:00
Marek Olšák	a2a326e8f8	winsys/amdgpu: use the new raw CS API This also cleans things up. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-11 16:29:52 +02:00
Marek Olšák	3824ca7610	radeonsi: implement pipe_context::fence_server_sync This will be more useful once we have sync_file support. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-11 16:29:52 +02:00
Marek Olšák	8843bf6dfd	winsys/amdgpu: factor out some fence dependency code into separate functions Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-11 16:29:52 +02:00
Marek Olšák	a6eb164eb2	winsys/amdgpu: rename fence_dependency functions Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-11 16:29:52 +02:00
Marek Olšák	fc45495474	gallium/radeon: add a proper fail path for calloc in r600_flush_from_st Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-11 16:29:52 +02:00
Marek Olšák	7213293fe2	winsys/amdgpu: don't allow interprocess resource sharing for IBs Now we should get IB submissions with bo_list == NULL when DRI buffers aren't referenced. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-11 16:29:52 +02:00
Marek Olšák	46e7478986	radeonsi/gfx9: fix interprocess resource sharing on Raven This is kinda fragile, but it at least unbreaks the driver. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-11 16:29:52 +02:00
Dave Airlie	8d6b97a815	r600: handle the non-TXF_LZ support path. it appears that texcoord.z/w will be 0 in all cases already, so just put them into the vbo always. Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-09-11 02:10:24 +02:00
Marek Olšák	c1d92f8222	gallium/u_blitter: use UTIL_BLITTER_ATTRIB_NONE (0) instead of 0 directly Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Tested-by: Brian Paul <brianp@vmware.com>	2017-09-11 02:10:24 +02:00
Marek Olšák	005fa89bfa	gallium/u_blitter: don't pass GENERIC in VS if it's not needed Now, depth-only clears and custom passes don't read memory in VS. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Tested-by: Brian Paul <brianp@vmware.com>	2017-09-11 02:10:24 +02:00
Marek Olšák	22ed1ba01a	gallium/u_blitter: use draw_rectangle for all blits except cubemaps Add ZW coordinates to the draw_rectangle callback and use it. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Tested-by: Brian Paul <brianp@vmware.com>	2017-09-11 02:10:24 +02:00
Marek Olšák	43247c440e	gallium/u_blitter: use draw_rectangle callback for layered clears They are done with instancing. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Tested-by: Brian Paul <brianp@vmware.com>	2017-09-11 02:10:23 +02:00
Marek Olšák	7aaf4c73de	gallium/u_blitter: add new union blitter_attrib to replace pipe_color_union Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Tested-by: Brian Paul <brianp@vmware.com>	2017-09-11 02:10:23 +02:00
Marek Olšák	e4c457f695	gallium/radeon: use rectangles for 1D and 2D texture blits Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-11 02:10:23 +02:00
Roland Scheidegger	57a341b0a9	llvmpipe, draw: improve shader cache debugging With GALLIVM_DEBUG=perf set, output the relevant stats for shader cache usage whenever we have to evict shader variants. Also add some output when shaders are deleted (but not with the perf setting to keep this one less noisy). While here, also don't delete that many shaders when we have to evict. For fs, there's potentially some cost if we have to evict due to the required flush, however certainly shader recompiles have a high cost too so I don't think evicting one quarter of the cache size makes sense (and, if we're evicting based on IR count, we probably typically evict only very few or just one shader too). For vs, I'm not sure it even makes sense to evict more than one shader at a time, but keep the logic the same for now. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2017-09-09 03:06:10 +02:00
Roland Scheidegger	772f475351	llvmpipe: enable PIPE_CAP_QUERY_PIPELINE_STATISTICS This was implemented since forever, but not enabled. It passes all piglit tests except one, arb_pipeline_statistics_query-frag. The reason is that the test (for drawing a 10x10 rect) expects between 100 and 150 pixel shader invocations. But since llvmpipe counts this with 4x4 granularity (and due to the rect being 2 tris) we end up with 224 invocations. I believe however what llvmpipe is doing violates neither the spirit nor the letter of the spec (our fragment shader granularity really is 4x4 pixels, albeit we will bail out early on 2x2 or 4x2 (the latter if AVX is available) granularity), the spec allows to count additional invocations due to implementation reasons. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2017-09-09 03:06:10 +02:00
Roland Scheidegger	dcf2feadc3	gallivm: fix gather implementation a bit gather is defined in terms of bilinear filtering, just without the filtering part. However, there are actually some subtle differences required in our implementation, because we use some tricks to simplify coord wrapping for the two coords per direction. For bilinear filtering, we don't care if we end up with an incorrect texel, as long as the filter weight is 0.0 for it. Likewise, the order of the texels doesn't actually matter (as long as they still have the correct filter weight). But for gather, these tricks lead to incorrect results. Fix this for CLAMP_TO_EDGE, and add some comments to the other wrap functions which look broken (the 3 mirror_clamp plus mirror_repeat) (too complex to fix right now, and no one really seems to care...). Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2017-09-09 03:06:10 +02:00
Charmaine Lee	57d9222ef2	svga: abort shader translation upon indirect indexing of temporaries This patch aborts shader translation upon indirect indexing of a temporary register on non-vgpu10 devices. This prevents an unsupported feature from being sent to the device. Tested with MTT-piglit, glretrace. Reviewed-by: Brian Paul <brianp@vmware.com>	2017-09-08 13:58:38 -06:00
Eric Engestrom	f77d06fb28	gallium/tests: use ARRAY_SIZE macro Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-08 10:29:40 +01:00
Eric Engestrom	db8c5ae853	r300: use ARRAY_SIZE macro Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-09-08 10:29:40 +01:00
Connor Abbott	b8a51c8c4b	radeonsi: move the guts of ARB_shader_group_vote emission to ac Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-09-08 04:12:49 +01:00
Connor Abbott	bd73b89792	radeonsi: move si_emit_ballot() to ac Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-09-08 04:12:42 +01:00
Connor Abbott	ac27fa7294	radeonsi: move emit_optimization_barrier() to ac Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-09-08 04:06:47 +01:00
Connor Abbott	c181d4f2b7	radeonsi: move llvm_get_type_size() to ac Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-09-08 04:04:16 +01:00
Leo Liu	6e8ef53837	Revert "st/va: add enviromental variable to disable interlace" This reverts commit `10dec2de2d`. The environment variable is no longer needed with the previous change Reviewed-by: Christian König <christian.koenig@amd.com>	2017-09-07 13:32:36 -04:00
Leo Liu	15d4d44d9b	st/va: move YUV content to deinterlaced buffer when reallocated for encoder v2: use deinterlace common function v3: make sure deinterlace only Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2017-09-07 13:32:36 -04:00
Leo Liu	cadeb73f6b	st/va: reallocate the buffer if the layout isn't supported This makes buffer reallocation based on the buffer layout clearer for both the decoder and the encoder. Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2017-09-07 13:32:36 -04:00
Leo Liu	78ec7400c5	vl/compositor: make vl_compositor_set_yuv_layer() static Since it's no longer being called outside of compositor Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2017-09-07 13:32:36 -04:00
Leo Liu	9f32078c20	st/omx: use vl/compositor helper function for YUV deinterlacing v2: separate helper function in different patch Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2017-09-07 13:32:36 -04:00
Leo Liu	a6da7e6c3a	vl/compositor: make a helper function for YUV deinterlacing A similar function exists in OMX and is only used by OMX. Move it to vl/compositor so that other state trackers can use it later. Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2017-09-07 13:32:36 -04:00
Marek Olšák	4bd2bdbb3c	ac/surface: add radeon_surf::has_stencil for convenience Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-07 17:59:37 +02:00
Marek Olšák	7ec64bd88c	radeonsi: don't read tcs_out_lds_layout.patch_stride from an SGPR Same as before, writing TCS outputs to LDS is rare. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-07 13:00:07 +02:00
Marek Olšák	07fe10c75d	radeonsi: don't read tcs_out_lds_layout.vertex_size from an SGPR TCS outputs are usually not written to LDS, so no stats here. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-07 13:00:07 +02:00
Marek Olšák	89bf8668c2	radeonsi/gfx9: don't read LS out vertex stride from an SGPR in monolithic HS -44 bytes in a monolithic LS-HS binary. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-07 13:00:07 +02:00
Marek Olšák	f974bb768b	radeonsi: don't read the LS output vertex stride from an SGPR in LS Now it's able to generate ds_write2_b64 instead of ds_write2_b32. -20 bytes in one shader binary. (having only 1 output) Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-07 13:00:07 +02:00
Marek Olšák	22f5dfd300	radeonsi: don't read the number of TCS out vertices from an SGPR in TCS -16 bytes in one shader binary. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-07 13:00:07 +02:00
Marek Olšák	17dd4856a6	radeonsi: don't always apply the PrimID instancing bug workaround on SI It looks like commit `391673af7a` that should have fixed the perf regression didn't really change much if anything. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-07 13:00:06 +02:00
Marek Olšák	a0823df148	radeonsi: remove 2 callbacks from si_shader_context Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-07 13:00:06 +02:00
Marek Olšák	1cda9a2fee	winsys/amdgpu: disable local BOs on Raven It hangs with a high degree of reproducibility. Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-09-07 12:57:48 +02:00
Roland Scheidegger	6d9d6071ee	llvmpipe, tgsi: hook up dx10 gather4 opcode Trivial. We already support tg4 for legacy tex opcodes, so the actual texture sampling code already handles it. (Just like TG4, we don't handle additional capabilities and always sample red channel.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2017-09-07 03:32:01 +02:00
Roland Scheidegger	de6810d9be	llvmpipe, draw: increase shader cache limits We're not particularly concerned with memory usage, if the tradeoff is shader recompiles. And it's common for apps to have a lot of shaders nowadays (and, since our shaders include a LOT of context state of course we may create quite a bit more shaders even). So quadruple the amount of shaders draw will cache (from 128 to 512). For llvmpipe (fs shaders) quadruple the number of instructions, keep the number of variants the same for now (only with very simple, non-texturing shaders the variant limit could really be reached), and simplify the definition, it's probably easier to just have one different definition per branch... Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2017-09-07 03:32:01 +02:00
Leo Liu	e1e3c0384b	radeon/uvd: fix the assertion check for YUYV format Fixes:7319ff87("radeon/uvd: add YUYV format support for target buffer") Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2017-09-06 15:53:18 -04:00
Tim Rowley	dad32fc61c	swr/rast: FE/Clipper - unify SIMD8/16 functions using simdlib types Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-09-06 11:02:36 -05:00

32650 commits