fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-06-06 12:48:16 +02:00

Author	SHA1	Message	Date
Samuel Pitoiset	e01a482182	nvc0: invalidate textures/samplers between 3D and CP on Fermi Like constant buffers, samplers and textures are aliased on Fermi and we need to invalidate the state when switching from 3D to CP and vice versa. This fixes rendering issues in the UE4 demos. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-05-26 23:51:22 +02:00
Jason Ekstrand	32210dea8e	compiler: Move glsl_to_nir to libglsl.la Right now libglsl.la depends on libnir.la so putting it in libnir.la adds a dependency on libglsl.la that goes the wrong direction. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2016-05-26 14:13:38 -07:00
Bas Nieuwenhuizen	43d7305a40	radeonsi: Allow TES distribution between shader engines. The R_028B50_VGT_TESS_DISTRIBUTION value is copied from amdgpu-pro. Smaller values in the ACCUM fields seem to decrease the performance advantage from this patch, higher values don't seem to matter. v2: Add distribution mode field enums. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	f91c85b29b	radeonsi: Process multiple patches per threadgroup. Using more than 1 wave per threadgroup does increase performance generally. Not using too many patches per threadgroup also increases performance. Both catalyst and amdgpu-pro seem to use 40 patches as their maximum, but I haven't really seen any performance increase from limiting the number of patches to 40 instead of 64. Note that the trick where we overlap the input and output LDS does not work anymore as the insertion of the tess factors changes the patch stride. v2: - Add comment about LDS assumptions. - Add constant for buffer size. - Fix code style. v3: - Correct limits for not splitting patches between waves. - Set max num_patches to 40 as in the proprietary driver. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	fd0a7a382f	radeonsi: Add barrier before writing the tess factors. The factors may be stored to LDS by another invocation than the invocation for vertex 0. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	fee3160af9	radeonsi: Enable dynamic HS. This allows running the TES on different CU's than the TCS which results in performance improvements. v2: Only write the control word from one invocation. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	26f436132b	radeonsi: Remove LDS layout user SGPR's from TES. They are unused. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	a4e2146a9d	radeonsi: Use buffer loads and stores for passing data from TCS to TES. We always try to use 4-component loads, as LLVM does not combine loads and they bypass the L1 cache. We can't use a similar strategy for stores and this is especially notable with the tess factors, as they are often set with separate MOV's per component in the TGSI. We keep storing to LDS and the LDS space, so we can load the outputs later, either due to the shader, or for writing the tess factors. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	6217716e8f	radeonsi: Store inputs to memory when not using a TCS. We need to copy the VS outputs to memory. I decided to do this using a shader key, as the value depends on other shaders. I also switch the fixed function TCS over to monolithic, as otherwise many of the user SGPR's need to be passed to the epilog, which increases register pressure, or complexity to avoid that. The main body of the fixed function TCS is not that interesting to precompile anyway, since we do it on demand and it is very small. v2: Use u_bit_scan64. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	7846fa8768	radeonsi: Add offchip buffer address calculation. Instead of creating a memory area per patch and per vertex, we put the same attribute of every vertex & patch together. Most loads and stores access the same attribute across all lanes, only for different patches and vertices. For the TCS this results in tightly packed data for 4-component stores. For the TES this is not the case as within a patch the loads often also access the same vertex. However if there are < 4 vertices/patch, this still results in a reduction of the number of cache lines. In the LDS situation we only do better than worst case if the data per patch < 64 bytes, which due to the tessellation factors is pretty much never. We do not use hardware swizzling for this. It would slightly reduce the number of executed VALU instructions, but I had issues with increased wait times that I haven't been able to solve yet. Furthermore, the tbuffer_store intrinsic does not support both VGPR offset and an index, so we have a problem storing indirectly indexed outputs. This can be solved by temporarily storing arrays in LDS and then copying them, but I don't think that is worth the effort. The difference in VALU cycles hardware swizzling gives is about 0.2% of total busy cycles. That is without handling the array case. I chose for attributes instead of components as they are often accessed together, and the software swizzling takes VALU cycles for calculating offsets. v2: - Rename functions to get_tcs_tes_buffer_address. - multiply by 16 as late as possible. - Use tgsi_full_src_register_from_dst. - Remove some bad comments. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	c49e68dc4b	radeonsi: Add user SGPR for the layout of the offchip buffer. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	d9a0c54f6f	radeonsi: Use correct parameter index for LS_OUT_LAYOUT. This happens to be in the right position, but that changes when TCS/TES get new parameters. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	3e7a7a9a65	radeonsi: Add buffer load functions. v2: - Use llvm.amdgcn.buffer.load intrinsics for new LLVM. - Code style fixes. v3: - Code style fix. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	9fdb778702	radeonsi: Define build_tbuffer_store_dwords earlier to support new users. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	5c34562d7c	radeonsi: Add offchip tessellation parameters. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Bas Nieuwenhuizen	d27ff7d683	radeonsi: Add buffer for offchip storage between TCS and TES. The buffer is quite large, but should only be allocated if the application uses tessellation. Most non-games don't. v2: - Use the correct register for SI. - Add define for block size. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-26 22:07:04 +02:00
Rob Clark	6e51fe75a4	tgsi: fix coverity out-of-bounds warning CID 1271532 (#1 of 1): Out-of-bounds read (OVERRUN)34. overrun-local: Overrunning array of 2 16-byte elements at element index 2 (byte offset 32) by dereferencing pointer &inst.Dst[i]. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-05-26 15:17:49 -04:00
Rob Clark	3d66ba971e	tgsi: fix out of bounds access Not sure why coverity calls this an out-of-bounds read vs out-of-bounds write. CID 1358920 (#1 of 1): Out-of-bounds read (OVERRUN)9. overrun-local: Overrunning array r of 3 16-byte elements at element index 3 (byte offset 48) using index chan (which evaluates to 3). Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-05-26 15:17:49 -04:00
Samuel Pitoiset	c52e92ec3a	nvc0: allow to monitor MP perf counters with compute shaders To read out MP perf counters we use a compute shader and need to upload input data like a 64-bits addr used to store the values and a sequence ID for synchronization. Currently, this input data is uploaded as user uniforms which means that it's stuck to c0[], but if a compute shader from a real application is used, monitoring those performance counters will just overwrite some data and miserably crash. Instead, sticking the 64-bits addr and the sequence into the driver constant buffer seems much better and will allow to monitor counters with GL 4.3 apps. Tested on GF119 and GK110, but should not hurt anything on GK104. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-05-26 19:34:57 +02:00
Marek Olšák	8539c9bf31	gallium/radeon: add the kernel version into the renderer string Example: Gallium 0.4 on AMD TONGA (DRM 3.2.0 / 4.5.0, LLVM 3.9.0) My kernel version is pretty long already (4.5.0-amd-01025-g32791c1) and adding "kernel" into the string would make it too long for glxinfo to display. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2016-05-26 16:53:46 +02:00
Marek Olšák	53f33619a4	winsys/amdgpu: add back multithreaded command submission Ported from the initial amdgpu winsys from the private AMD branch. The thread creates the buffer list, submits IBs, and cleans up the submission context, which can also destroy buffers. 3-5% reduction in CPU overhead is expected for apps submitting a lot of IBs per frame. This is most visible with DMA IBs. v2: use a semaphore instead of a busy loop in amdgpu_ws_queue_cs add another amdgpu_cs_sync_flush call into amdgpu_bo_map Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-05-26 16:43:45 +02:00
Lars Hamre	c626a86586	gallium/tgsi: use _mesa_roundevenf in micro_rnd Fixes the following piglit tests (for softpipe): /spec/glsl-1.30/execution/built-in-functions/... fs-roundeven-float fs-roundeven-vec2 fs-roundeven-vec3 fs-roundeven-vec4 vs-roundeven-float vs-roundeven-vec2 vs-roundeven-vec3 vs-roundeven-vec4 /spec/glsl-1.50/execution/built-in-functions/... gs-roundeven-float gs-roundeven-vec2 gs-roundeven-vec3 gs-roundeven-vec4 Signed-off-by: Lars Hamre <chemecse@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-05-26 07:59:15 -06:00
Ilia Mirkin	f998e5dc6b	nvc0: add note about where the viewport mask would go Not piping this all the way through yet, but no better place to note this down. This can be used with NV_viewport_array2. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-05-26 08:46:29 -04:00
Ilia Mirkin	b634936d3b	nvc0: enable 32 textures on kepler+ For fermi, this likely will require use of linked tsc mode. However on bindless architectures, we can have as many as we want. As it stands, the AUX_TEX_INFO has 32 texture handles reserved. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-05-26 08:46:13 -04:00
Bruce Cherniak	c8835a5924	swr: [rasterizer] Correctly select optimized primitive assembly. Indexed primitives were always using cut-aware primitive assembly, whether primitive_restart was enabled or not. Correctly pass down primitive_restart and select optimized PA when possible. Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-05-25 18:47:16 -05:00
Rob Clark	231dcb19f9	freedreno/ir3: cmdline compiler for glsl Use glsl/libstandalone.la to add support for taking glsl src files (in addition to .tgsi) as input. Then glsl->nir and feed the result into the ir3 backend as normal. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-05-25 16:31:15 -04:00
Samuel Pitoiset	71c30bd87c	nvc0: add descriptions for hardware perf counters/metrics The GALLIUM_HUD does not yet expose a description for each event, but this might be useful for developers who want to have a long description of hw perf counters directly in the source code. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-05-25 21:06:49 +02:00
Giuseppe Bilotta	8c00fe3970	scons: whitespace cleanup This text transformation was done automatically via the following shell command: $ find -name SCons\* -exec sed -i s/\\s\\+$// '{}' \; Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-05-25 12:23:12 -06:00
Brian Paul	9690ab0cdf	tgsi: print TGSI_PROPERTY_NEXT_SHADER value as string, not an integer Print "GEOM" instead of "2", for example. v2: also update the text parsing code, per Ilia. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-25 07:21:23 -06:00
Brian Paul	2b773fcf00	tgsi: s/6/PIPE_SHADER_TYPES/ for tgsi_processor_type_names array size Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-25 07:21:23 -06:00
Emil Velikov	e384d75b12	mesa_glinterop: make GL interop version field bidirectional This allows clear and easy communication between the two. Caller: Requesting information (struct vN) Callee: I know how to deal with older version (vN-1) only. Here is your data and the version I support. Caller: Older version ? Sure I'll cap all access to the fields provided by the older version (vN-1) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Tom Stellard <thomas.stellard@amd.com>	2016-05-24 23:03:00 +01:00
Emil Velikov	0e983276b9	mesa_glinterop: drop mesa_glinterop_device_info::interop_version One cannot use a single version to control both export_in and export_out versions. Using this forces us to always extend/bump both structs at the same time. An alternative scheme is coming with next patch. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Tom Stellard <thomas.stellard@amd.com>	2016-05-24 23:03:00 +01:00
Emil Velikov	f8a114aa5c	st/dri: add note about GL interop version checks ... and make them more explicit. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Tom Stellard <thomas.stellard@amd.com>	2016-05-24 23:03:00 +01:00
Emil Velikov	923bdbf48c	mesa_glinterop: rename MESA_GLINTEROP_INVALID_{VALUE,VERSION} Be more explicit what it actually does. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Tom Stellard <thomas.stellard@amd.com>	2016-05-24 23:03:00 +01:00
Emil Velikov	c196de23ae	mesa_glinterop: s/struct_version/version/ OCD polish for consistency with other mesa interfaces. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Tom Stellard <thomas.stellard@amd.com>	2016-05-24 23:03:00 +01:00
Emil Velikov	cbf29d90ba	mesa_glinterop: use consistent naming scheme for GL interop Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Tom Stellard <thomas.stellard@amd.com>	2016-05-24 23:02:08 +01:00
Tim Rowley	0ceed1701d	swr: [rasterizer] remove containers.hpp Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-05-24 13:29:37 -05:00
Tim Rowley	1e3e22efb5	swr: [rasterizer core] remove utility dead code Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-05-24 13:29:29 -05:00
Tim Rowley	dc34479b8c	swr: [rasterizer core] buckets fixes 1. Don't clear bucket descriptions to fix issues with sim level buckets getting out of sync. 2. Close out threadviz file descriptors in ClearThreads(). 3. Skip buckets for jitter based buckets when multithreaded. We need thread local storage through llvm jit functions to be fixed before we can enable this. 4. Fix buckets StopCapture to correctly detect capture complete. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-05-24 13:29:21 -05:00
Tim Rowley	3074a2b4fa	swr: [rasterizer core] move centroid setup out of CalcCentroidBarycentrics Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-05-24 13:29:14 -05:00
Tim Rowley	9a2a4ecb39	swr: [rasterizer jitter] implement InstanceID/VertexID in fetch jit Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-05-24 13:28:47 -05:00
Ilia Mirkin	f236f1f506	nvc0: expose robust buffer access We apparently pass all the relevant CTS tests. There are probably some shortcomings, but they can be addressed down the line. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-05-23 22:22:05 -04:00
Kenneth Graunke	70048eb1e3	gallium: Add a pipe cap for whether primitive restart works for patches. Some hardware supports primitive restart on patch primitives, and other hardware does not. Modern GL and ES include a query for this feature; adding a capability bit will allow us to answer it. As far as I know, AMD hardware does not support this feature, while NVIDIA and Intel hardware does. However, most Gallium drivers do not appear to support tessellation shaders yet. So, I've enabled it for nvc0 and disabled it everywhere else. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-23 16:44:11 -07:00
Rob Clark	46ff17559b	freedreno/ir3: disable cp for indirect src's The variable-indexing tests always had a few random fails, which I usually couldn't reproduce when running tests manually. Somehow recently this got a lot worse. I ported a couple of the shaders to GLES to see what blob does, and it also seems to be avoiding to cp indirect srcs. So I guess indirect w/ instructions other than cat1 (mov) are not totally reliable. Let's just switch that off until this is better understood. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-05-23 15:57:13 -04:00
Samuel Pitoiset	c3c4370299	nvc0: do not invalidate compute constbufs on Kepler Constbufs are only aliased on Fermi and this will reduce the number of flushes when we switch between 3d and compute. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-05-23 20:56:29 +02:00
Emil Velikov	a155cdaace	vl/drm: don't call close(-1) in vl_drm_screen_create error path Analogous to previous commits. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2016-05-23 12:07:47 +01:00
Emil Velikov	ed3f6ccce0	st/xa: don't call close(-1) in xa_tracker_create error path Analogous to previous commit. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2016-05-23 12:07:46 +01:00
Emil Velikov	6e00a1e6cb	st/dri: don't call close(-1) in dri{2, kms_}_init_screen error path Add separate labels and jump to the correct one as needed. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2016-05-23 12:07:46 +01:00
Rob Herring	e8431a630d	st/dri: Add support for DRIimage extension mapImage/unmapImage Implement support for mapImage/unmapImage functions in version 12 of the DRIimage extension. Signed-off-by: Rob Herring <robh@kernel.org> [Emil Velikov: align/indent the map/unmap vfuncs] Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2016-05-23 12:07:46 +01:00
Ilia Mirkin	74e71cbfcb	nv30: don't assert when running out of registers This happens with dEQP tests. The code doesn't at all protect against this condition, so while unhandled, this is an expected situation. Also avoid using more than the first 16 registers for nv3x vertex programs. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-05-22 22:57:18 -04:00

27433 commits