fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-08 15:38:09 +02:00

Author	SHA1	Message	Date
Connor Abbott	49503ae74e	st/nir: Don't lower indirects when linking I believe this was stuck here early because otherwise nir_opt_copy_prop_vars could undo what lower_io_to_temporaries does. However that has since been fixed. Also, we now use scratch for large variables so the comment is stale. On radeonsi these are the shader-db results: Totals: SGPRS: 3955968 -> 3955968 (0.00 %) VGPRS: 2220208 -> 2220220 (0.00 %) Spilled SGPRs: 11387 -> 11387 (0.00 %) Spilled VGPRs: 97 -> 97 (0.00 %) Private memory VGPRs: 2528 -> 2528 (0.00 %) Scratch size: 2656 -> 2656 (0.00 %) dwords per thread Code Size: 76002108 -> 76002204 (0.00 %) bytes LDS: 740 -> 740 (0.00 %) blocks Max Waves: 772779 -> 772776 (-0.00 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 176 -> 176 (0.00 %) VGPRS: 144 -> 156 (8.33 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 12104 -> 12200 (0.79 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 28 -> 25 (-10.71 %) Wait states: 0 -> 0 (0.00 %) The few small regressions are due to nir_opt_large_constants kicking in when indirect lowering happens to result in smaller code after optimization since the array is very simple. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-09-05 12:38:22 +02:00
Connor Abbott	7d2d7b5d5f	st/nir: Call nir_remove_unused_variables() in the opt loop This prevents regressions when disabling indirect lowering. Sometimes the only use of an input array was copying it to the array created by nir_lower_io_to_temporaries, and without lowering indirects we wouldn't have eliminated the temporary array until after linking, which was too late to remove unused code in the producer. No shader-db changes with radeonsi NIR. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-09-05 12:37:28 +02:00
Connor Abbott	71a6794200	ac/nir: Enable nir_opt_large_constants vkpipeline-db numbers: Totals: SGPRS: 1740306 -> 1741322 (0.06 %) VGPRS: 1331124 -> 1331712 (0.04 %) Spilled SGPRs: 21201 -> 21316 (0.54 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 256 -> 256 (0.00 %) dwords per thread Code Size: 79022628 -> 78694788 (-0.41 %) bytes LDS: 6500 -> 6500 (0.00 %) blocks Max Waves: 301413 -> 301302 (-0.04 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 53633 -> 54649 (1.89 %) VGPRS: 53000 -> 53588 (1.11 %) Spilled SGPRs: 3454 -> 3569 (3.33 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 5284232 -> 4956392 (-6.20 %) bytes LDS: 2 -> 2 (0.00 %) blocks Max Waves: 4239 -> 4128 (-2.62 %) Wait states: 0 -> 0 (0.00 %) (The biggest VGPR and max wave regression is due to unrolling a loop, which made the scheduler more aggressive, but in this case it's able to effectively hide latency so it's actually probably a win.) shader-db numbers with radeonsi NIR: Totals: SGPRS: 3526496 -> 3526512 (0.00 %) VGPRS: 2198576 -> 2198576 (0.00 %) Spilled SGPRs: 10463 -> 10463 (0.00 %) Spilled VGPRs: 86 -> 86 (0.00 %) Private memory VGPRs: 3182 -> 2528 (-20.55 %) Scratch size: 3308 -> 2640 (-20.19 %) dwords per thread Code Size: 74117280 -> 74106140 (-0.02 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 775846 -> 775844 (-0.00 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 856 -> 872 (1.87 %) VGPRS: 680 -> 680 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 654 -> 0 (-100.00 %) Scratch size: 668 -> 0 (-100.00 %) dwords per thread Code Size: 49652 -> 38512 (-22.44 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 182 -> 180 (-1.10 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-09-05 12:21:46 +02:00
Connor Abbott	91626d0865	ac/nir: Support load_constant intrinsics Set up a constant global variable that LLVM will stick in a .rodata section and generate PC-relative loads for. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-09-05 12:21:42 +02:00
Connor Abbott	5dadbabb47	radv/radeonsi: Don't count read-only data when reporting code size We usually use these counts as a simple way to figure out if a change reduces the number of instructions or shrinks an instruction. However, since .rodata sections aren't executed, we shouldn't be counting their size for this analysis. Make the linker return the total executable size, and use it to report the more useful size in both drivers. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-09-05 12:21:35 +02:00
Heinrich Fink	5cc7cc5f17	headers: remove redundant GL token from GL wrapper Removing GL_FRAMEBUFFER_FLIP_Y_MESA token from glheader.h as it is now provided by glext.h Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-09-05 09:26:35 +02:00
Heinrich Fink	e2c88b7cd6	specs: Sync framebuffer_flip_y text with GL registry Sync extension spec of MESA_framebuffer_flip_y to what has been merged upstream in the GL registry. Update now carries the accepted GL extension no. v2: split GL headers update off to separate commit Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-09-05 09:26:30 +02:00
Heinrich Fink	c9a3f4fe40	include: sync GL headers with registry Integrating headers from upstream registry [0] master branch. Effective GL registry commit integrated: 9d534f9312e56c72df763207e449c6719576fd54 Keeping the following quirks local to Mesa: - glext.h: BUILDING_MESA guard (see !1492) - glxext.h: glXQueryGLXPbufferSGIX: 'int' return type (Mesa) vs. 'void' (GL registry) - glxext.h: GLX_RENDERER_ID_MESA is still expected by some mesa tests, even though its token has been removed from the spec (see docs/specs/MESA_query_renderer.spec) - glxext.h: glXGetTransparentIndexSUN / PFNGLXGETTRANSPARENTINDEXSUNPROC argument pTransparentIndex has type 'unsigned long *' (Mesa) vs. 'long *' (GL registry) [0] https://github.com/KhronosGroup/OpenGL-Registry Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-09-05 09:26:15 +02:00
Hal Gentz	55c912883c	clover: Fix build after clang r370122. ../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp: In function ‘std::unique_ptr<clang::CompilerInstance> {anonymous}::create_compiler_instance(const clover::device&, const std::vector<std::__cxx11::basic_string<char> >&, std::string&)’: ../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp:203:81: error: no matching function for call to ‘clang::CompilerInvocation::CreateFromArgs(clang::CompilerInvocation&, const char* const*, const char* const*, clang::DiagnosticsEngine&)’ 203 | c->getInvocation(), copts.data(), copts.data() + copts.size(), diag)) | ^ In file included from /opt/llvm64/include/clang/Frontend/CompilerInstance.h:15, from ../mesa/src/gallium/state_trackers/clover/llvm/codegen.hpp:37, from ../mesa/src/gallium/state_trackers/clover/llvm/invocation.cpp:49: /opt/llvm64/include/clang/Frontend/CompilerInvocation.h:157:15: note: candidate: ‘static bool clang::CompilerInvocation::CreateFromArgs(clang::CompilerInvocation&, llvm::ArrayRef<const char*>, clang::DiagnosticsEngine&)’ 157 | static bool CreateFromArgs(CompilerInvocation &Res, | ^~~~~~~~~~~~~~ /opt/llvm64/include/clang/Frontend/CompilerInvocation.h:157:15: note: candidate expects 3 arguments, 4 provided Signed-off-by: Hal Gentz <zegentzy@protonmail.com> Reviewed-by: Aaron Watry <awatry@gmail.com>	2019-09-04 22:29:52 -05:00
Vinson Lee	e716a9e213	scons: Add coroutines component to build. Fixes: `d32690b43c` ("gallivm: add coroutine pass manager support") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-09-04 20:05:43 -07:00
Eric Anholt	cc3c217ce0	gallium/osmesa: Move 565 format selection checks where the rest are. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-09-04 16:43:36 -07:00
Eric Anholt	9e7eb9780a	gallium/osmesa: Fix a race in creating the stmgr. Noticed while looking at other OSMesa bugs. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-09-04 16:43:36 -07:00
Eric Anholt	281466332b	gallium/osmesa: Introduce a test. Given that we occasionally touch this code and probably nobody really wants to think about it, introduce a minimal test so that we know we haven't completely broken OSMesa. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-09-04 16:43:36 -07:00
Dylan Baker	d89d075589	docs: Mark 19.2.0-rc2 as done and push back rc3 and rc4/final	2019-09-04 16:00:02 -07:00
Hal Gentz	1591d1fee5	glx: Fix SEGV due to dereferencing a NULL ptr from XCB-GLX. When run in optirun, applications that linked to `libGLX.so` and then proceeded to query Mesa for extension strings caused a SEGV in Mesa. `glXQueryExtensionsString` was calling a chain of functions that eventually led to `__glXQueryServerString`. This function would call `xcb_glx_query_server_string` then `xcb_glx_query_server_string_reply`. The latter, for some unknown reason, returned `NULL`. Passing this `NULL` to `xcb_glx_query_server_string_string_length` would cause a SEGV as the function tried to dereference it. The reason behind the function returning `NULL` is yet to be determined; however, simply checking that the ptr is not `NULL` resolves this. A similar check has been added to `__glXGetString` for completeness' sake, although it is not immediately necessary. In addition to that, we stumbled into a similar problem in `AllocAndFetchScreenConfigs`, which tries to access the configs to free them if `__glXQueryServerString` fails. This, of course, SEGVs, because the configs have not yet been allocated. Simply continuing past the configs if their config ptrs are `NULL` resolves this. We also switch to `calloc` to make sure that the config ptrs are `NULL` by default, and not some uninitialized value. Cc: mesa-stable@lists.freedesktop.org Fixes: `24b8a8cfe8` "glx: implement __glXGetString, hide __glXGetStringFromServer" Fixes: `cb3610e37c` "Import the GLX client side library, formerly from xc/lib/GL/glx. Build it " Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Hal Gentz <zegentzy@protonmail.com>	2019-09-04 16:00:10 +00:00
Adam Jackson	9acb94b623	egl: Enable 10bpc EGLConfigs for platform_{device,surfaceless} It's somewhat annoying that these are so similar for so little benefit. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-09-04 11:39:57 -04:00
Neil Roberts	95927c414f	glsl: Store the precision for a function return type The precision for a function return type is now stored in ir_function_signature. This will later be useful to implement mediump to float16 lowering. In the meantime it is also useful to catch errors where a function is redeclared with a different precision. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-09-04 12:41:20 +02:00
Dave Airlie	3a7e92dac5	docs: add llvmpipe features for fb_no_attach and compute shaders Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	c0521ecffb	llvmpipe: enable compute shaders if LLVM has coroutines Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	6453a22612	llvmpipe: add local memory allocation path Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	4e70970507	llvmpipe: add compute shader parameter fetching support Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	0b51e73de2	llvmpipe: add compute shader images support Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	45a8cf95f2	llvmpipe: add ssbo support to compute shaders Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	6ea8e9b415	llvmpipe: add compute sampler + sampler view support. This is ported from the fragment shader code. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	4ca40cc3dc	llvmpipe: add support for compute constant buffers. This is mostly ported from the fragment shader code. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	775fa81d7b	llvmpipe: add compute pipeline statistics support. This just adds the CS invocations counter. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	50fde5b208	llvmpipe: add grid launch This adds the dispatch code. It creates a job for the number of blocks in the grid, and dispatches them to the threadpool implementation. The threadpool then calls the JIT code to execute the coroutines. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	b320830bbd	llvmpipe: add compute shader generation. This creates the coroutine execution environment and the main compute shaders that get executed inside it. Each compute shader block is executed in its own coroutine execution shader, with each "thread" being a coroutine executed inside it in sequence. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	6ea41df94c	llvmpipe: introduce variant building infrastructure. This doesn't actually build any of the shaders yet, but just builds up the framework necessary to start building the shaders and variants. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	fc01fafdbc	llvmpipe: introduce new state dirty tracking for compute. Compute doesn't share dirty state with the fragment pipeline so create a separate path for it. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	a6f6ca37c8	llvmpipe: add initial shader create/bind/destroy variants framework. This is mostly a port of the fragment shader framework Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	a792c5ae3e	llvmpipe: add compute debug option Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	25f46ae9aa	gallivm: add compute jit interface. This adds the jit interface for compute shaders, it's based on the fragment shader one. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	3879f69b50	llvmpipe: add initial compute state structs These mirror the fragment shader structs, this is just a framework. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	add0b151f5	llvmpipe: introduce compute shader context The compute shader will need its own context like the frag shader has, this just introduces the framework struct and allocates/frees for it in the right places. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	83597ad3f2	gallivm: add barrier support for compute shaders. When the executing code hits a barrier, it will suspend the coroutine and return control to the coroutine dispatcher. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	1b24e3ba75	llvmpipe: add compute threadpool + mutex Reviewed-by: Roland Scheidegger <sroland@vmware.com> In order to efficiently run a number of compute blocks, use a threadpool that just allows for jobs with unique sequential ids to be dispatched.	2019-09-04 15:22:20 +10:00
Dave Airlie	e5bf6b7013	gallivm: add support for compute shared memory Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	db6c78f9c8	gallivm: add new compute related intrinsics Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	3312bed7b0	llvmpipe: reorganise jit pointer ordering In order to share the texture/image/sampler code with compute shaders we need to reorganise them to be at the front of the context, the same as draw does for vs/gs sharing. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	d32690b43c	gallivm: add coroutine pass manager support coroutines require a proper pass manager, so add the passes to the correct places Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	9cf1340e4f	gallivm: add coroutine support files to gallivm. These wrap the coroutine intrinsics and also add some higher level wrappers around coroutine begin, end and suspend procedures Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	f3f0cbf4f4	gallivm/flow: add counter reset for loops This allows the counter value to be forced to a certain value Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Dave Airlie	6b3c6b91a8	llvmpipe: enable fb no attach Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-09-04 15:22:20 +10:00
Kenneth Graunke	f8887909c6	iris: Report correct number of planes for planar images We were only handling the modifiers case and not counting the number of planes in actual planar images. Fixes Piglit's ext_image_dma_buf_import-export. Fixes: `fc12fd05f5` ("iris: Implement pipe_screen::resource_get_param") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111509 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-09-03 21:55:23 -07:00
Ilia Mirkin	32d458fdff	teximage: ensure that TexSubImage checks format We were previously not doing at least some of the checks. This uses the same logic that is used in glTexImage*. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-09-04 00:35:45 -04:00
Jan Beich	8e92ce9ba5	gallium/hud: add CPU usage support for DragonFly/NetBSD/OpenBSD Each BSD has slightly different sysctl for retrieving per-CPU times. FreeBSD returns long while NetBSD returns uint64_t. On OpenBSD return type differs between summation and per-CPU times. DragonFly is compatible with FreeBSD. Signed-off-by: Jan Beich <jbeich@FreeBSD.org>	2019-09-03 22:53:15 -04:00
Roman Stratiienko	ef621a73f7	lima: Return fence unconditionally Based on the vc4 implementation. Fixes Android RenderEngine::flush() routine: android.googlesource.com/platform/frameworks/native/+/refs/tags/android-o-mr1-iot-release-smart-clock-fcs/services/surfaceflinger/RenderEngine/RenderEngine.cpp#225 Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-09-04 00:32:04 +00:00
Vasily Khoruzhick	1c1890fa70	lima/ppir: clone uniforms and load_coords into each successor Try a more aggressive approach with cloning uniform and coord loads. A uniform load can be inserted into any instruction, so let's do that. The ARM site claims that the penalty for a cache miss is one clock, so we don't lose anything if we merge it into the instruction that uses the result. As a side effect, we can also pipeline it and thus decrease register pressure. Do the same for varyings that hold texture coords, but for a different reason: it looks like there's a special path for coords that increases precision if the varying that holds them is pipelined. If we don't pipeline it and load coords from a register, its precision is fp16 and thus only 10 bits, which is not enough to accurately sample textures of size 1024 or larger. Since an instruction can hold only one uniform load and one varying load, node_to_instr now creates a move, using the helper introduced in the previous commit, if the slot is already taken. As a side effect of this change we can also try to pipeline texture loads and create a move if the attempt fails. Reviewed-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-09-04 00:02:13 +00:00
Vasily Khoruzhick	e23fd2c375	lima/ppir: don't assume that load coords gets value from register It can load the value from a varying directly as well. Also, load_regs is the only op that has a source, so add a src_num field to the load node and set it accordingly. Reviewed-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-09-04 00:02:13 +00:00
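
Commit 5dadbabb47 makes the reported code size count only executable sections, so .rodata no longer inflates it. A minimal sketch of that distinction follows; `struct section_info`, `total_size()` and `exec_size()` are hypothetical stand-ins, not Mesa's linker interface, and a real ELF reader would test the `SHF_EXECINSTR` section flag instead of a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a linked section: its size and
 * whether it holds executable code (SHF_EXECINSTR in real ELF). */
struct section_info {
   const char *name;
   uint64_t size;
   bool executable;
};

/* Sum every section, including read-only data such as .rodata. */
static uint64_t total_size(const struct section_info *s, size_t n)
{
   uint64_t sum = 0;
   for (size_t i = 0; i < n; i++)
      sum += s[i].size;
   return sum;
}

/* Sum only executable sections: data that is never executed does not
 * count toward the size used to compare instruction counts. */
static uint64_t exec_size(const struct section_info *s, size_t n)
{
   uint64_t sum = 0;
   for (size_t i = 0; i < n; i++)
      if (s[i].executable)
         sum += s[i].size;
   return sum;
}
```

With a 4096-byte .text and a 512-byte .rodata, `total_size()` reports 4608 bytes while `exec_size()` reports 4096, which is the figure that tracks instruction count.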

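A sketch of the two defensive patterns described in commit 1591d1fee5: checking a reply pointer for `NULL` before reading from it, and allocating a pointer array with `calloc()` so that entries that were never filled in are `NULL` and can be skipped during cleanup. The `fake_reply` and `fake_screen` types and both functions are hypothetical; the real code goes through XCB-GLX reply types and accessors, which are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a server-string reply. */
struct fake_reply {
   int length;
   const char *string;
};

/* Hypothetical stand-in for a per-screen record owning its configs. */
struct fake_screen {
   char *configs;
};

/* Copy the server string out of a reply. A NULL reply yields NULL
 * instead of being dereferenced (the guard the fix adds). */
static char *copy_server_string(const struct fake_reply *reply)
{
   if (reply == NULL)
      return NULL;
   char *buf = malloc((size_t)reply->length + 1);
   if (buf == NULL)
      return NULL;
   memcpy(buf, reply->string, (size_t)reply->length);
   buf[reply->length] = '\0';
   return buf;
}

/* Free screens from an array allocated with calloc(). Slots that were
 * never allocated are NULL, so they are skipped rather than having
 * their ->configs field read through a bogus pointer. */
static void free_screens(struct fake_screen **screens, int n)
{
   for (int i = 0; i < n; i++) {
      if (screens[i] == NULL)
         continue;
      free(screens[i]->configs);
      free(screens[i]);
   }
   free(screens);
}
```

The `calloc()` half matters as much as the check: with `malloc()`, the unfilled slots would hold indeterminate values and the `NULL` test in `free_screens()` could not be trusted.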

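Commits b320830bbd and 83597ad3f2 describe running a compute block's invocations as coroutines executed in sequence, with a barrier suspending the current coroutine and returning control to the dispatcher. The sketch below emulates that scheduling with a hand-written, switch-based state machine; llvmpipe itself generates LLVM coroutine intrinsics, and `struct invocation`, `run_invocation()` and `run_block()` are hypothetical names.

```c
#include <assert.h>

#define NUM_THREADS 4

/* Hypothetical per-invocation state: a resumable "coroutine" written
 * as an explicit state machine instead of LLVM coroutines. */
struct invocation {
   int id;
   int resume_point;   /* 0 = start, 1 = after barrier, 2 = done */
};

static int shared_mem[NUM_THREADS];   /* stands in for shared memory */
static int results[NUM_THREADS];

/* Run one invocation until it reaches the barrier or finishes.
 * Returns 1 if it suspended at the barrier, 0 once it is complete. */
static int run_invocation(struct invocation *inv)
{
   switch (inv->resume_point) {
   case 0:
      shared_mem[inv->id] = inv->id * 10;   /* phase 1: write */
      inv->resume_point = 1;
      return 1;                             /* barrier: suspend */
   case 1:
      /* Phase 2: read a neighbour's value. Safe because every
       * invocation finished phase 1 before any was resumed. */
      results[inv->id] = shared_mem[(inv->id + 1) % NUM_THREADS];
      inv->resume_point = 2;
      return 0;
   default:
      return 0;
   }
}

/* Dispatcher: run every invocation in sequence, and keep looping
 * while any of them suspended at a barrier. */
static void run_block(void)
{
   struct invocation invs[NUM_THREADS];
   for (int i = 0; i < NUM_THREADS; i++)
      invs[i] = (struct invocation){ i, 0 };

   int suspended;
   do {
      suspended = 0;
      for (int i = 0; i < NUM_THREADS; i++)
         suspended |= run_invocation(&invs[i]);
   } while (suspended);
}
```

Because no invocation is resumed until all of them have suspended at the barrier, each one sees its neighbour's phase-1 write in phase 2: after `run_block()`, `results[0]` is 10 and `results[3]` is 0.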
115089 commits
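
The precision argument in commit 1c1890fa70 can be checked with a little arithmetic. This sketch assumes the unpipelined coordinate path stores IEEE 754 binary16 values (the commit only says "fp16" with 10 bits) and looks at normalized coordinates in [0.5, 1), where the fp16 grid is coarsest below 1.0; `fp16_spacing()` and `fp16_steps_per_texel()` are illustrative names.

```c
#include <assert.h>

/* Distance between adjacent normal binary16 values with exponent e,
 * i.e. values in [2^e, 2^(e+1)). binary16 stores 10 mantissa bits,
 * so the step is 2^(e - 10). Built by exact halving/doubling so no
 * libm call is needed. */
static double fp16_spacing(int e)
{
   double step = 1.0;
   for (int shift = e - 10; shift < 0; shift++)
      step /= 2.0;
   for (int shift = e - 10; shift > 0; shift--)
      step *= 2.0;
   return step;
}

/* Number of distinct binary16 coordinates inside one texel for
 * normalized coordinates in [0.5, 1), where the exponent is -1. */
static double fp16_steps_per_texel(int texture_size)
{
   double texel_width = 1.0 / texture_size;
   return texel_width / fp16_spacing(-1);
}
```

The spacing in [0.5, 1) is 2^-11, so a 1024-texel texture gets only two representable coordinates per texel near its far edge, and at 4096 texels adjacent fp16 values are two texels apart. That is consistent with the commit's observation that the register path is not accurate enough for textures of 1024 texels or larger.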