fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-02 12:18:09 +02:00

Author	SHA1	Message	Date
Bas Nieuwenhuizen	e56514f631	radeonsi: update predicate condition for compute dispatches Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen	c3083d841e	radeonsi: implement TGSI compute dispatch v2: - Use radeon_set_sh_reg_seq. - Set predicate bit for conditional rendering. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen	1349dd16ff	radeonsi: only emit compute shader state when switching shaders v2: - Do check if anything changed earlier - Use emitted_program instead of emitted_bo to prevent shaders with shader->bo = NULL confusing the check - Use radeon_set_sh_reg* Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen	ba1f66a73d	radeonsi: rework compute scratch buffer Instead of having a scratch buffer per program, have one per context. Also removed the per kernel wave count calculations, but that only helped if the total number of waves in the dispatch was smaller than sctx->scratch_waves. v2: Fix style issue. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen	107f4d3538	radeonsi: do per cs setup for compute shaders once per cs Also removes PKT3_CONTEXT_CONTROL as that is already being done by si_begin_new_cs, when emitting init_config. v2: - Use radeon_set_sh_reg_seq. - Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+ Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen	52d3584dec	radeonsi: don't pass scratch buffer to user SGPRs As far as I can see we use relocations for clover too. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen	422a19f76f	radeonsi: split input upload off from si_launch_grid Also uses a dynamically allocated buffer using u_upload_alloc. The old buffer per program approach required serializing all dispatches of the same program. v2: - Clarified commit message. - Use radeon_set_sh_reg_seq. - Also upload input buffer for clover kernels, even when input_size is 0, as it contains grid parameters. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen	898298efc9	radeonsi: implement TGSI compute shader creation v2: Moved scratch_enabled initialization after compile. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen	85fd7817ee	radeonsi: update shader count for compute shaders Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen	da88c2a8e8	radeonsi: set maximum work group size based on block size Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen	b082147b78	radeonsi: implement shared atomics v2: - Use single region - Use get_memory_ptr Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen	8acf3e501b	radeonsi: implement shared memory load/store v2: - Use single region - Combine address calculation Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-04-19 18:10:31 +02:00
Bas Nieuwenhuizen	84a6761ae3	radeonsi: add shared memory Declares the shared memory as a global variable so that LLVM is aware of it and it does not conflict with passes like AMDGPUPromoteAlloca. v2: - Use ctx->i8. - Dropped null-check for declare_memory_region. - Changed memory region array to single region. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen	753a3e472b	radeonsi: lower compute shader arguments Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen	008d977d01	radeonsi: Use CE for all descriptors. v2: Load previous list for new CS instead of re-emitting all descriptors. v3: Do radeon_add_to_buffer_list in si_ce_upload. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen	0b6c463dac	gallium/util: Add u_bit_scan_consecutive_range64. For use by radeonsi. v2: Make sure that it works for all 64 bits set. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen	058b54c624	radeonsi: Replace list_dirty with a mask. We can then upload only the dirty ones with the constant engine. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen	aabc7d61d6	radeonsi: Add CE uploader. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen	0d7ddd6819	radeonsi: Allocate chunks of CE ram. v2: Use 32 byte alignment. v3: Don't allocate CE space for vertex buffer descriptors. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen	86c71ff989	radeonsi: Add CE synchronization. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen	fe1ef23b66	radeonsi: Add CE packet definitions. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen	8fee75d606	radeonsi: Create CE IB. Based on work by Marek Olšák. v2: Add preamble IB. Leaves the load packet in the space calculation as the radeon winsys might not be able to support a preamble. The added space calculation may look expensive, but is converted to a constant with (at least) -O2 and -O3. v3: - Fix code style. - Remove needed space for vertex buffer descriptors. - Fail when the preamble cannot be created. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-04-19 18:10:30 +02:00
Bas Nieuwenhuizen	7201230582	winsys/amdgpu: Enlarge const IB size. Necessary to prevent performance regressions due to extra flushing. Probably should enlarge it even further when also updating uniforms through the CE, but this seems large enough for now. v2: Add preamble IB. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-04-19 18:10:30 +02:00
Marek Olšák	7997b5f005	winsys/amdgpu: Add support for const IB. v2: Use the correct IB to update request (Bas Nieuwenhuizen) v3: Add preamble IB. (Bas Nieuwenhuizen) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-04-19 18:10:30 +02:00
Marek Olšák	e78170f388	winsys/amdgpu: split IB data into a new structure in preparation for CE Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2016-04-19 18:10:30 +02:00
Marek Olšák	f4b77c764a	gallium/radeon: move ring_type into winsyses Not used by drivers. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2016-04-19 18:10:30 +02:00
Jose Fonseca	1d2ac7a7ca	llvmpipe: Call LLVMShutdown before exiting. So that LLVM frees its globals. Trivial.	2016-04-19 12:10:09 +01:00
Jose Fonseca	524042fa35	llvmpipe: Avoid LLVMGetGlobalContext in tests. Trivial.	2016-04-19 12:10:02 +01:00
Jose Fonseca	bb9e8c5090	llvmpipe: Skip false exp2 failure in lp_test_arit due to buggy MSVCRT. 64bits MSVCRT's exp2f(-inf) returns -inf instead of 0. Tested with MSVC 2013's CRT. (I haven't tried 2015 yet.) Also this does not happen with MinGW. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-04-19 11:31:53 +01:00
Jose Fonseca	ee9876be1d	llvmpipe: Test more vector lengths. All powers of two up to the native vector length. There is actually a bug in lp_build_round for v2, whereby it doesn't round to nearest. Fixing is left to the future, but the test is now able to expect it to fail. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-04-19 11:31:44 +01:00
Jose Fonseca	932b71f17d	gallivm: Avoid llvm::sys::getProcessTriple(). Just use LLVM_HOST_TRIPLE, which is available at least from LLVM 3.3 onwards, and is pretty much what llvm::sys::getProcessTriple() does anyway. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-04-19 11:31:37 +01:00
Jose Fonseca	b5ca689cee	gallivm: Remove lp_get_module_id. Just keep a copy of the module_name in gallivm. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-04-19 11:31:26 +01:00
Jose Fonseca	969ba8bfa7	gallivm: Fix MCJIT with LLVM 3.3. One needs to call setJITMemoryManager for LLVM 3.3, instead of setMCJITMemoryManager. This regressed in commits 065256df/75ad4fe7 when trying to make the code build with LLVM 3.6. Tested MCJIT with LLVM 3.3 to 3.6. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-04-19 11:31:17 +01:00
Jose Fonseca	cf4105740f	gallivm: Make MCJIT a runtime option. On the LLVM versions that support it, so we can easily switch between MCJIT/old-jit for testing. The new option is GALLIVM_MCJIT. Unfortunately setting GALLIVM_MCJIT=1 for LLVM 3.3 or 3.4 causes a segfault, both on Linux and Windows. I'm almost certain this used to work, so there probably is a regression somewhere. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-04-19 11:31:14 +01:00
Jose Fonseca	7d2151b6ea	scons: Show the unit test full path. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-04-19 11:31:11 +01:00
Jose Fonseca	2211f8d559	gallivm: Use LLVMSetTarget. Instead of LLVM C++ interfaces. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-04-19 11:31:00 +01:00
Jose Fonseca	9aa23b11e4	gallivm: Use LLVMPrintValueToString where available. And llvm::raw_string_ostream where not (LLVM 3.3). Thereby eliminating yet another dependency on unstable LLVM interfaces. As a bonus this also gets LLVM IR on OutputDebugMessageA on MSVC (which was disabled, probably due to C++ issues.) Tested `lp_test_arit -v -v` on LLVM 3.3, 3.4 and 3.8. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-04-19 11:28:37 +01:00
Jose Fonseca	f6621cd3be	gallium/tests: Update UTIL_FORMAT_MAX_* defines. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-04-19 11:28:16 +01:00
Jose Fonseca	121a0cedc8	Revert "nv50/ra: `isinf()` is in namespace `std` since C++11." This reverts commit `f525db6358`. It was superseded by commit `649704f1f7`.	2016-04-19 11:22:45 +01:00
Eric Anholt	802b9292aa	vc4: Fix fbo-generatemipmap-formats for NPOT. Single-sampled texture miplevels > 1 are stored in POT-aligned areas, but we only get one value to control the stride of the src and dst for single sampled buffers. A RCL tile blit from level != 1 to level == 0 would therefore load from the wrong stride.	2016-04-18 16:55:36 -07:00
Eric Anholt	2402bb6095	vc4: Remove unused "immediates" field This was for TGSI, which we no longer have to deal with.	2016-04-18 16:48:45 -07:00
Ben Widawsky	2408899cb2	i965: Define miptree map functions static (trivial) They were already declared as such. It was changed here: commit `31f0967fb5` Author: Ian Romanick <ian.d.romanick@intel.com> Date: Wed Sep 2 14:43:18 2015 -0700 i965: Make intel_miptree_map_raw static Cc: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>	2016-04-18 16:12:13 -07:00
Matt Turner	b1d9353cb5	glsl: Properly handle ldexp(0.0f, non-zero-exp).	2016-04-18 15:48:54 -07:00
Dave Airlie	3a26ef23e7	gallivm: convert size query to using a set of parameters. This isn't currently that easy to expand, so fix it up before expanding it later to include dynamic samplers. [airlied: use some local variables (Roland)] Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-04-19 07:33:39 +10:00
Tim Rowley	3227c10270	swr: dereference cbuf/zbuf/views on context destroy Fixes resource memory leaks. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-04-18 15:52:26 -05:00
Rob Clark	77a9107bf2	freedreno/ir3: fix grouping issue w/ reverse swizzles When we have something like: MOV OUT[n], IN[m].wzyx the existing grouping code was missing a potential conflict. Due to input needing to be sequential scalar regs, we have: IN: x <-> y <-> z <-> w which would be grouped to: OUT: w <-> z2 <-> y2 <-> x (where the 2 denotes a copy/mov) but that can't actually work. We need to realize that x and w are already in the same chain, not just that they aren't both already in new chain being built. With this fixed, we probably no longer need the hack from `f68f6c0`. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-04-18 15:41:32 -04:00
Marek Olšák	ed66c75784	radeonsi: use enums in si_shader.h Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-18 19:51:25 +02:00
Marek Olšák	0c52caf7b7	gallium/radeon: use enums in r600_query.h Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-18 19:51:25 +02:00
Marek Olšák	dd9ca77cb9	radeonsi: always use PFP_SYNC_ME when doing flushes and waits This is typically used by the closed driver before SURFACE_SYNC. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-18 19:51:25 +02:00
Marek Olšák	1db5678688	radeonsi: don't do VS/PS partial flushes if SURFACE_SYNC waits too Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-04-18 19:51:25 +02:00
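
Commit 0b6c463dac above adds u_bit_scan_consecutive_range64 and notes that v2 makes it work when all 64 bits are set. The following is a minimal sketch of that kind of range scan, written under assumptions rather than copied from the commit: the names scan_consecutive_range64 and lowest_set_bit64 are hypothetical, and the portable loop stands in for whatever bit-scan primitive the real helper uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from Mesa): index of the lowest set bit.
 * The argument must be non-zero. */
static int lowest_set_bit64(uint64_t v)
{
    int i = 0;
    assert(v != 0);
    while ((v & 1) == 0) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Hypothetical sketch: remove the lowest run of consecutive set bits
 * from *mask and report where it starts and how long it is.
 * *mask must be non-zero on entry. */
static void scan_consecutive_range64(uint64_t *mask, int *start, int *count)
{
    assert(*mask != 0);

    /* All 64 bits set: handled separately, because the general path
     * would need a shift by 64, which is undefined in C. */
    if (*mask == UINT64_MAX) {
        *start = 0;
        *count = 64;
        *mask = 0;
        return;
    }

    *start = lowest_set_bit64(*mask);
    /* After shifting the run down to bit 0, the first zero bit marks
     * the end of the run; inverting turns it into the lowest set bit. */
    *count = lowest_set_bit64(~(*mask >> *start));
    /* Clear the run that was just reported. */
    *mask &= ~(((UINT64_C(1) << *count) - 1) << *start);
}
```

Calling this repeatedly on the mask 0xF3 would report the range (start 0, count 2) and then (start 4, count 4), leaving the mask empty.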

80324 commits