fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-29 18:28:14 +02:00

Author	SHA1	Message	Date
Axel Davy	3bf02d383f	st/nine: Partial software vertex processing support Software Vertex Processing allows: . Fewer limitations for shaders (more loops, etc.) . Fewer limitations for ff (more enabled lights, 255 matrices for VertexBlend) In particular, shaders can get more constants. This patch implements support for this (not using software rendering, but hardware rendering, as llvmpipe and dx10+ hw have the same limits...) This is considered a second-class path. Even apps asking for "Mixed Vertex processing" (i.e. the ability to switch to swvp on demand) do not use the feature much. Some just initialize more constants than the normal limit at the start of the application, but never use more than the normal limit. When the apps do not need the software vertex processing features, they do not seem to turn it on. This means it is ok if that path is slow. Thus no effort has been made to optimize the path. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	f8c8f44244	st/nine: Rework vs int and bool constants buffer This will help to support swvp constants. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	a83dce0128	st/nine: Change dirty tracking for vs int and bool constants This change makes it easier to introduce tracking for swvp constants. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	f78089b962	st/nine: Drop unused constant upload path This path has been disabled for some time because of some bugs with it. It hasn't been updated to the new features, and is not faster. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2016-10-10 23:43:49 +02:00
Axel Davy	1604efa6fd	st/nine: Add support for swvp constants in shaders swvp has relaxed limits (more nested loops, etc). In particular it enables more constants. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	56ea3df7d4	st/nine: Initial mixed vertex processing support In mixed vertex processing, the user can enable or disable software vertex processing. It is on hardware by default. This feature is not a state, and thus the setting doesn't need to be recorded by stateblocks. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	747f1ef8b6	st/nine: Implement SetNPatchMode Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	ded7a73eb3	st/nine: Implement D3DUSAGE_SOFTWAREPROCESSING Buffers with this flag must be usable with both software and hardware vertex processing. Use Staging for fast cpu access. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2016-10-10 23:43:49 +02:00
Patrick Rudolph	19703f2a36	st/nine: Allocate more space for ATI1 ATIx are "unknown" formats that do not follow block format conventions. Tests showed that pitch*height bytes are allocated. apitrace used to depend on this behaviour: it copied more bytes than it had to for the ATI1 block format, but it didn't crash on Windows. Increase the buffer size for ATI1 to fix the resulting crash. The same issue was present in WINE, and I have sent a patch for it. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
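The size difference can be sketched as follows, assuming ATI1 uses the BC4-style layout of 8 bytes per 4x4 block and that "pitch" means bytes per row of blocks. The helper names are hypothetical and do not come from the nine sources; the point is that counting height in pixels rather than in block rows allocates four times as much.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: bytes per row of 4x4 blocks, assuming 8 bytes per block. */
static size_t ati1_pitch(unsigned width)
{
    return (size_t)((width + 3) / 4) * 8;
}

/* Size under normal block-format conventions: pitch * rows of blocks. */
static size_t block_convention_size(unsigned width, unsigned height)
{
    return ati1_pitch(width) * ((height + 3) / 4);
}

/* Size matching the behaviour described in the commit: pitch * height,
 * with height counted in pixels instead of block rows. */
static size_t ati1_alloc_size(unsigned width, unsigned height)
{
    assert(width > 0 && height > 0);
    return ati1_pitch(width) * height;
}
```

For a 64x64 surface this gives 2048 bytes under block conventions and 8192 bytes with the pitch*height rule.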
Patrick Rudolph	ec6c636722	st/nine: Add missing break Add missing break instruction. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
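A missing `break` in C makes control fall through into the next `case`, so a later assignment silently overwrites the intended one. The sketch below is a generic, hypothetical illustration of that bug class; the enum and mask values are invented and are not the code this commit touched.

```c
#include <assert.h>

/* Hypothetical formats and channel masks, for illustration only. */
enum fmt { FMT_A8, FMT_L8, FMT_OTHER };

static int channel_mask(enum fmt f)
{
    int mask = 0;
    switch (f) {
    case FMT_A8:
        mask = 0x8;   /* alpha only */
        break;        /* without this, FMT_A8 falls through and gets 0x7 */
    case FMT_L8:
        mask = 0x7;   /* luminance replicated to RGB */
        break;
    default:
        mask = 0xf;
        break;
    }
    return mask;
}
```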
Axel Davy	03f60a3357	st/nine: Implement relative addressing for ps inputs To implement the feature we copy the ps inputs to a temp array. This is not optimal for performance, but it is the simplest solution. This feature is very rarely used. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	a5d308e51a	st/nine: Wait for pending tasks to execute in swapchain Fixes crash after Reset() when using thread_submit=true Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
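Waiting for pending tasks before Reset() tears down buffers is a classic "wait until idle" pattern. Below is a minimal sketch assuming a hypothetical counter of in-flight tasks guarded by a pthread mutex and condition variable; it is not nine's actual thread_submit implementation. A Reset() path would call `queue_wait_idle()` before freeing anything a worker might still touch.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical queue state: number of submitted but unfinished tasks. */
struct task_queue {
    pthread_mutex_t lock;
    pthread_cond_t idle;
    unsigned pending;
};

static void queue_submit(struct task_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->pending++;
    pthread_mutex_unlock(&q->lock);
}

/* Called by the worker when a task finishes; wakes waiters at zero. */
static void queue_task_done(struct task_queue *q)
{
    pthread_mutex_lock(&q->lock);
    assert(q->pending > 0);
    if (--q->pending == 0)
        pthread_cond_broadcast(&q->idle);
    pthread_mutex_unlock(&q->lock);
}

/* Blocks until no task is pending; loop guards against spurious wakeups. */
static void queue_wait_idle(struct task_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->pending != 0)
        pthread_cond_wait(&q->idle, &q->lock);
    pthread_mutex_unlock(&q->lock);
}
```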
Axel Davy	f090705075	st/nine: Use fixed size arrays for swapchain buffers Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Patrick Rudolph	a719800cb8	st/nine: Fix buffer count check for Ex devices Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	9ff0dc3129	st/nine: Disable seamless cubemap for d3d d3d9 doesn't have seamless cubemap. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	f0ec54ee32	st/nine: Fix some check flags Uses the new defines introduced in previous commit. See comment in the commit for more explanation. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:49 +02:00
Axel Davy	39e98d351f	st/nine: Unify some check flags The new defines will be reused in a later patch. Signed-off-by: Axel Davy <axel.davy@ens.fr>	2016-10-10 23:43:48 +02:00
Axel Davy	2290eac84e	gallium/util: Really allow aliasing of dst for u_box_union_* Gallium nine relies on aliasing to work with this function. Without this patch, dirty region tracking was incorrect, which could lead to incorrect textures or vertex buffers. Fixes several game bugs with nine. Fixes https://github.com/iXit/Mesa-3D/issues/234 Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-10-10 23:43:48 +02:00
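Allowing `dst` to alias an input means every input field must be read before any output field is written; otherwise, writing `dst->x` first would corrupt the value later used to compute the width. The sketch uses a hypothetical `struct box2d` rather than Mesa's `struct pipe_box`, and is not the actual `u_box_union_2d()` code.

```c
#include <assert.h>

/* Minimal stand-in for a 2D box; not Mesa's struct pipe_box. */
struct box2d { int x, y, width, height; };

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Writes the union of a and b to dst. dst may alias a or b because all
 * inputs are read into locals before any field of dst is written. */
static void box_union_2d(struct box2d *dst,
                         const struct box2d *a, const struct box2d *b)
{
    int x0 = imin(a->x, b->x);
    int y0 = imin(a->y, b->y);
    int x1 = imax(a->x + a->width, b->x + b->width);
    int y1 = imax(a->y + a->height, b->y + b->height);

    dst->x = x0;
    dst->y = y0;
    dst->width = x1 - x0;
    dst->height = y1 - y0;
}
```

Calling `box_union_2d(&d, &d, &e)` accumulates `e` into a dirty region `d` in place, which is the usage pattern the commit says nine relies on.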
Axel Davy	5e7f0ebe29	softpipe: Cap to 2 GB on 32 bits On 32-bit systems, application memory is quite limited. softpipe uses application memory. To help prevent memory exhaustion, limit reported memory availability to 2 GB. Some gallium nine apps check reported memory by allocating resources until memory is full. Gallium nine refuses allocations once 80% of the reported memory limit is used. This change helps some apps to start. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-10-10 23:43:48 +02:00
Axel Davy	814ca96d0d	llvmpipe: Cap to 2 GB on 32 bits On 32-bit systems, application memory is quite limited. llvmpipe uses application memory. To help prevent memory exhaustion, limit reported memory availability to 2 GB. Some gallium nine apps check reported memory by allocating resources until memory is full. Gallium nine refuses allocations once 80% of the reported memory limit is used. This change helps some apps to start. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-10-10 23:43:48 +02:00
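The two policies described in the softpipe and llvmpipe commits can be sketched together: cap the reported size at 2 GB on 32-bit builds, and refuse allocations past 80% of that figure. Both functions are hypothetical; the pointer width is passed as a parameter for testability, where real code would derive it from `sizeof(void *)`, and the exact boundary handling of nine's 80% check is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: cap reported memory to 2 GB when pointers are 32 bits. */
static uint64_t reported_memory_mb(uint64_t system_mb, unsigned pointer_bits)
{
    const uint64_t cap_mb = 2048;
    if (pointer_bits == 32 && system_mb > cap_mb)
        return cap_mb;
    return system_mb;
}

/* Hypothetical: allow an allocation only while total usage stays within
 * 80% of the reported limit (integer math avoids floating point). */
static bool allow_allocation(uint64_t used_mb, uint64_t request_mb,
                             uint64_t reported_mb)
{
    return (used_mb + request_mb) * 100 <= reported_mb * 80;
}
```

With an 8 GB machine on a 32-bit build, the driver reports 2048 MB, so nine stops accepting allocations at roughly 1638 MB in use.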
Axel Davy	218459771a	gallium/os: Fix overflow on 32 bits On systems with more than 4 GB of RAM, os_get_total_physical_memory triggered an integer overflow in the Linux and Haiku paths on 32-bit builds. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94561 Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-10 23:43:48 +02:00
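The overflow comes from multiplying a page count by a page size in a 32-bit type. This sketch models a 32-bit `long` with `uint32_t` so the wraparound is visible on any host (unsigned types are used because signed overflow is undefined behaviour in C). The helpers are hypothetical and are not the code of `os_get_total_physical_memory`; the fix pattern is to widen an operand to 64 bits before multiplying.

```c
#include <assert.h>
#include <stdint.h>

/* Product computed in 32 bits: wraps modulo 2^32 above 4 GB. */
static uint32_t total_bytes_32(uint32_t phys_pages, uint32_t page_size)
{
    return phys_pages * page_size;
}

/* Widen one operand first so the multiplication happens in 64 bits. */
static uint64_t total_bytes_64(uint32_t phys_pages, uint32_t page_size)
{
    return (uint64_t)phys_pages * page_size;
}
```

For 6 GB of 4 KiB pages (1572864 pages), the 32-bit product wraps to 2147483648 bytes, while the widened product gives the correct 6442450944.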
Axel Davy	9904581dc6	st/nine: Memset pipe_resource templates Fixes a regression introduced by `ecd6fce261` and is more future-proof than just clearing the next field. Other nine code paths already zeroed out the templates. Signed-off-by: Axel Davy <axel.davy@ens.fr> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-10-10 23:43:48 +02:00
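Zeroing a template before filling it protects against fields added to the struct later, which otherwise would hold stack garbage. The struct below is a hypothetical stand-in for something like `pipe_resource`; its field list is invented for illustration. It relies on all-bits-zero being a null pointer, which holds on the platforms Mesa targets.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical template struct; `next` plays the role of a newly added field. */
struct resource_template {
    unsigned target, format, width, height;
    unsigned bind, flags;
    struct resource_template *next;
};

static void init_template(struct resource_template *t,
                          unsigned width, unsigned height)
{
    /* Clear every field, including ones this code does not set explicitly. */
    memset(t, 0, sizeof(*t));
    t->width = width;
    t->height = height;
}
```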
Samuel Pitoiset	d43151318a	nvc0: fix valid range for shader buffers When offset != 0, the valid range was wrong because the second argument of util_range_add() is end, not size. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-10 21:32:16 +02:00
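The bug is an interface mismatch: the range helper takes a start and an end, so passing a size as the upper bound is only correct when the offset is 0. The sketch uses a hypothetical `struct range` and `range_add()` modelled on that (start, end) convention, not Mesa's actual `util_range` implementation.

```c
#include <assert.h>

/* Hypothetical valid-range tracker using a (start, end) convention. */
struct range { unsigned start, end; };

static void range_add(struct range *r, unsigned start, unsigned end)
{
    assert(start <= end);
    if (start < r->start)
        r->start = start;
    if (end > r->end)
        r->end = end;
}

/* Marks bytes [offset, offset + size) of a buffer as valid.
 * Passing `size` as the end bound would be wrong whenever offset != 0. */
static void mark_buffer_valid(struct range *r, unsigned offset, unsigned size)
{
    range_add(r, offset, offset + size);
}
```

An empty range starts as `{ ~0u, 0 }`, so the first `range_add()` call sets both bounds.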
Ilia Mirkin	5239bd5920	nvc0/ir: fix overwriting of value backing non-constant gather offset Normally the value is an immediate, which is moved to some temporary, so there's no problem. In the case of a non-constant offset (as allowed by ARB_gpu_shader5), we have to take care to copy it first before using it to build up the bits. This fixes a compilation error observed in F1 2015. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2016-10-10 14:28:32 -04:00
Ilia Mirkin	ec05331a7b	nv50/ir: only stick one preret per function A function with multiple returns would have had multiple preret settings at the top of the function. While this is unlikely to have caused issues since we don't use functions in earnest, it could have in some cases overflowed the call stack, in case a function had a lot of early returns. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-10 10:45:06 -04:00
Nicolai Hähnle	1f95121626	radeonsi: make more use of si_have_tgsi_compute Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-10 10:38:33 +02:00
Nicolai Hähnle	38cfd5160a	gallium/radeon: assign a name to LLVM output variables in debug builds This can be helpful with R600_DEBUG=preoptir. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-10 10:38:30 +02:00
Nicolai Hähnle	39a29c2431	gallium/radeon: avoid redundant work with overlapping in/out arrays Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-10 10:37:50 +02:00
Nicolai Hähnle	77c81164bc	radeonsi: support ARB_compute_variable_group_size Not sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch). Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-10-10 10:36:42 +02:00
Rob Clark	495ba8884a	gallium: add missing zero-init for resource templates Mostly test code, plus one spot I noticed in r600. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-07 15:50:46 -04:00
Rob Clark	3ebfc44b42	freedreno: don't try to shadow layered textures We will only hit this with multi-planar YUV external images, so we would probably never hit this code path in the first place. But if we did, it wouldn't do the right thing so just bail. Signed-off-by: Rob Clark <robdclark@gmail.com>	2016-10-07 15:50:46 -04:00
Rob Clark	f88f025e8c	freedreno/a3xx+a4xx: fix clip-plane lowering state If enabled clip-planes have changed, we need to mark program state dirty. Signed-off-by: Rob Clark <robdclark@gmail.com>	2016-10-07 15:50:46 -04:00
Eric Anholt	20d91e5ce9	vc4: Don't worry about partial Z/S clear if the other is already cleared. We have to be careful to not smash the value they're clearing to, but other than that we're fine. Avoids quad clears in Processing, which likes to do glClear(Z|S); glClear(Z). Improves performance of Processing's QuadRendering demo at 5000 quads by 5.46507% +/- 1.35576% (n=15 before, 32 after)	2016-10-06 18:29:16 -07:00
Eric Anholt	cb328123fe	vc4: Try to fix the HW-2116 workaround. We were incrementing the count at the end of vc4_start_draw(), except that that function returns immediately if we've already started drawing on this batch. It also failed to count the statechanges from the GFXH-515 workaround. This incidentally allows repeated glClear() to be coalesced, because the fast clears aren't counted in draw_calls_queued any more. Fixes most of the extra flushes in Processing, which emits glClear(Z|S); glClear(Z); glClear(C) during its frame setup. Improves performance of Processing's QuadRendering demo at 5000 quads by 3.33538% +/- 2.05846% (n=21 before, 15 after)	2016-10-06 18:29:12 -07:00
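The counting bug follows a common pattern: a function with an early return for the "already initialized" case only runs its trailing code once. A minimal sketch of the fix, counting in the caller, is shown below. The struct and function names are hypothetical and do not reflect vc4's actual code layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical batch state. */
struct batch {
    bool drawing_started;
    unsigned draw_calls_queued;
};

/* Returns early once drawing has started, so a counter incremented at the
 * end of this function would only ever count the first draw. */
static void start_draw(struct batch *b)
{
    if (b->drawing_started)
        return;
    b->drawing_started = true;
    /* ... one-time batch setup would go here ... */
}

static void draw(struct batch *b)
{
    start_draw(b);
    /* Counting here runs for every draw, not just the first one. */
    b->draw_calls_queued++;
}
```

Because a fast clear would not go through `draw()`, it is not counted, which matches the side effect the commit describes for repeated glClear() calls.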
Eric Anholt	bca9a58d04	vc4: Drop dead argument from vc4_start_draw().	2016-10-06 18:09:24 -07:00
Eric Anholt	9421a6065c	vc4: Fix fallback to quad clears of depth in GLX. The fix in the vc4-jobs series ended up triggering the fallback path on GLX apps that use depth but not stencil.	2016-10-06 18:09:24 -07:00
Eric Anholt	8810270d06	vc4: Add the format name in miptree_debug. I was curious if my Z/S buffer was actually ZS or ZX, and the vc4 format of "0" didn't tell me much.	2016-10-06 18:09:24 -07:00
Eric Anholt	ee577e7fa7	vc4: Fix perf debug formatting on partial Z/S clear.	2016-10-06 18:09:24 -07:00
Eric Anholt	7c7bcbbc7d	vc4: Drop destination register when it's unused. This slightly reduces instructions on shader-db, but I think it's just perturbing register allocation; the allocator should always have trivially colored these nodes. This commit is just to make the QIR code dumped when register allocation fails more intelligible.	2016-10-06 18:09:24 -07:00
Eric Anholt	d4ae5ca823	vc4: Fix live intervals analysis for screening defs in if statements. If a conditional assignment is only conditioned on the exec mask, that's still screening off the value in the executed channels (and, since we're not storing to the unexecuted channels, we don't care what's in there). Fixes a bunch of extra register pressure on Processing's Ribbons demo, which is failing to allocate.	2016-10-06 18:09:24 -07:00
Eric Anholt	06cc3dfda4	vc4: Fix simulator when more than one vc4_screen is opened. We would assertion fail in setting up the simulator the second time around. This at least postpones the assertion failure until we've closed all of the first set of screens and started opening a new set.	2016-10-06 18:09:24 -07:00
Eric Anholt	b30205b112	vc4: Fix assertion fails from trying to cast non-ALU instrs to ALU. Fixes 100 piglit tests since the assertions were added to nir.h. What's amazing is that these tests used to pass, even when casting garbage.	2016-10-06 18:09:24 -07:00
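The fix pattern is to check an instruction's type tag before downcasting it. This sketch mimics a tagged-base IR with an asserting cast helper, in the spirit of the nir.h assertions the commit mentions; the types and names are hypothetical stand-ins, not NIR's real definitions.

```c
#include <assert.h>

/* Hypothetical tagged instruction base and one derived type. */
enum instr_type { INSTR_ALU, INSTR_LOAD_CONST, INSTR_INTRINSIC };

struct instr { enum instr_type type; };
struct alu_instr { struct instr base; int op; };

/* Asserting downcast: valid because `base` is the first member. */
static struct alu_instr *instr_as_alu(struct instr *i)
{
    assert(i->type == INSTR_ALU);
    return (struct alu_instr *)i;
}

/* Safe use: test the tag first instead of casting every instruction. */
static int alu_op_or(struct instr *i, int fallback)
{
    if (i->type != INSTR_ALU)
        return fallback;
    return instr_as_alu(i)->op;
}
```

Casting a non-ALU instruction without the check reads unrelated memory as `op`, which is how such code can appear to work until the assertions are added.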
Samuel Pitoiset	28ecd3eac2	nv50/ir: fix wrong check when optimizing MAD to SHLADD Checking if MAD is supported is definitely wrong; it's most likely a typo I introduced a few days ago, and it breaks NV50 because SHLADD is not supported there. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-07 01:13:06 +02:00
Samuel Pitoiset	a198883bf7	nvc0: dump program binary only when NV50_PROG_DEBUG is set When the chipset is forced with NV50_PROG_CHIPSET, we actually only want to output the binary if NV50_PROG_DEBUG is also enabled. Otherwise, this pollutes the shader-db output. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-07 01:01:17 +02:00
Samuel Pitoiset	56a0bed2c1	nvc0: expose ARB_compute_variable_group_size Only expose 512 threads/block on Fermi to not be limited by 32 GPRs/thread. v4: - use 512 threads on Fermi, 1024 on Kepler+ Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-07 00:18:57 +02:00
Samuel Pitoiset	11e75fffeb	nv50/ir: set number of threads/block for variable local size When a variable local size is defined as specified by ARB_compute_variable_group_size, the fixed local size is set to 0 and a SIGFPE occurs when we compute the maximum number of regs. This allows using 64 GPRs/thread. v4: - use 512 threads on Fermi, 1024 on Kepler+ Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-10-07 00:18:57 +02:00
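The SIGFPE comes from an integer division by a thread count of 0. This sketch guards that case by falling back to the maximum variable block size from the commit messages (512 on Fermi, 1024 on Kepler+). The register file sizes used in the usage example (32768 registers per SM on Fermi, 65536 on Kepler) are assumptions about the hardware, and both functions are hypothetical rather than nv50/ir's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Variable-size block limits taken from the commit messages. */
static unsigned max_variable_threads(bool is_fermi)
{
    return is_fermi ? 512 : 1024;
}

/* Hypothetical per-thread register budget. A variable local size reports a
 * fixed size of 0, so fall back to the variable maximum instead of
 * dividing by zero. */
static unsigned max_gprs_per_thread(unsigned regs_per_sm,
                                    unsigned bx, unsigned by, unsigned bz,
                                    unsigned variable_max)
{
    unsigned threads = bx * by * bz;
    if (threads == 0)
        threads = variable_max;
    assert(threads > 0);
    return regs_per_sm / threads;
}
```

Under these assumptions, 512 threads on Fermi leave 64 registers per thread, whereas 1024 threads would cut that to 32, which matches the trade-off stated in the nvc0 commit.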
Samuel Pitoiset	07bb4513c6	gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK v3: - use a new case statement in r600_pipe_common.c - fix compilation of softpipe... Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-07 00:18:57 +02:00
Karol Herbst	f96945c5b5	nv50/ir: optimize sub(a, 0) to a Helped some UE4 demos and Divinity OS shaders. total instructions in shared programs: 2818674 -> 2818606 (-0.00%); total gprs used in shared programs: 379273 -> 379273 (0.00%); total local used in shared programs: 9505 -> 9505 (0.00%); total bytes used in shared programs: 25837792 -> 25837192 (-0.00%); helped (local/gpr/inst/bytes): 0/0/33/33; hurt (local/gpr/inst/bytes): 0/0/0/0. Signed-off-by: Karol Herbst <karolherbst@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2016-10-06 19:39:51 +02:00
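An algebraic peephole of this kind rewrites `sub(a, 0)` into a move of `a`. This sketch uses a tiny hypothetical IR, not nv50/ir's classes. For floats, only `+0.0` qualifies: `a - (+0.0)` equals `a` for every `a` including `-0.0`, but `a - (-0.0)` turns `-0.0` into `+0.0`, so the sign bit is checked.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Tiny illustrative IR. */
enum opcode { OP_MOV, OP_SUB };

struct operand { bool is_imm; float imm; int reg; };

struct insn {
    enum opcode op;
    struct operand src[2];
};

/* Rewrites sub(a, +0.0) into mov(a); returns true if the insn changed. */
static bool opt_sub_zero(struct insn *i)
{
    if (i->op != OP_SUB || !i->src[1].is_imm)
        return false;
    if (i->src[1].imm != 0.0f || signbit(i->src[1].imm))
        return false;   /* not +0.0: leave the subtraction alone */
    i->op = OP_MOV;     /* src[0] stays as the only live source */
    return true;
}
```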
Jason Ekstrand	2ed17d46de	nir: Make nir_foo_first/last_cf_node return a block instead One of NIR's invariants is that control flow lists always start and end with blocks. There's no good reason why we should return a cf_node from these functions since we know that it's always a block. Making it a block lets us remove a bunch of code. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-10-06 09:16:37 -07:00
Steven Toth	e00fdd643b	gallium/hud: Remove superfluous debug No longer required. Signed-off-by: Steven Toth <stoth@kernellabs.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-10-06 16:37:06 +01:00


28888 commits