This should be at least as fast as using fast_idiv_by_const, and has the
advantage that the precomputation is simple enough to be evaluated at
Mesa-compile time for hash tables and sets which have a fixed table of
possible divisors.
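For reference, the precomputation in question is essentially Lemire's fast
unsigned remainder by a constant. A minimal sketch of the trick (the names
below are illustrative, not Mesa's actual helpers; GCC/Clang 128-bit integer
support is assumed):

  #include <assert.h>
  #include <stdint.h>

  /* Precomputed once per entry in the fixed table of possible divisors
   * (table sizes); valid for divisors > 1. */
  #define REMAINDER_MAGIC(divisor) (~0ull / (divisor) + 1)

  /* Computes n % divisor using only multiplies and shifts at runtime. */
  static inline uint32_t
  fast_urem32(uint32_t n, uint32_t divisor, uint64_t magic)
  {
     uint64_t lowbits = magic * n;
     return ((__uint128_t)lowbits * divisor) >> 64;
  }

  int
  main(void)
  {
     const uint32_t size = 103;                    /* hypothetical table size */
     const uint64_t magic = REMAINDER_MAGIC(size); /* compile-time constant */

     for (uint32_t h = 0; h < 1000000; h++)
        assert(fast_urem32(h, size, magic) == h % size);
     return 0;
  }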
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
A significant portion of the time spent in nir_opt_cse for the Dolphin
ubershaders was in resizing the set. When resizing a hash table, we know
in advance that each new element to be inserted will be different from
every other element, so we don't have to compare them, and there will be
no tombstone elements, so we don't have to worry about caching the
first-seen tombstone. We add a specialized add function which skips
these steps entirely, speeding up resizing.
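As a rough sketch of the idea (a toy open-addressing set with linear probing,
not Mesa's actual util/set code), the rehash path only has to find an empty
slot; it never compares keys and never sees tombstones:

  #include <stdint.h>
  #include <stdlib.h>

  struct toy_entry {
     uint32_t hash;
     const void *key;   /* NULL means the slot is empty */
  };

  struct toy_set {
     struct toy_entry *table;
     uint32_t size;
  };

  /* Add during rehash: every incoming key is known to be distinct and
   * there are no tombstones, so just probe for the first empty slot. */
  static void
  toy_set_add_rehash(struct toy_set *set, uint32_t hash, const void *key)
  {
     uint32_t probe = hash % set->size;

     while (set->table[probe].key != NULL) {
        probe++;
        if (probe == set->size)
           probe = 0;
     }
     set->table[probe].hash = hash;
     set->table[probe].key = key;
  }

  static void
  toy_set_rehash(struct toy_set *set, uint32_t new_size)
  {
     struct toy_entry *old_table = set->table;
     uint32_t old_size = set->size;

     /* Allocation-failure handling omitted in this sketch. */
     set->table = calloc(new_size, sizeof(*set->table));
     set->size = new_size;

     for (uint32_t i = 0; i < old_size; i++) {
        if (old_table[i].key != NULL)
           toy_set_add_rehash(set, old_table[i].hash, old_table[i].key);
     }
     free(old_table);
  }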
Compile-time results from my shader-db database:
Difference at 95.0% confidence
-2.29143 +/- 0.845534
-0.529475% +/- 0.194767%
(Student's t, pooled s = 1.08807)
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
To keep the set and hash table in sync. Note that some of this had
already been done for hash tables, in particular pulling out the
hash % ht->size computation.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Unfortunately GCC can't do this for us, probably because we call the key
comparison function which GCC can't prove won't modify arbitrary memory.
This is a pretty hot function, so do the optimization manually to be
sure the compiler will get it right.
While we're here, make the computation of the new probe address use a
single conditional subtract instead of a modulo, since we know that it
won't ever get as big as 2 * ht->size before the modulo. Modulos tend to
be pretty expensive operations.
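A sketch of both changes on a toy double-hashed table (illustrative names,
not the real _mesa_hash_table code):

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  struct toy_entry {
     uint32_t hash;
     const void *key;   /* NULL means the slot is empty */
  };

  struct toy_ht {
     struct toy_entry *table;
     uint32_t size;
     bool (*key_equals)(const void *a, const void *b);
  };

  static struct toy_entry *
  toy_ht_search(struct toy_ht *ht, uint32_t hash, const void *key)
  {
     /* Hoist the loads out of the loop by hand: the compiler can't,
      * because for all it knows key_equals() writes to *ht. */
     struct toy_entry *table = ht->table;
     const uint32_t size = ht->size;
     bool (*const key_equals)(const void *, const void *) = ht->key_equals;

     uint32_t probe = hash % size;
     const uint32_t step = 1 + hash % (size - 1);   /* double hashing */

     while (table[probe].key != NULL) {
        if (table[probe].hash == hash && key_equals(table[probe].key, key))
           return &table[probe];

        /* probe < size and step < size, so probe + step < 2 * size and a
         * single conditional subtract replaces the modulo. */
        probe += step;
        if (probe >= size)
           probe -= size;
     }
     return NULL;
  }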
shader-db compile time results for my database:
Difference at 95.0% confidence
-2.24934 +/- 0.69897
-0.516296% +/- 0.159993%
(Student's t, pooled s = 0.983684)
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Before this change, we were searching for each instruction twice, once
when checking if it exists and once when figuring out where to insert
it. By using the new function, we can do everything we need to do in one
operation.
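The pattern is roughly the following (a toy linear-probing set, not the real
util/set API): a single probe sequence either finds an existing equal key or
inserts the new one, and the caller can tell which case happened by comparing
the returned entry's key with the one it passed in.

  #include <stdint.h>
  #include <string.h>

  struct toy_entry {
     uint32_t hash;
     const char *key;   /* NULL means the slot is empty */
  };

  struct toy_set {
     struct toy_entry *table;
     uint32_t size;     /* kept below 100% load, so a free slot always exists */
  };

  /* Search and insert in one pass: if an equal key is already present,
   * return its entry; otherwise add the key in the first free slot and
   * return the new entry. */
  static struct toy_entry *
  toy_set_search_or_add(struct toy_set *set, uint32_t hash, const char *key)
  {
     uint32_t probe = hash % set->size;

     for (;;) {
        struct toy_entry *entry = &set->table[probe];

        if (entry->key == NULL) {       /* not found: add it right here */
           entry->hash = hash;
           entry->key = key;
           return entry;
        }
        if (entry->hash == hash && strcmp(entry->key, key) == 0)
           return entry;                /* found an existing equal key */

        probe++;
        if (probe == set->size)
           probe = 0;
     }
  }

For CSE, the caller then checks whether the returned entry holds the
instruction it just passed in; if not, an equivalent instruction already
existed and the new one's uses can be rewritten.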
Compilation time numbers for my shader-db database:
Difference at 95.0% confidence
-4.04706 +/- 0.669508
-0.922142% +/- 0.151948%
(Student's t, pooled s = 0.95824)
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Unlike _mesa_set_search_and_add(), it doesn't replace an entry if it's
found, returning it instead. This is useful for nir_instr_set, where
we have to know both the original instruction and its
equivalent.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
ncomp is never set for vertex shaders, but a3xx and a4xx still use it.
Fixes: 831f1a05c0 ("freedreno/ir3: rework varying packing")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
On some architectures, Boolean values used to control conditional
branches or conditional selection must be propagated into a flag. This
generally means that a stored Boolean value must be compared with zero.
Rather than force the generation of extra compares with zero, re-emit
the original comparison instruction. This can save register pressure by
not needing to store the Boolean value.
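At the source level the transformation is roughly the following (a
hypothetical C illustration of the idea; the pass itself operates on NIR):

  #include <stdbool.h>

  /* Before: the stored Boolean has to stay live across the intervening
   * code and then be compared with zero to produce the branch flag. */
  static int
  branch_on_stored_bool(float a, float b)
  {
     bool cond = a < b;
     /* ... lots of intervening code ... */
     return cond ? 1 : 0;
  }

  /* After: the original comparison is re-emitted next to the branch, so
   * the flag comes straight from the compare.  Once every user is
   * rewritten this way the stored Boolean is dead and DCE removes it;
   * only a and b need to stay live. */
  static int
  branch_on_rematerialized_compare(float a, float b)
  {
     /* ... lots of intervening code ... */
     return (a < b) ? 1 : 0;
  }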
There are several possible areas for future improvement to this pass:
1. Be more conservative. If both sources to the comparison instruction
are non-constants, it may be better for register pressure to emit the
extra compare. The current shader-db results on Intel GPUs (next
commit) lead me to believe that this is not currently a problem.
2. Be less conservative. Currently the pass requires that all users of
the comparison match the pattern. The idea is that after the pass is
complete, no instruction will use the resulting Boolean value. The only
uses will be of the flag value. It may be beneficial to relax this
requirement in some cases.
3. Be less conservative. Also try to rematerialize comparisons used for
discard_if intrinsics. After changing the way the Intel compiler
generates code for discard_if (see MR!935), I tried implementing this
already. The changes were pretty small. Instructions were helped in 19
shaders, but, overall, cycles were hurt. A commit "nir: Rematerialize
comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit.
4. Copy the preceding ALU instruction. If the comparison is a
comparison with zero, and it is the only user of a particular ALU
instruction (e.g., (a+b) != 0.0), it may be a further improvement to also
copy the preceding ALU instruction. On Intel GPUs, this may enable
cmod propagation to make additional progress.
v2: Use a much simpler method to get the prev_block for an if-statement.
Suggested by Tim.
Reviewed-by: Matt Turner <mattst88@gmail.com>
And instead, link them as they are added.
Makes things a bit clearer and prepares for future work such as FB reload
jobs.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
We now bounds check properly in the uniform loading fast path, so
there's no need to disable it by pretending there are other UBO bindings
in use. The way this looks at the variable name was causing problems
when two piglit shaders, one with a name that triggered the hack and one
that didn't, got hashed to the same thing after stripping out the names.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
location is the API-level location, but driver_location is the actual
location the uniform gets passed to the driver. This apparently only
caused failures with builtins, where the location is 0 because it's
represented via the state tokens instead.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
ac expands the store to 32-bit components for us, but we still have to
deal with storing up to 8 components, and when a varying is split across
two vec4 slots we have to calculate the address again for the second
slot, since they aren't adjacent in memory. I didn't do this on the ac
level because we should generate better indexing arithmetic for the lds
store, where slots are contiguous.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Found with Jason's new metadata rework (https://gitlab.freedesktop.org/mesa/mesa/merge_requests/950).
Fixes: af355aaa07 ("nir: add nir_opt_move_load_ubo() optimization pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Because the core principle of the vars_to_ssa pass is that it globally
(within a function) looks at all of the uses of a never-indirected path
and does a full into-SSA on that path, it can't handle a path which has
any chance of having aliasing. If a function_temp variable has a cast
or anything else which may cause aliasing, we have to assume that all
paths to that variable may alias and ignore the entire variable.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We're about to change the meaning of get_deref_node returning NULL so we
need a non-NULL value to mean properly undefined.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This lets passes easily detect derefs which have uses that fall outside
the standard load/store/copy pattern so they can bail appropriately.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When we inlined cf_node_has_side_effects into node_is_dead, all the
conditions flipped and we forgot to flip one. Fortunately, it doesn't
matter right now because no one uses this pass on shaders with more than
one function.
Fixes: b50465d197 ("nir/dead_cf: Inline cf_node_has_side_effects")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When creating function parameters, we create pointers from SSA
values. This creates NIR casts with stride 0, but we have nowhere
else to get the stride from. Later passes that lower explicit IO
need this stride value to do the right thing.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Debugging use of unsafe iterators when you should have used the _safe
version sucks. Add some DEBUG build support to catch and assert if
someone does that.
I didn't update the UPPERCASE versions of the iterators. They should
probably be deprecated/removed.
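The check can look something like this (a generic sketch, not Mesa's actual
list.h): in debug builds, verify that the current node is still properly
linked before advancing, which fires if the node was removed mid-iteration
without the _safe variant.

  #include <assert.h>

  struct list_head {
     struct list_head *prev, *next;
  };

  #ifndef NDEBUG
  /* If the current node was unlinked during iteration, its neighbours no
   * longer point back at it and this assertion fires. */
  #define LIST_ITER_CHECK(node) \
     (assert((node)->next->prev == (node) && \
             (node)->prev->next == (node)), (node))
  #else
  #define LIST_ITER_CHECK(node) (node)
  #endif

  #define list_for_each(pos, head) \
     for (pos = (head)->next; LIST_ITER_CHECK(pos) != (head); pos = pos->next)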
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
This mode is used by PhysicalStorageBufferEXT storage class.
Fixes: 8bdf5a008b ("nir: Allow derefs to be used as phi sources")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cannonlake hardware adds a new resolve type in 3DSTATE_PS called
FAST_CLEAR_0 which does an ambiguate. Now that the hardware can do it
directly, we should use that instead of binding the CCS as a render
target and doing it manually. This was tested with a full Vulkan CTS
run on Cannonlake.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
No significant code changes are needed to enable the extension;
just update the SWR capabilities and the documentation.
Reviewed-by: Alok Hota <alok.hota@intel.com>
We set header_present but then pass it some random garbage. Give it g0
instead. I'm not actually sure this does anything, but g0 is the usual
header data and it's what the Windows driver does, so it seems like a
good idea.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The driver should already support this without any changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Basically, this extension allows applications to use custom
sample locations. It doesn't support variable sample locations
during a subpass. Note that we don't have to upload the user
sample locations because the spec doesn't allow this.
The extension is currently disabled because the driver needs to
support variable sample locations during layout transitions. The
depth decompress needs to know them and that's a bit invasive.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We only need the lock for:
1. Rummaging through the cache
2. Allocating VMA
We don't need it for alloc_fresh_bo(), which does GEM_CREATE, and also
SET_DOMAIN to allocate the underlying pages. The idea behind calling
SET_DOMAIN was to avoid a lock in the kernel while allocating pages;
now we avoid our own global lock as well.
We do have to re-lock around VMA. Hopefully this shouldn't happen too
much in practice because we'll find a cached BO in the right memzone
and not have to reallocate it.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Chris pointed out that the order between SET_DOMAIN and SET_TILING
doesn't matter, so we can just do the page allocation when creating
a new BO. Simplifies the flow a bit.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Mathias Fröhlich reported that commit 6244da8e23 crashes.
list_for_each_entry_safe is safe against removing the current entry,
but iris_bo_cache_purge_bucket was potentially removing next entries
too, which broke our saved next pointer.
To fix this, don't bother with the iris_bo_cache_purge_bucket step.
We just detected a single entry where the kernel has purged the BO's
memory, and so it isn't a usable entry for our cache. We're about to
continue the search with the next BO. If that one's purged, we'll
clean it up too. And so on.
We may miss cleaning up purged BOs that are further down the list
after non-purged BOs...but that's probably fine. We still have the
time-based cleaner (cleanup_bo_cache) which will take care of them
eventually, and the kernel's already freed their memory, so it's not
that harmful to have a few kicking around a little longer.
Fixes: 6244da8e23 ("iris: Dig through the cache to find a BO in the right memzone")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
This saves some util_vma thrash when the first entry in the cache
happens to be in a different memory zone, but one just a tiny bit
further ahead is already in the right zone and instantly reusable.
Hopefully the cost
of a little extra searching won't break the bank - if it does, we
can consider having separate list heads or keeping a separate VMA
cache.
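Conceptually the search is something like this (toy types and a made-up
zone helper purely for illustration; the real code walks iris's bucket
free list and uses its own memory-zone helpers):

  #include <stddef.h>
  #include <stdint.h>

  enum toy_memzone { ZONE_SHADER, ZONE_SURFACE, ZONE_OTHER };

  struct toy_bo {
     uint64_t address;
     struct toy_bo *next;   /* bucket free list */
  };

  /* Hypothetical zone split, purely for illustration. */
  static enum toy_memzone
  toy_memzone_for_address(uint64_t address)
  {
     if (address < (1ull << 32))
        return ZONE_SHADER;
     if (address < (2ull << 32))
        return ZONE_SURFACE;
     return ZONE_OTHER;
  }

  /* Instead of only taking the head of the bucket, walk a little way down
   * the free list looking for a BO whose address already lives in the
   * desired memory zone, so its VMA can be reused as-is. */
  static struct toy_bo *
  find_cached_bo_in_zone(struct toy_bo **head, enum toy_memzone zone)
  {
     for (struct toy_bo **prev = head; *prev != NULL; prev = &(*prev)->next) {
        struct toy_bo *bo = *prev;

        if (toy_memzone_for_address(bo->address) == zone) {
           *prev = bo->next;   /* unlink it from the bucket */
           return bo;
        }
     }
     return NULL;   /* nothing suitable; fall back to the old path */
  }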
Improves OglDrvRes performance by 22%, recovering the regression from
deleting the bucket allocators in 694d1a08d3.
Thanks to Clayton Craft for alerting me to the regression.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
There's enough going on here to warrant a helper. This also simplifies
the control flow and eliminates the last non-error-case goto.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
It is unlikely that we would fail to map a cached BO in order to zero
its contents. When we did, we would free the first BO in the cache and
try again with the second. It's possible that this next BO already had
a map setup, in which case we'd succeed. But if it didn't, we'd likely
fail again in the same manner.
There's not much point in optimizing this case (and frankly, if we're
out of CPU-side VMA we should probably dump the cache entirely)...so
instead, just fall back to allocating a fresh BO from the kernel which
will already be zeroed so we don't have to try and map it.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Both the from-cache and fresh-from-GEM cases were calling SET_TILING.
In the cached case, we would retry the allocation on failure, pitching
one BO from the cache each time. This is silly, because the only time
it should fail is if the tiling or stride parameters are unacceptable,
which has nothing to do with the particular BO in question. So there's
no point in retrying - we should simply fail the allocation.
This patch moves both calls to bo_set_tiling_internal() below the
cache/fresh split, so we have it at a single point in time instead
of two.
To preserve the ordering between SET_TILING and SET_DOMAIN, we move
that below as well. (I am unsure if the order matters.)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We mark snooped BOs as non-reusable, so we never return them to the
cache. This means that we'd need to call I915_GEM_SET_CACHING to make
any BO we find in the cache snooped. But then again, any BO we freshly
allocate from the kernel will also be non-snooped, so it has the same
issue. There's really no reason to skip the cache - we may as well use
it to avoid the I915_GEM_CREATE overhead.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
util_vma needs to be protected by a lock. All other callers of
vma_alloc and vma_free appear to be holding a lock already.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The goto jumped over the mtx_lock, but proceeded to hit the mtx_unlock.
We can simply set the bucket to NULL and it will skip the cache without
goto, and without messing up locking.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>