fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 11:18:11 +02:00

Author	SHA1	Message	Date
Jason Ekstrand	cd4ffb376f	intel/fs: Account for live range lengths in spill costs The current register allocator has a concept of "spill benefit" which is based on the number of nodes with which a given node interferes. The idea is that you want to spill stuff with high interference because those are the most likely registers to help when spilling. However, this fails to take into account the length of the live range so the allocator frequently picks "cheap" (not many uses) registers which are actually very short lived and so spilling them doesn't help with the pressure situation. This commit takes into account the length of the live range to make long-lived registers more likely to get spilled than short-lived ones. This encourages the spill chooser to choose slightly larger registers which will affect a larger area of the program and hopefully we have to spill fewer of them to get the same reduction in over-all register pressure. Shader-db results on Kaby Lake: total spills in shared programs: 23664 -> 12050 (-49.08%) spills in affected programs: 19243 -> 7629 (-60.35%) helped: 296 HURT: 8 total fills in shared programs: 32028 -> 25139 (-21.51%) fills in affected programs: 20378 -> 13489 (-33.81%) helped: 295 HURT: 16 Of course, most of that is in Deus Ex... Shader-db results on Kaby Lake (without Deus Ex): total spills in shared programs: 6479 -> 5834 (-9.96%) spills in affected programs: 3231 -> 2586 (-19.96%) helped: 40 HURT: 4 total fills in shared programs: 17165 -> 17099 (-0.38%) fills in affected programs: 6951 -> 6885 (-0.95%) helped: 40 HURT: 7 Even without Deus Ex, the spill help is pretty respectable. The worst hurt shaders were one compute shader in Aztec Ruins and one fragment shader in KSP that were each hurt by around 13% fill 9% spill. 
VkPipeline-db results on Kaby Lake: total spills in shared programs: 9149 -> 8069 (-11.80%) spills in affected programs: 5197 -> 4117 (-20.78%) helped: 27 HURT: 16 total fills in shared programs: 26390 -> 25477 (-3.46%) fills in affected programs: 12662 -> 11749 (-7.21%) helped: 24 HURT: 22 The Vulkan results were decidedly more mixed but we don't have nearly as many apps in that database yet. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 23:04:45 +00:00
Gurchetan Singh	1fd635862f	virgl/vtest: bump up protocol version + support encoded transfers This more accurately reflects what the drm winsys does. Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:39:23 -07:00
Gurchetan Singh	b5698562e4	virgl/vtest: wait after issuing a transfer get Otherwise, there's artifacts when running Unigine Valley with protocol version 2. We can get away with not waiting for most buffers, but let's be conservative. Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:39:18 -07:00
Gurchetan Singh	581ab2bc70	virgl/vtest: modify sending and receiving data for shared memory We need to copy the shared memory region to the display target. Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:39:12 -07:00
Gurchetan Singh	96c3418e06	virgl/vtest: receive and handle shared memory fd The only tricky part is with protocol 0 we can either have a display target or resource backing store. With protocol 2 we can have both. Make the map/unmap functions only deal with the resource backing store. v2: Handle MSAA texture case. v3: spelling v4: Fix dangling else (@prak) v5: mmap --> os_mmap (@prak) + added comments (@gerddie) Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:39:05 -07:00
Gurchetan Singh	9a638bc7c2	virgl/vtest: plumb support for shared memory Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:38:58 -07:00
Gurchetan Singh	9881733e32	virgl/vtest: add utilities for receiving fds v2: recieve --> receive (airlied@) Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:38:52 -07:00
Gurchetan Singh	0dd661777a	virgl/vtest: execute a transfer_get when flushing the front buffer This just moves everything to a helper function -- "flush_front_buffer" will be used later. virgl_vtest_resource_map / virgl_vtest_resource_unmap already take care to map the display target. Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:38:44 -07:00
Gurchetan Singh	599d55371c	virgl: wait after a flush We really need to wait under certain circumstances, or we can end up writing to memory at the same time the host is reading. Partial revert of d6dc68 ("virgl: use uint16_t mask instead of separate booleans"). Test cases: - dEQP-GLES31.functional.texture.texture_buffer.render_modify.as_vertex_array.bufferdata on vtest protocol version 2 - Flickering during Alien Isolation Fixes: d6dc68 ("virgl: use uint16_t mask instead of separate booleans") Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:38:04 -07:00
Lionel Landwerlin	dfd79079da	anv: fix uninitialized pthread cond clock domain Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `843775bab7` ("anv: Rework fences") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 23:23:03 +01:00
Eric Anholt	12f6c34806	v3d: Fix atomic cmpxchg in shaders on hardware. In what might be my first case of finding a divergence between hardware and simpenrose for v3d 4.x, it seems that despite what the spec claims, you actually need specific values in the TYPE field for atomic ops. Fixes dEQP-GLES31.functional..compswap.	2019-04-18 13:24:55 -07:00
Eric Anholt	1ce143ca19	v3d: Fix an invalid reuse of flags generation from before a thrsw. Noticed while debugging the last GLES 3.1 failure, though it doesn't seem to affect that bug.	2019-04-18 13:24:55 -07:00
Jason Ekstrand	db4a70e678	anv: Drop some unneeded ANV_FROM_HANDLE for physical devices Ever since `48ed2a7bb0`, we've had one at the top of the function. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 20:12:57 +00:00
Jason Ekstrand	981209d175	anv: Re-sort the GetPhysicalDeviceFeatures2 switch statement Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 20:12:57 +00:00
Marek Olšák	7bc33a5cd5	radeonsi/gfx9: use the correct condition for the DPBB + QUANT_MODE workaround Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-18 15:58:45 -04:00
Ian Romanick	6b97fa9a99	nir/algebraic: Strength reduce some compares of x and -x Converting the x vs -x comparison to an x vs 0 comparison enables cmod propagation to help. This seems to be a win everywhere except Gen7. Skylake and Broadwell had similar results. (Broadwell shown) total instructions in shared programs: 15566733 -> 15566014 (<.01%) instructions in affected programs: 72617 -> 71898 (-0.99%) helped: 302 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.38 x̃: 2 helped stats (rel) min: 0.15% max: 7.69% x̄: 1.28% x̃: 0.98% 95% mean confidence interval for instructions value: -2.55 -2.21 95% mean confidence interval for instructions %-change: -1.40% -1.16% Instructions are helped. total cycles in shared programs: 413014786 -> 413015475 (<.01%) cycles in affected programs: 707594 -> 708283 (0.10%) helped: 227 HURT: 101 helped stats (abs) min: 1 max: 612 x̄: 36.07 x̃: 20 helped stats (rel) min: 0.04% max: 19.39% x̄: 2.25% x̃: 1.49% HURT stats (abs) min: 2 max: 334 x̄: 87.90 x̃: 45 HURT stats (rel) min: 0.07% max: 14.51% x̄: 4.54% x̃: 3.36% 95% mean confidence interval for cycles value: -8.12 12.32 95% mean confidence interval for cycles %-change: -0.67% 0.34% Inconclusive result (value mean confidence interval includes 0). Haswell and Ivy Bridge had similar results. (Haswell shown) total instructions in shared programs: 13828220 -> 13827881 (<.01%) instructions in affected programs: 60887 -> 60548 (-0.56%) helped: 253 HURT: 6 helped stats (abs) min: 1 max: 5 x̄: 1.36 x̃: 1 helped stats (rel) min: 0.16% max: 3.85% x̄: 0.81% x̃: 0.64% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.26% max: 0.89% x̄: 0.47% x̃: 0.27% 95% mean confidence interval for instructions value: -1.39 -1.23 95% mean confidence interval for instructions %-change: -0.85% -0.70% Instructions are helped. 
total cycles in shared programs: 386870095 -> 386894412 (<.01%) cycles in affected programs: 1537307 -> 1561624 (1.58%) helped: 127 HURT: 188 helped stats (abs) min: 1 max: 381 x̄: 17.89 x̃: 4 helped stats (rel) min: 0.02% max: 14.33% x̄: 1.00% x̃: 0.33% HURT stats (abs) min: 2 max: 5585 x̄: 141.43 x̃: 14 HURT stats (rel) min: 0.03% max: 11.50% x̄: 1.65% x̃: 1.06% 95% mean confidence interval for cycles value: 21.95 132.45 95% mean confidence interval for cycles %-change: 0.32% 0.85% Cycles are HURT. Sandy Bridge total instructions in shared programs: 10896339 -> 10896276 (<.01%) instructions in affected programs: 10757 -> 10694 (-0.59%) helped: 49 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1 helped stats (rel) min: 0.12% max: 1.85% x̄: 0.87% x̃: 0.89% 95% mean confidence interval for instructions value: -1.42 -1.15 95% mean confidence interval for instructions %-change: -1.03% -0.72% Instructions are helped. total cycles in shared programs: 155091003 -> 155090480 (<.01%) cycles in affected programs: 102761 -> 102238 (-0.51%) helped: 51 HURT: 0 helped stats (abs) min: 1 max: 36 x̄: 10.25 x̃: 4 helped stats (rel) min: 0.02% max: 2.57% x̄: 0.76% x̃: 0.36% 95% mean confidence interval for cycles value: -12.98 -7.53 95% mean confidence interval for cycles %-change: -0.97% -0.56% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8234667 -> 8234652 (<.01%) instructions in affected programs: 2063 -> 2048 (-0.73%) helped: 15 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.30% max: 1.56% x̄: 0.82% x̃: 0.81% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.97% -0.67% Instructions are helped. 
total cycles in shared programs: 188700906 -> 188700598 (<.01%) cycles in affected programs: 283480 -> 283172 (-0.11%) helped: 83 HURT: 3 helped stats (abs) min: 2 max: 8 x̄: 3.78 x̃: 4 helped stats (rel) min: 0.04% max: 0.55% x̄: 0.15% x̃: 0.12% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.04% 95% mean confidence interval for cycles value: -3.87 -3.29 95% mean confidence interval for cycles %-change: -0.16% -0.12% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Ian Romanick	f3d6df719c	nir/algebraic: Fix some 1-bit Boolean weirdness Skylake, Broadwell, and Haswell had similar results. (Skylake shown) total cycles in shared programs: 372594532 -> 372594460 (<.01%) cycles in affected programs: 46854 -> 46782 (-0.15%) helped: 9 HURT: 0 helped stats (abs) min: 2 max: 22 x̄: 8.00 x̃: 2 helped stats (rel) min: 0.02% max: 0.41% x̄: 0.16% x̃: 0.09% 95% mean confidence interval for cycles value: -14.34 -1.66 95% mean confidence interval for cycles %-change: -0.28% -0.04% Cycles are helped. Ivy Bridge total instructions in shared programs: 12038379 -> 12038373 (<.01%) instructions in affected programs: 1278 -> 1272 (-0.47%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.31% max: 0.77% x̄: 0.54% x̃: 0.55% total cycles in shared programs: 180889027 -> 180888997 (<.01%) cycles in affected programs: 29979 -> 29949 (-0.10%) helped: 5 HURT: 0 helped stats (abs) min: 1 max: 16 x̄: 6.00 x̃: 5 helped stats (rel) min: 0.02% max: 0.34% x̄: 0.11% x̃: 0.07% 95% mean confidence interval for cycles value: -13.40 1.40 95% mean confidence interval for cycles %-change: -0.27% 0.05% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total cycles in shared programs: 155091021 -> 155091003 (<.01%) cycles in affected programs: 8842 -> 8824 (-0.20%) helped: 2 HURT: 0 No changes on Iron Lake or GM45. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Ian Romanick	403aac7500	nir/algebraic: Replace a pattern where iand with a Boolean is used as a bcsel All of the affected shaders are in Mad Max. I noticed this while looking at some other things. I tried a couple similar patterns, but the effect on cycles was generally negative. It may be worth revisiting this later. v2: Rebase on 1-bit Boolean changes. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15282073 -> 15282053 (<.01%) instructions in affected programs: 1192 -> 1172 (-1.68%) helped: 14 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.43 x̃: 1 helped stats (rel) min: 1.16% max: 2.17% x̄: 1.65% x̃: 1.39% 95% mean confidence interval for instructions value: -1.73 -1.13 95% mean confidence interval for instructions %-change: -1.91% -1.38% Instructions are helped. total cycles in shared programs: 372595954 -> 372594532 (<.01%) cycles in affected programs: 11477 -> 10055 (-12.39%) helped: 14 HURT: 0 helped stats (abs) min: 76 max: 122 x̄: 101.57 x̃: 104 helped stats (rel) min: 7.76% max: 15.62% x̄: 12.94% x̃: 14.78% 95% mean confidence interval for cycles value: -111.05 -92.09 95% mean confidence interval for cycles %-change: -14.90% -10.98% Cycles are helped. No changes on any Gen6 or earlier platforms. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Ian Romanick	25bfba3335	nir/algebraic: Recognize open-coded copysign(1.0, a) All of the affected shaders are in Mad Max. The inner part of the pattern is itself an open-coded sign(a). I tried using that as a pattern, but the results were not good. A bunch of shaders were helped for instructions, but overall cycles, spill, and fills were hurt. v2: Rebase on 1-bit Boolean changes. v3: Fix order of copysign() parameters in comments and commit message. Noticed by Matt. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15282141 -> 15282073 (<.01%) instructions in affected programs: 6106 -> 6038 (-1.11%) helped: 17 HURT: 0 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 1.02% max: 2.20% x̄: 1.15% x̃: 1.06% 95% mean confidence interval for instructions value: -4.00 -4.00 95% mean confidence interval for instructions %-change: -1.30% -1.00% Instructions are helped. total cycles in shared programs: 372597886 -> 372595954 (<.01%) cycles in affected programs: 32701 -> 30769 (-5.91%) helped: 17 HURT: 0 helped stats (abs) min: 6 max: 216 x̄: 113.65 x̃: 118 helped stats (rel) min: 0.40% max: 21.86% x̄: 6.20% x̃: 5.83% 95% mean confidence interval for cycles value: -152.84 -74.45 95% mean confidence interval for cycles %-change: -8.89% -3.51% Cycles are helped. No changes on any Gen6 or earlier platforms. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Ian Romanick	1711bf6cf2	intel/fs: Generate better code for fsign multiplied by a value v2: Rebase on v2 changes in previous two commits. v3: Rebase on `85c35885b3` ("nir: Rework nir_src_as_alu_instr to not take a pointer"). shader-db results: Skylake and Broadwell had similar results. (Skylake shown) total instructions in shared programs: 15297100 -> 15282141 (-0.10%) instructions in affected programs: 956685 -> 941726 (-1.56%) helped: 4527 HURT: 0 helped stats (abs) min: 1 max: 221 x̄: 3.30 x̃: 2 helped stats (rel) min: 0.07% max: 10.53% x̄: 1.85% x̃: 1.37% 95% mean confidence interval for instructions value: -3.48 -3.12 95% mean confidence interval for instructions %-change: -1.88% -1.81% Instructions are helped. total cycles in shared programs: 372809551 -> 372597886 (-0.06%) cycles in affected programs: 13645512 -> 13433847 (-1.55%) helped: 4362 HURT: 125 helped stats (abs) min: 1 max: 2088 x̄: 50.73 x̃: 28 helped stats (rel) min: 0.01% max: 28.20% x̄: 2.77% x̃: 2.39% HURT stats (abs) min: 1 max: 1836 x̄: 76.90 x̃: 28 HURT stats (rel) min: <.01% max: 34.36% x̄: 3.03% x̃: 1.42% 95% mean confidence interval for cycles value: -50.98 -43.37 95% mean confidence interval for cycles %-change: -2.67% -2.55% Cycles are helped. total spills in shared programs: 23465 -> 23463 (<.01%) spills in affected programs: 42 -> 40 (-4.76%) helped: 1 HURT: 0 total fills in shared programs: 31766 -> 31763 (<.01%) fills in affected programs: 69 -> 66 (-4.35%) helped: 1 HURT: 0 Haswell total instructions in shared programs: 13839992 -> 13828311 (-0.08%) instructions in affected programs: 712503 -> 700822 (-1.64%) helped: 3477 HURT: 0 helped stats (abs) min: 1 max: 221 x̄: 3.36 x̃: 2 helped stats (rel) min: 0.07% max: 10.64% x̄: 1.96% x̃: 1.52% 95% mean confidence interval for instructions value: -3.58 -3.14 95% mean confidence interval for instructions %-change: -2.01% -1.92% Instructions are helped. 
total cycles in shared programs: 387026330 -> 386872483 (-0.04%) cycles in affected programs: 11329966 -> 11176119 (-1.36%) helped: 3307 HURT: 139 helped stats (abs) min: 2 max: 1776 x̄: 49.58 x̃: 18 helped stats (rel) min: 0.01% max: 20.38% x̄: 2.27% x̃: 1.79% HURT stats (abs) min: 1 max: 2314 x̄: 72.68 x̃: 20 HURT stats (rel) min: <.01% max: 33.99% x̄: 2.28% x̃: 0.96% 95% mean confidence interval for cycles value: -49.31 -39.98 95% mean confidence interval for cycles %-change: -2.15% -2.01% Cycles are helped. LOST: 1 GAINED: 0 Ivy Bridge total instructions in shared programs: 12045602 -> 12038463 (-0.06%) instructions in affected programs: 623837 -> 616698 (-1.14%) helped: 2498 HURT: 0 helped stats (abs) min: 1 max: 39 x̄: 2.86 x̃: 2 helped stats (rel) min: 0.05% max: 10.00% x̄: 1.30% x̃: 1.05% 95% mean confidence interval for instructions value: -2.96 -2.75 95% mean confidence interval for instructions %-change: -1.34% -1.26% Instructions are helped. total cycles in shared programs: 181025675 -> 180891323 (-0.07%) cycles in affected programs: 11329329 -> 11194977 (-1.19%) helped: 2439 HURT: 47 helped stats (abs) min: 1 max: 1565 x̄: 57.06 x̃: 26 helped stats (rel) min: 0.02% max: 24.56% x̄: 2.02% x̃: 1.64% HURT stats (abs) min: 1 max: 1269 x̄: 102.51 x̃: 43 HURT stats (rel) min: 0.11% max: 52.94% x̄: 4.15% x̃: 1.34% 95% mean confidence interval for cycles value: -59.91 -48.17 95% mean confidence interval for cycles %-change: -1.99% -1.82% Cycles are helped. Sandy Bridge, Iron Lake, and GM45 had similar results. (Sandy Bridge shown) total instructions in shared programs: 10896368 -> 10896339 (<.01%) instructions in affected programs: 3767 -> 3738 (-0.77%) helped: 17 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 1.71 x̃: 1 helped stats (rel) min: 0.13% max: 9.52% x̄: 3.58% x̃: 2.73% 95% mean confidence interval for instructions value: -2.27 -1.14 95% mean confidence interval for instructions %-change: -5.14% -2.03% Instructions are helped. 
total cycles in shared programs: 155091109 -> 155091021 (<.01%) cycles in affected programs: 47241 -> 47153 (-0.19%) helped: 15 HURT: 8 helped stats (abs) min: 2 max: 81 x̄: 15.73 x̃: 4 helped stats (rel) min: 0.03% max: 10.59% x̄: 1.55% x̃: 0.71% HURT stats (abs) min: 14 max: 32 x̄: 18.50 x̃: 17 HURT stats (rel) min: 0.32% max: 2.79% x̄: 2.43% x̃: 2.71% 95% mean confidence interval for cycles value: -14.59 6.93 95% mean confidence interval for cycles %-change: -1.41% 1.08% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]	2019-04-18 12:38:05 -07:00
Ian Romanick	06d2c11641	intel/fs: Add a scale factor to emit_fsign Normally fsign generates -1, 0, or +1. The new scale factor, S, causes fsign to generate -S, 0, or +S. v2: Rebase on v2 changes in previous commit. v3: Rebase on `85c35885b3` ("nir: Rework nir_src_as_alu_instr to not take a pointer"). Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]	2019-04-18 12:37:48 -07:00
Ian Romanick	ad98fbc217	intel/fs: Refactor code generation for nir_op_fsign to its own function v2: Call emit_fsign from inside the existing switch statement. Suggested by Matt. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Ian Romanick	90430d0488	intel/fs: Eliminate dead code first This simplifies the later patch "i965/fs: Generate better code for fsign multiplied by a value". shader-db results: Broadwell and Skylake had similar results. (Skylake shown) total cycles in shared programs: 372808735 -> 372809551 (<.01%) cycles in affected programs: 1519520 -> 1520336 (0.05%) helped: 243 HURT: 277 helped stats (abs) min: 1 max: 226 x̄: 34.05 x̃: 5 helped stats (rel) min: 0.01% max: 13.88% x̄: 1.46% x̃: 0.27% HURT stats (abs) min: 1 max: 1810 x̄: 32.82 x̃: 5 HURT stats (rel) min: 0.01% max: 16.03% x̄: 1.56% x̃: 0.29% 95% mean confidence interval for cycles value: -7.18 10.32 95% mean confidence interval for cycles %-change: -0.17% 0.46% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge, Haswell and Ivy Bridge had similar results. (Sandy Bridge shown) total cycles in shared programs: 155091458 -> 155091109 (<.01%) cycles in affected programs: 370797 -> 370448 (-0.09%) helped: 24 HURT: 36 helped stats (abs) min: 1 max: 331 x̄: 103.17 x̃: 41 helped stats (rel) min: 0.02% max: 7.70% x̄: 2.07% x̃: 0.56% HURT stats (abs) min: 1 max: 291 x̄: 59.08 x̃: 10 HURT stats (rel) min: 0.02% max: 5.29% x̄: 1.02% x̃: 0.15% 95% mean confidence interval for cycles value: -37.92 26.28 95% mean confidence interval for cycles %-change: -0.88% 0.45% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. 
(GM45 shown) total cycles in shared programs: 129133970 -> 129133978 (<.01%) cycles in affected programs: 111966 -> 111974 (<.01%) helped: 3 HURT: 1 helped stats (abs) min: 2 max: 4 x̄: 2.67 x̃: 2 helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% HURT stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16 HURT stats (rel) min: 0.07% max: 0.07% x̄: 0.07% x̃: 0.07% 95% mean confidence interval for cycles value: -12.93 16.93 95% mean confidence interval for cycles %-change: -0.05% 0.08% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Kristian H. Kristensen	a90aa14f5a	freedreno: Fix format string warning Modifiers are uint64_t. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-04-18 11:46:13 -07:00
Kristian H. Kristensen	9c82a55efc	freedreno/a6xx: Add helper for incrementing regid Increments the regid by the specified amount unless regid is r63.x (invalid). Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-04-18 11:46:13 -07:00
Kristian H. Kristensen	6aa211b316	freedreno: Use enum values from matching enum We get a couple of warnings from using mismatched enum values. This fixes that. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-04-18 11:46:13 -07:00
Kristian H. Kristensen	c34b285b38	freedreno/a2xx: Fix redundant if statement We test the condition, declare a few variables, then test the exact same condition again. Let's not do that. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-04-18 11:46:13 -07:00
Kristian H. Kristensen	18ce6ac632	freedreno/ir3: Mark ir3_context_error() as NORETURN Fixes a few warnings. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-04-18 11:46:13 -07:00
Jason Ekstrand	c6463f8ac2	nir: Add a nir_src_as_intrinsic() helper Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 17:12:44 +00:00
Jason Ekstrand	85c35885b3	nir: Rework nir_src_as_alu_instr to not take a pointer Other nir_src_as_* functions just take a nir_src. It's not that much more memory copying and the constness preserving really isn't worth the cognitive dissonance. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 17:12:44 +00:00
Jason Ekstrand	eee994e769	nir: Drop "struct" from some nir_* declarations Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 17:12:44 +00:00
Lionel Landwerlin	db5b372bb9	anv: implement WaEnableStateCacheRedirectToCS This 3d performance workaround was initially put in the kernel but the media driver requires different settings so the register has been whitelisted in i915 [1] and userspace drivers are left initializing it as they wish. [1] : https://patchwork.freedesktop.org/series/59494/ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-04-18 17:43:08 +01:00
Lionel Landwerlin	eaadb62c9e	i965: implement WaEnableStateCacheRedirectToCS This 3d performance workaround was initially put in the kernel but the media driver requires different settings so the register has been whitelisted in i915 [1] and userspace drivers are left initializing it as they wish. [1] : https://patchwork.freedesktop.org/series/59494/ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-04-18 17:43:08 +01:00
Lionel Landwerlin	d1be67db39	iris: implement WaEnableStateCacheRedirectToCS This 3d performance workaround was initially put in the kernel but the media driver requires different settings so the register has been whitelisted in i915 [1] and userspace drivers are left initializing it as they wish. [1] : https://patchwork.freedesktop.org/series/59494/ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-04-18 17:43:08 +01:00
Iago Toral Quiroga	c2b8fb9a81	anv/device: expose VK_KHR_shader_float16_int8 in gen8+ v2 (Jason): - Merge shaderFloat16 and shaderInt8 enablement into a single patch. - Merge extension enable. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)	2019-04-18 13:23:03 +02:00
Iago Toral Quiroga	5a5d44b713	anv/pipeline: support Float16 and Int8 SPIR-V capabilities in gen8+ v2: - Merge Float16 and Int8 capabilities into a single patch (Jason) - Merged patch that enabled SPIR-V front-end checks for these caps (except for Int8, which was already merged) v3: - Keep capabilities sorted (Jason) v4: - SpvCapabilityFloat16 support already added in master (Juan) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)	2019-04-18 13:23:03 +02:00
Iago Toral Quiroga	e6ee07a664	compiler/spirv: move the check for Int8 capability So it is right after the checks for the other various Int* capabilities. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 13:23:03 +02:00
Iago Toral Quiroga	8ed6d74c92	intel/compiler: validate region restrictions for mixed float mode v2: - Adapted unit tests to make them consistent with the changes done to the validation of half-float conversions. v3 (Curro): - Check all the accumulators - Constify declarations - Do not check src1 type in single-source instructions. - Check for all instructions that read accumulator (either implicitly or explicitly) - Check restrictions in src1 too. - Merge conditional block - Add invalid test case. v4 (Curro): - Assert on 3-src instructions, as they are not validated. - Get rid of types_are_mixed_float(), as we know instruction is mixed float at that point. - Remove conditions from not verified case. - Fix brackets on conditional. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-04-18 13:22:46 +02:00
Iago Toral Quiroga	58d6417e59	intel/compiler: validate conversions between 64-bit and 8-bit types v2: - Add some tests with UB type too (Jason) v3: - consider implicit conversions from 2src instructions too (Curro). v4: - Do not check src1 type in single-source instructions (Curro). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	7376d57a9c	intel/compiler: validate region restrictions for half-float conversions v2: - Consider implicit conversions in 2-src instructions too (Curro) - For restrictions that involve destination stride requirements only validate them for Align1, since Align16 always requires packed data. - Skip general rule for the dst/execution type size ratio for mixed float instructions on CHV and SKL+, these have their own set of rules that will be validated separately. v3 (Curro): - Do not check src1 type in single-source instructions. - Check restriction on src1. - Remove invalid test. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	6ff52f0628	intel/compiler: also set F execution type for mixed float mode in BDW The section 'Execution Data Types' of 3D Media GPGPU volume, which describes execution types, is exactly the same in BDW and SKL+. Also, this section states that there is a single execution type, so it makes sense that this is the wider of the two floating point types involved in mixed float mode, which is what we do for SKL+ and CHV. v2: - Make sure we also account for the destination type in mixed mode (Curro). Acked-by: Francisco Jerez <currojerez@riseup.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	100debc3c9	intel/compiler: implement SIMD16 restrictions for mixed-float instructions v2: f32to16/f16to32 can use a :W destination (Curro) v3: check destination is packed (Curro). Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	6d87c651c9	intel/compiler: skip MAD algebraic optimization for half-float or mixed mode It is very likely that this optimization is never useful and we'll probably just end up removing it, so let's not bother adding more cases to it for now. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	64b93292ac	intel/compiler: remove inexact algebraic optimizations from the backend NIR already has these and correctly considers exact/inexact qualification, whereas the backend doesn't and can apply the optimizations where it shouldn't. This happened to be the case in a handful of Tomb Raider shaders, where NIR would skip the optimizations because of a precise qualification but the backend would then (incorrectly) apply them anyway. Besides this, considering that we are not emitting much math in the backend these days it is unlikely that these optimizations are useful in general. A shader-db run confirms that MAD and LRP optimizations, for example, were only being triggered in cases where NIR would skip them due to precise requirements, so in the near future we might want to remove more of these, but for now we just remove the ones that are not completely correct. Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	ddd1706ab3	intel/compiler: fix cmod propagation for non 32-bit types v2: - Do not propagate if the bit-size changes Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	66002eeebe	intel/compiler: add a brw_reg_type_is_integer helper v2: - Fixed typo: meant BRW_REGISTER_TYPE_UB instead BRW_REGISTER_TYPE_UV Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	44e1affaec	intel/compiler: implement is_zero, is_one, is_negative_one for 8-bit/16-bit There are no 8-bit immediates, so assert in that case. 16-bit immediates are replicated in each word of a 32-bit immediate, so we only need to check the lower 16-bits. v2: - Fix is_zero with half-float to consider -0 as well (Jason). - Fix is_negative_one for word type. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	e64be391dd	intel/compiler: generalize the combine constants pass At the very least we need it to handle HF too, since we are doing constant propagation for MAD and LRP, which relies on this pass to promote the immediates to GRF in the end, but ideally we want it to support even more types so we can take advantage of it to improve register pressure in some scenarios. v2 (Jason): - Support 64-bit types too. - Check if we need to set the half-float flag if the immediate already existed. - Multiply the size of the immediate by the width of the copy Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	fb990bd76e	intel/eu: force stride of 2 on NULL register for Byte instructions The hardware only allows a stride of 1 on a Byte destination for raw byte MOV instructions. This is required even when the destination is the NULL register. Rather than making sure that we emit a proper NULL:B destination every time we need one, just fix it at emission time. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	ce68a061de	intel/compiler: ask for an integer type if requesting an 8-bit type v2: - Assign BRW_REGISTER_TYPE_B directly for 8-bit (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
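
The message for 44e1affaec says that 16-bit immediates are replicated in each word of a 32-bit immediate, so only the lower 16 bits need checking, and that is_zero for half-float must also accept -0. A minimal sketch of those checks follows. The function names and signatures here are hypothetical illustrations, not the actual Mesa helpers, and the bit patterns are the standard IEEE 754 binary16 encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers (not Mesa's API). `imm` is a 32-bit immediate that
 * holds a 16-bit value replicated in both words, so only the low word is
 * inspected. */

static bool hf_imm_is_zero(uint32_t imm)
{
   /* Drop the sign bit so +0.0 (0x0000) and -0.0 (0x8000) both match. */
   return ((imm & 0xffffu) & 0x7fffu) == 0;
}

static bool hf_imm_is_one(uint32_t imm)
{
   return (imm & 0xffffu) == 0x3c00u; /* binary16 encoding of 1.0 */
}

static bool hf_imm_is_negative_one(uint32_t imm)
{
   return (imm & 0xffffu) == 0xbc00u; /* binary16 encoding of -1.0 */
}

static bool w_imm_is_negative_one(uint32_t imm)
{
   /* Signed 16-bit word: -1 is all ones in two's complement. */
   return (imm & 0xffffu) == 0xffffu;
}

/* Small self-check exercising the helpers above. */
void hf_imm_self_check(void)
{
   assert(hf_imm_is_zero(0x80008000u));   /* -0.0 replicated */
   assert(hf_imm_is_one(0x3c003c00u));
   assert(hf_imm_is_negative_one(0xbc00bc00u));
   assert(w_imm_is_negative_one(0xffffffffu));
}
```

Masking the sign bit is one way to treat both zeros alike. The commit may implement the comparison differently.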

101590 commits