fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-02 20:58:04 +02:00

Author	SHA1	Message	Date
Jason Ekstrand	6212326941	intel/fs: Stop doing extra RA calls In the last phase of the schedule and RA loop, the RA call is redundant if we spill. Immediately afterwards, we're going to see that we couldn't allocate without spilling and call back into RA and tell it to go ahead and spill. We've known about it for a while but we've always brushed over it on the theory that, if you're going to spill, you'll be calling RA a bunch anyway and what does one extra RA hurt? As it turns out, it hurts more than you'd expect. Because the RA interference graph gets sparser with each spill and the RA algorithm is more efficient on sparser graphs, the RA call that we're duplicating is actually the most expensive call in the RA-and-spill loop. There's another extra RA call we do that's a bit harder to see which this also removes. If we try to compile a shader that isn't the minimum dispatch width and it fails to allocate without spilling we call fail() to set an error but then go ahead and do the first spilling RA pass and only after that's complete do we detect the fail and bail out. By making minimum dispatch widths part of the spill condition, we side-step this problem. Getting rid of these extra spills takes the compile time of a nasty Aztec Ruins shader from about 28 seconds to about 26 seconds on my laptop. It also makes shader-db 1.5% faster Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311100 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355468050 -> 355468050 (0.00%) cycles in affected programs: 0 -> 0 helped: 0 HURT: 0 Total CPU time (seconds): 2524.31 -> 2486.63 (-1.49%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Nanley Chery	29a13eb71d	isl: Add restrictions to isl_surf_get_hiz_surf() Import some restrictions from intel_tiling_supports_hiz() and intel_miptree_supports_hiz(). Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Nanley Chery	d57242190e	isl: Add restriction and comments to isl_surf_get_ccs_surf() Import some restrictions and comments from intel_miptree_supports_ccs(). Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Nanley Chery	1de089797c	isl: Modify restrictions in isl_surf_get_mcs_surf() Import some restrictions from intel_miptree_supports_mcs() and don't assume that the caller knows which device generations are supported. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Jason Ekstrand	0745d4bd96	anv: Implement VK_KHR_uniform_buffer_standard_layout There's no real work to do here since we already support scalar block layout which is a direct superset of what this extension allows. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-13 17:20:33 -05:00
Vinson Lee	20b42fad9b	intel/tools: Fix build with glibc < 2.27. glibc < 2.27 defines OVERFLOW in /usr/include/math.h. This patch fixes this build error. In file included from ../include/c99_math.h:37:0, from ../src/util/u_math.h:44, from ../src/mesa/main/macros.h:35, from ../src/intel/compiler/brw_reg.h:47, from ../src/intel/tools/i965_asm.h:32, from ../src/intel/tools/i965_gram.y:29: src/intel/tools/i965_gram.tab.c:562:5: error: expected identifier before numeric constant OVERFLOW = 412, ^ Fixes: `70308a5a8a` ("intel/tools: New i965 instruction assembler tool") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110656 Signed-off-by: Vinson Lee <vlee@freedesktop.org> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-05-13 11:05:48 -07:00
Mike Blumenkrantz	7b2468bf6e	intel: drop misleading driver name from gen_get_device_info()	2019-05-11 04:14:06 +00:00
Caio Marcelo de Oliveira Filho	3610081daa	anv: Fix limits when VK_EXT_descriptor_indexing is used Update various limits in VkPhysicalDeviceDescriptorIndexingPropertiesEXT that were previously zero to their values from VkPhysicalDeviceLimits. When using VK_EXT_descriptor_indexing, the former limits will apply to all the descriptor layout sets -- not only those using the new feature bits. For the reference, VK_EXT_descriptor_indexing says "There are new descriptor set layout and descriptor pool creation flags that are required to opt in to the update-after-bind functionality, and there are separate maxPerStage* and maxDescriptorSet* limits that apply to these descriptor set layouts which may be much higher than the pre-existing limits. The old limits only count descriptors in non-updateAfterBind descriptor set layouts, and the new limits count descriptors in all descriptor set layouts in the pipeline layout." Fixes: `6e230d7607` "anv: Implement VK_EXT_descriptor_indexing" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-10 15:15:11 -07:00
Jonathan Marek	d0bff89159	nir: allow specifying a set of opcodes in lower_alu_to_scalar This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-10 15:10:41 +00:00
Jason Ekstrand	f8bda81887	intel/fs/copy-prop: Don't walk all the ACPs for each instruction In order to set up KILL sets, the dataflow code was walking the entire array of ACPs for every instruction. If you assume the number of ACPs increases roughly with the number of instructions, this is O(n^2). As it turns out, regions_overlap() is not nearly as cheap as one would like and shows up as a significant chunk on perf traces. This commit changes things around and instead first builds an array of exec_lists which it uses like a hash table (keyed off ACP source or destination) similar to what's done in the rest of the copy-prop code. By first walking the list of ACPs and populating the table and then walking instructions and only looking at ACPs which probably have the same VGRF number, we can reduce the complexity to O(n). This takes the execution time of the piglit vs-isnan-dvec test from about 56.4 seconds on an unoptimized debug build (what we run in CI) with NIR_VALIDATE=0 to about 38.7 seconds. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-10 09:10:17 -05:00
Jason Ekstrand	20bbc175a4	intel/fs/copy-prop: Purge unused ACPs If the destination of an ACP entry exists only within this block, then there's no need to keep it for dataflow analysis. We can delete it from the out_acp table and avoid growing the bitsets any bigger than we absolutely have to. This reduces the maximum number of global ACP entries in the vs-isnan-dvec with software fp64 on Kaby Lake from 8630 to 3942 and takes the execution time of the piglit vs-isnan-dvec test from about 1:16.2 on an unoptimized debug build (what we run in CI) with NIR_VALIDATE=0 to about 56.4 seconds. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-10 09:10:17 -05:00
Jason Ekstrand	0b6da5bac6	intel/fs/copy-prop: Bump the hash table size to 64 While the number of ACPs is generally not huge compared to the number of blocks, 16 does seem a bit small. Bumping it to 64 takes the execution time of the piglit vs-isnan-dvec test from about 1:18.1 on an unoptimized debug build (what we run in CI) with NIR_VALIDATE=0 to about 1:16.2. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-10 09:10:17 -05:00
Caio Marcelo de Oliveira Filho	f7d53fffa2	anv: Remove special allocation for anv_push_constants The key reason for that mechanism is gone: all the extra optional data that could be in the anv_push_constants was moved elsewhere. At this point, just put anv_push_constants directly in anv_cmd_state (part of anv_cmd_buffer). v2: Remove a NULL check we don't need anymore in anv_cmd_buffer_push_constants(). (Lionel) Fix size we consider for valid push params. (Lionel) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-09 19:01:14 -07:00
Lionel Landwerlin	f2f6ac1c08	anv: Use corresponding type from the vector allocation We didn't notice this issue much because the 2 struct share a similar layout, expect for the additional fields... We run into that issue in Anv : ==15236== Invalid write of size 8 ==15236== at 0x8CF3939C: anv_state_table_expand_range (anv_allocator.c:211) ==15236== by 0x8CF394D5: anv_state_table_grow (anv_allocator.c:264) ==15236== by 0x8CF3967E: anv_state_table_add (anv_allocator.c:312) ==15236== by 0x8CF3B13C: anv_state_pool_alloc_no_vg (anv_allocator.c:1167) ==15236== by 0x8CF3B2B0: anv_state_pool_alloc (anv_allocator.c:1190) ==15236== by 0x8CF60871: alloc_surface_state (anv_image.c:1122) ==15236== by 0x8CF61FF9: anv_CreateImageView (anv_image.c:1519) ==15236== by 0x8BCBD2ED: vkCreateImageView (trampoline.c:1358) ==15236== Address 0x8994ef10 is 0 bytes after a block of size 128 alloc'd ==15236== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==15236== by 0x8D2578E6: u_vector_init (u_vector.c:47) ==15236== by 0x8CF3929A: anv_state_table_init (anv_allocator.c:168) ==15236== by 0x8CF3A99A: anv_state_pool_init (anv_allocator.c:921) ==15236== by 0x8CF56517: anv_CreateDevice (anv_device.c:1909) ==15236== by 0x8BCB4FBA: terminator_CreateDevice (loader.c:6073) ==15236== by 0x8DD2CB3D: ??? (in /home/djdeath/.steam/ubuntu12_64/libVkLayer_steam_fossilize.so) ==15236== by 0x8DF4D241: vkCreateDevice (in /home/djdeath/.steam/ubuntu12_64/steamoverlayvulkanlayer.so) ==15236== by 0x8BCB35C6: loader_create_device_chain (loader.c:5449) ==15236== by 0x8BCBC230: vkCreateDevice (trampoline.c:838) v2: Rename mmap_cleanups to avoid confusion (Caio) v3: s/fail_mmap_cleanups/fail_cleanups/ (Caio) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110648 Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-09 21:57:26 +01:00
Eric Engestrom	6c6af0c8b0	i965_asm: avoid free()ing uninitialized pointers Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-09 10:03:15 +00:00
Eric Engestrom	51597eca84	i965_asm: fix memleak Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-09 10:03:15 +00:00
Lionel Landwerlin	43596e5f34	anv: fix use after free Once mem->bo is removed from the cache, it is likely to be freed. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `b80930a6fe` ("anv: add support for VK_EXT_memory_budget") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 12:02:13 +01:00
Lionel Landwerlin	a07d06f103	anv: rework queries writes to ensure ordering memory writes We use a mix of MI & PIPE_CONTROL commands to write our queries' data (results & availability). Those commands' memory write order is not guaranteed with regard to their order in the command stream, unless CS stalls are inserted between them. This is problematic for 2 reasons : 1. We copy results from the device using MI commands even though the values are generated from PIPE_CONTROL, meaning we could copy unlanded values into the results and then copy the availability that is inconsistent with the values. 2. We allow the user to poll on the availability values of the query pool from the CPU. If the availability lands in memory before the values then we could return invalid values. This change does 2 things to address this problem : - We use either PIPE_CONTROL or MI commands to write both queries values and availability, so that the ordering of the memory writes guarantees that if availability is visible, results are also visible. - For the occlusion & timestamp queries we apply a CS stall before copying the results on the device, to ensure copying with MI commands see the correct values of previous PIPE_CONTROL writes of availability (required by the Vulkan spec). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reported-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-08 09:49:09 +00:00
Matt Turner	e8c74a1e16	intel/compiler: Unset flag reg when FB write is not predicated In the FS IR we pretend that the instruction is predicated with (+f0.1) just for flag dependency tracking purposes. Since the instruction doesn't support predication before Haswell, we unset the predicate so we should also unset the flag register so that we can round-trip the disassembly. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-07 14:33:48 -07:00
Sagar Ghuge	5d7a9e0811	intel/disasm: Disassemble immediate value properly for dim On haswell, for dim instruction we encode immediate float value operand into double float, v2: Fix comment (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-07 14:33:48 -07:00
Sagar Ghuge	6c83a68ebc	intel/disasm: Disassemble JIP offset for while Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-07 14:33:48 -07:00
Sagar Ghuge	9db616e8a2	intel/compiler: Replicate 16 bit immediate value correctly For the W or UW (signed or unsigned word) source types, the 16-bit value must be replicated in both the low and high words of the 32-bit immediate value. v2: Fix replication in other places as well V3: fix a few nits (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-07 14:33:48 -07:00
Sagar Ghuge	5211159b5b	intel/compiler: Print quad value in hex format Print quad value same as unsigned quad so that we can distinguish in between quater control disassembled values for e.g 1/2/3[Q] and immediate quad value for e.g 1Q. This allows round-tripping through the assembler/disassembler. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-07 14:33:48 -07:00
Sagar Ghuge	4e828bb48a	intel/tools: Add unit tests for assembler v1: Pass executable object from meson to test(Dylan Baker) v2: Ignore generated output files from git status(Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-07 14:33:48 -07:00
Mika Kuoppala	1fb5ce0a11	intel/tools: Initialize offset correctly for i965_asm If we leave offset uninitialized, access to store will be random depending on stack value and can segfault. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-07 14:33:48 -07:00
Mika Kuoppala	85da1194ec	intel/tools: Add meson pthread dependancy for i965_asm Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-07 14:33:48 -07:00
Sagar Ghuge	70308a5a8a	intel/tools: New i965 instruction assembler tool Tool is inspired from igt's assembler tool. Thanks to Matt Turner, who mentored me through out this project. v2: Fix memory leaks and naming convention (Caio) v3: Fix meson changes (Dylan Baker) v4: Fix usage options (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/141	2019-05-07 14:33:38 -07:00
Samuel Iglesias Gonsálvez	bc66cebc0d	anv: fix alphaToCoverage when there is no color attachment There are tests in CTS for alpha to coverage without a color attachment that are failing. This happens because we remove the shader color outputs when we don't have a valid color attachment for them, but when alpha to coverage is enabled we still want to preserve the the output at location 0 since we need the alpha component. In that case we will also need to create a null render target for RT 0. v2: - We already create a null rt when we don't have any, so reuse that for this case (Jason) - Simplify the code a bit (Iago) v3: - Take alpha to coverage from the key and don't tie this to depth-only rendering only, we want the same behavior if we have multiple render targets but the one at location 0 is not used. (Jason). - Rewrite commit message (Iago) v4: - Make sure we take into account the array length of the shader outputs, which we were no handling correctly either and make sure we also create null render targets for any invalid array entries too. v5: - Simplify removal of unused outputs by using rt_used[] so we don't have to special case alpha to coverage there too. Fixes the following CTS tests: dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.* Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-07 09:35:47 +02:00
Ian Romanick	c866500525	intel/compiler: Don't always require precise lowering of flrp No changes on any other Intel platforms. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8164367 -> 8135551 (-0.35%) instructions in affected programs: 3271235 -> 3242419 (-0.88%) helped: 13636 HURT: 90 helped stats (abs) min: 1 max: 30 x̄: 2.13 x̃: 1 helped stats (rel) min: 0.04% max: 10.77% x̄: 1.16% x̃: 0.97% HURT stats (abs) min: 1 max: 4 x̄: 1.80 x̃: 2 HURT stats (rel) min: 0.26% max: 11.11% x̄: 1.76% x̃: 0.78% 95% mean confidence interval for instructions value: -2.13 -2.07 95% mean confidence interval for instructions %-change: -1.16% -1.13% Instructions are helped. total cycles in shared programs: 188719974 -> 188586222 (-0.07%) cycles in affected programs: 70415766 -> 70282014 (-0.19%) helped: 12563 HURT: 515 helped stats (abs) min: 2 max: 600 x̄: 10.90 x̃: 6 helped stats (rel) min: <.01% max: 5.48% x̄: 0.48% x̃: 0.27% HURT stats (abs) min: 2 max: 54 x̄: 6.07 x̃: 4 HURT stats (rel) min: 0.01% max: 4.48% x̄: 0.24% x̃: 0.08% 95% mean confidence interval for cycles value: -10.56 -9.90 95% mean confidence interval for cycles %-change: -0.47% -0.45% Cycles are helped. LOST: 0 GAINED: 13 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	dd7135d55d	intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5 Previously lower_flrp32 was only set for vertex shaders. Fragment shaders performed a(1-c)+bc lowering during code generation. The shaders with loops hurt are SIMD8 and SIMD16 shaders for a text-identical fragment shader. v2: Rebase on `26391cceaa` ("intel/compiler: Lower ffma on Gen4 and Gen5"). v3: Rebase on `a004e95dd7` ("radeonsi/nir: create si_nir_opts() helper") Iron Lake total instructions in shared programs: 8211385 -> 8185974 (-0.31%) instructions in affected programs: 2503898 -> 2478487 (-1.01%) helped: 9936 HURT: 921 helped stats (abs) min: 1 max: 155 x̄: 2.86 x̃: 2 helped stats (rel) min: 0.10% max: 35.48% x̄: 1.67% x̃: 1.11% HURT stats (abs) min: 1 max: 12 x̄: 3.24 x̃: 2 HURT stats (rel) min: 0.21% max: 13.64% x̄: 1.86% x̃: 0.89% 95% mean confidence interval for instructions value: -2.43 -2.25 95% mean confidence interval for instructions %-change: -1.41% -1.33% Instructions are helped. total cycles in shared programs: 188523186 -> 188401198 (-0.06%) cycles in affected programs: 71541604 -> 71419616 (-0.17%) helped: 11649 HURT: 1871 helped stats (abs) min: 2 max: 930 x̄: 12.62 x̃: 6 helped stats (rel) min: <.01% max: 44.61% x̄: 0.68% x̃: 0.25% HURT stats (abs) min: 2 max: 138 x̄: 13.38 x̃: 8 HURT stats (rel) min: <.01% max: 10.99% x̄: 0.49% x̃: 0.17% 95% mean confidence interval for cycles value: -9.42 -8.63 95% mean confidence interval for cycles %-change: -0.54% -0.50% Cycles are helped. total loops in shared programs: 852 -> 856 (0.47%) loops in affected programs: 0 -> 4 helped: 0 HURT: 4 HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00% 95% mean confidence interval for loops value: 1.00 1.00 95% mean confidence interval for loops %-change: 0.00% 0.00% Loops are HURT. LOST: 3 GAINED: 12 GM45 total instructions in shared programs: 5046407 -> 5033694 (-0.25%) instructions in affected programs: 1303584 -> 1290871 (-0.98%) helped: 5010 HURT: 464 helped stats (abs) min: 1 max: 155 x̄: 2.85 x̃: 2 helped stats (rel) min: 0.10% max: 34.38% x̄: 1.63% x̃: 1.08% HURT stats (abs) min: 1 max: 75 x̄: 3.39 x̃: 2 HURT stats (rel) min: 0.20% max: 13.04% x̄: 1.84% x̃: 0.87% 95% mean confidence interval for instructions value: -2.45 -2.20 95% mean confidence interval for instructions %-change: -1.40% -1.28% Instructions are helped. total cycles in shared programs: 128889476 -> 128812366 (-0.06%) cycles in affected programs: 44845402 -> 44768292 (-0.17%) helped: 6079 HURT: 940 helped stats (abs) min: 2 max: 930 x̄: 15.16 x̃: 8 helped stats (rel) min: <.01% max: 41.03% x̄: 0.71% x̃: 0.25% HURT stats (abs) min: 2 max: 138 x̄: 16.01 x̃: 8 HURT stats (rel) min: <.01% max: 10.99% x̄: 0.50% x̃: 0.17% 95% mean confidence interval for cycles value: -11.63 -10.34 95% mean confidence interval for cycles %-change: -0.58% -0.52% Cycles are helped. total loops in shared programs: 633 -> 635 (0.32%) loops in affected programs: 0 -> 2 helped: 0 HURT: 2 total spills in shared programs: 60 -> 69 (15.00%) spills in affected programs: 54 -> 63 (16.67%) helped: 0 HURT: 1 total fills in shared programs: 92 -> 105 (14.13%) fills in affected programs: 80 -> 93 (16.25%) helped: 0 HURT: 1 LOST: 15 GAINED: 15 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2] Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]	2019-05-06 22:52:29 -07:00
Ian Romanick	d41cdef2a5	nir: Use the flrp lowering pass instead of nir_opt_algebraic I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Christian Gmeiner	4e110eca42	nir: nir_shader_compiler_options: drop native_integers Driver which do not support native integers should use a lowering pass to go from integers to floats. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-07 07:35:52 +02:00
Jason Ekstrand	30fa15e36b	anv,i965: Stop warning about incomplete gen11 support Both drivers are feature-complete and should be running more-or-less at perf at this point. Drop the warning. Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-05-03 22:57:35 +00:00
Lionel Landwerlin	80dc78407d	anv: fix crash when application does not provide push constants Found while running Talos Principle. As far as I can tell running a draw call with a pipeline having push constants without the application having called vkCmdPushConstants gives undefined push constant values. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: mesa-stable@lists.freedesktop.org	2019-05-03 10:21:40 +01:00
Caio Marcelo de Oliveira Filho	aa675cef5e	intel/fs: Assert when brw_fs_nir sees a nir_deref_instr Since `09f1de97a7` "anv,i965: Lower away image derefs in the driver" the backend compiler is not expected to handle any derefs, so let's assert on it. This helps identifying problems when a deref is not lowered and "leaks" into the backend compiler. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-02 23:25:30 -07:00
Jason Ekstrand	be7e9870d6	anv: Stop including POS in FS input limits It is an input but it comes in as part of the shader payload and doesn't count towards the limits. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-02 18:56:51 -05:00
Eric Engestrom	b80930a6fe	anv: add support for VK_EXT_memory_budget Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-30 15:40:33 +00:00
Juan A. Suarez Romero	8d621e8ff7	anv: enable descriptor indexing capabilities This enables the remaining capabilities in SPV_EXT_descriptor_indexing. Fixes: `6e230d7607` "anv: Implement VK_EXT_descriptor_indexing" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-30 09:23:46 +02:00
Rafael Antognolli	9175c7058e	intel/blorp: Make blorp update the clear color in gen11. Hardware docs say that Gen11 requires the use of two MI_ATOMICs of size QWORD when updating the clear color. The second MI_ATOMIC also needs CS Stall and Return Data Control set. v2: Remove include of srgb header (Lionel) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-29 21:19:59 +00:00
Rafael Antognolli	f8c3f408a6	intel/genxml: Update MI_ATOMIC genxml definition. Change some of the single bit fields to booleans, and add an enum with the definition of the ATOMIC_OPCODE. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-29 21:19:59 +00:00
Jordan Justen	38ffd7ce79	intel/genxml: Support base-16 in value & start fields in gen_sort_tags.py With python's int(), if the optional second parameter is 0, then python will support the 0x prefix for hex numbers. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-29 21:19:58 +00:00
Plamena Manolova	232c0f6489	isl: Set ClearColorConversionEnable. The ClearColorConversionEnable bit needs to be set for GEN11 when inderect clear colors are used. Signed-off-by: Plamena Manolova <plamena.n.manolova@gmail.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2019-04-29 21:19:58 +00:00
Eric Engestrom	7ca8ba199f	delete autotools .gitignore files One special case, `src/util/xmlpool/.gitignore` is not entirely deleted, as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`). Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-29 21:17:19 +00:00
Lionel Landwerlin	9628631a38	Revert "anv: limit URB reconfigurations when using blorp" In commit 0d46e404 ("anv: limit URB reconfigurations when using blorp") we tried to limit the number of URB reconfiguration by checking if the last allocation is large enough to fit the blorp dispatch. We used the last bound pipeline to compare the allocation. The problem with this is that the pipeline is bound but its commands might not have been emitted into the command buffer yet. Let's just revert commit `0d46e40467` since it didn't seem to yield any performance improvement. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 0d46e404 ("anv: limit URB reconfigurations when using blorp") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110535 Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-29 11:41:27 +00:00
Kenneth Graunke	9dcf90d7ba	intel/fs: Don't emit empty ELSE blocks. While we can clean this up later, it's trivial to not generate the stupid code in the first place, which saves some optimization work. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-28 22:36:09 -07:00
Tapani Pälli	376c3e8f87	anv: expose VK_EXT_queue_family_foreign on Android VK_ANDROID_external_memory_android_hardware_buffer requires this extension. It is safe to enable it since currently aux usage is disabled for ahw buffers. Fixes following dEQP extension dependency test on Android: dEQP-VK.api.info.device#extensions Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-29 07:31:02 +03:00
Jason Ekstrand	934f178341	anv/descriptor_set: Don't fully destroy sets in pool destroy/reset In `105002bd2d`, we fixed a memory leak bug where we weren't properly destroying descriptor when destroying/resetting a descriptor pool. However, the only real leak that happened was that we we take a reference to the descriptor set layout in the descriptor set and we weren't dropping our reference. Everything else in the descriptor set is tied to the pool itself and doesn't need to be freed on a per-set basis. This commit changes the destroy/reset functions to only bother walking the list of sets to unref the layouts and otherwise we just assume that the whole-pool destroy/reset takes care of the rest. Now that we're doing more non-trivial things with descriptor sets such as allocating things with util_vma_heap, per-set destruction is starting to show up on perf traces. This takes reset back to where it's supposed to be as a cheap whole-pool operation. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-26 05:40:28 +00:00
Jason Ekstrand	baf4802e3e	anv: Better handle 32-byte alignment of descriptor set buffers In `c520f4dec9`, we chose to align the sizes of descriptor set buffers to 32 bytes. We have to align the descriptor set buffer to 32B so that it's valid for using with push constants. We align the size as well so we don't leave lots of holes with util_vma_heap_alloc. Unfortunately, we were only aligning it for alloc and not for free so we were still creating piles of holes when we delete descriptor sets. This causes terrible perf for the allocator once we've deleted piles of descriptor sets. This commit reworks the code so that we align the descriptor set buffer size to 32B for both alloc and free. The result is that it takes the new crucible vkResetDescriptorPool from 104.567719 to 2.898354 seconds. Fixes: `c520f4dec9` "anv: Add a concept of a descriptor buffer" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110497 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-26 05:40:28 +00:00
Caio Marcelo de Oliveira Filho	055f6281d4	intel/fs: Don't handle texop_tex for shaders without implicit LOD These will be lowered by nir_lower_tex() with the lower_tex_when_implicit_lod_not_supported, so don't need the extra handling here. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-25 12:13:06 -07:00
Topi Pohjolainen	ff642fb0e6	intel/compiler/fs/icl: Use dummy masked urb write for tess eval One cannot write the URB arbitrarily and therefore the message has to be carefully constructed. The clever tricks originate from Kenneth and Jason, I'm just writing the patch. Fixes GPU hangs on ICL with Vulkan CTS. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-04-25 22:00:43 +03:00

1 2 3 4 5 ...

4212 commits