fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-28 09:58:22 +02:00

Author	SHA1	Message	Date
Lionel Landwerlin	ec526d6ba0	intel/decoders: add address space indicator to get BOs Some commands like MI_BATCH_BUFFER_START have this indicator. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-07 15:08:31 +00:00
Tapani Pälli	4900c0cff4	anv: call blob_finish when done with it Fixes leaks from anv_device_upload_nir: ==7345== 8,192 bytes in 2 blocks are definitely lost in loss record 24 of 24 ==7345== at 0x4C2ED78: malloc (vg_replace_malloc.c:308) ==7345== by 0x4C31393: realloc (vg_replace_malloc.c:836) ==7345== by 0x54E0848: grow_to_fit (blob.c:67) ==7345== by 0x54E0BE5: blob_reserve_bytes (blob.c:166) ==7345== by 0x54E0C7C: blob_reserve_intptr (blob.c:186) ==7345== by 0x54704A7: nir_serialize (nir_serialize.c:1091) ==7345== by 0x512F97D: anv_device_upload_nir (anv_pipeline_cache.c:756) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-07 07:39:48 +02:00
Tapani Pälli	a9555f37d5	anv: use anv_gem_munmap in block pool cleanup Use anv_gem_munmap for unmap when softpin in use, this corresponds to anv_gem_mmap used in anv_block_pool_expand_range. This fixes valgrind errors seen for each pool when softpin is in use: ==25581== 262,144 bytes in 1 blocks are definitely lost in loss record 31 of 31 ==25581== at 0x50E77E8: anv_gem_mmap (anv_gem.c:96) ==25581== by 0x50EEE2B: anv_block_pool_expand_range (anv_allocator.c:543) ==25581== by 0x50EEB51: anv_block_pool_init (anv_allocator.c:477) ==25581== by 0x50EF7EF: anv_state_pool_init (anv_allocator.c:920) ==25581== by 0x510B8EB: anv_CreateDevice (anv_device.c:2031) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-07 07:36:28 +02:00
Jason Ekstrand	656ace3dd8	intel/nir: Move 64-bit lowering later Now that we have a loop unrolling cost function and loop unrolling isn't going to kill us the moment we have a 64-bit op in a loop, we can go ahead and move 64-bit lowering later. This gives us the opportunity to do more optimizations and actually let the full optimizer run even on 64-bit ops rather than hoping one round of opt_algebraic will fix everything. This substantially reduces both fp64 shader compile times and the resulting code size. On the vs-isnan-dvec test from piglit: Before this commit: 1684.63s user 17.29s system 99% cpu 28:28.24 total 101479 instructions. 0 loops. 802452 cycles. 79:369 spills:fills. Peak memory usage (according to massif): 1.435 GB After this commit: 179.64s user 7.75s system 99% cpu 3:07.92 total 57316 instructions. 0 loops. 459287 cycles. 0:0 spills:fills. Peak memory usage (according to massif): 531.0 MB Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	e02959f442	nir/lower_doubles: Inline functions directly in lower_doubles Instead of trusting the caller to already have created a softfp64 function shader and added all its functions to our shader, we simply take the softfp64 shader as an argument and do the function inlining ourselves. This means that there are no more nasty functions lying around that the caller needs to worry about cleaning up. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	8993e0973f	intel/nir: Drop an unneeded lower_constant_initializers call Even though this is technically a step in the function inlining process as laid out in nir_inline_functions.c, it's not really needed. We already have constant initializers lowered here and no new ones are added by appending the softfp64 functions. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	fa4824c1db	intel/debug: Add a debug flag to force software fp64 Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Ian Romanick	55e6454d5e	intel/fs: Fix extract_u8 of an odd byte from a 64-bit integer In the old code, we would generate the exact same instruction for extract_u8(some_u64, 0) and extract_u8(some_u64, 1). The mask-a-word trick only works for even numbered bytes. This fixes the (new) piglit test tests/spec/arb_gpu_shader_int64/execution/fs-ushr-and-mask.shader_test. v2: Use a SHR instead of an AND. This saves an instruction compared to using two moves. Suggested by Jason. Fixes: `6ac2d16901` ("i965/fs: Fix extract_i8/u8 to a 64-bit destination") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-06 08:35:45 -08:00
Ian Romanick	4aaf139ea4	intel/fs: nir_op_extract_i8 extracts a byte, not a word Fixes: `6ac2d16901` ("i965/fs: Fix extract_i8/u8 to a 64-bit destination") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-06 08:35:42 -08:00
Ian Romanick	bbf20a1ca3	intel/compiler: Silence unused parameter warning in brw_interpolation_map.c The parameter is never used, and it's not part of a common interface idiom. Remove it. src/intel/compiler/brw_interpolation_map.c: In function ‘brw_setup_vue_interpolation’: src/intel/compiler/brw_interpolation_map.c:62:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter] const struct gen_device_info *devinfo) ^~~~~~~ Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-06 08:35:36 -08:00
Ian Romanick	dea19138dd	intel/compiler: Silence many unused parameter warnings in brw_eu.h In file included from src/intel/compiler/brw_eu_util.c:34:0: src/intel/compiler/brw_eu.h: In function ‘brw_message_desc_header_present’: src/intel/compiler/brw_eu.h:288:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_message_desc_header_present(const struct gen_device_info *devinfo, ^~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc’: src/intel/compiler/brw_eu.h:296:51: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_message_ex_desc(const struct gen_device_info *devinfo, ^~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc_ex_mlen’: src/intel/compiler/brw_eu.h:303:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_message_ex_desc_ex_mlen(const struct gen_device_info *devinfo, ^~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_binding_table_index’: src/intel/compiler/brw_eu.h:337:68: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_sampler_desc_binding_table_index(const struct gen_device_info *devinfo, ^~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_sampler’: src/intel/compiler/brw_eu.h:344:56: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_sampler_desc_sampler(const struct gen_device_info *devinfo, uint32_t desc) ^~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_return_format’: src/intel/compiler/brw_eu.h:371:62: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_sampler_desc_return_format(const struct gen_device_info *devinfo, ^~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_dp_desc_binding_table_index’: src/intel/compiler/brw_eu.h:405:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_dp_desc_binding_table_index(const struct gen_device_info *devinfo, ^~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_desc’: src/intel/compiler/brw_eu.h:754:41: warning: unused parameter ‘exec_size’ [-Wunused-parameter] unsigned exec_size, /**< 0 for SIMD4x2 */ ^~~~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_float_desc’: src/intel/compiler/brw_eu.h:775:47: warning: unused parameter ‘exec_size’ [-Wunused-parameter] unsigned exec_size, ^~~~~~~~~ Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-06 08:35:31 -08:00
Timothy Arceri	81ee2cd8ba	glsl: rename is_record() -> is_struct() Replace was done using: find ./src -type f -exec sed -i -- \ 's/is_record(/is_struct(/g' {} \; Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 13:10:02 +11:00
Caio Marcelo de Oliveira Filho	69cc6272fb	anv: Implement VK_EXT_external_memory_host v2: Ignore the import if handleType == 0. (Jason) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 12:59:50 -08:00
Jason Ekstrand	43f40dc7cb	anv: Implement VK_EXT_inline_uniform_block Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	61e009d2c4	spirv: Use the same types for resource indices as pointers We need more space than just a 32-bit scalar and we have to burn all that space anyway so we may as well expose it to the driver. This also fixes a subtle bug when UBOs and SSBOs have different pointer types. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	c520f4dec9	anv: Add a concept of a descriptor buffer This buffer goes alongside the CPU data structure and may contain pointers, bindless handles, or any other descriptor information. Currently, all descriptors are size zero and nothing goes in the buffer but this commit sets up the framework we will need later. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	5c30fffeec	anv: Take references to push descriptor set layouts Technically, descriptor set layouts aren't required to survive past the function they're passed into so we need to reference them. Cc: "19.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	8ab95b849e	anv: Refactor descriptor pushing a bit Pull the common code out of the two entrypoints into the helper which fetches the push descriptor set for us. Now that it does more than just get a thing, call it anv_cmd_buffer_push_descriptor_set. Cc: "19.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	cab064bc10	anv: drop add_var_binding from anv_nir_apply_pipeline_layout.c It has exactly one caller. Just inline it. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	49cf61c6aa	anv: Clean up descriptor set layouts The descriptor set layout code in our driver has undergone many changes over the years. Some of the fields which were once essential are now useless or nearly so. The has_dynamic_offsets field was completely unused except for the code to set and hash it. The per-stage indices were only being used to determine if a particular binding had images, samplers, etc. The fact that it's per-stage also doesn't matter because that binding should never be accessed by a shader of the wrong stage. This commit deletes a pile of cruft and replaces it all with a descriptive bitfield which states what a particular descriptor contains. This merely describes the data available and doesn't necessarily dictate how it will be lowered in anv_nir_apply_pipeline_layout. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	4c50b7c92c	anv: Count image param entries rather than images This is what we're actually storing in the descriptor set and consuming when we bind surface states. This commit renames image_count to image_param_count a few places and moves the decision to not count image params on gen9+ into anv_descriptor_set.c when we build the layout. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	3822c7495a	anv: Stop allocating buffer views for dynamic buffers We emit the surface states for those on-the-fly so we don't need the buffer view. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	8c6d410a50	anv: Rework arguments to anv_descriptor_set_write_* Make them all take a device followed by a set. This is consistent with how the actual Vulkan entrypoint parameters are laid out. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	5b7a9e7398	anv/descriptor_set: Refactor alloc/free of descriptor sets This commit just puts the free list code together as part of the pool instead of having it inlined into the descriptor set create code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Eric Engestrom	3d4238d26c	anv: use the platform defines in vk.xml instead of hard-coding them Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 11:57:10 +00:00
Lionel Landwerlin	e21c201c96	anv: update supported patch version Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-05 10:39:17 +00:00
Tapani Pälli	3bb8768b9d	anv: toggle on support for VK_EXT_ycbcr_image_arrays We already propagate coord_components correctly and did not have layer restrictions for ycbcr formats. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:39:17 +00:00
Tapani Pälli	33bf3d510c	anv: retain the is_array state in create_plane_tex_instr_implicit This does not seem to fix anything ATM but is the right thing to do. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Fixes: `f3e91e78a3` ("anv: add nir lowering pass for ycbcr textures") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:38:31 +00:00
Jason Ekstrand	0010d0348a	anv/pipeline: Drop anv_fill_binding_table We zero out the prog data anyway and, now that bias is always zero, this function is accomplishing nothing. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-04 23:56:40 +00:00
Jason Ekstrand	65ee5cc0da	anv: Use an actual binding for gl_NumWorkgroups This commit moves our handling of gl_NumWorkgroups over to work like our handling of other special bindings in the Vulkan driver. We give it a magic descriptor set number and teach emit_binding_tables to handle it. This is better than the bias mechanism we were using because it allows us to do proper accounting through the bind map mechanism. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-04 23:56:40 +00:00
Jason Ekstrand	5c96120b5c	intel,nir: Lower TXD with min_lod when the sampler index is not < 16 When we have a larger sampler index, we get into the "high sampler" scenario and need an instruction header. Even in SIMD8, this pushes the instruction over the sampler message size maximum of 11 registers. Instead, we have to lower TXD to TXL. Fixes: `cb98e0755f` "intel/fs: Support min_lod parameters on texture..." Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-04 23:56:39 +00:00
Jason Ekstrand	5049fbddb4	anv: Count surfaces for non-YCbCr images in GetDescriptorSetLayoutSupport We were accidentally not counting those surfaces Fixes: `ddc4069122` "anv: Implement VK_KHR_maintenance3" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-04 23:56:39 +00:00
Sagar Ghuge	e551040c60	nir/glsl: Add another way of doing lower_imul64 for gen8+ On Gen 8 and 9, the "mul" instruction supports a 64-bit destination type. We can reduce our 64x64 int multiplication from 4 instructions to 3. Also, instead of emitting two mul instructions, we can emit a single mul instruction and extract the low/high 32 bits from the 64-bit result for [i/u]mulExtended v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand) 2) Add lower_mul_2x32_64 flag (Matt Turner) 3) Remove associative property as bit size is different (Connor Abbott) v3: Fix indentation and variable naming convention (Jason Ekstrand) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-04 15:50:25 -08:00
Mauro Rossi	ec0f465bc5	android: anv: fix libexpat shared dependency Fixes undefined reference building errors for XML_* functions Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Cc: "19.0" <mesa-stable@lists.freedesktop.org>	2019-03-04 20:53:59 +01:00
Mauro Rossi	14e7e26a09	android: anv: fix generated files dependencies (v2) Fix anv_entrypoints.{c,h} and anv_extensions.{c,h} missing dependencies Rename the variable labels according to targets and python scripts Align the building rules as per Automake for simplification Fixes building errors during rebuilds due to missing dependencies (v2) Fixed a missing $(VULKAN_API_XML) reference Fixes: `9a508b7` ("android: anv/extensions: fix generated sources build") Fixes: `dd088d4bec` ("anv/extensions: Generate a header file with extension tables") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Cc: "19.0" <mesa-stable@lists.freedesktop.org>	2019-03-04 20:53:51 +01:00
Jordan Justen	10c5579921	intel/compiler: Move int64/doubles lowering options Instead of calculating the int64 and doubles lowering options each time a shader is preprocessed, save and use the values in nir_shader_compiler_options. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-02 14:33:44 -08:00
Ian Romanick	d1d56f5f9a	intel/fs: Don't assert on b2f with a saturate modifier This ran afoul of Iris's use of nir_lower_clamp_color_outputs which applies fsat() before writes to vertex shader color outputs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Fixes: `7725d60938` ("intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))")	2019-03-02 13:58:50 -08:00
Lionel Landwerlin	32ffd90002	anv: add support for INTEL_DEBUG=bat As requested by Ken ;) v2: Also decode simple batches (Caio) Fix u_vector usage issues (Lionel) v3: Make binding/instruction/state/surface available (Lionel) v4: Going through device pools for simple batches (Lionel) Centralize search BO callbacks into anv_device.c (Lionel) v5: Clear decoded batch buffer var after use (Caio) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-02 12:53:21 +00:00
Matt Turner	e0148bbcfd	intel/compiler: Add commas on final values of compaction table arrays Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-03-01 13:56:25 -08:00
Ian Romanick	1edf67fc3f	intel/fs: Generate if instructions with inverted conditions Per-platform results were all over the place, so I have included all the results here. There is an important note at the bottom of the commit message. Skylake total instructions in shared programs: 15184683 -> 15184679 (<.01%) instructions in affected programs: 2786 -> 2782 (-0.14%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.05% max: 0.84% x̄: 0.44% x̃: 0.44% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.96% 0.07% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 370961367 -> 370961173 (<.01%) cycles in affected programs: 205867 -> 205673 (-0.09%) helped: 5 HURT: 1 helped stats (abs) min: 1 max: 149 x̄: 39.60 x̃: 16 helped stats (rel) min: 0.02% max: 1.05% x̄: 0.45% x̃: 0.55% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -93.01 28.34 95% mean confidence interval for cycles %-change: -0.82% 0.08% Inconclusive result (value mean confidence interval includes 0). Broadwell total instructions in shared programs: 15465366 -> 15465362 (<.01%) instructions in affected programs: 2799 -> 2795 (-0.14%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.04% max: 0.84% x̄: 0.44% x̃: 0.44% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.96% 0.07% Inconclusive result (%-change mean confidence interval includes 0). 
total cycles in shared programs: 410938419 -> 410938531 (<.01%) cycles in affected programs: 566028 -> 566140 (0.02%) helped: 18 HURT: 17 helped stats (abs) min: 1 max: 16 x̄: 3.50 x̃: 1 helped stats (rel) min: <.01% max: 1.05% x̄: 0.13% x̃: <.01% HURT stats (abs) min: 1 max: 12 x̄: 10.29 x̃: 12 HURT stats (rel) min: <.01% max: 0.16% x̄: 0.08% x̃: 0.09% 95% mean confidence interval for cycles value: 0.31 6.09 95% mean confidence interval for cycles %-change: -0.10% 0.05% Inconclusive result (%-change mean confidence interval includes 0). Haswell total instructions in shared programs: 13749760 -> 13749759 (<.01%) instructions in affected programs: 2241 -> 2240 (-0.04%) helped: 1 HURT: 0 total cycles in shared programs: 385398913 -> 385398363 (<.01%) cycles in affected programs: 554914 -> 554364 (-0.10%) helped: 31 HURT: 1 helped stats (abs) min: 1 max: 453 x̄: 18.00 x̃: 6 helped stats (rel) min: <.01% max: 0.25% x̄: 0.03% x̃: 0.05% HURT stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8 HURT stats (rel) min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06% 95% mean confidence interval for cycles value: -45.88 11.51 95% mean confidence interval for cycles %-change: -0.05% -0.02% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total cycles in shared programs: 180663626 -> 180663881 (<.01%) cycles in affected programs: 472350 -> 472605 (0.05%) helped: 15 HURT: 30 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% HURT stats (abs) min: 8 max: 10 x̄: 9.00 x̃: 9 HURT stats (rel) min: 0.06% max: 0.14% x̄: 0.10% x̃: 0.10% 95% mean confidence interval for cycles value: 4.21 7.12 95% mean confidence interval for cycles %-change: 0.05% 0.08% Cycles are HURT. 
Sandy Bridge total cycles in shared programs: 154568664 -> 154569225 (<.01%) cycles in affected programs: 356486 -> 357047 (0.16%) helped: 1 HURT: 31 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.02% max: 0.02% x̄: 0.02% x̃: 0.02% HURT stats (abs) min: 4 max: 33 x̄: 18.16 x̃: 8 HURT stats (rel) min: 0.05% max: 0.23% x̄: 0.14% x̃: 0.10% 95% mean confidence interval for cycles value: 12.19 22.87 95% mean confidence interval for cycles %-change: 0.10% 0.16% Cycles are HURT. Iron Lake total instructions in shared programs: 8206589 -> 8206565 (<.01%) instructions in affected programs: 3024 -> 3000 (-0.79%) helped: 12 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.75% max: 0.83% x̄: 0.80% x̃: 0.80% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.82% -0.77% Instructions are helped. total cycles in shared programs: 187657428 -> 187656228 (<.01%) cycles in affected programs: 95748 -> 94548 (-1.25%) helped: 12 HURT: 0 helped stats (abs) min: 80 max: 120 x̄: 100.00 x̃: 100 helped stats (rel) min: 1.00% max: 1.66% x̄: 1.27% x̃: 1.21% 95% mean confidence interval for cycles value: -113.27 -86.73 95% mean confidence interval for cycles %-change: -1.43% -1.11% Cycles are helped. GM45 total instructions in shared programs: 5037569 -> 5037557 (<.01%) instructions in affected programs: 1521 -> 1509 (-0.79%) helped: 6 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.75% max: 0.83% x̄: 0.79% x̃: 0.79% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.83% -0.75% Instructions are helped. 
total cycles in shared programs: 128101478 -> 128100758 (<.01%) cycles in affected programs: 52746 -> 52026 (-1.37%) helped: 6 HURT: 0 helped stats (abs) min: 120 max: 120 x̄: 120.00 x̃: 120 helped stats (rel) min: 1.16% max: 1.66% x̄: 1.41% x̃: 1.41% 95% mean confidence interval for cycles value: -120.00 -120.00 95% mean confidence interval for cycles %-change: -1.70% -1.12% Cycles are helped. This change has almost no effect right now. However, removing this patch (but leaving the patch "nir/algebraic: Replace a bcsel of a b2f with a b2f(!(a || b))") after adding a patch that removes !(a < b) -> (a >= b) optimizations (like https://patchwork.freedesktop.org/patch/264787/) has the following results on Skylake: Skylake total instructions in shared programs: 15071022 -> 15089710 (0.12%) instructions in affected programs: 1022219 -> 1040907 (1.83%) helped: 1 HURT: 3937 helped stats (abs) min: 41 max: 41 x̄: 41.00 x̃: 41 helped stats (rel) min: 1.01% max: 1.01% x̄: 1.01% x̃: 1.01% HURT stats (abs) min: 1 max: 256 x̄: 4.76 x̃: 4 HURT stats (rel) min: 0.05% max: 11.18% x̄: 2.59% x̃: 2.60% 95% mean confidence interval for instructions value: 4.56 4.93 95% mean confidence interval for instructions %-change: 2.54% 2.64% Instructions are HURT. total cycles in shared programs: 369777134 -> 370092923 (0.09%) cycles in affected programs: 17516573 -> 17832362 (1.80%) helped: 115 HURT: 3624 helped stats (abs) min: 1 max: 1721 x̄: 81.18 x̃: 28 helped stats (rel) min: <.01% max: 10.74% x̄: 1.24% x̃: 0.65% HURT stats (abs) min: 1 max: 12640 x̄: 89.71 x̃: 54 HURT stats (rel) min: <.01% max: 28.24% x̄: 4.72% x̃: 4.52% 95% mean confidence interval for cycles value: 75.21 93.71 95% mean confidence interval for cycles %-change: 4.43% 4.64% Cycles are HURT. 
total spills in shared programs: 9450 -> 9442 (-0.08%) spills in affected programs: 166 -> 158 (-4.82%) helped: 2 HURT: 0 total fills in shared programs: 21115 -> 21094 (-0.10%) fills in affected programs: 438 -> 417 (-4.79%) helped: 2 HURT: 0 LOST: 1 GAINED: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	7725d60938	intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a)) Since Boolean values are either -1 (true) or 0 (false), b2f(inot(a)) maps -1 => 0.0 and 0 => 1.0. This is equivalent to 1.0 + float(boolBitsToInt(a)). On Intel GPUs, ADD is one of the few instructions that can type-convert during write to destination, so we can achieve this in a single instruction: add g47F, g26D, 1D v2: Fix swizzles. v3: Fix typos in comments. Noticed by Ken. All Gen6+ platforms had similar results. (Skylake shown) Skylake total instructions in shared programs: 15185583 -> 15184683 (<.01%) instructions in affected programs: 239389 -> 238489 (-0.38%) helped: 899 HURT: 1 helped stats (abs) min: 1 max: 2 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.15% max: 1.85% x̄: 0.49% x̃: 0.44% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.09% max: 0.09% x̄: 0.09% x̃: 0.09% 95% mean confidence interval for instructions value: -1.01 -0.99 95% mean confidence interval for instructions %-change: -0.51% -0.48% Instructions are helped. total cycles in shared programs: 370964249 -> 370961508 (<.01%) cycles in affected programs: 1487586 -> 1484845 (-0.18%) helped: 420 HURT: 268 helped stats (abs) min: 1 max: 232 x̄: 22.41 x̃: 6 helped stats (rel) min: 0.05% max: 22.60% x̄: 1.30% x̃: 0.41% HURT stats (abs) min: 1 max: 230 x̄: 24.90 x̃: 10 HURT stats (rel) min: <.01% max: 21.60% x̄: 1.45% x̃: 0.52% 95% mean confidence interval for cycles value: -7.61 -0.36 95% mean confidence interval for cycles %-change: -0.44% -0.02% Cycles are helped. No changes on Iron Lake or GM45. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	cb3e21cd19	intel/fs: Use De Morgan's laws to avoid logical-not of a logic result on Gen8+ Instead of emitting ~(a & b), emit (~a | ~b) since logical-not of operands is free on Gen8+. v2: Fix swizzles. Fix types for cmod propagation. v3: Simplify logic for inverting source of inot(ixor(a, b)). Suggested by Ken. Skylake and Broadwell had similar results. (Skylake shown) Skylake total instructions in shared programs: 15185593 -> 15185583 (<.01%) instructions in affected programs: 5673 -> 5663 (-0.18%) helped: 12 HURT: 1 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.30% max: 5.88% x̄: 1.50% x̃: 0.70% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for instructions value: -1.66 0.13 95% mean confidence interval for instructions %-change: -2.60% -0.15% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 370977726 -> 370964249 (<.01%) cycles in affected programs: 869987 -> 856510 (-1.55%) helped: 15 HURT: 2 helped stats (abs) min: 2 max: 6640 x̄: 902.20 x̃: 16 helped stats (rel) min: <.01% max: 4.92% x̄: 1.71% x̃: 1.53% HURT stats (abs) min: 14 max: 42 x̄: 28.00 x̃: 28 HURT stats (rel) min: 1.08% max: 3.18% x̄: 2.13% x̃: 2.13% 95% mean confidence interval for cycles value: -1654.87 69.34 95% mean confidence interval for cycles %-change: -2.29% -0.23% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	8eb36c9129	intel/fs: Emit logical-not of operands on Gen8+ On Gen8+ specifying negation of a logical operation such as AND actually performs a logical-not. Take advantage of this to generate fewer instructions. v2: Major rebase. Use nir_src_as_alu_instr. Fix swizzle handling. No changes on any pre-Gen8 platform. Skylake and Broadwell had similar results. (Broadwell shown) total instructions in shared programs: 15466902 -> 15466274 (<.01%) instructions in affected programs: 1262953 -> 1262325 (-0.05%) helped: 682 HURT: 4 helped stats (abs) min: 1 max: 5 x̄: 1.02 x̃: 1 helped stats (rel) min: 0.03% max: 2.40% x̄: 0.18% x̃: 0.04% HURT stats (abs) min: 1 max: 62 x̄: 17.50 x̃: 3 HURT stats (rel) min: 0.03% max: 1.89% x̄: 0.53% x̃: 0.10% 95% mean confidence interval for instructions value: -1.10 -0.73 95% mean confidence interval for instructions %-change: -0.19% -0.15% Instructions are helped. total cycles in shared programs: 410996093 -> 410950440 (-0.01%) cycles in affected programs: 144389048 -> 144343395 (-0.03%) helped: 519 HURT: 51 helped stats (abs) min: 1 max: 1060 x̄: 104.46 x̃: 140 helped stats (rel) min: 0.01% max: 10.98% x̄: 0.34% x̃: 0.03% HURT stats (abs) min: 1 max: 4060 x̄: 167.90 x̃: 22 HURT stats (rel) min: <.01% max: 8.20% x̄: 0.96% x̃: 0.25% 95% mean confidence interval for cycles value: -97.16 -63.02 95% mean confidence interval for cycles %-change: -0.32% -0.13% Cycles are helped. total spills in shared programs: 95311 -> 95329 (0.02%) spills in affected programs: 881 -> 899 (2.04%) helped: 0 HURT: 4 total fills in shared programs: 93629 -> 93634 (<.01%) fills in affected programs: 794 -> 799 (0.63%) helped: 1 HURT: 2 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	06eaaf2de9	intel/fs: Refactor ALU source and destination handling to a separate function Other places will need to do this soon to properly handle source swizzles. The patch looks a little odd, but the change is pretty straightforward. All of the swizzle and mask handling is moved out, but the code for handling move instructions and vecN instructions remains in nir_emit_alu. I'm not terribly pleased with the "need_dest" parameter, but get_nir_dest is (somewhat surprisingly) destructive. I am open to suggestions of alternatives. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	fb3ca9109c	intel/fs: Handle OR source modifiers in algebraic optimization Found by inspection. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	c9d5bd050c	intel/fs: Relax type matching rules in cmod propagation from MOV instructions To allow cmod propagation from a MOV in a sequence like: and(16) g31<1>UD g20<8,8,1>UD g22<8,8,1>UD mov.nz.f0(16) null<1>F g31<8,8,1>D A similar change to the vec4 backend had no effect. Somewhere between `c1ec582059` and `40fc4b5acd` (1,094 commits) the effectiveness of this patch diminished, and as of commit `d7e0d47b9d` (nir: Add a bunch of b2[if] optimizations) this optimization no longer has any effect on any platform. A later patch "intel/fs: Use De Morgan's laws to avoid logical-not of a logic result on Gen8+," generates some instruction sequences that require this change in order for cmod propagation to make progress. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	d2056ab993	intel/vec4: Emit constants for some ALU sources as immediate values In some cases of flow control, the constant propagation is not able to determine that the source of an instruction must be a constant value. When we still have NIR SSA values, we can easily determine this. Emit the immediate value during code generation to possibly avoid spurious loads of constants into registers. I wrote this patch to prevent a couple trivial regressions in vec4 shaders caused by "nir/algebraic: Replace i2b used by bcsel or if-statement with comparison". The final result was quite a bit better than that... No shader-db changes on any Gen8+ platform. v2: Assert that we never get a negation source modifier on Gen8+. Suggested by Ken. This should never happen because we don't normally use vec4 for Gen8+ (requires an environment variable to force it), and there's no code to generate these negations. Still, erring on the side of caution is better. Haswell total instructions in shared programs: 13776218 -> 13764783 (-0.08%) instructions in affected programs: 663931 -> 652496 (-1.72%) helped: 3495 HURT: 1 helped stats (abs) min: 1 max: 30 x̄: 3.28 x̃: 2 helped stats (rel) min: 0.21% max: 10.00% x̄: 1.79% x̃: 1.49% HURT stats (abs) min: 24 max: 24 x̄: 24.00 x̃: 24 HURT stats (rel) min: 12.24% max: 12.24% x̄: 12.24% x̃: 12.24% 95% mean confidence interval for instructions value: -3.39 -3.15 95% mean confidence interval for instructions %-change: -1.84% -1.75% Instructions are helped. 
total cycles in shared programs: 386818984 -> 386511910 (-0.08%) cycles in affected programs: 20379636 -> 20072562 (-1.51%) helped: 3052 HURT: 476 helped stats (abs) min: 2 max: 12516 x̄: 110.40 x̃: 6 helped stats (rel) min: 0.05% max: 24.68% x̄: 1.58% x̃: 0.69% HURT stats (abs) min: 2 max: 416 x̄: 62.76 x̃: 24 HURT stats (rel) min: 0.10% max: 10.75% x̄: 4.03% x̃: 2.18% 95% mean confidence interval for cycles value: -115.57 -58.51 95% mean confidence interval for cycles %-change: -0.93% -0.73% Cycles are helped. total spills in shared programs: 100482 -> 100480 (<.01%) spills in affected programs: 79 -> 77 (-2.53%) helped: 3 HURT: 1 total fills in shared programs: 96883 -> 96877 (<.01%) fills in affected programs: 85 -> 79 (-7.06%) helped: 4 HURT: 0 Ivy Bridge total instructions in shared programs: 12000562 -> 11990113 (-0.09%) instructions in affected programs: 572581 -> 562132 (-1.82%) helped: 3106 HURT: 0 helped stats (abs) min: 1 max: 30 x̄: 3.36 x̃: 2 helped stats (rel) min: 0.21% max: 10.00% x̄: 1.86% x̃: 1.49% 95% mean confidence interval for instructions value: -3.49 -3.23 95% mean confidence interval for instructions %-change: -1.91% -1.81% Instructions are helped. total cycles in shared programs: 180958504 -> 180664500 (-0.16%) cycles in affected programs: 19991810 -> 19697806 (-1.47%) helped: 2654 HURT: 486 helped stats (abs) min: 2 max: 12516 x̄: 121.61 x̃: 6 helped stats (rel) min: 0.05% max: 20.66% x̄: 1.48% x̃: 0.68% HURT stats (abs) min: 2 max: 396 x̄: 59.18 x̃: 24 HURT stats (rel) min: 0.05% max: 9.62% x̄: 3.82% x̃: 2.16% 95% mean confidence interval for cycles value: -125.62 -61.64 95% mean confidence interval for cycles %-change: -0.76% -0.56% Cycles are helped. 
Sandy Bridge total instructions in shared programs: 10842336 -> 10835438 (-0.06%) instructions in affected programs: 395340 -> 388442 (-1.74%) helped: 1926 HURT: 0 helped stats (abs) min: 1 max: 22 x̄: 3.58 x̃: 2 helped stats (rel) min: 0.10% max: 9.68% x̄: 1.78% x̃: 1.42% 95% mean confidence interval for instructions value: -3.73 -3.43 95% mean confidence interval for instructions %-change: -1.84% -1.72% Instructions are helped. total cycles in shared programs: 154590074 -> 154569050 (-0.01%) cycles in affected programs: 8159932 -> 8138908 (-0.26%) helped: 1670 HURT: 228 helped stats (abs) min: 2 max: 260 x̄: 18.13 x̃: 6 helped stats (rel) min: 0.02% max: 8.70% x̄: 0.74% x̃: 0.28% HURT stats (abs) min: 2 max: 1798 x̄: 40.58 x̃: 14 HURT stats (rel) min: 0.03% max: 12.97% x̄: 1.04% x̃: 0.31% 95% mean confidence interval for cycles value: -13.51 -8.64 95% mean confidence interval for cycles %-change: -0.60% -0.46% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8212357 -> 8206587 (-0.07%) instructions in affected programs: 323664 -> 317894 (-1.78%) helped: 1457 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 3.96 x̃: 3 helped stats (rel) min: 0.33% max: 11.49% x̄: 1.86% x̃: 1.44% 95% mean confidence interval for instructions value: -4.14 -3.78 95% mean confidence interval for instructions %-change: -1.93% -1.78% Instructions are helped. total cycles in shared programs: 187668016 -> 187657422 (<.01%) cycles in affected programs: 14856234 -> 14845640 (-0.07%) helped: 1372 HURT: 83 helped stats (abs) min: 2 max: 24 x̄: 7.92 x̃: 6 helped stats (rel) min: 0.02% max: 1.14% x̄: 0.12% x̃: 0.08% HURT stats (abs) min: 2 max: 14 x̄: 3.20 x̃: 2 HURT stats (rel) min: 0.03% max: 0.60% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for cycles value: -7.65 -6.91 95% mean confidence interval for cycles %-change: -0.11% -0.10% Cycles are helped. 
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:41:46 -08:00
Eric Engestrom	2793417ec6	anv: fix typo Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-01 11:20:28 +00:00
Eric Engestrom	258e463db5	anv: remove spaces around kwargs assignment pylint complains: > C0326: No space allowed around keyword argument assignment Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-01 11:20:28 +00:00
Eric Engestrom	7b704fd2fd	anv: drop unused parameter I'm guessing a previous version of this script used an index-based map of entrypoints, but that's not the case anymore. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-01 11:20:28 +00:00

4278 commits