fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 22:28:06 +02:00

Author	SHA1	Message	Date
Samuel Pitoiset	d82dfca872	radv: enable fast depth/stencil clears with separate aspects on GFX8 It's similar to GFX9+. Shadow of Mordor (Vulkan beta) hits that path and it works fine. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-28 07:54:11 +00:00
Eric Engestrom	c2430f3edc	radv: fix empty-body instruction Fixes: `8d43e2b2de` ("meson: add -Werror=empty-body to disallow `if(x);`") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-10-27 22:10:31 +00:00
Timothy Arceri	cff53da374	radv: enable secure compile support Can be enabled via the environment variable which tells the driver how many compilation threads are expected to be called, and therefore how many forked processes the driver should create. For example we would expect to call fossilize replay with something like this: RADV_SECURE_COMPILE_THREADS=8 ./fossilize-replay --num-threads 8 \ --shader-cache-size 0 --ignore-derived-pipelines pipeline_cache.foz Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-26 13:04:12 +11:00
Timothy Arceri	57c95d2ce2	radv: add support for a secure compile fork at device creation This added support for the fork, the installation of the seccomp filter, and the main loop for the actual compilation to be called from i.e. run_secure_compile_device(). Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-26 13:04:12 +11:00
Timothy Arceri	3f2283b3e2	radv: add radv_secure_compile() This function will be called by the parent process when doing a secure compile. It first selects a free process to work with then passes it all the information it needs to compile the pipeline. Once the pipeline information has been passed to the secure process, it then waits around to read/write any disk cache entries required before exiting. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-26 13:04:12 +11:00
Timothy Arceri	07692f703f	radv: for secure compile exit early from radv_shader_variant_create() We don't have permission to be creating shared memory etc. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-26 13:04:12 +11:00
Timothy Arceri	5cd437b1ed	radv: allow the secure process to read and write from disk cache This allows the secure process to read and write to the disk cache via the parent process. This commit just adds the functionality needed for the secure process, the following commit will add the functionality for the parent process. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-26 13:04:12 +11:00
Timothy Arceri	5d25aee005	radv: add radv_device_use_secure_compile() helper Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-26 13:04:12 +11:00
Timothy Arceri	d33f2165c9	radv: add some new members to radv device and instance for secure compile These will be used by the following commits to hold information about the forked secure compile processes. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-26 13:04:12 +11:00
Timothy Arceri	e8cb13d499	radv: add radv_secure_compile_type enum This will be used to identify information being passed between the parent and secure process during a secure compile. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-26 13:04:12 +11:00
Timothy Arceri	2d2b113e86	radv: add radv_create_shaders() to radv_shader.h In a following commit we want to be able to call this for secure compiles from radv_device.c Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-26 13:04:12 +11:00
Timothy Arceri	6571000071	radv: add debug option to turn off in memory cache This can be useful for debugging the on disk cache, but is also useful in the following patch for secure compiles which will be used to compile huge pipeline collections. These pipeline collections can be multiple GBs and the in memory cache grows to multiple GBs very quickly when they are compiled so we want to be able to turn off the in memory cache. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-26 13:04:12 +11:00
Timothy Arceri	637776629d	radv: get topology from pipeline key rather than VkGraphicsPipelineCreateInfo This is cleaner and avoids having to read/write an additional copy of topology for use with secure compile. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-26 13:04:12 +11:00
Timur Kristóf	c580f134ae	aco: Refactor hazard mitigations, separate pass for GFX10. GFX10 hazards require a different approach compared to previous generations, for example it doesn't need s_nop, and most hazards can't be solved by adding NOPs at all. Also, they are not resolved by branch instructions. This commit reorganizes aco_insert_NOPs so that there is now a separate pass for GFX10. The new GFX10 pass also respects the control flow of the shader. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-25 10:10:42 +02:00
Timur Kristóf	b01847bd94	aco/gfx10: Fix mitigation of VMEMtoScalarWriteHazard. This commit refines the VMEMtoScalarWriteHazard mitigation, based upon a closer look at what LLVM does. Also changes the code to match the structure of the other hazard mitigations. * The hazard is not only triggered by VMEM, FLAT and GLOBAL but also SCRATCH and DS instructions. * The SMEM/SALU instructions only cause a hazard when they write a register that the VMEM/etc. are reading. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-25 10:10:42 +02:00
Timur Kristóf	c037ba1bb7	aco/gfx10: Mitigate LdsBranchVmemWARHazard. There is a hazard caused when there is a branch between a VMEM/GLOBAL/SCRATCH instruction and a DS instruction. This commit adds a workaround that avoids the problem. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-25 10:10:42 +02:00
Timur Kristóf	09d676d81a	aco/gfx10: Mitigate SMEMtoVectorWriteHazard. There is a hazard that happens when an SMEM instruction reads an SGPR and then a VALU instruction writes that same SGPR. This commit adds a workaround that avoids the problem. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-25 10:10:42 +02:00
Timur Kristóf	d6dfce02d0	aco/gfx10: Mitigate VcmpxExecWARHazard. There is a hazard when a non-VALU instruction reads the EXEC mask and then a VALU instruction writes the EXEC mask. This commit adds a workaround that avoids the problem. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-25 10:10:42 +02:00
Timur Kristóf	e5a8616973	aco/gfx10: Mitigate VcmpxPermlaneHazard. Any permlane instruction that follows any VOPC instruction can cause a hazard, this commit implements a workaround that avoids this causing a problem. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-25 10:10:42 +02:00
Timur Kristóf	99aed688d3	aco/gfx10: Add notes about some GFX10 hazards. ACO currently mitigates VMEMtoScalarWriteHazard and Offset3fBug (names from LLVM). There are some bugs that ACO needn't care about. Just to be on the safe side, add an assertion that makes sure that we aren't hit by FlatSegmentOffsetBug. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-25 10:10:41 +02:00
Samuel Pitoiset	2bf8a9b337	radv: fix VK_KHR_shader_float_controls dependency on GFX6-7 From the Vulkan spec 1.1.126 : "VK_SHADER_FLOAT_CONTROLS_INDEPENDENCE_32_BIT_ONLY_KHR specifies that shader float controls for 32-bit floating point can be set independently; other bit widths must be set identically to each other." Forgot to update this when I enabled that extension recently. Fixes dEQP-VK.spirv_assembly.instruction.compute.float_controls.independence_settings.independence_setting Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-25 07:49:20 +02:00
Samuel Pitoiset	4b17311e52	radv: compute the number of records correctly for vertex buffers On GFX8 the number of records is in bytes while on other chips it's in units of "stride". Fixes dEQP-VK.robustness.vertex_access..draw.vertex_ on RAVEN. Tested on GFX6, GFX8, GFX10 and RAVEN. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-24 17:14:43 +02:00
Rhys Perry	fc04a2fc31	aco: take LDS into account when calculating num_waves pipeline-db (Vega): SGPRS: 344 -> 344 (0.00 %) VGPRS: 424 -> 524 (23.58 %) Spilled SGPRs: 84 -> 80 (-4.76 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 52812 -> 52484 (-0.62 %) bytes LDS: 135 -> 135 (0.00 %) blocks Max Waves: 56 -> 53 (-5.36 %) v2: consider WGP, rework to be clearer and apply the "maximum 16 workgroups per CU" limit properly v2: use "SIMD" instead of "EU" v2: fix spiller by introducing "Program::max_waves" v2: rename "lds_size" to "lds_limit" v3: make max_waves actually independent of register usage v3: fix issue where max_waves was way too high v3: use DIV_ROUND_UP(a, b) instead of max(a / b, 1) v3: rename "workgroups_per_cu" to "workgroups_per_cu_wgp" v4: fix typo from "workgroups_per_cu" rename Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)	2019-10-23 19:11:21 +01:00
Rhys Perry	08d510010b	aco: increase accuracy of SGPR limits SGPRs are allocated in groups of 16 on GFX8/GFX9. GFX10 allocates a fixed number of SGPRs and has 106 addressable SGPRs. pipeline-db (Vega): SGPRS: 5912 -> 6232 (5.41 %) VGPRS: 1772 -> 1780 (0.45 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 88228 -> 87904 (-0.37 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 559 -> 571 (2.15 %) pipeline-db (Navi): SGPRS: 341256 -> 363384 (6.48 %) VGPRS: 171536 -> 170960 (-0.34 %) Spilled SGPRs: 832 -> 581 (-30.17 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 14207332 -> 14190872 (-0.12 %) bytes LDS: 33 -> 33 (0.00 %) blocks Max Waves: 18072 -> 18251 (0.99 %) v2: unconditionally count vcc as an extra sgpr on GFX10+ v3: pass SGPRs rounded to 8 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-23 19:11:21 +01:00
Rhys Perry	7453c1adff	radv: round vgprs/sgprs before calculating max_waves Note that ACO doesn't correctly round SGPR counts on GFX8/GFX9. pipeline-db (ACO/Vega): SGPRS: 11000 -> 11000 (0.00 %) VGPRS: 3120 -> 3120 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 164328 -> 164328 (0.00 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 1125 -> 1000 (-11.11 %) v2: consider wave32 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-23 19:11:20 +01:00
Samuel Pitoiset	f11ea22666	radv: fix a performance regression with graphics depth/stencil clears I recently changed the slow depth/stencil clear path to make sure depth values are explicitly exported by the fragment shader. This is actually only useful when VK_EXT_depth_range_unrestricted is enabled. While this path is correct, it introduced a performance regression with Heroes of the Storm, Shadow of Mordor (Vulkan beta) and probably more titles. This is because it prevents the hardware from doing some optimizations like discarding fragments. This commit re-introduces the previous (a bit faster) slow depth/stencil clear path and it selects the unrestricted path only if VK_EXT_depth_range_unrestricted is enabled. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/863 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-23 10:23:47 +02:00
Samuel Pitoiset	7562a2cbe3	radv: fix vkUpdateDescriptorSets with inline uniform blocks descriptorCount is the number of bytes into the descriptor, so it shouldn't be used as an index. srcArrayElement/dstArrayElement specify the starting byte offset within the binding to copy from/to. This fixes new CTS tests: dEQP-VK.binding_model.descriptor_copy..inline_uniform_block_ dEQP-VK.binding_model.descriptor_copy..mix_3 dEQP-VK.binding_model.descriptor_copy..mix_array1 Fixes: `8d2654a419` ("radv: Support VK_EXT_inline_uniform_block.") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-23 09:59:22 +02:00
Samuel Pitoiset	9c92a21fe5	radv/gfx10: fix 3D images GFX10 does act like GFX9 actually. This fixes dEQP-VK.glsl.texture_functions.query.texturesize.sampler3d_. Cc: 19.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-23 09:45:49 +02:00
Samuel Pitoiset	41ace1d939	radv/gfx10: re-enable fast depth/stencil clears with separate aspects It used to cause weird issues on GFX10 in the past with vkmark and Wreckfest, and they can't be reproduced now. Shadow Of Mordor (Vulkan beta) hits that path and it works fine. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-23 09:18:06 +02:00
Samuel Pitoiset	956d825ed8	radv: do not emit rbplus if attachments are undefined Fixes some crashes with dEQP-VK.geometry.layered.*.secondary_cmd_buffer on Raven and other chips that allow rbplus. This just prevents a crash and rbplus probably needs more work. Cc: 19.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-23 08:57:31 +02:00
Samuel Pitoiset	411ad8e7c5	radv: add an assertion in radv_gfx10_compute_bin_size() To prevent out of bounds access. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-23 08:33:12 +02:00
Samuel Pitoiset	f4ab58c1a0	radv: do not create meta pipelines with 16 samples The driver only supports up to 8 samples, so it's useless to create more pipelines than needed. This fixes a conditional jump reported by Valgrind on GFX10: ==194282== Conditional jump or move depends on uninitialised value(s) ==194282== at 0xDBF925A: radv_gfx10_compute_bin_size (radv_pipeline.c:3242) ==194282== by 0xDBF95A6: radv_pipeline_generate_binning_state (radv_pipeline.c:3334) ==194282== by 0xDBFC1A0: radv_pipeline_generate_pm4 (radv_pipeline.c:4440) ==194282== by 0xDBFD15E: radv_pipeline_init (radv_pipeline.c:4764) ==194282== by 0xDBFD23E: radv_graphics_pipeline_create (radv_pipeline.c:4788) ==194282== by 0xDBB95A3: create_pipeline (radv_meta_clear.c:114) ==194282== by 0xDBB9AC5: create_color_pipeline (radv_meta_clear.c:297) ==194282== by 0xDBBCF05: radv_device_init_meta_clear_state (radv_meta_clear.c:1277) ==194282== by 0xDB9ACD9: radv_device_init_meta (radv_meta.c:363) ==194282== by 0xDB7FE3A: radv_CreateDevice (radv_device.c:2080 This is caused by an out of bound access of 'fmask_array' (ie. index is 4 as for 16 samples). Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-23 08:33:08 +02:00
Rhys Perry	118a32e5ba	Revert "aco: only emit waitcnt on loop continues if we there was some load or export" We don't properly pass on ctx.lgkm_cnt/ctx.barrier_imm/etc, so this waitcnt was necessary for barriers and correctly waiting for SMEM before s_dcache_wb on GFX10. Totals from affected shaders: SGPRS: 33200 -> 33200 (0.00 %) VGPRS: 31376 -> 31376 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 2431804 -> 2433956 (0.09 %) bytes LDS: 316 -> 316 (0.00 %) blocks Max Waves: 1609 -> 1609 (0.00 %) This reverts commit `2c050b49b3`. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00
Rhys Perry	964ce47abc	aco: add missing bld.scc() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00
Rhys Perry	c96289a70e	aco: keep can_reorder/barrier when combining addition into SMEM Affects 30 shaders in the pipeline-db (all youngblood). Totals from affected shaders: SGPRS: 2656 -> 2456 (-7.53 %) VGPRS: 2260 -> 2260 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 240680 -> 240944 (0.11 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 90 -> 90 (0.00 %) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00
Rhys Perry	57c2cfb608	aco: add a few missing checks in value numbering Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00
Rhys Perry	a8d0101d69	aco: use ds_read2_b64/ds_write2_b64 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00
Rhys Perry	bdf47a1273	aco: properly combine additions into ds_write2_b64/ds_read2_b64 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00
Rhys Perry	58d4aee5df	aco: fix sparse store_lds() p_extract_vector's second operand is in units of the definition size, not dwords. v2: move extract_subvector() to right before ds_write_helper Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00
Rhys Perry	a856629e8f	aco: create load_lds/store_lds helpers We'll want these for GS, since VS->GS IO on Vega is done using LDS. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00
Rhys Perry	a400928f4a	aco: fix 64-bit p_extract_vector on 32-bit p_create_vector Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00
Rhys Perry	f6f15859de	aco: small stage corrections Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00
Daniel Schürmann	3a20ef4a32	aco: refactor value numbering Previously, we used one hashset per BB, so that we could always initialize the current hashset from the immediate dominator. This patch changes the behavior to a single hashmap using the block index per instruction to resolve dominance. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-10-22 17:18:59 +02:00
Samuel Pitoiset	a13320370e	radv: fix updating bound fast ds clear values with different aspects On GFX9, the driver is able to do an optimized fast depth/stencil clear with only one aspect (ie. clear the stencil part of a depth/stencil image). When this happens, the driver should only update the clear values of the given aspect. Note that it's currently only supported on GFX9 but I have some local patches that extend this optimized path for other gens. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1967 Cc: 19.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-22 11:16:13 +02:00
Samuel Pitoiset	39760793b5	ac/llvm: fix ac_to_integer_type() for 32-bit const addr space pointers This fixes some crashes with dEQP-VK.descriptor_indexing.* when read_first_invocation has its source from a descriptor. Most of these tests still fail because of an LLVM bug (they work with ACO). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-21 22:32:01 +02:00
Rhys Perry	73184e51d1	aco: run opt_algebraic in a loop Totals from affected shaders: SGPRS: 13920 -> 13656 (-1.90 %) VGPRS: 12972 -> 12960 (-0.09 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 1005680 -> 1000648 (-0.50 %) bytes LDS: 91 -> 91 (0.00 %) blocks Max Waves: 688 -> 688 (0.00 %) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-21 19:18:30 +00:00
Rhys Perry	132ae89b19	aco: use nir_lower_idiv_precise v7: rename _nv50/_llvm to _fast/_precise Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-21 18:49:46 +00:00
Rhys Perry	8b98d0954e	nir/lower_idiv: add new llvm-based path v2: make variable names snake_case v2: minor cleanups in emit_udiv() v2: fix Panfrost build failure v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature v4: remove nir_op_urcp v5: drop nv50 path v5: rebase v6: add back nv50 path v6: add comment for nir_lower_idiv_path enum v7: rename _nv50/_llvm to _fast/_precise v8: fix etnaviv build failure Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-21 18:49:46 +00:00
Daniel Schürmann	0e4bd261b1	aco: ensure that uniform booleans are computed in WQM if their uses happen in WQM This fixes graphical corruption in SC2. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-10-21 17:39:46 +00:00
Timur Kristóf	7e5f87b533	aco/gfx10: Update constant addresses in fix_branches_gfx10. Due to a bug in GFX10 hardware, s_nop instructions must be added if a branch is at 0x3f. We already do this, but forgot to also update the constant addresses that come after this instruction. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-21 14:33:54 +00:00


4134 commits