fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-08 13:28:06 +02:00

Author	SHA1	Message	Date
Chih-Wei Huang	352d91ce5b	android: radv: remove unused LOCAL_EXPORT_C_INCLUDE_DIRS The vulkan module is the final HAL. No need to export its headers since none will import it. Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 08:55:50 +02:00
Chih-Wei Huang	4fb11c01c5	android: anv: remove unused LOCAL_EXPORT_C_INCLUDE_DIRS The vulkan module is the final HAL. No need to export its headers since none will import it. Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 08:55:42 +02:00
Jason Ekstrand	7e0fcea727	nir/loop_analyze: Pass nir_const_values directly to helpers Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	ff972c7a3a	nir/loop_analyze: Properly handle swizzles in loop conditions This commit re-plumbs all of nir_loop_analyze to use nir_ssa_scalar for all intermediate values so that we can properly handle swizzles. Even though if conditions are required to be scalars, they may still consume swizzles so you could have ((a.yzw < b.zzx).xz && c.xx).y == 0 as your loop termination condition. The old code would just bail the moment it saw its first non-zero swizzle but we can now properly chase the scalar from the if condition all the way to a, b, and c. Shader-db results on Kaby Lake: total loops in shared programs: 4388 -> 4364 (-0.55%) loops in affected programs: 29 -> 5 (-82.76%) helped: 29 HURT: 5 Shader-db results on Haswell: total loops in shared programs: 4370 -> 4373 (0.07%) loops in affected programs: 2 -> 5 (150.00%) helped: 2 HURT: 5 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	0333649e63	nir/loop_analyze: Refactor detection of limit vars This commit reworks both get_induction_and_limit_vars() and try_find_trip_count_vars_in_iand to return true on success and not modify their output parameters on failure. This makes their callers significantly simpler. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	8f7405ed9d	nir: Add some helpers for chasing SSA values properly There are various cases in which we want to chase SSA values through ALU ops ranging from hand-written optimizations to back-end translation code. In all these cases, it can be very tricky to do properly because of swizzles. This set of helpers lets you easily work with a single component of an SSA def and chase through ALU ops safely. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	9a3cb6f5fe	nir/loop_analyze: Bail if we encounter swizzles None of the current code knows what to do with swizzles. Take the safe option for now and bail if we see one. This does have a small shader-db impact but it is at least safe. Shader-db results on Kaby Lake: total loops in shared programs: 4364 -> 4388 (0.55%) loops in affected programs: 5 -> 29 (480.00%) helped: 5 HURT: 29 Shader-db results on Haswell: total loops in shared programs: 4373 -> 4370 (-0.07%) loops in affected programs: 5 -> 2 (-60.00%) helped: 5 HURT: 2 Fixes: `6772a17acc` "nir: Add a loop analysis pass" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	6455fa9710	nir/loop_analyze: Use new eval_const_* helpers in test_iterations Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	268ad47c11	nir/loop_analyze: Handle bit sizes correctly in calculate_iterations The current code assumes everything is 32-bit which is very likely true but not guaranteed by any means. Instead, use nir_eval_const_opcode to do the calculations in a bit-size-agnostic way. We also use the new constant constructors to build the correct size constants. Fixes: `6772a17acc` "nir: Add a loop analysis pass" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	9f7ffe41dd	nir/loop_analyze: Fix phi-of-identical-alu detection One issue was that the original version didn't check that swizzles matched when comparing ALU instructions so it could end up matching very different instructions. Using the nir_instrs_equal function from nir_instr_set.c which we use for CSE should be much more reliable. Another was that the loop assumes it will only run two iterations which may not be true. If there's something which guarantees that this case only happens for phis after ifs, it wasn't documented. Fixes: `9e6b39e1d5` "nir: detect more induction variables" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	6e984bcb92	nir/instr_set: Expose nir_instrs_equal() Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	64328f947e	nir/builder: Use nir_const_value_for_* for constructing immediates Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	3acddc733f	nir: Refactor nir_src_as_* constant functions Now that we have the nir_const_value_as_* helpers, every one of these functions is effectively the same except for the suffix they use so we can easily define them with a repeated macro. This also means that they're inline and the fact that the nir_src is being passed by-value should no longer really hurt anything. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	ce5581e23e	nir: Add more helpers for working with const values Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Chia-I Wu	b44bb8bded	virgl: remove virgl_transfer_queue_lists COMPLETED_LIST is always empty. We only need one list. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-07-09 14:26:55 -07:00
Chia-I Wu	48aefcbd6b	virgl: simplify virgl_transfer_queue_extend We can reuse virgl_transfer_queue_find_pending. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-07-09 14:26:55 -07:00
Chia-I Wu	eae4527551	virgl: remove transfer after transfer_write Now that virgl_transfer_queue_is_queued does not search COMPLETED_LIST, we don't need to move transfers to that list. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-07-09 14:26:55 -07:00
Chia-I Wu	bec2a85c48	virgl: improve virgl_transfer_queue_is_queued Search only the pending list and return immediately on the first hit. When the transfer queue was introduced, the function was used to deal with write transfer -> draw -> write transfer sequence. It was used to tell if the second transfer intersects with the first transfer. If yes, the transfer queue avoided reordering the second transfer to before the draw (by flushing) in case the draw uses the transferred data. With the recent changes to the transfer code, the function is used to deal with write transfer -> readback transfer. We want to avoid reordering the readback transfer to before the first transfer (also by flushing). In the old code, we needed to track the completed transfers as well to avoid reordering. But in the new code, a readback transfer is guaranteed to see the data from the completed transfers (in other words, it cannot be reordered to before the already completed transfers). We don't need to search the COMPLETED_LIST. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-07-09 14:26:55 -07:00
Chia-I Wu	5f6aab2ee2	virgl: fix transfers_intersect for mipmaps We never use transfers_intersect with textures, but fix it anyway to avoid confusion. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-07-09 14:26:55 -07:00
Chia-I Wu	6ca1bbabbe	virgl: fix some false positives in transfers_overlap Rewrite the function and check z/depth more carefully. We intentionally avoid u_box_test_intersection_2d because it returns true when two boxes touch but do not intersect and can be confusing. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-07-09 14:26:55 -07:00
Marek Olšák	2b2093961e	radeonsi/gfx10: enable primitive binning by default Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	9f68367d19	radeonsi/gfx10: implement primitive binning Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	4e56a2aaa8	radeonsi: simplify primitive binning enablement Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	3521297251	radeonsi: set primitive binning tunables for dGPUs Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	d7e80ba1e7	radeonsi: set FLUSH_ON_BINNING_TRANSITION when needed Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	9dbe63ceea	radeonsi/gfx10: use the new scan converter when binning is disabled Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	80b3f4b4bd	radeonsi/gfx9: fix an oversight in primitive binning code Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	1f53a3e766	radeonsi: use BREAK_BATCH instead of FLUSH_DFSM when CB_TARGET_MASK changes Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	605900d7dd	radeonsi/gfx10: don't expose unimplemented PIPE_CAP_QUERY_SO_OVERFLOW Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	270a8ab648	radeonsi/gfx10: launch 2 compute waves per CU before going onto the next CU Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	ab1f36a1d3	radeonsi/gfx10: set more registers and fields Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	9b65f6618c	radeonsi/gfx10: enable LATE_ALLOC_GS Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	4985c3ee22	radeonsi/gfx10: set HS/GS/CS.WGP_MODE Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	329406ec9c	radeonsi/gfx10: set GE_PC_ALLOC Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	9d1483de3b	radeonsi/gfx10: enable 1D textures Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	1d3bffaf9c	radeonsi/gfx10: enable image stores with DCC Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	5b50fb9b7f	radeonsi/gfx10: no need to invalidate L2 for framebuffer -> texture coherency Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	fbf781e401	radeonsi/gfx10: support pixel shaders without exports It only works if there are no color and no Z exports. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	2adc8e2736	radeonsi/gfx10: enable vertex shaders without param space allocation Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	07fe51156d	radeonsi: update DCC settings from PAL Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	4002913f8d	radeonsi: reorder shader IO indices for better IO space usage for tess and GS The highest used index determines the stride for shader outputs in shaders that use LDS or memory for outputs. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	1c99a13f89	radeonsi: decrease maximum supported GENERIC varying index from 42 to 31 This can decrease LDS and/or memory usage for shader outputs when geometry shaders or tessellation is used. Only PS inputs support higher indices and those aren't eliminated by kill_outputs. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	6335cc6a58	radeonsi: cosmetic cleanup in si_shader_io_get_unique_index Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	3be4ed2fe1	radeonsi: fix and clean up shader_type passing - don't pass it via a parameter if it can be derived from other parameters - set shader_type for ac_rtld_open - use enum pipe_shader_type instead of unsigned Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	37b26671a7	radeonsi: enable RB+ for pixel shaders with no/non-contiguous color outputs Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	5058d62b05	radeonsi: don't set READ_ONLY for const_uploader to fix bindless texture hangs Bindless textures can update descriptors with WRITE_DATA. Cc: 19.1 <mesa-stable@lists.freedesktop.org> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Alyssa Rosenzweig	6074eae753	gallium: Add util_format_is_unorm8 check Useful for formats that would work with the same driver code path as RGBA8 UNORM but that don't meet the util_format_is_rgba8_variant criteria due to a smaller channel count. v2: Use simpler logic (suggested by Iago). v3: Fix spelling error. boolean->bool (thank you airlied). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 21:17:47 +00:00
Alyssa Rosenzweig	15000c79da	nir: Add Panfrost-specific blending intrinsic This gives more flexibility than the normal store_deref/store_output versions (particularly, it allows us to abuse the type system in awful ways, which is necessary for efficient format conversion in blend shaders.) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Karol Herbst <kherbst@redhat.com>	2019-07-09 14:07:23 -07:00
Pratik Vishwakarma	177a3df7b0	radeonsi: Expose support for 10-bit VP9 decode Fix si_vid_is_format_supported to expose support for 10-bit VP9 decode using P016 format. Without this change, 10-bit decode will be exposed only for HEVC even though newer hardware supports 10-bit decode for VP9. Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2019-07-09 15:26:54 -04:00
Alyssa Rosenzweig	4a4b48fb05	nir: Add nir_imm_vec4_16 We already have nir_imm_float16 and nir_imm_vec4; let's add the ability to easily make immediate fp16 vectors as well, now that fp16 support is maturing in NIR/GLSL. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-09 18:43:07 +00:00
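
Note on 6ca1bbabbe: the commit message distinguishes boxes that merely touch from boxes that actually intersect. The sketch below is not the virgl code; it only illustrates that distinction, assuming a hypothetical `struct box3d` with positive extents and hypothetical helper names.

```c
#include <stdbool.h>

/* Hypothetical box type for illustration only (not a Mesa struct).
 * All extents are assumed to be positive. */
struct box3d {
   int x, y, z;
   int width, height, depth;
};

/* Half-open test: [a, a + a_len) and [b, b + b_len) share at least one
 * element. Ranges that only touch at an endpoint do NOT overlap. */
static bool
ranges_overlap(int a, int a_len, int b, int b_len)
{
   return a < b + b_len && b < a + a_len;
}

/* Closed variant: also true when the ranges merely touch. This is the
 * kind of test that can yield false positives for adjacent regions. */
static bool
ranges_touch_or_overlap(int a, int a_len, int b, int b_len)
{
   return a <= b + b_len && b <= a + a_len;
}

/* Two boxes overlap only if they overlap on every axis, including z. */
static bool
boxes_overlap(const struct box3d *a, const struct box3d *b)
{
   return ranges_overlap(a->x, a->width, b->x, b->width) &&
          ranges_overlap(a->y, a->height, b->y, b->height) &&
          ranges_overlap(a->z, a->depth, b->z, b->depth);
}
```

With these definitions, two 4-wide boxes at x = 0 and x = 4 are adjacent: the closed test reports a hit on the x axis, while the half-open test does not.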


112935 commits