fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-06 19:40:10 +01:00

Author	SHA1	Message	Date
Jason Ekstrand	b28bad89b9	nir: Get rid of nir_register::is_packed All we ever do is initialize it to zero, clone it, print it, and validate it. No one ever sets or uses it. Acked-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 00:29:36 -05:00
Dave Airlie	ff852fdc05	virgl: add support for ARB_indirect_parameters The protocol changes are already in place for it. Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-04-09 14:25:01 +10:00
Dave Airlie	05ff2dbf13	virgl: add support for ARB_multi_draw_indirect This will pass the multi draw through to the host if it has support for it instead of using the st to emulate it Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-04-09 14:15:24 +10:00
Dave Airlie	316b785c59	virgl: add support for missing command buffer binding. When I added indirect support I forgot this, however to use it now we need to check for a new enough capability on the host side. Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-04-09 14:15:12 +10:00
Caio Marcelo de Oliveira Filho	899fd66b44	docs: Add NV_compute_shader_derivatives to 19.1.0 relnotes	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	45a4129392	anv: Implement VK_NV_compute_shader_derivatives Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	bd73531677	spirv: Add support for DerivativeGroup capabilities As defined in SPV_NV_compute_shader_derivatives. These control how the invocations are arranged in a CS when doing derivative and related operations (which are also enabled by the extension). Since we expect valid SPIR-V, we don't need to do more work at SPIR-V level to enable the derivative and related operations to be called. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	956226c8ba	iris: Enable NV_compute_shader_derivatives Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	f9b29c4a58	gallium: Add PIPE_CAP_COMPUTE_SHADER_DERIVATIVES To enable NV_compute_shader_derivatives, which allows derivatives (and texture lookups with implicit derivatives) in compute shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	c9d1569689	i965: Advertise NV_compute_shader_derivatives Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	94abc53030	intel/fs: Use NIR_PASS_V when lowering CS intrinsics This will make that step visible in NIR_PRINT=1. v2: Also use the macro for the cleanup passes. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	0425b34b79	intel/fs: Don't loop when lowering CS intrinsics This was needed when certain intrinsics were lowered to other ones that were defined by the same pass. After `060817b2` "intel,nir: Move gl_LocalInvocationID lowering to nir_lower_system_values" we don't need the loop anymore. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	3ee3024804	intel/fs: Add support for CS to group invocations in quads When using quads, instead of mapping the elements to the next 4 local invocation indices, we map the two next in the "current" row and two next in the "next row". A side effect is that a thread will execute the indices in a different order. We now perform the lowering of both local invocation ID and index together -- and don't rely anymore on lowering done by nir_lower_system_values. That is convenient when doing the math for quads, because we need X and Y to get the right invocation index. When the pass progresses, fold the constants and clean up to reduce the noise from the indexing math. This implements the derivative_group_quadsNV semantics from NV_compute_shader_derivatives. v2: Take subgroup_id into account, otherwise only values in the first subgroup would be used. (Jason) v3: Calculate invocation index and ID together, to avoid duplicating some math in the quads case when both index and ID are used. (Jason) v4: Don't call cleanup passes as part of the lowering, let that to the call site. (Jason) Change calculation to use less instructions. (Jason) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	ef0339d5ea	intel/fs: Use TEX_LOGICAL whenever implicit lod is supported Make sure we include compute shaders that have a derivative group defined. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	fcbc5ccaae	nir: Don't set LOD=0 for compute shader that has derivative group When using NV_compute_shader_derivatives to set a derivative group, a compute shader supports texture with implicit LOD calculation, so don't set an explicit LOD. Note if the extension is used but the derivative group is not specified, it will default to LOD=0 as before. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	d08a74d2bf	nir/algebraic: Lower CS derivatives to zero when no group defined In compute shaders if no derivative group is defined, the derivatives will always be zero. Specified in NV_compute_shader_derivatives. To make the check more convenient, add a "info" local variable to the generated code so we can refer to it in the Python rules. (Jason) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	3c5ddaeacd	glsl: Parse and propagate derivative_group to shader_info NV_compute_shader_derivatives allow selecting between two possible arrangements (quads and linear) when calculating derivatives and certain subgroup operations in case of Vulkan. So parse and propagate those up to shader_info.h. v2: Do not fail when ARB_compute_variable_group_size is being used, since we are still clarifying what is the right thing to do here. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	ca60f0b7ba	glsl: Enable texture builtins for NV_compute_shader_derivatives Renamed a few predicates from "fs_only" to be "derivative_only" (or similar pairs). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	09a3273fe7	glsl: Enable derivative builtins for NV_compute_shader_derivatives Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	289478ea89	glsl: Remove redundant conditions when asserting in_qualifier As the code evolved, we ended up with redundant conditions. Clean this up. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	163655b33e	mesa: Extension boilerplate for NV_compute_shader_derivatives Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Timothy Arceri	e30804c602	nir/radv: remove restrictions on opt_if_loop_last_continue() When I implemented opt_if_loop_last_continue() I had restricted this pass from moving other if-statements inside the branch opposite the continue. At the time it was causing a bunch of spilling in shader-db for i965. However Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered. 28717 shaders in 14931 tests Totals: SGPRS: 1267317 -> 1267549 (0.02 %) VGPRS: 896876 -> 895920 (-0.11 %) Spilled SGPRs: 24701 -> 26367 (6.74 %) Code Size: 48379452 -> 48507880 (0.27 %) bytes Max Waves: 241159 -> 241190 (0.01 %) Totals from affected shaders: SGPRS: 23584 -> 23816 (0.98 %) VGPRS: 25908 -> 24952 (-3.69 %) Spilled SGPRs: 503 -> 2169 (331.21 %) Code Size: 2471392 -> 2599820 (5.20 %) bytes Max Waves: 586 -> 617 (5.29 %) The codesize increases is related to Wolfenstein II it seems largely due to an increase in phis rather than the existing jumps. This gives +10% FPS with Doom on my Vega56. Rhys Perry also benchmarked Doom on his VEGA64: Before: 72.53 FPS After: 80.77 FPS v2: disable pass on non-AMD drivers Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1) Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-09 11:29:41 +10:00
Dave Airlie	c6cf602121	softpipe: add support for vertex streams (v2) This enables the ARB_gpu_shader5 vertex streams on softpipe. v2: only enable when not using llvm. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-04-09 11:20:39 +10:00
Dave Airlie	7720ce32aa	draw: add support to tgsi paths for geometry streams. (v2) This hooks up the geometry shader processing to the TGSI support added in the previous commits. It doesn't change the llvm interface other than to keep things building. v2: fix some regressions caused by primitiveoffsets Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-04-09 11:19:38 +10:00
Dave Airlie	ddb9ad363d	softpipe: add support for indexed queries. We need indexed queries to retrieve the geom shader info. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-04-09 11:19:38 +10:00
Dave Airlie	00fe67c015	tgsi: add support for geometry shader streams. This adds support to retrieve the primitive counts for each stream, along with the offset for each primitive into the output array. It also adds support for parsing the stream argument to the emit and end instructions. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-04-09 11:19:38 +10:00
Dave Airlie	333746011d	draw: add stream member to stats callback This just adds space for the member to the callback, doesn't change anything else. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-04-09 11:19:38 +10:00
Chia-I Wu	63b823130d	vulkan/wsi: make wl_drm optional When wl_drm is missing and the driver supports modifiers, use zwp_linux_dmabuf_v1 for the list of supported formats and for buffer creation. Limit the supported formats to those with modifiers, which are WL_DRM_FORMAT_{ARGB8888,XRGB8888} currently. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-04-09 00:42:30 +00:00
Chia-I Wu	5318858f35	vulkan/wsi: add wsi_wl_display_dmabuf Add wsi_wl_display_dmabuf for zwp_linux_dmabuf_v1-related states. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-04-09 00:42:30 +00:00
Chia-I Wu	fd7fecf59a	vulkan/wsi: add wsi_wl_display_drm Add wsi_wl_display_drm for wl_drm-related states. We will move formats into the struct in a later commit. Remove the unnecessary check for wl_registry_bind failures. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-04-09 00:42:30 +00:00
Chia-I Wu	22dcb080d9	vulkan/wsi: refactor drm_handle_format Refactor the switch statement in drm_handle_format out to wsi_wl_display_add_wl_format. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-04-09 00:42:30 +00:00
Chia-I Wu	2d214d9405	vulkan/wsi: create wl_drm wrapper as needed When modifiers are specified, we have to use dmabuf rather than wl_drm. We don't need the wrapper in that case. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-04-09 00:42:30 +00:00
Chia-I Wu	ab74937b2c	vulkan/wsi: move modifier array into wsi_wl_swapchain This avoids repeated checks for each wsi_wl_image. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-04-09 00:42:30 +00:00
Adam Jackson	52426ce4a9	drisw: Try harder to probe whether MIT-SHM works XQueryExtension merely tells you whether the extension exists, it doesn't tell you whether you're local enough for it to work. XShmQueryVersion is not enough to discover this either, you need to provoke the server to do actual work, and if it thinks you're remote it will throw BadRequest at you. So send an invalid ShmDetach and use the error code to distinguish local from remote. [airlied: fixed bug not resetting xshm_error to 0 on success, which made later stuff fail completely.] Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Adam Jackson <ajax@redhat.com>	2019-04-09 09:50:24 +10:00
Jason Ekstrand	50f3535d1f	nir/search: Search for all combinations of commutative ops Consider the following search expression and NIR sequence: ('iadd', ('imul', a, b), b) ssa_2 = imul ssa_0, ssa_1 ssa_3 = iadd ssa_2, ssa_0 The current algorithm is greedy and, the moment the imul finds a match, it commits those variable names and returns success. In the above example, it maps a -> ssa_0 and b -> ssa_1. When we then try to match the iadd, it sees that ssa_0 is not b and fails to match. The iadd match will attempt to flip itself and try again (which won't work) but it cannot ask the imul to try a flipped match. This commit instead counts the number of commutative ops in each expression and assigns an index to each. It then does a loop and loops over the full combinatorial matrix of commutative operations. In order to keep things sane, we limit it to at most 4 commutative operations (16 combinations). There is only one optimization in opt_algebraic that goes over this limit and it's the bitfieldReverse detection for some UE4 demo. Shader-db results on Kaby Lake: total instructions in shared programs: 15310125 -> 15302469 (-0.05%) instructions in affected programs: 1797123 -> 1789467 (-0.43%) helped: 6751 HURT: 2264 total cycles in shared programs: 357346617 -> 357202526 (-0.04%) cycles in affected programs: 15931005 -> 15786914 (-0.90%) helped: 6024 HURT: 3436 total loops in shared programs: 4360 -> 4360 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 23675 -> 23666 (-0.04%) spills in affected programs: 235 -> 226 (-3.83%) helped: 5 HURT: 1 total fills in shared programs: 32040 -> 32032 (-0.02%) fills in affected programs: 190 -> 182 (-4.21%) helped: 6 HURT: 2 LOST: 18 GAINED: 5 Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-04-08 21:38:48 +00:00
Lionel Landwerlin	48e48b8560	intel: add dependency on genxml generated files Drivers using genxml will start compilation before generated files are created, so add a dependency to it. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Cc: mesa-stable@lists.freedesktop.org	2019-04-08 20:52:47 +00:00
Marek Olšák	4b63f57cbc	radeonsi: fix a crash when unbinding sampler states Acked-by: James Zhu <James.Zhu@amd.com>	2019-04-08 15:23:32 -04:00
Samuel Pitoiset	775191cd99	radv: fix getting the vertex strides if the bindings aren't contiguous Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110349 Fixes: `a66b186beb` ("radv: use typed buffer loads for vertex input fetches") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-08 21:17:15 +02:00
Lionel Landwerlin	ce790c96a9	anv: implement VK_KHR_swapchain revision 70 This revision allows for images to be : - created by reusing image parameters from swapchain - bound to memory from a swapchain v2: Add color attachment flag Use same implicit WSI parameters (tiling, samples, usage) v3: Fix missing break in vk_foreach_struct_const() switch (Lionel) v4: Fix accessing image aspects before android resolve (Tapani) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-04-08 18:27:02 +01:00
Eric Engestrom	ed91ca0629	vk/util: remove unneeded array index This is an array of 1, so [0] is the only content, and meson already flattens the list so this is unnecessary. Also, all the other uses of vk_api_xml don't do that. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-08 17:03:00 +00:00
Samuel Pitoiset	27b8f3ecc3	ac/nir: fix intrinsic names for atomic operations with LLVM 9+ This fixes the following LLVM error when using RADV_DEBUG=checkir: Intrinsic name not mangled correctly for type arguments! Should be: llvm.amdgcn.buffer.atomic.add.i32 i32 (i32, <4 x i32>, i32, i32, i1)* @llvm.amdgcn.buffer.atomic.add The cmpswap operation still uses the old intrinsic. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-08 13:16:50 +02:00
Alyssa Rosenzweig	4209a27c61	panfrost: Remove "mali_unknown6" nonsense This structure was used maaaany moons ago as a placeholder for the varying meta (now unified with mali_attr_meta and essentially fully decoded). I don't know why it's still in the file. Let's wack it. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 16:05:42 +00:00
Alyssa Rosenzweig	b19d1a1e63	panfrost/midgard: Enable lower_find_lsb This is exactly what the blob does. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 16:01:49 +00:00
Alyssa Rosenzweig	65816ad6e8	panfrost/midgard: Add ibitcount8 op The mechanics of this opcode are a little opaque, but essentially, it's used in 8-bit mode to do a bit count in parallel of a uint and then doing a ton of clever iadd/imov ops to recombine. v2: Correct opcode. Thank you to jernej on IRC for noticing this awkward typo! Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 16:01:12 +00:00
Alyssa Rosenzweig	6cba9acb75	panfrost/midgard: Add ilzcnt op Used for implementing findLSB/MSB Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 16:00:39 +00:00
Alyssa Rosenzweig	2e7555b14b	panfrost/midgard: Add umin/umax opcodes Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 15:59:05 +00:00
Alyssa Rosenzweig	d84ee49027	panfrost: Add tilebuffer load? branch Also document branches better. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 15:58:44 +00:00
Alyssa Rosenzweig	7cccc89f80	panfrost/decode: Add flags for tilebuffer readback These flags are set when reading back the tilebuffer from a fragment shader via various mechanisms (including ARM_shader_framebuffer_fetch and EXT_pixel_local_storage). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 15:58:19 +00:00
Karol Herbst	1aabb79bdc	panfrost/midgard: use nir_src_is_const and nir_src_as_uint Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 15:56:10 +00:00
Jason Ekstrand	10a2fdacfa	vc4: Prefer nir_src_comp_as_uint over nir_src_as_const_value Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-07 15:13:36 +02:00
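
Note on 50f3535d1f (nir/search: Search for all combinations of commutative ops): the commit message describes giving each commutative op in a search expression an index and then looping over every swap combination. The Python sketch below illustrates that idea on the example from the message. It is not Mesa's nir_search code. The names `COMMUTATIVE`, `count_commutative`, `match` and `search`, the tuple encoding of expressions, and the dictionary of SSA definitions are all assumptions made for illustration. Ops beyond the limit are simply never swapped here, which is also an assumption.

```python
from itertools import count

# Assumption: only the two ops from the commit message's example are commutative.
COMMUTATIVE = {"iadd", "imul"}


def count_commutative(pattern):
    """Count the commutative ops in a pattern (strings are variables)."""
    if isinstance(pattern, str):
        return 0
    op, *srcs = pattern
    return (op in COMMUTATIVE) + sum(count_commutative(s) for s in srcs)


def match(pattern, value, defs, bindings, combo, next_index):
    """Match pattern against an SSA value.

    Bit i of combo says whether the i-th commutative op (in pre-order)
    is matched with its two operands swapped.
    """
    if isinstance(pattern, str):
        # A variable: bind it on first use, otherwise require the same value.
        if pattern in bindings:
            return bindings[pattern] == value
        bindings[pattern] = value
        return True
    if value not in defs:
        return False
    op, *pattern_srcs = pattern
    value_op, *value_srcs = defs[value]
    if op != value_op or len(pattern_srcs) != len(value_srcs):
        return False
    if op in COMMUTATIVE:
        index = next(next_index)
        if (combo >> index) & 1:
            value_srcs = value_srcs[::-1]
    return all(
        match(p, v, defs, bindings, combo, next_index)
        for p, v in zip(pattern_srcs, value_srcs)
    )


def search(pattern, value, defs, max_commutative=4):
    """Try every swap combination, capped at 2**max_commutative attempts."""
    n = min(count_commutative(pattern), max_commutative)
    for combo in range(1 << n):
        bindings = {}
        if match(pattern, value, defs, bindings, combo, count()):
            return bindings
    return None


# The example from the commit message.
defs = {
    "ssa_2": ("imul", "ssa_0", "ssa_1"),
    "ssa_3": ("iadd", "ssa_2", "ssa_0"),
}
pattern = ("iadd", ("imul", "a", "b"), "b")

print(search(pattern, "ssa_3", defs))                     # full combination search
print(search(pattern, "ssa_3", defs, max_commutative=0))  # no swaps allowed
```

With swaps disabled the imul binds a to ssa_0 and b to ssa_1, so the outer iadd cannot match. The full search succeeds on the combination that swaps only the imul, binding a to ssa_1 and b to ssa_0.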

109801 commits
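
Note on 3ee3024804 (intel/fs: Add support for CS to group invocations in quads): the commit message says that with quads, four consecutive invocation indices cover two elements in the current row and two in the next row, rather than four in a row. The Python sketch below shows one way such a mapping can be written. It is not the lowering Mesa emits. The function names `linear_local_id` and `quad_local_id` and the exact arithmetic are assumptions for illustration, and the sketch assumes the workgroup width and height are both even.

```python
def linear_local_id(index, size_x):
    """Row-major mapping: consecutive indices walk along one row."""
    y, x = divmod(index, size_x)
    return x, y


def quad_local_id(index, size_x):
    """Quad mapping: every 4 consecutive indices form a 2x2 block.

    Assumes size_x is even (and that the workgroup height is even too).
    """
    assert size_x % 2 == 0
    quad, lane = divmod(index, 4)          # which quad, and position inside it
    quads_per_row = size_x // 2
    quad_y, quad_x = divmod(quad, quads_per_row)
    x = 2 * quad_x + (lane & 1)            # lanes 1 and 3 are the right column
    y = 2 * quad_y + (lane >> 1)           # lanes 2 and 3 are the next row
    return x, y


# A 4x2 workgroup: compare the two arrangements for indices 0..7.
for i in range(8):
    print(i, linear_local_id(i, 4), quad_local_id(i, 4))
```

In the quad arrangement indices 0 to 3 land on (0,0), (1,0), (0,1), (1,1), so each group of four has the horizontal and vertical neighbours that a derivative calculation needs.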