fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 11:20:11 +01:00

Author	SHA1	Message	Date
Samuel Iglesias Gonsálvez	a21dc2b500	i965/vec4: split DF instructions and later double its execsize in IVB/BYT We need to split DF instructions in two on IVB/BYT as it needs an execsize of 8 to process 4 DF values (one GRF in total). v2: - Rename helper and make it static inline function (Matt). - Fix indentation and add braces (Matt). v3: - Don't edit IR instruction when doubling exec_size (Curro) - Add comment into the code (Curro). - Manage ARF registers like the others (Curro) v4: - Add get_exec_type() function and use it to calculate the execution size. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Take destination type as execution type where there is no valid source. Assert-fail if the deduced execution type is byte. Clarify comment in get_lowered_simd_width(). Move SIMD width workaround outside of 'if (...inst->size_written > REG_SIZE)' conditional block, since the problem should be independent of whether the amount of data written by the instruction is greater or lower than a GRF. Drop redundant is_ivb_df definition. Drop bogus inst->exec_size < 8 check. Simplify channel group assertion. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	a5399e8b1c	i965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYT The hardware applies the same channel enable signals to both halves of the compressed instruction which will be just wrong under non-uniform control flow. Fix this by splitting those instructions to SIMD4. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Francisco Jerez	ebfb703d44	i965/fs: Get 64-bit indirect moves working on IVB. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-04-14 14:56:08 -07:00
Matt Turner	630b84cdc8	i965: Use source region <1,2,0> when converting to DF. Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead of two. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero	3198ce3f96	i965/fs: fix lower SIMD width for IVB/BYT's MOV_INDIRECT According to the IVB and HSW PRMs: "2. When the destination requires two registers and the sources are indirect, the sources must use 1x1 regioning mode." So for DF instructions the execution size is not limited by the number of address registers that are available, but by the EU decompression logic not handling VxH indirect addressing correctly. This patch limits the SIMD width to 4 in this case. v2: - Fix typo (Matt). - Fix condition (Curro) v3: - Add spec quote (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero	571cbd05eb	i965/fs: fix dst stride in IVB/BYT type conversions When doing DF to 32-bit conversions, we set dst stride to 2, to fulfill alignment restrictions because the upper Dword of every Qword will be written with undefined value. But in IVB/BYT, this is not necessary, as each DF conversion already writes 2, the first one the real value, and the second one a 0. That is, IVB/BYT already set stride = 2 implicitly, so we must set it to 1 explicitly to avoid ending up with stride = 4. v2: - Fix typo (Matt) v3: - Fix stride in the destination's brw_reg, don't modify IR (Curro) v4: - Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro) - Fix comment (Curro). - Relax hstride assert (Curro) Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Minor spelling fixes. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez	af6fc3a8ea	i965/fs: rename lower_d2x to lower_conversions v2: - Change the name to lower_conversions. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez	dee31311eb	Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs." This reverts commit `7dccd38b40`. d2x pass fixes SEL instructions when there is a type conversion by doing a SEL without type conversion and then convert the result. This pass also takes into account the non-uniform control flow. Then, `7dccd38b40` is not needed anymore. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez	aeecc82d05	i965/fs: generalize the legalization d2x pass Generalize it to lower any unsupported narrower conversion. v2 (Curro): - Add supports_type_conversion() - Reuse existing instruction instead of cloning it. - Generalize d2x to narrower and equal size conversions. v3 (Curro): - Make supports_type_conversion() const and improve it. - Use foreach_block_and_inst to process added instructions. - Simplify code. - Add assert and improve comments. - Remove redundant mov. - Remove useless comment. - Remove saturate == false assert and add support for saturation when fixing the conversion. - Add get_exec_type() function. v4 (Curro): - Use get_exec_type() function to get sources' type. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Matt Turner	94ffeb7fa2	i965: Use <0,2,1> region for scalar DF sources on IVB/BYT. On HSW+, scalar DF sources can be accessed using the normal <0,1,0> region, but on IVB and BYT DF regions must be programmed in terms of floats. A <0,2,1> region accomplishes this. v2: - Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro). v3: - Added comment explaining the reason (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez	82d17615f4	i965/fs: clamp exec_size when an instruction has a scalar DF source Then the SIMD lowering pass will get rid of any compressed instructions with scalar source (whether force_writemask_all or not) and we avoid hitting the Gen7 region decompression bug. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Suggested-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero	0f1316d4db	i965/fs: double regioning parameters and execsize for DF in IVB/BYT In IVB and BYT, both regioning parameters and execution sizes are measured as 32-bits element size. So when we have something like: mov(8) g2<1>DF g3<4,4,1>DF We are not actually moving 8 doubles (our intention), but 4 doubles. We need to double the parameters to cope with this issue. However, horizontal strides don't behave as they're supposed to on IVB for DF regions, they will cause each 32-bit half of DF sources to be strided individually, and doubling the value won't make any difference. v2: - Use devinfo directly (Matt). - Use Baytrail instead of Valleyview (Matt). - Use IvyBridge instead of Ivy (Matt) - Double the exec_size in code emission (Curro) v3: - Change hstride doubling by an assert and fix commit log (Curro). - Substitute remaining compiler->devinfo by devinfo (Curro). v4: - Fix comment (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero	79af256388	i965/fs: add helper to retrieve instruction execution type The execution data size is the biggest type size of any instruction operand. We will use it to know if the instruction deals with DF, because in Ivy we need to double the execution size and regioning parameters. v2: - Fix typo in commit log (Matt) - Use static inline function instead of fs_inst's method (Curro). - Define the result as a constant (Curro). - Fix indentation (Matt). - Add braces to nested control flow (Matt). v3 (Curro): - Add get_exec_type() and other auxiliary functions and use them to calculate its size. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Fix deduced execution type for integer vector types. Take destination type as execution type where there is no valid source. Assert-fail if the deduced execution type is byte. Move into brw_ir_fs.h header for consistency with the VEC4 back-end. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:07 -07:00
Matt Turner	fd349d29e4	i965: Handle IVB DF differences in the validator. On IVB/BYT, region parameters and execution size for DF are in terms of 32-bit elements, so they are doubled. For evaluating the validity of an instruction, we halve them. v2 (Sam): - Add comments. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-04-14 14:56:07 -07:00
Iago Toral Quiroga	fbac8b1f94	i965/disasm: also print nibctrl in IVB for execsize=8 4-wide DF operations where NibCtrl applies require an execsize of 8 in IvyBridge/BayTrail. v2: - Refactor NibCtrl printing (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:06 -07:00
Boyan Ding	ff29f488d4	nir: Destination component count of shader_clock intrinsic is 2 This fixes the following error when using ARB_shader_clock on i965: vec1 32 ssa_0 = intrinsic shader_clock () () () intrinsic store_var (ssa_0) (clock_retval) (3) /* wrmask=xy */ error: src->ssa->num_components == num_components (nir/nir_validate.c:204) Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: mesa-stable@lists.freedesktop.org	2017-04-14 14:54:06 -07:00
Nicolai Hähnle	39f51b5db9	radeonsi: add missing initialization for userptr buffers Fix the accounting for memory usage of userptr buffers, which has been wrong forever (or at least for a long time). Also initialize flags. Without this initialization, the sparse buffer flag might end up being set, which leads to staging buffers being used unnecessarily (and incorrectly) in transfers to or from userptr buffers. This works around VM faults that occur with the radeon kernel module when running piglit ./bin/amd_pinned_memory decrement-offset map-buffer -auto Fixes: `e077c5fe65` ("gallium/radeon: transfers and invalidation for sparse buffers") Reported-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-14 23:23:04 +02:00
Fredrik Höglund	c1dd5d0b01	radv: remove the temp descriptor set infrastructure It is no longer used. Signed-off-by: Fredrik Höglund <fredrik@kde.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-04-14 23:21:24 +02:00
Fredrik Höglund	5ab5d1bee4	radv: use push descriptors in meta Use push descriptors instead of temp descriptor sets. Signed-off-by: Fredrik Höglund <fredrik@kde.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-04-14 23:21:24 +02:00
Fredrik Höglund	f95caae504	radv: add private push descriptors for meta This allows meta to use push descriptors without disturbing user push descriptors. radv_meta_push_descriptor_set differs from vkCmdPushDescriptorSetKHR in that partial updates are not supported; all descriptors used in subsequent draw commands must be pushed at the same time. Signed-off-by: Fredrik Höglund <fredrik@kde.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-04-14 23:21:24 +02:00
Jason Ekstrand	220974b38d	anv/blorp: Properly handle VK_ATTACHMENT_UNUSED The Vulkan driver was originally written under the assumption that VK_ATTACHMENT_UNUSED was basically just for depth-stencil attachments. However, the way things fell together, VK_ATTACHMENT_UNUSED can be used anywhere in the subpass description. The blorp-based clear and resolve code has a bunch of places where we walk lists of attachments and we weren't handling VK_ATTACHMENT_UNUSED everywhere. This commit should fix all of them. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: <mesa-stable@lists.freedesktop.org>	2017-04-14 14:20:42 -07:00
Jason Ekstrand	21d2ca72d8	anv/cmd_buffer: Use the null surface state for ATTACHMENT_UNUSED Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: <mesa-stable@lists.freedesktop.org>	2017-04-14 14:20:42 -07:00
Jason Ekstrand	02eca8b6f8	anv/cmd_buffer: Always set up a null surface state We're about to start requiring it in yet another case and calculating exactly when one is needed is starting to get prohibitively expensive. A single surface state doesn't take up that much space so we may as well create one all the time. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: <mesa-stable@lists.freedesktop.org>	2017-04-14 14:20:42 -07:00
Nicolai Hähnle	d6588d9962	radeonsi: cope with missing disassembly For robustness and testing purposes. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-14 22:51:07 +02:00
Nicolai Hähnle	d15b1f6e2d	gallium/ddebug: dump missing members of pipe_draw_info Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-04-14 22:50:54 +02:00
Nicolai Hähnle	2ac03e90fb	radeonsi: enable ARB_shader_viewport_layer_array Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2017-04-14 22:50:17 +02:00
Nicolai Hähnle	d5e53f348e	radeonsi: handle ignored LAYER and VIEWPORT_INDEX writes Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2017-04-14 22:50:13 +02:00
Nicolai Hähnle	4127f38bae	st/mesa: enable ARB_shader_viewport_layer_array Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2017-04-14 22:50:09 +02:00
Nicolai Hähnle	f3d2cf6c1f	tgsi: clarify TGSI_SEMANTIC_{LAYER,VIEWPORT_INDEX} Depending on pipe caps they can be writable in all vertex processing stages, but only the output of the last stage counts. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2017-04-14 22:50:06 +02:00
Nicolai Hähnle	17f24a9b75	gallium: add PIPE_CAP_TGSI_TES_LAYER_VIEWPORT Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2017-04-14 22:49:44 +02:00
Nicolai Hähnle	8b5d477aa8	configure.ac: add --enable-sanitize option Enable code sanitizers by adding -fsanitize=$foo flags for the compiler and linker. In addition, this also disables checking for undefined symbols: running the address sanitizer requires additional symbols which should be provided by a preloaded libasan.so (preloaded for hooking into malloc & friends globally), and the undefined symbols check gets tripped up by that. Running the tests works normally via `make check`, but shows additional failures with the address sanitizer due to memory leaks that seem to be mostly leaks in the tests themselves. I believe those failures should really be fixed. In the mean-time, you can set export ASAN_OPTIONS=detect_leaks=0 to only check for more serious error types. v2: - fail reasonably when an unsupported sanitize flag is given (Eric Engestrom) Reviewed-by: Bartosz Tomczyk <bartosz.tomczyk86@gmail.com> (v1) Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-04-14 22:44:30 +02:00
Jason Ekstrand	e1f6fb8021	anv/cmd_buffer: Flush the VF cache at the top of all primaries Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-04-14 13:35:02 -07:00
Jason Ekstrand	939337e49f	anv/blorp: Flush the texture cache in UpdateBuffer Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-04-14 13:35:02 -07:00
Jason Ekstrand	475bab0330	anv: Limit VkDeviceMemory objects to 2GB Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2017-04-14 13:35:02 -07:00
Jason Ekstrand	4495b917e2	intel/blorp: Add a blorp_emit_dynamic macro This makes it much easier to throw together a bit of dynamic state. It also automatically handles flushing so you don't accidentally forget. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2017-04-14 13:35:02 -07:00
Bruce Cherniak	1832ef6cd9	swr: Enable MSAA in OpenSWR software renderer This patch enables multisample antialiasing in the OpenSWR software renderer. MSAA is a proof-of-concept/work-in-progress with bug fixes and performance on the way. We wanted to get the changes out now to allow several customers to begin experimenting with MSAA in a software renderer. So as not to impact current customers, MSAA is turned off by default - previous functionality and performance remain intact. It is easily enabled via environment variables, as described below. It has only been tested with the glx-lib winsys. The intention is to enable other state-trackers, both Windows and Linux and more fully support FBOs. There are 2 environment variables that affect behavior: * SWR_MSAA_FORCE_ENABLE - force MSAA on, for apps that are not designed for MSAA... Beware, results will vary. This is mainly for testing. * SWR_MSAA_MAX_SAMPLE_COUNT - sets maximum supported number of samples (1,2,4,8,16), or 0 to disable MSAA altogether. (The default is currently 0.) Reviewed-by: George Kyriazis <george.kyriazis@intel.com>	2017-04-14 15:22:45 -05:00
Bruce Cherniak	91a7f0b3af	swr: Removed unnecessary PIPE_BIND flags from swr_is_format_supported Removed unnecessary and probably wrong PIPE_BIND_SCANOUT and PIPE_BIND_SHARED flags in favor of check on single PIPE_BIND_DISPLAY_TARGET flag. Reference llvmpipe change <bee4c7718a3bd57e3d99f0913d9081cd13fe5fd> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2017-04-14 15:22:44 -05:00
Bruce Cherniak	97bbb7b6a3	swr: Align swr_context allocation to SIMD alignment. The context now contains SIMD vectors which must be aligned (specifically samplePositions in the rastState in the derived state). Failure to align can result in segv crash on unaligned memory access in vector instructions. Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2017-04-14 15:22:44 -05:00
Tim Rowley	4dcfa83114	swr: update gallium driver docs v2: add back scons section, mention additional built swr libraries Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-04-14 15:21:31 -05:00
Grazvydas Ignotas	bffdb434b7	radv: remove irrelevant comment A leftover from anv. Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-04-14 23:16:03 +03:00
Grazvydas Ignotas	1b2fe7ce45	radv: report timestampPeriod correctly The kernel returns frequency in kHz, so to convert to nanosecond interval that Vulkan uses the dividend should be 1000000.0 and not 100000.0. This fixes the GPU graph in DOOM and matches the amdgpu-pro blob. Fixes: `f4e499ec79` "radv: add initial non-conformant radv vulkan driver" Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-04-14 23:15:55 +03:00
Rob Clark	9fc3e7137a	nir/print: add compute shader info Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2017-04-14 12:46:12 -04:00
Rob Clark	16d493f1e7	gallium/docs: small correction about register files for atomics These can operate on MEMORY[], in addition to BUFFER[] and IMAGE[] Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-04-14 12:46:12 -04:00
Rob Clark	0b613c20aa	freedreno: enable draw/batch reordering by default Probably should have flipped the switch a long time ago, since it doesn't seem to cause any problems and is a nice perf boost in a number of cases. Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-04-14 12:46:12 -04:00
Rob Clark	b5cc88af5e	freedreno/ir3: small re-order Small re-order of switch statement to handle op-code categories in order. Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-04-14 12:46:12 -04:00
Rob Clark	75afd2586f	freedreno/ir3: move 'keeps' to block level For things like SSBOs and atomics we'll want to track this at a block level. Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-04-14 12:46:12 -04:00
Rob Clark	331bd3b5e1	freedreno/ir3: convert dynamic arrays to ralloc Want to move one of these under ir3_block, so that gives a reason to migrate the remaining malloc/realloc to ralloc. Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-04-14 12:46:12 -04:00
George Kyriazis	870760e02e	swr: add linux to scons build Make swr compile for both linux and windows. Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2017-04-14 10:59:46 -05:00
Bas Nieuwenhuizen	e20eb91e2b	radv: make sizes & offsets 32 bit in radv_descriptor_update_template_entry. v2: Also convert the calculations. Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Reviewed-by: Fredrik Höglund <fredrik@kde.org>	2017-04-14 14:14:07 +02:00
Kenneth Graunke	7c83d44d54	docs: Update MESA_shader_integer_functions spec to version 3. When publishing this spec on the OpenGL ES registry, Jon Leech noticed that it didn't actually mention what the ES dependencies and interactions were. I looked at extensions_table.h and noted that we expose it in ES 3.0 contexts, and he added the obvious spec texts. The updated copy also contains our official extension number. https://github.com/KhronosGroup/OpenGL-Registry/issues/3 Acked-by: Matt Turner <mattst88@gmail.com>	2017-04-13 23:01:27 -07:00
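Commit `a5399e8b1c` lowers DF instructions that are not force_writemask_all to SIMD4 on IVB/BYT, because the hardware applies the same channel enables to both halves of a compressed instruction. The C sketch below captures only that one rule. `struct inst_info` and `lowered_simd_width_ivb()` are hypothetical names, not the i965 back-end's `get_lowered_simd_width()`, which weighs many more constraints.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the instruction facts this rule depends on. */
struct inst_info {
   unsigned exec_size;        /* SIMD width requested by the IR */
   bool is_df;                /* execution type is DF (64-bit float) */
   bool force_writemask_all;  /* instruction ignores channel enables */
};

/* Returns the SIMD width the instruction should be split to. On IVB/BYT a
 * DF instruction that honours channel enables is capped at SIMD4, so it is
 * never emitted as a compressed instruction whose two halves would share
 * one set of channel enables under non-uniform control flow. */
static unsigned
lowered_simd_width_ivb(const struct inst_info *inst, bool is_ivb_or_byt)
{
   assert(inst->exec_size > 0);
   if (is_ivb_or_byt && inst->is_df && !inst->force_writemask_all)
      return inst->exec_size < 4 ? inst->exec_size : 4;
   return inst->exec_size;
}
```

Instructions with force_writemask_all are left alone in this sketch, matching the commit title, since they do not depend on the channel enables at all.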
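Commit `571cbd05eb` explains that on IVB/BYT each DF to 32-bit conversion already writes two dwords (the value, then a 0), so encoding the usual destination stride of 2 yields an effective stride of 4. A minimal sketch of that arithmetic follows. Both helpers are hypothetical; the actual fix is applied when building the destination `brw_reg` and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Destination hstride to encode for a DF -> 32-bit conversion.
 * Generic path: 2, because the upper dword of every qword is written
 * with an undefined value. IVB/BYT: the hardware already writes two
 * dwords per converted value, so encode 1 to avoid doubling it again. */
static unsigned
df_to_32bit_dst_hstride(bool is_ivb_or_byt)
{
   return is_ivb_or_byt ? 1 : 2;
}

/* Stride actually observed in the destination: IVB/BYT implicitly
 * doubles whatever stride was encoded. */
static unsigned
effective_dst_stride(unsigned encoded, bool is_ivb_or_byt)
{
   return is_ivb_or_byt ? encoded * 2 : encoded;
}
```

With these helpers, both platforms end up with the same effective stride of 2, while naively encoding 2 on IVB/BYT gives 4.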
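Commit `94ffeb7fa2` reads a scalar DF on IVB/BYT through a <0,2,1> region expressed in 32-bit floats instead of the usual <0,1,0>. With the standard Gen region rule, where channel `ch` reads element `(ch / width) * vstride + (ch % width) * hstride`, consecutive channels of a <0,2,1> region alternate between dword 0 and dword 1 (both halves of the one double), while <0,1,0> only ever reads dword 0. `region_offset()` below is an illustrative helper, not a Mesa function.

```c
#include <assert.h>

/* Element offset (in units of the region's element size) read by
 * channel `ch` of a region <vstride,width,hstride>: the channel sits in
 * row ch / width and column ch % width of the region. */
static unsigned
region_offset(unsigned ch, unsigned vstride, unsigned width, unsigned hstride)
{
   assert(width != 0);
   return (ch / width) * vstride + (ch % width) * hstride;
}
```

For example, channels 0 through 7 of <0,2,1> read dwords 0, 1, 0, 1, 0, 1, 0, 1, whereas a packed <8,8,1> region reads dwords 0 through 7.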
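Commit `0f1316d4db` doubles the execution size and regioning parameters of DF instructions on IVB/BYT, because the hardware counts them in 32-bit units, but leaves the horizontal stride alone. The sketch below applies that rule to the commit's own example, `mov(8) g2<1>DF g3<4,4,1>DF`. The structs and `encode_df_for_ivb()` are hypothetical, and hardware limits on width and execution size are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical source region <vstride,width,hstride>, in elements. */
struct region { unsigned vstride, width, hstride; };
struct df_encoding { unsigned exec_size; struct region src; };

/* On IVB/BYT, DF execution size and region parameters are measured in
 * 32-bit units, so the intended DF values are doubled at emission time.
 * hstride is deliberately not doubled: per the commit, each 32-bit half
 * of a DF source is strided individually, so doubling would not help. */
static struct df_encoding
encode_df_for_ivb(unsigned exec_size, struct region src, bool is_ivb_or_byt)
{
   assert(exec_size > 0 && src.width > 0);
   struct df_encoding enc = { exec_size, src };
   if (is_ivb_or_byt) {
      enc.exec_size *= 2;
      enc.src.vstride *= 2;
      enc.src.width *= 2;
   }
   return enc;
}
```

Under these assumptions the example is emitted with an execution size of 16 and a <8,8,1> source region on IVB/BYT, and is left unchanged elsewhere.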
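Commit `79af256388` defines the execution type as the largest type of any valid source, falls back to the destination type when no source is valid, and asserts if the result is a byte type. A simplified C sketch of that rule follows, assuming a hypothetical five-value `enum op_type`; the real `get_exec_type()` also handles integer vector types, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of operand types. */
enum op_type { TYPE_INVALID, TYPE_B, TYPE_W, TYPE_D, TYPE_F, TYPE_DF };

static unsigned
type_size(enum op_type t)
{
   switch (t) {
   case TYPE_B:  return 1;
   case TYPE_W:  return 2;
   case TYPE_D:
   case TYPE_F:  return 4;
   case TYPE_DF: return 8;
   default:      return 0;  /* TYPE_INVALID */
   }
}

/* Widest valid source type; on a size tie the first source wins.
 * Falls back to the destination type when no source is valid. */
static enum op_type
exec_type(enum op_type dst, const enum op_type *srcs, size_t num_srcs)
{
   enum op_type t = TYPE_INVALID;
   for (size_t i = 0; i < num_srcs; i++) {
      if (srcs[i] != TYPE_INVALID && type_size(srcs[i]) > type_size(t))
         t = srcs[i];
   }
   if (t == TYPE_INVALID)
      t = dst;
   assert(t != TYPE_B);  /* byte execution types are not expected */
   return t;
}
```

A caller can then test `type_size(exec_type(...)) == 8` to decide whether the IVB/BYT DF workarounds apply.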
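Commit `1832ef6cd9` documents `SWR_MSAA_MAX_SAMPLE_COUNT`, which accepts 1, 2, 4, 8 or 16 samples, or 0 to disable MSAA (the default). A sketch of how such a variable could be validated follows. Treating unparsable or unsupported values as 0 is an assumption of this sketch, not documented OpenSWR behaviour, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Parses a SWR_MSAA_MAX_SAMPLE_COUNT value. Returns 0 (MSAA disabled)
 * when the variable is unset, not a plain decimal number, or not one of
 * the documented sample counts (assumed fallback). */
static unsigned
parse_max_sample_count(const char *value)
{
   if (value == NULL)
      return 0;
   char *end;
   long n = strtol(value, &end, 10);
   if (end == value || *end != '\0')
      return 0;
   switch (n) {
   case 0: case 1: case 2: case 4: case 8: case 16:
      return (unsigned)n;
   default:
      return 0;
   }
}

unsigned
msaa_max_sample_count_from_env(void)
{
   return parse_max_sample_count(getenv("SWR_MSAA_MAX_SAMPLE_COUNT"));
}
```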
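Commit `97bbb7b6a3` aligns the `swr_context` allocation because the context now holds SIMD vectors, and unaligned vector loads can crash. A generic sketch using C11 `aligned_alloc` follows. `SIMD_ALIGNMENT` and `alloc_simd_aligned_zeroed()` are hypothetical, and the driver itself may use a different allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical alignment, large enough for 512-bit vectors. */
#define SIMD_ALIGNMENT 64

/* Allocates a zeroed block whose address is a multiple of `alignment`
 * (a power of two). C11 aligned_alloc expects the size to be a multiple
 * of the alignment, so the size is rounded up first. */
static void *
alloc_simd_aligned_zeroed(size_t size, size_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   size_t rounded = (size + alignment - 1) & ~(alignment - 1);
   void *p = aligned_alloc(alignment, rounded);
   if (p != NULL)
      memset(p, 0, rounded);
   assert(p == NULL || (uintptr_t)p % alignment == 0);
   return p;
}
```

Memory obtained this way is released with plain `free()`.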
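Commit `1b2fe7ce45` fixes the kHz to nanosecond conversion behind `timestampPeriod`. Since 1 kHz is 1000 ticks per second, one tick lasts 1e9 / (f_kHz * 1000) = 1e6 / f_kHz nanoseconds, which is why the dividend must be 1000000.0. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a GPU counter frequency reported in kHz into the tick period
 * in nanoseconds: 1e9 ns/s divided by (freq_khz * 1000) ticks/s. */
static float
timestamp_period_ns(uint32_t freq_khz)
{
   assert(freq_khz != 0);
   return 1000000.0f / (float)freq_khz;
}
```

For a 100 MHz counter (100000 kHz) this gives 10 ns per tick, whereas the old dividend of 100000.0 reported 1 ns, ten times too small.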


91544 commits