fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-07 11:28:05 +02:00

Author	SHA1	Message	Date
Brian Paul	8a32dd2ec9	st/mesa: refactor bufferobj_data() Split out some of the code into three new helper functions: buffer_target_to_bind_flags(), storage_flags_to_buffer_flags(), buffer_usage() to make the code more manageable. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-06 15:23:26 -07:00
Samuel Pitoiset	3488a3f033	radv: run nir_opt_shrink_load LLVM can't shrink loads. Polaris10: Totals from affected shaders: SGPRS: 62528 -> 59955 (-4.11 %) VGPRS: 44708 -> 44616 (-0.21 %) Spilled SGPRs: 16 -> 8 (-50.00 %) Code Size: 1355504 -> 1355172 (-0.02 %) bytes Max Waves: 11710 -> 11670 (-0.34 %) Vega10: Totals from affected shaders: SGPRS: 51448 -> 50371 (-2.09 %) VGPRS: 39140 -> 39048 (-0.24 %) Spilled SGPRs: 16 -> 16 (0.00 %) Code Size: 1307188 -> 1304296 (-0.22 %) bytes Max Waves: 11312 -> 11292 (-0.18 %) This reduces SGPRs spilling in MadMax, and it also reduces number of SGPRs in DOW3 and F12017. The number of waves slightly decreases in F1 but I don't see any performance changes after benchmarking it. Talos and Serious Sam are not affected because they don't use any push constants. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-02-06 23:08:44 +01:00
Samuel Pitoiset	e68562b94b	nir: add nir_opt_shrink_load pass This is a very simple pass that just shrinks load_push_constant intrinsics when some components are unused. For now, it can just shrink vec4 to vec3, vec3 to vec2 and so on. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-02-06 23:08:39 +01:00
Timothy Arceri	e2ea9e1191	radeonsi/nir: add nir support for compiling compute shaders Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-07 08:43:08 +11:00
Timothy Arceri	9c52902c76	ac/radeonsi: add num_work_groups to the abi Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-07 08:43:08 +11:00
Timothy Arceri	f12e2f9c12	ac: implement nir_intrinsic_shader_clock Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-07 08:43:08 +11:00
Timothy Arceri	b7b89bbddb	ac/radeonsi: create ac_build_shader_clock() helper Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-07 08:43:08 +11:00
Timothy Arceri	d116af383f	ac/radeonsi: add load_local_group_size() to the abi Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-07 08:43:08 +11:00
Timothy Arceri	f6932d1ef3	radeonsi: add get_block_size() helper This will be reused by the nir backend in a later patch. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-07 08:43:08 +11:00
Timothy Arceri	e3ebffdbb0	ac: don't call emit_outputs() for compute Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-07 08:43:08 +11:00
Timothy Arceri	c8066cdfa7	ac/radeonsi: add local_invocation_ids to the abi Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-07 08:43:08 +11:00
Timothy Arceri	fa5239c153	ac/radeonsi: add workgroup_ids to the abi Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-07 08:43:08 +11:00
Timothy Arceri	64c10c9737	radeonsi/nir: gather some compute info in si_nir_scan_shader() Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-07 08:43:08 +11:00
Timothy Arceri	1142b1d3e1	radeonsi/nir: always set input_usage_mask as using all components This fixes a regression for now, in the future we should gather the used components properly. V2: just set for VS and correctly handle doubles Fixes: `be973ed21f` "radeonsi: load the right number of components for VS inputs and TBOs" Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-07 08:38:52 +11:00
Timothy Arceri	ffeebcfa7e	i965: remove unused brw_nir_lower_cs_shared() This has been unused since `8761a04d0d`. Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-02-07 08:38:01 +11:00
Bas Nieuwenhuizen	a3e42e7a69	vulkan/wsi: Fix OOM behavior with prime images. Fixes: `d50937f137` "vulkan/wsi: Implement prime in a completely generic way" Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-02-06 21:52:39 +01:00
Bas Nieuwenhuizen	c7d640fbbf	ac/nir: fix GS load input type. Fixes: `df1d5174fc` "ac/nir: replace SI.buffer.load.dword with amdgcn.buffer.load" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-02-06 21:52:38 +01:00
Mathias Fröhlich	e8a9473d32	mesa: Factor out _mesa_disable_vertex_array_attrib. And use it in the enable code path. Move _mesa_update_attribute_map_mode into its only remaining file. Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-02-06 21:20:14 +01:00
Mathias Fröhlich	236657842b	vbo: Move vbo_rebase into its only caller module tnl. Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-02-06 21:20:14 +01:00
Mathias Fröhlich	2313c33e95	mesa: Use atomics for buffer objects reference counts. The mutex is currently used for reference counting and updating the minmax index cache. The change uses atomics directly for reference counting and the mutex for the minmax cache. This is safe since the reference count is not modified beside in _mesa_reference_buffer_object where atomics aim to be used. While using the minmax cache, the calling code holds a reference to the buffer object. Thus unreferencing or even referencing the buffer object does not need to be serialized with accessing the minmax cache. The change reduces the time _mesa_reference_buffer_object_ takes by about a factor of two when looking at perf results for some of my favorite use cases. Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-02-06 21:20:14 +01:00
Dave Airlie	6c691081a1	r600: fixup sparse color exports. If we have gaps in the shader mask we have to have 0x1 in them according to a comment in radeonsi, and this is required to fix the test at least on cayman. We also need to record the highest one written to write to the ps exports reg. This fixes: KHR-GL45.enhanced_layouts.fragment_data_location_api Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-07 06:16:59 +10:00
Dave Airlie	2d5b5d267e	r600: work out target mask at framebuffer bind. If we only get 1,2,3,6 framebuffers we want a sparse target mask. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-07 06:16:55 +10:00
Dave Airlie	5b14e06d8b	r600: work out shader export mask at shader build time (v1.1) Since enhanced layouts allows setting specific MRT outputs, we can get sparse outputs, so we have to calculate the shader mask earlier. v1.1: update checks for state update (Roland) Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-07 06:16:27 +10:00
Dave Airlie	f292eceae1	r600: fix xfb stream check. This fixes: KHR-GL45.enhanced_layouts.xfb_vertex_streams Reviewed-by: Roland Scheidegger <sroland@vmware.com> Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-07 06:08:12 +10:00
Dave Airlie	680cb9898a	r600/compute: add render cond support. Set render cond and emit atom. Fixes: KHR-GL45.compute_shader.conditional-dispatching Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-07 06:08:12 +10:00
Dave Airlie	5fd7b282b3	r600: fix not-very indirect compute We need to get the grid sizes earlier to fill in to the const buffer. Fixes: KHR-GL45.compute_shader.built-in-variables and KHR-GL45.compute_shader.dispatch-indirect Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-07 06:08:12 +10:00
Dave Airlie	00a112641b	r600: overhaul buffer resource query. This cleans up and fixes the previous fix even more. Buffers from textures start at max const, buffers from buffers/images come in from the 168 offset. This fixes a bunch of: KHR-GL45.shader_storage_buffer_object* Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-07 06:08:12 +10:00
Dave Airlie	736b150768	r600/eg: fix buffer sizing. For buffers we want the size in bytes; for images we want it in elements. This fixes: KHR-GL45.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-pad Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-07 06:08:12 +10:00
Dave Airlie	c9c4f0b722	r600/images: set offset for compute shaders with number of declared samplers for frag shaders we get a value in the key, I expect I need to make compute work better Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-07 06:08:12 +10:00
Dave Airlie	ab5cee4c24	r600/compute: only mark buffer/image state dirty for fragment shaders The compute emission path always emits this currently, and emitting it on the fragment path breaks the blitter. This fixes gpu hangs in KHR-GL45.compute_shader.resource-texture Reviewed-by: Roland Scheidegger <sroland@vmware.com> Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-07 06:08:12 +10:00
Dave Airlie	4e3b43f180	r600/atomic: fix ATOMCAS instruction. This has 4 srcs. This fixes: KHR-GL45.shader_atomic_counter_ops_tests.ShaderAtomicCounterOpsExchangeTestCase Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-07 06:08:11 +10:00
Dave Airlie	8bdad9fa1f	r600/sb/cayman: fix indirect ubo access on cayman With sb enabled on cayman, this was overwriting the proper cf index value with random ones if the dst gpr was 2 or 3, only save the value for a MOVA instruction. Fixes: KHR-GL45.gpu_shader5.uniform_blocks_array_indexing (on cayman with sb) Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-07 06:08:11 +10:00
Dave Airlie	012100b809	r600/eg: use texture target to pick array size not view target (v2) This fixes a few CTS cases in : KHR-GL45.texture_view.view_sampling some multisample cases are still broken, but not sure this is the same problem. v2: fix more cases Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-07 06:08:11 +10:00
Dave Airlie	e7e81f362d	radv: don't support tc-compat on multisample d32s8 at all. RX550 fails dEQP-VK.renderpass.suballocation.multisample.d32_sfloat_s8_uint.samples_2 So increase the range of the workaround. Fixes: `f4c534ef6` (radv: don't enable tc compat for d32s8 + 4/8 samples (v1.1)) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-06 19:56:00 +00:00
Michal Navratil	4081e08896	winsys/amdgpu: allow non page-aligned size bo creation from pointer Fix INVALID_OPERATION caused by BufferData with target EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD when the buffer size is not page aligned. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Cc: 17.3 18.0 <mesa-stable@lists.freedesktop.org>	2018-02-06 18:51:12 +01:00
Jon Turney	9440599c8e	meson: ensure xmlpool/options.h is generated for libgallium In file included from ../src/gallium/targets/dri/target.c:1: In file included from ../src/gallium/auxiliary/target-helpers/drm_helper.h:8: ../src/util/xmlpool.h:103:10: fatal error: 'xmlpool/options.h' file not found See also `26bde1e3`. Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2018-02-06 15:56:12 +00:00
Andres Gomez	1ec88755c2	vbo: provide 64bits support to print_draw_arrays Cc: Mathias Fröhlich <mathias.froehlich@web.de> Cc: Brian Paul <brianp@vmware.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>	2018-02-06 15:30:29 +02:00
Andres Gomez	0057ae4038	vbo: take into account the size when printing VAO elements When using print_draw_arrays for debugging, we were printing an "n" amount of vertex but that meant not to print all the size in the "n" vertex, depending on the stride used. Now we print the whole size in the "n" vertex. Cc: Mathias Fröhlich <mathias.froehlich@web.de> Cc: Brian Paul <brianp@vmware.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>	2018-02-06 15:30:23 +02:00
Andres Gomez	c9325b4fa9	vbo: print first element of the VAO when the binding stride is 0 Cc: Mathias Fröhlich <mathias.froehlich@web.de> Cc: Brian Paul <brianp@vmware.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>	2018-02-06 15:30:12 +02:00
Iago Toral Quiroga	a5053ba27e	anv/device: initialize the list of enabled extensions properly The loop goes through the list of enabled extensions marking them as enabled in the list, but this relies on every other extension being initialized to false by default. This bug would make us, for example, advertise certain device extension entry points as available even when the corresponding extensions had not been enabled. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Fixes: `abc62282b5` "anv: Add a per-device table of enabled extensions" Cc: "18.0" <mesa-stable@lists.freedesktop.org>	2018-02-06 07:51:00 +01:00
Iago Toral Quiroga	ef439a4fdc	spirv: split constant initializers on in/out structs The SPIR-V parser splits in/out struct variables and creates a separate variable for each first-level member of the struct. When the struct variable has an initializer this means that we also need to split the initializer. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-02-06 07:50:18 +01:00
Iago Toral Quiroga	1d20001d97	i965/nir: do int64 lowering before optimization Otherwise loop unrolling will fail to see the actual cost of the unrolling operations when the loop body contains 64-bit integer instructions, especially when the divmod64 lowering applies, since its lowering is quite expensive. Without this change, some in-development CTS tests for int64 get stuck forever trying to register allocate a shader with over 50K SSA values. The large number of SSA values is the result of NIR first unrolling multiple seemingly simple loops that involve int64 instructions, only to then lower these instructions to produce a massive pile of code (due to the divmod64 lowering in the unrolled instructions). With this change, loop unrolling will see the loops with the int64 code already lowered and will realize that it is too expensive to unroll. v2: Run nir_algebraic first so we can hopefully get rid of some of the int64 instructions before we even attempt to lower them. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-02-06 07:49:27 +01:00
Ilia Mirkin	02a6d901ee	mesa: add OES_EGL_image_external_essl3 support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-02-06 07:28:11 +02:00
Vinson Lee	fe32f796f2	r600/fp64: Fix build. CC r600_shader.lo r600_shader.c: In function ‘egcm_int_to_double’: r600_shader.c:4543:12: error: ‘ctx’ is a pointer; did you mean to use ‘->’? if (ctx.bc->chip_class == CAYMAN) ^ -> Fixes: `35b4301577` ("r600/fp64: fix integer->double conversion") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-02-05 15:32:20 -08:00
Dave Airlie	35b4301577	r600/fp64: fix integer->double conversion Doing a straight uint/int->fp32->fp64 conversion causes some precision issues, Roland suggested splitting the integer into two portions and doing two separate int->fp32->fp64 conversions then adding the results. This passes the tests in CTS and piglit. [airlied: fix cypress conversion opcodes] Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-02-06 08:21:48 +10:00
Samuel Pitoiset	0170ae1e23	ac/nir: remove emission of nir_op_fdiv RadeonSI and RADV lower fdiv. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-02-05 23:09:34 +01:00
Jon Turney	b5af199f92	travis: add macOS meson build v2: Simplify set of options now we have better defaults Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-02-05 19:42:01 +00:00
Jon Turney	80bc41b2ec	meson: osx ld doesn't support --build-id Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-02-05 19:40:43 +00:00
Jon Turney	ea8730024f	meson: build src/glx/apple Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-02-05 19:40:43 +00:00
Dylan Baker	569628dd24	meson: set apple glx defines Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-02-05 19:40:43 +00:00

99844 commits