fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-19 13:38:19 +02:00

Author	SHA1	Message	Date
Thomas Helland	8d5cd91ca0	nir: Migrate nir_dce to instr worklist Shader-db runtime change, average of five runs: Before 125,77 seconds (+/- 0,09%) After 124,48 seconds (+/- 0,07%) Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de> Reviewed-by: Eric Anholt <eric at anholt.net>	2018-03-21 19:26:40 +01:00
Thomas Helland	edb18564c7	nir: Initial implementation of a nir_instr_worklist Make a simple worklist by basically just wrapping u_vector. This is intended to be used in nir_opt_dce to reduce the number of calls to ralloc, as we are currently spamming ralloc quite badly. It should also give better cache locality and much lower memory usage. Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de> Reviewed-by: Eric Anholt <eric at anholt.net>	2018-03-21 19:26:27 +01:00
Scott D Phillips	cab8df1e3e	intel/tools: aubinator: Catch gen11 "enhanced execlist" submission Different registers are used for execlist submission in gen11, so also watch those. This code only watches element zero of the submit queue, which is all aubdump currently writes. Tested-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-03-21 11:07:15 -07:00
Marek Olšák	a8d55374dc	radeonsi: fix a snprintf warning on gcc 7.3.0	2018-03-21 13:43:09 -04:00
Marek Olšák	cf0a95afac	radeonsi/gfx9: print the swizzle mode for testdma Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-03-21 13:40:06 -04:00
Marek Olšák	f7ffa504a0	ac/surface: compute tile swizzle for GFX9 Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-03-21 13:40:06 -04:00
Eric Anholt	9f0c9c6d18	broadcom/vc5: Don't skip job submit just because everything is scissored. The coordinate shaders may now have side effects in the form of transform feedback. Part of fixing GTF-GLES3.gtf.GL3Tests.transform_feedback.transform_feedback_misc	2018-03-21 10:04:21 -07:00
Eric Anholt	024e814dee	broadcom/vc5: Handle sparsely populated SO target array. Fixes GTF-GLES3.gtf.GL3Tests.transform_feedback.transform_feedback_state_variables	2018-03-21 10:04:21 -07:00
Eric Anholt	f735ac6b1c	broadcom/vc5: Fix 3D miplevel limit to match other texture targets. Fixes segfault in GTF-GLES3.gtf.GL3Tests.texture_storage.texture_storage_texture_levels on level 13.	2018-03-21 10:04:21 -07:00
Eric Anholt	ba87d85b04	broadcom/vc5: Clamp the instance divisor to 16 bits. Fixes debug assert on GTF-GLES3.gtf.GL3Tests.instanced_arrays.instanced_arrays_divisor Signed-off-by: Eric Anholt <eric@anholt.net>	2018-03-21 10:04:21 -07:00
Lionel Landwerlin	3dd92184d5	i965: fix android build This is the equivalent of commit `5770e1d89e` for android. v2: fix xml files path and file given to --header Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Fixes: `2d2b15fbca` ("i965: fix autotools/android build") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105634 Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-03-21 18:56:47 +02:00
Caio Marcelo de Oliveira Filho	8571c577aa	nir/dead_cf: also remove useless ifs Generalize the code for removing dead loops to also remove dead if nodes. The conditions are the same in both cases: the node (and its children) don't have side-effects AND the nodes after it don't use the values produced by the node. The only difference is when evaluating side effects: loops consider only return jumps as a side-effect -- they can stop execution of nodes after it; 'if' nodes outside loops should consider all kinds of jumps (return, break, continue) since all of them can cause execution of nodes after it to be skipped. After this patch, empty ifs (those whose then and else blocks are both empty) will be removed by nir_opt_dead_cf. It caused no change to shader-db, in part because the removal of empty ifs is currently covered by nir_opt_peephole_select. v2: Improve the identification of cases where break/continue can cause side-effects. (Jason) v3: Move code comment changes to a different patch. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-03-21 09:36:09 -07:00
Caio Marcelo de Oliveira Filho	470056d37b	nir/dead_cf: rephrase definition of a dead loop node Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-03-21 09:35:57 -07:00
Leo Liu	c4de2f0880	radeon/vce: move feedback command inside of destroy function On the CI family, firmware requires the destroy command to be the last command in the IB. Moving the feedback command after destroy causes issues on CI cards, so we have to keep the previous logic that moves destroy back to the last command. But as with the original issue fixed previously, on newer families like Vega10 the feedback command has to be included inside of the task info command along with the destroy command. Fixes: 6d74cb25("radeon/vce: move destroy command before feedback command") Signed-off-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Cc: mesa-stable@lists.freedesktop.org	2018-03-21 11:24:35 -04:00
Aaron Watry	c95d953b18	clover: Dynamically calculate __OPENCL_VERSION__ and CLC language version Use get_language_version to calculate default cl standard based on device capabilities and -cl-std specified in build options. v5: move dev_clc_version declaration from an earlier patch v4: Squash the __OPENCL_VERSION__ and CLC language version patches v3: (Jan) Allow device_version up to 2.2 while device_clc_version only goes to 2.0 Use get_cl_version to calculate version instead v2: Split out from the previous patch (Pierre) Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr> CC: Jan Vesely <jan.vesely@rutgers.edu>	2018-03-21 06:59:46 -05:00
Aaron Watry	29b4090d18	clover/llvm: Add get_[cl\|language]_version, validation and some helpers Used to calculate the default CLC language version based on the --cl-std in build args and the device capabilities. According to section 5.8.4.5 of the 2.0 spec, the CL C version is chosen by: 1) If you have -cl-std=CL1.1+ use the version specified 2) If not, use the highest 1.x version that the device supports Curiously, there is no valid value for -cl-std=CL1.0 Validates requested cl-std against device_clc_version Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr> v7: (Pierre) Split cl/clc versions into separate lists and make more references const. v6: (Pierre) Add more const and fix some whitespace v5: (Aaron) Use a collection of cl versions instead of switch cases Consolidates the string, numeric version, and clc langstandard::kind v4: (Pierre) Split get_language_version addition and use into separate patches Squash patches that add the helpers and validate the language standard v3: Change device_version to device_clc_version v2: (Pierre) Move create_compiler_instance changes to correct patch to prevent temporary build breakage. Convert version_str into unsigned and use it to find language version Add build_error for unknown language version string Whitespace fixes	2018-03-21 06:59:37 -05:00
Eric Anholt	4d8b476fa9	intel/blorp: Fix compiler warning about num_layers. The compiler doesn't notice that the condition for num_layers to be undefined already defined it above (as our assert checked in a debug build). v2: Move the pair of assignments to one outside of the block. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-03-20 14:06:46 -07:00
Samuel Pitoiset	f0211155f1	radv: add support for VK_EXT_depth_range_unrestricted This extension removes the restrictions on minDepth/maxDepth, minDepthBounds/maxDepthBounds and VkClearDepthStencilValue::depth. The following CTS tests now pass: dEQP-VK.glsl.builtin_var.fragdepth.line_list_d32_sfloat_large_depth dEQP-VK.glsl.builtin_var.fragdepth.point_list_d32_sfloat_large_depth dEQP-VK.glsl.builtin_var.fragdepth.triangle_list_d32_sfloat_large_depth dEQP-VK.draw.inverted_depth_ranges.nodepthclamp_depth_range_unrestricted dEQP-VK.draw.inverted_depth_ranges.depthclamp_depth_range_unrestricted Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-20 21:55:41 +01:00
Samuel Pitoiset	4e9b0b39b5	radv: only enable one channel when exporting prim id It's a 32-bit integer like the layer. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-20 21:54:48 +01:00
Lionel Landwerlin	5770e1d89e	i965: fix out of tree autotools build Fixes: `2d2b15fbca` ("i965: fix autotools/android build") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>	2018-03-20 19:48:56 +00:00
Stéphane Marchesin	1117edc60d	virgl: Implement seamless cube maps This was previously ignored. Along with the virglrenderer patch, this fixes ~100 dEQP tests: dEQP-GLES3.functional.texture.filtering.cube.* Signed-off-by: Stéphane Marchesin <marcheu@chromium.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-21 05:44:52 +10:00
Emil Velikov	c43715d30b	i965: annotate brw_oa.py's --header and --code as required As of earlier commit, the --header was made a hard requirement when using --code. Hence - annotate both as required and drop a few no longer needed checks. Fixes: `035cc7a12d` ("i965: perf: reduce i965 binary size") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-03-20 17:21:49 +00:00
Lionel Landwerlin	d3e5d3955c	i965: pipecontrol: add LRI write immediate flag Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-20 16:58:30 +00:00
Lionel Landwerlin	7f977d51b3	intel: genxml: add INSTPM/CS_DEBUG_MODE2 registers Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-20 16:58:30 +00:00
Lionel Landwerlin	2d2b15fbca	i965: fix autotools/android build Autotools/android builds generate the header & code files in 2 steps, but the code generation requires the name of the header file to include it. This change generates both files in one command. Fixes: `035cc7a12d` ("i965: perf: reduce i965 binary size") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-03-20 16:58:29 +00:00
Emil Velikov	28780c5028	st/mesa: add compiler/nir/ prefix for nir includes Stay consistent with the rest of the codebase, effectively fixing the autotools build. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105621 Fixes: `ffa4bbe466` ("st/nir/radeonsi: move nir_lower_uniforms_to_ubo() to the state tracker") Cc: Timothy Arceri <tarceri@itsqueeze.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-03-20 16:11:19 +00:00
Scott D Phillips	d849d36c6c	anv: off-by-one in GetDescriptorSetLayoutSupport Loop was accessing one more than bindingCount elements from pBindings, accessing uninitialized memory. Fixes: `ddc4069122` ("anv: Implement VK_KHR_maintenance3") Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-03-20 07:58:10 -07:00
Lionel Landwerlin	035cc7a12d	i965: perf: reduce i965 binary size Performance metric numbers are calculated the following way : - out of the 256 bytes long OA reports, we accumulate the deltas into an array of uint64_t - the equations' generated code reads the accumulated uint64_t deltas and normalizes them for a particular platform Our hardware is such that a number of counters in the OA reports always return the same values (i.e. they're not programmable), and they return the same values even across generations, and as a result a number of equations are identical in different metric sets across different generations. Up to now we've kept the generated code of the equations separated in different files (per generation/GT), and didn't apply any factorization of the common equations. We could have made some improvement by reusing equations within a given metrics file, but we can go even further and reuse across generations (i.e. all files). This change changes the code generation to emit a single file in which we reuse equations emitted code based on the hash of equations' strings. Here are the savings in a meson build : Before(.old)/after : $ du -h ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old 43M ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so 47M ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old $ size build/src/mesa/drivers/dri/libmesa_dri_drivers.so build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old text data bss dec hex filename 13054002 409424 671856 14135282 d7aff2 build/src/mesa/drivers/dri/libmesa_dri_drivers.so 14550386 409552 671856 15631794 ee85b2 build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old As a side comment here is the size of the drivers if we remove all of the metrics from the build : $ du -sh build/src/mesa/drivers/dri/libmesa_dri_drivers.so 40M build/src/mesa/drivers/dri/libmesa_dri_drivers.so v2: Fix an issue with hashing of counter equations (Lionel) Build system rework (Emil) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (build system part) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-20 13:56:07 +00:00
Lionel Landwerlin	e9a9e85948	i965: perf: fix a counter return type on hsw The equation code computes a float (percentage) yet the return type was an uint64_t. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-20 11:36:13 +00:00
Tapani Pälli	604cac9f73	mesa: fix leaking ParameterValueOffset ==15115== 48 bytes in 1 blocks are definitely lost in loss record 16 of 66 ==15115== at 0x4C2EC15: realloc (vg_replace_malloc.c:785) ==15115== by 0x8602C3E: _mesa_reserve_parameter_storage (prog_parameter.c:212) ==15115== by 0x8602D1E: _mesa_add_parameter (prog_parameter.c:252) ==15115== by 0x86032C4: _mesa_add_sized_state_reference (prog_parameter.c:384) ==15115== by 0x8603324: _mesa_add_state_reference (prog_parameter.c:409) Fixes: `edded12376` "mesa: rework ParameterList to allow packing" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-03-20 13:25:07 +02:00
Daniel Stone	478fc2d2a1	dri3: Don't fail on version mismatch The previous commit to make DRI3 modifier support optional, breaks with an updated server and old client. Make sure we never set multibuffers_available unless we also support it locally. Make sure we don't call stubs of new-DRI3 functions (or empty branches) which will never succeed. Signed-off-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Fixes: `7aeef2d4ef` ("dri3: allow building against older xcb (v3)")	2018-03-20 08:52:59 +00:00
Timothy Arceri	9a243eccae	radv: don't lower indirects until after opts have run Noticed while passing by. Not sure if it impacts anything, but likely to impact GFX9 more than anything else since we lower inputs, outputs and locals there. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-20 15:01:44 +11:00
Timothy Arceri	dfe2f19855	st/nir: fix atomic lowering for gallium drivers i965 and gallium handle the atomic buffer index differently. It was just by luck that the single piglit test for this was passing. For gallium we use the atomic binding so that we match the handling in st_bind_atomics(). On radeonsi this fixes the CTS test: KHR-GL43.shader_storage_buffer_object.advanced-write-fragment It also fixes tressfx hair rendering in Tomb Raider. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-20 14:29:53 +11:00
Timothy Arceri	632d5e97ef	st/radeonsi: enable uniform packing in NIR backend Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-20 14:19:35 +11:00
Timothy Arceri	231333a20d	st: add uniform packing support to lower_uniforms_to_ubo() Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-20 14:17:34 +11:00
Timothy Arceri	9c51a7ea29	gallium: add packed uniform CAP Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-20 14:17:34 +11:00
Timothy Arceri	ffa4bbe466	st/nir/radeonsi: move nir_lower_uniforms_to_ubo() to the state tracker This will only ever be used by gallium drivers so it probably doesn't belong in the nir toolkit. Also we want to pass it some non NIR things in the following patch. To avoid regressions we wrap the lowering calls that have been moved to st_glsl_to_nir with a quick hack so that they are only called for radeonsi, we will replace the hack with a check for uniform packing in a following patch. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-20 14:17:34 +11:00
Timothy Arceri	a80cf442d9	st: add st_glsl_type_dword_size() helper This will be used to support uniform packing. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-20 14:17:34 +11:00
Timothy Arceri	5488166730	st/glsl_to_nir: add support for packed builtin uniforms Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-20 14:17:34 +11:00
Timothy Arceri	57ebab64c0	mesa: add _mesa_add_sized_state_reference() helper This will be used for adding packed builtin uniforms. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-20 14:17:34 +11:00
Timothy Arceri	2377754329	mesa: add support propagate uniform support for packed uniforms Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-20 14:17:34 +11:00
Timothy Arceri	40711a7a60	mesa: allow for uniform packing when adding uniforms to param list Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-03-20 14:17:33 +11:00
Timothy Arceri	a2198d4fdb	mesa: add packing support for setting uniform handles Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-03-20 14:17:33 +11:00
Timothy Arceri	6cfa15b803	mesa: add packing support for setting uniforms Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-03-20 14:17:33 +11:00
Timothy Arceri	4a7c5c079b	mesa: create copy uniform to storage helpers These will be used in the following patch to allow copying directly to the param list when packing is enabled. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-20 14:17:33 +11:00
Timothy Arceri	edded12376	mesa: rework ParameterList to allow packing Currently everything is padded to 4 components. Making the list more flexible will allow us to do uniform packing. V2 (suggestions from Nicolai): - always pass existing calls to _mesa_add_parameter() true for padd_and_align - fix bindless param value offsets - remove left over wip logic from pad and align code - zero out param value padding - whitespace fix Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-20 14:17:33 +11:00
Timothy Arceri	b13b9eb432	mesa: add PackedDriverUniformStorage const Will be used to determine whether to take packing code paths or not. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-20 14:17:33 +11:00
Eric Anholt	00910e3057	broadcom/vc5: Don't annotate dumps with stale live intervals. As you're debugging register allocation, you may have changed the intervals and not recomputed yet. Just skip the dump in that case.	2018-03-19 16:44:20 -07:00
Eric Anholt	facc3c6f58	broadcom/vc5: Add support for register spilling. Our register spilling support is nice to have since vc4 couldn't at all, but we're still very restricted due to needing to not spill during a TMU operation, or during the last segment of the program (which would be nice to spill a value of, when there's a long-lived value being passed through with little modification from the start to the end). We could do better by emitting unspills for the last-segment values just before the last thrsw, since the last segment is probably not the maximum interference area. Fixes GTF uniform_buffer_object_arrays_of_all_valid_basic_types and 3 others.	2018-03-19 16:44:06 -07:00
Eric Anholt	271fc58ba1	broadcom/vc5: Remove redundant last_inst lookup. The point was to get the MOV, which the MOV_dest already returned.	2018-03-19 16:42:59 -07:00

93073 commits