fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-08 06:20:19 +01:00

Author	SHA1	Message	Date
Dylan Baker	113bb8d448	glsl: fix general_ir_test with mingw Somewhere down in the depths of the mingw headers 'interface' is defined, change it to iface like a similar patch did. Signed-off-by: Dylan Baker <dylan.c.baker@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-03 10:57:17 -07:00
Dylan Baker	f1d5f2aff3	meson: always define libglapi This allows the identifier to be used even if shared-glapi isn't built, which simplifies a bunch of things. Signed-off-by: Dylan Baker <dylan.c.baker@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-03 10:57:10 -07:00
Chuck Atkins	a381dbf253	meson: Fix missing glproto dependency for gallium-glx Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com> Cc: mesa-stable <mesa-stable@lists.freedesktop.org> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-03 13:36:25 -04:00
Samuel Pitoiset	4f18c43d1d	radv: apply the indexing workaround for atomic buffer operations on GFX9 Because the new raw/struct intrinsics are buggy with LLVM 8 (they weren't marked as source of divergence), we fall back to the old intrinsics for atomic buffer operations only. This means we need to apply the indexing workaround for GFX9. The load/store operations still use the new LLVM 8 intrinsics. The fact that we need another workaround is painful but we should be able to clean that up a bit once LLVM 7 support is dropped. This fixes a GPU hang with AC Odyssey and some rendering problems with Nioh. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110573 Fixes: `31164cf5f7` ("ac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-03 17:59:12 +02:00
Lionel Landwerlin	80dc78407d	anv: fix crash when application does not provide push constants Found while running Talos Principle. As far as I can tell running a draw call with a pipeline having push constants without the application having called vkCmdPushConstants gives undefined push constant values. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: mesa-stable@lists.freedesktop.org	2019-05-03 10:21:40 +01:00
Samuel Pitoiset	e68d7bec67	radv: fix radv_get_aspect_format() for D+S formats This restores the previous behaviour before YCBCR landed. For D+S formats, it returns the depth format. This fixes an assertion with Thrones of Britannia. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110540 Fixes: `66507cc656` ("radv: Add single plane image views & meta operations") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-03 09:01:10 +02:00
Caio Marcelo de Oliveira Filho	aa675cef5e	intel/fs: Assert when brw_fs_nir sees a nir_deref_instr Since `09f1de97a7` "anv,i965: Lower away image derefs in the driver" the backend compiler is not expected to handle any derefs, so let's assert on it. This helps identifying problems when a deref is not lowered and "leaks" into the backend compiler. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-02 23:25:30 -07:00
Julien Isorce	a77512635e	r600: implement resource_get_info Factoring code with resource_get_handle. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110443 Signed-off-by: Julien Isorce <jisorce@oblong.com> Reviewed-by: Dave Airlie airlied@redhat.com	2019-05-03 05:54:28 +00:00
Dave Airlie	512a31a412	util/bitset: fix bitset range mask calculations. The MASK macro is used in the RANGE macro, and it should return the pre-bitset word mask for the (b) value. i.e. BITSET_MASK(0) should be undefined since it's meaningless. BITSET_MASK(31) should give 0x7fffffff, BITSET_MASK(32) should give 0xffffffff, BITSET_MASK(33) should give 0x00000001, BITSET_MASK(64) should give 0xffffffff. However then BITSET_RANGE ends up broken for cases where its (b) value is 0, 32 or 64, as in that case the lower mask would be 0 not 0xffffffff. This fixes the unit tests that I've added, and my code that uses bitsets. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Fixes: `bb38cadb1c` "More GLSL code" Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-03 15:23:04 +10:00
Dave Airlie	18973a450e	util/tests: add basic unit tests for bitset The last test here currently fails as there is a bug in bitset.h Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-03 15:23:04 +10:00
Dave Airlie	6fd6246d92	nir: fix lower vars to ssa for larger vector sizes. This has a couple of hardcoded vec4 limits in it, change them to the proper sizing to avoid future issues. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-03 15:23:00 +10:00
Dave Airlie	2774d39366	spirv: fix SpvOpBitSize return value. The spir-v spec says this returns a bool. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-03 15:22:57 +10:00
Kenneth Graunke	5ff5d0a895	iris: Disable dual source blending when shader doesn't handle it This is a port of Danylo's `eca4a6548d` which fixed the hang on i965. It fixes GPU hangs in his new Piglit test, arb_blend_func_extended-dual-src-blending-discard-without-src1. I avoided my own review feedback here, and decided to simply adjust 3DSTATE_PS_BLEND rather than BLEND_STATE_ENTRY[0]. It has never been clear to me which the hardware uses in every case. However, whacking the enable in 3DSTATE_PS_BLEND seems to be sufficient to fix the hang, and that packet is already dynamic, so it's easy to handle. I'd rather avoid making BLEND_STATE_ENTRY[0] dynamic unless I have to.	2019-05-02 21:14:49 -07:00
Jason Ekstrand	be7e9870d6	anv: Stop including POS in FS input limits It is an input but it comes in as part of the shader payload and doesn't count towards the limits. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-02 18:56:51 -05:00
Rob Clark	b73dd91f60	nir: fix nir tex print harder Fixes: `691d5a825a` nir: rework tex instruction printing Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 15:06:01 -07:00
Erico Nunes	568e8fc736	lima/ppir: support nir_op_ftrunc Support nir_op_ftrunc by turning it into a mov with a round to integer output modifier. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-05-02 20:55:56 +00:00
Heinrich	9b80322532	gbm: Improve documentation of BO import - Add GBM_BO_IMPORT_FD_MODIFIER to documentation of supported foreign object types - Add newline before documentation block - Improve language Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-05-02 20:36:38 +00:00
Samuel Pitoiset	62001f3dff	radv: only need to force emit the TCS regs on Vega10 and Raven1 Other GFX9 chips aren't affected. Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-02 22:29:01 +02:00
Marek Olšák	b3a26d4628	glsl: fix and clean up NV_compute_shader_derivatives support - make sure compute shader derivatives are exposed for all extensions - unify duplicated code Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-02 16:09:24 -04:00
Marek Olšák	20909284f2	st/dri: decrease input lag by syncing sooner in SwapBuffers It's done by: - decrease the number of frames in flight by 1 - flush before throttling in SwapBuffers (instead of wait-then-flush, do flush-then-wait) The improvement is apparent with Unigine Heaven. Previously: draw frame 2 wait frame 0 flush frame 2 present frame 2 The input lag is 2 frames. Now: draw frame 2 flush frame 2 wait frame 1 present frame 2 The input lag is 1 frame. Flushing is done before waiting, because otherwise the device would be idle after waiting. Nine is affected because it also uses the pipe cap.	2019-05-02 16:09:24 -04:00
Erik Faye-Lund	28f18915b8	meson: lift driver-collection out into parent build-file This way we can mark the dri_drivers and dri_link arrays as temporary, as all knowledge about them is contained in a single build-file with clearly visible limited life-span. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-02 18:30:29 +00:00
Rob Clark	8c77e669a8	freedreno/a6xx: smaller hammer for fb barrier We just need to do a sequence of commands to flush the cache. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	6fa8a6d60f	freedreno/a6xx: KHR_blend_equation_advanced support Wire up support to sample from the fb (and force GMEM rendering when we have fb reads). The existing GLSL IR lowering for blend_equation_advanced does the rest. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	650246523b	freedreno/ir3: fb read support Lower load_output to txf_ms_fb and add support for the new texture fetch instruction. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	0704ddb2e5	freedreno/drm: expose GMEM_BASE address Needed for sampling from tile buffer (GMEM). Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	a99c360a46	nir: add pass to lower fb reads Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	a2c89a85f4	nir: fix lower_wpos_ytransform in load_frag_coord case Apparently we never hit this path. Or at least haven't for a rather long time. But in either case (load_deref or load_frag_coord), we can just directly use the intrinsic's ssa dest. So stop passing the nir_variable (which would be NULL in the load_frag_coord case) around and instead just use &intr->dest.ssa. (This ofc means we need to setup the cursor to insert after the instruction, which seems to be another bug of the original implementation.) Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	691d5a825a	nir: rework tex instruction printing The extra comma at the end was annoying me. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	ca3eb5db66	freedreno/ir3: add some ubo range related asserts And a comment.. since we are mixing units of bytes/dwords/vec4, hopefully this will avoid some unit confusion. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Rob Clark	e941faf3e8	freedreno/ir3: add IR3_SHADER_DEBUG flag to disable ubo lowering It isn't quite as simple as not running the pass, since with packed varyings we get load_ubo for block==0 (ie. the "real" uniforms). So instead run the pass normally but decline to lower anything in block > 0 Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Rob Clark	f697f61590	freedreno/ir3: fix lowered ubo region alignment Since we emit UBO regions INDIRECTly (ie. not copied into cmdstream but emit by EXT_SRC_ADDR) we need to keep them 4*vec4 aligned. Which the code already mostly did, except for aligning the first UBO region itself (ie. the one after block==0 which is the "real" uniforms). Fixes: `893425a607` freedreno/ir3: Push UBOs to constant file Fixes: `3c8779af32` freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Rob Clark	32925f4072	freedreno/ir3: fix shader variants vs UBO analysis Otherwise we zero out the state again, but all the UBO loads that we could lower are already lowered. End result is that we didn't emit the uniforms for lowered UBO access in any case where multiple shader variants are used. Fixes: `893425a607` freedreno/ir3: Push UBOs to constant file Fixes: `3c8779af32` freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Lionel Landwerlin	ff4168c418	vulkan/overlay: add TODO list Keen on having other people contribute. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 17:02:57 +01:00
Lionel Landwerlin	99cb2d325f	vulkan/overlay: make overridden functions static And fix the unused CmdDrawIndirect. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:57 +01:00
Lionel Landwerlin	f2afd6bd76	vulkan/overlay: make overlay size configurable Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:55 +01:00
Lionel Landwerlin	7d908038ad	vulkan/overlay: add a frame counter option This is useful to normalize the numbers written into the output file as those numbers are accumulated over a period of time and number of frames. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:35 +01:00
Lionel Landwerlin	81fd6ba7cc	vulkan/overlay: record all select metrics into output file The output looks something like this (csv style) : fps, frame, frame_timing(us), submit, draw_indexed, pipeline_graphics, acquire_timing(us), vert_invocations, frag_invocations, gpu_timing(ns) 480.55, 242, 501512, 247, 1444, 1204, 714, 5827272, 113043296, 121424174 467.80, 234, 500214, 234, 1412, 1176, 648, 5635680, 109436188, 117743760 424.37, 213, 501923, 213, 2130, 1704, 623, 5132448, 99657292, 105474683 472.15, 237, 501962, 237, 2370, 1896, 667, 5710752, 110924644, 122226004 411.32, 206, 500826, 206, 2060, 1648, 709, 4963776, 96491764, 95333273 458.87, 230, 501228, 230, 2300, 1840, 634, 5542080, 107758204, 123112090 475.01, 238, 501044, 238, 2380, 1904, 631, 5734848, 111477480, 122087426 471.08, 236, 500972, 236, 2360, 1888, 655, 5686656, 110498496, 114816162 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:34 +01:00
Lionel Landwerlin	74a9fdd8a2	vulkan/overlay: add a margin to the size of the window Looks a bit better. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:07 +01:00
Lionel Landwerlin	7ba50d8040	vulkan/overlay: add no display option In case you're just interested in data being recorded to the output file. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:07 +01:00
Lionel Landwerlin	ea7a6fa980	vulkan/overlay: add pipeline statistic & timestamps support v2: switch to VkBase{In,Out}Structure v3: Add timestamps at begin/end of primary command buffers to estimate gpu time spent per submission (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v2)	2019-05-02 17:02:06 +01:00
Lionel Landwerlin	4438188f49	vulkan/overlay: record stats in command buffers and accumulate on exec/submit This significantly reworks how numbers displayed are computed. We accumulate operations written into command buffers and add those to the device when submitted to a queue. These collected values are then used to compute per frame overlay data. We also accumulate the data over the sampling fps period to produce numbers for that period of time. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:06 +01:00
Lionel Landwerlin	9eddceef44	vulkan/overlay: update help printout Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 17:02:06 +01:00
Lionel Landwerlin	a1e6b5e9be	vulkan/util: generate a helper function to return pNext struct sizes This will be used to copy chains of structures so that we can alter some of them. v2: Drop vk_util.h include (Eric) Use VkBaseInStructure directly (Eric) v3: Drop --platforms= param to generator script, instead produce a file with #ifdef based on what platforms are compiled. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 17:02:02 +01:00
Tomeu Vizoso	ad7c9ba0ec	panfrost/midgard: Skip liveness analysis for instructions without dest [Alyssa: Add comment explanation] Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-02 15:29:48 +00:00
Tomeu Vizoso	a5dddc2d42	panfrost/midgard: Skip register allocation if there's no work to do Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-02 15:29:41 +00:00
Eric Engestrom	a34ee4dec7	egl: hard-code destroy function instead of passing it around as a pointer Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-05-02 14:44:16 +00:00
Connor Abbott	6ec4ed48fc	nir/search: Add debugging code to dump the pattern matched This was useful while debugging the previous commit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-02 16:14:06 +02:00
Connor Abbott	7ce86e6938	nir/search: Add automaton-based pre-searching nir_opt_algebraic is currently one of the most expensive NIR passes, because of the many different patterns we've added over the years. Even though patterns are already sorted by opcode, there are still way too many patterns for common opcodes like bcsel and fadd, which means that many patterns are tried but only a few actually match. One way to fix this is to add a pre-pass over the code that scans it using an automaton constructed beforehand, similar to the automatons produced by lex and yacc for parsing source code. This automaton has to walk the SSA graph and recognize possible pattern matches. It turns out that the theory to do this is quite mature already, having been developed for instruction selection as well as other non-compiler things. I followed the presentation in the dissertation cited in the code, "Tree algorithms: Two Taxonomies and a Toolkit," trying to keep the naming similar. To create the automaton, we have to perform something like the classical NFA to DFA subset construction used by lex, but it turns out that actually computing the transition table for all possible states would be way too expensive, with the dissertation reporting times of almost half an hour for an example of size similar to nir_opt_algebraic. Instead, we adopt one of the "filter" approaches explained in the dissertation, which trade much faster table generation and table size for a few more table lookups per instruction at runtime. I chose the filter which resulted in the fastest table generation time, with medium table size. Right now, the table generation takes around .5 seconds, despite being implemented in pure Python, which I think is good enough. Based on the numbers in the dissertation, the other choice might make table compilation time 25x slower to get 4x smaller table size, but I don't think that's worth it. 
As of now, we get the following binary size before and after this patch: text data bss dec hex filename 11979455 464720 730864 13175039 c908ff before i965_dri.so text data bss dec hex filename 12037835 616244 791792 13445871 cd2aef after i965_dri.so There are a number of places where I've simplified the automaton by getting rid of details in the LHS patterns rather than complicate things to deal with them. For example, right now the automaton doesn't distinguish between constants with different values. This means that it isn't as precise as it could be, but the decrease in compile time is still worth it -- these are the compilation time numbers for a shader-db run with my (admittedly old) database on Intel skylake: Difference at 95.0% confidence -42.3485 +/- 1.375 -7.20383% +/- 0.229926% (Student's t, pooled s = 1.69843) We can always experiment with making it more precise later. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-02 16:14:06 +02:00
Samuel Pitoiset	08be23bfde	radv: set WD_SWITCH_ON_EOP=1 when drawing primitives from a stream output buffer According to RadeonSI, this seems to be required by the hardware to avoid GPU hangs. I think I just forgot to set that bit when I implemented VK_EXT_transform_feedback. This fixes a GPU hang with Space Engineers and DXVK. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110291 Fixes: `b4eb029062` ("radv: implement VK_EXT_transform_feedback") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-02 15:55:46 +02:00
Brian Paul	48107b5a2b	glsl: fix typo in #warning message Trivial. Spotted by Eric Engestrom.	2019-05-02 06:32:57 -06:00
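
The mask values quoted in the `util/bitset` entry (512a31a412) can be checked with a short C sketch. The helper names `bitset_mask`, `bitset_range` and `bitset_selfcheck` are hypothetical stand-ins rather than Mesa's actual macros, and the 32-bit word size is an assumption inferred from the values in the commit message; the real patch may well be written differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bitset words are 32 bits wide, as the expected values
 * in the commit message imply. */
#define WORDBITS 32u

/* Hypothetical helper: mask covering the bits of a word below bit
 * position b. A multiple of WORDBITS selects the whole word, so
 * b = 32 or b = 64 yields 0xffffffff. */
static inline uint32_t bitset_mask(unsigned b)
{
    unsigned r = b % WORDBITS;
    return r == 0 ? UINT32_MAX : (UINT32_C(1) << r) - 1u;
}

/* Hypothetical helper: mask for bits b..e (inclusive), assuming both
 * positions fall in the same 32-bit word. */
static inline uint32_t bitset_range(unsigned b, unsigned e)
{
    uint32_t upper = bitset_mask(e + 1u);                  /* bits up to and including e */
    uint32_t lower = (UINT32_C(1) << (b % WORDBITS)) - 1u; /* bits strictly below b */
    return upper & ~lower;
}

/* Checks the four values listed in the commit message. */
static inline void bitset_selfcheck(void)
{
    assert(bitset_mask(31) == 0x7fffffffu);
    assert(bitset_mask(32) == 0xffffffffu);
    assert(bitset_mask(33) == 0x00000001u);
    assert(bitset_mask(64) == 0xffffffffu);
}
```

Calling `bitset_selfcheck()` from a test program confirms the four quoted values. With the mask defined this way, a range that ends on a word boundary, such as `bitset_range(0, 31)`, selects the full word instead of collapsing to zero.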


102094 commits