fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-24 23:38:10 +02:00

Author	SHA1	Message	Date
Marek Olšák	a72ed2f6bc	radeonsi: move MRT color exporting into a separate function This will be used by a fragment shader epilog. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	0ffe3d3772	radeonsi: use EXP_NULL for pixel shaders without outputs This never happens currently. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	677c65968b	radeonsi: only use LLVMBuildLoad once when updating color outputs at the end without LLVMBuildStore. So: - do LLVMBuildLoad - update the values as necessary - export Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	185267a6fd	radeonsi: export "undef" values for undefined PS outputs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	1ce659f820	radeonsi: move MRTZ export into a separate function Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	5f3e6b5b0f	radeonsi: simplify setting the DONE bit for PS exports First find out what the last export is and simply set the DONE bit there. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	e00f3f23b1	radeonsi: set SPI color formats and CB_SHADER_MASK outside of compilation Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	4e597c25c7	radeonsi: write all MRTs only if there is exactly one output This doesn't fix a known bug, but better safe than sorry. Also, simplify the expression in si_shader.c. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:06 +01:00
Marek Olšák	746a7a7498	radeonsi: determine SPI_SHADER_Z_FORMAT outside of shader compilation Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:05 +01:00
Marek Olšák	2cb8bf90cd	radeonsi: determine DB_SHADER_CONTROL outside of shader compilation because the API pixel shader binary will not emulate alpha test one day, so the KILL_ENABLE bit must be determined elsewhere. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:05 +01:00
Marek Olšák	ff7e77724e	tgsi/scan: set which color components are read by a fragment shader This will be used by radeonsi. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:05 +01:00
Marek Olšák	18ec76730a	tgsi/scan: fix tgsi_shader_info::reads_z This has no users in Mesa. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:05 +01:00
Marek Olšák	f3658be108	tgsi/scan: set if a fragment shader writes sample mask This will be used by radeonsi. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-07 18:26:05 +01:00
Roland Scheidegger	8d4039ecdb	softpipe: tell draw about the vertex layout we want This makes it more similar to llvmpipe. It also allows us to let draw emit code handle things like getting zeros for non-existing vs outputs automatically. There probably isn't really any overhead either way, there isn't really any "simply copy everything" code in the emit path it would copy each attrib individually just the same. Likewise, we still do another mapping step in softpipe as the layout may still not match exactly (same as in llvmpipe, should probably nuke the pointless mapping in both drivers). This fixes the piglit arb_fragment_layer_viewport no_gs/no_write tests. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-07 02:00:04 +01:00
Roland Scheidegger	8e3a76791f	llvmpipe: use ints not unsigned for slots They can't actually be 0 (as position is there) but should avoid confusion. This was supposed to have been done by `af7ba989fb` but I accidentally pushed an older version of the patch in the end... Also prettify slightly. And make some notes about the confusing and useless fs input "map". Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-07 01:59:17 +01:00
Roland Scheidegger	2dbc20e456	draw: nuke the interp parameter from vertex_info draw emit couldn't care less what the interpolation mode is... This somehow looked like it would matter, all drivers more or less dutifully filled that in correctly. But this is only used for emit, if draw needs to know about interpolation mode (for clipping for instance) it will get that information from the vs anyway. softpipe actually used to depend on that interpolation parameter, as it abused that structure quite a bit but no longer. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-07 01:58:05 +01:00
Roland Scheidegger	892e2d1395	softpipe: don't abuse the draw vertex_info struct for something different softpipe would calculate two "vertex layouts". The second one was however just used for internal purposes, draw would know nothing about it even though it looked exactly the same as the other one we tell draw about. So, store that information separately as this was just confusing. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-07 01:57:21 +01:00
Roland Scheidegger	b64d008052	softpipe: fix mapping of "special" vs outputs Unlike llvmpipe, softpipe always tells draw to emit the vertices as-is. The two vertex layouts it calculates are a bit confusing, one which is just used to tell draw to emit vertices as-is, and the other which has draw written all over it but draw is completely unaware of and is used only to look up the correct interpolation info later in setup. Thus, the slots used are different to what llvmpipe does (I'm going to clean up the confusing two layout stuff). Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-07 01:56:43 +01:00
Roland Scheidegger	01761a38e8	llvmpipe: scratch some special handling of vp_index/layer It was actually slightly buggy (missing initialization / setup not dependent on new vs albeit I didn't see issues), but the case of non-existing attributes is now handled by draw emit code so don't need that anymore. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-07 01:55:45 +01:00
Roland Scheidegger	afa035031f	draw: rework handling of non-existing outputs in emit code Previously the code would just redirect requests for attributes which don't exist to use output 0. Rework this to output all zeros instead which seems more useful - in particular some extensions like ARB_fragment_layer_viewport require 0 in the fs even if it wasn't output by previous stages. That way, drivers don't have to special case this depending if the vs/gs outputs some attribute or not. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-07 01:52:39 +01:00
Sinclair Yeh	0819287f56	svga: Rename SVGA_HINT_FLAG_DRAW_EMITTED Rename SVGA_HINT_FLAG_DRAW_EMITTED to SVGA_HINT_FLAG_CAN_PRE_FLUSH because preemptive flush can be unblocked by more commands than draw. Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 16:04:45 -07:00
Sinclair Yeh	9ccc716534	svga: allow preemptive flushing on DMA, update, and readback commands The existing code effectively turns off preemptive flushing for all but the regions used for draws. This turns out to be overly restrictive as some memory regions, e.g. GMR, may never get a draw when used as a DMA upload staging area, causing problems for apps that upload a large amount of textures, e.g. Unigine Heaven. This patch fixes the Unigine Heaven memory allocation error and has been verified to not cause a regression in the previous extended retina display issue. Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 16:03:33 -07:00
Charmaine Lee	b074a5b02d	svga: skip vertex attribute instruction with zero usage_mask In emit_input_declarations(), we are skipping declarations for those registers that are not being used. But in emit_vertex_attrib_instructions(), we are still emitting instructions to tweak the vertex attributes even if they are not being used. This causes an assert in the backend because an input register is not declared in the shader. This patch fixes the problem by skipping the instruction if the vertex attribute is not being used. The changes in this patch originate from the code snippet Jose suggested in bug 1530161. Tested with piglit, Heaven, Turbine, glretrace. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 16:01:38 -07:00
Krzysztof Sobiecki	0d7477a289	gallium/r600: Replace ALIGN_DIVUP with DIV_ROUND_UP ALIGN_DIVUP is a driver-specific (r600g) macro that duplicates DIV_ROUND_UP functionality. Replacing it with DIV_ROUND_UP eliminates this problem. Signed-off-by: Krzysztof A. Sobiecki <sobkas@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-01-06 16:09:12 -05:00
Eric Anholt	bbd29f1375	vc4: Fix driver build from last minute rebase fix. I had the driver all tested for the last series, and in my last build I noticed that get_swizzled_channel was unused now, and removed it... apparently without testing to find that I removed the wrong channel swizzle function.	2016-01-06 12:49:45 -08:00
Eric Anholt	25aa436e86	vc4: Optimize out a comparison for bcsel based on an ALU comparison We routinely have code like: vec1 ssa_220 = fge ssa_104, ssa_61 vec1 ssa_199 = bcsel ssa_220, ssa_106, ssa_105 and we would compare fge's args and choose between ~0 and 0 to generate ssa_220, then compare ssa_220 to 0 and choose between bcsel's args. Instead, try to notice the pattern and compare between fge's args to select between bcsel's args. total instructions in shared programs: 88019 -> 87574 (-0.51%) instructions in affected programs: 9985 -> 9540 (-4.46%) total estimated cycles in shared programs: 245752 -> 245237 (-0.21%) estimated cycles in affected programs: 17232 -> 16717 (-2.99%)	2016-01-06 12:43:09 -08:00
Eric Anholt	7a9eb76786	vc4: Add missing sRGB decode to texel fetches. We only see txf on MSAA textures, currently, and apparently this didn't impact any of our piglit tests.	2016-01-06 12:43:09 -08:00
Eric Anholt	f01ca9eeda	vc4: Add support for GL_ARB_texture_swizzle. We already had the code supporting it, since it's needed for the depth mode when doing shadow comparisons.	2016-01-06 12:43:09 -08:00
Eric Anholt	12519a972f	vc4: Use NIR texture lowering for texture swizzling. We can't use its other features currently (mostly because we don't want Newton-Raphson on rcps for texture coordinates), but it gets us started. This eliminates some comparisons with constants in GLB2.7 and ETQW traces at the QIR level by moving the comparisons into NIR, where they get constant-folded out. instructions in affected programs: 165 -> 156 (-5.45%) total uniforms in shared programs: 32087 -> 32085 (-0.01%) total estimated cycles in shared programs: 245762 -> 245752 (-0.00%) estimated cycles in affected programs: 461 -> 451 (-2.17%)	2016-01-06 12:43:08 -08:00
Eric Anholt	71db7d3dc5	vc4: Replace the SSA-style SEL operators with conditional MOVs. I'm moving away from QIR being SSA (since NIR is doing lots of SSA optimization for us now) and instead having QIR just be QPU operations with virtual registers. By making our SELs be composed of two MOVs, we could potentially coalesce the registers for the MOV's src and dst and eliminate the MOV. total instructions in shared programs: 88448 -> 88028 (-0.47%) instructions in affected programs: 39845 -> 39425 (-1.05%) total estimated cycles in shared programs: 246306 -> 245762 (-0.22%) estimated cycles in affected programs: 162887 -> 162343 (-0.33%)	2016-01-06 12:39:51 -08:00
Eric Anholt	0a89f307f9	vc4: Don't try the SF coalescing unless it's on a def. If you want the SF of the value of a register produced from a series of packing MOVs or conditional MOVs, we can't just SF on the last MOV into the register.	2016-01-06 12:39:27 -08:00
Edward O'Callaghan	1953cee6d7	gallium/drivers/svga: Use unsigned for loop index Fix a 's/unsigned int/unsigned/' consistency case while here. Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 08:04:03 -07:00
Edward O'Callaghan	8e2a8ec731	gallium/drivers/r600: Use unsigned for loop index Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 08:04:03 -07:00
Edward O'Callaghan	76a7d6f412	gallium/drivers/ilo: Use unsigned for loop index Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 08:04:03 -07:00
Edward O'Callaghan	5071c192cc	gallium: Use unsigned for loop index Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 08:04:03 -07:00
Edward O'Callaghan	bfabd5e74a	gallium/drivers: Remove unnecessary semicolons Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 08:04:03 -07:00
Edward O'Callaghan	67d4b4b28c	gallium: Remove unnecessary semicolons Fix silly issue with MSVC case fall-through support needing an extra 'break;' Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-01-06 08:04:03 -07:00
Oded Gabbay	9d59b9d00c	llvmpipe: Optimize lp_rast_triangle_32_3_16 for POWER8 This patch converts the SSE-optimized lp_rast_triangle_32_3_16() to VMX/VSX. I measured the results on a POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM. FPS/Score Name Before After Delta ------------------------------------------------ openarena 16.35 16.7 2.14% xonotic 4.707 4.97 5.57% glmark2 didn't show a significant (more than 1%) difference. v2: Make sure code is built only on a POWER8 LE machine Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-06 14:54:16 +02:00
Oded Gabbay	925c46cfc4	llvmpipe: Optimize BUILD_MASK(_LINEAR) for POWER8 This patch converts the SSE-optimized build_mask_32() and build_mask_linear_32() to VMX/VSX. I measured the results on a POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM. FPS/Score Name Before After Delta ------------------------------------------------ glmark2 (score) 139.8 142.7 2.07% openarena and xonotic didn't show a significant (more than 1%) difference. v2: Make sure code is built only on a POWER8 LE machine Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-06 14:54:16 +02:00
Oded Gabbay	3bbe16ea79	llvmpipe: Optimize do_triangle_ccw for POWER8 This patch converts the SSE optimization done in do_triangle_ccw to VMX/VSX. I measured the results on a POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM. FPS/Score Name Before After Delta ------------------------------------------------ glmark2 (score) 136.6 139.8 2.34% openarena 16.14 16.35 1.30% xonotic 4.655 4.707 1.11% v2: - Convert loads to use aligned loads - Make sure code is built only on a POWER8 LE machine Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-06 14:54:16 +02:00
Oded Gabbay	e99555ef0b	llvmpipe: add POWER8 portability file - u_pwr8.h This file provides a portability layer that will make it easier to convert SSE-based functions to VMX/VSX-based functions. All the functions implemented in this file are prefixed using "vec_". Therefore, when converting from an SSE-based function, one needs to simply replace the "_mm_" prefix of the SSE function being called with "vec_". Having said that, not all functions could be converted as such, due to the differences between the architectures. So, when such a conversion hurt performance, I preferred to implement a more ad-hoc solution. For example, converting the _mm_shuffle_epi32 needed to be done using ad-hoc masks instead of a generic function. All the functions in this file support both little-endian and big-endian but currently the file is built only on a POWER8 LE machine. All of the functions are implemented using the Altivec/VMX intrinsics, except one where I needed to use inline assembly (due to a missing intrinsic). v2: - Use vec_vgbbd instead of __builtin_vec_vgbbd - Add an aligned load function - Don't use typeof() - Make file build only on POWER8 LE machine Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-06 14:54:16 +02:00
Brian Paul	f4caa7d2fc	draw: minor indentation fix	2016-01-05 13:03:05 -07:00
Brian Paul	95d412181d	util: add debug_dump_ubyte_rgba_bmp() Like debug_dump_float_rgba_bmp() but takes ubyte values. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2016-01-05 13:03:04 -07:00
Brian Paul	eec8d7e7e0	svga: fix test for SVGA_NEW_STIPPLE We only want to set the SVGA_NEW_STIPPLE dirty flag when the polygon stipple state changes. Before, we only set the flag when we were enabling stipple, but not disabling. We don't really have to add SVGA_NEW_STIPPLE to the dirty FS state set since it's a subset of SVGA_NEW_RAST, but let's be explicit. This doesn't fix any known bugs. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2016-01-05 13:03:04 -07:00
Brian Paul	993b04ee2c	svga: add some comments in svga_state_vs.c Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2016-01-05 13:03:04 -07:00
Brian Paul	fc07658895	svga: change svga_hw_view_state::dirty to boolean Since it's a true/false value. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2016-01-05 13:03:04 -07:00
Brian Paul	077aa3be93	svga: avoid emitting redundant SetVertexBuffers() commands Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2016-01-05 13:03:04 -07:00
Brian Paul	b11bd20889	svga: check for no-ops in svga_bind_sampler_states() and svga_set_sampler_views(). If there's no change, return early and don't set a SVGA_NEW_x dirty state flag. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2016-01-05 13:03:04 -07:00
Julien Isorce	777d1453f1	build: enable st/va with nouveau driver vainfo fails in vaDriverInit because "dd_create_screen" does not reach the strcmp(driver_name, "nouveau") code. Indeed, when compiling the va target.c, the macro GALLIUM_NOUVEAU is not defined. This patch defines the macro the same way it is done for the dri and vdpau targets. Tested with: ./autogen.sh --enable-glx --enable-gles2 --enable-egl --enable-vdpau --enable-glx-tls=yes --enable-va --with-gallium-drivers=swrast,nouveau --with-dri-drivers=swrast,nouveau --with-egl-platforms=x11 LIBVA_DRIVER_NAME=gallium vainfo Output: vainfo: Driver version: mesa gallium vaapi vainfo: Supported profile and entrypoints VAProfileMPEG2Simple : VAEntrypointVLD VAProfileMPEG2Main : VAEntrypointVLD VAProfileMPEG4Simple : VAEntrypointVLD VAProfileMPEG4AdvancedSimple : VAEntrypointVLD VAProfileVC1Simple : VAEntrypointVLD VAProfileVC1Main : VAEntrypointVLD VAProfileVC1Advanced : VAEntrypointVLD VAProfileH264Baseline : VAEntrypointVLD VAProfileH264Main : VAEntrypointVLD VAProfileH264High : VAEntrypointVLD VAProfileNone : VAEntrypointVideoProc Signed-off-by: Julien Isorce <j.isorce@samsung.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-05 12:07:53 -05:00
Julien Isorce	abb30b9c8b	nvc0: add support for st/va - split nvc0_decoder_bsp in begin/next/end - preserve content buffer when calling nvc0_decoder_bsp_next - implement pipe_video_codec::begin_frame/end_frame https://bugs.freedesktop.org/show_bug.cgi?id=89969 Signed-off-by: Julien Isorce <j.isorce@samsung.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-05 12:07:53 -05:00

25705 commits