fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-02 09:30:11 +01:00

Author	SHA1	Message	Date
Kenneth Graunke	19c652b29c	i965: Use shader_info for brw_vue_prog_data::cull_distance_mask. This also allows us to move it from a GL specific location to a part of the compiler shared by both GL and Vulkan. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-19 12:30:25 -08:00
Kenneth Graunke	c447ca64c1	compiler: Store the clip/cull distance array sizes in shader_info. We switched from a boolean to array lengths in gl_program a while back. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-19 12:30:25 -08:00
Kenneth Graunke	c4be6e0b8d	i965: Fix GS push inputs with enhanced layouts. We weren't taking first_component into account when handling GS push inputs. We hardly ever push GS inputs, so this was not caught by existing tests. When I started using component qualifiers for the gl_ClipDistance arrays, glsl-1.50-transform-feedback-type-and-size started catching this. Cc: "13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-19 12:30:25 -08:00
Kenneth Graunke	45aee6be02	i965: Delete unused variable. I forgot to delete this in `9ef2b9277d`. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>	2016-11-19 12:30:25 -08:00
Kenneth Graunke	9ef2b9277d	intel: Share URB configuration code between GL and Vulkan. This code is far too complicated to cut and paste. v2: Update the newly added genX_gpu_memcpy.c; const a few things. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-11-19 11:40:01 -08:00
Kenneth Graunke	6d416bcd84	i965: Use arrays in Gen7+ URB code. So much of this code was cut and pasted per stage. We can accomplish much of it by looping over shader stages. Improves performance of OglBatch7 (version 6) by 1.50783% +/- 0.287049% (n = 71) at 1024x768 on Cherryview. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-11-19 11:40:00 -08:00
Kenneth Graunke	6656dd4b92	i965: Drop brw->urb.{nr_*_entries,*_start} assignments from gen7_urb.c. The context fields are for Gen4-5; setting them has always been useless. There's no point in spending the cost in the hottest path in the driver. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-11-19 11:40:00 -08:00
Kenneth Graunke	74d8612eed	i965: Switch to roundf in HS/DS URB code. Matt intentionally switched the VS calculation to be float-based in commit `c1da15709a`. Tessellation support was written before this and rebased forward, and missed the change. Now it's consistent. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-11-19 11:39:59 -08:00
Kenneth Graunke	c87b5dee11	i965: Make URB code use prog_data for GS/tessellation enable checks. If geometry/tessellation shaders are disabled, prog_data will be NULL (see brw_state_upload.c). This consolidates dirty bits a little. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-11-19 11:39:58 -08:00
Kenneth Graunke	639af2a7c6	intel: Convert devinfo->urb.min_*_entries into an array. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-11-19 11:39:56 -08:00
Kenneth Graunke	58c09e72b1	intel: Convert devinfo->urb.max_*_entries into an array. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-11-19 11:39:45 -08:00
Brian Paul	2acfd36479	docs: document MESA_DEBUG=context Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2016-11-19 08:44:03 -07:00
Ilia Mirkin	ea276512a0	swr: mark streamout buffers as written Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-19 10:40:37 -05:00
Timothy Arceri	203c8794a1	st/mesa/glsl/nir/i965: make use of new gl_shader_program_data in gl_shader_program Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-11-19 15:45:46 +11:00
Timothy Arceri	65cd0a0d7f	mesa: create new gl_shader_program_data struct This will be used to share data between gl_program and gl_shader_program allowing for greater code simplification as we can remove a number of awkward uses of gl_shader_program. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-11-19 15:45:46 +11:00
Timothy Arceri	0c85d2fea4	glsl: add new program driver function to standalone compiler This fixes a regression with the standalone compiler caused by `9d96d3803a`. Note that we change standalone_compiler_cleanup() to no longer explicitly free the linked shaders as they will be freed when we free the parent ctx whole_program. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98774	2016-11-19 15:00:12 +11:00
Kenneth Graunke	ff0253a5ed	i965: Disable depth writes when depth test is GL_EQUAL. There's no point in performing depth writes when the depth test comparison function is set to GL_EQUAL - it would just write out the same value that's already there (if it is written at all). While this is harmless from a functional perspective, it hurts performance. Obviously, writing to memory is not free, but there's another more subtle impact as well: it can prevent early depth optimizations. Depth writes aren't supposed to happen for pixels that are killed by fragment shader discard statements or the alpha test. So, with depth writes enabled and either of those, the pixel shader must be invoked to determine whether or not to perform the write. This is fairly stupid in the EQUAL case - we're running a shader to decide whether to replace the existing depth value with itself. By disabling these pointless writes, we allow early depth even with discards and alpha testing, allowing the hardware to skip the pixel shader altogether if the depth test fails. Improves performance of Unigine Valley: - Skylake GT2: +17.8% - Broadwell GT3e: +11.5% - Cherrytrail: +19.4% Huge thanks to Mark Janes for building frameretrace [1], the performance analysis tool that helped us find this issue, and to Robert Bragg for providing us performance metrics on Linux. Mark also spent the time to analyze Valley performance on Windows vs. Linux and discovered a discrepancy in early depth test metrics. Once he had isolated a draw call and drawn attention to the problem, fixing it was pretty simple. [1] https://github.com/janesma/apitrace/wiki/frameretrace-branch Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-18 14:48:52 -08:00
Timothy Arceri	adb3a83c09	glsl: tidy up entries temporary Here we just move initialisation of entries to where it is needed i.e. outside the loop and after the continue checks. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-11-19 09:35:58 +11:00
Timothy Arceri	c20564ae3e	glsl/i965: move per stage AtomicBuffers list to gl_program Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-11-19 09:35:58 +11:00
Timothy Arceri	9d96d3803a	glsl: create gl_program at the start of linking rather than the end This will allow us to directly store metadata we want to retain in gl_program. This metadata is currently stored in gl_linked_shader and will be lost if relinking fails, even though the program will remain in use and is still valid according to the spec. "If a program object that is active for any shader stage is re-linked unsuccessfully, the link status will be set to FALSE, but any existing executables and associated state will remain part of the current rendering state until a subsequent call to UseProgram, UseProgramStages, or BindProgramPipeline removes them from use." This change will also help avoid the double handling that happens in _mesa_copy_linked_program_data(). Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-11-19 07:42:33 +11:00
Timothy Arceri	2b8f97d0ff	st/mesa/i965: simplify gl_program references and stop leaking In i965 we were calling _mesa_reference_program() after creating gl_program and then later calling it again with NULL as a param to get the refcount back down to 1. This changes things to not use _mesa_reference_program() at all and just have gl_linked_shader take ownership of gl_program since refcount starts at 1. The st and ir_to_mesa linkers were worse as they were both getting in a state where the refcount would never get to 0 and we would leak the program. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-11-19 07:42:33 +11:00
Nanley Chery	9db5cc829f	anv/cmd_buffer: Enable stencil-only HZ clears The HZ sequence modifies less state than the blorp path and requires less CPU time to generate the necessary packets. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-18 12:12:55 -08:00
Nanley Chery	37c07d64b4	anv/cmd_buffer: Manage Anv state around HZ op emission Move the assignment to a less surprising location. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-18 12:12:50 -08:00
Nanley Chery	6ff4c24fdd	anv/cmd_buffer: Clarify HZ rectangle behavior This behavior differs from what's described in the PRMs and was observed by analyzing CTS test results. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-18 12:12:34 -08:00
Nanley Chery	63318d34ac	mesa/fbobject: Update CubeMapFace when reusing textures Framebuffer attachments can be specified through FramebufferTexture* calls. Upon specifying a depth (or stencil) framebuffer attachment that internally reuses a texture, the cube map face of the new attachment would not be updated (defaulting to TEXTURE_CUBE_MAP_POSITIVE_X). Fix this issue by actually updating the CubeMapFace field. This bug manifested itself in BindFramebuffer calls performed on framebuffers whose stencil attachments internally reused a depth texture. When binding a framebuffer, we walk through the framebuffer's attachments and update each one's corresponding gl_renderbuffer. Since the framebuffer's depth and stencil attachments may share a gl_renderbuffer and the walk visits the stencil attachment after the depth attachment, the uninitialized CubeMapFace forced rendering to TEXTURE_CUBE_MAP_POSITIVE_X. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77662 Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2016-11-18 11:58:19 -08:00
Lionel Landwerlin	9a806d2d15	mesa: add NV_image_formats extension support This extension can be enabled automatically as it is a subset of ARB_shader_image_load_store. v2: Replace helper function by qualifier struct field (Ilia) Enable NV_image_formats using ARB_shader_image_load_store (Ilia) v3: Drop extension field from gl_extensions (Ilia) Release notes (Ilia) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98480 Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-11-18 13:27:28 +00:00
Timothy Arceri	88fe2c308e	mesa: fix old classic drivers to use ralloc for ARB asm programs These changes were missed in `0ad69e6b5`. Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98767	2016-11-18 23:39:40 +11:00
Nicolai Hähnle	da2a51129b	st/mesa: silence warnings in optimized builds Mark variables and static functions that only occur in assert()s as MAYBE_UNUSED. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-11-18 09:49:22 +01:00
Nicolai Hähnle	9882ed85bd	radeonsi: emit sample locations also when nr_samples == 1 Since the state tracker now enables MSAA in the hardware for the case nr_samples == 1 as well, we need to set sample locations correctly for this case. The Polaris override is still needed for the non-MSAA case (when nr_samples == 0). Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-11-18 09:48:46 +01:00
Nicolai Hähnle	70454f5b55	radeonsi: allow sample mask export for single-sample framebuffers This fixes GL45-CTS.sample_variables.mask.*.samples_1.*. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-11-18 09:48:43 +01:00
Nicolai Hähnle	ceac3397fb	st/mesa: remove a redundant call to _mesa_is_multisample_enabled We called it immediately prior, so re-use the previously returned value. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-11-18 09:48:39 +01:00
Nicolai Hähnle	adba706122	mesa/main: consider multisampling enabled when number of samples == 1 There are some differences between how non-multisampled framebuffers (i.e. samples == 0) and multisampled framebuffers with a single sample should be treated. For example, alpha to coverage and writing to gl_SampleMask has an effect with single-sample multisample framebuffers, but not on non-multisample framebuffers. This fixes GL45-CTS.sample_variables.mask.*.samples_1.* at least for Gallium drivers (and possibly others, though at least radeonsi needs an additional fix). Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>	2016-11-18 09:48:14 +01:00
Kenneth Graunke	14af96007f	i965: Delete fs_visitor::nir_setup_single_output_varying prototype. I deleted this function in `59864e8e02`. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>	2016-11-18 00:29:11 -08:00
Tapani Pälli	ec4e71f75e	mesa: fix empty program log length In case we have empty log (""), we should return 0. This fixes Khronos WebGL conformance test 'program-infolog'. From OpenGL ES 3.1 (and OpenGL 4.5 Core) spec: "If pname is INFO_LOG_LENGTH , the length of the info log, including a null terminator, is returned. If there is no info log, zero is returned." v2: apply same fix for get_shaderiv and _mesa_GetProgramPipelineiv (Ian) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97321 Cc: "13.0" <mesa-stable@lists.freedesktop.org>	2016-11-18 07:42:41 +02:00
Roland Scheidegger	5ec3a7333f	draw: finally optimize bool clip mask generation lp_build_any_true_range is just what we need, though it will only produce optimal code with sse41 (ptest + set) - but even without it on 64bit x86 the code is still better (1 unpack, 2 movq + or + set), on 32bit x86 it's going to be roughly the same as before. While here also make it a "real" 8bit boolean - cuts one instruction but more importantly similar to ordinary booleans. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-11-18 01:25:21 +01:00
Roland Scheidegger	b16f06fd05	draw: use vectorized calculations for fetch (v2) Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, albeit that's only really problematic for the stride/index mul, the rest has been pretty much moved outside the shader loop (albeit the mul could actually be optimized away too), where things are still scalar. To eliminate control flow in the main shader loop fetch, provide fake buffers (so index 0 is always valid to fetch). Still uses aos fetch though in the end - mostly because some more code would be needed to handle unaligned fetches in that path, and because for most formats it won't make a difference anyway (we generate some truly horrendous code for things like R16G16_something for instance). Instanced fetch however stays roughly the same as before, except that no longer the same element is fetched multiple times (I've seen a reduction of ~3 times in main shader loop size due to llvm not recognizing it's all the same fetch, since it would have been possible some of the fetches getting replaced with zeros in case vector size exceeds remaining fetch count - the values of such fetches don't matter at all though). Also, for elts gathering, use vectorized code as well. The generated shaders are smaller and faster to compile (not entirely sure about execution speed, but generally unless there's just single vertices to handle I would expect it to be faster - there's more opportunities for future improvements by using soa fetch). v3: skip the fake index buffer, not needed due to the jit code never seeing the real index buffer in the first place. Fix a bug with mask expansion (needs SExt, not ZExt). Also, be really really careful to keep the behavior the same, even in cases where it looks wrong, and add comments why the code is doing the seemingly wrong stuff... Fortunately it's not actually more complex in the end... Also change function order slightly just to make the diff more readable. No piglit change. Passes some internal testing with another api too... Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-11-18 01:25:21 +01:00
Jordan Justen	0cee3fd5c7	i965/gen7: Minify blit size for stencil tree copy Found by the piglit 'fbo-depth-array stencil-clear' test when implementing blorp blit splitting for gen7. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-11-17 14:15:44 -08:00
Kenneth Graunke	9bfee7047b	mesa: Drop PATH_MAX usage. GNU/Hurd does not define PATH_MAX since it doesn't have such arbitrary limitation, so this failed to compile. Apparently glibc does not enforce PATH_MAX restrictions anyway, so it's kind of a hoax: https://www.gnu.org/software/libc/manual/html_node/Limits-for-Files.html MSVC uses a different name (_MAX_PATH) as well, which is annoying. We don't really need it. We can simply asprintf() the filenames. If the filename exceeds an OS path limit, presumably fopen() will fail, and we already check that. (We actually use ralloc_asprintf because Mesa provides that everywhere, and it doesn't look like we've provided an implementation of GNU's asprintf() for all platforms.) Fixes the build on GNU/Hurd. Cc: "13.0" <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98632 Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-11-17 14:14:37 -08:00
Kenneth Graunke	ca76e6b521	i965: Fix compute shader crash. Fixes crashes when starting Deus Ex: Mankind Divided. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2016-11-17 14:14:06 -08:00
Jason Ekstrand	3da7adc755	anv/TODO: Check off render buffer compression There's still a tiny bit of work to do for storage images but it's otherwise pretty much done at this point.	2016-11-17 12:03:24 -08:00
Jason Ekstrand	4e91f158e6	anv: Enable "permanent" compression for immutable format images This commit extends our support of color compression to surfaces without the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT set. These images will never have an image view created with a different format than the one set at image creation time so it's safe to always use compression. We still bail if the image is used as a storage image because that sometimes ends up using a different format. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-11-17 12:03:24 -08:00
Jason Ekstrand	2b5644e94d	intel/blorp: Properly handle color compression in blorp_copy Previously, blorp copy operations were CCS-unaware so you had to perform resolves on the source and destination before performing the copy. This commit makes blorp_copy capable of handling CCS-compressed images without any resolves. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-11-17 12:03:24 -08:00
Jason Ekstrand	89f9c46a74	intel/blorp: Always use UINT formats on SKL+ Many of these UINT formats aren't available prior to Sky Lake so we used UNORM formats. Using UINT formats is a bit nicer because it guarantees we don't run into rounding issues. Also, we will need it in the next commit for handling copies with CCS enabled. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-11-17 12:03:24 -08:00
Jason Ekstrand	c8357b5d34	i965/blorp: Rework resolve handling This commit moves the handling of resolves into blorp_surf_for_miptree(). Instead of each helper doing resolves and checks itself, it simply tells blorp_surf_for_miptree which aux modes are supported by the given blorp operation and blorp_surf_for_miptree will resolve as-needed. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-11-17 12:03:24 -08:00
Jason Ekstrand	edb7f67bd9	anv/image: Add an aux_usage field for "default" aux Initially, the field is set to ISL_AUX_USAGE_NONE so this commit shouldn't bring any functional changes. Setting this field to something else will cause all sampled and storage image views to be created with AUX and blorp will start trying to respect it so set with care.	2016-11-17 12:03:24 -08:00
Jason Ekstrand	338cdc172a	anv: Add initial support for Sky Lake color compression This commit adds basic support for color compression. For the moment, color compression is only enabled within a render pass and a full resolve is done before the render pass finishes. All texturing operations still happen with CCS disabled.	2016-11-17 12:03:24 -08:00
Jason Ekstrand	e2f5880839	anv/pass: Precompute some subpass usage information	2016-11-17 12:03:24 -08:00
Jason Ekstrand	9b9fb6d212	util/vk_alloc: Add a vk_zalloc2 helper Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-11-17 12:03:24 -08:00
Jason Ekstrand	a512565b2b	anv/image: Memset all aux surfaces (not just HiZ) to 0 Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-11-17 12:03:24 -08:00
Jason Ekstrand	c3eb58664e	anv/image: Rename hiz_surface to aux_surface	2016-11-17 12:03:24 -08:00

92185 commits