fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-31 07:00:27 +01:00

Author	SHA1	Message	Date
Francisco Jerez	5759eb458b	i965: Factor out isl_surf_dim/isl_dim_layout calculation into functions. The logic to calculate the right layout and dimensionality for a given GL texture target is going to be useful elsewhere, factor it out from intel_miptree_get_isl_surf(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:36:07 -07:00
Francisco Jerez	99fb167839	i965: Resolve color for non-coherent FB fetch at UpdateState time. This is required because the sampler unit used to fetch from the framebuffer is unable to interpret non-color-compressed fast-cleared single-sample texture data. Roughly the same limitation applies for surfaces bound to texture or image units, but unlike texture sampling, non-coherent framebuffer fetch is by definition non-coherent with previous rendering, so the brw_render_cache_set_check_flush() call can be omitted except after resolve. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:36:07 -07:00
Francisco Jerez	071665c161	i965: Return whether the miptree was resolved from intel_miptree_resolve_color(). This will allow optimizing out the cache flush in some cases when resolving wasn't necessary. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:36:07 -07:00
Francisco Jerez	f24e393bd5	i965/fs: Translate nir_intrinsic_load_output on a fragment output. This gets the non-coherent framebuffer fetch path hooked up to the NIR front-end. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:36:07 -07:00
Francisco Jerez	b00a236d6a	i965/fs: Allocate fragment output temporaries on demand. This gets rid of the duplication of logic between nir_setup_outputs() and get_frag_output() by allocating fragment output temporaries lazily whenever get_frag_output() is called. This makes nir_setup_outputs() a no-op for the fragment shader stage. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:36:06 -07:00
Francisco Jerez	7dac882073	i965/fs: Rework representation of fragment output locations in NIR. The problem with the current approach is that driver output locations are represented as a linear offset within the nir_outputs array, which makes it rather difficult for the back-end to figure out what color output and index some nir_intrinsic_load/store_output was meant for, because the offset of a given output within the nir_output array is dependent on the type and size of all previously allocated outputs. Instead this defines the driver location of an output to be the pair formed by its GLSL-assigned location and index (I've borrowed the bitfield macros from brw_defines.h in order to represent the pair of integers as a single scalar value that can be assigned to nir_variable_data::driver_location). nir_assign_var_locations is no longer useful for fragment outputs. Because fragment outputs are now allocated independently rather than within the nir_outputs array, the get_frag_output() helper becomes necessary in order to obtain the right temporary register for a given location-index pair. The type_size helper passed to nir_lower_io is now type_size_dvec4 rather than type_size_vec4_times_4 so that output array offsets are provided in terms of whole array elements rather than in terms of scalar components (dvec4 is the largest vector type supported by the GLSL so this will cause all individual fragment outputs to have a size of one regardless of the type). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:36:06 -07:00
Francisco Jerez	4e990b67ce	i965: Fix undefined signed overflow in INTEL_MASK for bitfields of 31 bits. Most likely we had only ever used this macro on bitfields of less than 31 bits -- That's going to change shortly. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:36:06 -07:00
Francisco Jerez	f3cb2c34f2	i965/fs: Special-case nir_intrinsic_store_output for the fragment shader. I'm about to change how fragment shader output locations are represented, so the generic nir_intrinsic_store_output implementation that assumes that outputs are just contiguous elements in the big nir_outputs array won't work anymore. This somewhat simplified implementation of nir_intrinsic_store_output for fragment shaders should be functionally equivalent to the current fall-back one. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:36:06 -07:00
Francisco Jerez	af0cc743e6	i965/fs: Implement non-coherent framebuffer fetch using the sampler unit. v2: Memoize sample ID, misc codestyle changes. (Ken) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:36:06 -07:00
Francisco Jerez	fe6abb5755	i965/fs: Emit interpolation setup if non-coherent framebuffer fetch is in use. This will be required for the next commit since the non-coherent path makes use of the fragment coordinates implicitly, so they need to be calculated. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:36:06 -07:00
Francisco Jerez	98d61ee083	i965/fs: Force per-sample dispatch if the shader reads from a multisample FBO. The result of a framebuffer fetch from a multisample FBO is inherently per-sample, so the spec requires at least those sections of the shader that depend on the framebuffer fetch result to be executed once per sample. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:36:06 -07:00
Francisco Jerez	08705badfe	i965: Allocate space in the binding table for non-coherent FB fetch. Unfortunately due to the inconsistent meaning of some surface state structure fields, we cannot re-use the same binding table entries for sampling from and rendering into the same set of render buffers, so we need to allocate a separate binding table block specifically for render target reads if the non-coherent path is in use. The slight noise is due to the change of brw_assign_common_binding_table_offsets to return the next available binding table index rather than void. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:36:06 -07:00
Francisco Jerez	40b23ad57e	i965/fs: Add brw_wm_prog_key bit specifying whether FB reads should be coherent. Some of the following changes in this series are specific to the non-coherent path, so I need some way to tell whether the coherent or non-coherent path is in use. The flag defaults to the value of the gl_extensions::MESA_shader_framebuffer_fetch enable so that it can be overridden easily on hardware that supports both framebuffer fetch extensions in order to test the non-coherent path, like: MESA_EXTENSION_OVERRIDE=-GL_EXT_shader_framebuffer_fetch (Of course trying to force-enable the coherent framebuffer fetch extension on hardware without native support won't work and will lead to assertion failures). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:36:06 -07:00
Francisco Jerez	4a87e4ade7	i965/fs: Get rid of fs_visitor::do_dual_src. This boolean flag was being used for two different things: - To set the brw_wm_prog_data::dual_src_blend flag. Instead we can just set it based on whether the dual_src_output register is valid, which will be the case if the shader writes the secondary blending color. - To decide whether to call emit_single_fb_write() once, or in a loop that would iterate only once, which seems pretty useless. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:36:00 -07:00
Francisco Jerez	aee3d8f0d9	nir: Handle FB fetch outputs correctly in nir_lower_io_to_temporaries. This requires emitting a series of copies at the top of the program from each output variable to the corresponding temporary. The initial copy can be skipped for non-framebuffer fetch outputs whose initial value is undefined, and the final copy needs to be skipped for read-only outputs (i.e. gl_LastFragData), since it would be illegal to emit a store output intrinsic for it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:33:29 -07:00
Francisco Jerez	97ac3eba58	nir: Pass through fb_fetch_output and OutputsRead from GLSL IR. The NIR representation of framebuffer fetch is the same as the GLSL IR's until interface variables are lowered away, at which point it will be translated to load output intrinsics. The GLSL-to-NIR pass just needs to copy the bits over to the NIR program. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:33:29 -07:00
Eric Anholt	00c72acba5	vc4: Add support for fddx/fddy Based vaguely on a patch by jonasarrow on github.	2016-08-25 17:24:11 -07:00
Eric Anholt	e763e19808	vc4: Add register allocation support for MUL output rotation. We need the source to be in r0-r3, so make a new register class for it. It will be up to the surrounding passes to make sure that the r0-r3 allocation of its source won't conflict with any other class requirements on that temp.	2016-08-25 17:24:11 -07:00
Eric Anholt	8ce6526178	vc4: Add support for MUL output rotation. Extracted from a patch by jonasarrow on github.	2016-08-25 17:24:11 -07:00
Eric Anholt	074f1f3c0c	vc4: Add support for the 2-bit LOAD_IMM variants. Extracted and fixed up from a patch by jonasarrow on github. This ended up not getting used for ddx/ddy, but seems like it might still be useful.	2016-08-25 17:24:11 -07:00
Eric Anholt	3da4e38f48	vc4: Add QPU scheduling to handle MUL rotate sources. We need MUL rotates to do ddx/ddy support.	2016-08-25 17:24:11 -07:00
Eric Anholt	b0b99a7952	vc4: Add disassembly for constant MUL rotates	2016-08-25 17:24:11 -07:00
Eric Anholt	b160708e03	vc4: Add real validation for MUL rotation. Caught problems in the upcoming DDX/DDY implementation.	2016-08-25 17:24:11 -07:00
Eric Anholt	31da39ddc9	vc4: Add a QIR value for the QPU element register. This will be used in the ddx/ddy support for "Am I the top half?" or "Am I the left half?" checks.	2016-08-25 17:24:11 -07:00
Chad Versace	5b03975889	i965: Respect miptree offsets in intel_readpixels_tiled_memcpy() Respect intel_miptree_slice::x_offset,y_offset and intel_mipmap_tree::offset. All three may be non-zero when glReadPixels is called on an EGLImage created from the non-base slice of a miptree. Patch 2/2 that fixes test 'dEQP-EGL.functional.image.create.gles2_cubemap_*'. Reported-by: Haixia Shi <hshi@chromium.org> Diagnosed-by: Haixia Shi <hshi@chromium.org> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Change-Id: I4b397b27e55a743a7094d29fb0a6a4b6b34352b0	2016-08-25 16:52:00 -07:00
Chad Versace	c82f99e883	i965: Fix miptree layout for EGLImage-based renderbuffers When glEGLImageTargetRenderbufferStorageOES() was given an EGLImage created from the non-base slice of a miptree, intel_image_target_renderbuffer_storage() forgot to apply the intra-tile offsets __DRIimage::tile_x,tile_y to the miptree layout. This patch fixes the problem with a quick hack suitable for cherry-picking. A proper fix requires more thorough plumbing in intel_miptree_create_layout() and brw_tex_layout(). Patch 1/2 that fixes test 'dEQP-EGL.functional.image.create.gles2_cubemap_*'. Reported-by: Haixia Shi <hshi@chromium.org> Diagnosed-by: Haixia Shi <hshi@chromium.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org Change-Id: I8a64b0048a1ee9e714ebb3f33fffd8334036450b	2016-08-25 16:52:00 -07:00
Jason Ekstrand	bebc1a1d99	intel: Flatten the makefile structure This pulls isl and genxml into a single make file so that they can properly build in parallel. This isn't terribly important now as genxml just generates sources which happens serially first anyway but it will be more important as we add more stuff to src/intel. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-08-25 15:29:48 -07:00
Jason Ekstrand	c19fc5e019	isl/tests: Use a longer path for isl.h The tests assumed that isl would be in the include path but that usually isn't the case. Instead, we usually have src/intel and you need to add an "isl/" prefix. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-08-25 15:29:47 -07:00
Jason Ekstrand	8bdf605214	intel/isl/gen9: Only use the magic 1D alignment for GEN9_1D surfaces If the surface has a layout of GEN4_2D then we need to compute a normal 2D alignment and not use the magic linear 1D alignment. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Chad Versace <chadversary@chromium.org>	2016-08-25 14:11:15 -07:00
Jason Ekstrand	cda1a5dc0e	intel/isl: Pass the dim_layout into choose_alignment_el Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Chad Versace <chadversary@chromium.org>	2016-08-25 14:10:43 -07:00
Jason Ekstrand	f68cfb05fa	intel/isl: Use DIM_LAYOUT_GEN4_2D for tiled 1-D surfaces on SKL The Sky Lake 1D layout is only used if the surface is linear. For tiled surfaces such as depth and stencil the old gen4 2D layout is used. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Chad Versace <chadversary@chromium.org>	2016-08-25 14:09:44 -07:00
Jason Ekstrand	78715c7211	nir/phi_builder: Don't recurse in value_get_block_def In some programs, we can have very deep dominance trees and the recursion can cause us to risk stack overflows. Instead, we replace the recursion with a pair of loops, one at the start and one at the end. This is functionally equivalent to what we had before and it's actually a bit easier to read in the new form without the recursion. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225 Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-08-25 14:08:07 -07:00
Matt Turner	e53130cc27	nir: Walk blocks in source code order in lower_vars_to_ssa. Prior to this commit rename_variables_block() is recursively called, performing a depth-first traversal of the control flow graph. The function uses a non-trivial amount of stack space for local variables, which puts us in danger of smashing the stack, given a sufficiently deep dominance tree. XCOM: Enemy Within contains a shader with such a dominance tree (1574 nir_blocks in total, depth of at least 143). Jason tells me that he believes that any walk over the nir_blocks that respects dominance is sufficient (a DFS might have been necessary prior to the introduction of nir_phi_builder). In fact, the introduction of nir_phi_builder made the problem worse: rename_variables_block() walks to the bottom of the dominance tree before calling nir_phi_builder_value_get_block_def() which walks back to the top of the dominance tree... In any case, this patch ensures we avoid that problem as well. Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225 Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-08-25 13:45:39 -07:00
Marek Olšák	a491b9e945	radeonsi: don't use allocas for arrays with LLVM 3.8 It crashes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97413	2016-08-25 21:19:17 +02:00
Marek Olšák	fe91ae06d3	gallium/radeon: unify and simplify checking for an empty gfx IB We can take advantage of the fact that multi_fence does the obvious thing with NULL fences. This fixes unflushed fences that can get stuck due to empty IBs.	2016-08-25 21:19:17 +02:00
Kenneth Graunke	6cf8708ce5	meta: Always do GenerateMipmaps in linear colorspace. When generating mipmaps for sRGB textures, force both decode and encode, so the filtering is done in linear colorspace, regardless of settings. Fixes a WebGL conformance test in Chrome: https://www.khronos.org/registry/webgl/sdk/tests/conformance2/textures/misc/tex-srgb-mipmap.html?webglVersion=2 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97322 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-08-25 11:07:01 -07:00
Brian Paul	2a2dc416b6	swrast: fix incorrectly positioned putImage() in swrast driver Some front buffer rendering was in the wrong position. This included scissored clears, glDrawPixels and glCopyPixels. The problem was the y coordinate passed to putImage() didn't match the y coordinate passed to getImage(). We fix this by setting xrb->map_y to the inverted coordinate in swrast_map_renderbuffer() which is used later by the putImage() call. Also pass xrb->map_y to getImage() to be symmetric. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97426 Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2016-08-25 07:19:35 -06:00
Marek Olšák	3ff0b67e1b	radeonsi: disable SDMA texture copying on Carrizo Cc: 12.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2016-08-25 14:51:08 +02:00
Marek Olšák	1276316d67	gallium/noop: use 3-space indentation Reviewed-by: Brian Paul <brianp@vmware.com>	2016-08-25 14:09:48 +02:00
Marek Olšák	9daaa6f5a6	gallium: add a pipe_context parameter to resource_get_handle radeonsi needs to do some operations (DCC decompression) for OpenGL-OpenCL interop and this is the only way to make it coherent with the current context. It can optionally be set to NULL. Reviewed-by: Brian Paul <brianp@vmware.com>	2016-08-25 14:09:48 +02:00
Nicolai Hähnle	b662c70aea	st/mesa: fix sRGB BlitFramebuffer regression Broken since: `3190c7ee97` Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97285 Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2016-08-25 13:21:05 +02:00
Michel Dänzer	1e3218bc5b	loader/dri3: Overhaul dri3_update_num_back Always use 3 buffers when flipping. With only 2 buffers, we have to wait for a flip to complete (which takes non-0 time even with asynchronous flips) before we can start working on the next frame. We were previously only using 2 buffers for flipping if the X server supports asynchronous flips, even when we're not using asynchronous flips. This could result in bad performance (the referenced bug report is an extreme case, where the inter-frame stalls were preventing the GPU from reaching its maximum clocks). I couldn't measure any performance boost using 4 buffers with flipping. Performance actually seemed to go down slightly, but that might have been just noise. Without flipping, a single back buffer is enough for swap interval 0, but we need to use 2 back buffers when the swap interval is non-0, otherwise we have to wait for the swap interval to pass before we can start working on the next frame. This condition was previously reversed. Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260 Reviewed-by: Frank Binns <frank.binns@imgtec.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2016-08-25 17:40:24 +09:00
Jason Ekstrand	2301705dee	anv: Include the pipeline layout in the shader hash The pipeline layout affects shader compilation because it is what determines binding table locations as well as whether or not a particular buffer has dynamic offsets. Since this affects the generated shader, it needs to be in the hash. This fixes a bunch of CTS tests now that the CTS is using a pipeline cache. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-08-24 20:42:05 -07:00
Jason Ekstrand	05f36435ef	anv: Add a --disable-vulkan-icd-full-driver-path option This option makes installed Vulkan ICD files contain only a driver library name and not a path. This is intended for distros to help them work around multi-arch issues. Reviewed-by: Dave Airlie <airlied@redhat.com>	2016-08-25 10:32:31 +10:00
Francisco Jerez	c8f5bd2c99	i965/fs: Don't consider the stencil output to be a color output. This would cause gl_FragStencilRef to be counted as a color output incorrectly during the precompile phase, which leads to unnecessary recompilation on master and could trigger an assertion failure in fs_visitor::emit_fb_writes() on my i965-fb-fetch branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-24 13:28:31 -07:00
Francisco Jerez	2018371692	glsl: Keep track of the set of fragment outputs read by a GL program. This is the set of shader outputs whose initial value is provided to the shader by some external means when the shader is executed, rather than computed by the shader itself. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-24 13:28:31 -07:00
Francisco Jerez	711213fb72	glsl: Don't consider read-only fragment outputs to be written to. Since they cannot be written. This prevents adding fragment outputs to the OutputsWritten set that are only read from via the gl_LastFragData array but never written to. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-24 13:28:31 -07:00
Francisco Jerez	913ae618c6	glsl/linker: Allow fragment output overlap for gl_LastFragData. gl_LastFragData overlaps gl_FragData by definition. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-24 13:28:31 -07:00
Francisco Jerez	6b3d23dcc0	glsl/ast: Allow redeclaration of gl_LastFragData with different precision qualifier. v2: No need to check the GLSL version. (Ken) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-24 13:28:31 -07:00
Francisco Jerez	5e1d34394e	glsl: Don't attempt to do dead varying elimination on gl_LastFragData arrays. Apparently this pass can only handle elimination of a single built-in fragment output array, so the presence of gl_LastFragData (which it wouldn't split correctly anyway) could prevent it from splitting the actual gl_FragData array. Just match gl_FragData by name since it's the only built-in it can handle. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-24 13:28:31 -07:00
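
Note on commits 4e990b67ce and 7dac882073 above: the two messages describe packing a fragment output's GLSL location and index into one scalar with bitfield macros, and a signed-overflow hazard in the mask macro for 31-bit fields. The following is a minimal sketch of that idea, not the code from those commits. All names (SKETCH_MASK, OUT_*, pack_frag_output and the accessors) are hypothetical, and the layout (index in bit 0, location in bits 1 to 31) and a 32-bit unsigned int are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not Mesa's actual code. */

/* Mask covering bits low..high (inclusive), for fields narrower than 32 bits.
 * The 1u literal keeps the shift unsigned. With a plain int 1, a 31-bit
 * field would evaluate (1 << 31), which is undefined behavior for a 32-bit
 * signed int. */
#define SKETCH_MASK(high, low) (((1u << ((high) - (low) + 1)) - 1) << (low))

/* Assumed layout: bit 0 holds the index, bits 1..31 hold the location. */
#define OUT_INDEX_SHIFT    0
#define OUT_INDEX_MASK     SKETCH_MASK(0, 0)
#define OUT_LOCATION_SHIFT 1
#define OUT_LOCATION_MASK  SKETCH_MASK(31, 1)

/* Pack a (location, index) pair into one scalar value. */
static inline uint32_t pack_frag_output(uint32_t location, uint32_t index)
{
    /* Reject values that do not fit in their fields. */
    assert(location <= (OUT_LOCATION_MASK >> OUT_LOCATION_SHIFT));
    assert(index <= (OUT_INDEX_MASK >> OUT_INDEX_SHIFT));
    return (location << OUT_LOCATION_SHIFT) | (index << OUT_INDEX_SHIFT);
}

/* Recover the location field from a packed value. */
static inline uint32_t frag_output_location(uint32_t packed)
{
    return (packed & OUT_LOCATION_MASK) >> OUT_LOCATION_SHIFT;
}

/* Recover the index field from a packed value. */
static inline uint32_t frag_output_index(uint32_t packed)
{
    return (packed & OUT_INDEX_MASK) >> OUT_INDEX_SHIFT;
}
```

Under these assumptions, a back-end holding only the packed scalar can recover both halves of the pair directly, without knowing the type or size of any other output.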


80360 commits