fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-07 20:08:06 +02:00

Author	SHA1	Message	Date
Neil Roberts	2dd76ec16e	meta: Support 16x MSAA in the multisample scaled blit shader v2: Fix the x_scale in the shader. Remove the doubts in the commit message. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2015-11-05 10:33:16 +01:00
Neil Roberts	1a22b12fc5	i965/meta: Support 16x MSAA in the meta stencil blit The destination rectangle is now drawn at 4x4 the size and the shader code to calculate the sample number is adjusted accordingly. Acked-by: Ben Widawsky <ben@bwidawsk.net>	2015-11-05 10:33:16 +01:00
Neil Roberts	a680465428	i965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA In order to accommodate 16x MSAA, the starting sample pair index is now 3 bits rather than 2 on SKL+. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2015-11-05 10:33:16 +01:00
Neil Roberts	bf6bd7eaf0	i965: Support allocating the MCS buffer for 16x MSAA When 16 samples are used the MCS buffer needs 64 bits per pixel. Reviewed-by: Ben Widawsky <ben@bwidawsk.net>	2015-11-05 10:33:16 +01:00
Neil Roberts	b4c2e6054f	i965: Support calculating the bits needed to set up 16x MSAA The gen7_surface_msaa_bits function already returns the right values for 16 samples but it just needs its assert to be relaxed. Reviewed-by: Ben Widawsky <ben@bwidawsk.net>	2015-11-05 10:33:16 +01:00
Neil Roberts	1a97cac767	i965/fs: Add a sampler program key for whether the texture is 16x MSAA When 16x MSAA is used for sampling with texelFetch the compiler needs to use a different instruction which passes more arguments for the MCS data. Previously on skl+ it was unconditionally using this new instruction. However since 16x MSAA is probably going to be pretty rare, it is probably worthwhile to avoid using this instruction for the other sample counts. In order to do that this patch adds a new member to brw_sampler_prog_key_data to track when a sampler refers to a buffer with 16 samples. Note that this isn't done for the vec4 backend because it wouldn't change how many registers it uses. Acked-by: Ben Widawsky <ben@bwidawsk.net>	2015-11-05 10:33:16 +01:00
Neil Roberts	4ef27745c8	i965/vec4/skl+: Use ld2dms_w instead of ld2dms In order to support 16x MSAA, skl+ has a wider version of ld2dms that takes two parameters for the MCS data. The MCS data in the response still fits in a single register so we just need to ensure we copy both values rather than just the lower one. Acked-by: Ben Widawsky <ben@bwidawsk.net>	2015-11-05 10:33:16 +01:00
Neil Roberts	e386fb0dee	i965/fs/skl+: Use ld2dms_w instead of ld2dms In order to support 16x MSAA, skl+ has a wider version of ld2dms that takes two parameters for the MCS data. The MCS data retrieved from the ld_mcs instruction already returns 4 or 8 registers and is documented to return zeroes for the mcsh value when the sample count is less than 16. v2: Use get_lowered_simd_width to fall back to SIMD8 instructions when the message length would be too long in SIMD16. Reviewed-by: Ben Widawsky <ben@bwidawsk.net>	2015-11-05 10:33:16 +01:00
Neil Roberts	20250e854e	i965: Program 16x MSAA sample positions. This is the standard pattern used by the other 3D graphics API. BDW has slots for these values, but they aren't actually used until SKL. Even though the documentation for BDW says they must be zero, it doesn't seem to cause any harm to program them anyway. The comment above for the 8x sample positions says that the hardware implements centroid interpolation by picking the centre-most sample that is inside the primitive. That implies that it might be worthwhile to pick a pattern that includes 0.5,0.5. However by experimentation this doesn't seem to actually be the case. With the sample positions in this patch, if I modify the piglit test below so that it instead reports the centroid position, it reports 0.492188,0.421875 which doesn't match any of the positions. If I modify the sample positions so that they include one at exactly 0.5,0.5 it doesn't help and it reports another position which is even further from the center for some reason. arb_gpu_shader5-interpolateAtSample-different Kenneth Graunke experimented with some other patterns that have a higher standard deviation but I think after some discussion it was decided that it would be better to pick the same pattern as the other graphics API in case there are games that rely on this pattern. (Based on a patch by Kenneth Graunke) Cc: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ben Widawsky <ben at bwidawsk.net>	2015-11-05 10:33:15 +01:00
Kenneth Graunke	5048da974e	i965: Handle 16x MSAA in IMS dimension munging code. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Neil Roberts <neil@linux.intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>	2015-11-05 10:33:15 +01:00
Kenneth Graunke	b9f8e729c8	nir: Rename nir_live_variables.c to nir_liveness.c. It doesn't actually operate on variables. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-11-05 00:09:40 -08:00
Kenneth Graunke	5c6f21579d	nir: Rename live_variables to live_ssa_defs. This computes liveness of SSA values, not nir_variables. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-11-05 00:09:40 -08:00
Alejandro Piñeiro	56774e6302	i965/vec4: select predicate based on writemask for sel emissions Equivalent to commit `8ac3b525c` but with sel operations. In this case we select the PredCtrl based on the writemask. This patch helps on cases like this: 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D 3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD In this case, cmod propagation can't optimize instruction #2, because instructions #1 and #2 have different writemasks, and we can't directly update the writemask of instruction #2 because our code thinks that sel at instruction #3 reads all four channels of the flag, when it actually only reads .x. So, with this patch, the previous case becomes this: 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD Now only the x channel of the flag is used, allowing dead code elimination to update the writemask at the second instruction: 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F 2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD So now cmod propagation can simplify out #2: 1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F 2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD Shader-db numbers: total instructions in shared programs: 6235835 -> 6228008 (-0.13%) instructions in affected programs: 219850 -> 212023 (-3.56%) total loops in shared programs: 1979 -> 1979 (0.00%) helped: 1192 HURT: 0	2015-11-05 08:57:23 +01:00
Ilia Mirkin	bb73fc4cb8	nouveau: relax fence emit space assert We also have the "reserved for kick" space available. Some of my earlier changes can probably be removed, but this is a quick fix for some of the rarer fallout. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: <mesa-stable@lists.freedesktop.org>	2015-11-04 22:43:56 -05:00
Eric Anholt	6d3a24bce8	vc4: When the create ioctl fails, free our cache and try again. This greatly increases the pressure you can put on the driver before create fails. Ultimately we need to let the kernel take control of our cached BOs and just take them from us (and other clients) directly, but this is a very easy patch for the moment. Cc: "11.0" <mesa-stable@lists.freedesktop.org>	2015-11-04 14:04:14 -08:00
Eric Anholt	3f7c96c36c	vc4: Print the rounded shader size in debug output. It's surprising to see "0kb" printed for debug on short shaders, while 4kb alignment won't be surprising.	2015-11-04 13:32:07 -08:00
Eric Anholt	4a951f1c08	vc4: Fix dumping the size of BOs allocated/cached. 60MB of cached BOs are a lot less scary than 600MB.	2015-11-04 13:32:07 -08:00
Ilia Mirkin	5bbd522452	mesa/tests: add glBufferStorageEXT to ES 3.1 dispatch list I thought that aliased functions didn't need to be added, but that might only be if the function aliases something in the same {desktop,ES} space. Resolves the dispatch sanity test failure. Fixes: `13b19aa81` (mesa: expose support for GL_EXT_buffer_storage) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92824 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-04 14:28:57 -05:00
Brian Paul	bdf6cef033	vbo: fix another GL_LINE_LOOP bug Very long line loops which spanned 3 or more vertex buffers were not handled correctly and could result in stray lines. The piglit lineloop test draws 10000 vertices by default, and is not long enough to trigger this. Even 'lineloop -count 100000' doesn't trigger the bug. For future reference, the issue can be reproduced by changing Mesa's VBO_VERT_BUFFER_SIZE to 4096 and changing the piglit lineloop test to use glVertex2f(), draw 3 loops instead of 1, and specifying -count 1023. Acked-by: Sinclair Yeh <syeh@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2015-11-04 11:51:59 -07:00
Brian Paul	d31481e70a	svga: implement 'white_fragments' option for VGPU10 fragment shaders When we emulate XOR logicop mode with blend-subtract, we need to ensure that the fragment shader always emits white. We had this implemented for VGPU9, but not VGPU10. VMware bug 1545492. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2015-11-04 11:51:41 -07:00
Brian Paul	149ac1fe43	u_vbuf: minor code reformatting / line wrapping Trivial.	2015-11-04 11:51:41 -07:00
Brian Paul	e450d4371a	u_vbuf: add some const qualifiers Trivial.	2015-11-04 11:51:40 -07:00
Brian Paul	3f98c812b3	svga: use new enum indices_mode type Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2015-11-04 11:51:40 -07:00
Brian Paul	fa6efbd27d	util/indices: replace #define tokens with enum type To ease debugging in gdb. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2015-11-04 11:51:40 -07:00
Alejandro Piñeiro	c3d7caa1e0	i965: check inst->predicate when clearing flag_live at dead code eliminate Detected by Matt Turner while reviewing commit `a59359ecd2` Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-11-04 19:33:56 +01:00
Roland Scheidegger	c19443bc8b	gallivm: fix sampling for s3tc srgb formats when using texture cache This actually stored the values as 8bit linear values in the cache, then did another srgb->linear conversion... We don't want to do the former (decoding 8bit srgb values to 8bit linear completely defeats the purpose of srgb in the first place), so just decode to 8bit srgb. Fixes piglit.spec.ext_texture_srgb.texwrap formats-s3tc tests.	2015-11-04 14:21:43 +01:00
Ben Widawsky	d56a1478a8	i965/meta: Assert fast clears and rep clears never overlap There is nothing wrong with the code today, but as one modifies the code it turns out to be not too difficult to mess up the code, and this easy assertion should catch such driver implementation failures quickly. Cc: Kristian Høgsberg <krh@bitplanet.net> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chad Versace <chad.versace@intel.com> Reviewed-by: Neil Roberts <neil@linux.intel.com>	2015-11-03 21:54:11 -08:00
Ryan Houdek	13b19aa815	mesa: expose support for GL_EXT_buffer_storage This extension requires ES 3.1 since it relies on glMemoryBarrier. For testing purposes I temporarily moved glMemoryBarrier to be an ES 3.0 function. This has been tested with the piglit in the ML and the Dolphin emulator. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-11-04 00:01:03 -05:00
Timothy Arceri	8e4cf900f0	glsl: make sure to only add subroutines to resource list Overlooked in `763cd8c080`. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2015-11-04 15:43:12 +11:00
Timothy Arceri	f6b3c163f9	glsl: remove old TODO SSBO support now exists as of commits f24e5e and `f408a13dd3`. Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Matt Turner <mattst88@gmail.com>	2015-11-04 15:40:38 +11:00
Timothy Arceri	6e3b380387	docs: Mark AoA as done for i965 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-11-04 13:41:16 +11:00
Timothy Arceri	5b75dbd7be	i965: enable ARB_arrays_of_arrays Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2015-11-04 13:39:08 +11:00
Timothy Arceri	fb77da89f5	i965: add support for image AoA V3: clamp array index to the correct size (the size of the current array rather than the inner array) Francisco Jerez. V2: avoid useless zero-initialization and addition for the first AoA level, avoid redundant temporary, make use of type_size_scalar(), rename aoa_size to element_size, assign the indirect indexing temporary directly to image.reladdr, and replace while loop with a for loop. All suggested by Francisco Jerez. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2015-11-04 13:38:32 +11:00
Roland Scheidegger	9285ed98f7	llvmpipe: add cache for compressed textures Compressed textures are very slow because decoding is rather complex (and because there's no jit code to decode them too for non-technical reasons). Thus, add some texture cache which holds a couple of decoded blocks. Right now this handles only s3tc format albeit it could be extended to work with other formats rather trivially as long as the result of decode fits into 32bit per texel (ideally, rgtc actually would decode to more than 8 bits per channel, but even then making it work for it shouldn't be too difficult). This can improve performance noticeably but don't expect wonders (uncompressed is unsurprisingly still faster). It's also possible it might be slower in some cases (using nearest filtering for example or if there's otherwise not many cache hits, the cache is only direct mapped which isn't great). Also, actual decode of a block relies on util code, thus even though always full blocks are decoded it is done texel by texel - this could obviously benefit greatly from simd-optimized code decoding full blocks at once... Note the cache is per (raster) thread, and currently only used for fragment shaders. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-11-04 02:51:02 +01:00
Oded Gabbay	39b4dfe6ab	llvmpipe: use simple coeffs calc for 128bit vectors There are currently two methods in llvmpipe code to calculate coeffs to be used as inputs for the fragment shader. The two methods use slightly different ways to do the floating point calculations and thus produce slightly different results. The decision which method to use is determined by the size of the vector that is used by the platform. For vectors with size of more than 128bit, a single-step method is used, in which coeffs_init_simple() + attribs_update_simple() are called. For vectors with size of 128bit or less, a two-step method is used, in which coeffs_init() + attribs_update() are called. This causes some piglit tests (clip-distance-bulk-copy, interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with 128bit vectors (such as ppc64le or x86-64 without AVX). This patch makes platforms with 128bit vectors use the single-step method (aka "simple" method) instead of the two-step method. This would make the resulting coeffs identical between more platforms, make sure the piglit tests passes, and make debugging and maintainability a bit easier as the generated LLVM IR will be the same for more platforms. 
The performance impact is negligible for x86-64 without AVX, and basically non-existent for ppc64le, as it can be seen from the following benchmarking results: - glxspheres, on ppc64le: - original code: 4.892745317 frames/sec 5.460303857 Mpixels/sec - with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec - Additional 0.8% performance boost - glxspheres, on x86-64 without AVX: - original code: 20.16418809 frames/sec 22.50323395 Mpixels/sec - with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec - Additional 0.74% performance boost - glmark2, on ppc64le: - original code: score of 58 - with my change: score of 57 - glmark2, on x86-64 without AVX: - original code: score of 175 - with the patch: score of 167 - Impact of -4.5% on performance - OpenArena, on ppc64le: - original code: 3398 frames 1719.0 seconds 2.0 fps 255.0/505.9/2773.0/0.0 ms - with the patch: 3398 frames 1690.4 seconds 2.0 fps 241.0/497.5/2563.0/0.2 ms - 29 seconds faster with the patch, which is about 2% - OpenArena, on x86-64 without AVX: - original code: 3398 frames 239.6 seconds 14.2 fps 38.0/70.5/719.0/14.6 ms - with the patch: 3398 frames 244.4 seconds 13.9 fps 38.0/71.9/697.0/14.3 ms - 0.3 fps slower with the patch (about 2%) Additional details can be found at: http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2015-11-04 02:38:53 +01:00
Kenneth Graunke	59bbe2681b	nir: Properly invalidate metadata in nir_opt_remove_phis(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2015-11-03 17:06:48 -08:00
Kenneth Graunke	bc3942e297	nir: Properly invalidate metadata in nir_lower_vec_to_movs(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2015-11-03 17:06:48 -08:00
Kenneth Graunke	0f037bd71f	nir: Properly invalidate metadata in nir_opt_copy_prop(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2015-11-03 17:06:48 -08:00
Kenneth Graunke	4cb7546066	nir: Properly invalidate metadata in nir_remove_dead_variables(). v2: Preserve live_variables too (Jason). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2015-11-03 17:06:48 -08:00
Kenneth Graunke	8bb44510fc	nir: Properly invalidate metadata in nir_split_var_copies(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2015-11-03 17:06:48 -08:00
Kenneth Graunke	aea40091f0	nir: Properly invalidate metadata in nir_lower_global_vars_to_local(). v2: Preserve nir_metadata_live_variables as well (caught by Jason). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2015-11-03 17:06:48 -08:00
Jason Ekstrand	531be601d5	nir: Unexpose _impl versions of copy_prop and dce Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-11-03 17:06:48 -08:00
Jordan Justen	4bc16ad217	mesa: rename UniformBlockStageIndex to InterfaceBlockStageIndex Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: Iago Toral <itoral@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>	2015-11-03 16:44:22 -08:00
Matt Turner	cf3121ed18	i965/vec4: Send from GRF in atomic operations. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-11-03 16:38:36 -08:00
Marek Olšák	3b37155a68	gallium/radeon: allow returning SDMA fences from pipe->flush pipe->flush never returned SDMA fences. This fixes it. This is only an issue on amdgpu where fences can signal out of order. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-11-04 00:43:14 +01:00
Marek Olšák	7f9122c968	gallium/radeon: always return the last SDMA fence on SDMA flush if needed Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-11-04 00:43:14 +01:00
Kenneth Graunke	36fd653817	i965: Add scalar geometry shader support. This is hidden behind INTEL_SCALAR_GS=1 for now, as we don't yet support instanced geometry shaders, and Orbital Explorer's shader spills like crazy. But the infrastructure is in place, and it's largely working. v2: Lots of rebasing. v3: (feedback from Kristian Høgsberg) - Handle stride and subreg_offset correctly for ATTRs; use a helper. - Fix missing emit_shader_time_end() call. - Delete dead code after early EOT in static vertex case to avoid tripping asserts in emit_shader_time_end(). - Use proper D/UD type in intexp2(). - Fix "EndPrimitve" and "to that" typos. - Assert that invocations == 1 so we know this is missing. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-11-03 15:08:49 -08:00
Kenneth Graunke	c9541a74e4	i965: Add scalar GS input lowering code. We really ought to compute the VUE map at link time and stash it, rather than recomputing it here, but with the mess of program structures I wasn't sure where to put it. We can improve that later. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-11-03 15:08:49 -08:00
Kenneth Graunke	4861835d1c	i965: Fix the fs_visitor GS constructor to take shader_time_index. Jason reworked this so it isn't simply ST_GS anymore...it's either -1 (not enabled) or an actual offset. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2015-11-03 15:08:49 -08:00
Ben Widawsky	5d4b019d2a	i965/gen8+: Extract color clear surface state On future generation platforms the color clear value is stored elsewhere in the surface state. By extracting this logic, we can cleanly implement the difference in an upcoming patch. Should have no functional impact. v2: Move hunk from the next patch into this patch (Matt) Whitespace fix (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Neil Roberts <neil@linux.intel.com>	2015-11-03 13:49:21 -08:00
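
Commit 9285ed98f7 in the list above describes a per-thread, direct-mapped cache that holds a few decoded s3tc blocks. The following is only a minimal sketch of what "direct mapped" means in general, not llvmpipe's actual code. The names `BlockCache`, `fake_decode`, the integer block index used as the key, and the slot count of 8 are all hypothetical and chosen purely for illustration.

```python
class BlockCache:
    """Direct-mapped cache: every key maps to exactly one slot."""

    def __init__(self, num_slots, decode_block):
        self.num_slots = num_slots
        self.decode_block = decode_block  # hypothetical decoder callback
        self.tags = [None] * num_slots    # which block each slot holds
        self.data = [None] * num_slots    # the decoded texels per slot
        self.hits = 0
        self.misses = 0

    def lookup(self, block_index):
        # The slot is fully determined by the key, so two blocks that
        # share a slot evict each other (the weakness the commit notes).
        slot = block_index % self.num_slots
        if self.tags[slot] == block_index:
            self.hits += 1
            return self.data[slot]
        self.misses += 1
        decoded = self.decode_block(block_index)
        self.tags[slot] = block_index
        self.data[slot] = decoded
        return decoded


def fake_decode(block_index):
    # Stand-in for a real decoder: one 4x4 block yields 16 texels.
    return [block_index] * 16


cache = BlockCache(8, fake_decode)
for idx in [0, 0, 1, 8, 0]:
    cache.lookup(idx)
print(cache.hits, cache.misses)  # prints "1 4"
```

Blocks 0 and 8 land in the same slot here, so alternating between them misses every time. This is the kind of access pattern under which, as the commit message cautions, such a cache may not help.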

74057 commits
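
The shader-db figures quoted in commit 56774e6302 can be sanity-checked with a plain relative-change calculation. This is a small sketch assuming the percentages are computed as (after - before) / before; the helper name `relative_change_percent` is hypothetical, and shader-db's own report script may format or round differently.

```python
def relative_change_percent(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Instruction counts quoted in the commit message.
total = relative_change_percent(6235835, 6228008)
affected = relative_change_percent(219850, 212023)

print(f"total instructions:    {total:.2f}%")     # -0.13%
print(f"affected programs:     {affected:.2f}%")  # -3.56%
```

Both values agree with the percentages stated in the commit message. The absolute saving is 7827 instructions in each case, since only the affected programs changed.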