fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-02-07 05:50:29 +01:00

Author	SHA1	Message	Date
Sagar Ghuge	933c44bcc4	nir: Add a new lowering option to lower 3D surfaces from txd to txl. Tested on gen9. v2: Rename lower_txd_3d_surafaces flag to lower_txd_3d (Jason Ekstrand) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-18 13:44:09 -08:00
Christian Gmeiner	7ea8e54dd6	meson: add etnaviv to the tools option Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-18 21:50:58 +01:00
Adam Jackson	e36d136102	specs: Bump GLX_MESA_query_renderer to version 9 Note that we have an official GL extension number, pick the appropriate section of the GLX spec to modify, and add changelog. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Adam Jackson <ajax@redhat.com>	2018-12-18 15:46:10 -05:00
Adam Jackson	9e8332ebc2	specs: Remove GLX_RENDERER_ID_MESA from GLX_MESA_query_renderer This has not even had an attempt at implementation. If you asked for renderer 0 - which, the spec implies, should always work - then dri2_convert_glx_attribs would fail, we'd silently fall back to creating an indirect context, and xserver would also not recognize the attribute and would throw BadValue at you. The API would be difficult to use in any case, since there's no way to enumerate how many renderers the screen has. I'd be tempted to add that by defining: glXQueryRendererIntegerMESA(dpy, screen, /* renderer = */ -1, 0, &value); to return the number of renderers, but a new entrypoint might be cleaner. Still, better to not specify it at all than to lie about it. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Adam Jackson <ajax@redhat.com>	2018-12-18 15:46:10 -05:00
Adam Jackson	c63c391756	specs: Remove GLES profile interaction text from GLX_MESA_query_renderer In one place we say, if GLES isn't supported then the profile version will be 0.0. Then later we say, if the GLES profile extension isn't supported then GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not mentioned in the spec. A strict reading of the latter would mean that GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not a recognized token, and the query should instead return False. The implementation does not check for the GLES profile extensions, and the additional complexity doesn't seem worth it. Removing the interaction text makes the spec match the implementation. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Adam Jackson <ajax@redhat.com>	2018-12-18 15:46:10 -05:00
Eduardo Lima Mitev	5820e63418	freedreno/ir3: Make imageStore use num components from image format emit_intrinsic_store_image() is always using 4 components when collecting registers for the value. When image has less than 4 components (e.g, r32f, rg32i, etc) this results in extra mov instructions. This patch uses the actual number of components from the image format. For example, in a shader like: layout (r32f, binding=0) writeonly uniform imageBuffer u_image; ... void main(void) { ... imageStore (u_image, some_offset, vec4(1.0)); ... } instruction count is reduced in at least 3 instructions (note image format is r32f, 1 component only). This obviously reduces register pressure as well. v2: - Added support for image formats from NV_image_format extension (Ilia Mirkin). - Return 4 components by default instead of asserting. (Rob Clark). v3: Added more missing formats (Ilia Mirkin). v4: Added a debug message for unknown image formats (Rob Clark). Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-12-18 21:15:20 +01:00
Jason Ekstrand	5dad1abfdc	nir/dead_write_vars: Get modes directly from derefs Instead of going all the way back to the variable, just look at the deref. The modes are guaranteed to be the same by nir_validate whenever the variable can be found. This fixes clear_unused_for_modes for derefs that don't have an accessible variable. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	fa40a58fd9	nir/copy_prop_vars: Get modes directly from derefs Instead of going all the way back to the variable, just look at the deref. The modes are guaranteed to be the same by nir_validate whenever the variable can be found. This fixes apply_barrier_for_modes for derefs that don't have an accessible variable. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	cf7fb39805	nir/lower_wpos_center: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	867fe35a16	nir/lower_io_to_scalar: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	3fe0363dda	nir/lower_io_arrays_to_elements: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	8cc0f92492	nir/linking_helpers: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	8410cf66d7	nir/propagate_invariant: Skip unknown vars If we can't find the variable from the deref, just assume it isn't invariant and continue on. This can happen if, for instance, we're writing to a deref that points into an SSBO. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Ian Romanick	29e4b949b4	Revert "nir/lower_indirect: Bail early if modes == 0" "There's no point in walking the program if we're never going to actually lower anything." Except we might lower compacted local arrays. In that case, modes will be 0, but there is still lowering to be done. This reverts commit `7f75cf2a94`. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109081 Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Clayton Craft <clayton.a.craft@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org>	2018-12-18 10:47:54 -08:00
Lucas Stach	433ca3127a	st/dri: replace format conversion functions with single mapping table Each time I have to touch the buffer import/export functions in the dri state tracker I get lost in the maze of functions converting between DRI_IMAGE_FOURCC, DRI_IMAGE_FORMAT, DRI_IMAGE_COMPONENTS and pipe format. Rip it out and replace by a single table, which defines the correspondence between the different representations. Also this now stores all the known representations in the __DRIimageRec, to avoid the loss of information we currently have when importing a buffer with a fourcc, which doesn't have a corresponding dri format. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-18 19:19:45 +01:00
Lucas Stach	67174d40f1	st/dri: allow both render and sampler compatible dma-buf formats Currently all the EGL APIs are missing a way to specify how an imported dma-buf is intended to be used. Demanding the format to be both usable for sampling and rendering artificially restricts the list of formats a driver is able to import. Looking at how the Intel driver implements those DRI2 image APIs it doesn't distinguish between render or sampler compatible formats. So this patch aligns behavior between Intel and Gallium based drivers. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-18 19:19:40 +01:00
Lucas Stach	a3e592e839	etnaviv: use surface format directly There is no need to do the detour over the resource behind the surface to get the format. Use the surface format directly. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>	2018-12-18 19:07:10 +01:00
Dylan Baker	7a90886921	meson: Add toggle for glx-direct GNU Hurd needs to turn off glx-direct, rather than special case it, we'll just add a toggle. CC: 18.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-18 09:20:53 -08:00
Dylan Baker	8c77f4c76d	meson: Add support for gnu hurd CC: 18.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-18 09:20:49 -08:00
Dylan Baker	6cf5f25bc5	meson: remove duplicate definition Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-18 09:18:12 -08:00
Dylan Baker	e430a034b9	meson: Fix ppc64 little endian detection Old versions of meson returned ppc64le as the cpu_family for little endian power8 cpus, versions >=0.48 don't do this, so the check wouldn't work in that case. This generalizes the check to work for both old and new versions of meson. Fixes: `34bbb24ce7` ("meson: Add support for ppc assembly/optimizations") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-18 09:17:54 -08:00
Jason Ekstrand	3feda3cf35	anv: Bump the patch version to 96 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-18 09:40:46 -06:00
Kenneth Graunke	3c71ba3baa	i965: Don't override subslice count to 4 on Gen11. Gen9-10 have fewer than 4 subslices per slice, so they need this to be rounded up. Gen11 isn't documented as needing this hack, and it can also have more than 4 subslices, so the hack actually can break things. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2018-12-17 14:03:45 -08:00
Ian Romanick	af07141b33	intel/compiler: More peephole_select for pre-Gen6 No shader-db changes on any Gen6+ platform. All of the shaders with cycles hurt by more than ~2% are from Master of Orion. All of the shaders have instructions helped. It looks like the pass enables some control flow to be converted to bcsels, then the scheduler does dumb things. These are new shaders (just added before doing this shader-db run), so there's probably some low-hanging fruit. Iron Lake total instructions in shared programs: 8214327 -> 8213684 (<.01%) instructions in affected programs: 84469 -> 83826 (-0.76%) helped: 114 HURT: 26 helped stats (abs) min: 2 max: 18 x̄: 7.75 x̃: 9 helped stats (rel) min: 0.17% max: 13.73% x̄: 2.52% x̃: 1.05% HURT stats (abs) min: 2 max: 20 x̄: 9.23 x̃: 8 HURT stats (rel) min: 0.70% max: 2.48% x̄: 1.66% x̃: 1.61% 95% mean confidence interval for instructions value: -5.87 -3.32 95% mean confidence interval for instructions %-change: -2.32% -1.17% Instructions are helped. total cycles in shared programs: 187736850 -> 187749314 (<.01%) cycles in affected programs: 506750 -> 519214 (2.46%) helped: 104 HURT: 36 helped stats (abs) min: 2 max: 72 x̄: 21.96 x̃: 16 helped stats (rel) min: 0.02% max: 6.16% x̄: 0.97% x̃: 0.63% HURT stats (abs) min: 4 max: 1402 x̄: 409.67 x̃: 40 HURT stats (rel) min: 0.33% max: 23.12% x̄: 5.79% x̃: 1.39% 95% mean confidence interval for cycles value: 28.32 149.74 95% mean confidence interval for cycles %-change: -0.07% 1.61% Inconclusive result (%-change mean confidence interval includes 0). GM45 total instructions in shared programs: 5044014 -> 5043652 (<.01%) instructions in affected programs: 46751 -> 46389 (-0.77%) helped: 63 HURT: 13 helped stats (abs) min: 2 max: 29 x̄: 7.65 x̃: 9 helped stats (rel) min: 0.17% max: 13.73% x̄: 2.93% x̃: 1.04% HURT stats (abs) min: 2 max: 20 x̄: 9.23 x̃: 8 HURT stats (rel) min: 0.66% max: 2.35% x̄: 1.58% x̃: 1.52% 95% mean confidence interval for instructions value: -6.54 -2.99 95% mean confidence interval for instructions %-change: -3.04% -1.28% Instructions are helped. total cycles in shared programs: 128143042 -> 128150188 (<.01%) cycles in affected programs: 324564 -> 331710 (2.20%) helped: 57 HURT: 19 helped stats (abs) min: 6 max: 74 x̄: 30.70 x̃: 32 helped stats (rel) min: 0.08% max: 4.74% x̄: 1.22% x̃: 0.81% HURT stats (abs) min: 10 max: 1400 x̄: 468.21 x̃: 60 HURT stats (rel) min: 0.56% max: 19.94% x̄: 5.80% x̃: 1.70% 95% mean confidence interval for cycles value: 6.90 181.15 95% mean confidence interval for cycles %-change: -0.52% 1.59% Inconclusive result (%-change mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	378f996771	nir/opt_peephole_select: Don't peephole_select expensive math instructions On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	8fb8ebfbb0	intel/compiler: More peephole select Shader-db results: The one shader hurt for instructions is a compute shader that had both spills and fills hurt. v2: Fix typo in comment noticed by Caio. v3: Fix inverted condition in brw_nir.c. Noticed by Lionel. Skylake, Broadwell, and Haswell had similar results. (Skylake shown) total instructions in shared programs: 15072761 -> 15047884 (-0.17%) instructions in affected programs: 895539 -> 870662 (-2.78%) helped: 3623 HURT: 1 helped stats (abs) min: 1 max: 181 x̄: 6.89 x̃: 4 helped stats (rel) min: 0.10% max: 25.00% x̄: 3.93% x̃: 3.20% HURT stats (abs) min: 92 max: 92 x̄: 92.00 x̃: 92 HURT stats (rel) min: 1.92% max: 1.92% x̄: 1.92% x̃: 1.92% 95% mean confidence interval for instructions value: -7.10 -6.63 95% mean confidence interval for instructions %-change: -4.03% -3.82% Instructions are helped. total cycles in shared programs: 369738930 -> 369535732 (-0.05%) cycles in affected programs: 68027851 -> 67824653 (-0.30%) helped: 2609 HURT: 1035 helped stats (abs) min: 1 max: 4508 x̄: 181.44 x̃: 77 helped stats (rel) min: <.01% max: 71.31% x̄: 9.14% x̃: 5.47% HURT stats (abs) min: 1 max: 33336 x̄: 261.04 x̃: 20 HURT stats (rel) min: <.01% max: 47.61% x̄: 2.93% x̃: 1.47% 95% mean confidence interval for cycles value: -96.43 -15.09 95% mean confidence interval for cycles %-change: -6.07% -5.36% Cycles are helped. total spills in shared programs: 10158 -> 10159 (<.01%) spills in affected programs: 166 -> 167 (0.60%) helped: 1 HURT: 1 total fills in shared programs: 22105 -> 22116 (0.05%) fills in affected programs: 837 -> 848 (1.31%) helped: 4 HURT: 1 Ivy Bridge total instructions in shared programs: 12021190 -> 11990256 (-0.26%) instructions in affected programs: 910561 -> 879627 (-3.40%) helped: 3344 HURT: 18 helped stats (abs) min: 1 max: 99 x̄: 9.29 x̃: 6 helped stats (rel) min: 0.11% max: 31.18% x̄: 5.19% x̃: 3.31% HURT stats (abs) min: 2 max: 20 x̄: 7.89 x̃: 6 HURT stats (rel) min: 0.70% max: 2.59% x̄: 1.63% x̃: 1.70% 95% mean confidence interval for instructions value: -9.49 -8.91 95% mean confidence interval for instructions %-change: -5.32% -4.98% Instructions are helped. total cycles in shared programs: 179077826 -> 178570196 (-0.28%) cycles in affected programs: 63205667 -> 62698037 (-0.80%) helped: 2767 HURT: 620 helped stats (abs) min: 1 max: 7531 x̄: 217.58 x̃: 88 helped stats (rel) min: <.01% max: 75.86% x̄: 9.59% x̃: 6.09% HURT stats (abs) min: 1 max: 31255 x̄: 152.27 x̃: 11 HURT stats (rel) min: <.01% max: 36.36% x̄: 2.77% x̃: 0.58% 95% mean confidence interval for cycles value: -173.94 -125.81 95% mean confidence interval for cycles %-change: -7.68% -6.97% Cycles are helped. Sandy Bridge total instructions in shared programs: 10852569 -> 10843758 (-0.08%) instructions in affected programs: 235803 -> 226992 (-3.74%) helped: 800 HURT: 0 helped stats (abs) min: 1 max: 88 x̄: 11.01 x̃: 8 helped stats (rel) min: 0.11% max: 23.08% x̄: 4.69% x̃: 3.36% 95% mean confidence interval for instructions value: -11.93 -10.10 95% mean confidence interval for instructions %-change: -4.99% -4.39% Instructions are helped. total cycles in shared programs: 154732047 -> 154608941 (-0.08%) cycles in affected programs: 4063110 -> 3940004 (-3.03%) helped: 606 HURT: 253 helped stats (abs) min: 1 max: 2524 x̄: 227.93 x̃: 62 helped stats (rel) min: 0.02% max: 39.24% x̄: 4.36% x̃: 1.81% HURT stats (abs) min: 1 max: 1966 x̄: 59.36 x̃: 11 HURT stats (rel) min: 0.02% max: 67.10% x̄: 3.22% x̃: 0.67% 95% mean confidence interval for cycles value: -170.49 -116.13 95% mean confidence interval for cycles %-change: -2.61% -1.65% Cycles are helped. No change on Iron Lake or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	09b7e1d8e4	nir/opt_peephole_select: Don't try to remove flow control around indirect loads That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	4cd1a0be76	i965/vec4: Propagate conditional modifiers from more compares to other compares If there is a CMP.NZ that compares a single component (via a .zzzz swizzle, for example) with 0, it can propagate its conditional modifier back to a previous CMP that writes only that component. The specific case that I saw was: cmp.l.f0(8) g42<1>.xF g61<4>.xF (abs)g18<4>.zF ... cmp.nz.f0(8) null<1>D g42<4>.xD 0D In this case we can just delete the second CMP. No changes on Broadwell or Skylake because they do not use the vec4 backend. Also no changes on GM45 or Iron Lake. Sandy Bridge, Ivy Bridge, and Haswell had similar results. (Sandy Bridge shown) total instructions in shared programs: 10856676 -> 10852569 (-0.04%) instructions in affected programs: 228322 -> 224215 (-1.80%) helped: 1331 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 3.09 x̃: 4 helped stats (rel) min: 0.11% max: 6.67% x̄: 1.88% x̃: 1.83% 95% mean confidence interval for instructions value: -3.19 -2.99 95% mean confidence interval for instructions %-change: -1.93% -1.83% Instructions are helped. total cycles in shared programs: 154788865 -> 154732047 (-0.04%) cycles in affected programs: 2485892 -> 2429074 (-2.29%) helped: 1097 HURT: 59 helped stats (abs) min: 2 max: 168 x̄: 51.96 x̃: 64 helped stats (rel) min: 0.12% max: 12.70% x̄: 3.44% x̃: 2.22% HURT stats (abs) min: 2 max: 16 x̄: 3.02 x̃: 2 HURT stats (rel) min: 0.18% max: 0.83% x̄: 0.64% x̃: 0.71% 95% mean confidence interval for cycles value: -51.04 -47.26 95% mean confidence interval for cycles %-change: -3.40% -3.07% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	9a83c3d3b3	i965/fs: Eliminate unary op on operand of compare-with-zero The (-abs(x) >= 0) => (x == 0) optimization is removed from the vec4 and scalar parts. In the VS part, adding the new pattern was not helpful. The pattern that is removed is really old, and it has been handled by NIR for ages. All Gen7+ platforms had similar results. (Broadwell shown) total instructions in shared programs: 14715715 -> 14715709 (<.01%) instructions in affected programs: 474 -> 468 (-1.27%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.12% max: 1.35% x̄: 1.28% x̃: 1.35% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -1.40% -1.15% Instructions are helped. total cycles in shared programs: 559569911 -> 559569809 (<.01%) cycles in affected programs: 5963 -> 5861 (-1.71%) helped: 6 HURT: 0 helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17 helped stats (rel) min: 1.45% max: 1.88% x̄: 1.73% x̃: 1.85% 95% mean confidence interval for cycles value: -18.15 -15.85 95% mean confidence interval for cycles %-change: -1.95% -1.51% Cycles are helped. Iron Lake and Sandy Bridge had similar results. (Iron Lake shown) total instructions in shared programs: 7780915 -> 7780913 (<.01%) instructions in affected programs: 246 -> 244 (-0.81%) helped: 2 HURT: 0 total cycles in shared programs: 177876108 -> 177876106 (<.01%) cycles in affected programs: 3636 -> 3634 (-0.06%) helped: 1 HURT: 0 GM45 total instructions in shared programs: 4799152 -> 4799151 (<.01%) instructions in affected programs: 126 -> 125 (-0.79%) helped: 1 HURT: 0 total cycles in shared programs: 122052654 -> 122052652 (<.01%) cycles in affected programs: 3640 -> 3638 (-0.05%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	440c051340	i965/vec4/dce: Don't narrow the write mask if the flags are used In an instruction sequence like cmp(8).ge.f0.0 vgrf17:D, vgrf2.xxxx:D, vgrf9.xxxx:D (+f0.0) sel(8) vgrf1:UD, vgrf8.xyzw:UD, vgrf1.xyzw:UD The other fields of vgrf17 may be unused, but the CMP still needs to generate the other flag bits. To my surprise, nothing in shader-db or any test suite appears to hit this. However, I have a change to brw_vec4_cmod_propagation that creates cases where this can happen. This fix prevents a couple dozen regressions in that patch. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `5df88c20` ("i965/vec4: Rewrite dead code elimination to use live in/out.")	2018-12-17 13:47:06 -08:00
Ian Romanick	111bcc8d02	i965/vec4: Silence unused parameter warnings in vec4 compiler tests src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual brw::dst_reg* copy_propagation_vec4_visitor::make_reg_for_system_value(int)’: src/intel/compiler/test_vec4_copy_propagation.cpp:57:51: warning: unused parameter ‘location’ [-Wunused-parameter] virtual dst_reg make_reg_for_system_value(int location) ^~~~~~~~ src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual void copy_propagation_vec4_visitor::emit_urb_write_header(int)’: src/intel/compiler/test_vec4_copy_propagation.cpp:77:43: warning: unused parameter ‘mrf’ [-Wunused-parameter] virtual void emit_urb_write_header(int mrf) ^~~ src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual brw::vec4_instruction copy_propagation_vec4_visitor::emit_urb_write_opcode(bool)’: src/intel/compiler/test_vec4_copy_propagation.cpp:82:57: warning: unused parameter ‘complete’ [-Wunused-parameter] virtual vec4_instruction emit_urb_write_opcode(bool complete) ^~~~~~~~ src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual brw::dst_reg register_coalesce_vec4_visitor::make_reg_for_system_value(int)’: src/intel/compiler/test_vec4_register_coalesce.cpp:60:51: warning: unused parameter ‘location’ [-Wunused-parameter] virtual dst_reg make_reg_for_system_value(int location) ^~~~~~~~ src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual void register_coalesce_vec4_visitor::emit_urb_write_header(int)’: src/intel/compiler/test_vec4_register_coalesce.cpp:80:43: warning: unused parameter ‘mrf’ [-Wunused-parameter] virtual void emit_urb_write_header(int mrf) ^~~ src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual brw::vec4_instruction register_coalesce_vec4_visitor::emit_urb_write_opcode(bool)’: src/intel/compiler/test_vec4_register_coalesce.cpp:85:57: warning: unused parameter ‘complete’ [-Wunused-parameter] virtual vec4_instruction emit_urb_write_opcode(bool complete) ^~~~~~~~ src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual brw::dst_reg cmod_propagation_vec4_visitor::make_reg_for_system_value(int)’: src/intel/compiler/test_vec4_cmod_propagation.cpp:60:51: warning: unused parameter ‘location’ [-Wunused-parameter] virtual dst_reg make_reg_for_system_value(int location) ^~~~~~~~ src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual void cmod_propagation_vec4_visitor::emit_urb_write_header(int)’: src/intel/compiler/test_vec4_cmod_propagation.cpp:85:43: warning: unused parameter ‘mrf’ [-Wunused-parameter] virtual void emit_urb_write_header(int mrf) ^~~ src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual brw::vec4_instruction cmod_propagation_vec4_visitor::emit_urb_write_opcode(bool)’: src/intel/compiler/test_vec4_cmod_propagation.cpp:90:57: warning: unused parameter ‘complete’ [-Wunused-parameter] virtual vec4_instruction *emit_urb_write_opcode(bool complete) ^~~~~~~~ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Bas Nieuwenhuizen	f67dea5e19	radv: Fix multiview depth clears We were not using the view mask for depth clears, causing only the first view to be cleared. Fixes: `2e86f6b259` "radv: Add multiview clears." Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-17 20:16:26 +00:00
Bas Nieuwenhuizen	9add63a3a5	radv: Remove redundant format check. The switch directly after the check has a default case that returns NULL too, so the effective return value is not changed. Also this check is wrong once we start dealing with formats introduced by an extension (e.g. YUV formats). Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-17 20:09:38 +00:00
Eric Anholt	708d8f4d0a	nir: Fix clamping of uints for image store lowering. I botched some copy-and-paste and clamped to signed int max instead of uint max. Fixes KHR-GL46.shader_image_load_store.multiple-uniforms on skl. Fixes: `d3e046e76c` ("nir: Pull some of intel's image load/store format conversion to nir_format.h") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-17 20:02:22 +00:00
Eric Anholt	00e2cbc049	v3d: Fix the argument type for vir_BRANCH(). Apparently this has been spewing warnings for Jason's clang, but not my gcc.	2018-12-17 09:52:23 -08:00
Eric Anholt	376054fff3	vc4: Reuse nir_format_convert.h in our blend lowering. These helpers came along after and have effectively the same implementation.	2018-12-17 09:52:23 -08:00
Samuel Pitoiset	445867c80d	radv: report Vulkan version 1.1.90 for real I thought the value was correctly propagated, but actually not. Fixes: `2ac6d55f38` ("radv: bump reported version to 1.1.90") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-17 17:51:48 +01:00
Jason Ekstrand	cae373117c	anv,radv: Re-enable VK_EXT_pci_bus_info Now at version 2 with the fixed header. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 10:42:35 -06:00
Jason Ekstrand	e5b59fe6f5	vulkan: Update the XML and headers to 1.1.96 Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 10:41:56 -06:00
Rhys Perry	ef198e8c6a	radv: switch from nir_bcsel to nir_b32csel Fixes: `191a1dce92` ('nir: Add 1-bit Boolean opcodes') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-17 14:52:39 +00:00
Rhys Perry	bba94a3d85	radv: don't set surf_index for stencil-only images Fixes: `f8d5b377c8` ('radv: set cb base tile swizzles for MRT speedups (v4)') Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108116 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-17 14:52:10 +00:00
Ian Romanick	9dc135efa1	nir: Release per-block metadata in nir_sweep nir_sweep already marks all metadata invalid, so it is safe to release the memory here too. mean soft fp64 using uint64: 1,342,759,331 => 1,010,670,475 gfxbench5 aztec ruins high 11: 63,555,571 => 61,889,811 deus ex mankind divided 148: 62,845,304 => 62,829,640 deus ex mankind divided 2890: 71,922,686 => 71,922,686 dirt showdown 676: 69,238,607 => 69,238,607 dolphin ubershaders 210: 77,822,072 => 77,822,072 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 14:39:56 -08:00
Ian Romanick	7adafd6e1c	nir: Fix holes in nir_instr Found using pahole. Changes in peak memory usage according to Valgrind massif: mean soft fp64 using uint64: 1,343,991,403 => 1,342,759,331 gfxbench5 aztec ruins high 11: 63,619,971 => 63,555,571 deus ex mankind divided 148: 62,887,728 => 62,845,304 deus ex mankind divided 2890: 72,399,750 => 71,922,686 dirt showdown 676: 69,464,023 => 69,238,607 dolphin ubershaders 210: 78,359,728 => 77,822,072 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 14:39:56 -08:00
Ian Romanick	8161a87b24	nir/phi_builder: Use per-value hash table to store [block] -> def mapping Replace the old array in each value with a hash table in each value. Changes in peak memory usage according to Valgrind massif: mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403 gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971 deus ex mankind divided 148: 62,887,728 => 62,887,728 deus ex mankind divided 2890: 72,402,222 => 72,399,750 dirt showdown 676: 74,466,431 => 69,464,023 dolphin ubershaders 210: 109,630,376 => 78,359,728 Run-time change for a full run on shader-db on my Haswell desktop (with -march=native) is 1.22245% +/- 0.463879% (n=11). This is about +2.9 seconds on a 237 second run. The first time I sent this version of this patch out, the run-time data was quite different. I had misconfigured the script that ran the test, and none of the tests from higher GLSL versions were run. These are generally more complex shaders, and they are more affected by this change. The previous version of this patch used a single hash table for the whole phi builder. The mapping was from [value, block] -> def, so a separate allocation was needed for each [value, block] tuple. There was quite a bit of per-allocation overhead (due to ralloc), so the patch was followed by a patch that added the use of the slab allocator. The results of those two patches was not quite as good: mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403 gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971 deus ex mankind divided 148: 62,887,728 => 62,887,728 deus ex mankind divided 2890: 72,402,222 => 72,402,222 * dirt showdown 676: 74,466,431 => 72,443,591 * dolphin ubershaders 210: 109,630,376 => 81,034,320 * The * denote tests that are better now. In the tests that are the same in both patches, the "after" peak memory usage was at a different location. I did not check the local peaks. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 14:39:56 -08:00
Ian Romanick	e3043e1276	util/hash_table: Add _mesa_hash_table_init function Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 14:39:56 -08:00
Jason Ekstrand	db197fdb6c	st/nir: Use nir_src_as_uint for tokens Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-16 15:07:28 -06:00
Jason Ekstrand	47e1e0692c	radv: Fix a stupid if in gather_intrinsic_info Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 15:06:07 -06:00
Jason Ekstrand	6bcd2af086	nir/algebraic: Add some optimizations for D3D-style Booleans D3D Booleans use a 32-bit 0/-1 representation. Because this previously matched NIR exactly, we didn't have to really optimize for it. Now that we have 1-bit Booleans, we need some specific optimizations to chew through the D3D12-style Booleans. Shader-db results on Kaby Lake: total instructions in shared programs: 15136811 -> 14967944 (-1.12%) instructions in affected programs: 2457021 -> 2288154 (-6.87%) helped: 8318 HURT: 10 total cycles in shared programs: 373544524 -> 359701825 (-3.71%) cycles in affected programs: 151029683 -> 137186984 (-9.17%) helped: 7749 HURT: 682 total loops in shared programs: 4431 -> 4399 (-0.72%) loops in affected programs: 32 -> 0 helped: 21 HURT: 0 total spills in shared programs: 10290 -> 10051 (-2.32%) spills in affected programs: 2532 -> 2293 (-9.44%) helped: 18 HURT: 18 total fills in shared programs: 22203 -> 21732 (-2.12%) fills in affected programs: 3319 -> 2848 (-14.19%) helped: 18 HURT: 18 Note that a large chunk of the improvement fixing regressions caused by switching to 1-bit Booleans. Previously, our ability to optimize D3D booleans was improved by using the D3D representation directly in NIR. Now that NIR does 1-bit bools, we need a few more optimizations. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	3b30814791	nir/algebraic: Optimize 1-bit Booleans Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	44227453ec	nir: Switch to using 1-bit Booleans for almost everything This is a squash of a few distinct changes: glsl,spirv: Generate 1-bit Booleans Revert "Use 32-bit opcodes in the NIR producers and optimizations" Revert "nir/builder: Generate 32-bit bool opcodes transparently" nir/builder: Generate 1-bit Booleans in nir_build_imm_bool Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00

... 214 215 216 217 218 ...

117072 commits