Commit graph

136275 commits

Author SHA1 Message Date
Rob Clark
9425b1343e gallium/u_threaded: use mesa_log for debug msgs
On Android, this will show up in logcat, rather than being lost into the
ether.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
2021-03-11 04:42:15 +00:00
Rob Clark
f2f72ec3fe gallium/u_threaded: Add helper to assert driver thread
Useful for drivers to add some sanity checks to avoid/detect threading
issues caused by things that might be called (indirectly) from the
frontend thread.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
2021-03-11 04:42:15 +00:00
Rob Clark
d2a920ee6e util: Extract thread-id helpers from u_current
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
2021-03-11 04:42:15 +00:00
Timothy Arceri
1772569449 Revert "glsl: default to compat shaders in compat profile"
This reverts commit 6c8cc9be12.

A spec bug was resolved confirming the original behaviour. Also it
seems the game Foundation no longer depends on the incorrect
behaviour.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9486>
2021-03-11 04:09:49 +00:00
Douglas Anderson
217d6594de gallium/indices: Use "__restrict" to help the compiler
In a perf trace translate_quads_uint2uint_last2last_prdisable() was
showing up as a huge hot spot. Digging through the assembly on arm64
found that the compiler wasn't doing any read caching. Specifically,
the generated code looked roughly like this:

  out[j+0] = in[i+0];
  out[j+1] = in[i+1];
  out[j+2] = in[i+3];
  out[j+3] = in[i+1];
  out[j+4] = in[i+2];
  out[j+5] = in[i+3];

...and the compiler was loading in[i+1] and in[i+3] from memory twice
for no reason (instead of caching them).

If we sprinkle generous amounts of the `__restrict` keyword then the
compiler is able to be much smarter. Not only does it avoid
double-loading but it also generates better instructions. It uses two
LDRD instructions instead of 6 LDR instructions and uses some STRD
too.

In one example test this increased FPS from ~25.7 to ~34.5.
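
As a minimal sketch of the idea (not the generated Mesa code): the
function name translate_quads_sketch and its signature are hypothetical,
and only the index pattern is taken from the listing above. With both
pointers qualified, the compiler may assume stores to out[] never alias
in[], so each input index needs to be loaded only once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a quad-to-triangle index translate function.
 * Each input quad (4 indices) becomes two triangles (6 indices), using the
 * same pattern as the listing in the commit message.
 *
 * `__restrict` is a compiler extension accepted by GCC, Clang and MSVC;
 * standard C99 spells it `restrict`. It promises that `in` and `out` do
 * not overlap, which is the caller's responsibility to guarantee.
 */
void
translate_quads_sketch(const uint32_t *__restrict in,
                       uint32_t *__restrict out,
                       size_t nr_quads)
{
   assert(in != NULL && out != NULL);

   for (size_t q = 0; q < nr_quads; q++) {
      size_t i = q * 4;
      size_t j = q * 6;

      out[j + 0] = in[i + 0];
      out[j + 1] = in[i + 1];
      out[j + 2] = in[i + 3];
      out[j + 3] = in[i + 1]; /* may reuse the value loaded above */
      out[j + 4] = in[i + 2];
      out[j + 5] = in[i + 3]; /* may reuse the value loaded above */
   }
}
```

Whether the compiler actually merges the loads depends on the target and
optimization level; the qualifier only removes the aliasing obstacle.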

Change-Id: I88bf8bd9ac421fe48a7d6961e224425c3ae7beee
Reported-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9485>
2021-03-11 03:14:31 +00:00
Jason Ekstrand
e7e297732e vulkan/alloc: Use char * for pointer arithmetic
MSVC doesn't like arithmetic on void *.
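
For illustration only, a small sketch of the portable pattern. The helper
name ptr_add_bytes is hypothetical and not taken from the Mesa tree.
Arithmetic on void * is a GNU extension that treats the pointee as one
byte wide; casting to char * first expresses the same byte offset in
standard C.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the address `offset` bytes past `base`.
 * `(char *)base + offset` is valid standard C, whereas `base + offset`
 * on a `void *` relies on a compiler extension.
 */
void *
ptr_add_bytes(void *base, size_t offset)
{
   assert(base != NULL);
   return (char *)base + offset;
}
```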

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9511>
2021-03-10 20:59:59 -06:00
Jason Ekstrand
492b5577f0 vulkan/util: Add a type parameter to vk_multialloc_add
We also switch from using __alignof__ to alignof() in util/macros.h
which works on MSVC with the one unfortunate downside of requiring an
actual type and not a value.
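
A rough sketch of why the extra type parameter is needed. Every name
below (ALIGN_UP, EXAMPLE_RESERVE, example_reserve) is hypothetical and
only mimics the shape of a multialloc-style helper. C11 alignof() from
<stdalign.h> takes a type name, so the macro has to be handed the type
explicitly rather than inferring it from a pointer value.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of a (a must be non-zero). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

/*
 * Hypothetical macro: reserve room for `count` objects of `type` in a
 * running byte total. The type is a macro argument because alignof()
 * cannot be applied to an expression in standard C.
 */
#define EXAMPLE_RESERVE(total, type, count) \
   example_reserve((total), sizeof(type) * (count), alignof(type))

/* Returns the aligned offset of the new block and advances *total. */
size_t
example_reserve(size_t *total, size_t size, size_t align)
{
   assert(total != NULL && align != 0);

   size_t offset = ALIGN_UP(*total, align);
   *total = offset + size;
   return offset;
}
```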

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9511>
2021-03-10 20:59:56 -06:00
Jason Ekstrand
c120edd8e8 vulkan/alloc: Add VK_MULTIALLOC_DECL macros
These both declare the variable and add it to the allocator in one go.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9511>
2021-03-10 20:59:55 -06:00
Jason Ekstrand
5afdbfe0c8 vk/alloc: Handle zero sizes better in vk_multialloc_add
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9511>
2021-03-10 20:59:53 -06:00
Jason Ekstrand
c22267262e vulkan: Use ALWAYS_INLINE for multialloc
This way it properly compiles on Visual Studio.

Fixes: 145444d265 "anv: Move multialloc to common code"
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9506>
2021-03-10 23:15:17 +00:00
Anuj Phogat
96e251bde7 intel: Rename "GEN_" prefix used in common code to "INTEL_"
This patch renames all macros with "GEN_" prefix defined in
common code.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
2021-03-10 22:23:51 +00:00
Anuj Phogat
65d7f52098 intel: Fix broken alignment due to gen_ prefix renaming
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
2021-03-10 22:23:51 +00:00
Anuj Phogat
692472a376 intel: Rename "gen_" prefix used in common code to "intel_"
This patch renames functions, structures, enums etc. with "gen_"
prefix defined in common code.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
2021-03-10 22:23:51 +00:00
Anuj Phogat
733b0ee8cb intel: Rename files with gen_ prefix in common code to intel_
Changes in this patch include:
- Rename all files in src/intel/common path
- Update the filenames used in source and build files

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
2021-03-10 22:23:51 +00:00
Jason Ekstrand
b9e9f92f73 intel/fs: Handle payload node interference in destinations
Starting with d0d039a4d3, we emit writes to the push constant chunk
of the payload to stomp out-of-bounds data to zero for Vulkan.  Then, in
369eab9420, we started emitting shader preamble code for emulated
push constants on Gen12.5 parts.  In either of these cases, we can run
into issues if we don't have a proper live range for some of the payload
registers where they get used for something and then smashed by our push
handling code.  We've not seen many issues with this yet because it only
happens when you have dead push constants.

Fixes: d0d039a4d3 "anv: Emit pushed UBO bounds checking code..."
Fixes: 369eab9420 "intel/fs: Emit code for Gen12-HP indirect..."
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9501>
2021-03-10 22:17:41 +00:00
Jason Ekstrand
8b7c2f1800 intel/fs: Use INTEL_MASK for pushish constant address masking
It's easier to compare with the HW docs than a pile of hex.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9501>
2021-03-10 22:17:41 +00:00
Yannik Marek
369f9d225d turnip: fix alpha to coverage in no color and unused attachment cases
In cases where alpha to coverage is enabled but the color attachment is
either unused or absent, there should be a dummy MRT to make the draw
behave correctly.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Yannik Marek <yannik@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8952>
2021-03-10 22:02:43 +00:00
Adam Jackson
ea27f2bf09 zink: Fix a thinko in instance setup
It really does help to size these arrays correctly.

Fixes: 2b4fcf0a06 zink: generate instance creation code with a python script
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9499>
2021-03-10 20:19:00 +00:00
Matt Turner
6ceb6b509e turnip: Remove unused TU_DEBUG_IR3 flag
Replaced by IR3_SHADER_DEBUG=disasm,{vs,...,cs} and unused since the
commit referenced below.

Fixes: 808992fc50 ("tu: Use the ir3 shader API")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8249>
2021-03-10 18:59:22 +00:00
Eric Anholt
eba1b2a1ba ci/freedreno: Mark another a5xx TF flake.
Showed up with an iommu fault preceding it each time it failed.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9488>
2021-03-10 18:44:16 +00:00
Marek Olšák
e39336a21e radeonsi: enable RGP on gfx10.3
It seems to work on VanGogh.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9492>
2021-03-10 18:31:04 +00:00
Jason Ekstrand
5d8fa880d6 radv: Drop CreateRenderPass
We can use the generic fall-back which calls CreateRenderPass2 instead.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
2021-03-10 18:17:31 +00:00
Jason Ekstrand
8304b4eef7 radv/meta: Use CreateRenderPass2
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
2021-03-10 18:17:31 +00:00
Jason Ekstrand
24414e7ec4 anv: Drop CreateRenderPass
Fall back to the common implementation instead.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
2021-03-10 18:17:31 +00:00
Jason Ekstrand
b302159b1c vulkan: Preserve preserve attachments in CreateRenderPass
This is trivial so I really don't know why it wasn't handled in the
initial turnip code.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
2021-03-10 18:17:31 +00:00
Jason Ekstrand
147187f754 vulkan: Add some asserts and checks for multiview in CreateRenderPass
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
2021-03-10 18:17:31 +00:00
Jason Ekstrand
5de355b0f9 vulkan: Use correct aspectMask in CreateRenderPass
If a VkRenderPassInputAttachmentAspectCreateInfo is provided, we use the
aspects specified there.  Otherwise, we default to every aspect in the
format.  For attachments which are not input attachments, aspectMask is
left zero.
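
The selection rule can be sketched roughly as follows. This is a
simplified illustration, not the code from this commit: the enum values
and the function pick_aspect_mask are hypothetical stand-ins, and the
real Vulkan structure lists aspects per subpass and input attachment
index rather than passing a single mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical aspect bits, standing in for VkImageAspectFlagBits. */
enum {
   ASPECT_COLOR   = 1u << 0,
   ASPECT_DEPTH   = 1u << 1,
   ASPECT_STENCIL = 1u << 2,
};

/*
 * Hypothetical helper mirroring the three cases in the commit message:
 *  - not an input attachment: aspectMask stays zero
 *  - aspect info provided:    use the aspects given there
 *  - otherwise:               default to every aspect of the format
 */
uint32_t
pick_aspect_mask(bool is_input_attachment, bool has_aspect_info,
                 uint32_t info_aspects, uint32_t format_aspects)
{
   if (!is_input_attachment)
      return 0;
   if (has_aspect_info)
      return info_aspects;
   return format_aspects;
}
```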

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
2021-03-10 18:17:31 +00:00
Jason Ekstrand
4fb6c051c9 anv: Move vk_format helpers to common code
The Android ones we put in anv_android.c.  Maybe one day we'll want a
vk_android.h to put some common Android stuff but, for now, let's keep
it contained to ANV's android code.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
2021-03-10 18:17:31 +00:00
Jason Ekstrand
c7345bd1fb vulkan: Use VK_MULTIALLOC in CreateRenderPass
The variable-length stack allocations are causing issues with ubsan when
the array size is zero.  Also, a heap allocation is probably safer.
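
To illustrate the concern with a plain malloc-based sketch (this is not
VK_MULTIALLOC, and dup_attachments is a hypothetical name): C requires a
variable-length array size to be greater than zero, so a VLA declared
with a zero count is undefined behaviour. A heap copy can treat the
zero-count case explicitly instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: duplicate `count` values onto the heap.
 * Returns NULL when count is zero (nothing to copy) or when the
 * allocation fails; the caller frees a non-NULL result with free().
 */
uint32_t *
dup_attachments(const uint32_t *src, size_t count)
{
   if (count == 0)
      return NULL; /* no zero-length array is ever created */

   assert(src != NULL);

   uint32_t *copy = malloc(count * sizeof(*copy));
   if (copy != NULL)
      memcpy(copy, src, count * sizeof(*copy));
   return copy;
}
```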

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
2021-03-10 18:17:31 +00:00
Jason Ekstrand
145444d265 anv: Move multialloc to common code
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
2021-03-10 18:17:31 +00:00
Jason Ekstrand
2523c47720 turnip: Move the CreateRenderPass wrapper to common code
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
2021-03-10 18:17:31 +00:00
Marek Olšák
3b7b2df509 ac: remove switch cases for pc_lines for compute-only chips
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9389>
2021-03-10 18:02:28 +00:00
Marek Olšák
975e5e262b ac,radeonsi: use correct VGPR granularity on Aldebaran
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9389>
2021-03-10 18:02:28 +00:00
Marek Olšák
a9da3fc0d1 ac: handle bigger instruction prefetch for Aldebaran
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9389>
2021-03-10 18:02:27 +00:00
Marek Olšák
9fdf69e611 ac/llvm: unpack thread IDs on Aldebaran
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9389>
2021-03-10 18:02:27 +00:00
Marek Olšák
6edf1978d3 ac: set the TCC line size for Aldebaran
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9389>
2021-03-10 18:02:27 +00:00
Marek Olšák
230a6dc55d ac,radeonsi: add sampler changes for Aldebaran
- no 3D and cube textures
- no mipmapping
- no border color
- image_sample is the only supported opcode with a sampler (behaves like _lz)

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9389>
2021-03-10 18:02:27 +00:00
James Zhu
381d3a5a38 amd: add Aldebaran chip enum
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9389>
2021-03-10 18:02:27 +00:00
Danylo Piliaiev
2764cf8d32 ir3: use OPC_GETBUF to get size of sampler buffers
The maximum value which OPC_GETSIZE can return for one dimension
is 0x007ff0, however a sampler buffer can be much bigger.
The blob uses OPC_GETBUF for them.

Fixes tests:
 dEQP-VK.memory.pipeline_barrier.transfer_dst_uniform_texel_buffer.1048576

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9391>
2021-03-10 17:10:45 +00:00
Danylo Piliaiev
8e6ed9948e freedreno/a5xx: port handling of PIPE_BUFFER textures from a6xx
Otherwise, we won't be able to use OPC_GETBUF to get their size.

After this change we could also get rid of the hack for OPC_GETSIZE
which scaled the size for texture buffers.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9391>
2021-03-10 17:10:44 +00:00
Danylo Piliaiev
d968995c67 turnip: fix SP_HS_WAVE_INPUT_SIZE value
It appears that storage for varyings in a wave has an upper
limit of wavesize * max_a831 where max_a831 is 64.
Exceeding the limit seems to force the GPU to reduce the number of
primitives processed per wave; at least the calculations make sense
with such an interpretation.

With the blob, SP_HS_WAVE_INPUT_SIZE never exceeds 64, and setting
it to 65 in freedreno leads to a hang.
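
A minimal sketch of the resulting clamp, assuming only the limit of 64
stated above. The names are hypothetical, and how the requested size is
derived from the shader's varyings is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound reported in the commit message (max_a831). */
#define EXAMPLE_MAX_WAVE_INPUT_SIZE 64u

/*
 * Hypothetical helper: clamp a requested wave input size so the value
 * written to the register never exceeds the limit.
 */
uint32_t
clamp_wave_input_size(uint32_t wanted)
{
   return wanted < EXAMPLE_MAX_WAVE_INPUT_SIZE ? wanted
                                               : EXAMPLE_MAX_WAVE_INPUT_SIZE;
}
```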

Copied from freedreno commit e5499ca2.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8187>
2021-03-10 16:50:11 +00:00
Connor Abbott
7b7532b806 freedreno/computerator: Add branching example
Mainly to be able to test label resolution without having to replace a
shader.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9463>
2021-03-10 16:23:04 +00:00
Connor Abbott
19c7b6f9d6 ir3/parser: Add ability to specify branchstack
This lets you test branching with computerator.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9463>
2021-03-10 16:23:04 +00:00
Connor Abbott
a820eb537c ir3/parser: Support labels
This fixes the assembly for many scenarios where you want to use shader
replacement.

Note: unfortunately this leaks the identifier string created while
lexing, but I couldn't find a way to avoid leaking it except for
bringing in ralloc or something (which would be way more complicated).
The only other place doing something similar in mesa is the glsl parser,
which is using ralloc (actually a linear context).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9463>
2021-03-10 16:23:04 +00:00
Connor Abbott
534658f79b freedreno/computerator: Fix example assembly
Use the new bindless cat6 syntax for a6xx.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9463>
2021-03-10 16:23:04 +00:00
Connor Abbott
cd772d5687 ir3/parser: Fix parsing of "0.0" in @const line
Trying to specify a floating-point value in a @const line would result
in it getting interpreted as a FLUT value and failing parsing. Fix this
by making the various FLUT tokens include the surrounding parentheses.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9463>
2021-03-10 16:23:04 +00:00
Marek Vasut
f7dc0520d9 etnaviv: Fix point sprite Z,W coordinate replacement
Mesa fixed pipeline texture loading on programmable pipeline hardware emits
a generic fragment shader program which contains gl_TexCoord.xyzw as a vec4
and then expects to configure the varying assignments to the shader in the
pipeline command stream, to select what is wired to the XYZW fragment shader
inputs.

This gl_TexCoord.xyzw is turned into texture load with projection (TGSI TXP
opcode, similar for NIR). Texture load with projection does not exist in the
Vivante GPU as a dedicated opcode and is emulated. The shader program first
divides texture coordinates XYZ by projector W and then applies regular TEX
opcode to load the texture (i.e. TEX(gl_TexCoord.xyzw/gl_TexCoord.wwww)).

For point sprites, XY are the point coordinates from VS, Z=0 and W=1, always.
The Vivante GPU can only configure a varying to be one of: point coord X,
point coord Y, used, unused. This covers XYZ, but not W. Z is fine because
unused means 0.

W used to be 0 too before this patch, and that led to division by 0 in the
shader. The only known way to solve this is to set Z=0, W=1 in the shader
program itself if point sprites are enabled. This means we have to generate
a special shader variant which does an extra SET to set W=1 in case point
sprites are enabled.
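
The arithmetic behind the problem can be sketched on the CPU as follows.
This is an illustration under assumptions, not driver or shader code:
struct vec4 and both function names are hypothetical. It shows the
emulated projection step (xyzw divided by w) and the point sprite
coordinate with Z=0, W=1 forced, which keeps the division well defined.

```c
#include <assert.h>

/* Hypothetical 4-component coordinate, standing in for gl_TexCoord. */
struct vec4 {
   float x, y, z, w;
};

/*
 * Emulated projection: divide every component by W before the regular
 * texture load. With W == 0 this divides by zero, which is the failure
 * described in the commit message.
 */
struct vec4
project_texcoord(struct vec4 tc)
{
   struct vec4 r = { tc.x / tc.w, tc.y / tc.w, tc.z / tc.w, tc.w / tc.w };
   return r;
}

/*
 * Point sprite coordinate as the fixed shader variant sees it:
 * XY come from the point coordinate, Z is 0 and W is forced to 1.
 */
struct vec4
point_sprite_texcoord(float px, float py)
{
   struct vec4 r = { px, py, 0.0f, 1.0f };
   return r;
}
```

With W=1 the projection leaves X and Y unchanged, so the texture lookup
uses the point coordinate directly.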

In case of TGSI, emitting the SET.TRUE opcode permits setting W=1 without
allocating additional constants. With NIR, use nir_lower_texcoord_replace()
to lower TEXn to PNTC, which sets Z=0, W=1, and let NIR optimize the shader.
Note that nir_lower_texcoord_replace() must be called before input linking
is set up, as it might add new FS input.

Also note that it should be possible to simply drop PIPE_CAP_POINT_SPRITE
in the long run, ST would then apply the same optimization pass, but that
option is so far misbehaving. And for etnaviv TGSI this is not applicable
yet.

This fixes neverball point sprites (exit cylinder stars) and eglretrace of
gl4es pointsprite test:
https://github.com/ptitSeb/gl4es/blob/master/traces/pointsprite.tgz

Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8618>
2021-03-10 11:48:21 +00:00
Iago Toral Quiroga
8525cb1c53 v3dv: call util_cpu_detect() when initializing the instance
Fixes this assert in debug builds:

in __GI___assert_fail (assertion=0x7ffff731f66b "util_cpu_caps.nr_cpus >= 1", file=0x7ffff731f650 "../src/util/u_cpu_detect.h", line=116,
  function=0x7ffff7323280 <__PRETTY_FUNCTION__.11654> "util_get_cpu_caps") at assert.c:101
in util_get_cpu_caps () at ../src/util/u_cpu_detect.h:116
in _mesa_float_to_float16_rtz (val=0) at ../src/util/half_float.h:93
in util_format_r16g16b16a16_float_pack_rgba_float (dst_row=0x7fffffffbdc0 "", dst_stride=0, src_row=0x7fffffffbf90, src_stride=0, width=1, height=1)
   at src/util/format/u_format_table.c:13459
in util_format_pack_rgba (format=PIPE_FORMAT_R16G16B16A16_FLOAT, dst=0x7fffffffbdc0, src=0x7fffffffbf90, w=1) at ../src/util/format/u_format.h:1525
in util_pack_color (rgba=0x7fffffffbf90, format=PIPE_FORMAT_R16G16B16A16_FLOAT, uc=0x7fffffffbdc0) at ../src/gallium/auxiliary/util/u_pack_color.h:432
in v3dv_get_hw_clear_color (color=0x7fffffffbf90, internal_type=6, internal_size=8, hw_color=0x7fffffffbf10) at ../src/broadcom/vulkan/v3dv_cmd_buffer.c:1241

v2: move call from physical device to instance init.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9408>
2021-03-10 11:44:01 +01:00
Iago Toral Quiroga
c057a1211b broadcom/compiler: disallow ldunif during ldvary sequences if possible
This restores many of the hurt shaders from the previous patch at the
expense of re-adding ldvary tracking in the scheduler.

total instructions in shared programs: 13760415 -> 13755738 (-0.03%)
instructions in affected programs: 1207560 -> 1202883 (-0.39%)
helped: 5080
HURT: 1731
Instructions are helped.

total max-temps in shared programs: 2322991 -> 2322828 (<.01%)
max-temps in affected programs: 5063 -> 4900 (-3.22%)
helped: 229
HURT: 108
Max-temps are helped.

total sfu-stalls in shared programs: 31827 -> 31545 (-0.89%)
sfu-stalls in affected programs: 478 -> 196 (-59.00%)
helped: 304
HURT: 21
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 13792242 -> 13787283 (-0.04%)
inst-and-stalls in affected programs: 1220856 -> 1215897 (-0.41%)
helped: 5162
HURT: 1697
Inst-and-stalls are helped.
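
The percentages in these reports are plain relative changes. As a small
sketch (percent_change is a hypothetical helper, not part of the report
tooling), the figures above can be reproduced like this:

```c
#include <assert.h>

/*
 * Relative change from `before` to `after`, in percent.
 * Negative values mean the count went down. `before` must be non-zero.
 */
double
percent_change(long long before, long long after)
{
   assert(before != 0);
   return 100.0 * (double)(after - before) / (double)before;
}
```

For example, the instruction totals 13760415 -> 13755738 give roughly
-0.03%, and the sfu-stalls in affected programs 478 -> 196 give roughly
-59.00%, matching the figures quoted above.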

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
2021-03-10 07:52:22 +00:00
Iago Toral Quiroga
947e9e42cc broadcom/compiler: simplify ldvary pipelining
We get optimal ldvary pipelining by doing the following:

1) Carefully merge a paired ldvary into the previous instruction when
   possible.
2) When the above succeeds, flag the ldvary as scheduled immediately so
   we can merge one of its children into the current instruction.
3) When scheduling ldvary sequences, only pick up instructions that are
   part of the sequence to avoid picking up something that prevents
   successful pipelining.

This patch skips 3), accepting some hurt shaders in exchange for better
scheduling flexibility during ldvary sequences. Besides eliminating most
of the code dedicated to special handling of ldvary sequences, this also
usually allows us to produce better code by merging instructions that are
unrelated to ldvary sequences into the ldvary sequences, which is
particularly effective to fill up the gaps produced when scheduling the
first and last ldvary sequences as well as the gaps produced by flat
and noperspective varyings sequences that don't have both mul and add
instructions.

Notice that there are some hurt shaders, because sometimes the extra
scheduler flexibility can lead to picking up instructions that will
break a sequence without compensating for that, typically an ldunif
that prevents us from doing the fixup for a follow-up ldvary. We will
try to correct some of these cases with the next patch.

total instructions in shared programs: 13786037 -> 13760415 (-0.19%)
instructions in affected programs: 3201387 -> 3175765 (-0.80%)
helped: 16155
HURT: 4146
Instructions are helped.

total max-temps in shared programs: 2324834 -> 2322991 (-0.08%)
max-temps in affected programs: 22160 -> 20317 (-8.32%)
helped: 1340
HURT: 103
Max-temps are helped.

total sfu-stalls in shared programs: 30685 -> 31827 (3.72%)
sfu-stalls in affected programs: 782 -> 1924 (146.04%)
helped: 253
HURT: 1416
Inconclusive result.

total inst-and-stalls in shared programs: 13816722 -> 13792242 (-0.18%)
inst-and-stalls in affected programs: 3171642 -> 3147162 (-0.77%)
helped: 15331
HURT: 4179
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
2021-03-10 07:52:22 +00:00