Don't add extra movs to construct the swizzles, but just split the
instruction into separate channels, if possible. Idea by Filip Gawin.
shader-db for RV370:
total instructions in shared programs: 84632 -> 83565 (-1.26%)
instructions in affected programs: 12613 -> 11546 (-8.46%)
helped: 295
HURT: 8
total temps in shared programs: 12437 -> 12237 (-1.61%)
temps in affected programs: 1807 -> 1607 (-11.07%)
helped: 153
HURT: 20
LOST: 1
GAINED: 19
The HURT instructions and the single lost shaders are some fluctuations
from pair scheduling. The number of instructions before pair scheduling
is always lower or equivalent.
Partial fix for: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6339
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Tested-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20009>
For any instruction that can be reasonably converted to alpha we check
all of its readers to see if the conversion is possible (including check
for at least one free alpha source) at the beginning of pair scheduling.
However, if the reader instruction has multiples sources that could be
converted to alpha and multiple indeed are, than we could run of of the
alpha sources eventually. So recheck just before converting that there
are still some unused sources left.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Tested-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20009>
Move the streamout code into the streamout-only branch. The code must be
guarded by si_shader_uses_streamout(). Using xfb_stride is not enough.
Fixes: 003cbddfee - radeonsi: use native shader info when init streamout args
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20147>
`d3d12_destroy_screen` is called by `d3d12_create_dxcore_screen` after
`d3d12_init_screen_base` fails and attempts to call `util_dl_close` on
a NULL pointer, leading to an abort.
To fix this, only close the library after if it was actually opened.
Cc: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20145>
There were a couple of issues here:
1. We should be using nir_const_value_for_uint instead of setting the
union fields directly to ensure the rest of the union is zeroed.
2. It was always filling out the first two components of val even if
the incoming constant had 2 64-bit components.
Fixes: 165fb5117b ("r600/sfn: add lowering passes to get 64 bit ops lowered to 32 bit vec2")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19689>
This pass is for lower intrinsics to driver spec nir instructions,
so that each compiler backend don't need to implement their own.
Like radv_nir_lower_abi().
Currently only lower intrinsics in si_llvm_load_intrinsic().
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18010>
Move shader args out of llvm context, so that we can init
it before get nir. This is for creating a nir lower abi pass
which load args directly in nir.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18010>
We are going to init shader args earlier, there is no such
pipe_stream_output_info when that time.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18010>
The hardware only supports aligned loads and stores. That applies to vertex
buffer loads as well. As such, we need to ensure that the base address of vertex
buffers, the stride, and the offset are all aligned to the vertex buffer format,
ensuring that the load itself is aligned. Mesa has a CAP for that,
PIPE_CAP_VERTEX_ATTRIB_ELEMENT_ALIGNED_ONLY, which ensures that these conditions
are met and will rewrite a vertex buffer on the CPU in the off chance that
they're not.
This is a bug fix compared to the old code, because it requires that offsets and
base addresses are aligned (not just the strides like before). It's also an
optimization compared to the old code, because it does not require 4 byte
alignment for 8-bit and 16-bit formats. In fact, it doesn't require any
alignment for 8-bit formats. This will avoid needless CPU work for smaller
formats.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19996>
Now we support all the vertex formats! This means we don't hit u_vbuf for format
translation, which helps performance in lots of applications. By doing the
lowering in NIR, the vertex fetch code itself can be optimized by NIR (e.g.
nir_opt_algebraic) which can improve generated code quality.
In my first implementation of this, I had a big switch statement mapping format
enums to interchange formats and post-processing code. This ends up being really
unwieldly, the combinatorics of bit packing + conversion + swizzles is
enormous and for performance we want to support everything (no u_vbuf
fallbacks). To keep the combinatorics in check, we rely on parsing the
util_format_description to separate out the issues of bit packing, conversion,
and swizzling, allowing us to handle bizarro formats like B10G10R10A2_SNORM with
no special casing.
In an effort to support everything in one shot, this handles all the formats
needed for the extensions EXT_vertex_array_bgra, ARB_vertex_type_2_10_10_10_rev,
and ARB_vertex_type_10f_11f_11f_rev.
Passes dEQP-GLES3.functional.vertex_arrays.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19996>
When NGG VS need to export primitive ID, it will load it in GS
threads, so need to use gs_prim_id arg. Current nir to llvm
translator check vs_prim_id present to use vs_prim_id first.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17109>
LDS will be accessed starting from esgs_ring which has offset 0.
So ngg_scratch and ngg_emit base address is just the offset from
the esgs_ring base.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17109>
According to the deprecation note:
> It's a pure subset of meson.get_external_property, and works strangely
> in host == build configurations, since it would be more accurately
> described as get_host_property.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19904>
Add support for NATIVE_FENCE_FD so panfrost can advertise support for
EGL_ANDROID_native_fence_sync.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19774>
Before adding support for NATIVE_FENCE_FD, let's move the fencing logic
to a dedicated file to avoid spreading the code in different places.
Suggested-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19774>
This lowers line-stippling to a combination of geometry and fragment
shaders:
- The geometry shader computes the length of each line-segment, and
outputs a varying that produces the stipple position.
- The fragment shader looks up the stipple position in the
stipple-pattern once per sample, and updates the sample mask
accordingly.
In case there's no geometry shader in place, we create a new
pass-through shader.
We should probably not declare the the push-constants in the pipeline
layout unless they're actually needed. But we already do this
unconditionally for the vertex shader and tesselation push-constants, so
let's do it unconditionally for these as well for now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19117>
Right now, it's only the vertex-shader that needs special handling for
non-optimal keys. That makes it possible to use fallthrough to always
end up in the last-vertex-stage conditional.
But we're about to add special handling for the geometry stage as well,
so let's prepare by splitting the switch-statement in two; one that only
happens for non-optimal keys, and does all the needed processing there,
and one that deals with the rest.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19117>
Line-stipple lowering is going to need some geometry-shader specific
lowering, so lets give the GS its own shader-key struct.
The GS variant only needs a non-optimal variant, so let's assert that to
be sure.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19117>
There's two notable limitations here:
- This will viewport-map to viewport #0 only. This is because we need
the viewport-scale factors, which we'll be uploading using
push-constants. And we don't want to waste too many of those...
- It's missing a "global" stipple-counter. It doesn't seem like there's
a portable way of implementing this, so this is going to require a VK
extension that can be implemented in a hardware-specific way in the
long run. For now, let's just ignore the global stipple counter.
These two limitations don't seem viable to overcome for now, so but this
is better than nothing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19117>
This is not ideal, but at least it should work. In the long run, we
might want to store a bit per mode we're missing, so we can do this
conditionally. But that's quite a bit more complicated, so let's go with
this for now.
The line-stippling logic needs non-optimal shader-keys. So let's drop
some perf on the floor here.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19117>