This can generate things like fneg! of load_const, which is silly.
Fold those away into an actual constant. Only do so on the scalar
backend because there's a comment above that the vec4 backend doesn't
want any new constants this late, and I'm inclined to believe it.
fossil-db stats show a very minor improvement:
Totals:
Instrs: 203091223 -> 203091099 (-0.00%); split: -0.00%, +0.00%
Cycles: 14410638075 -> 14410577067 (-0.00%); split: -0.00%, +0.00%
Totals from 20 (0.00% of 665070) affected shaders:
Instrs: 27067 -> 26943 (-0.46%); split: -0.47%, +0.01%
Cycles: 2687958 -> 2626950 (-2.27%); split: -2.27%, +0.00%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22881>
Enable sparse binding now that vkQueueBindSparse works with feedback.
If a device only has queue families with exclusive sparse binding
support then disable sparse binding.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22616>
Add back support for vkQueueBindSparse that works with fence and timeline
semaphore feedback.
For each vkQueueBindSparse batch, if it contains feedback then move the
signal operations to a subsequent vkQueueSubmit with feedback cmds.
This requires queue families that support vkQueueSubmit alongside sparse
binding support so any queue familes that exclusively support sparse
binding will be filtered out.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22616>
add getter/setters for VkBindSparseInfo so we can at least share
vn_queue_submission_prepare() to handle external semaphores and
check for fence/semaphore feedback
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22616>
the inner conditional here didn't include uncached readback, meaning
that many (most?) buffers allocated with uncached memory (i.e., BAR) were
being read back directly instead of using staging resources to be faster
at some point this inner conditional should be reevaluated to determine
whether it still does anything, but this is not that time
fixes, among other things, loading in DOOM2016 on some GPUs
Fixes: 52f27cda05 ("zink: allow direct memory mapping for any COHERENT+CACHED buffer")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22907>
in the case where a cmdbuf was submitted with write access and the subsequent
batch promotes an op to unordered, it's important for associated barriers
to happen-before those ops to guarantee synchronization
the fixes tag is wrong on this, but it's all in the same release
Fixes: bf0af0f8ed ("zink: move all barrier-related functions to c++")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22907>
WARP has a temp register limit, and the control flow needed to convert
indirect to direct accesses on large temps ends up bloating shaders massively.
We can just go ahead and spill these large temps to scratch, which maps
to an alloca in DXIL.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22787>
This is useful to debug drivers to be able to
disable all specific d3d9 features and always trigger
the emulated path.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22644>
This feature is almost never used in programmable
shaders so no issue was ever reported.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22644>
Implement backup support for clip planes.
Driver support is still preferred, as the driver
can reuse the compilation of the core of the shader
to generate variants.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22644>
Drivers with PIPE_CAP_CLIP_PLANES set to 0,
such as zink, ignore clip_plane_enable.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22644>
The first version of the shader didn't have proper
force_color_in_centroid field set.
That won't make much a difference (centroid is very
similar to no centroid) but it is still better.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22644>
Some drivers don't handle it, and those who do replace it anyway
depending on the rasterizer setting. Keep the rasterizer setting
but replace the interpolation flag accordingly.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22644>
Gallium drivers used to implement the legacy behaviour.
It's not the case of all recent drivers, so implement
the legacy behaviour in nine.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22644>
What would be missing for position_t to work in
vs programmable shaders when VS_WINDOW_SPACE_POSITION
is unavailable is to apply the inverse viewport transformation
similarly to what is done for ff vs.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22644>
Previously we would implement position_t by
applying the inverse of the viewport, and
advertising clipping was going to occur with
the cap CLIPTLVERTS.
However when the cap is advertised, clipping
is supposed to be disabled via sw emulation
when D3DRS_CLIPPING is set to FALSE.
Since we don't support that either, instead take the
approach of disabling at least depth clipping, and
not advertising the cap.
Ideally, clipping should be totally disabled.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22644>
Replace max_ps_const_f with a constant.
In practice it already was always the
same value no matter the hw.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22644>
Right now subsampled images are the same as non-subsampled images, this
will change when we actually implement them which will be an ABI break.
Disallow importing/exporting them with modifiers until that's stabilized
to force users to match the driver UUID.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
When rendering a scaled tile, we need to use the original, hardware
FragCoord when accessing input attachments that are on-tile (i.e. were
rendered to in a previous subpass) because they are also scaled in the
same way that FragCoord is scaled. For input attachments that aren't
already on-tile, however, we need to use the fixed gl_FragCoord. Add a
new intrinsic and a bitfield of input attachments which should use it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
We scale the actual rendering by patching the viewport state. This is
helped by a HW bit to make the viewport index equal to the view index,
so that we can have a different scaling per-view.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
FDM is implemented pretty much entirely inside the driver, by patching
various structures for each bin. This adds the core infrastructure to
sample the density map, compute the scaled bin sizes we will use, create
patchpoints, and apply them at the start of each bin before executing
the IB2.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
In order to patch the command stream on the gpu, we need two features:
1. The ability to use a read-write BO instead of a read-only one, when
patching might be performed.
2. The ability to get the iova of the current position after reserving
some number of dwords, even with externally-allocated command
streams.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>