It's only used there, and we don't plan to use it in Dozen, so let's move
the code to src/gallium/drivers/d3d12/ and get rid of the static
d3d12_resource_state library.
Suggested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16042>
GL spec states that the stride for indirect multidraws:
* cannot be negative
* can be zero
* must be a multiple of 4
some drivers can't support strides which are not a multiple of the
size of the indirect struct being used, however, so rewrite those to
direct draws
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15963>
this was incorrectly calculating too small of a map region if
the stride was less than the size of the struct
Fixes: 3eb9932317 ("aux/draw: add a util function for reading back indirect draw params")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15963>
Towards the renderer, venus better uses VK_EXT_image_drm_format_modifier
to force linear with tiling modifier and mod_linear. Doing so won't make
any difference on the mesa implementations we care about given we have
required VK_EXT_image_drm_format_modifier for wsi support.
A lucky side effect of this is to allow common wsi to work with host
implementations not supporting dma_buf export.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15993>
WSI images and Android AHBs can have tiling modifier overrides, thus we
must override the aspectMask upon image subresource layout query.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15993>
this is only possible when tc determines the buffer is not in use
and decides to return a pointer immediately, so just give back a staging
buffer
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15979>
this was an attempt to minimize the number of xfb barriers being emitted,
but really xfb barriers need to always be emitted in order for xfb to work
cc: mesa-stable
fixes (nv):
KHR-GL46.texture_view.reference_counting
KHR-GL46.transform_feedback_overflow_query_ARB.multiple-streams-multiple-buffers-per-stream
KHR-GL46.transform_feedback_overflow_query_ARB.multiple-streams-one-buffer-per-stream
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16065>
a read barrier is needed for resume, yes, but the counter buffer
is always being written to, so write access must always be set
cc: mesa-stable
fixes (nv):
KHR-GL46.transform_feedback.draw_xfb_test
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16065>
this was well-documented, but ultimately wrong: the synchronization
being used was for binding streamout buffers (not counter buffers) as
vertex buffers, which was already handled just fine in the normal
vertex buffer binding
drawing from streamout ONLY uses the counter buffer, which means
the counter buffer needs to be synchronized for reading
cc: mesa-stable
fixes (nv):
KHR-GL46.transform_feedback.draw_xfb_feedbackk_test
KHR-GL46.transform_feedback.draw_xfb_instanced_test
KHR-GL46.transform_feedback.draw_xfb_stream_instanced_test
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16065>
I had some workarounds in ALU op emits trying to fix up when we were asked
to store to unsupported channels when the ALU op had 64bit srcs (so only
vec2 supported) but a 32-bit dest with a >vec2 writemask.
Those workarounds had some bugs breaking 64-bit uniform initializer tests
on virgl, and also set up too wide of a writemask such that they triggered
assertion failures on nvc0. We can avoid the need for those workarounds
at emit time by just having nir_lower_vec_to_movs not generate unsupported
writemasks in the first place.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15934>
The core provide generic helpers to turn Vulkan minor version
features/properties into their KHR counterparts. Let's declare those
core features/properties structs and use those helpers so we get
ready to support newer spec versions without too much pain.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15911>
It's declared an input in TGSI, even though it's an SV in the backend. In
NIR, it shows up as an SV, so it's in this list. Fixes NIR regressions in
primitive-id-in and primitive-id-restart.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16063>
With TGSI, the driver allocates space for the alpha ref as a uniform and
adds a conditional discard to the shader. We could either replicate that
with NIR, or just set the flag saying we need the shader lowering and get
the same thing.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16063>
There's no hardware instructions for them until then. These chips don't
expose the extension provinding the GLSL builtins for operations like
bfrev, but NIR can recognize the construct and optimize it to
bitfield_reverse, which pre-nvc0 would then fail to codegen. Prevents a
regression when moving to nir-to-tgsi. Other lower_bitfield flags are set
as well for when someone comes along and adds optimizations for them too.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16063>
Also adjust the lowering pass to handle wide SSBO loads that we now emit
for the nir case.
This improves generated code quality since memoryopt can't
merge SSBO loads that end up predicated on a bounds check.
This also happens to fix a few test cases, only because the simpler generated
IR is less likely to trigger other compiler bugs. Eg on kepler with
NV50_PROG_USE_NIR=1, this fixes
arb_gpu_shader_fp64-fs-non-uniform-control-flow-ubo
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16063>
This is important so you don't go comparing the number of instructions
emitted when you unrolled loops differently.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16063>
For !8044 I'm working on getting all drivers to accept NIR. The NIR
compiler in the driver is apparently not quite ready, so use NIR-to-TGSI
instead. This is a net win in testcases working on my RV770 and Turks
cards (especially in some important piglit tests involving YUV dma-buf
decode), though it's not regression-free.
shader-db (R600):
total dw in shared programs: 8553412 -> 8358918 (-2.27%)
dw in affected programs: 7476702 -> 7282208 (-2.60%)
total gprs in shared programs: 217286 -> 213217 (-1.87%)
gprs in affected programs: 72747 -> 68678 (-5.59%)
total loops in shared programs: 398 -> 330 (-17.09%)
loops in affected programs: 68 -> 0
total cf in shared programs: 558835 -> 332768 (-40.45%)
cf in affected programs: 420475 -> 194408 (-53.76%)
shader-db (Turks):
total dw in shared programs: 14104598 -> 13556782 (-3.88%)
dw in affected programs: 12161972 -> 11614156 (-4.50%)
total gprs in shared programs: 321068 -> 313690 (-2.30%)
gprs in affected programs: 114899 -> 107521 (-6.42%)
total loops in shared programs: 736 -> 651 (-11.55%)
loops in affected programs: 111 -> 26 (-76.58%)
total cf in shared programs: 925771 -> 581226 (-37.22%)
cf in affected programs: 678600 -> 334055 (-50.77%)
total stack in shared programs: 27853 -> 27855 (<.01%)
stack in affected programs: 5 -> 7 (40.00%)
glmark2 terrain: 0.137649% +/- 0.0511938% (n=6)
glmark2 jellyfish: no change (n=8)
unigine valley (extreme) 5.36 -> 5.45 (n=1 it takes so long to run)
unigine heaven (basic) 16.13 -> 16.15 (n=1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14319>
KHR-GL33.shaders.indexing.tmp_array.vertexid regressed with the switch to
NIR-to-TGSI because the shader got optimized enough to emit a read just
after writing to the array (the kind of situation where a non-rel write
would have been followed by a PV/PS read). The R600 and EG docs say you
always need to do this, but apparently some hardware gives you the right
answer anyway so we don't flag it on all of them.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14319>
util_cpu_detect is an anti-pattern: it relies on callers high up in the call
chain initializing a local implementation detail. As a real example, I added:
...a Mali compiler unit test
...that called bi_imm_f16() to construct an FP16 immediate
...that calls _mesa_float_to_half internally
...that calls util_get_cpu_caps internally, but only on x86_64!
...that relies on util_cpu_detect having been called before.
As a consequence, this unit test:
...crashes on x86_64 with USE_X86_64_ASM set
...passes on every other architecture
...works on my local arm64 workstation and on my test board
...failed CI which runs on x86_64
...needed to have a random util_cpu_detect() call sprinkled in.
This is a bad design decision. It pollutes the tree with magic, it causes
mysterious CI failures especially for non-x86_64 developers, and it is not
justified by a micro-optimization.
Instead, let's call util_cpu_detect directly from util_get_cpu_caps, avoiding
the footgun where it fails to be called. This cleans up Mesa's design,
simplifies the tree, and avoids a class of a (possibly platform-specific)
failures. To mitigate the added overhead, wrap it all in a (fast) atomic
load check and declare the whole thing as ATTRIBUTE_CONST so the
compiler will CSE calls to util_cpu_detect.
Co-authored-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15580>
NIR doesn't have that knob currently, so we end up throwing errors about
it being ignored.
This should fix cases of "tgsi_to_nir: unhandled TGSI property 23 = 1",
and presumably do better at DX9 muls on nv50 and r600.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14883>
Otherwise, we get an ishl that the HW can't support, and a ushr if the NIR
ends up being lowered to ubo_vec4, which may not get constant-folded if
the offset was non-constant.
This matches what mesa/st uses for this arg to uniform lowering.
Fixes: #5971
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14883>
On virglrenderer it is of interest to not re-use temporaries when we
want to handle precise, invariant, and highp/mediump with better
possibility for optimization.
v2: Force optimized RA if the number of registers is too large
(Emma: only 16 bit signed int are reserved for register indices)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16051>
r600 would end up looking for it past the end of its array of inputs
(which expected 1:1 ordering from declarations to driver locations).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16043>
virglrenderer emits GLSL referencing all the swizzles, even if the write
mask doesn't contain them. This is a problem when the output is
TessLevelInner, which has only 2 elements.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16043>
Same splitting method as store_output. Fixes regressions in virgl
with nir-to-tgsi.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16043>