We were writing descriptors into si_context and then copying them into
the command buffer. Just write them into the command buffer directly.
Also set the pointer to VBO descriptors right after them.
When we start a new command buffer or we finish blitting, we no longer
restore precomputed VBO descriptors. Instead, we just reupload them.
It's a compromise to have the common path simpler and faster (maybe).
This removes a lot of stuff. Now the VBO descriptor upload path looks
very similar to the display list path.
There was an accidental hidden optimization that is now documented as
"last_const_upload_buffer".
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
While this is nice to have, it doesn't include VBO descriptors in user
SGPRs, and we need to remove it, so that we can simplify the VBO code.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
This is more straightforward. Also, radeon_add_to_buffer_list makes
writing VBO descriptors into the command buffer slower after that code
is reordered in following commits. This seems to be the only way that
isn't slower.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
We can't invalidate CCU if there is any dirty data that hasn't been
flushed yet. In the case where we clear depth, we know that the depth
attachment itself isn't dirty but there may be dirty data from other
renderpasses. Therefore we need to flush before invalidating depth.
Fixes: 487aa80 ("tu: Rewrite flushing to use barriers")
Closes: #6987
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17940>
This adds RADV_CMP_COPY to compact copies. Based on ANV.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17649>
Instead of copying every field individually, just use a whole memcpy.
This could be optimized but that's not the point here.
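As a rough illustration of the difference, here is a minimal sketch. The struct and function names are hypothetical stand-ins, not the actual RADV dynamic state code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a driver's dynamic state (assumption, not RADV's struct). */
struct dyn_state {
   float line_width;
   float blend_constants[4];
   uint32_t stencil_write_mask;
   uint32_t stencil_reference;
};

/* Field-by-field copy: every new field needs another line here. */
static void
dyn_state_copy_fields(struct dyn_state *dst, const struct dyn_state *src)
{
   dst->line_width = src->line_width;
   memcpy(dst->blend_constants, src->blend_constants,
          sizeof(dst->blend_constants));
   dst->stencil_write_mask = src->stencil_write_mask;
   dst->stencil_reference = src->stencil_reference;
}

/* Whole-struct copy: copies everything, so it cannot miss a field. */
static void
dyn_state_copy_all(struct dyn_state *dst, const struct dyn_state *src)
{
   assert(dst && src);
   memcpy(dst, src, sizeof(*dst));
}
```

The whole-struct version may copy more bytes than strictly needed, which is the optimization trade-off mentioned above.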
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17649>
We already save/restore all other dynamic states unconditionally, so it's
not really useful to make an exception for sample locations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17649>
According to LLVM, we only need to care about VOPC which writes exec.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17697>
This prevents issues where we insert a s_waitcnt_vscnt(0) at the start of
a block or very end of the shader because we're joining two blocks (for
example, one with has_VMEM=true and the other with
has_branch_after_DS=true).
fossil-db (navi10):
Totals from 2441 (1.51% of 161220) affected shaders:
Instrs: 1383964 -> 1384094 (+0.01%); split: -0.07%, +0.08%
CodeSize: 7438212 -> 7438760 (+0.01%); split: -0.05%, +0.06%
Latency: 13780665 -> 13679664 (-0.73%); split: -1.53%, +0.80%
InvThroughput: 2950835 -> 2921511 (-0.99%); split: -1.06%, +0.07%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17697>
For example, "DS -> branch -> VMEM -> branch -> DS".
fossil-db (navi10):
Totals from 639 (0.40% of 161220) affected shaders:
Instrs: 629090 -> 628254 (-0.13%); split: -0.19%, +0.06%
CodeSize: 3410164 -> 3406748 (-0.10%); split: -0.14%, +0.04%
Latency: 7834755 -> 7821011 (-0.18%); split: -0.70%, +0.52%
InvThroughput: 1369698 -> 1374495 (+0.35%); split: -0.12%, +0.47%
A lot of the fossil-db changes are noise.
threekingdoms.8db138826c386a62.1.foz/0b222ed175eebad0 is an example of a
shader that actually has this issue.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: c037ba1bb7 ("aco/gfx10: Mitigate LdsBranchVmemWARHazard.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17697>
In a scenario where a sequence of calls happens like:
* subdata(buffer_a, offset=0, size=64)
* subdata(buffer_a, offset=64, size=64)
* subdata(buffer_a, offset=128, size=64)
* subdata(buffer_a, offset=192, size=64)
and the buffer can't be directly mapped (e.g., because it has bindings), the
subdata calls will now be merged together into one larger subdata call.
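The merging idea can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual implementation: the pending_subdata struct, the integer buffer ids, and the fixed-size staging array are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PENDING_MAX 4096 /* hypothetical staging capacity */

struct pending_subdata {
   int buffer_id;            /* hypothetical buffer handle, -1 means none */
   size_t offset;            /* start offset of the merged range */
   size_t size;              /* bytes accumulated so far */
   uint8_t data[PENDING_MAX];
   unsigned uploads;         /* number of uploads issued (for illustration) */
   size_t last_upload_size;
};

static void
pending_init(struct pending_subdata *p)
{
   memset(p, 0, sizeof(*p));
   p->buffer_id = -1;
}

/* Issue the accumulated range as one upload. */
static void
pending_flush(struct pending_subdata *p)
{
   if (p->buffer_id < 0 || p->size == 0)
      return;
   /* A real driver would perform a single upload/copy here. */
   p->uploads++;
   p->last_upload_size = p->size;
   p->buffer_id = -1;
   p->size = 0;
}

/* Append a write; merge it if it directly follows the pending range. */
static void
queue_subdata(struct pending_subdata *p, int buffer_id, size_t offset,
              const void *data, size_t size)
{
   assert(buffer_id >= 0 && size <= PENDING_MAX);
   bool contiguous = p->buffer_id == buffer_id &&
                     offset == p->offset + p->size &&
                     p->size + size <= PENDING_MAX;
   if (!contiguous) {
      pending_flush(p);
      p->buffer_id = buffer_id;
      p->offset = offset;
   }
   memcpy(p->data + p->size, data, size);
   p->size += size;
}
```

With the four 64-byte calls listed above, this model ends up with a single pending range of 256 bytes at offset 0 and issues one upload when flushed.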
This achieves a roughly 3x perf gain in
KHR-GL46.CommonBugs.CommonBug_SparseBuffersWithCopyOps on radeonsi.
Before:
real 0m1,923s
user 0m1,017s
sys 0m0,051s
After:
real 0m0,686s
user 0m0,502s
sys 0m0,071s
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17741>
With GPL, it will be possible to create VS prologs and PS epilogs
from libraries, so reference counting is useful here too.
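For readers unfamiliar with the pattern, a minimal reference counting sketch could look like this. The shader_part struct and the helper names are hypothetical and only illustrate the idea; they are not the driver's actual types.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical shared object, e.g. a prolog/epilog used by several pipelines. */
struct shader_part {
   atomic_uint ref_count;
   int payload;
};

/* Create an object that starts with one reference held by the caller. */
static struct shader_part *
shader_part_create(int payload)
{
   struct shader_part *part = calloc(1, sizeof(*part));
   if (!part)
      return NULL;
   atomic_init(&part->ref_count, 1);
   part->payload = payload;
   return part;
}

/* Take an additional reference. */
static struct shader_part *
shader_part_ref(struct shader_part *part)
{
   assert(part);
   atomic_fetch_add(&part->ref_count, 1);
   return part;
}

/* Drop a reference; returns true if this was the last one and the object was freed. */
static bool
shader_part_unref(struct shader_part *part)
{
   assert(part);
   if (atomic_fetch_sub(&part->ref_count, 1) == 1) {
      free(part);
      return true;
   }
   return false;
}
```

Each user takes its own reference, so the object is only destroyed once the last user drops it.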
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17628>
Introduce helpers like the ones for descriptor set layouts. This will
also help graphics pipeline libraries.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17628>
skqp output is verbose. Since we run multiple backends in the same job,
the trace will normally surpass the GitLab UI line limit.
This commit wraps every skqp execution in a GitLab section and removes
some `set -xtrace` from skqp-runner.sh for cleaner output.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
Fixing some warnings, such as SC2086, would require much more complex
bash code, so prefer to keep the script simple and functional.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
The files are now split into three: crashes, fails and flakes.
They should be located inside the $INSTALL folder at:
- $GPU_VERSION_$SKQP_BACKEND_rendertests-$MODE.txt
- $GPU_VERSION_unittests-$MODE.txt
Where:
- $MODE can be crashes, fails, or flakes
- $SKQP_BACKEND can be gl, gles, or vk
The crashes and flakes files remove tests from skqp, so they will not be
run. As skqp does not support flaky test detection, let's not run them.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
These binaries are used to generate a list of tests that can be run on a
target device and are useful for testing new devices.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
The default results directory was fixed via the $PWD variable, but it is
safer to use the same variable that init-stage2.sh uses, $CI_PROJECT_DIR,
to indicate the results folder.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
Some skqp tests may crash the entire job run. Ensure that the reports
are shown to the user once the tests have started to run.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
When skqp is introduced to a new driver, the best practice is to
run all available skqp tests and classify the
failing/crashing/flaking ones.
The default behavior of skqp is to run the tests from the commit at which
skqp was built, which may not be adequate for the target driver.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
This might break out-of-order rasterization on GFX8-GFX9 because it
relies on the stencil write mask which can be dynamic.
Found by inspection.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17673>
This reverts commit 96fa23bca5.
The correct fix to the problem was a1bc152340, making this
change obsolete as the pass skips any vars marked with
always_active_io. There was no real advantage to allowing these
vars to be split because they can't be removed anyway. Also there
is no way to split varying arrays gracefully here due to the xfb
layout rules, and this change didn't handle arrays at all.
Removing this obsolete code also fixes an assert in the new CTS
test KHR-Single-GL45.enhanced_layouts.xfb_all_stages. The test
was legally adding xfb offsets to all vertex stages. Since we
only mark the varyings in the final vertex stage with the
always_active_io flag, the other stages were correctly lowering
to scalars, but when an array with an offset hit this code, it
asserted because it couldn't handle it.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Fixes: a1bc152340 ("spirv: mark variables decorated with XfbBuffer as always active")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6928
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17878>
In case a shader only uses gl_FragCoord.xy, this avoids wasting
coefficient registers for gl_FragCoord.zw which should be a small
optimization. It's also less work for DCE but I'm less worried about
that.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
If we want to break down a 64-bit value into its 32-bit halves, we want
to be able to use a split for this:
lo, hi = split long
Extend the RA to handle this case.
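For reference, the value semantics of such a split (not the register allocator change itself) can be sketched in C. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Break a 64-bit value into its low and high 32-bit halves. */
static void
split_u64(uint64_t value, uint32_t *lo, uint32_t *hi)
{
   assert(lo && hi);
   *lo = (uint32_t)value;         /* bits 0..31 */
   *hi = (uint32_t)(value >> 32); /* bits 32..63 */
}
```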
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
It is, as the name suggests, broken. Instruction count goes from 50->53
on the shader in
dEQP-GLES2.functional.shaders.operator.binary_operator.div.highp_int_fragment.
I'm happy to eat that cost in exchange for correct results!
There are lots more low-hanging opportunities for optimizations to that
shader:
- fusing double icmpsel for the b2i32(cmp) sequences
- promoting big immediates to uniforms
- fusing integer multiply+add
But for now this is acceptable and anyway I'm doing this on "fix broken
NIR lowering" time and not Asahi time.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
We can implement umul_high (for both 16-bit and 32-bit types)
efficiently by multiplying in the next larger type size and extracting
the upper word. We already have such an implementation (for instancing).
Extract it so we can use it for emit_alu too.
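The arithmetic behind this can be sketched in plain C. This is only an illustration of the widening-multiply approach with hypothetical helper names, not the compiler code.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit umul_high: multiply as 64-bit, keep the upper 32 bits. */
static uint32_t
umul_high_u32(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* 16-bit umul_high: multiply as 32-bit, keep the upper 16 bits. */
static uint16_t
umul_high_u16(uint16_t a, uint16_t b)
{
   return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}
```

The explicit casts before the multiply matter: without them the 16-bit operands would be promoted to signed int, and the product of two large values could overflow.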
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
This seems to be an architectural constraint. Ensure that RA satisfies
it, because otherwise we're left with mysterious fails.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>