This could be made slightly more efficient by only setting the dirty
state that is needed, but eventually you reach a point where it's
cheaper to re-emit everything than work out what can or can't be kept.
Fixes rendering issues in Duckstation.
Fixes: cd2c1ef9da ("panfrost: Dirty track textures/samplers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15250>
Debugging fd lifetime issues can be hard. Add a helper for debug builds
to print out an error if an fd is not a fence fd, and sprinkle it around.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15094>
When discarding the whole resource to create a new one, if this resource
is used by a sampler view, a rebind must be done to use the new
resource.
But this must be done when setting the sampler views, because we don't
have access to those samplers before.
v2:
- Pack shader state on setting sampler views (Iago)
- Use a serial ID to know when to rebind sampler views (Juan)
v3:
- Move check to caller (Iago)
- Keep rebind sampler view on BO change (Iago)
v4:
- Rename "serial_bo" to "serial_id" (Iago)
- Add comments (Iago)
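
The serial ID approach from v2/v4 can be sketched roughly as below. This is
only an illustration under assumptions: the struct layouts and the function
names resource_discard() and sampler_view_update() are hypothetical and are
not the driver's real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a resource and a sampler view. */
struct resource {
   uint32_t serial_id; /* bumped whenever the backing storage is replaced */
};

struct sampler_view {
   struct resource *rsc;
   uint32_t serial_id; /* value of rsc->serial_id at the last (re)bind */
};

/* Discarding the whole resource creates new storage, so views go stale. */
static void resource_discard(struct resource *rsc)
{
   rsc->serial_id++;
}

/* Called when sampler views are set. Returns true if a rebind happened. */
static bool sampler_view_update(struct sampler_view *view)
{
   assert(view && view->rsc);
   if (view->serial_id == view->rsc->serial_id)
      return false;
   /* A real driver would re-pack the texture state here (assumption). */
   view->serial_id = view->rsc->serial_id;
   return true;
}
```

Comparing serials at set time keeps the discard path cheap, since it does not
need to know which sampler views reference the resource.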
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6027
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15171>
The Gallium pipe video "frame_num" variable is internally used as a
counter of elapsed reference frames since the last IDR. The incoming
frame_num field from VA picture parameters is not equivalent; the VA
value may wrap to zero prematurely, as it is a 16-bit struct field with
a documented max value of 2^(log2_max_frame_num_minus4 + 4)-1.
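
To make the difference concrete, here is a minimal sketch of the two
quantities. It assumes a simplified model in which the internal counter resets
on IDR and advances only after a reference frame; the names va_max_frame_num(),
struct ref_counter and ref_counter_next() are hypothetical and are not the
Gallium or VA API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest value the VA frame_num field may carry before it wraps:
 * 2^(log2_max_frame_num_minus4 + 4) - 1. */
static uint32_t va_max_frame_num(unsigned log2_max_frame_num_minus4)
{
   assert(log2_max_frame_num_minus4 <= 12); /* range allowed by H.264 */
   return (1u << (log2_max_frame_num_minus4 + 4)) - 1;
}

/* Hypothetical internal counter of reference frames since the last IDR. */
struct ref_counter {
   uint32_t frame_num;
};

/* Returns the internal frame_num for the frame being encoded, then
 * advances the counter only if that frame is a reference. */
static uint32_t ref_counter_next(struct ref_counter *c, bool is_idr,
                                 bool is_reference)
{
   if (is_idr)
      c->frame_num = 0;
   uint32_t current = c->frame_num;
   if (is_reference)
      c->frame_num++;
   return current;
}
```

With log2_max_frame_num_minus4 at its maximum of 12 the VA field tops out at
65535, while the 32-bit counter in this sketch keeps counting past that point.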
This change improves "infinite GOP" single-client live streaming, where
it is reasonable for the server to desire an endless series of P-frames
without IDR. Without this change, it is difficult/impossible for an
application to encode a P- or B-frame after the VA frame_num field wraps
around to zero, depending on the backend encoder implementation.
This change has no effect on existing applications that always signal an
IDR frame and reset the VA frame_num to zero before it wraps around. For
example, the FFmpeg vaapi encoder ignores the VA documentation and sends
an un-wrapped VA frame_num, which results in identical computation of
the internal frame_num (as long as each GOP is less than 65536 frames).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5768
Reviewed-by: Thong Thai <thong.thai@amd.com>
patch revision 3: correctly avoid incrementing frame_num when the encoded
frame is not a reference, per h264 spec and ffmpeg behavior
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14332>
Add vs_needs_sgvs_element value check when updating vertex
element dirty state in iris_update_compiled_vs to solve
render error of Android game "Genshin Impact".
Signed-off-by: Xiaohui Gu <xiaohui.gu@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15142>
NIR-to-TGSI produces partial output writes contrary to the old paths
that always wrote the full outputs. Therefore if there is now a partial
output write ready to be scheduled and nothing else besides a tex
is ready, we would schedule the output write first. This was not a
problem before, as usually at least some component of the full output write
depended on the tex result.
This is not optimal from the performance point of view and resulted in
~20% slowdown in the Unigine demos. The docs say:
The first OUTPUT instruction will reserve space in the output register
fifo. This space is limited, therefore issuing an OUTPUT earlier than
necessary may cause threads to stall earlier than necessary. You
should not set an ALU instruction as type OUTPUT unless it is actually
writing to an output register, or it is the last instruction of
the program.
Fix it by explicitly preferring a TEX before OUT and restore the
performance: 9.66 -> 12.12 fps (as compared to 11.83 with the old
glsl-to-TGSI path) in Unigine Sanctuary. No change in Lightsmark or
GLmark.
This is also a win from the instructions point of view as we are usually
able to schedule the partial output writes in a single pair at the end.
total instructions in shared programs: 106009 -> 105891 (-0.11%)
instructions in affected programs: 10153 -> 10035 (-1.16%)
helped: 118
HURT: 0
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5840
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15165>
The max_score == -1 condition is already checked earlier, so this
will never trigger. It's unclear what the intention was anyway. Now we
emit either:
- if we have accumulated enough tex instructions for a full block
- if we have nothing else to emit
- or if we can emit all remaining tex instructions already.
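
The three conditions above can be written as a single predicate. This is a
sketch under assumptions: struct sched_state, its fields and should_emit_tex()
are hypothetical names for illustration and do not come from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scheduler snapshot. */
struct sched_state {
   unsigned ready_tex;      /* tex instructions ready to emit now */
   unsigned remaining_tex;  /* tex instructions not yet emitted in total */
   unsigned ready_other;    /* non-tex instructions ready to emit */
   unsigned tex_block_size; /* assumed capacity of one full tex block */
};

static bool should_emit_tex(const struct sched_state *s)
{
   assert(s->ready_tex <= s->remaining_tex);
   if (s->ready_tex == 0)
      return false;
   return s->ready_tex >= s->tex_block_size || /* full block accumulated */
          s->ready_other == 0 ||               /* nothing else to emit */
          s->ready_tex == s->remaining_tex;    /* all remaining tex ready */
}
```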
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15165>
Something is slightly off in the integer values returned. It passes many
tests without the fixup, but the dEQP-GLES31 tests complain. The blob
ends up doing 3x gathers, and selects between them based on getinfo
results. Since we already have a per-sampler key with some spare bits,
just stick the bit-size info in there. And we can derive signedness from
the associated type info.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14670>
From Section 4.4.1 (Input Layout Qualifiers) of the GLSL 4.50 spec:
"For some blocks declared as arrays, the location can only be applied
at the block level: When a block is declared as an array where
additional locations are needed for each member for each block array
element, it is a compile-time error to specify locations on the block
members. That is, when locations would be under specified by applying
them on block members, they are not allowed on block members. For
arrayed interfaces (those generally having an extra level of
arrayness due to interface expansion), the outer array is stripped
before applying this rule"
From Section 1.2.1 (Changes from Revision 6 of GLSL Version) of the GLSL 4.50 spec:
"Private Bug 15678: Don’t allow location = on block members where
the block needs an array of locations"
From Section 4.4.1 (Input Layout Qualifiers) of the GLSL ES 3.20 spec:
"If an input is declared as an array of blocks, excluding per-vertex-arrays
as required for tessellation, it is an error to declare a member of
the block with a location qualifier"
From Section 1.1.3 (Changes from GLSL ES 3.2 revision 3) of the GLSL ES 3.20 spec:
"Arrayed blocks cannot have layout location qualifiers on members"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11522>
If one of these states changes, it affects which result needs to be
used for that query, so split it up over multiple query ids to make sure
the correct result is obtained.
fixes (lavapipe):
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_pause_resume
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_states
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15227>
When the AOS/linear code was added it only worked with TGSI which
meant nothing in mesa upstream was really using it.
This adds support to analyse NIR shaders, and adds aos support
to the backend.
AOS support is limited to mov, vec, fmul and tex sampling in order to
accelerate mostly compositing operations. I've verified that weston uses
the fast path. gnome-shell can't use it yet as we can't optimise
the depth test paths.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15140>
This fixes a regression with overlap in llvmpipe. This is pessimistic;
we should write code to make it work properly.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15219>
This can avoid some cases where a constant has to be loaded into a
temporary register.
v2: Update i915-g33-fails.txt.
total instructions in shared programs: 788625 -> 782376 (-0.79%)
instructions in affected programs: 166269 -> 160020 (-3.76%)
helped: 1578
HURT: 0
helped stats (abs) min: 3 max: 21 x̄: 3.96 x̃: 3
helped stats (rel) min: 1.56% max: 33.33% x̄: 4.82% x̃: 3.45%
95% mean confidence interval for instructions value: -4.06 -3.86
95% mean confidence interval for instructions %-change: -5.00% -4.64%
Instructions are helped.
LOST: 0
GAINED: 35
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15210>
This should reduce follow-on optimization work to copy-propagate and
dead-code away the movs generated in construction of vectors.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14865>
We were just emitting the bad reloc (resulting in either an assert failure
on a debug build or, on a release build, likely a GPU hang from the fault).
Given that the GLES 3.2 spec's robust context requirement says we should
return undefined data but not terminate for element indices outside of the
VB, ignoring the offset in this case seems like a better behavior to have
in all cases.
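
One way to read "ignoring the offset" is to fall back to offset zero when the
requested offset lies outside the buffer. The sketch below assumes that
interpretation; vb_reloc_offset() is a hypothetical helper and not the
driver's actual function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pick the offset to use when emitting the address.
 * An out-of-bounds offset is dropped so the address stays inside the BO. */
static uint32_t vb_reloc_offset(uint32_t bo_size, uint32_t offset)
{
   assert(bo_size > 0);
   return offset < bo_size ? offset : 0;
}
```

The data fetched in the fallback case is arbitrary, which matches the
"undefined data but not terminate" requirement quoted above.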
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15198>
Seems we already had implemented this feature (see commit 521e1d0275
"broadcom/vc5: Add support for anisotropic filtering"), but we didn't
enable the proper capability.
Also update the maximum level of anisotropy supported.
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4201
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15180>
The problem is that dirty_states must be 0 for any state that is NULL
in "queued". This code was flagging dirty_states for such states because
it was only looking at "emitted". It should have been looking at "queued".
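
A minimal sketch of the invariant, assuming a simplified tracker with one
pointer per state slot. NUM_STATES, struct state_tracker and
compute_dirty_states() are hypothetical names, not the driver's real ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_STATES 4 /* hypothetical; a real driver tracks more states */

struct state_tracker {
   const void *queued[NUM_STATES];  /* states waiting to be emitted */
   const void *emitted[NUM_STATES]; /* states last emitted to the GPU */
};

/* A state is dirty only if it is non-NULL in "queued" and differs from
 * what was emitted. NULL queued states never set a dirty bit. */
static uint32_t compute_dirty_states(const struct state_tracker *t)
{
   assert(t != NULL);
   uint32_t dirty = 0;
   for (unsigned i = 0; i < NUM_STATES; i++) {
      if (t->queued[i] != NULL && t->queued[i] != t->emitted[i])
         dirty |= 1u << i;
   }
   return dirty;
}
```

Checking only "emitted" would set the bit for slot 1 in a case where queued[1]
is NULL, which is the situation the fix rules out.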
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15209>
Moved from radeonsi without the vectorization, which won't be needed for
now. We will lower IO in st/mesa instead of radeonsi to get the transform
feedback info into store instructions.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14388>
It's probably harmless, but it is logically meaningless. The DDK doesn't do it,
and I don't see a reason for us to either. In theory this should be a small
overhead win.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15204>