Rename vlVaSetCscMatrix to vlVaSetProcParameters because it now does
more than just setting csc matrix.
Acked-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Thong Thai <thong.thai@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24869>
Video textures include padding, so this is needed to avoid sampling
outside of src rect due to scaling or additional offset.
Fixes wrong colors on right/bottom edge.
Acked-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Thong Thai <thong.thai@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24869>
Video textures include padding, so make sure to not sample outside
src rect. Also remove the parameter and always use the offset.
When not scaling, this fixes blurry output.
When scaling, this fixes incorrect color at right/bottom edge.
Acked-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Thong Thai <thong.thai@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24869>
This reduces silliness in Dolphin ubershaders by eliminating the double
lowering. It also makes the GLES shader assembly nicer to read.
Dolphin ubershader performance at 4K on MMG improved by about 0.5%. Not massive,
but definitely noticeable and reduces the delta to macOS.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
While desktop GL supports sampler LOD bias, GLES does not. To support the GL use
case, all Gallium drivers are expected to handle sampler LOD bias. However, this
may require shader code to implement (lowering tex to txb, txl to fadd+txl) and
cost resources to push the LOD bias constants into the shader. The issue is
compounded with something like Dolphin's GLES renderer, which does this LOD bias
emulation itself -- meaning that LOD bias is lowered twice when using Dolphin
with GLES! As such, this commit adds a context flag for frontends to communicate
that they will never use sampler LOD bias, allowing the driver to omit the
lowering as a GLES fast path (or, for Dolphin, for performance parity between
GLES and GL).
This will be used on Asahi. It could also be used to optimize a path on
Mali-T720 supported in Panfrost, though I don't intend to write that patch.
Originally https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25034
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
With =deqp. I don't want this exposed before geometry shaders since we run dEQP
(GLES) far more than Piglit (GL), and we need geometry shaders to get adequate
regression testing via dEQP-GLES.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
When rendering to a layered depth/stencil attachment, we specify the layer
stride in pages. That means that depth/stencil targets must be page-aligned to
be rendered to correctly.
If we're merely sampling, not rendering, we do not need the extra alignment. So
we add a flag to handle this case so we keep passing the generated ail tests.
Fixes KHR-GLES31.core.texture_cube_map_array.color_depth_attachments
Similarly, we page-align colour attachments. I don't have a good theoretical
justification for this part, but it seems to be necessary and layered rendering
fails otherwise. Possibly the PBE requires page-aligned layers unconditionally?
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
This lets us save a LOT of instructions at the cost of increased register
pressure. However, on my shader-db, this is still coming out ahead since no
shaders are hurt for thread count/spills, and only 1/10 of the shaders helped
for instruction count are hurt for register pressure. The shaders most hurt
for pressure have very low pressure (7 -> 15 is the worst case) and you need a
certain number of registers to use a 4 source instruction at all. Analyzing the
hurt shaders, nothing concerns me too much ... this isn't as bad as I feared.
So I think at this point it's worth ripping off the bandage, given the massive
potential for instruction count win. This is a big improvement for some of the
shaders I'm working on for my $SECRET_PROJECT.
total instructions in shared programs: 1784943 -> 1775169 (-0.55%)
instructions in affected programs: 644211 -> 634437 (-1.52%)
helped: 3498
HURT: 38
Instructions are helped.
total bytes in shared programs: 11720734 -> 11643224 (-0.66%)
bytes in affected programs: 4370986 -> 4293476 (-1.77%)
helped: 3572
HURT: 36
Bytes are helped.
total halfregs in shared programs: 474094 -> 475165 (0.23%)
halfregs in affected programs: 12821 -> 13892 (8.35%)
helped: 65
HURT: 247
Halfregs are HURT.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
Simple greedy thing that has the potential to inflate register pressure but
reduces instructions. Thanks to the recent loop work that turns if { break }
into while_icmp, this also implicitly handles fusing conditions into loops,
which is what actually prompted this.
Surprisingly, this helps register pressure on my shader-db (no change to thread
count), I guess by eliminating the boolean temps in case where the sources are
used multiple times.
total instructions in shared programs: 1786561 -> 1784943 (-0.09%)
instructions in affected programs: 128557 -> 126939 (-1.26%)
helped: 474
HURT: 13
Instructions are helped.
total bytes in shared programs: 11733236 -> 11720734 (-0.11%)
bytes in affected programs: 976034 -> 963532 (-1.28%)
helped: 521
HURT: 13
Bytes are helped.
total halfregs in shared programs: 474245 -> 474094 (-0.03%)
halfregs in affected programs: 1869 -> 1718 (-8.08%)
helped: 28
HURT: 7
Halfregs are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
Apple doesn't do this, but it should be equivalent and it makes it easier to see
that we can use while_icmp for break_if_icmp in loops that don't use continue
(which Apple does do). So, the effect of this commit is to use while_icmp for
most breaks, which saves an instruction.
total instructions in shared programs: 1764199 -> 1764076 (<.01%)
instructions in affected programs: 24149 -> 24026 (-0.51%)
helped: 78
HURT: 0
Instructions are helped.
total bytes in shared programs: 11609306 -> 11608322 (<.01%)
bytes in affected programs: 164604 -> 163620 (-0.60%)
helped: 78
HURT: 0
Bytes are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
The only role of the while_icmp at the end of a NIR loop is to make continue
jumps work. If, after emitting the loop, we learn that there are no continues,
there is no need to insert a while_icmp since it would be a no-op anyway.
total instructions in shared programs: 1764311 -> 1764199 (<.01%)
instructions in affected programs: 26321 -> 26209 (-0.43%)
helped: 82
HURT: 0
Instructions are helped.
total bytes in shared programs: 11609978 -> 11609306 (<.01%)
bytes in affected programs: 178842 -> 178170 (-0.38%)
helped: 82
HURT: 0
Bytes are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
In general, loops need a push_exec at the start for correctness. However, a
push_exec at the top level (non-nested) is a no-op, so we can omit and save a
few cycles.
total instructions in shared programs: 1764350 -> 1764311 (<.01%)
instructions in affected programs: 7339 -> 7300 (-0.53%)
helped: 36
HURT: 0
Instructions are helped.
total bytes in shared programs: 11610212 -> 11609978 (<.01%)
bytes in affected programs: 48638 -> 48404 (-0.48%)
helped: 36
HURT: 0
Bytes are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
Search for code like
if ... {
break
}
and replace with a break_if pseudo-instruction for optimized handling, since the
break_if lowering is better than the original code.
total instructions in shared programs: 1764596 -> 1764350 (-0.01%)
instructions in affected programs: 24540 -> 24294 (-1.00%)
helped: 78
HURT: 0
Instructions are helped.
total bytes in shared programs: 11611196 -> 11610212 (<.01%)
bytes in affected programs: 166458 -> 165474 (-0.59%)
helped: 78
HURT: 0
Bytes are helped.
shader-db probably understates the benefit here, since this optimizes the body
of loops.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
We use it for two different things. Pseudo-instructions are cheap, split it up
for easier optimization passes. This also fixes the schedule classes.. we can
move the cf_begin around if we want, it's inert.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
They're more trouble than they're worth for us. They were originally lifted
unthinkingly from ACO, where I assume they're necessary for software CF
lowering, but they're just an inconvenient convenience for us. Remove em.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
If we bound 4 render targets but we only write to 1 of them, the other 3 need
their contents preserved. This requires either properly configuring HSR to
implement colour masking (TODO) or using the big hammer of setting TRANSLUCENT.
This patch picks the latter for now.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
The debug flags are already plumbed into driver_flags for the disk
cache, so we just need to actually allow some flags instead of bailing
out of the disk cache init.
We only care about no16 for production right now, and it's probably a
good idea to disable disk caching during most debug sessions, so
allowlist only that one.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
There are way too many broken WebGL apps using the wrong precision
qualifiers, which causes anything from jittery geometry to complete
breakage (e.g. QuakeJS and other games).
In addition, a Firefox bug is breaking basic canvas rendering for the
same reason (mozilla bug #1845309).
Let's just disable fp16 for browsers. There is no hope of getting all
this broken stuff fixed.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
This is the driconf equivalent of our debug no16 flag, which disables
fp16 support to work around apps using bad GLSL precision qualifiers.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
It's time to start using some of these, so add the required scaffolding
to be able to have driver-specific driconf handling for us.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
This kills the hypervisor, let's just print and return.
Also flush after decoding, so that if something else goes wrong at least
we get the logs up to that point.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
Since we register allocate in SSA, the number of registers required (register
demand) equals to the maximum number of simultaneous live values (register
pressure). So if we can reduce register pressure, we are guaranteed to reduce
register demand. Even an ineffective heuristic like randomly swapping
instructions can only reduce pressure as long as it's conservative.
This implements one such heuristic: in each block, schedule backwards, selecting
the free instruction that looks like it will reduce liveness the most. In other
words, the greedy algorithm to reduce register pressure. At the end of the
block, if we haven't actually reduced pressure, we bail. This isn't optimal, but
it's well-motivated and optimally handles special cases (like 0-source
instructions).
This is based on the scheduler I originally wrote for Mali.
In my Dolphin ubershader branch, this improved performance at native 4K by 10fps
(105fps->115fps) when I measured together with some other optimizations. On top
of my current next (which notably includes nir_opt_sink improvements), this
commit alone goes (53fps->54fps) which is considerably less impressive :-p
shader-db results are a win, but not as large as we might hope. Instruction
count win seems to be from the smaller live ranges being easier on RA (fewer
swaps / moves). The two shaders affected for thread count are from fifa mobile,
which go from 640 threads ->
1024 (full occupancy). In other words... this heuristic does an excellent job in
a small subset of shaders. The Dolphin ubershader win was real, though :~)
Note these shader-db wins are on top of a branch with the nir_opt_sink
improvements. Without that, the stats are much better... The schedulers have
some overlap, but they're better together.
total instructions in shared programs: 1766635 -> 1763496 (-0.18%)
instructions in affected programs: 445855 -> 442716 (-0.70%)
helped: 1963
HURT: 350
Instructions are helped.
total bytes in shared programs: 11597648 -> 11586924 (-0.09%)
bytes in affected programs: 3106230 -> 3095506 (-0.35%)
helped: 2003
HURT: 374
Bytes are helped.
total halfregs in shared programs: 504609 -> 481980 (-4.48%)
halfregs in affected programs: 138322 -> 115693 (-16.36%)
helped: 3405
HURT: 311
Halfregs are helped.
total threads in shared programs: 18839936 -> 18840704 (<.01%)
threads in affected programs: 1280 -> 2048 (60.00%)
helped: 2
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>