It turns out that BLEND_MIN and BLEND_MAX in Utgard take blend factors
into account. My guess is that actual equation looks like:
OP(As * S + Ad * D, Ad) for alpha, and
OP(Cs * S + Cd * D, Cd) for color.
So we have to set S factor to 1 and D factor to 0 to be compliant with
GL spec.
Fixes following piglit tests:
spec@!opengl 1.4@blendminmax
spec@arb_blend_func_extended@arb_blend_func_extended-fbo-extended-blend
(with patch my for ES2_compatibility and EXT_blend_func_extended)
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13873>
It was a bit trickier to RE, since blob doesn't expose this
functionality at all, however we had a clue from the very beginning:
lima_blend_factor is 3 bits, i.e. 8 values, but only 5 of them were
used, it just waited till someone tried what 3 unused values do.
Interestingly enough, it turns out "5" works just as "0" (which is
PIPE_BLENDFACTOR_*SRC_*), but only if output register for gl_FragColor
is $0, So it looks suspiciously similar with PIPE_BLENDFACTOR_*SRC1_*
behavior, and looks like secondary output is taken from $0.
Since output regs for all other outputs are configured via RSW, there
must be a field in RSW for output register for secondary color, it's
likely 4 bits and it's currently set to 0 for reg $0.
Then it was just a matter of brute-forcing various consecutive 4 bits
in RSW - and indeed, setting top 4 bits of rsw->aux0 to the index of
gl_FragColor output register fixes blending tests when we use "5"
blend factor instead of "0".
So it must be a register number for gl_SecondaryFragColor. Unlike
gl_FragColor, the field is only repeated once in RSW.
Wire it up in compiler, and piglit arb_blend_func_extended now passes.
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13873>
It's needed by _mesa_half_to_float(), without this change it hits
assertion failure in util_get_cpu_caps().
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13968>
The big comment was not really true.
The other debug options are unused right now, but will be used again
in the future.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13878>
We don't set them and we don't read them if they are disabled, so don't
print them either. This silences valgrind warnings.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13878>
to also get rid of the additional function that I introduced before.
Fixes: 82b261417e ("util/cpu_detect: Add flag for IBM Z (s390x)")
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13958>
Mali4x0 supports writing depth and stencil from fragment shader
and we've been using it quite a while for depth/stencil buffer reload.
The missing part was specifying output register for depth/stencil.
To figure it out, I changed reload shader to use register $4 as output
and poked RSW bits (or rather consecutive 4 bit groups) until tests
that rely on reload started to pass again.
It turns out that register number for gl_FragDepth/gl_FragStencil is in
rsw->depth_test and register number for gl_FragColor is in
rsw->multi_sample and it's repeated 4 times for some reason (likely for
MSAA?)
With this knowledge we now can modify ppir compiler to support multiple
store_output intrinsics.
To do that just add destination SSA for store_output to the registers
list for regalloc and mark them explicitly as output. Since it's never
read in shader we have to take care about it in liveness analysis -
basically just mark it alive from the time when it's written to the end
of the block. If it's live only in the last instruction, mark it as
live_internal, so regalloc doesn't clobber it.
Then just let regalloc do its job, and then copy register number to the
shader state and program it in RSW.
The tricky part is gl_FragStencil, since it resides in the same register
as gl_FragDepth and with the current design of the compiler it's hard to
merge them. However gl_FragStencil doesn't seem to be part of GL2
or GLES2, so we can just leave it not implemented.
Also we need to take care of stop bit for instructions - now we can't
just set it in every instruction that stores output, since there may be
several outputs. So if there's any store_output instructions in the
block just mark that block has a stop, and set stop bit in the last
instruction in the block. The only exception is discard - we always need
to set stop bit in discard instruction.
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13830>
We can't insert mul node into add node instruction if it's a virtual dep
(sequence or write_or_read dep), so use ppir_node_has_single_src_succ
in addition to ppir_node_has_single_succ.
We can't use ppir_node_has_single_src_succ alone, since node may have
a virtual dependency in addition to source dependency, and we can't
insert it either in this case.
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13830>
The function need_temp_reg_initialization looks suspecious.
It will only ever return true if we get past this if:
if (!(emit->info.indirect_files && (1u << TGSI_FILE_TEMPORARY)) ...
Using the logical && means the intended initialization done
based on the result of this check is not performed.
This code was both introduced and altered in MR 5317.
ccb4ea5a introduces the function.
ba37d408 is a collection of performance improvements and misc
fixes. This altered the if from using bitwise to logical and.
This commit changes it back to bitwise.
Spotted from a compile warning.
Fixes: ba37d408da ("svga: Performance fixes")
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12157>
LLVM has all the required intrinsics available on IBM Z, so use them for
rounding operations (they will be implemented as a single instruction).
This change makes the test case lp_test_arit pass, because it avoids
using the buggy generic code.
v2: update .gitlab-ci/cross-xfail-s390x to reflect passing lp_test_arit
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13927>
Add a new command associated to glLinkProgram. With this we should be
able to compile and link shaders when requested by the user, thus
avoiding that to happen in the middle of a frame.
Together with the command we pass an array of shader handles attached to
the program, where each position of the array corresponds to a pipe
shader type.
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13674>
Allow drivers to register a callback for when a shader program is linked.
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13674>
The float value may be out of range, so must be clamped to the allowed
range. Unclear if a3xx also has a SNORM factor that we're just missing
there, but that will be a separate investigation.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13903>
Unlike with regular textures, we really have to support all the formats
directly for TBOs to work properly. Add the missing formats to fix
arb_texture_buffer_object-formats piglit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13906>
Otherwise some of these fall back to RGBA_SNORM, which can screw up
blend factors.
Fixes spec@ext_texture_snorm@fbo-blending-formats.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13904>
It already does compaction, so we just need to load vertex positions
and cull. This was easier than expected.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13829>
si_llvm_translate_nir() changes ctx.stage, so the outside code shouldn't
use it. This hasn't caused any issues yet. Since ctx.stage starts as 0,
the first use in this commit was a tautology.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13829>
Unlike Linux dma-bufs, D3D12 resources are strongly typed, and
can't necessarily just reinterpret the memory arbitrarily.
Allow importing resources with no description coming from the frontend,
and populate the resource desc from the driver instead. If there was
a template, make sure that it matches the incoming resource.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13054>