LLVM 11 removes the intrinsic for half to float conversion, so use the fpext
function instead. This function actually works now with half to float, albeit
a quick experiment showed at least the x86 backend cannot lower it itself if
the cpu doesn't support it natively and tries to call external library, which
crashes (and would be very slow anyway as it would be lowered to scalar code),
so for now only use it where we previously used the f16c intrinsic.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2603
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4800>
The SFID field of the SHADER_OPCODE_MEMORY_FENCE and
SHADER_OPCODE_INTERLOCK instructions now indicates the target function
of the memory fence. Account the cycle-count cost to the right shared
unit.
Fixes: f858fa26b4 ("intel/fs,vec4: Pull stall logic for memory fences up into the IR")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4817>
the implicit sync flag gets set at the beginning at the function,
but I used = instead of |= later.
Fixes: bec9285027 "radv: Stop using memory type indices."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4814>
BITSET_FOR_EACH_SET can walk a sparse set (such as a register class's set
of registers) much faster than just iterating over individual bits.
Improves freedreno startup time (as measured by shader-db ./run
shaders/closed/gputest/triangle on my x86 system) by -4.12679% +/-
1.99006% (n=151)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>
This make the code significantly more readable, I think (along with
shorter). Also, using util_dynarray_delete_unordered() saves us a move of
the rest of the list when removing adjacency on a node.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>
This patch brings support for Adreno A405
as found on MSM8939. That chip is a cut-down
version of A4XX IP and requires no special handling.
Tested on Asus Zenfone 2 Laser (Z00T) smartphone.
Signed-off-by: Konrad Dybcio <konradybcio@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4753>
this allows passing scissored clear calls through the driver where it can
be handled by a repclear shader
fixkwg/mesa#61
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4310>
this adds a new pipe cap that drivers can support which enables passing buffer
clears with scissor test enabled through to be handled by the driver instead
of having mesa draw a quad
also adjust all existing clear() hooks to have the new parameter
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4310>
Shader-db results on ICL:
total instructions in shared programs: 17133088 -> 17133287 (<.01%)
instructions in affected programs: 61300 -> 61499 (0.32%)
helped: 0
HURT: 199
This means it's likely fixing 199 bugs. :-) All the changed shaders are
in Mad Max. It's surprisingly difficult to get the back-end compiler to
generate a pattern that hits this we don't tend to emit a lot coalescable
MOVs. The pattern in Mad Max that's able to hit is fsign(fsat(x)) under
the right conditions.
Closes: #2820
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4773>