Beneficial so we can fuse the comparison.
total instructions in shared programs: 2188179 -> 2185535 (-0.12%)
instructions in affected programs: 392512 -> 389868 (-0.67%)
helped: 894
HURT: 9
Instructions are helped.
total alu in shared programs: 1706063 -> 1703445 (-0.15%)
alu in affected programs: 275063 -> 272445 (-0.95%)
helped: 880
HURT: 9
Alu are helped.
total fscib in shared programs: 1702385 -> 1699743 (-0.16%)
fscib in affected programs: 276199 -> 273557 (-0.96%)
helped: 894
HURT: 9
Fscib are helped.
total ic in shared programs: 462494 -> 462490 (<.01%)
ic in affected programs: 124 -> 120 (-3.23%)
helped: 1
HURT: 0
total bytes in shared programs: 14476964 -> 14464512 (-0.09%)
bytes in affected programs: 2870824 -> 2858372 (-0.43%)
helped: 888
HURT: 155
Bytes are helped.
total regs in shared programs: 662444 -> 662461 (<.01%)
regs in affected programs: 1025 -> 1042 (1.66%)
helped: 14
HURT: 12
Inconclusive result (value mean confidence interval includes 0).
total uniforms in shared programs: 1638301 -> 1638374 (<.01%)
uniforms in affected programs: 17778 -> 17851 (0.41%)
helped: 22
HURT: 54
Uniforms are HURT.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29179>
It can't be reordered globally, since its value is control-flow dependent.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29179>
There seems to be a hardware issue where fragment shaders with side effects get
skipped if depth testing with NEVER. Add a workaround for this case where we
discard programmatically instead.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29179>
AGX has limited border colour hardware. To support full
customBorderColorWithoutFormat semantics, we're forced to emulate in shaders at
a substantial performance penalty. Actually, that's needed just to pass CTS
because of other hardware issues stacking on top of each others... Hooray!
Add the texops we need to facilitate efficient custom border colour lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29179>
needed for correct flatshading with fans, without falling back on software input
assembly.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29179>
This fixes most of the egl_image_dma_buf* piglit tests. The remaining
fails are YCbCr tests which are likely unrelated to core dma-buf
import/export.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24795>
This is our compromise to make NVK and nouveau GL play nice when it
comes to modifiers. The old GL driver depends heavily on the PTE kind
and tile mode, even for images with modifiers. While it correctly
encodes the PTE kind and tile mode in the modifiers it advertises, it
may ignore the modifier and just trust what's set on the BO when it
imports a dma-buf image. This is partly because it doesn't support
VM_BIND and partly because of preexisting bugs in the modifiers
implementation. In either case, we can't fix it retroactively.
To work around this, NVK also sets the PTE kind and tile mode on the BO
when it's a dedicated allocation created for a DRM format modifiers
image. If DRM format modifiers are used without dedicated allocations,
things may still break but that's getting into vanishingly unlikely
scenarios.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24795>
Taken from drm-misc-next-fixes:
commit 959314c438caf1b62d787f02d54a193efda38880
Author: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Date: Thu May 9 23:43:52 2024 +0300
drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24795>
No shader-db or fossil-db changes on any Intel platform. In this MR, the
only benefit of these changes is to convert some "-a > 0" CSEL
comparisons to "a < 0" for improved readability.
v2: Add integer CSEL support
v3: Use fs_inst::resize_sources and brw_type_is_sint. Both suggested by
Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
No shader-db or fossil-db changes on any Intel platform. This ends up
begin helpful in "intel/brw: Use range analysis to optimize fsign."
v2: Add integer CSEL support
v3: Massive simplification (-20 lines!) of constant propagation
logic. Suggested by Ken. Add missing CSEL case in supports_src_as_imm.
Noticed by Ken.
v4: While MAD can mix F and HF sources on some platforms, CSEL
cannot. Found by skqp on TGL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
Gfx9 can only have F, but newer GPUs can have F, HF, *D, or *W. The
source and destination types must still match in size.
v2: Simplify the float vs integer logic. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
This bit from the comment should have been a big red flag:
There are currently zero instances of fsign(double(x))*IMM in
shader-db or any test suite, so it is hard to care at this time.
The implementation of that path was incorrect. The XOR instructions
should be predicated like the OR instruction in the non-multiplication
path. As a result, dsign(zero_value) * x will not produce the correct
result.
Instead of fixing this code that is never exercised by anything, replace
it with the simple lowering in NIR.
Ironically, the vec4 implementation is correct. The odds of encountering
an application that is performace limited by dsign performance in vertex
processing stages on Ivy Bridge or Haswell is infinitesimal.
No shader-db changes on any Intel platform.
v2: Delete 's' in emit_fsign as it is now unused.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
This bit from the comment should have been a big red flag:
There are currently zero instances of fsign(double(x))*IMM in
shader-db or any test suite, so it is hard to care at this time.
The implementation of that path was incorrect. The XOR instructions
should be predicated like the OR instruction in the non-multiplication
path. As a result, dsign(zero_value) * x will not produce the correct
result.
Instead of fixing this code that is never exercised by anything, replace
it with the simple lowering in NIR.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>