We were aiming for very square tiles, but it's actually better for us to
reduce the number of different bins so you take fewer trips through the
geometry and keep the caches hotter. Example changes to aztec ruins on
angle:
3x3 tiles of 352x352 to 4x2 tiles of 256x512
4x5 tiles of 256x224 to 5x4 tiles of 224x256
17x11 tiles of 160x128 to 14x11 tiles of 192x128
12x7 tiles of 224x224 to 7x11 tiles of 384x128
12x8 tiles of 224x192 to 7x11 tiles of 384x128
11x6 tiles of 256x256 to 12x5 tiles of 224x288
11x7 tiles of 256x224 to 7x9 tiles of 384x160
8x4 tiles of 352x352 to 6x5 tiles of 448x288
and minecraft:
3x3 tiles of 352x352 to 4x2 tiles of 256x512
12x6 tiles of 256x256 to 3x23 tiles of 1024x64
12x7 tiles of 256x224 to 8x9 tiles of 384x160
FPS changes:
VK aztec ruins normal: 1.12478% +/- 0.213393% (n=67)
ANGLE manhattan_31: +1.42813% +/- 0.893332% (n=7).
ANGLE minecraft: no change (n=21)
ANGLE google_maps: +6.80618% +/- 2.40857% (n=4)
ANGLE trex_200: no change (n=11)
ANGLE pubg: no change (n=21)
Fixes: #8160
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21004>
while lowering image_samples to one, we need to take
nir_intrinsic_image_deref_samples and
nir_intrinsic_bindless_image_samples intrinsic into account.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8211
Fixes: ab4c2990ed ("intel/compiler: use lower_image_samples_to_one")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21053>
Submitting a batch with the first command buffer with the simultaneous
bit set followed by a command buffer without the bit set gets past the
check and triggers this assert attempting to chain them:
../src/intel/vulkan/anv_batch_chain.c:1147: anv_cmd_buffer_chain_command_buffers: Assertion `num_cmd_buffers == 1' failed.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21056>
This file hasn't really been updated since 2016, apart from a single
search-replace two years ago.
That's an eternity in ANV-land, so let's just remove these.
While we're at it, also remove the duplicate in hasvk.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21044>
etna_resource_from_handle() is called for each plane of a multiplanar
resource, so there is no point in looping over all planes to do the
renderonly scanout import. In fact that will cause us to lose track
of the scanout imports from later planes when the earlier planes are
redoing the import, overwriting the pointer to the allocated
renderonly_scanout struct.
Drop the loop and just do the import for the current plane.
Fixes: 826f95778a ("etnaviv: always try to create KMS side handles for imported resources")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20993>
When all shader stages have already been imported it's possible to
skip radv_graphics_pipeline_compile() entirely. This makes GPL
fast-linking VERY fast.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21068>
Only the blend state was relying on the graphics pipeline key. This
will allow us to skip generating it when there is no compilation at
all (for fast-linking with GPL).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21068>
Otherwise, a noop FS will be always compiled during linking if not
provided by the application and that is too slow for fast-linking.
This should be improved to use a global noop FS but it's really tricky
because NIR linking doesn't do anything when the next stage is unknown,
and hence doesn't remove unused varyings.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21042>
compute programs can be reused across contexts, which means storing any
pointers directly like this is going to lead to desync and crash
instead, make this like regular descriptor templates and calculate the offset
from the current context to ensure that everything works as it should
fixes#8201
Fixes: 7ab5c5d36d ("zink: use EXT_descriptor_buffer with ZINK_DESCRIPTORS=db")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21020>
All the ifdef __APPLE__ is getting really silly. Let's split off the
macOS UAPI abstraction into its own file, so we can have parallel
implementations.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21058>
In preparation for splitting off the macOS backend implementation into
its own file, pull out the shared BO code from agx_device.c into
agx_bo.c.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21058>
So we're not tied to the macOS or Linux UAPIs and are not translating awkwardly
from one to the other when creating BOs. They're not quite equivalent -- macOS
doesn't include writeback information in this flag field, and Linux doesn't have
a executable flag. (Maybe we should add one, though? Then we can enforce W^X.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21058>
Quoting a condition is apparently an effective way of working around
YAML parsing weirdness. However, the quotes need to surround the whole
expression, not just parts of it.
Fixes: f6c06ef2f6 ("ci: Add manual rules variations to disable.")
Suggested-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21036>
Lowering buffer textures will interact with multiple of our existing lowerings,
and it's convenient to have it all in one place. This also keeps the pass
ordering dependencies centralized.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21060>
the original code was quite conservative and always created a new layout,
but many times this is unnecessary, and the original layout can just be refcounted
since it doesn't need to be merged
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21051>
On sway+xwayland, both explicit and implicit modifiers are advertised.
While dri3proto says nothing about it, zwp_linux_dmabuf_v1 says
A compositor that sends valid modifiers and DRM_FORMAT_MOD_INVALID for
a given format supports both explicit modifiers and implicit
modifiers.
"glmark2 -b build:model=bunny --fullscreen" goes from 468 to 598fps on
a618 @ 2160x1440.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20892>
We lower NIR's load_constant to load_global_constant, which uses A64
bindless messages. As such, we do the following math to produce the
address for each load:
base_lo@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW
base_hi@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_HIGH
base@64 <- pack_64_2x32_split(base_lo, base_hi)
addr@64 <- iadd(base@64, u2u64(offset@32))
On platforms that emulate 64-bit math, we have to emit additional code
for the 64-bit iadd to handle the possibility of a carry happening and
affecting the top bits.
However, NIR constant data is always uploaded adjacent to the shader
assembly, in the same buffer. These buffers are required to live in a
4GB region of memory starting at Instruction State Base Address. We
always place the base address at a 4GB address. So the constant data
always lives in a buffer entirely contained within a 4GB region, which
means any offsets from the start of the buffer cannot possibly affect
the high bits.
So instead, we can simply do a 32-bit addition between the low bits of
the base and the offset, then pack that with the unchanged high bits.
On anv, INSTRUCTION_STATE_POOL_MIN_ADDRESS is 8GB, so the high bits are
always 0x2. We don't even need to patch that portion of the address and
can just use an immediate value. We do still need to pack, however.
fossil-db on Icelake indicates the following for affected shaders:
Instrs: 10830023 -> 10750080 (-0.74%)
Cycles: 1048521282 -> 1046770379 (-0.17%); split: -0.33%, +0.16%
Subgroup size: 103104 -> 103112 (+0.01%)
Send messages: 570886 -> 570760 (-0.02%)
Loop count: 14428 -> 14429 (+0.01%)
Spill count: 14246 -> 14244 (-0.01%); split: -0.06%, +0.04%
Fill count: 22802 -> 22794 (-0.04%); split: -0.04%, +0.01%
Scratch Memory Size: 654336 -> 662528 (+1.25%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20999>
We lower NIR's load_constant to load_global_constant, which uses A64
bindless messages. As such, we do the following math to produce the
address for each load:
base_lo@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW
base_hi@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_HIGH
base@64 <- pack_64_2x32_split(base_lo, base_hi)
addr@64 <- iadd(base@64, u2u64(offset@32))
On platforms that emulate 64-bit math, we have to emit additional code
for the 64-bit iadd to handle the possibility of a carry happening and
affecting the top bits.
However, NIR constant data is always uploaded adjacent to the shader
assembly, in the same buffer. These buffers are required to live in a
4GB region of memory starting at Instruction State Base Address. We
always place the base address at a 4GB address. So the constant data
always lives in a buffer entirely contained within a 4GB region, which
means any offsets from the start of the buffer cannot possibly affect
the high bits.
So instead, we can simply do a 32-bit addition between the low bits of
the base and the offset, then pack that with the unchanged high bits.
On iris, IRIS_MEMZONE_SHADER is at [0, 4GB) so the high bits are always
zero. We don't even need to patch that portion of the address and can
simply use u2u64 to promote the 32-bit add result to a 64-bit value
where the top bits are 0.
shader-db on Icelake indicates that this:
- Helps instructions: -1.13% in 135 affected programs
- Helps spills/fills: -4.08% / -4.18% in 4 affected programs
- Gains us 1 SIMD16 compute shader instead of SIMD8
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20999>
People who use RADV on eGPU have reported poor performance by default.
They also noted that the "nosam" option helps.
This commit disables placing CS objects in VRAM when the bandwidth is
below that of PCIe 3.0 x8. Note that eGPUs are typically PCIe 3.0 x4.
Contributes-to: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7340
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20842>
This is so that we can tell whether the current kernel
has the PCIe bandwidth info available or not.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20842>