Clover uses very specific sizes and alignments for images and samplers
to pass various bits of data. We need to add a new size/align helper
for inputs which matches the standard CL size/align for most types but
also has the right size/align for images and samplers.
v2 (Karol): use sizeof(cl_mem) instead of 8 to fix 32 bit runtimes.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7069>
Read images are really just normal textures and OpenCL requires 128 anyway.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7069>
This is required by llvmpipe as it sets explicit strides on buffers and
textures.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7069>
v2 (Karol Herbst): remove filling values as it was incorrect.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7069>
From CL1.2 Section 5.3.2:
returns CL_INVALID_VALUE ... if num_entries is 0 and image_formats is not NULL.
We were checking the num_image_formats param, not num_entries.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7069>
Long term we want to git rid of set_compute_resources, this is just one
piece. The other bit would be to use the proper const buffer interfaces,
but because that path is only hit with r600/radeonsi I won't touch it.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7069>
Now that we have the HDC, using the data cache for UBO pulls seems to
help things quite a bit:
GTA V DXVK 104.0%
Talos Principle GL 102.8%
Rise of Tomb Raider VK 102.8%
Dark Souls 3 DXVK 101.4%
Witcher3 DXVK 101.3%
Bioshock Infinite GL 100.5%
Doom 2016 VK 97.7%
Doom is a bit of a loss but it helps enough other stuff, it's probably
worth the hit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7230>
Both commits add nir_to_tgsi.c to a different variable, causing a
build-time error when compiling in an AOSP tree:
build/make/core/binary.mk:970: error: overriding commands for target `..../obj/STATIC_LIBRARIES/libmesa_gallium_intermediates/nir/nir_to_tgsi.o', previously defined at build/make/core/binary.mk:970
Move all sources into NIR_SOURCES to resolve this issue.
Fixes: d0f8fe5909 ("softpipe: Switch to using NIR as the shader format from mesa/st.")
Fixes: 34cc6a804e ("gallium: Add a nir-to-TGSI pass.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7237>
This causes our TGSI to use far more temps, since NTT is currently not
releasing temps from registers. On the other hand, this interpreter is
already spectacularly slow, and if we wanted to go fast we should probably
write a scalar NIR intrepeter.
For now, using NTT means that we test that codepath in preparation for
switching TGSI-consuming HW drivers over, so that we can eventually
garbage collect st_glsl_to_tgsi.
As this is a major restructuring, there are some impacts on piglit:
- Several tests start assert failing about 64-bit NIR registers for temp
arrays not getting split to vec2s:
- fs-frexp-dvec4-variable-index.shader_test
- arb_gpu_shader_fp64/uniform_buffers/{vs,fs,gs}-array-copy.shader_test
- arb_gpu_shader_int64/execution/indirect-array-two-accesses.shader_test
- dEQP-GLES31.functional.primitive_bounding_box.wide_points.global_state.vertex_geometry_fragment.fbo_bbox_larger
starts crashing depending on various bits of state (previous tests run
before it, presence of valgrind, presence of glib's memcheck). Doesn't
seem really NTT-specific, added to flakes list with other GS flakes.
- Almost 200 fp64/int64-related tests start passing, mostly around i/o loayout.
shader-db:
total instructions in shared programs: 3492656 -> 3081674 (-11.77%)
total loops in shared programs: 1418 -> 1387 (-2.19%)
total temps in shared programs: 340041 -> 615527 (81.02%)
total const in shared programs: 3158970 -> 1528630 (-51.61%)
total imm in shared programs: 117586 -> 101349 (-13.81%)
Total CPU time (seconds): 430.36 -> 900.94 (109.35%)
FPS results:
glmark2 texture +7.32484% +/- 3.76528% (n=10)
glmark2 desktop:effect=shadow +20% +/- 0% (n=10)
glmark2 shadow +6.49351% +/- 3.65335% (n=7)
glmark2 conditionals +18.75% +/- 2.74658% (n=9)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3395>
The goal is to replace glsl_to_tgsi.cpp and its supporting code (~10k
LOC). This code ends up being smaller because NIR has many lowering
passes that help it map better to TGSI than GLSL IR does.
As a benefit, this brings NIR optimizations to TGSI-only drivers.
Many of the softpipe shaders I've looked at end up being significantly
shorter. Some potentially relevant changes to TGSI consumers:
- All immediates are now UINT typed. This means they're less legible
in printouts, but means that they get deduplicated better (no more
multiple copies of 0x0!)
- Sampler views are not currently declared.
- nir_registers don't have their live ranges tracked, so TGSI temp usage
may go up with a lot of control flow.
- nir_lower_vec_to_mov naively inserts movs instead of trying to coalesce
the movs with the generators of the ssa values, sometimes increasing
instruction count.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3395>
If we lower mem constants first, then direct array accesses to
constants never get lowered, so do a constant fold pass first to
remove direct const array accesses.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7209>
On Gen12+, we can enable additional caches in certain usage situations.
This routes that decision making to a central place in ISL, based on
surface usage flags, and updates both drivers to use it. (i965 doesn't
need to change because it doesn't support Gen12.)
We continue handling the "external" decision via an anv_mocs() wrapper
for now, since we store that flag in anv_bo, which isl doesn't know
about. (We could introduce an ISL_SURF_USAGE_EXTERNAL, but I'm not
actually sure that would be cleaner.)
This patch should not have any functional nor performance effects, as
we continue selecting the exact same MOCS values for now.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7104>
Zink doesn't support forwarding DRM modifiers yet, so whenever those are
used, we end up ignoring them. That's not going to do the right thing in
most cases, so let's reject them instead.
Since d686835171, the dri2 code tries to create a 0x0 surface without
any format when trying to import. This makes this go from
rendering-issues to asserting in debug builds, making things even worse.
Fixes: d686835171 ("gallium/dri2: Support I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS import")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3654
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7214>
There is a barrier between lds_pos_cull_* use and lds_pos_* use, so they
can occupy the same LDS space.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7172>
The system values are written into LDS after the new thread ID is known,
so it removes pointer indirection with the old thread ID.
Also, the LDS stores are skipped entirely if vertices are culled.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7172>
If we store the position into LDS after we know the new thread ID,
we don't need to remember the old thread ID.
The culling code only needs W, X/W, Y/W, so we have to keep those.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7172>
other than the vaguely gross case of primitive restart with incompatible
draw modes and/or restart index, this is no trouble since the buffer formats
are compatible
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7191>
we've transformed this in the vertex output, so we need to undo that here
ideally we'd only be performing this transform once, but that's going to get
complicated later with the halfz extension which requires shader keys on top
of this, so we can get around to simplifying things at that stage
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7139>