Instead of storing a BO and offset separately, use an anv_address. This
changes anv_fill_buffer_surface_state to use anv_address and we now call
anv_address_physical and pass that into ISL.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
This refactors surface state filling to work entirely in terms of
anv_addresses instead of offsets. This should make things simpler for
when we go to soft-pin image buffers. Among other things,
add_image_view_relocs now only cares about the addresses in the surface
state and doesn't really need the image view anymore.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
These will be used to assign virtual addresses to soft pinned
buffers in a later patch.
Two allocators are added for separate 'low' and 'high' virtual
memory areas. Another alternative would have been to add a
double-sided allocator, which wasn't done here just because it
didn't appear to give any code complexity advantages.
v2 (Scott Phillips):
- rename has_exec_softpin to use_softpin (Jason)
- Only remove bottom one page and top 4 GiB from virt (Jason)
- refer to comment in anv_allocator about state address + size
overflowing 48 bits (Jason)
- Mention hi/lo allocators vs double-sided allocator in
commit message (Chris)
- assign state pool memory ranges statically (Jason)
v3 (Jason Ekstrand):
- Use (LOW|HIGH)_HEAP_(MIN|MAX)_ADDRESS rather than (1 << 31) for
determining which heap to use in anv_vma_free
- Only return de-canonicalized addresses to the heap
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
The test pseudo-randomly makes allocations and deallocations with
the virtual memory allocator and checks that the results are
consistent. Specifically, we test that:
* no result from the allocator overlaps an already allocated range
* allocated memory fulfills the stated alignment requirement
* a failed result from the allocator could not have been fulfilled
* memory freed to the allocator can later be allocated again
v2: - fix if() in test() to actually run fill()
v3: - add c++11 build flag (Jason)
- test the full 64-bit range (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is simple linear-walk first-fit allocator roughly based on the
allocator in the radeon winsys code. This allocator has two primary
functional differences:
1) It cleanly returns 0 on allocation failure
2) It allocates addresses top-down instead of bottom-up.
The second one is needed for Intel because high addresses (with bit 47
set) need to be canonicalized in order to work properly. If we allocate
bottom-up, then high addresses will be very rare (if they ever happen).
We'd rather always have high addresses so that the canonicalization code
gets better testing.
v2: - [scott-ph] remove _heap_validate() if NDEBUG is defined (Jordan)
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Tested-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds a RADV_DEBUG=startup option to dump more info about
instance creation and device enumeration.
A common question end users have is why the direver is not loading
for them, and this has two common reasons:
1) They did not install the driver.
2) AMDGPU is not used for the card in the kernel.
This adds some info messages so we can easily get a some useful
output from end users.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Errors are not that common of a case so we can eat a slight perf
hit in having to call a function and do a runtime check.
In turn this makes debugging random errors happening for end users
easier, because they don't have to have a debug build on hand.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Totals from affected shaders:
SGPRS: 80 -> 80 (0.00 %)
VGPRS: 48 -> 48 (0.00 %)
Code Size: 2120 -> 2096 (-1.13 %) bytes
Max Waves: 16 -> 16 (0.00 %)
Only two Rise of Tomb Raider shaders are affected on my side.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Patch skips useless and possibly dangerous calls down to the driver
in case invalid arguments were given. I noticed this would be happening
with demo of Darwinia game. AFAIK this does not fix anything but makes
this path safer and more like how other API functions are implemented.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
CXXLD gallium_dri.la
../../../../src/broadcom/.libs/libbroadcom.a(clif_dump.o): In function `clif_dump_packet':
src/broadcom/clif/clif_dump.c:87: undefined reference to `v3d33_clif_dump_packet'
src/broadcom/clif/clif_dump.c:85: undefined reference to `v3d41_clif_dump_packet'
../../../../src/broadcom/.libs/libbroadcom.a(clif_dump.o): In function `clif_process_worklist':
src/broadcom/clif/clif_dump.c:140: undefined reference to `v3d41_clif_dump_gl_shader_state_record'
src/broadcom/clif/clif_dump.c:144: undefined reference to `v3d33_clif_dump_gl_shader_state_record'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes this use all 32 bits, so future sets need to be
defined in a new struct.
Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
This avoids loop unrolling regressions in Wolfenstein II on DXVK
with an upcoming optimisation series from Samuel.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The current implementation depends on bpermute, which
is VI+.
Fixes: f2c6a55061 "radv: enable subgroup capabilities"
Reviewed-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This was terribly wrong, I forced use of 32-bit pointers when
emitting shader descriptor pointers. This fixes GPU hangs with
LLVM 5&6 because 32-bit pointers are only supported with LLVM 7.
Fixes: 88d1ed0f81 ("radv: emit shader descriptor pointers consecutively")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: f7604d8af5 ("st/dri: only expose config formats that are display targets")
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Gallium drivers don't expose this yet due to:
"st/mesa: use PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This requires layered FBOs from GL 3.2.
Gallium drivers don't expose this yet due to:
"st/mesa: use PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY"
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Gallium drivers don't expose this yet due to:
"st/mesa: use PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bindless texture handles can be passed via vertex attribs using this type.
They use the double codepath, so don't use st_pipe_vertex_format.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Bindless texture handles can be passed via vertex attribs using this type.
This fixes a bunch of bindless piglit tests on radeonsi.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
I disliked removing the const here, function tables are meant
to be const just to avoid having to think about them,
make a second table for the shm vs non-shm paths to use.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Implements putImageShm from DRIswrastLoaderExtension.
If XShm extension is not available, or fails, it will fallback on
regular XPutImage().
Tested on Linux only with 16bpp and 32bpp visual.
(airlied: tested on 24bpp as well)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
If drisw_loader_funcs implements put_image_shm, allocates display
target data with shared memory and display with put_image_shm().
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
If the DRIswrastLoaderExtension implements putImageShm, bind it to
drisw_loader_funcs.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Add new API to put and get an image using shared memory. Instead of only
passing the data pointer, 3 arguments are given: the shmid, the data
offset and the shmaddr.
Bump interface version.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
This just renames this as we want to add an shm handle which
isn't really drm related.
Originally by: Marc-André Lureau <marcandre.lureau@gmail.com>
(airlied: I used this sed script instead)
This was generated with:
git grep -l 'DRM_API_' | xargs sed -i 's/DRM_API_/WINSYS_/g'
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When using multiple RT write messages to the same RT such as for
dual-source blending or all RT writes in SIMD32, we have to set the
"Last Render Target Select" bit on all write messages that target the
last RT but only set EOT on the last RT write in the shader.
Special-casing for dual-source blend works today because that is the
only case which requires multiple RT write messages per RT. When we
start doing SIMD32, this will become much more common so we add a
dedicated bit for it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The only reason it was it's own opcode was so that we could detect it
and adjust the source register based on the payload setup. Now that
we're using the ATTR file for FS inputs, there's no point in having a
magic opcode for this.
v2 (Jason Ekstrand):
- Break the bit which removes the CINTERP opcode into its own patch
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This replaces the special magic opcodes which implicitly read inputs
with explicit use of the ATTR file.
v2 (Jason Ekstrand):
- Break into multiple patches
- Change the units of the FS ATTR to be in logical scalars
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2 (Jason Ekstrand):
- Break the refactor into its own patch
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>