Now that 'set' can't be NULL because the meta operations no
longer bind a NULL descriptor, the logic can be simplified
a little bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I don't want to waste CPU cycles for nothing.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We mostly use the same priority for all buffer objects, so
I don't think that matter much. This should reduce CPU
overhead a little bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
A ZPASS_DONE or PIXEL_STAT_DUMP_EVENT (of the DB occlusion
counters) must immediately precede every timestamp event to
prevent a GPU hang on GFX9.
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
VkCreateRenderPass2KHR() is quite similar to VkCreateRenderPass()
but refactoring the code is a bit painful.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Needed for VK_KHR_create_renderpass2.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This might fix some synchronization issues. I don't know if
that will affect performance but it's required for correctness.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Looks like I broke intel CI compiles.
Fixes: 6f3aee40f9 (radv: using tls to store llvm related info and speed up compiles (v10))
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
This uses the common compiler passes abstraction to help radv
avoid fixed cost compiler overheads. This uses a linked list per
thread stored in thread local storage, with an entry in the list
for each target machine.
This should remove all the fixed overheads setup costs of creating
the pass manager each time.
This takes a demo app time to compile the radv meta shaders on nocache
and exit from 1.7s to 1s. It also has been reported to take the startup
time of uncached shaders on RoTR from 12m24s to 11m35s (Alex)
v2: fix llvm6 build, inline emit function, handle multiple targets
in one thread
v3: rebase and port onto new structure
v4: rename some vars (Bas)
v5: drag all code into radv for now, we can refactor it out later
for radeonsi if we make it shareable
v6: use a bit more C++ in the wrapper
v7: logic bugs fixed so it actually runs again.
v8: rebase on top of radeonsi changes.
v9: drop some C++ headers, cleanup list entry
v10: use pop_back (didn't have enough caffeine)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
That might reduce CPU overhead a little bit when using
RADV_TRACE_FILE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The number of enabled descriptors for a given pipeline stage
can be computed at compile time.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
pthread types are used in some files without explicitely including pthread.h.
This leads to compile errors on Android 7.x nougat-x86
e.g. in src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.h
In file included from external/mesa/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c:31:
In file included from external/mesa/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.h:32:
external/mesa/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.h:52:2: error: unknown type name 'pthread_mutex_t'
pthread_mutex_t global_bo_list_lock;
^
1 error generated.
Including pthread.h explicitely solves the building error
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In Python 2, `print` was a statement, but it became a function in
Python 3.
Using print functions everywhere makes the script compatible with Python
versions >= 2.6, including Python 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
For merged shaders, VS as HS for example.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the given image doesn't enable CMASK, FMASK or DCC that's
useless to flush CB metadata.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the given image doesn't have HTILE, that's useless to flush
DB metadata.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
GCC 4.8 fails to compile with "static const", while GCC 8.1
fails to compile with only "static".
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
On Python 2, the default JSON separators are ', ' for items and ': ' for
dicts.
On Python 3, the default is the same when no indent is specified, but if
one is (and we do specify one) then the default items separator becomes
',' (the dict separator remains unchanged).
This change explicitly specifies the Python 3 default, which helps
ensuring that the output is identical, whether it was generated by
Python 2 or 3.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Always emitting a bottom-of-pipe event is quite dumb. Instead,
start to optimize these functions by syncing PFP for the
top-of-pipe and syncing ME for the post-index-fetch event.
This can still be improved by emitting EOS events for
syncing PS and CS stages.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This introduces radv_barrier() (same as the draw/dispatch codepath).
This helper is used for merging the code from CmdWaitEvents() and
CmdPipelineBarrier because it's quite similar.
We do ignore the source stage mask for CmdWaitEvents because
it's irrelevant when event objects are used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This ports radv to the shared code, however due to a bug in LLVM
version prior to 7, radv cannot add target info at this stage,
as it would leak one for every shader compile, however I'd prefer
to keep this llvm damage in the shared code, since it isn't the
driver at fault here. We just add a flag to denote if the driver
can support leaking the target info or not, and the common code
does the right thing depending on the llvm version.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is prep work for moving this to a per-thread struct
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This adds a inline always pass, but otherwise should work the
same.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This just splits out the non-shared code and reuses ac_get_llvm_target in radv.
v2: rebase on Marek's patch - fixup brace position/whitespace
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
After performing a fast-clear eliminate, a FMASK decompress,
or a DCC decompress, we can reset the predicate to FALSE.
With that, the GPU should be able to skip unnecessary color
decompression passes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Performing a DCC decompression pass is currently pretty rare,
but using predication allows the GPU to skip unnecessary passes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The Vulkan spec says:
"pipelineBindPoint is a VkPipelineBindPoint indicating whether
the descriptors will be used by graphics pipelines or compute
pipelines. There is a separate set of bind points for each of
graphics and compute, so binding one does not disturb the other."
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Ported from RadeonSI.
This appears to fix some random fails with:
dEQP-VK.query_pool.statistics_query.*
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The driver already supports exporting the stencil value.
The following CTS test now pass:
dEQP-VK.pipeline.shader_stencil_export.op_replace
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
From the Vulkan spec:
"If this is a primary command buffer, then this value is ignored."
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the driver ends up by performing a slow depthstencil clear,
the HTILE metadata won't be initialized correctly.
This fixes random VM faults on Polaris while running CTS
with Bas's runner. This doesn't seem to regress performance.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This extension provides fences and frame count information to direct
display contexts. It uses new kernel ioctls to provide 64-bits of
vblank sequence and nanosecond resolution.
v2:
Rework fence integration into the driver so that waiting for
any of a mixture of fence types (wsi, driver or syncobjs)
causes the driver to poll, while a list of just syncobjs or
just driver fences will block. When we get syncobjs for wsi
fences, we'll adapt to use them.
v3: Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace between
types and names. Wrap lines to 80 columns.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v4: Adapt to WSI fence API change. It now returns VkResult and
no longer has an option for relative timeouts.
v5: wsi_register_display_event and wsi_register_device_event now
use the default allocator when NULL is provided, so remove the
computation of 'alloc' here.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>