Assignment and usage of this variable both happen inside an
if(rad_image_has_dcc()) {} blocks. It seems gcc plays it safe and
assumes that both function calls could have different return values.
But in this case we should be safe.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Inherited commands buffers are not supported.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
By default, our internal rendering commands are discarded
only if the predicate is non-zero (ie. DRAW_VISIBLE). But
VK_EXT_conditional_rendering also allows to discard commands
when the predicate is zero, which means we have to use a
different flag.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
VK_EXT_conditional_rendering allows to discard draw commands
(not only normal draws).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
VK_EXT_conditional_rendering allows to discard dispatch commands.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We don't need to emit PS_PARTIAL_FLUSH for the pre fragment shader
stages (ie. geometry/tessellation). Emitting VS_PARTIAL_FLUSH
is enough for these stages. Note that PS_PARTIAL_FLUSH also
synchronizes all vertex stages.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If we know that the given image doesn't have any metadata,
we don't need to flush.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The Vulkan 1.1.80 spec says:
"viewMask has the same effect for the described subpass as
VkRenderPassMultiviewCreateInfo::pViewMasks has on each
corresponding subpass."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
There are no fixed sized array arguments in C, those are simply pointers
to unsized arrays and as the size is passed in anyway, just rely on that.
where possible calls are replaced by nir_channel and nir_channels.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Now that 'set' can't be NULL because the meta operations no
longer bind a NULL descriptor, the logic can be simplified
a little bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I don't want to waste CPU cycles for nothing.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We mostly use the same priority for all buffer objects, so
I don't think that matter much. This should reduce CPU
overhead a little bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
A ZPASS_DONE or PIXEL_STAT_DUMP_EVENT (of the DB occlusion
counters) must immediately precede every timestamp event to
prevent a GPU hang on GFX9.
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
VkCreateRenderPass2KHR() is quite similar to VkCreateRenderPass()
but refactoring the code is a bit painful.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Needed for VK_KHR_create_renderpass2.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This might fix some synchronization issues. I don't know if
that will affect performance but it's required for correctness.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Looks like I broke intel CI compiles.
Fixes: 6f3aee40f9 (radv: using tls to store llvm related info and speed up compiles (v10))
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
This uses the common compiler passes abstraction to help radv
avoid fixed cost compiler overheads. This uses a linked list per
thread stored in thread local storage, with an entry in the list
for each target machine.
This should remove all the fixed overheads setup costs of creating
the pass manager each time.
This takes a demo app time to compile the radv meta shaders on nocache
and exit from 1.7s to 1s. It also has been reported to take the startup
time of uncached shaders on RoTR from 12m24s to 11m35s (Alex)
v2: fix llvm6 build, inline emit function, handle multiple targets
in one thread
v3: rebase and port onto new structure
v4: rename some vars (Bas)
v5: drag all code into radv for now, we can refactor it out later
for radeonsi if we make it shareable
v6: use a bit more C++ in the wrapper
v7: logic bugs fixed so it actually runs again.
v8: rebase on top of radeonsi changes.
v9: drop some C++ headers, cleanup list entry
v10: use pop_back (didn't have enough caffeine)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
That might reduce CPU overhead a little bit when using
RADV_TRACE_FILE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The number of enabled descriptors for a given pipeline stage
can be computed at compile time.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
pthread types are used in some files without explicitely including pthread.h.
This leads to compile errors on Android 7.x nougat-x86
e.g. in src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.h
In file included from external/mesa/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c:31:
In file included from external/mesa/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.h:32:
external/mesa/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.h:52:2: error: unknown type name 'pthread_mutex_t'
pthread_mutex_t global_bo_list_lock;
^
1 error generated.
Including pthread.h explicitely solves the building error
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In Python 2, `print` was a statement, but it became a function in
Python 3.
Using print functions everywhere makes the script compatible with Python
versions >= 2.6, including Python 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
For merged shaders, VS as HS for example.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the given image doesn't enable CMASK, FMASK or DCC that's
useless to flush CB metadata.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the given image doesn't have HTILE, that's useless to flush
DB metadata.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
GCC 4.8 fails to compile with "static const", while GCC 8.1
fails to compile with only "static".
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
On Python 2, the default JSON separators are ', ' for items and ': ' for
dicts.
On Python 3, the default is the same when no indent is specified, but if
one is (and we do specify one) then the default items separator becomes
',' (the dict separator remains unchanged).
This change explicitly specifies the Python 3 default, which helps
ensuring that the output is identical, whether it was generated by
Python 2 or 3.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
In Python, dictionaries and sets are unordered, and as a result their
is no guarantee that running this script twice will produce the same
output.
Using ordered dicts and explicitly sorting items makes the build more
reproducible, and will make it possible to verify that we're not
breaking anything when we move the build scripts to Python 3.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Always emitting a bottom-of-pipe event is quite dumb. Instead,
start to optimize these functions by syncing PFP for the
top-of-pipe and syncing ME for the post-index-fetch event.
This can still be improved by emitting EOS events for
syncing PS and CS stages.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This introduces radv_barrier() (same as the draw/dispatch codepath).
This helper is used for merging the code from CmdWaitEvents() and
CmdPipelineBarrier because it's quite similar.
We do ignore the source stage mask for CmdWaitEvents because
it's irrelevant when event objects are used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is basically LLVMTargetMachineEmitToMemoryBuffer inlined and reworked.
struct ac_compiler_passes (opaque type) contains the main pass manager.
ac_create_llvm_passes -- the result can go to thread local storage
ac_destroy_llvm_passes -- can be called by a destructor in TLS
ac_compile_module_to_binary -- from LLVMModuleRef to ac_shader_binary
The motivation is to do the expensive call addPassesToEmitFile once
per context or thread.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This ports radv to the shared code, however due to a bug in LLVM
version prior to 7, radv cannot add target info at this stage,
as it would leak one for every shader compile, however I'd prefer
to keep this llvm damage in the shared code, since it isn't the
driver at fault here. We just add a flag to denote if the driver
can support leaking the target info or not, and the common code
does the right thing depending on the llvm version.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We want to share this code with radv in the future, so port
it out of radeonsi.
Add a return value as radv will want that to know if this
succeeds
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
As precursor to moving init to common code, just rename the struct
and move it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is prep work for moving this to a per-thread struct
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This adds a inline always pass, but otherwise should work the
same.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This just splits out the non-shared code and reuses ac_get_llvm_target in radv.
v2: rebase on Marek's patch - fixup brace position/whitespace
Reviewed-by: Marek Olšák <marek.olsak@amd.com>