Mostly copied from src/gallium/include/pipe/p_config.h, so I kept its
copyright and authorship.
Other than the obvious rename, the big difference is that these are
always defined, to be used as `#if DETECT_OS_LINUX`.
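
A minimal sketch of the always-defined pattern, assuming Linux is detected
through the compiler's predefined `__linux__` macro (the real header's
detection conditions may differ, and `os_is_linux()` is a hypothetical helper
added only for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Set the macro to 1 when the platform is detected... */
#if defined(__linux__)
#define DETECT_OS_LINUX 1
#endif

/* ...and to 0 otherwise, so that it always has a value. */
#ifndef DETECT_OS_LINUX
#define DETECT_OS_LINUX 0
#endif

static_assert(DETECT_OS_LINUX == 0 || DETECT_OS_LINUX == 1,
              "DETECT_OS_LINUX must always be defined to 0 or 1");

/* Hypothetical helper: the macro is tested with #if, never #ifdef. */
static bool
os_is_linux(void)
{
#if DETECT_OS_LINUX
   return true;
#else
   return false;
#endif
}
```

Because the macro always has a value, a misspelled name in an `#if` can be
caught by the compiler's `-Wundef` warning instead of silently evaluating to 0.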
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We can have a scenario like:
A -> B
A -> C -> B
When adding the A->C dependency, it doesn't really matter that C depends
on something that A depends on; that isn't a necessary condition for a
dependency loop.
Instead what we want to know is that nothing C depends on, directly or
indirectly, depends on A. We can detect this by recursively OR'ing the
dependents_mask of C and all its dependencies.
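
The check described above could look roughly like the sketch below. The
`struct batch` layout, the global `batches` table, and the helper names
`would_create_loop()` and `add_dep()` are simplified stand-ins for
illustration, not the driver's actual types. The sketch assumes that bit i of
`dependents_mask` means "this batch depends on batch i" and that the existing
graph is acyclic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BATCHES 32

/* Hypothetical, simplified batch: bit i of dependents_mask is set when
 * this batch depends on batches[i]. */
struct batch {
   uint32_t idx;
   uint32_t dependents_mask;
};

static struct batch batches[MAX_BATCHES];

/* OR together the masks of b and of everything it depends on, directly
 * or indirectly.  Terminates only if the existing graph is acyclic. */
static uint32_t
recursive_dependents_mask(const struct batch *b)
{
   uint32_t mask = b->dependents_mask;

   for (uint32_t i = 0; i < MAX_BATCHES; i++) {
      if (b->dependents_mask & (UINT32_C(1) << i))
         mask |= recursive_dependents_mask(&batches[i]);
   }
   return mask;
}

/* Adding batch -> dep creates a loop only if dep already reaches batch. */
static bool
would_create_loop(const struct batch *batch, const struct batch *dep)
{
   return batch == dep ||
          (recursive_dependents_mask(dep) & (UINT32_C(1) << batch->idx)) != 0;
}

static void
add_dep(struct batch *batch, const struct batch *dep)
{
   assert(!would_create_loop(batch, dep));
   batch->dependents_mask |= UINT32_C(1) << dep->idx;
}
```

With A -> B and C -> B already recorded, adding A -> C passes the check even
though A and C share the dependency B, while adding B -> A would be rejected.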
Signed-off-by: Rob Clark <robdclark@chromium.org>
If there is no clear and no geometry according to VSC_STATE[pipe], we can
skip the tile entirely. If there is a fast-clear, we can't skip the restore
(clear) or resolve IBs, but we can still skip the draw IB.
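
The decision table above can be written out as a small CPU-side sketch. This
only illustrates which IBs are needed in each case; how the driver encodes the
test in the command stream is not shown, and `struct tile_state`,
`enum tile_ib` and `tile_ibs_needed()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags for the per-tile IBs mentioned above. */
enum tile_ib {
   TILE_IB_RESTORE = 1 << 0,   /* restore (clear) */
   TILE_IB_DRAW    = 1 << 1,
   TILE_IB_RESOLVE = 1 << 2,
};

/* Hypothetical per-tile summary; has_geometry stands in for what
 * VSC_STATE[pipe] reports. */
struct tile_state {
   bool has_clear;
   bool has_geometry;
};

static unsigned
tile_ibs_needed(const struct tile_state *tile)
{
   assert(tile);

   if (tile->has_geometry)
      return TILE_IB_RESTORE | TILE_IB_DRAW | TILE_IB_RESOLVE;

   /* No geometry: the draw IB can be skipped, but a clear still has to
    * be applied and resolved. */
   if (tile->has_clear)
      return TILE_IB_RESTORE | TILE_IB_RESOLVE;

   /* No clear and no geometry: skip the tile entirely. */
   return 0;
}
```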
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Check VSC_SIZE/VSC_SIZE2 regs from cmdstream to detect overflow, and
skip use of VSC visibility stream when overflow is detected, to avoid
GPU hangs. This is done w/ introduction of some CP_REG_TEST/
CP_COND_REG_EXEC packet pairs.
In addition, eventually (after a frame or two) detect the condition and
resize the VSC buffers until overflow no longer happens.
Note that this significantly reduces the initial size of the VSC
buffers, backing out a previous hack to make them 16x larger than
what should be typically required (the previous "solution" for
VSC overflow).
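
A rough sketch of the "start small, grow on overflow" policy described above.
The doubling growth factor and the helper names are assumptions made for
illustration; the message does not state how much the buffers grow per step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: while overflow is flagged, the visibility stream is not
 * used for that frame. */
static bool
vsc_stream_usable(bool overflowed)
{
   return !overflowed;
}

/* Hypothetical: pick the buffer size for the next frame.  The size is
 * doubled (an assumed growth factor) each time overflow was detected. */
static uint32_t
vsc_next_size(uint32_t cur_size, bool overflowed)
{
   assert(cur_size > 0 && cur_size <= UINT32_MAX / 2);
   return overflowed ? cur_size * 2 : cur_size;
}
```

Calling `vsc_next_size()` once per detected overflow converges after a few
frames, which is why the initial allocation no longer needs to be oversized.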
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Seems this isn't needed anymore on a6xx to control whether visibility
stream is used. And it would be hard to deal with if it was, for
disabling use of VSC stream in draw pass. So just remove it and
simplify things.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rename to "control_mem", and switch to using a struct to manage the
layout, rather than just ad-hoc hard-coded offsets.
For recovering from VSC stream overflow, we'll need to add more, but
best to clean it up first.
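
As an illustration of struct-managed layout versus hard-coded offsets, here is
a minimal sketch. The member names and the `control_offset()` macro are
hypothetical; the message does not list the real fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the shared control buffer. */
struct control_mem {
   uint32_t seqno;
   uint32_t scratch[4];
   uint32_t vsc_overflow;
};

/* Byte offset of a member, replacing a hand-maintained constant. */
#define control_offset(member) offsetof(struct control_mem, member)

static_assert(control_offset(seqno) == 0, "seqno is expected to come first");
```

Adding a new member then only requires extending the struct; the offsets of
the members after it are recomputed by the compiler.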
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fix some #ifdef'd bitrot, and get rid of #ifdef so it doesn't bitrot
again.
Also add prints for per-tile state.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since it ends up contended, it is a bit of a bottleneck for workloads
with high driver overhead. Worth nearly +10% at gfxbench driver2.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Not all flush paths come thru fd_context_flush(), so we should also set
last_fence in the batch flush path. This avoids some no-op flushes just
to get a fence. For example when pctx->flush_resource() triggers a
flush.
We should probably keep the last_fence update in fd_context_flush() as
well to handle deferred flush case.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The pscreen param was just there to satisfy pipe_screen::fence_reference,
but some of the internal uses passed NULL for screen, which is a bit
ugly. Instead drop the param and add a shim function to plug into the
screen.
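
The shape of this change can be sketched as follows. `struct fence`,
`struct screen`, `fence_ref()` and `screen_fence_ref()` are simplified
stand-ins rather than the driver's types; the reference counting is reduced to
a plain integer for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal fence with a plain (non-atomic) refcount. */
struct fence {
   int refcnt;
};

struct screen;   /* opaque here; only needed for the callback signature */

/* Internal helper: no screen parameter, so callers need not pass NULL. */
static void
fence_ref(struct fence **ptr, struct fence *fence)
{
   assert(ptr);

   if (fence)
      fence->refcnt++;
   if (*ptr && --(*ptr)->refcnt == 0)
      free(*ptr);
   *ptr = fence;
}

/* Shim matching the screen-level callback signature. */
static void
screen_fence_ref(struct screen *screen, struct fence **ptr,
                 struct fence *fence)
{
   (void)screen;   /* unused, only present to satisfy the interface */
   fence_ref(ptr, fence);
}
```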
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We would like to flip ops to have a constant in the second place to
enable inlining of the constant.
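
A minimal sketch of the idea, restricted to commutative ops where swapping the
two sources does not change the result. `struct src`, `struct alu` and
`flip_const_to_second()` are hypothetical and far simpler than a real
compiler IR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified ALU source and instruction. */
struct src {
   bool is_const;
   int value;      /* constant value or register index */
};

struct alu {
   bool commutative;
   struct src src[2];
};

/* Move a constant from the first to the second source when that is
 * safe.  Returns true if the sources were swapped. */
static bool
flip_const_to_second(struct alu *ins)
{
   assert(ins);

   if (!ins->commutative || !ins->src[0].is_const || ins->src[1].is_const)
      return false;

   struct src tmp = ins->src[0];
   ins->src[0] = ins->src[1];
   ins->src[1] = tmp;
   return true;
}
```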
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
nir_lower_alu_to_scalar can now be used to only lower certain ops, so we
don't need the custom pass. And we can lower fall_equal/fany_nequal with
lower_vector_cmp instead.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Previously we would get a fmov with modifiers, but now that mov has no type
these opcodes need to be supported.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
int_to_float needs to come after bool_to_float, and lower_to_source_mods
needs to come after both, since they don't deal with source mods.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Not sure how this happened, but apparently all cubemaps need swapped XY.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
This line was mistakenly added while there is already a `-D tools=all`
a few lines below.
Fixes: f60defa72d ("gitlab-ci: Add a shader-db run using v3d on drm-shim.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The test case with depth-clear 0.5 and format
MESA_FORMAT_Z24_UNORM_X8_UINT fails due to an inconsistent
clear-value of 0.4999997.
Maybe it's better to improve?
CC: Jason Ekstrand <jason.ekstrand@intel.com>
Fixes: 0ae9ce0f29 (i965/clear: Quantize the depth clear value based on the format)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111113
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The driver should now rely on cmask_offset because CMASK can be
disabled by the driver for some reason (e.g. mipmaps). Apply the
same change for FMASK, although it should be useless.
Fixes: ad1bc8621d ("radv: remove radv_get_image_fmask_info()")
Fixes: 10d08da52c ("radv/gfx10: add missing dcc_tile_swizzle tweak")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's unnecessary to duplicate fields in another struct.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's unnecessary to duplicate fields in another struct.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's 0 for depth surfaces with TC compat HTILE enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
LLVM 9 does not have a 64-bit buffer compswap intrinsic, so this
extracts the ptr, does a bound check and then uses a cmpxchg LLVM
instruction.
Not ideal, but the earliest release we're going to get a proper
intrinsic is LLVM 10.
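
The same three steps (take the pointer, bounds check, compare-and-exchange)
can be illustrated on the CPU with C11 atomics. This is only an analogy, not
the LLVM IR the driver emits; `buffer_compswap64()` is a hypothetical helper,
and returning 0 for an out-of-bounds access is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Compare-and-swap on element `index` of a 64-bit buffer.  Returns the
 * value that was in memory before the operation, or 0 (assumed) when
 * the index is out of bounds. */
static uint64_t
buffer_compswap64(_Atomic uint64_t *base, size_t num_elems, size_t index,
                  uint64_t compare, uint64_t swap)
{
   assert(base);

   /* Bounds check before touching memory. */
   if (index >= num_elems)
      return 0;

   /* On failure, `expected` is overwritten with the current value; on
    * success it still equals the old value.  Either way it holds the
    * previous contents. */
   uint64_t expected = compare;
   atomic_compare_exchange_strong(&base[index], &expected, swap);
   return expected;
}
```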
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
From the EGL_KHR_create_context spec:

    "* If OpenGL 3.1 is requested, the context returned may implement
       any of the following versions:

         * Version 3.1. The GL_ARB_compatibility extension may or may
           not be implemented, as determined by the implementation.
         * The core profile of version 3.2 or greater."

Fixes CTS tests:
dEQP-EGL.functional.create_context_ext.gl_31.rgb888_depth_stencil
dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_depth_stencil
dEQP-EGL.functional.create_context_ext.gl_31.rgb888_depth_no_stencil
dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_depth_no_stencil
dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_depth_no_stencil
dEQP-EGL.functional.create_context_ext.gl_31.rgb888_no_depth_no_stencil
dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_depth_no_stencil
dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_no_depth_no_stencil
dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_no_depth_no_stencil
dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_no_depth_no_stencil
dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_depth_stencil
dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_depth_stencil
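
The spec rule quoted above can be expressed as a small predicate. This is a
sketch of the rule only, not the code changed by this commit, and
`gl31_request_allows()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a context of the given version and profile is an
 * acceptable result for an OpenGL 3.1 request. */
static bool
gl31_request_allows(int major, int minor, bool core_profile)
{
   assert(major > 0 && minor >= 0);

   /* Version 3.1 itself, with or without GL_ARB_compatibility. */
   if (major == 3 && minor == 1)
      return true;

   /* Otherwise only the core profile of 3.2 or greater qualifies. */
   bool at_least_32 = major > 3 || (major == 3 && minor >= 2);
   return at_least_32 && core_profile;
}
```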
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we detect a scalar/vector copy through load_deref/store_deref, we
have to be careful since those can bitcast an int to a float and
vice-versa even though copy_deref can't.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111251
Fixes: 156306e5e6 ("nir/find_array_copies: Handle wildcards and overlapping copies")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This makes it cheaper to just change the dynamic offsets with
the same descriptor sets.
This optimization was reverted a while back because of random GPU
hangs on GFX9, but now it looks fine: at least CTS no longer hangs
on GFX9, and it doesn't hang on GFX10 either.
It fixes a performance problem with Wolfenstein Youngblood.
Suggested-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
It can be enabled with RADV_PERFTEST=gewave32.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It can be enabled with RADV_PERFTEST=pswave32.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This exposes the textureSamplesIdenticalEXT function in GLSL.
We enable it for iris and radeonsi, because their compilers already
have support for this. Tested on Intel Kabylake and AMD Vega 64.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Specifically the optimization of a conditional BREAK + WHILE sequence
into a conditional WHILE seems pretty broken. The list of successors
of "earlier_block" (where the conditional BREAK was found) is emptied
and then re-created with the same edges for no apparent reason. On
top of that the list of predecessors of the block immediately after
the WHILE loop is emptied, but only one of the original edges will be
added back, which means that potentially several blocks that still
have it on their list of successors won't be on its list of
predecessors anymore, causing all sorts of hilarity due to the
inconsistency in the control flow graph.
The solution is to remove the code that's removing valid edges from
the CFG. cfg_t::remove_block() will already clean up after itself.
The assert in bblock_t::combine_with() also needs to be removed since
we will be merging a block with multiple children into the first one
of them.
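
The inconsistency described above is a violation of a simple invariant: every
successor edge must have a matching predecessor edge, and vice versa. A
minimal sketch of such a check, using a hypothetical fixed-size `struct block`
instead of the compiler's real bblock_t and cfg_t:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EDGES 4

/* Hypothetical, simplified basic block with fixed-size edge lists. */
struct block {
   struct block *succs[MAX_EDGES];
   int num_succs;
   struct block *preds[MAX_EDGES];
   int num_preds;
};

static bool
has_block(struct block *const *list, int n, const struct block *b)
{
   for (int i = 0; i < n; i++) {
      if (list[i] == b)
         return true;
   }
   return false;
}

/* Record an edge on both sides so the two lists stay in sync. */
static void
add_edge(struct block *from, struct block *to)
{
   assert(from->num_succs < MAX_EDGES && to->num_preds < MAX_EDGES);
   from->succs[from->num_succs++] = to;
   to->preds[to->num_preds++] = from;
}

/* True if every successor edge has a matching predecessor edge and
 * every predecessor edge has a matching successor edge. */
static bool
cfg_is_consistent(struct block *const *blocks, int n)
{
   for (int i = 0; i < n; i++) {
      const struct block *b = blocks[i];

      for (int j = 0; j < b->num_succs; j++) {
         if (!has_block(b->succs[j]->preds, b->succs[j]->num_preds, b))
            return false;
      }
      for (int j = 0; j < b->num_preds; j++) {
         if (!has_block(b->preds[j]->succs, b->preds[j]->num_succs, b))
            return false;
      }
   }
   return true;
}
```

Emptying a block's predecessor list and adding back only one edge, as the old
code did, makes `cfg_is_consistent()` return false as soon as a second block
still lists it as a successor.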
Found the issue on a hardware enabling branch originally, but
apparently somebody reproduced the same problem independently on
master in the meantime.
Fixes: d13bcdb3a9 ("i965/fs: Extend predicated break pass to predicate WHILE.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111009
Cc: jiradet.jd@gmail.com
Cc: Sergii Romantsov <sergii.romantsov@globallogic.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Paul Chelombitko <qamonstergl@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The device info initializer makes several functions internal:
- handling of device override
- updating topology from kernel information
The implementation file is slightly reordered due to the renamed
functions being static.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>