This is a more accurate description of what happens in processing the
OA reports.
Previously we only had a somewhat difficult to parse state machine
tracking the context ID.
What we really only need to do to decide if the delta between 2
reports (r0 & r1) should be accumulated in the query result is :
* whether the r0 is tagged with the context ID relevant to us
* if r0 is not tagged with our context ID and r1 is: does r0 have a
invalid context id? If not then we're in a case where i915 has
resubmitted the same context for execution through the execlist
submission port
v2: Update comment (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 8c0b058263)
If we read the OA reports late enough after the query happens, we can
get a timestamp in the report that is significantly in the past
compared to the start timestamp of the query. The current code must
deal with the wraparound of the timestamp value (every ~6 minute). So
consider that if the difference is greater than half that wraparound
period, we're probably dealing with an old report and make the caller
aware it should read more reports when they're available.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit b364e920bf)
We always add an empty buffer in the list when creating the query.
Let's set the len appropriately so that we can recognize it when we
read OA reports up to the end of a query.
We were using an 0 timestamp value associated with the empty buffer
and incorrectly assuming this was a valid value. In turn that led to
not reading enough reports and resulted in deltas added to our counter
values which should have been discarded because those would be flagged
for a different context.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9d0a5c817c)
Accumulation happens between 2 reports, it can be between a start/end
report from another context. So only consider updating the hw_id of
the results when it's not already valid and that we have a valid value
to put in there.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 41b54b5faf ("i965: move OA accumulation code to intel/perf")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit acea59dbf8)
Fixes: 13ab63bb62 ('radv: Implement VK_EXT_buffer_device_address.')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 35fab1ba33)
If an app first creates a compute pipeline with
VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT set, then re-compile it
without that flag, the driver should re-compile the compute shader.
Otherwise, it will return the unoptimized one.
Fixes: ce188813bf ("radv: add initial support for VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 9ab27647ff)
This implementation is loosely based on ROCm.
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/wfredscan.cl
This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive* on GFX10.
Fixes: 227c29a80d ("amd/common/gfx10: implement scan & reduce operations")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit c9aa843961)
Conflicts resolved by Dylan Baker
When a fragment shader includes an input variable decorated with
SampleId or SamplePosition, sample shading should be enabled
because minSampleShadingFactor is expected to be 1.0.
Cc: 19.2, 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 86a5fbfd4a)
Conflicts resolved by Dylan Baker
Do not link libgallium_nine with libgalliumvl_stub if it's already
linked with libgalliumvl. Linking with stub leads to "duplicate
symbol" errors.
Fixes: 6b4c7047d5
("meson: build gallium nine state_tracker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2040
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit 9af22ccddc)
Conflicts resolved by Dylan Baker
gl_Viewport is also in the VUE header so we need to whack the read
offset to 0 and emit a default (no overrides) SBE_SWIZ entry in that
case as well.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit b1f37688ba)
brw_performance_query_metrics.h was removed in
134e750e16 and
brw_performance_query.h was removed in
8ae6667992
remove reference to these files from Makefile.sources
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Fixes: 134e750e16 ("i965: extract performance query metrics")
Fixes: 8ae6667992 ("intel/perf: move query_object into perf")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 34dda0ca65)
BACK_LEFT attachment can be outdated when the user calls
KHR_partial_update() (->lastStamp != ->texture_stamp), leading to a
damage region update on the wrong pipe_resource object.
Let's delay the ->set_damage_region() call until the attachments are
updated when we're in that case.
Reported-by: Carsten Haitzler <raster@rasterman.com>
Fixes: 492ffbed63 ("st/dri2: Implement DRI2bufferDamageExtension")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b196e1a8cf)
pthread_mutex_unlock() when unlocked is documented by posix as
being undefined behaviour. On OpenBSD pthread_mutex_unlock() will call
abort(3) if this happens.
This occurs in amdgpu_winsys_create() after
cb446dc0fa
winsys/amdgpu: Add amdgpu_screen_winsys
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 3fe3bde4f2)
They were out of sync. Besides syncing, lets ensure they never diverge
again.
Fixes: 8d2654a419 "radv: Support VK_EXT_inline_uniform_block."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4cde0e04e3)
Fixes: 946193ae00 "radv: add support for VK_AMD_buffer_marker"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 25bc9102d8)
Fixes: cea39af2fb ("freedreno/ir3: Generalize ir3_shader_disasm()")
Reviewed-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit d0f38394b1)
Specifically when we are in non-uniform control flow, as we would need
to set the condition for the last instruction. If (for example) a
image atomic load stores directly their value on a NIR register,
last_inst would be a nop, and would fail when set the condition.
Fixes piglit test:
spec/glsl-es-3.10/execution/cs-ssbo-atomic-if-else-2.shader_test
Fixes: 6281f26f06 ("v3d: Add support for shader_image_load_store.")
v2: (Changes suggested by Eric Anholt)
* Cover all sig.ld* signals, not just ldunif and ldtmu, as all of
them have the same restriction.
* Update comment explaining why we add a MOV in that case
* Tweak commit message.
v3:
* Drop extra set of parens (Eric)
* Add missing ld signal to is_ld_signal to fix shader-db regression.
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit b4bc59e37e)
Two files exist in that directory:
- vulkan_xlib_randr.h
- vulkan_xlib_xrandr.h
Both were imported in 205c271562 ("vulkan: Update the XML and
headers to 1.1.70") with identical contents (ie. the
VK_EXT_acquire_xlib_display extension), but the former was never
included anywhere and can't be found upstream [1], while the latter is
included in vulkan.h and found upstream.
[1] https://github.com/KhronosGroup/Vulkan-Headers/tree/master/include/vulkan
Fixes: 205c271562 ("vulkan: Update the XML and headers to 1.1.70")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 344859c32d)
The bounds checking is actually less safe than just pushing the data.
If the bounds checking actually ever kicks in and it's not on the last
UBO push range, then the shrinking will cause all subsequent ranges to
be pushed to the wrong place in the GRF. One of the behaviors we
definitely don't want is for OOB UBO access to result in completely
unrelated UBOs returning garbage values. It's safer to just push the
UBOs as-requested. If we're really concerned about robustness, we can
emit shader code to do bounds checking which should be stupid cheap (a
CMP followed by SEL).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
A security advisory (TALOS-2019-0857/CVE-2019-5068) found that
creating shared memory regions with permission mode 0777 could allow
any user to access that memory. Several Mesa drivers use shared-
memory XImages to implement back buffers for improved performance.
This path changes the shmget() calls to use 0600 (user r/w).
Tested with legacy Xlib driver and llvmpipe.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit 02c3dad0f3)
Re-emitting 3DSTATE_CC_STATE_POINTERS after emitting
3DSTATE_BLEND_STATE_POINTERS fixes the shadow flickering in
SuperTuxCart and Tropico 6 which was seen only on Haswell.
The reason for this is unknown and fix was found empirically.
The closest mention in PRM is that it should improve performance.
From the HSW PRM, volume 2b, page 823 (3DSTATE_BLEND_STATE_POINTERS):
"When the BLEND_STATE pointer changes but not the CC_STATE pointer,
driver needs to force a CC_STATE pointer change to improve
blend performance in pixel backend."
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1834
Fixes: eca4a654 ("i965: Disable dual source blending when shader doesn't support it on gen8+")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 6f17fe0606)
Large programs, e.g. gnome-shell and firefox, may tax the
addressability of the Medium code model once a (potentially unbounded)
number of dynamically generated JIT-compiled shader programs are
linked in and relocated. Yet the default code model as of LLVM 8 is
Medium or even Small.
The cost of changing from Medium to Large is negligible:
- an additional 8-byte pointer stored immediately before the shader entrypoint;
- change an add-immediate (addis) instruction to a load (ld).
Testing with WebGL Conformance
(https://www.khronos.org/registry/webgl/sdk/tests/webgl-conformance-tests.html)
yields clean runs with this change (and crashes without it).
Testing with glxgears shows no detectable performance difference.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1753327, 1753789, 1543572, 1747110, and 1582226
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/223
Co-authored by: Nemanja Ivanovic <nemanjai@ca.ibm.com>, Tom Stellard <tstellar@redhat.com>
CC: mesa-stable@lists.freedesktop.org
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
(cherry picked from commit 9c3be6d21f)
Conflicts resolved Dylan (PIPE_ARCH -> UTIL_ARCH rename)
Use unsigned values otherwise signed extension will produce a 64 bits value where
the 32 left-most bits are 1.
Fixes: 2afeed3010 ("radeonsi: tell the shader disk cache what IR is used")
Until 8bef4df196 the IR (TGSI or NIR) was used in disk_cache driver_flags.
This commit restores this features to avoid crashing when switching from
one IR to the other.
As radeonsi's default is TGSI, I used "driver_flags & 0x8000000 = 0" for TGSI
to keep the same driver_flags.
Fixes: 8bef4df196 ("radeonsi: add si_debug_options for convenient adding/removing of options")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This caused a failure in NIR validation.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 3906fce88b)
After the discussion in
https://github.com/KhronosGroup/OpenGL-API/issues/45
the section 8.17 (texture completeness) of the OpenGL 4.6 core profile
was changed to explicitly say that multisample texture completeness
ignores filter state of the texture.
"Using the preceding definitions, a texture is complete unless any of the
following conditions hold true:
...
- The minification filter requires a mipmap (is neither NEAREST nor LINEAR),
the texture is not multisample, and the texture is not mipmap complete.
- The texture is not multisample; either the magnification filter is not
NEAREST, or the minification filter is neither NEAREST nor NEAREST_-
MIPMAP_NEAREST; and any of
– The internal format of the texture is integer (see table 8.12).
– The internal format is STENCIL_INDEX.
– The internal format is DEPTH_STENCIL, and the value of DEPTH_-
STENCIL_TEXTURE_MODE for the texture is STENCIL_INDEX."
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Signed-off-by: Illia Iorin <illia.iorin@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 6b672e342a)
This prevents some additional optimizations that would change the
original result. This includes things like (b < a && b < c) => b <
min(a, c) and !(a < b) => b >= a. Both of these optimizations were
specifically observed in the piglit tests added in piglit!160.
This was discovered while investigating
https://gitlab.freedesktop.org/mesa/mesa/issues/1958. However, the
problem in that issue was Chrome or Angle is replacing calls to isnan()
with some stuff that we (correctly) optimize to false. If they had left
the calls to isnan() alone, everything would have just worked.
No shader-db changes on any Intel platform.
I also tried marking the comparison generated by the isnan() function
precise. The precise marker "infects" every computation involved in
calculating the parameter to the isnan() function, and this severely
hurt all of the (few) shaders in shader-db that use isnan().
I also considered adding a new ir_unop_isnan opcode that would implement
the functionality. During GLSL IR-to-NIR translation, the resulting
comparison operation would be marked exact (and the samething would need
to happen in SPIR-V translation).
This approach taken by this patch seemed easier, but we may want to do
the ir_unop_isnan thing anyway.
Fixes: d55835b8bd ("nir/algebraic: Add optimizations for "a == a && a CMP b"")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
(cherry picked from commit 9be4a422a0)
On ICL we have the src1 restriction which is applied through
fix_byte_src() and potentially changes the type of the operands from 8
to 32 bits. When this change happens, we fall into the "else if
(bit_size < 32)" case and miscompute src_type because it takes into
consideration bit_size (8) instead of the adjusted size of temp_op
(32). This results in the shader reading unused memory, giving us
mostly failures, but occasional passes due to whatever was already in
the registers we were reading.
This commit fixes a lot of dEQP subgroup i8vec2 tests on ICL, such as:
dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2
This can also be verified by simply changing fix_byte_src() to apply
on all platforms.
Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1 on ICL")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
(cherry picked from commit eb6352162d)
They use the "sample" keyword as a variable name.
Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit e00791c552)
We seem to have forgotten about the semaphore in the
acquireNextImageInfo.
v2: Signal semaphore/fence regardless of presentation status (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit edc6606d4e)
This doesn't seem to fix anything because those destroy() calls happen
right before the command buffer object & its list of batch_bo is also
destroyed. Still looks a bit cleaner.
v2: Found a second occurence
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Fixes: 26ba0ad54d ("vk: Re-name command buffer implementation files")
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 935f8f0e56)
We always close the in_fence at the end the anv_cmd_buffer_execbuf()
so when we take it from the semaphore, let's not forget to invalidate
it.
Note that the code leaks the fence_in if we get any error before
reaching the close(). Let's fix that in another patch or better,
rewrite the whole thing!
v2: drop redundant fd = -1 (Jason)
v3: Update commit message (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 048f0690ee)