Number of dwords in MI_FLUSH_DW changed from 4 to 5 in gen8+.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
We already have this workaround in OpenGL driver.
See Mesa commit 3cf4fe2219.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Nanley Chery <nanley.g.chery@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
The MOV instruction can extract bytes to words/double words, and
words/double words to quadwords, but not byte to quadwords.
For unsigned byte to quadword, we can read them as words and AND off the
high byte and extract to quadword in one instruction. For signed bytes,
we need to first sign extend to word and the sign extend that word to a
quadword.
Fixes the following test on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes the following tests on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot
dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115
Speed up simd16 frontend (default) on avx/avx2 platforms;
fixes performance regression caused by switch to simdlib.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Speed up avx512 platforms; fixes performance regression caused
by swithc to simdlib.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: mesa-stable@lists.freedesktop.org
When queryImage doesn't support __DRI_IMAGE_ATTRIB_FOURCC wayland clients
will die with a NULL derefence in wl_proxy_add_listener.
Attempt to provide a simple fallback to keep ancient systems working.
Fixes: 6595c69951 ("egl/wayland: Remove more surface specifics from
create_wl_buffer")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103519
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Already implemented for Gallium drivers.
Useful for gbm_bo_(un)map.
Tests:
By porting wayland/weston/clients/simple-dmabuf-drm.c to GBM.
kmscube --mode=rgba
kmscube --mode=nv12-1img
kmscube --mode=nv12-2img
piglit ext_image_dma_buf_import-refcount -auto
piglit ext_image_dma_buf_import-transcode-nv12-as-r8-gr88 -auto
piglit ext_image_dma_buf_import-sample_rgb -fmt=XR24 -alpha-one -auto
piglit ext_image_dma_buf_import-sample_rgb -fmt=AR24 -auto
piglit ext_image_dma_buf_import-sample_yuv -fmt=NV12 -auto
piglit ext_image_dma_buf_import-sample_yuv -fmt=YU12 -auto
piglit ext_image_dma_buf_import-sample_yuv -fmt=YV12 -auto
v2: add early return if (flag & MAP_INTERNAL_MASK)
v3: take input rect into account and test with kmscube and piglit.
v4: handle wraparound and bo reference.
v5: indent, exclude 0 width and height on the boundary, map bo
independently of the image.
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
It seems safe and it improves performance by +4% (73->76).
A drirc based solution is not what we want for now, keep it
simple and improve later if it's really needed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
It's ok to use an empty list for dependencies:, but it's not ok to try to
use the found() method of it.
See also https://github.com/mesonbuild/meson/issues/2324
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Otherwise we can leak the old syncobj.
Fixes: eaa56eab6d "radv: initial support for shared semaphores (v2)"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Previously, we just had one hash set for tracking depth and render
caches called brw_context::render_cache. This is less than ideal
because the depth and render caches are separate and we can't track
moves between the depth and the render caches. This limitation led
to some unnecessary flushing around the depth cache. There are cases
(mostly with BLORP) where we can end up touching a depth or stencil
buffer through the render cache. To guard against this, blorp would
unconditionally do a render_cache_set_check_flush on it's destination
which meant that if you did any rendering (including a BLORP operation)
to a given surface and then used it as a blorp destination, you would
end up flushing it out of the render cache before rendering into it.
Things get worse when you dig into the depth/stencil state code for
regular GL draw calls. Because we may end up rendering to a depth
or stencil buffer via BLORP, we did a render_cache_set_check_flush on
all depth and stencil buffers in brw_emit_depthbuffer to ensure that
they got flushed out of the render cache prior to using them for depth
or stencil testing. However, because we also need to track dirtiness
for depth and stencil so that we can implement depth and stencil
texturing correctly, we were adding all depth and stencil buffers to the
render cache set in brw_postdraw_set_buffers_need_resolve. This meant
that, if anything caused 3DSTATE_DEPTH_BUFFER to get re-emitted
(currently _NEW_BUFFERS, BRW_NEW_BATCH, and BRW_NEW_BLORP), we would
almost always do a full pipeline stall and render/depth cache flush.
The root cause of both of these problems is that we can't tell the
difference between the render and depth caches in our tracking. This
commit splits our cache tracking into two sets, one for render and one
for depth, and properly handles transitioning between the two. We still
flush all the caches whenever anything needs to be flushed. The idea is
that if we're going to take the hit of a flush and stall, we may as well
flush everything in the hopes that we can avoid a flush by something
else later.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Right now we just always flush the destination for render and aren't
particularly careful about depth or stencil. Soon, flush_for_render
isn't going to do the same thing as flush_for_depth and we may be doing
a good deal less depth flushing so we should be a bit more precise.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In theory, this will let us track the depth and render caches
separately. Right now, they're just wrappers around
brw_render_cache_set_*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We may access them as a texture using blorp regardless of whether or not
stencil texturing is enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
We were already using PTE for all render targets in case one happened to
get scanned out. However, this still wasn't 100% correct because there
are still possibly cases where we may want to texture from an external
buffer even though we don't know the caching mode. This can happen, for
instance, on buffers imported from another GPU via prime.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101691
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Tested-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes our MOCS settings significantly more flexible.
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Tested-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a bit more annoying than your average shader - we need to look
at MEDIA_INTERFACE_DESCRIPTOR_LOAD in the batch buffer, then hop over
to the dynamic state buffer to read the INTERFACE_DESCRIPTOR_DATA, then
hop over to the instruction buffer to decode the program.
Now that we store all the buffers before decoding, we can actually do
this fairly easily.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
while loops skip the first field of the instruction/structure, which
is not what the code intended. It works out because the field we're
looking for doesn't happen to be first, but we ought to do it right
regardless.
Found while writing the next patch, where Kernel Start Pointer is
the first field of INTERFACE_DESCRIPTOR_DATA.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This makes aubinator_error_decode's shader dumping work like aubinator.
Instead of printing them after the fact, it prints them right inside the
3DSTATE_VS/HS/DS/GS/PS packet that references them. This saves you the
effort of cross-referencing things and jumping back and forth.
It also reduces a bunch of book-keeping, and eliminates the limitation
that we could only handle 4096 programs. That code was also broken and
failed to print any shaders if there were under 4096 programs.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This lets us complete parsing and storing of each buffer's data before
we begin decoding the batchbuffer. This makes it possible to inspect
the state buffer and program buffer, so we can properly decode any
indirect state or shader programs.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Based on a similar patch to intel_error_decode by Chris Wilson.
While we're de-duplicating the gtt_offset calculation, we can simplify
it to assume two hex digits are there - the kernel has done this since
v4.6, and we already require error states from v4.10.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Also change count from a pointer into a value. We were supposed to
be resetting it to 0 (and failed to), but that's gone since we dropped
the pre-ascii85 handling.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Error state files used to look like:
render ring --- gtt_offset = 0x0e8f6000
00000000 : 69040000
00000004 : 79090000
...
00007ffc : 00000000
--- ringbuffer = 0x00001000
There were thousands of lines between sections. The file format changed
with Kernel 4.10, and now has a single ascii85-encoded line following
each section heading. This is much easier to parse.
There are a bunch of bugs in our handling of the old style format,
where we'd decode the wrong data, at the wrong time. Fixing all of
these is going to be a giant pain. It's also a lot of extra code
complexity. In order to properly decode indirect state, or compute
shaders, we'll also need to parse data in advance of decoding, which
is going to be a giant pain with this ad-hoc "decode everywhere!"
mentality. So, let's just drop support for the older file format.
This unfortunately requires an error state generated by Kernel 4.10 or
later. That's probably not the end of the world, as we encourage users
to upgrade to the latest kernel when encountering GPU hangs anyway. It
might be a giant pain for people with LTS kernels, though...
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The dashes "---" may occur within an ascii85 block, but only an ascii85
block starts with ':' or '~'.
Ported from Chris Wilson's intel-gpu-tools commit:
bceec7e1d8a160226b783c6344eae8cbf4ece144
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
It's a neat idea, and still useful in some cases, but the intel common
code is used by i965 and anvil only, this is a little clearer.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Using build_by_default : false is convenient for dependencies that can
be pulled in by various diverse components of the build system, the
gallium hardware/software drivers and state trackers do not fit that
description. Instead, these should be guarded using the variable that tracks
whether that driver should be enabled.
This leaves a few helper libraries: trace, rbug, etc, and the generic
winsys bits as `build_by_default : false` because there are a large
number of gallium components that pull them in.
v2: - remove build_by_default from winsys convenience libs as well.
v3: - Always put drivers before winsys for consistency
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
This handles the bits >= 32 corner case in bitfieldInsert.
Fixes:
tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-bitfieldInsert.shader_test.
Signed-off-by: Dave Airlie <airlied@redhat.com>