Use SVGA3D_QUERYTYPE_MAX instead of SVGA_QUERY_MAX for
svga query type check.
Tested with various OpenGL apps with GALLIUM_HUD set.
Reviewed-by: Brian Paul <brianp@vmware.com>
Replace the num-resources-mapped hud with
num-textures-mapped and num-buffers-mapped, so we can
differentiate the map counts for these two different resources.
Reviewed-by: Brian Paul <brianp@vmware.com>
Some OpenGL apps, like Cinebench R15, have many glDrawElements(GL_QUADS)
calls. Since we don't directly support quads we have to convert these
calls into GL_TRIANGLES which involves generating a new index buffer.
This patch saves the new/translated index buffer in the hope that it
can be reused for a later draw call.
Cinebench R15 increases by about 20% with this change.
The NobelClinician Viewer app also hits this code.
Tested with full piglit run.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
If a consecutive sequence of drawing commands references the same
vertex/index buffers, there should be no need to rebind the surfaces
for the second and subsequent drawing commands.
Apps that use multiple display lists benefit from this since the vertex
data for several display lists is often stored in one buffer.
In the case of the legacy E&S Glaze demo, this reduces the size of our
command buffers from 91KB to 44KB. One WSI Fusion trace shows a 33%
reduction in command buffer sizes.
Tested with full piglit run.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Previously, every time we put shader constants into the default constant
buffer we called u_upload_alloc(), which mapped the buffer, and
u_upload_unmap(). We had to unmap the buffer before calling
svga_buffer_handle() to get the winsys handle for the buffer. But we
really only need to do that the first time we reference the const buffer.
Now we try to keep the upload manager's buffer mapped until we fill it or
flush the command buffer.
v2: add additional comment on the buffer unmapping code in
svga_context_flush(), per Charmaine.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
When we migrate a buffer from sw/malloc storage to a hardware buffer,
don't memcpy the whole buffer, just copy the part we've written to.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
PredCopyRegion support copy between same type of textures.
Instead of comparing src and dst pipe texture type, compare svga texture
type which can avoid some software fallback.
for example, it avoids a software blit with the Redway3D Aston demo.
Tested piglit tests on VGPU9 and VGPU10 on GL/DX11Renderer, Redway3D Aston demo
v2: some nit pick suggested by Charmaine.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This function returns svga texture type for corresponding pipe texture.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Offset was wrong, it's at bit 8, not 4. Also, uses subr instead
of sub when src2 has neg. Similar to GK110 now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
The comment for the commutative flags was wrong because OP_MUL is
before OP_MAD. While we are at it add missing opcodes, and fix
the comment about the short forms.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
This patch switches non-TGSI compute shaders over to using the HSA
ABI described here:
https://github.com/RadeonOpenCompute/ROCm-Docs/blob/master/AMDGPU-ABI.md
The HSA ABI provides a much cleaner interface for compute shaders and allows
us to share more code in the compiler with the HSA stack.
The main changes in this patch are:
- We now pass the scratch buffer resource into the shader via user sgprs
rather than using relocations.
- Grid/Block sizes are now passed to the shader via the dispatch packet
rather than at the beginning of the kernel arguments.
Typically for HSA, the CP firmware will create the dispatch packet and set
up the user sgprs automatically. However, in Mesa we let the driver do
this work. The main reason for this is that I haven't researched how to
get the CP to do all these things, and I'm not sure if it is supported
for all GPUs.
v2:
- Add comments explaining why we are setting certain bits of the scratch
resource descriptor.
v3:
- Use amdgcn-mesa-mesa3d triple instead of amdgcn--mesa3d.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fix getting the size of a struct arg. vec3 types still work ok.
Only buit-in args need to have power of two alignment, getTypeAllocSize
reports the correct size in all cases.
Acked-by: Francisco Jerez <currojerez@riseup.net>
The gallium interface defines these like DX10. Note that OpenGL ignores
these options if MSAA is disabled or the dest buffer doesn't support
MSAA.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
A possible error (-1) was being lost because it was first converted to an
unsigned int and only then checked.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Martina Kollarova <martina.kollarova@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Track rendering to each FBO independently and flush rendering only when
necessary. This lets us avoid the overhead of storing and loading the
frame when an application momentarily switches to rendering to some other
texture in order to continue rendering the main scene.
Improves glmark -b desktop:effect=shadow:windows=4 by 27%
Improves glmark -b
desktop:blur-radius=5:effect=blur:passes=1:separable=true:windows=4
by 17%
While I haven't tested other apps, this should help X rendering a lot, and
I've heard GLBenchmark needed it too.
This is done in vc4_flush currently, but I'm going to make the job always
track the surfaces it might be rendering to instead of putting in the
destinations at flush time.
It's really just an upgrade to attempting WHOLE_RESOURCE. Pulling the
logic out caught two bugs in it: We would try to do so on cubemaps (even
though we're only mapping 1 of the 6 slices), and we would break
persistent coherent mappings by trying to reallocate when we shouldn't.
The clear of Z or stencil will end up clearing the other as well, instead
of masking. There's no way around this that I know of, so if we are
clearing just one then we need to draw a quad.
Fixes a regression in the job-shuffling code, where the clear values move
to the job and don't just have the last clear's value laying around when
you do glClear(DEPTH) and then glClear(STENCIL) separately
(ext_framebuffer_multisample-clear 4 depth)).
This causes regressions in ext_framebuffer_multisample/multisample-blit
depth and ext_framebuffer_multisample/no-color depth, but these were
formerly false positives due to the reference image also being black. Now
the reference and test images are both being drawn, and it looks like
there's an incorrect resolve of depth during blitting to an MSAA FBO.
radeonsi depends on the interp flags a little bit too much.
This fixes 9 randomly failing tests:
GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.*
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Unify the clear and copy paths, clean up the definitions.
It looks more like a rework. It's a preparation for GDS support,
which might or might not come.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If si_sampler_view_add_buffer ends up flushing, then the code
in begin_new_cs would previously have added the buffer(s) for
whatever was previously bound to that slot. Now it would add only
the new buffer.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Heaven and Valley write gl_SampleMask and not Z.
Use 16_ABGR instead of 32_ABGR if Z isn't written.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This was missed in:
commit 0d2e43fcb1
Author: Marek Olšák <marek.olsak@amd.com>
Date: Thu Aug 18 16:30:00 2016 +0200
gallium/radeon: derive buffer placement and flags only at initialization
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Fixes segfaults in EG compute since:
commit 21de3be8e6
radeonsi: fix texture format reinterpretation with DCC
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fix VAAPI YV12/I420 convert to NV12 U/V reversal.
Input order is YVU when this is called.
Signed-off-by: Andy Furniss <adf.lists@gmail.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Android porting of the following commits:
f1f1ba3 "radeonsi: move sid.h/r600d_common.h to a common place."
69fca64 "amd/addrlib: move addrlib from amdgpu winsys to common code"
This patch fixes android building errors
Reviewed-by: Dave Airlie <airlied@redhat.com>
This fixes a crash when using the prefered video format with vaapisink
on Nvidia hardwares.
Also caught by the following assert:
nouveau_vp3_video.c:91: Assertion `templat->interlaced' failed.
TEST= gst-launch-1.0 videotestsrc ! video/x-raw, format=NV12 ! vaapisink
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Tested-by: Víctor Manuel Jáquez Leal <vjaquez@igalia.com>
Tested-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>