This streamout state code will be used by radeonsi.
There are new structures r600_common_context and r600_common_screen.
What is inherited by what is shown here:
pipe_context -> r600_common_context -> r600_context
pipe_screen -> r600_common_screen -> r600_screen
The common structures reside in drivers/radeon. Currently they only contain
enough functionality to be able to handle streamout. Eventually I'd like
the whole pipe_screen implementation to be shared and some of the context
stuff too.
This is quite big, but most changes are because of the new structures and
the fact r600_write_value is replaced by radeon_emit.
Thanks to Tom Stellard for fixing the build for r600g/compute.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
This started as an attempt to add support for MSAA texture transfers and
MSAA depth-stencil decompression for the DB->CB copy path.
It has gotten a bit out of control, but it's for the greater good.
Some changes do not make much sense, they are there just to make it look
like the other driver.
With a few cosmetic modifications, r600_texture.c can be shared with
a symlink.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
It was wrong, because the offset shouldn't be applied to MSAA depth buffers.
This small cleanup should prevent such issues in the future.
This fixes a lockup in "piglit/fbo-depthstencil default_fb -samples=n".
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We did downsample (=resolve) MSAA resources to make ReadPixels work with MSAA
GLX visuals, which was enough for read-only color-only transfers.
This commit makes write color transfers and depth-stencil transfers work
in a similar manner. It does downsampling in transfer_map and upsampling
in transfer_unmap.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes and enables texturing with compressed MSAA colorbuffers
on Evergreen and Cayman. For the first time, multisample textures work
on Cayman.
This requires the libdrm flag RADEON_SURF_FMASK.
v2: require libdrm_radeon 2.4.45
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This fixes a crash when a resource cannot be mapped to the CPU's address space
because it's too big.
This puts a global pipe_context in r600_screen, which is guarded by a mutex,
so that we can use pipe_context when there isn't one around.
Hopefully our multi-context support is solid.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
NOTE: This is a candidate for the 9.1 branch.
Only the disassembler is used to dump shaders. Here's a few examples
how to use R600_DEBUG.
Log compute info:
R600_DEBUG=compute
Dump all shaders:
R600_DEBUG=fs,vs,gs,ps,cs
Dump pixel shaders only:
R600_DEBUG=ps
Disable Hyper-Z:
R600_DEBUG=nohyperz
Disable the LLVM backend:
R600_DEBUG=nollvm
Or use any combination of the above, or print all options:
R600_DEBUG=help
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
I should say "fix", but it has never been used until now.
S8Z24 is the format equivalent to the GL_UNSIGNED_INT_24_8 packing,
so we'll start to see it more often with st/mesa now making smart decisions
about formats.
The DB<->CB copy can change the channel ordering for transfers, other than
that, the internal DB format doesn't really matter.
R600-R700 support is possible except shadow mapping.
FMT_24_8 is broken if the SAMPLE_C instruction is used (no idea why).
Also the sampler swizzling was broken in theory and the fact it worked was
a lucky coincidence.
radeonsi might need to port this.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Old kernel do not have dma support, patch pushed were missing some
of the check needed to not use dma.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
We keep track of ring emission order in a stack, whenever we need to
flush we empty the stack in a fifo order. There is few helpers function
for bo mapping and other ring activities that will make sure that
the ring stack is properly flush and submitted.
v2: fix st flush path, and other flush path to properly flush all
rings if necessary
v3: - improve name of ring helpers
- make sure that each time a cs is gona be written it endup at
top of the stack to avoid any issue such as :
STACK[0] = dma (withbo A,B)
STACK[1] = gfx (withbo C,D)
Now if code try to emit a dma command relative to bo C or D
it will start writting cmd stream into the cs and once it
reach the point where it adds relocation it will flush.
At that point the cs will have cmd that don't have proper
relocation into the relocation buffer and kernel will just
refuse to run.
v4: - Drop the stack idea as it turn out there is no way to use it
or benefit from it. Any time the driver start command on other
ring, it always need to flush the previous ring. So make code
simpler by not using a stack.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
npix_x/y/z is wrong with NPOT textures, since it's always aligned to POT
if the level is non-zero, so we can't use that.
This fixes piglit/spec/EXT_texture_shared_exponent/fbo-generatemipmap-formats.
This adds TBO support to r600g, and with GLSL 1.40 enabled,
we now get 3.1 core profiles advertised for r600g.
The r600/700 implementation is a bit different from the evergreen one,
as r6/7 hw lacks vertex fetch swizzles. So we implement it by passing 5
constants per sampler to the shader, the shader uses the first 4 as masks
for each component and the 5th as the alpha value to OR in.
Now TXQ is also broken so we have to pass a constant for the buffer size,
on evergreen we just pass this, on r6/7 we pass it as the 6th element
in the const info buffer.
v1.1: drop return as DDX doesn't use a texture type
v2: add r600/700 support.
Signed-off-by: Dave Airlie <airlied@redhat.com>
i.e. we have to allocate a temporary tiled resource if dst isn't tiled.
This fixes hardlocks on r6xx-r7xx, though using a linear resource is forbidden
on later asics as well.
NOTE: This is a candidate for the stable branches.
htile is used for HiZ and HiS support and fast Z/S clears.
This commit just adds the htile setup and Fast Z clear.
We don't take full advantage of HiS with that patch.
v2 really use fast clear, still random issue with some tiles
need to try more flush combination, fix depth/stencil
texture decompression
v3 fix random issue on r6xx/r7xx
v4 rebase on top of lastest mesa, disable CB export when clearing
htile surface to avoid wasting bandwidth
v5 resummarize htile surface when uploading z value. Fix z/stencil
decompression, the custom blitter with custom dsa is no longer
needed.
v6 Reorganize render control/override update mecanism, fixing more
issues in the process.
v7 Add nop after depth surface base update to work around some htile
flushing issue. For htile to 8x8 on r6xx/r7xx as other combination
have issue. Do not enable hyperz when flushing/uncompressing
depth buffer.
v8 Fix htile surface, preload and prefetch setup. Only set preload
and prefetch on htile surface clear like fglrx. Record depth
clear value per level. Support several level for the htile
surface. First depth clear can't be a fast clear.
v9 Fix comments, properly account new register in emit function,
disable fast zclear if clearing different layer of texture
array to different value
v10 Disable hyperz for texture array making test simpler. Force
db_misc_state update when no depth buffer is bound. Remove
unused variable, rename depth_clearstencil to depth_clear.
Don't allocate htile surface for flushed depth. Something
broken the cliprect change, this need to be investigated.
v11 Rebase on top of newer mesa
v12 Rebase on top of newer mesa
v13 Rebase on top of newer mesa, htile surface need to be initialized
to zero, somehow special casing first clear to not use fast clear
and thus initialize the htile surface with proper value does not
work in all case.
v14 Use resource not texture for htile buffer make the htile buffer
size computation easier and simpler. Disable preload on evergreen
as its still troublesome in some case
v15 Cleanup some comment and remove some left over
v16 Define name for bit 20 of CP_COHER_CNTL
Signed-off-by: Pierre-Eric Pelloux-Prayer <pelloux@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This force surface allocated from ddx to be consider as height
aligned on 8 and fix 1D->2D tiling transition that result from
this.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Brian reported seeing:
r600_texture.c: In function ‘r600_texture_create_object’:
r600_texture.c:468:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 3 has type ‘uint64_t’
r600_texture.c:468:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 4 has type ‘uint64_t’
r600_texture.c:485:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 3 has type ‘uint64_t’
r600_texture.c:485:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 4 has type ‘uint64_t’
this should wrap over them fine.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This contains the evergreen support.
Support is possible on rv670 upwards and the code in here
should work, but it doesn't and I haven't debugged it to
figure out why.
Beyond just adding support for the cube map array sampling,
r600 resinfo isn't conformant with the GL specification,
which states the number of layers should be returned for
the textureSize, so we have to track in an external
constant buffer the layers for each sampler if we need
them in the shader.
v2: only update the sampler constants if the sampler views have changed,
as suggested by Marek.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The decompression is done in-place and only the compressed tiles are
decompressed. Note: R6xx-R7xx can do that only with Z16 and Z32F.
The texture unit is programmed to use non-displayable tiling and depth
ordering of samples, so that it can fetch the texture in the native DB format.
The latest version of the libdrm surface allocator is required for stencil
texturing to work. The old one didn't create the mipmap tree correctly.
We need a separate mipmap tree for stencil, because the stencil mipmap
offsets are not really depth offsets/4.
There are still some known bugs, but this should save some memory and it also
improves performance a little bit in Lightsmark (especially with low
resolutions; tested with Radeon HD 5000).
The DB->CB copy is still used for transfers.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
The problem was we set VRAM|GTT for relocations of STATIC resources.
Setting just VRAM increases the framerate 4 times on my machine.
I rewrote the switch statement and adjusted the domains for window
framebuffers too.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
"get_transfer + transfer_map" becomes "transfer_map".
"transfer_unmap + transfer_destroy" becomes "transfer_unmap".
transfer_map must create and return the transfer object and transfer_unmap
must destroy it.
transfer_map is successful if the returned buffer pointer is not NULL.
If transfer_map fails, the pointer to the transfer object remains unchanged
(i.e. doesn't have to be NULL).
Acked-by: Brian Paul <brianp@vmware.com>
pipe_resource can be shared between contexts, we shouldn't modify its
description. Instead, let's use the resource "views" (sampler views and
surfaces), where we can freely change almost any property of a resource.
The blend state is different and the resolve single-sample buffer must have
FMASK and CMASK enabled. I decided to have one CMASK and one FMASK
per context instead of per resource.
There are new FMASK and CMASK allocation helpers and a new buffer_create
helper for that.
This adds the FMASK and CMASK buffers. They share the same resource
with color data.
COMPRESSION and FAST_CLEAR are always enabled if both FMASK and CMASK are
allocated. We initialize the CMASK to a "compressed" state (not "fast cleared"),
so that we can keep FAST_CLEAR enabled all the time.
Both FMASK and CMASK must be present at the moment. If either one is missing,
the other one is not used.
v2: add cayman regs in the list
Reviewed-by: Jerome Glisse <jglisse@redhat.com>