262 commits

Author SHA1 Message Date
Marek Olšák
39801d4ba7 r600g,radeonsi: consolidate transfer, cmask, and fmask structures 2013-09-29 15:18:08 +02:00
Grigori Goronzy
56d9a397aa r600g: add support for separately allocated CMASKs
v2: check for NULL cbufs

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2013-09-20 20:35:55 +02:00
Axel Davy
e8f9195e5f gallium, intel: Implements new __DRI_IMAGE_USE_LINEAR and PIPE_BIND_LINEAR flags to enforce no tiling.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2013-09-06 15:02:34 -07:00
Marek Olšák
d5b23dfc1c r600g: move streamout state to drivers/radeon
This streamout state code will be used by radeonsi.

There are new structures r600_common_context and r600_common_screen.
What is inherited by what is shown here:

pipe_context -> r600_common_context -> r600_context
pipe_screen -> r600_common_screen -> r600_screen

The common structures reside in drivers/radeon. Currently they only contain
enough functionality to be able to handle streamout. Eventually I'd like
the whole pipe_screen implementation to be shared and some of the context
stuff too.
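
The inheritance chain above is the usual C idiom of embedding the base structure as the first member. A minimal sketch follows, assuming heavily simplified structures; the field names `b`, `streamout_enabled`, and `driver_private` and both helper functions are illustrative and not taken from the driver:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the Gallium base structure. */
struct pipe_context {
    void *priv;
};

/* Shared layer: state that both drivers could use lives here. */
struct r600_common_context {
    struct pipe_context b;      /* base; must be the first member */
    unsigned streamout_enabled; /* illustrative field only */
};

/* Driver-specific layer built on top of the shared one. */
struct r600_context {
    struct r600_common_context b; /* base; must be the first member */
    unsigned driver_private;      /* illustrative field only */
};

/* Downcasts are valid only because each base sits at offset 0. */
static struct r600_common_context *
common_from_pipe(struct pipe_context *ctx)
{
    assert(ctx != NULL);
    return (struct r600_common_context *)ctx;
}

static struct r600_context *
r600_from_pipe(struct pipe_context *ctx)
{
    assert(ctx != NULL);
    return (struct r600_context *)ctx;
}
```

Shared code would take a `pipe_context *` and use only the common layer, which is what lets a second driver reuse it.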

This is quite big, but most changes are because of the new structures and
the fact r600_write_value is replaced by radeon_emit.

Thanks to Tom Stellard for fixing the build for r600g/compute.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2013-08-31 01:34:30 +02:00
Marek Olšák
356c041167 radeonsi: port texture improvements from r600g
This started as an attempt to add support for MSAA texture transfers and
MSAA depth-stencil decompression for the DB->CB copy path.
It has gotten a bit out of control, but it's for the greater good.

Some changes do not make much sense, they are there just to make it look
like the other driver.

With a few cosmetic modifications, r600_texture.c can be shared with
a symlink.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-08-17 01:48:25 +02:00
Marek Olšák
94d294137e r600g: don't read back the MSAA depth buffer if the read flag is not set
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Marek Olšák
141b892620 r600g: don't flush the context in texture_transfer_map
the winsys does this automatically

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Marek Olšák
ae87aae0c4 r600g: fix texture offset computation for mapped MSAA depth buffers
It was wrong, because the offset shouldn't be applied to MSAA depth buffers.
This small cleanup should prevent such issues in the future.

This fixes a lockup in "piglit/fbo-depthstencil default_fb -samples=n".

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Marek Olšák
4d59258856 r600g: upsample and downsample MSAA resources for transfers
We did downsample (=resolve) MSAA resources to make ReadPixels work with MSAA
GLX visuals, which was enough for read-only color-only transfers.

This commit makes write color transfers and depth-stencil transfers work
in a similar manner. It does downsampling in transfer_map and upsampling
in transfer_unmap.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-06-13 03:54:13 +02:00
Marek Olšák
61c995bc47 r600g: rewrite FMASK allocation, fix FMASK texturing with 2 and 4 samples
This fixes and enables texturing with compressed MSAA colorbuffers
on Evergreen and Cayman. For the first time, multisample textures work
on Cayman.

This requires the libdrm flag RADEON_SURF_FMASK.

v2: require libdrm_radeon 2.4.45

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-05-15 20:19:45 +02:00
Marek Olšák
b692076420 r600g: initialize CMASK and HTILE with the GPU using streamout
This fixes a crash when a resource cannot be mapped to the CPU's address space
because it's too big.

This puts a global pipe_context in r600_screen, which is guarded by a mutex,
so that we can use pipe_context when there isn't one around.
Hopefully our multi-context support is solid.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

NOTE: This is a candidate for the 9.1 branch.
2013-04-23 20:26:20 +02:00
Marek Olšák
413ca78af3 r600g: add a debug flag for printing virtual addresses of resources 2013-04-16 13:56:47 +02:00
Marek Olšák
52efa01de0 r600g: allocate FMASK right after the texture, so that it's aligned with it
This avoids the kernel CS checker errors with MSAA textures.

Reviewed-by: Jerome Glisse <jglisse@redhat.com>
2013-03-11 13:43:36 +01:00
Marek Olšák
4bf0ebdd4f r600g: use a single env var R600_DEBUG, disable bytecode dumping
Only the disassembler is used to dump shaders. Here are a few examples of
how to use R600_DEBUG.

Log compute info:
  R600_DEBUG=compute

Dump all shaders:
  R600_DEBUG=fs,vs,gs,ps,cs

Dump pixel shaders only:
  R600_DEBUG=ps

Disable Hyper-Z:
  R600_DEBUG=nohyperz

Disable the LLVM backend:
  R600_DEBUG=nollvm

Or use any combination of the above, or print all options:
  R600_DEBUG=help
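
A comma-separated variable like this can be turned into a bitmask with a small table lookup. The sketch below is only an illustration: the `DBG_*` bit values, the table, and both function names are assumptions rather than the driver's definitions, and the `help` option is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit assignments for illustration; not the driver's values. */
#define DBG_COMPUTE  (1u << 0)
#define DBG_FS       (1u << 1)
#define DBG_VS       (1u << 2)
#define DBG_GS       (1u << 3)
#define DBG_PS       (1u << 4)
#define DBG_CS       (1u << 5)
#define DBG_NOHYPERZ (1u << 6)
#define DBG_NOLLVM   (1u << 7)

struct dbg_option {
    const char *name;
    unsigned flag;
};

static const struct dbg_option dbg_options[] = {
    { "compute",  DBG_COMPUTE },
    { "fs",       DBG_FS },
    { "vs",       DBG_VS },
    { "gs",       DBG_GS },
    { "ps",       DBG_PS },
    { "cs",       DBG_CS },
    { "nohyperz", DBG_NOHYPERZ },
    { "nollvm",   DBG_NOLLVM },
};

/* Return the flag for one token of length len, or 0 if it is unknown. */
static unsigned dbg_flag_for_token(const char *tok, size_t len)
{
    size_t i;

    assert(tok != NULL);
    for (i = 0; i < sizeof(dbg_options) / sizeof(dbg_options[0]); i++) {
        const char *name = dbg_options[i].name;
        if (strlen(name) == len && memcmp(tok, name, len) == 0)
            return dbg_options[i].flag;
    }
    return 0;
}

/* Parse a comma-separated option string; NULL means "variable not set". */
static unsigned dbg_parse(const char *str)
{
    unsigned flags = 0;

    if (str == NULL)
        return 0;
    while (*str != '\0') {
        size_t len = strcspn(str, ",");
        flags |= dbg_flag_for_token(str, len);
        str += len;
        if (*str == ',')
            str++; /* skip the separator */
    }
    return flags;
}
```

A caller would pass the environment value directly, e.g. `dbg_parse(getenv("R600_DEBUG"))`; unknown tokens are silently ignored in this sketch.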

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2013-03-11 13:43:36 +01:00
Marek Olšák
2ca73bc7f7 r600g: cleanup #include recursion between r600_pipe.h and evergreen_compute.h
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2013-03-11 13:43:36 +01:00
Marek Olšák
43d3e0cd3d r600g: don't check for R600_ENABLE_S3TC env var 2013-03-11 13:43:36 +01:00
Marek Olšák
3857f450a6 gallium/util: add helper util_max_layer from r600g 2013-02-26 01:14:05 +01:00
Marek Olšák
2b9659c9e6 r600g: properly implement S8Z24 depth-stencil format for Evergreen
I should say "fix", but it has never been used until now.
S8Z24 is the format equivalent to the GL_UNSIGNED_INT_24_8 packing,
so we'll start to see it more often with st/mesa now making smart decisions
about formats.

The DB<->CB copy can change the channel ordering for transfers, other than
that, the internal DB format doesn't really matter.

R600-R700 support is possible except shadow mapping.
FMT_24_8 is broken if the SAMPLE_C instruction is used (no idea why).

Also the sampler swizzling was broken in theory and the fact it worked was
a lucky coincidence.

radeonsi might need to port this.

Reviewed-by: Jerome Glisse <jglisse@redhat.com>
2013-02-14 14:51:46 +01:00
Marek Olšák
5c86a728d4 r600g: fix htile buffer leak
NOTE: This is a candidate for the 9.1 branch.
2013-01-31 15:35:18 +01:00
Jerome Glisse
72916698b0 r600g: fix segfault with old kernel
Old kernels do not have DMA support, and the patches that were pushed were
missing some of the checks needed to avoid using DMA.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
2013-01-28 14:51:40 -05:00
Jerome Glisse
325422c494 r600g: add async for staging buffer upload v2
v2: Add virtual address to dma src/dst offset for cayman

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
2013-01-28 11:30:35 -05:00
Jerome Glisse
bff07638a8 r600g: add multi ring support with dma as first second ring v4
We keep track of the ring emission order in a stack; whenever we need to
flush, we empty the stack in FIFO order. There are a few helper functions
for bo mapping and other ring activities that make sure that
the ring stack is properly flushed and submitted.

v2: fix st flush path, and other flush path to properly flush all
    rings if necessary
v3: - improve name of ring helpers
    - make sure that each time a cs is going to be written it ends up at
      the top of the stack, to avoid issues such as:
      STACK[0] = dma (with bo A,B)
      STACK[1] = gfx (with bo C,D)
      Now if the code tries to emit a dma command relative to bo C or D,
      it will start writing the cmd stream into the cs, and once it
      reaches the point where it adds a relocation it will flush.
      At that point the cs will have cmds that don't have proper
      relocations in the relocation buffer and the kernel will just
      refuse to run it.
v4: - Drop the stack idea, as it turns out there is no way to use it
      or benefit from it. Any time the driver starts a command on another
      ring, it always needs to flush the previous ring. So make the code
      simpler by not using a stack.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
2013-01-28 11:30:35 -05:00
Marek Olšák
26c872c2a2 r600g: don't use radeon_surface_level::npix_x/y/z
npix_x/y/z is wrong with NPOT textures, since it's always aligned to POT
if the level is non-zero, so we can't use that.
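
The mismatch is easy to see numerically. In the sketch below, `minify` is the usual mip-size rule (shift right, clamp to 1) and `next_pot` rounds up to a power of two; both names are illustrative, and applying the alignment to the minified size is an assumption about how the padded value is produced:

```c
#include <assert.h>

/* Size of mip level `level` for a base size `size0`: halve per level, min 1. */
static unsigned minify(unsigned size0, unsigned level)
{
    unsigned v;

    assert(level < 32); /* shifting by >= the type width is undefined */
    v = size0 >> level;
    return v > 0 ? v : 1;
}

/* Round v up to the next power of two (v itself if it already is one). */
static unsigned next_pot(unsigned v)
{
    unsigned p = 1;

    while (p < v)
        p <<= 1;
    return p;
}
```

For a 100-pixel-wide texture, level 1 is logically `minify(100, 1)`, i.e. 50 pixels wide, while the POT-aligned value is `next_pot(50)`, i.e. 64, so code that needs the logical size has to compute it from the level-0 dimensions.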

This fixes piglit/spec/EXT_texture_shared_exponent/fbo-generatemipmap-formats.
2013-01-26 14:58:52 +01:00
Dave Airlie
d23aa65001 r600g: texture buffer object + glsl 1.40 enable support (v2)
This adds TBO support to r600g, and with GLSL 1.40 enabled,
we now get 3.1 core profiles advertised for r600g.

The r600/700 implementation is a bit different from the evergreen one,
as r6/7 hw lacks vertex fetch swizzles. So we implement it by passing 5
constants per sampler to the shader, the shader uses the first 4 as masks
for each component and the 5th as the alpha value to OR in.

TXQ is also broken, so we have to pass a constant for the buffer size.
On evergreen we pass just this; on r6/7 we pass it as the 6th element
in the const info buffer.

v1.1: drop return as DDX doesn't use a texture type
v2: add r600/700 support.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-01-11 22:31:54 +00:00
Marek Olšák
1aebb6911e r600g: implement 3D transfers
That means we can map and read multiple slices with one transfer_map call.
2013-01-04 14:06:54 +01:00
Marek Olšák
9ef26fc667 r600g: remove redundant parameter alloc_bo from r600_texture_create_object
alloc_bo == !buf
2012-12-22 19:39:29 +01:00
Marek Olšák
9c6410e5c3 r600g: always use a tiled resource as the destination of MSAA resolve
i.e. we have to allocate a temporary tiled resource if dst isn't tiled.

This fixes hardlocks on r6xx-r7xx, though using a linear resource is forbidden
on later asics as well.

NOTE: This is a candidate for the stable branches.
2012-12-21 23:43:34 +01:00
Marek Olšák
eccc74f5d3 r600g: remove a false comment 2012-12-21 23:42:09 +01:00
Jerome Glisse
6532eb17ba r600g: add htile support v16
htile is used for HiZ and HiS support and fast Z/S clears.
This commit just adds the htile setup and Fast Z clear.
We don't take full advantage of HiS with that patch.

v2 Really use fast clear; still random issues with some tiles,
   need to try more flush combinations. Fix depth/stencil
   texture decompression.
v3 Fix random issues on r6xx/r7xx
v4 Rebase on top of latest mesa, disable CB export when clearing
   htile surface to avoid wasting bandwidth
v5 Resummarize htile surface when uploading z value. Fix z/stencil
   decompression; the custom blitter with custom dsa is no longer
   needed.
v6 Reorganize render control/override update mechanism, fixing more
   issues in the process.
v7 Add nop after depth surface base update to work around some htile
   flushing issues. Force htile to 8x8 on r6xx/r7xx as other combinations
   have issues. Do not enable hyperz when flushing/uncompressing
   depth buffer.
v8 Fix htile surface, preload and prefetch setup. Only set preload
   and prefetch on htile surface clear like fglrx. Record depth
   clear value per level. Support several levels for the htile
   surface. First depth clear can't be a fast clear.
v9 Fix comments, properly account for new registers in emit function,
   disable fast zclear if clearing different layers of a texture
   array to different values
v10 Disable hyperz for texture arrays, making tests simpler. Force
    db_misc_state update when no depth buffer is bound. Remove
    unused variable, rename depth_clearstencil to depth_clear.
    Don't allocate htile surface for flushed depth. Something
    broke the cliprect change; this needs to be investigated.
v11 Rebase on top of newer mesa
v12 Rebase on top of newer mesa
v13 Rebase on top of newer mesa. The htile surface needs to be initialized
    to zero; somehow special-casing the first clear to not use fast clear,
    and thus initialize the htile surface with a proper value, does not
    work in all cases.
v14 Use a resource, not a texture, for the htile buffer to make the htile
    buffer size computation easier and simpler. Disable preload on
    evergreen as it's still troublesome in some cases
v15 Clean up some comments and remove some leftovers
v16 Define name for bit 20 of CP_COHER_CNTL

Signed-off-by: Pierre-Eric Pelloux-Prayer <pelloux@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
2012-12-20 18:23:51 -05:00
Marek Olšák
ef11ed61a0 r600g: add assertions to prevent creation of invalid surfaces 2012-12-20 17:13:18 +01:00
Jerome Glisse
50880314e3 Revert "r600g: work around ddx over alignment"
This reverts commit d8287bac1f.

It causes more issues than it fixes. We need to think of a proper solution.
2012-12-19 09:56:17 -05:00
Jerome Glisse
d8287bac1f r600g: work around ddx over alignment
This forces surfaces allocated by the ddx to be considered as having a
height aligned to 8, and fixes the 1D->2D tiling transition that results
from this.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
2012-12-18 16:10:54 -05:00
Marek Olšák
448cd5ea60 winsys/radeon: don't use BIND flags, add a flag for the cache bufmgr instead 2012-12-12 13:09:54 +01:00
Marek Olšák
25409c6da8 gallium: remove pipe_surface::usage
Not really used by anybody now.

Reviewed-by: Brian Paul <brianp@vmware.com>
2012-12-12 13:09:54 +01:00
Marek Olšák
49f1104c44 r600g: transfers of MSAA color textures should do the resolve
so that ReadPixels and various fallbacks work.

Reviewed-by: Brian Paul <brianp@vmware.com>
2012-12-07 14:19:28 +01:00
Marek Olšák
186579e724 r600g: use LINEAR_ALIGNED tiling for 1D array textures and if height0 <= 3
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2012-11-13 17:17:05 +01:00
Marek Olšák
d4780fddb1 r600g: untiled window-system buffers should be LINEAR_ALIGNED
though I guess the DDX allocates them as LINEAR_GENERAL

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2012-11-13 15:00:37 +01:00
Marek Olšák
c9e5309223 r600g: use LINEAR_ALIGNED tiling for 1D textures
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2012-11-13 15:00:37 +01:00
Marek Olšák
ac4f61b232 r600g: use LINEAR_ALIGNED tiling for staging textures, reorder the code
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2012-11-13 15:00:37 +01:00
Marek Olšák
f5ac60152b r600g: remove redundant parameter in r600_init_surface 2012-11-13 00:34:35 +01:00
Dave Airlie
afcaa03f7e r600g: fix printk warnings
Brian reported seeing:
r600_texture.c: In function ‘r600_texture_create_object’:
r600_texture.c:468:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 3 has type ‘uint64_t’
r600_texture.c:468:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 4 has type ‘uint64_t’
r600_texture.c:485:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 3 has type ‘uint64_t’
r600_texture.c:485:12: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 4 has type ‘uint64_t’

this should wrap over them fine.
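
One portable way to print a `uint64_t` without this warning is the `PRIu64` macro from `<inttypes.h>`; casting to `unsigned long long` is another. The sketch below shows the macro form; the function name and message text are hypothetical and not what the commit itself changed:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format two 64-bit values with the width-correct conversion specifier. */
static int format_bo_info(char *buf, size_t n, uint64_t size, uint64_t alignment)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "size=%" PRIu64 " alignment=%" PRIu64,
                    size, alignment);
}
```

`PRIu64` expands to the right length modifier for the platform, so the same format string compiles cleanly whether `uint64_t` is `unsigned long` or `unsigned long long`.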

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-11-10 06:39:38 +10:00
Dave Airlie
eb44c36df8 r600g: add initial cube map array support (v2)
This contains the evergreen support.

Support is possible on rv670 upwards and the code in here
should work, but it doesn't and I haven't debugged it to
figure out why.

Beyond just adding support for the cube map array sampling,
r600 resinfo isn't conformant with the GL specification,
which states the number of layers should be returned for
the textureSize, so we have to track in an external
constant buffer the layers for each sampler if we need
them in the shader.

v2: only update the sampler constants if the sampler views have changed,
as suggested by Marek.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-11-10 06:20:46 +10:00
Marek Olšák
428e37c2da r600g: add in-place DB decompression and texturing with DB tiling
The decompression is done in-place and only the compressed tiles are
decompressed. Note: R6xx-R7xx can do that only with Z16 and Z32F.

The texture unit is programmed to use non-displayable tiling and depth
ordering of samples, so that it can fetch the texture in the native DB format.

The latest version of the libdrm surface allocator is required for stencil
texturing to work. The old one didn't create the mipmap tree correctly.
We need a separate mipmap tree for stencil, because the stencil mipmap
offsets are not really depth offsets/4.

There are still some known bugs, but this should save some memory and it also
improves performance a little bit in Lightsmark (especially with low
resolutions; tested with Radeon HD 5000).

The DB->CB copy is still used for transfers.

Reviewed-by: Jerome Glisse <jglisse@redhat.com>
2012-11-06 02:54:16 +01:00
Marek Olšák
fa58644855 r600g: fix abysmal performance in Reaction Quake
The problem was we set VRAM|GTT for relocations of STATIC resources.
Setting just VRAM increases the framerate 4 times on my machine.

I rewrote the switch statement and adjusted the domains for window
framebuffers too.

NOTE: This is a candidate for the stable branches.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
2012-11-01 03:17:58 +01:00
Marek Olšák
369e468889 gallium: unify transfer functions
"get_transfer + transfer_map" becomes "transfer_map".
"transfer_unmap + transfer_destroy" becomes "transfer_unmap".

transfer_map must create and return the transfer object and transfer_unmap
must destroy it.

transfer_map is successful if the returned buffer pointer is not NULL.
If transfer_map fails, the pointer to the transfer object remains unchanged
(i.e. doesn't have to be NULL).
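
The contract above can be sketched with plain heap memory. Everything here is hypothetical (the `sketch_` names, the structure, and treating a zero-size request as the failure case); it only illustrates who creates and destroys the transfer object and what happens to the out-pointer on failure:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical transfer object; the real one carries much more state. */
struct sketch_transfer {
    void *data;
    size_t size;
};

/*
 * Create the transfer object and return the mapped pointer.
 * Success is signalled by a non-NULL return value. On failure,
 * *out_transfer is left exactly as the caller passed it in.
 */
static void *sketch_transfer_map(size_t size, struct sketch_transfer **out_transfer)
{
    struct sketch_transfer *t;

    assert(out_transfer != NULL);
    if (size == 0)
        return NULL; /* treated as a failed map in this sketch */

    t = malloc(sizeof(*t));
    if (t == NULL)
        return NULL;
    t->data = malloc(size);
    if (t->data == NULL) {
        free(t);
        return NULL;
    }
    t->size = size;
    *out_transfer = t;
    return t->data;
}

/* Unmap and destroy the transfer object in one call. */
static void sketch_transfer_unmap(struct sketch_transfer *t)
{
    assert(t != NULL);
    free(t->data);
    free(t);
}
```

Callers therefore test the returned pointer, never the transfer pointer, to decide whether the map succeeded.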

Acked-by: Brian Paul <brianp@vmware.com>
2012-10-11 21:12:16 +02:00
Marek Olšák
6db53ca490 r600g: don't modify pipe_resource in resource_copy_region, fixing race condition
pipe_resource can be shared between contexts, we shouldn't modify its
description. Instead, let's use the resource "views" (sampler views and
surfaces), where we can freely change almost any property of a resource.
2012-10-06 04:31:16 +02:00
Marek Olšák
e386972f5b r600g: don't use a staging resource for large transfers
It kills performance if the resource is linear.
2012-09-13 20:25:47 +02:00
Marek Olšák
78354011f9 r600g: implement color resolve for r600
The blend state is different and the resolve single-sample buffer must have
FMASK and CMASK enabled. I decided to have one CMASK and one FMASK
per context instead of per resource.

There are new FMASK and CMASK allocation helpers and a new buffer_create
helper for that.
2012-08-30 19:43:56 +02:00
Marek Olšák
8698a3b85d r600g: implement MSAA for r700
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
2012-08-30 19:43:55 +02:00
Marek Olšák
a3d9d7ec79 r600g: implement compression for MSAA colorbuffers for evergreen
This adds the FMASK and CMASK buffers. They share the same resource
with color data.

COMPRESSION and FAST_CLEAR are always enabled if both FMASK and CMASK are
allocated. We initialize the CMASK to a "compressed" state (not "fast cleared"),
so that we can keep FAST_CLEAR enabled all the time.

Both FMASK and CMASK must be present at the moment. If either one is missing,
the other one is not used.

v2: add cayman regs in the list

Reviewed-by: Jerome Glisse <jglisse@redhat.com>
2012-08-27 04:31:00 +02:00