Needs to be set for depth, stencil, and fmask just
like other blocks.
v2: drop additional cayman bits for now
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On cayman, 128bpp surfaces require non_disp ordering for hw
access to both linear and tiled surfaces. When we use the 3D
engine we can set the non_disp ordering on both the tiled and
linear sides (via CB or texture), but when we use the DMA
engine, we can only set the non_disp ordering on the tiled
side, so after an L2T operation with the DMA engine, the data
ends up in the wrong order on the tiled side.
v2: cayman/TN only
v3: fix comments
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=60802
Note: this is a candidate for the 9.1 branch.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: Andreas Boll <andreas.boll.dev@gmail.com>
- Fix formatting - use one CFLAG per line
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59238
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
There were two different NUM_ENTRIES #defines for the framebuffer
tile cache and the texture tile cache. Rename the latter to fix
the warnings:
In file included from sp_flush.c:40:0:
sp_tex_tile_cache.h:76:0: warning: "NUM_ENTRIES" redefined
sp_tile_cache.h:78:0: note: this is the location of the previous definition
In file included from sp_context.c:50:0:
sp_tex_tile_cache.h:76:0: warning: "NUM_ENTRIES" redefined
sp_tile_cache.h:78:0: note: this is the location of the previous definition
Also, replace occurrences of NUM_ENTRIES with the Element() macro to
be safer.
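A minimal sketch of why an element-count macro is safer than a shared
constant name: the loop bound is derived from the array itself, so it
cannot drift from the declaration or collide with another header's
define. ARRAY_ELEMENTS, TEX_CACHE_ENTRIES and struct tex_cache are
hypothetical names for illustration, not the softpipe definitions.

```c
#include <stddef.h>

/* Hypothetical element-count macro (an assumption, not Mesa's own). */
#define ARRAY_ELEMENTS(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical cache size, chosen only for this example. */
#define TEX_CACHE_ENTRIES 4

struct tex_cache {
   int entries[TEX_CACHE_ENTRIES];
};

/* Reset every entry; the bound follows the array's declared size. */
static void tex_cache_clear(struct tex_cache *tc)
{
   size_t i;

   for (i = 0; i < ARRAY_ELEMENTS(tc->entries); i++)
      tc->entries[i] = -1;
}

/* Report the entry count without naming the size constant. */
static size_t tex_cache_num_entries(const struct tex_cache *tc)
{
   return ARRAY_ELEMENTS(tc->entries);
}
```

If the array size changes, both functions pick up the new bound without
further edits.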
Reviewed-by: José Fonseca <jfonseca@vmware.com>
- each softpipe_tex_tile_cache is 50*64*64*4*4 = 3,276,800 bytes
- each softpipe_context has 3*32 softpipe_tex_tile_cache, i.e., each softpipe
context is 314,572,800 bytes, i.e., 300MB
That is, in a 32-bit process (around 3GB virtual memory max), we can
only fit 10 contexts.
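The arithmetic above can be checked with a short sketch. The enum names
are hypothetical, and reading the last two factors as four channels of
four bytes each is an assumption; only the numbers come from this
message.

```c
#include <stdint.h>

/* Factors quoted in the message; the names are illustrative only. */
enum {
   ENTRIES_PER_CACHE  = 50,
   TILE_WIDTH         = 64,
   TILE_HEIGHT        = 64,
   CHANNELS           = 4,      /* assumed meaning of the first "4" */
   BYTES_PER_CHANNEL  = 4,      /* assumed meaning of the second "4" */
   CACHES_PER_CONTEXT = 3 * 32
};

/* Bytes held by one texture tile cache. */
static uint64_t tex_cache_bytes(void)
{
   return (uint64_t)ENTRIES_PER_CACHE * TILE_WIDTH * TILE_HEIGHT *
          CHANNELS * BYTES_PER_CHANNEL;
}

/* Bytes held by all texture tile caches of one context. */
static uint64_t context_bytes(void)
{
   return tex_cache_bytes() * CACHES_PER_CONTEXT;
}

/* Whole contexts that fit in the given amount of address space. */
static uint64_t contexts_that_fit(uint64_t address_space_bytes)
{
   return address_space_bytes / context_bytes();
}
```

With 3 GiB of address space, contexts_that_fit() gives 10, matching the
estimate above.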
This change is a short-term hack to shrink the context size. Longer
term we'll need to change how the texture cache works.
Reviewed-by: Brian Paul <brianp@vmware.com>
We can't handle them yet, however we can safely just warn (we will
just render to the first layer, which is fine since we can't handle
the rendertarget system value either).
Also make behavior more predictable with buffer surfaces. Previously it
would sometimes hit bogus asserts because of the union in the surface;
instead, create the surface but assert when trying to set a buffer
in the framebuffer.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Just delete unused kernels rather than marking them as internal and
running the GlobalDCE pass.
Also implement this function in C and inline it into
radeon_llvm_get_kernel_module()
In cases where the vertex element size is smaller than the vertex buffer
stride, the previous calculation could end up 1 too low. This would result
in the GPU using index 0 instead of the maximum index for those elements,
which would be visible as intermittent distorted triangles.
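A sketch of this kind of off-by-one, not the driver's actual code: the
function names and parameters are hypothetical, and the naive formula is
only an assumed example of a calculation that ignores the element size.
Both functions assume stride > 0.

```c
#include <stdint.h>

/* Naive count: only whole strides that fit in the buffer. */
static uint32_t num_records_naive(uint32_t buffer_size, uint32_t offset,
                                  uint32_t stride)
{
   return (buffer_size - offset) / stride;
}

/* Count that also admits a last element not followed by a full stride. */
static uint32_t num_records(uint32_t buffer_size, uint32_t offset,
                            uint32_t stride, uint32_t element_size)
{
   uint32_t avail;

   /* No room for even one element. */
   if (buffer_size < offset || buffer_size - offset < element_size)
      return 0;

   /* Space left for further strides once the last element is placed. */
   avail = buffer_size - offset - element_size;
   return avail / stride + 1;
}
```

For a 40-byte buffer with a 16-byte stride and 8-byte elements, the
naive count is 2 while three elements (at offsets 0, 16 and 32) fit, so
the naive result is 1 too low.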
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
And remove non-working code for indirect sampler/resource selection.
Will be added back later.
Includes code from "nv50/ir/tgsi: Resource indirect indexing" by
Francisco Jerez (when mixing the R and S handles we can only specify
them via a register, i.e. indirectly, unless we upload all the used
handle combinations to c[] space, which we don't for now).
Squashed and (heavily) modified original patches by Francisco Jerez:
nv50/ir/tgsi: Implement resource LOAD/STORE (wip).
nv50/ir/tgsi: Emit SUST/SULD for surface access, and add CB LOAD/STORE support
nv50/ir/tgsi: Fix/clean up the LOAD/STORE handling code.
Left out for now:
nv50/ir/tgsi: Resource indirect indexing
Treating raw, read-only surfaces as constant buffers (CBs) was removed
because CBs are limited to a size of 64 KiB, which isn't desirable, and
because this decision should probably be made by the state tracker.
If we used a number of CB slots for surfaces, the state tracker might
find that we cannot accommodate the advertised limit.
OpenGL is nice and makes the user specify a format with an image unit.
OpenCL is evil and doesn't, and what's better than adding a huge load
of functions that we call indirectly to handle the conversion?