v2: remove superfluous mask, use buffer_size instead of constant
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Cleanup the code and implement indirect addressing.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
To further improve the optimization of source and destination
indirect addressing we need the ability to store a reference
to the declaration of the addressed operands.
Since most of the fields in tgsi_src_register doesn't apply for
an indirect addressing operand replace it with a separate
tgsi_ind_register structure and so make room for extra information.
v2: rename Declaration to ArrayID, put the ArrayID into () instead of []
Signed-off-by: Christian König <christian.koenig@amd.com>
Remember which declarations are declared as "arrays" and so
can be indirectly addressed. ArrayIDs start at 1, cause for
compatibility reasons zero is treaded as no array present.
Signed-off-by: Christian König <christian.koenig@amd.com>
Don't bother with free temporaries, just allocate them at
the end and also emit them in their own declaration.
Signed-off-by: Christian König <christian.koenig@amd.com>
Instead of emitting each temporary separately, emit them in a chunk.
v2: keep separate function for emitting temps
Signed-off-by: Christian König <christian.koenig@amd.com>
Needs to be set for depth, stencil, and fmask just
like other blocks.
v2: drop additional cayman bits for now
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On cayman, 128bpp surfaces require non_disp ordering for hw
access to both linear and tiled surfaces. When we use the 3D
engine we can set the non_disp ordering on both the tiled and
linear sides (via CB or texture), but when we use the DMA
engine, we can only set the non_disp ordering on the tiled
side, so after a L2T operation with the DMA engine, the data
ends up in the wrong order on the tiled side.
v2: cayman/TN only
v3: fix comments
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=60802
Note: this is a candidate for the 9.1 branch.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: Andreas Boll <andreas.boll.dev@gmail.com>
- Fix formatting - use one CFLAG per line
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59238
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
This involved adding another driOptionCache to dri_screen. The
existing one just held the default values. But now we also need
to have the values from the DRI config file so that we can get at
the always_have_depth_buffer config option, which is per-screen.
wayland_roundtrip() was given an incorrect parameter.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=62362
Note: This is a candidate for the stable branches.
Signed-off-by: Brian Paul <brianp@vmware.com>
There were two different NUM_ENTRIES #defines for the framebuffer
tile cache and the texture tile cache. Rename the later to fix
the warnings:
In file included from sp_flush.c:40:0:
sp_tex_tile_cache.h:76:0: warning: "NUM_ENTRIES" redefined
sp_tile_cache.h:78:0: note: this is the location of the previous definition
In file included from sp_context.c:50:0:
sp_tex_tile_cache.h:76:0: warning: "NUM_ENTRIES" redefined
sp_tile_cache.h:78:0: note: this is the location of the previous definition
Also, replace occurances of NUM_ENTRIES with Element() macro to
be safer.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
- each softpipe_tex_tile_cache 50*64*64*4*4 = 3,276,800 bytes
- each softpipe_context has 3*32 softpipe_tex_tile_cache, i.e, each softpipe
context is 314,572,800 bytes, i.e, 300MB
That is, in a 32bits process (around 3GB virtual memory max), we can
only fit 10 contexts.
This change is a short-term hack to shrink the context size. Longer
term we'll need to change how the texture cache works.
Reviewed-by: Brian Paul <brianp@vmware.com>
We can't handle them yet, however we can safely just warn (we will
just render to first layer, which is fine since we can't handle
rendertarget system value neither).
Also make behavior more predictable with buffer surfaces
(it would sometimes hit bogus asserts because of the union in the surface,
instead create the surface but assert when trying to set a buffer
in the framebuffer).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Just delete unused kernels rather than marking them as internal and
running the GlobalDCE pass.
Also implement this function in C and inline it into
radeon_llvm_get_kernel_module()
If geometry shader is present its stream output info should
be used instead of the vs and we shouldn't use the pre-clipped
corrdinates.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
In cases where the vertex element size is smaller than the vertex buffer
stride, the previous calculation could end up 1 too low. This would result
in the GPU using index 0 instead of the maximum index for those elements,
which would be visible as intermittent distorted triangles.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>