r600_translate_colorformat is rewritten to look like radeonsi.
r600_translate_colorswap is shared with radeonsi.
r600_colorformat_endian_swap is consolidated.
This adds some formats which were missing. Future "plain" formats will
automatically be supported.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
To avoid 32-bit integer overflow for large textures. Note: we're
already doing this in llvmpipe.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The logic to count number of block outputs was out of sync with the
actual array construction. But to simplify / make things less fragile,
we can just allocate the arrays for worst case size.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
A value may be assigned on only one side of an if/else. In this case we
can simply substitute a mov.f32f32.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Add option to generate fragment shader to emulate two sided color.
Additional inputs are added to shader for BCOLOR's (on corresponding to
each COLOR input). CMP instructions are used to select whether to use
COLOR or BCOLOR.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If vertex writes pointsize, there are a few extra bits we need to turn
on in the cmdstream here and there.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Now that we have the infrastructure for shader variants, add support to
generate an optimized shader for hw binning pass (with varyings/outputs
other than position/pointsize removed). This exposes the possibility
that the shader uses fewer constants than what is bound, so we have to
take care to not emit consts beyond what the shader uses, lest we
provoke the wrath of the HLSQ lockup!
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Fixes anything that tries to use gl_FrontFacing/gl_FragCoord. Also,
face support is needed to emulate two sided color.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
An unused input might not have a register assigned. We don't want bogus
regid to result in impossibly high max_reg..
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This prevents clover from using unsupported devices.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
CC: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
This makes it easy to compare output between different cards, especially
for ones that you don't have (and/or not in the current machine).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This should pave the way to being able to use the compiler without a
context. Also leads to cleaner code.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Similar to u_blitter, u_upload_mgr is now a client of the pipe context. Its
creation needs to be delayed until the context has been (almost) initialized.
This will be used for changing texture properties without modifying
pipe_resource like r600g, but not in this series. For now, this change
allows consolidation of pipe_surface functions.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
db_z_info was unused. This just renames the variable to match the register
name.
Now, db_depth_info is unused on Evergreen.
Both variables will be needed on SI though.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This adds support to gallium for a TG4 instruction,
and two CAPs. The first CAP is required for GL_ARB_texture_gather.
The second CAP is required to expose GL_ARB_gpu_shader5.
However so far we haven't found any hardware that natively
exposes the textureGatherOffsets feature from GL, so just
lower it for now. If hardware appears for this we can add
another CAP to allow TG4 to take 4 offsets.
v2: add component selection src and a cap to say
hw can do it. (st can use to help control
GL_ARB_gpu_shader5/GLSL 4.00). Add docs.
v3: rename to SM5, add docs.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The offsets will be stored in the handles parameter. This makes
it possible to use sub-buffers.
v2:
- Style fixes
- Add support for constant sub-buffers
- Store handles in device byte order
v3:
- Use endian helpers
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Using generic shaders caused a measurable fps drop, which was isolated to
use of full precision (vs half precision) output. This is an attempt to
regain that lost performance by using half precision solid/blit shaders
(when the output format is not float32).
Note: for the built-in shaders, I would not expect them to be register
starved. And in fact it is the solid frag shader that seems to have the
biggest impact. So I suspect you get double the pixel pipe units (or
half the cycles) when the output is half precision. So there may be
some gain to using half precision output for application shaders as
well, even though the rest of register usage is still full precision.
But for half precision to work for more complex shaders, we need to deal
with some constraints, like cat2 needing same precision for it's two src
registers. So for now it is not enabled by default except for the
built-in shaders.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Start putting in place infrastructure to deal with multiple shader
variants. Initially we'll use this for two sided color (frag) and
binning pass (vert) shaders. Possibly need for others later (such
as YUV vs RGB eglImage?).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Easier than making more extensive use of rpt, and the more compact
shaders seem to bring some bit of performance boost. (Perhaps repeat
flag benefits are more than just instruction cache, possibly it saves
on instruction decode as well?)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Instead in the common code, construct these shaders from TGSI. For now
we let a2xx keep it's hand coded shaders, as it's compiler isn't quite
up to the job yet. All the same it is a net drop in code size and gets
rid of special cases.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Make things configurable, and tweak the API a bit to avoid an extra
tgsi_shader_scan(). Getting closer to something generic which can be
moved out of freedreno and shaderd by other drivers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>