Commit graph

11465 commits

Author SHA1 Message Date
Marek Olšák
1337da5115 r600g: implement edge flags
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2014-03-04 12:26:16 +01:00
Marek Olšák
ac35ded473 r600g: port color buffer format conversion from radeonsi
r600_translate_colorformat is rewritten to look like radeonsi.
r600_translate_colorswap is shared with radeonsi.
r600_colorformat_endian_swap is consolidated.

This adds some formats which were missing. Future "plain" formats will
automatically be supported.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2014-03-04 12:26:16 +01:00
Marek Olšák
dff3eccd15 radeonsi: move translate_colorswap to common code
Also translate the Y__X swizzle.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2014-03-04 12:26:16 +01:00
Brian Paul
465b2c42bc softpipe: use 64-bit arithmetic in softpipe_resource_layout()
To avoid 32-bit integer overflow for large textures.  Note: we're
already doing this in llvmpipe.

Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-03-03 10:41:42 -07:00
Rob Clark
ecb71cfa66 freedreno/a3xx/compiler: overflow in trans_endif
The logic to count number of block outputs was out of sync with the
actual array construction.  But to simplify / make things less fragile,
we can just allocate the arrays for worst case size.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-03-02 11:26:35 -05:00
Rob Clark
e0007f733d freedreno/a3xx/compiler: fix for resolving PHI's
A value may be assigned on only one side of an if/else.  In this case we
can simply substitute a mov.f32f32.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-03-02 11:26:35 -05:00
Rob Clark
26530716ab freedreno/lowering: two-sided-color
Add option to generate fragment shader to emulate two sided color.
Additional inputs are added to shader for BCOLOR's (on corresponding to
each COLOR input).  CMP instructions are used to select whether to use
COLOR or BCOLOR.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-03-02 11:26:35 -05:00
Rob Clark
8dd70125fc freedreno/a3xx/compiler: add SSG
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-03-02 11:26:35 -05:00
Rob Clark
44c8f96b0d freedreno/a3xx: fix gl_PointSize
If vertex writes pointsize, there are a few extra bits we need to turn
on in the cmdstream here and there.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-03-02 11:26:35 -05:00
Rob Clark
05a9bda971 freedreno: resync generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-03-02 11:26:35 -05:00
Rob Clark
cb540c21f2 freedreno/a3xx: binning-pass vertex shader variant
Now that we have the infrastructure for shader variants, add support to
generate an optimized shader for hw binning pass (with varyings/outputs
other than position/pointsize removed).  This exposes the possibility
that the shader uses fewer constants than what is bound, so we have to
take care to not emit consts beyond what the shader uses, lest we
provoke the wrath of the HLSQ lockup!

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-03-02 11:26:35 -05:00
Rob Clark
664045752f freedreno/a3xx: add support for frag coord/face
Fixes anything that tries to use gl_FrontFacing/gl_FragCoord.  Also,
face support is needed to emulate two sided color.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-03-02 11:26:35 -05:00
Rob Clark
76924e3b51 freedreno/a3xx: fix for unused inputs
An unused input might not have a register assigned.  We don't want bogus
regid to result in impossibly high max_reg..

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-03-02 11:26:35 -05:00
Siavash Eliasi
0fe8d71667 r300g/tests: Added missing fclose for FILE resource.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2014-02-28 15:57:15 -08:00
Tom Stellard
f61e382f0a r600g/compute: PIPE_CAP_COMPUTE should be false for pre-evergreen GPUs
This prevents clover from using unsupported devices.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

CC: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
2014-02-28 16:17:34 -05:00
Ilia Mirkin
51fc093421 nouveau: add a nouveau_compiler binary to compile TGSI into shader ISA
This makes it easy to compare output between different cards, especially
for ones that you don't have (and/or not in the current machine).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-02-26 23:35:48 -05:00
Ilia Mirkin
dd370f0af6 nv30: remove nv30_context use from nvfx_*prog
This should pave the way to being able to use the compiler without a
context. Also leads to cleaner code.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-02-26 23:35:47 -05:00
Ilia Mirkin
41dbc4c444 nv30: remove unused sprite flipping parameter
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-02-26 23:35:47 -05:00
Ilia Mirkin
fe2738f998 nv30: remove unused render_mode and hw_pointsprite_control
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-02-26 23:35:46 -05:00
Ilia Mirkin
8f23d08928 nv30: remove use_nv4x, it is identical to is_nv4x
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-02-26 23:35:45 -05:00
Michel Daenzer
59936a49dd radeonsi: Prevent geometry shader from emitting too many vertices 2014-02-27 10:27:55 +09:00
Chia-I Wu
bb9c8071ea ilo: create u_upload_mgr last
Similar to u_blitter, u_upload_mgr is now a client of the pipe context.  Its
creation needs to be delayed until the context has been (almost) initialized.
2014-02-26 11:33:37 +08:00
Ilia Mirkin
d1b1329c3a nv50: enable txg where supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-02-25 14:42:34 -05:00
Ilia Mirkin
0e71c65db0 nv50: enable cube map array texture support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-02-25 14:42:34 -05:00
Marek Olšák
9855477e90 r600g,radeonsi: consolidate create_surface and surface_destroy
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-02-25 16:08:26 +01:00
Marek Olšák
b9aa8ed009 radeonsi: inline util_blitter_copy_texture
This will be used for changing texture properties without modifying
pipe_resource like r600g, but not in this series. For now, this change
allows consolidation of pipe_surface functions.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-02-25 16:08:22 +01:00
Marek Olšák
f7176d700f radeonsi: remove useless psbox variable from resource_copy_region
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-02-25 16:08:20 +01:00
Marek Olšák
80eb377a37 radeonsi: compute depth surface registers only once
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-02-25 16:08:18 +01:00
Marek Olšák
629b019a40 radeonsi: compute color surface registers only once
Same as r600g.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-02-25 16:08:17 +01:00
Marek Olšák
6b4e03216a r600g: remove r600_resource.h
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-02-25 16:08:15 +01:00
Marek Olšák
ec266d06d0 r600g: remove r600_surface::htile_enabled
v2: use one of the htile registers instead

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-02-25 16:08:12 +01:00
Marek Olšák
7fc6ece40e r600g: use r600_surface::db_z_info
db_z_info was unused. This just renames the variable to match the register
name.

Now, db_depth_info is unused on Evergreen.
Both variables will be needed on SI though.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-02-25 16:08:10 +01:00
Marek Olšák
40b9812a76 r600g,radeonsi: share r600_surface
I'm gonna use this in radeonsi.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-02-25 16:08:08 +01:00
Marek Olšák
933eaeee25 radeonsi: move PA_SU_POLY_OFFSET_DB_FMT_CNTL to framebuffer state
It doesn't depend on anything else.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-02-25 16:08:05 +01:00
Marek Olšák
db8886ed09 gallium: the other drivers don't support ARB_buffer_storage
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
2014-02-25 16:07:33 +01:00
Marek Olšák
6381dd7e9d r300g,r600g,radeonsi: add support for ARB_buffer_storage
All GTT memory mappings are coherent and therefore can be persistent.

Reviewed-by: Fredrik Höglund <fredrik@kde.org>
2014-02-25 16:05:41 +01:00
Marek Olšák
5f61f052b5 gallium: add interface for persistent and coherent buffer mappings
Required for ARB_buffer_storage.
2014-02-25 16:05:41 +01:00
Emil Velikov
882070cc81 nv50: correctly calculate the number of vertical blocks during transfer map
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-02-25 12:19:07 +00:00
Dave Airlie
2fcbec48d7 gallium: add texture gather support to gallium (v3)
This adds support to gallium for a TG4 instruction,
and two CAPs. The first CAP is required for GL_ARB_texture_gather.

The second CAP is required to expose GL_ARB_gpu_shader5.

However so far we haven't found any hardware that natively
exposes the textureGatherOffsets feature from GL, so just
lower it for now. If hardware appears for this we can add
another CAP to allow TG4 to take 4 offsets.

v2: add component selection src and a cap to say
hw can do it. (st can use to help control
GL_ARB_gpu_shader5/GLSL 4.00). Add docs.

v3: rename to SM5, add docs.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-02-25 13:29:17 +10:00
Tom Stellard
945d87f958 clover: Pass buffer offsets to the driver in set_global_binding() v3
The offsets will be stored in the handles parameter.  This makes
it possible to use sub-buffers.

v2:
  - Style fixes
  - Add support for constant sub-buffers
  - Store handles in device byte order

v3:
  - Use endian helpers

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2014-02-24 12:56:27 -08:00
Tom Stellard
eac7236042 radeonsi: Use SI_BIG_ENDIAN now that it exists
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-02-24 12:56:27 -08:00
Tom Stellard
8f3bcedde2 r600g: Use util_cpu_to_le32() instead of bswap32() on big-endian systems
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-02-24 12:56:27 -08:00
Tom Stellard
195ee10673 radeonsi: Use util_cpu_to_le32() instead of bswap32() on big-endian systems
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2014-02-24 12:56:27 -08:00
Rob Clark
3f7239ca0e freedreno/a3xx/compiler: half-precision output
Using generic shaders caused a measurable fps drop, which was isolated to
use of full precision (vs half precision) output.  This is an attempt to
regain that lost performance by using half precision solid/blit shaders
(when the output format is not float32).

Note: for the built-in shaders, I would not expect them to be register
starved.  And in fact it is the solid frag shader that seems to have the
biggest impact.  So I suspect you get double the pixel pipe units (or
half the cycles) when the output is half precision.  So there may be
some gain to using half precision output for application shaders as
well, even though the rest of register usage is still full precision.
But for half precision to work for more complex shaders, we need to deal
with some constraints, like cat2 needing same precision for it's two src
registers.  So for now it is not enabled by default except for the
built-in shaders.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-02-23 14:58:24 -05:00
Rob Clark
141ae71671 freedreno/a3xx: add shader variants
Start putting in place infrastructure to deal with multiple shader
variants.  Initially we'll use this for two sided color (frag) and
binning pass (vert) shaders.  Possibly need for others later (such
as YUV vs RGB eglImage?).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-02-23 14:58:23 -05:00
Rob Clark
9bbfae6265 freedreno/a3xx/compiler: collapse nop's with repeat
Easier than making more extensive use of rpt, and the more compact
shaders seem to bring some bit of performance boost.  (Perhaps repeat
flag benefits are more than just instruction cache, possibly it saves
on instruction decode as well?)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-02-23 14:58:23 -05:00
Rob Clark
bb255fdf06 freedreno/a3xx: drop hand-coded blit/solid shaders
Instead in the common code, construct these shaders from TGSI.  For now
we let a2xx keep it's hand coded shaders, as it's compiler isn't quite
up to the job yet.  All the same it is a net drop in code size and gets
rid of special cases.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-02-23 14:58:23 -05:00
Rob Clark
1c953b7cda freedreno/lowering: cleanup api
Make things configurable, and tweak the API a bit to avoid an extra
tgsi_shader_scan().  Getting closer to something generic which can be
moved out of freedreno and shaderd by other drivers.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-02-23 14:58:23 -05:00
Rob Clark
67cea4b32a freedreno/a3xx: add float 16 and 32bit formats
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-02-23 14:58:23 -05:00
Rob Clark
e819885b99 freedreno: resync generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-02-23 14:58:23 -05:00