Overall a nice 5-10% gain for most games. And more for things like
glmark2 texture benchmark.
There are some rough edges. In particular, the hardware seems to only
support tiling or component swap. (Ie. from hw PoV, ARGB/ABGR/RGBA/
BGRA are all the same format but with different component swap.) For
tiled formats, only ARGB is possible. This isn't a big problem for
*sampling* since we also have swizzle state there (and since
util_format_compose_swizzles() already takes into account the component
order, we didn't use COLOR_SWAP for sampling). But it is a problem if
you try to render to a tiled BGRA (for example) surface.
The next patch introduces a workaround for blitter, so we can generate
tiled textures in ABGR/RGBA/BGRA, but that doesn't help the render-
target case. To handle that, I think we'd need to keep track that the
tiled format is different from the linear format, which seems like it
would get extra fun with sampler views/etc.
So for now, disabled by default, enable with FD_MESA_DEBUG=ttile. In
practice it works fine for all the games I've tried, but makes piglit
grumpy.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The rules are sufficiently different for a5xx with tiled textures, so
split this out into something that can be implemented per-generation.
The a5xx specific implementation will come in a later patch.
Signed-off-by: Rob Clark <robdclark@gmail.com>
zlib provides a faster slice-by-4 CRC32 implementation than the
traditional single byte lookup one used by mesa. As most supported
platforms now link zlib unconditionally, we can easily use it.
Improvement for a 1MB buffer (avg MB/s, n=100, zlib 1.2.8):
i5-6600K C2D E4500
mesa zlib mesa zlib
443 1443 225% +/- 2.1% 403 1175 191% +/- 0.9%
It has been verified the calculation results stay the same after this
change.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The next change wants to use some optional zlib functionality, however
not all platforms currently use zlib. Based on earlier Jordan Justen's
patches and their review feedback.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes a number of int64 piglit tests, for example:
generated_tests/spec/arb_gpu_shader_int64/execution/built-in-functions/fs-sign-i64vec2.shader_test
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
V2: just zero-extend the 32-bit value.
Fixes a number of int64 piglet tests, for example:
generated_tests/spec/arb_gpu_shader_int64/execution/conversion/frag-conversion-explicit-bool-int64_t.shader_test
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
assert() is replaced by unreachable(), to avoid following building error:
external/mesa/src/gallium/drivers/radeonsi/si_shader.c:1967:1:
error: control may reach end of non-void function [-Werror,-Wreturn-type]
}
^
1 error generated.
Fixes: c797cd6 ("ac: add load_patch_vertices_in() to the abi")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Fixes a bunch of arb_gpu_shader_fp64 piglit tests for example:
generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-mix-double-double-double.shader_test
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I had 3.x putting swizzling in the texture state only for 16-bit texture
returns, and in the shader for 32-bit. This may be due to having mixed up
the return channel setup on 3.x back before I had moved it into the
compiler. On 4.x, the non-border-color texwrap tests are passing nicely
with both 16 and 32-bit returns with swizzling in the texture state.
The LDVARY signal now writes an arbitrary register, so I took out the
magic src register file and replaced it with an instruction with LDVARY
set so we have somewhere to hang a QFILE_TEMP destination for register
allocation.
The V3D 3.x series of TMU writes with meaning depending on the texture
type is replaced with writes to specific registers for each texture
argument semantic.
This fills in the delay slots of thread end as much as we can (other than
being cautious about potential TLBZ writes).
In the process, I moved the thread end THRSW instruction creation to the
scheduler. Once we start emitting THRSWs in the shader, we need to
schedule the thread-end one differently from other THRSWs, so having it in
there makes that easy.
Now, instead of a magic write register for VPM stores we have an
instruction to do them (which means no packing of other ALU ops into it),
with the ability to reorder the VPM stores due to the offset being baked
into the instruction.
VPM loads also gain the ability to be reordered by packing the row into
the A argument. They also no longer write to the r3 accumulator, and
instead must be stored to a physical register.
This required moving the register accesses to a separate v3dx file, since
the register definitions for each V3D version collide. It seems that
initializing the v3d_hw from a file dictating 3.3
(v3d_simulator_wrapper.cpp) is safe, though.
The WRTMUC replaces the implicit uniform loads in the first two texture
instructions. LDVPM disappears in favor of an ALU op. LDVARY, LDTMU,
LDTLB, and LDUNIF*RF now write to arbitrary registers, which required
passing the devinfo through to a few more functions.
The TLB load/store path is rebuilt in this version. There is no longer a
single-byte resolved store or the 3-byte extended store. Instead, you get
to always use general loads/stores (which, honestly, was tempting even in
previous versions).
To conditionally compile cl_emit() macros per V3D version, we need it to
expand to whatever V3D we're building for. This required emitting #define
V3D_VERSION 33 in all our currently 3.3-only code.