In ureg src registers could have an indirect register that was
either a temp or an addr register, while dst registers allowed
only addr. That made moving between them a little difficult so
make them behave the same way and allow temp's and addr registers
as indirect files for both (tgsi supports it, just ureg didn't).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
A lot of them were missing. Others were moved from the Compute ISA
to a new Integer ISA section as that seemed more appropriate.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Eliminating this we no longer need to copy between linear and swizzled layout.
This is probably not quite ideal since it's a bit more work for now, could do
some optimizations by moving depth testing outside the fragment shader loop
(but tricky for early depth test as we don't have neither the mask nor the
interpolated z in the right order handy).
The large amount of tile/untile code is no longer needed will be deleted
in next commit.
No piglit regressions.
v2: change a forgotten LAYOUT_NONE to LAYOUT_LINEAR.
v3: fix (bogus) uninitialized variable warnings, add comments, fix a bad type
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Assigning a struct only copies the members - any padding is left as is.
Thus this code:
struct foo_t foo;
foo = bar;
leaves the padding of foo intact, ie uninitialized random garbage.
This patch fixes constant shader recompiles by initializing the struct
to zero. For completeness, memcpy is used to copy the key to the shader
struct.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
v2: Removed extra libs as requested by Matt Turner.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
One build system for linux/unix only drivers should be enough.
Additionally the nouveau target was disabled anyway.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
post_scheduler clears interference set for reallocatable values when
the value becomes live first time, and then updates it to take into
account modified order of operations, but this was not handled properly
if the value appears first time as a source in copy operation.
Fixes issues with webgl demo: http://madebyevan.com/webgl-water/
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Some inputs may be preloaded into predefined GPRs,
so we can't reallocate arrays with such inputs.
Fixes issues with webgl demo: http://oos.moxiecode.com/js_webgl/snake/
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
New disassembler is not completely isolated yet from further processing
in r600g/sb that is not required for printing the dump, so it has higher
probability to fail in case of any unexpected features in the bytecode.
This patch adds "sbdisasm" flag for R600_DEBUG that allows to use new
disassembler in r600g/sb for shader dumps when shader optimization
is not enabled.
If shader optimization is enabled, new disassembler is used by default.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Mesa build is too complex to rely on successful builds. On refactorings
it is always a good idea to use git grep to prevent missing cases:
$ git grep u_assembled_primitive
src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c: u_assembled_primitive(in_prim);
The differences from the previous releases that affect st/egl are
- logging macros are prefixed with an 'A'
- dequeueBuffer() and enqueueBuffer() require an additoinal argument for
fence fd, acquired from libsync
Additionally, include gralloc_drm.h with extern "C".
The function returns the number of reduced/tessellated primitives for the
given vertex count.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Zack Rusin <zackr@vmware.com>
Switch to '>=' for comparisons, and it becomes obvious that the comparison for
PIPE_PRIM_QUAD_STRIP was wrong.
Add minimum vertex count check for PIPE_PRIM_LINE_LOOP. Return 1 for
PIPE_PRIM_POLYGON with 3 vertices.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Zack Rusin <zackr@vmware.com>
As a side effect, primitives with adjacency are now correctly validated.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Zack Rusin <zackr@vmware.com>
Move together (or add) functions to decompose/reduce/assemble a primitive,
give them consistent names, and document them. Add u_prim_vertex_count() so
that the vertex count information can be used elsewhere.
u_assembled_primitive() will be removed in a folow-on commit.
[olv: fix a warning when -Wold-style-declaration is enabled]
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Zack Rusin <zackr@vmware.com>
While this is ignorant of dependency control, it's still good for a 0.39%
+/- 0.08% performance improvement on GLBenchmark 2.7 (n=548)
v2: Rewrite as a subclass of the base class for the FS instruction
scheduler, inheriting the same latency information.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These will get virtualized as we add VS scheduling support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I need this so I can look at vec4 and fs registers' files from the same
.cpp file without namespaces. As far as I can tell we never rely on the
particular numerical values of the files, though I thought it sounded like
a good idea when doing the VS (it turns out having 0 be BAD_FILE is nicer).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will free instruction scheduling to make better choices. No
statistically significant performance difference on GLB2.7 (n=93).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
instead of crashing just fill zeros at the input slots that don't
match, that's the mandated behavior and it avoids debug asserts.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It's valid because we reuse certain arithmetic operations
for both signed and unsigned types (e.g. uadd, umad, which
have a bit unfortunate naming)
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The GPU apparently goes looking for constants even though there are no
shader stages enabled, and gets stuck because we haven't told it there are
no constants to collect. If any other user of the 3D pipeline had run
(even the Render accel of the X server!) since power on, then the in-GPU
constant buffers would have been set up with some contents we didn't use,
and we would succeed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56416
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Dave Airlie <airlied@redhat.com>
NOTE: This is a candidate for the stable branches.
We are already emitting a EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet
when this flush flag is set, so flushing the dest caches with a
SURFACE_SYNC should not be necessary.
The motivation for this change is that emitting a SURFACE_SYNC packet with
the CB bits set was causing compute shaders to hang on Cayman.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Remove all the glDraw* functions from the GLvertexformat structure.
The point of that dispatch struct is to handle all the functions which
dispatch differently depending on whether we're inside glBegin/End.
glDraw* are never allowed inside glBegin/End so we can remove those
entries.
This simplifies the code paths and gets rid of quite a bit of code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>