We need to allocate new space every time to avoid blocking on the last
HiZ op completing. There are two easy ways to do this:
brw_state_batch() and intel_upload_data(). brw_state_batch() is
simpler and avoids another buffer allocation.
Improves Unigine Tropics performance 0.376416% +/- 0.148722% (n=7).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The ralloc string appending functions were originally intended for
simple, non-hot-path uses like printing to an info log.
Cuts Unigine Tropics load time by around 20% (6 seconds).
v2: Avoid strlen() on every newline, too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Acked-by: José Fonseca <jfonseca@vmware.com> [v1]
Both callers of rewrite_tail immediately compute the new total string
length by adding the (known) length of the existing string plus the
length of the newly appended text. Unfortunately, callers generally
won't know the length of the new text, as it's printf-formatted.
Since ralloc already computes this length, it makes sense to add it in
and save the caller the effort. This simplifies both existing callers,
but more importantly, will allow for cheap-appending in the next commit.
v2: The link_uniforms code needs both the old and new length.
Apply the obvious fix (which sadly makes it less of a cleanup).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Acked-by: José Fonseca <jfonseca@vmware.com> [v1]
This adds support for all the opcodes needed for native integer
support with GLSL 1.20 enabled, and some of the ones for GLSL1.30
support.
I've split them between non-cpu and cpu along the same lines
Tom's code did for the other ones I think, but I'm open to review
on which ones should go where.
With instance ids fixed I get no regressions on my box here
with LLVM 2.8, will test with later LLVMs as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Backends usually advertise a SVGA3D_DEVCAP_MAX_POINT_SIZE between 63 and
256, so an hardcoded max point size of 80 is often incorrect.
This limitation does not apply for anti-aliased points (as they are done
via draw module) but we still advertise the same limit for both, because
all others pipe drivers do.
Reviewed-by: Brian Paul <brianp@vmware.com>
Mesa has a fast path for the generic fallback when using glReadPixels
for RGBA data which uses memcpy. However it was really difficult to
hit this case because it would not be used if any transferOps are
enabled. Any type apart from floating point or non-normalized integer
types (so any of the common types) would force enabling clamping so
the fast path could not be used. This patch makes it ignore clamping
when determining whether to use the fast path if the data type of the
buffer is an unsigned normalized type because in that case clamping
will not have any effect anyway.
https://bugs.freedesktop.org/show_bug.cgi?id=46631
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Brian Paul <brianp@vmware.com>
postpone unreferences until end of function, as the ones in use will
get naturally dereferenced.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Can't see any reason this wouldn't be better off as an inline.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some backends may advertise more temps than SVGA3D_TEMPREG_MAX, but the
driver is hardwired to only support up to the value defined by
SVGA3D_TEMPREG_MAX, so clamp to it.
Reviewed-by: Brian Paul <brianp@vmware.com>
r600g is the only driver which has made use of it. The reason the CAP was
added was to fix some piglit tests when the GLSL pass lower_output_reads
didn't exist.
However, not removing output reads breaks the fallback for glClampColorARB,
which assumes outputs are not readable. The fix would be non-trivial
and my personal preference is to remove the CAP, considering that reading
outputs is uncommon and that we can now use lower_output_reads to fix
the issue that the CAP was supposed to workaround in the first place.
KILP instruction inside IF blocks were being lowered to an unconditional
KIL. Since r300 doesn't support branching, when the IF's were lowered
to conditional moves, the KIL would always be executed. This is not a
problem with the mesa state tracker, because the GLSL compiler handles
lowering IF's, but this bug was appearing in the VDPAU state tracker,
which does not use the GLSL compiler.
Note: This is a candidate for the stable branches.
The xvmc state tracker is completely seperate and
doesn't shares code or anything else with the
xorg state tracker.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This patch allows the Mac OS X SCons build to complete. The assembly
sources contain psuedo-ops that not are supported on Mac OS X.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We were inverting the meaning of the stencil op flags: in svga/d3d
the normal incr/decr wraps and the SAT ops clamp.
This fixes piglit failures (at least stencil-twoside and stencil-wrap).
We should backport this everywhere we can.
Reviewed-by: Brian Paul <brianp@vmware.com>
Two of the switch cases used PIPE_FORMAT_ tokens instead of SVGA3D_ tokens.
As it happens, the token values are equal for these formats so there's no
net change.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>