The functions are prototyped in u_transfer.h and are related to the
other functions in u_transfer.c.
The next patch will re-use the u_resource.c file for new code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Ported from r600g commit:
8891b2f9c9
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
NOTE: This is a candidate for the 9.1 branch.
The fallbacks count is the number of drawing calls that use a "draw"
module fallback, such as polygon stipple.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
NOTE: Changed the semantic index for the drawtex coordinate to
be the texture unit index instead of always 0.
Not sure if this is correct but since the value seems to depend
on the unit it would make sense to use different varying slots.
This still isn't optimal, since the fence will signal a bit late,
but better than checking on the bo, which may never be ready if it
is shared (which is likely).
Also, renamed "pixels-rendered" to "samples-passed" because the
occlusion counter increments even if colour and depth writes are
disabled, or (on some implementations) for killed fragments that
passed the depth test when PS early_fragment_tests is set.
This patch consolidates duplicate code in the brw_depthbuffer and
gen7_depthbuffer state atoms. Previously, these state atoms contained
5 chunks of code for emitting the _3DSTATE_DEPTH_BUFFER packet (3 for
Gen4-6 and 2 for Gen7). Also a lot of logic for determining the
appropriate buffer setup was duplicated between the Gen4-6 and Gen7
functions.
This refactor splits the code into three separate functions:
brw_emit_depthbuffer(), which determines the appropriate buffer setup
in a mostly generation-independent way, brw_emit_depth_stencil_hiz(),
which emits the appropriate state packets for Gen4-6, and
gen7_emit_depth_stencil_hiz(), which emits the appropriate state
packets for Gen7.
Tested using Piglit on Gen5-7 (no regressions).
v2: Re-word some comments. Fix an assertion that incorrectly
prohibited packed depth/stencil formats on Gen6 (these are allowed
provided that HiZ is disabled).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit dbf94d105a, which
was working around a bug in the handling of array indexing when
constant folding built-in functions. Now that the constant folding
bug has been fixed, the workaround is no longer needed.
Mesa constant-folds built-in functions by using a miniature GLSL
interpreter (see
ir_function_signature::constant_expression_evaluate_expression_list()).
This interpreter had a bug in its handling of array indexing, which
caused expressions like "m[i][j]" (where m is a matrix) to be handled
incorrectly. Specifically, it incorrectly treated j as indexing into
the whole matrix (rather than indexing just into the vector m[i]); as
a result the offset computed for m[i] was lost and m[i][j] was treated
as m[j][0].
Fixes piglit tests inverse-mat[234].{vert,frag}.
NOTE: This is a candidate for the 9.1 and 9.0 branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57436
Conceptually the same as previously done in float_to_half.
Should cut down number of instructions from 14 to 10 or so, but
will promote some NaNs to Infs, so it's disabled.
It gets a bit tricky though handling all the cases correctly...
Passes basic tests either way (though there are no tests testing special
cases, but some manual tests injecting them seemed promising).
v2: style and comment fixes suggested by Jose
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This replaces the existing float-to-half implementation.
There are definitely a couple of differences - the old implementation
had unspecified(?) rounding behavior, and could at least in theory
construct Inf values out of NaNs. NaNs and Infs should now always be
properly propagated, and rounding behavior is now towards zero
(note this means too large but non-Infinity values get propagated to max
representable value, not Infinity).
The implementation will definitely not match util code, however (which
does nearest rounding, which also means too large values will get
propagated to Infinity).
Also fix a bogus round mask probably leading to rounding bugs...
v2: fix a logic bug in handling infs/nans.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reduced stack size allows to run more threads in some cases,
improving performance for the shaders that use stack (that is, for the
shaders with control flow instructions). E.g. with unigine-based apps.
v4: implement exact computation taking into account wavefront size
v5: add cases for RV620, RS880
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
v2: reduce key size, don't copy key around to much.
v3: remove key size reduction
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This works different than on R600, we need to add the start instance manually.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
This should be used by both SI and R600.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes mplayer -vo vdpau OSD.
NOTE: This is a candidate for the 9.1 branch.
Reported-by: Igor Vagulin <igor.vagulin@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Christian König <christian.koenig@amd.com>
A fix for lower_jumps progress reporting, very much like similar in
c1e591eed.
NOTE: This is a candidate for stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
All the other expression types allowed here have inst->mlen == 0, and this
one has implied MRF writes for all of its payload, so nothing else in the
implementation should need to change.
Reduces SEND messages for loading from pull constants in kwin's Lanczos
shader from 16 to 6. (Due to a deficiency in constant propagation, I
can't use the hack I did in the previous commit to test the performance
change)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.
This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on
my GM45 forced to load all uniforms through the varying-index path), but we
get a whole vec4 at a time to reuse in the next commit.
v2: Fix comment about channels in the other message.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.1 branch.
We weren't setting needs_dep[i] in the loops, so we'd continue on to
potentially add the same workaround MOVs to the later basic block
boundaries, too. We can either set needs_dep[i] to exit through the
normal path, or we can just return since we know we're done.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>