This avoids a lot of message setup we had to do otherwise. Improves
GLB2.7 performance with register spilling force enabled by 1.6442% +/-
0.553218% (n=4).
v2: Use BRW_PREDICATE_NONE, improve a comment (by Paul).
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We were clearing the reg_offset before trying to use it. Oops. Fixes
glsl-fs-texture2drect with the reg spilling debug enabled.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Things blew up when I enabled the debug register spill code without
disabling 16-wide, so I decided to just fix 16-wide spilling.
We still don't generate 16-wide when register spilling happens as part of
allocation (since we expect it to be slower), but now we can experiment
with allowing it in some cases in the future.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
I believe this will never happen in SIMD8 mode, but it could for SIMD16
when we fix it.
v2: Fix off-by-one in my register counting comment (caught by Paul).
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v1)
Now that reg spilling generates new vgrfs, we were looping forever if you
ever turned it on.
Instead, move the debug code into the register allocator right near where
we'd be doing spilling anyway, which should more accurately reflect how
register spilling occurs in the wild.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
I'm going to need to reuse this for fixing register spilling on SIMD16.
Note that BRW_MAX_MRF is 16, which is the same as BRW_MAX_GRF -
GEN7_MRF_HACK_START.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Fixes a boat load of Piglit tests for me, which crashed like fdo#70913
before.
Thanks to Michel Dänzer for the tip.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70913
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The ICD loader should be responsible for installing headers.
Reviewed and Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When faced with a million instructions that all became candidates at the
same time (none of which individually reduce register pressure), the ones
on the critical path are more likely to be the ones that will free up some
candidates soon.
shader-db:
total instructions in shared programs: 1681070 -> 1681070 (0.00%)
instructions in affected programs: 0 -> 0
GAINED: 40
LOST: 74
Fixes indistinguishable-from-hanging behavior in GLES3conform's
uniform_buffer_object_max_uniform_block_size test, regressed by
c3c9a8c857. Given that
93bd627d5a was unlocked by that commit, the
net effect on 16-wide program count is still quite positive, and I think
this should give us more stable scheduling (less dependency on original
instruction emit order).
v2: Comment suggestions by Paul
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70943
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This is a step in doing scheduling as described in Muchnick (p538). A
difference is that our latency function is only specific to one
instruction (it doesn't describe, for example, the different latency
between WAR of a send's arguments and RAW of a send's destination), but
that's changeable later. We also don't separately compute the postorder
traversal of the graph, since we can use the setting of the delay field as
the "visited" flag.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Commit aec20d66d9
(automake: properly handle non-default expat installation),
assumed that up-to date distributions use a recent version
of expat that handles security vunerabilities CVE-2012-1147
and CVE-2012-1148. Seems like this is not always the case
and they prefer to backport only the fix, rather than use
the updated library.
This commit adds a default case -lexpat whenever expat is
not found, while properly handling expat.pc if present.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71022
Reported-By: Bryce Harrington <b.harrington@samsung.com>
Reported-By: Vinson Lee <vlee@freedesktop.org>
Tested-by: Bryce Harrington <b.harrington@samsung.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This will simplify the addition of layout(location) qualifiers for
separate shader objects. This was validated with new piglit tests
arb_explicit_attrib_location/1.30/compiler/not-enabled-01.vert and
arb_explicit_attrib_location/1.30/compiler/not-enabled-02.vert.
v2: Refactor error checking to check_explicit_attrib_location_allowed
and eliminate the gotos. Suggested by Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Use mode_string to get the name of the variable mode. Slightly change
the control flow. Both of these changes make it easier to support
separate shader object location layouts.
The format of the message changed because mode_string can return a
string like "shader output". This would result in an awkward message
like "vertex shader shader output..."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
I made this a function (instead of a method of ir_variable) because it
made the change set smaller, and I expect that there will be an overload
that takes an ir_var_mode enum. Having both functions used the same way
seemed better.
v2: Add missing case for ir_var_system_value.
v3: Change the ir_var_mode_count case to just break. Move the assertion
and the return outside the switch-statment. In the unlikely event that
var->mode is an invalid value other than ir_var_mode_count, the
assertion will still fire, and in release builds we won't wind up
returning a garbage pointer. Suggested by Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Since the separation of ir_var_function_in and ir_var_shader_in (similar
for out), this check is no longer necessary. Previously, global_scope
was the only way to tell which was which.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Future patches will add some extra code to this path, and some of that
code will want to exit from the explicit location code early.
v2: Change a geometry shader "break" to a "return" so that try to apply
a bogus geometry shader location qualifier (which could cause cascading
errors). Suggested by Paul.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The return value has been unused since commit d348b0c. This was
originally included in another patch, but it was split out by Ian
Romanick.
v2: Drop unnecessary final return. Suggested by Paul.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Incremental builds were failing because not all generated source files
were missing dependencies to src/mapi/glapi/gen/*.xml.
Hopefully this change will be the end of these incremental build
failures.
This avoids a defect in lower_output_reads.
The problem is lower_output_reads treats the gl_FragData array as a single
variable. It first redirects all output writes to a temporary variable (array)
and then writes the whole temporary variable to the output, generating
assignments to all elements of gl_FragData.
BTW this pass can be modified to lower all arrays, not just inputs and outputs.
The question is whether it is worth it.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: addressed Paul Berry's comments
Use PKG_CHECK_MODULE over requesting the user to setup the
option at configure time. Drop unused EXPAT_INCLUDE and
update all targets.
NOTE: The this commit removes the --with-expat configure
option. One should ensure that the expat they wish to use
has expat.pc file accessible by pkg-config.
v2:
* Add note about the removal of --with-expat
(per Tom Stellard)
* Drop EXPAT_CFLAGS for targets that do not build DRI_COMMON
(spotted by Matt Turner)
v3:
* Rebase on top of megadrivers (drop EXPAT_CFLAGS from swrast)
Acked-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Tom Stellard <thomas.stellard@amd.com> (v2)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Conflicts:
configure.ac
src/mesa/drivers/dri/common/Makefile.am
Already available and used in other places of configure.ac.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The function should have never used it in the first place as it was
a left over from the DRI1 days of the nouveau ddx. While we're around
check if KMS is supported before opening the nouveau device, and
add support for Fermi & Kepler cards.
Compile tested only due to the lack of a Fermi/Kepler card.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The function xf86GetEntityInfo() retrieves the entity rather than
doing any changes. Remove this no-op code.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Commit a9f8baf00b removed the first and only use of the variables
but forgot to remove them.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Remove xf86PciInfo.h, all drivers provide their own PCI ID list
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
A convenient front end to indices generate/translate code, for emulating
primitives which are not supported natively by the driver.
This handles saving/restoring index buffer state, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The idea of the original order was that you'd dead code eliminate accesses
to push constants. But I've never seen a case of that (nor has
shader-db), while we frequently see sparse accesses of large constant
arrays that would overflow into pull constants.
Cuts pull constant use on csgo, serious sam, planeshift, and the cave:
total instructions in shared programs: 1695103 -> 1688795 (-0.37%)
instructions in affected programs: 92024 -> 85716 (-6.85%)
GAINED: 339
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* Use more consistant data sources
* Fix improper color space assignments
* Remove unnecessary comments and code
* Drop unnecessary round_up function (this was leftover
from moving winsys code out of renderer)
Acked-by: Brian Paul <brianp@vmware.com>
* Instead of assuming the displaytarget is the same
stride / colorspace as the destination, lets
actually check the source bitmap.
* Fixes random stride issues in rendering
Acked-by: Brian Paul <brianp@vmware.com>
The MRF variant is going to be used extensively by the atomic counter
intrinsics to assemble untyped atomic and surface read messages
easily.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The maximum number of atomic buffer objects is somewhat arbitrary, we
can change it in the future easily if it turns out it's not enough...
v2: Add comments with the relevant mesa dirty bits. Fix usage of
BRW_NEW_UNIFORM_BUFFER in the GS ABO state atom.
v3: Update binding table layout diagrams.
v4: Resolve conflicts with the recent dynamic surface index assignment changes.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Almost a trivial change, it boils down to renaming a few identifiers
so their names still make sense for opaque types other than sampler.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>