This still fails, since 8192*4bpp == 32768, which is too big to use the
blitter on.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This will be used for handling updates of large textures.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>.
Now that we require 2.6.39, there's no need to also check for 2.6.29.
Calling drm_intel_bufmgr_gem_enable_fenced_relocs() without checking
should be safe, as it simply sets a flag.
This does remove the check for zero fences available, but that doesn't
seem worth checking.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Chris Wilson's relaxed relocation patch landed in March 2011. Anyone
running pre-3.0 kernels probably isn't going to get the latest Mesa
anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
These were likely used for BRW_NEW_... dirty bit flags at one point, but
they're unused now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Nobody uses this value, so there's no need to set it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When I removed the proj_attrib_mask optimization, I also removed the
last consumer of this bit without realizing it.
Since nobody uses it, there's no point in flagging it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Clover needs the irreader component of llvm
v2: Check for irreader component
irreader is only available with LLVM 3.3 >= 177971
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
It has 2 dependencies: glClampColor and the framebuffer, we might just as well
do the update where those two are changed.
v2: cosmetic changes from Brian's email
Reviewed-by: Brian Paul <brianp@vmware.com>
This should reduce shader recompilations with drivers that emulate fragment
color clamping, because we want the clamping to be enabled only if there is
a signed normalized or floating-point colorbuffer.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reported-by: `per` in #intel-gfx
The size of the cache key varies, so store the actual size as well as
the key blob itself, rather than just assuming it's the same as the size
passed in.
NOTE: This is a candidate for stable branches.
V2: Don't leave silly holes in structure; use unsigned instead of GLuint.
V3: Fix missing case for `last` match.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2:
- Only dump shaders when env variable is set.
v3:
- Don't emit VGT registers
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com
This function is a holdover from r600g and is identical to
si_pm4_inval_texture_cache(), so it is not needed.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com
This target string now contains four values instead of three. The old
processor field (which was really being interpreted as arch) has been split
into two fields: processor and arch. This allows drivers to pass a
more a more detailed description of the hardware to compiler frontends.
v2:
- Adapt to libclc changes
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Add UTIL_FORMAT_LAYOUT_ETC to util_format_is_compressed. It was missing.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Don't check if there's sampler support for stencil if we're not
going to actually blit/copy stencil values. Fixes the case where
we mistakenly said we can't support a blit of depth values from
S8Z24 to X8Z24.
Also, rename the is_stencil variable to dst_has_stencil to improve
readability.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Switch to use the envytools generated headers for register/bitfield
definitions. This is the first step in preparing to add a3xx support,
since it avoids having conflicting names for a3xx and a2xx registers.
And since I'm using envytools for a3xx it is simpler to just use it for
everything.
This shouldn't cause any functional change, it is really just a lot of
renaming.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Otherwise we will not receive destroy windows events, causing framebuffers
to leak.
This happens particularly with java and jogl.
Tested with java + jogl, MATLAB.
VMware Internal Bug Number: 1013086.
Reviewed-by: Brian Paul <brianp@vmware.com>
At least on llvm 3.2 this appears to work fine. Tested on an Athlon XP
2600+, which has sse and 3dnow but not sse2.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Build time option, set RADEON_CS_DUMP_ON_LOCKUP to 1 in radeon_drm_cs.h to
enable it.
When enabled after each cs submission the code will try to detect lockup by
waiting on one of the buffer of the cs to become idle, after a timeout it
will consider that the cs triggered a lockup and will write a radeon_lockup.c
file in current directory that have all information for replaying the cs.
To build this file :
gcc -O0 -g radeon_lockup.c -ldrm -o radeon_lockup -I/usr/include/libdrm
v2: Add radeon_ctx.h file to mesa git tree
v3: Slightly improve dumped file for easier editing, only dump first faulty cs
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
The default wrap mode (PIPE_TEX_WRAP_REPEAT) is incompatible with
unnormalized texcoords (at least for softpipe).
v2: use PIPE_TEX_WRAP_CLAMP_TO_EDGE
Reviewed-by: Marek Olšák <maraeo@gmail.com>
GLBenchmark 2.7's shaders contain conditional blocks like:
if (x) {
if (y) {
...
}
}
where the outer conditional's then clause contains exactly one statement
(the nested if) and there are no else clauses. This can easily be
optimized into:
if (x && y) {
...
}
This saves a few instructions in GLBenchmark 2.7:
total instructions in shared programs: 11833 -> 11649 (-1.55%)
instructions in affected programs: 8234 -> 8050 (-2.23%)
It also helps CS:GO slightly (-0.05%/-0.22%). More importantly,
however, it simplifies the control flow graph, which could enable other
optimizations.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This clarifies that the offset of 2 is actually 16 kB / 8kB units.
It also keys both computations off of a single variable, which should
make it easier to change in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These variables are only used within a single function, so we may as
well make them local variables.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This was only produced by the brw_wm_input_dimensions atom, which was
removed in the previous commit. So there's no need for the dirty bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This was only used to compute proj_attrib_mask, which was removed by the
previous commit. That makes this dead code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The previous commit removed the last user of this field, so there's no
longer any point in setting it. Removing this should eliminate
state-dependent recompiles, and make the precompile more reliable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This optimization attempts to avoid extra attribute interpolation
instructions for texture coordinates where the W-component is 1.0.
Unfortunately, it requires a lot of complexity: the brw_wm_input_sizes
state atom (all the brw_vs_constval.c code) needs to run on each draw.
It computes the input_size_masks array, then uses that to compute
proj_attrib_mask. Differences in proj_attrib_mask can cause
state-dependent fragment shader recompiles. We also often fail to guess
proj_attrib_mask for the fragment shader precompile, causing us to
needlessly compile it twice.
Furthermore, this optimization only applies to fixed-function programs;
it does not help modern GLSL-based programs at all. Generally, older
fixed-function programs run fine on modern hardware anyway.
The optimization has existed in some form since the initial commit. When
we rewrote the fragment shader backend, we dropped it for a while. Eric
readded it in commit eb30820f26 as part of
an attempt to cure a ~1% performance regression caused by converting the
fixed-function fragment shader generation code from Mesa IR to GLSL IR.
However, no performance data was included in the commit message, so it's
unclear whether or not it was successful.
Time has passed, so I decided to re-measure this. Surprisingly,
Eric's OpenArena timedemo actually runs /faster/ after removing this and
the brw_wm_input_sizes atom. On Ivybridge at 1024x768, I measured a
1.39532% +/- 0.91833% increase in FPS (n = 55). On Ironlake, there was
no statistically significant difference (n = 37).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is the same computation as the _WriteEnabled flag, so we may as
well use it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
ctx->Stencil.WriteMask is a statically sized array of 3 elements.
Checking it against 0 actually is a NULL check, and can never fail,
which meant that we always said stencil writes were enabled.
Use the new core Mesa derived state flag to fix this.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
i965 needs to know whether stencil writes are enabled in several places,
and gets the test wrong sometimes. While we could create a function to
compute this, it seems generally useful enough to warrant a new piece of
derived state. Also, all the plumbing is already in place.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The ar_ge_as_at variable was just very very confusing since the condition
was actually the other way around (as_at_ge_ar). So change the condition
(and the selects depending on it) to match the variable name.
And also change the chosen major axis in case the coord values are the
same. OpenGL doesn't care one bit which one is chosen in this case but
it looks like dx10 would require z chosen over y, and y chosen over x
(previously did x chosen over y, y chosen over z). Since it's all the
same effort just honor dx10's wishes. (Though actually, for some prefered
orderings, we could save one (or two with derivatives) selects since the
tnewx and tnewz (and the corresponding dmax values) are the same.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The way we were allocating registers before, packing into low register
numbers for Ironlake, resulted in an overly-constrained dependency graph
for instruction scheduling. Improves GLBenchmark 2.1 performance by
4.5% +/- 0.7% (n=26). No difference on my old GLSL demo (n=20). No
difference on nexuiz (n=15).
v2: Fix off-by-one bug that made the change only work for 16-wide on i965.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GCC 4.8 now warns about typedefs that are local to a scope and not
used anywhere within that scope. This produced spurious warnings with
the STATIC_ASSERT() macro (which used a typedef to provoke a compile
error in the event of an assertion failure).
This patch switches to a simpler technique that avoids the warning.
v2: Avoid GCC-specific syntax. Also update p_compiler.h.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>