Tested on ILK and CTG (with the GL3isms taken out of the piglits).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The conversion code for srgb was tuned for n x 4x8bit AoS -> 4 x nxfloat SoA
(and vice versa), fix this to handle also 16bit 565-style srgb formats.
Still not really all that generic, things like r10g10b10a2_srgb or
r4g4b4a4_srgb wouldn't work (the latter trivial to fix, the former would not
require more work to not crash but near certainly need some higher precision
calculation) but not needed right now.
The code is not fully optimized for this (could use more direct calculation
instead of expanding to 8-bit range first) but should be good enough.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
GL generally doesn't seem to allow srgb formats with less (or more) than 8 bit
for the rgb channels, though some hw could easily do it (typically for formats
with up to 10 bits for the rgb channels, at least for formats with less than 8
bits support is likely widespread even). While it may be true there aren't
really any benefits for such formats, we need for it for d3d, though luckily
only for b5g6r5_srgb it seems.
So add this format along with the util code for conversion - since that util
code is heavily tuned for 8bit srgb this isn't really all that well optimized
and rounding doesn't seem right but at least it should give some halfway
meaningful results.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The nvc0 texfetch instruction expects the sample id to be in the second
source (usually used for the offset) rather than as part of the texture
coordinate.
This fixes all the sampler2DMS/Array tests on nvc0.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
This fallback to triangle strips is silly and should be done in drivers
if they need it.
This should fix the case when quad strips are used with flatshading that is
enabled by the "flat" GLSL varying modifier. It also fixes primitive restart
for quad strips.
This fixes piglit:
NV_primitive_restart/primitive-restart-draw-mode-quad_strip
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
The last changes to it are from 2008 and 2009.
It doesn't support most texture formats and some texture targets.
Nobody can possibly be using this.
Reviewed-by: Brian Paul <brianp@vmware.com>
It didn't use the driver-provided src/dstRowStride at all.
This was broken for the cases when stride != width*bpp.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
This code is a slightly modified version of evergreen_dma_blit (and
evergreen_dma_copy as well as evergreen_dma_copy_tile).
It would be nice to share some of the code in the long term.
I have reused some "cik"-prefixed functions that also return the right
value for SI. I am not sure if they should be renamed.
v2: Marek> removed gfx.flush in si_dma_copy_tile
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The AoS version of ld_build_blend_factor was assuming that if the first
channel was alpha, there were no rgb components.
Fixes glean/blendFunc on System z. No piglit regressions on x86_64.
The shortcut is still used in tests like spec/ARB_framebuffer_object/
fbo-alpha.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
libdl does not exist on many platforms which have dlopen in libc.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
In the gallium code, the assert() macro could come from either the
system's assert.h file (via c11/threads.h) or from gallium's u_debug.h.
It looks like all known assert.h files unconditionally #undef assert
before defining their own version. So the assert you get depends on
whether threads.h or u_debug.h was included last.
In the gallium code we really want to use the assert() from u_debug.h
(it behaves better on Windows). In gallium, c11/threads.h is only
included after u_debug.h in the os_thread.h wrapper. So Adding
an #ifndef assert test in the threads*.h files avoids using the system's
assert().
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
EXT_packed_depth_stencil is supported by all drivers, but
ARB_depth_texture isn't (notably nouveau_vieux). This should avoid
passing unexpected values down to ChooseTextureFormat.
The EXT_packed_depth_stencil spec does not make any explicit references
to requiring ARB_depth_texture in order to allow textures with that
format, however if there is no dependency, ARB_depth_texture would be
practically implied by the extension.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Note for 10.0 backport: This will produce a conflict, the solution is to
move the surrounding if as well.
There are a lot of different pci ids supported by nouveau, and more are
added all the time. The relevant distinguisher between drivers is the
chipset id.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
In all uses of dotlike() we're writing generic code that operates on 1-4
component vectors. That our IR requires ir_binop_dot expressions'
operands to be 2+ component vectors is an implementation detail that's
not important when implementing built-in functions with dot(), which is
defined for scalar floats in GLSL.
Reviewed-by: Eric Anholt <eric@anholt.net>
ARB_gpu_shader5 and ES 3.0 expose different subsets of
ARB_shading_language_packing.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Volume 4, part 1 of the Ivybridge PRM says, "Generally, the EWA
approximation algorithm results in higher image quality than the legacy
algorithm." Using a classic anisotropic filtering "tunnel" demo, it
appears that there is *no* anisotropic filtering on IVB without this bit
set.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I meant to include this fixes in v3 of commit
de7ad2c88f, but accidentally pushed a
previous version.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Sadly, we can't use actual ALIGN(), since that only supports
power-of-two values for the alignment parameter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Register sets depend on the particular hardware generation, but don't
depend on anything in the actual OpenGL context. Computing them is
fairly expensive, and they take up a large amount of memory. Putting
them in the screen allows us to compute/allocate them once for all
contexts, saving both time and space.
Improves the performance of a context creation/destruction
microbenchmark by about 3x on my Haswell i7-4750HQ.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will allow us to use the screen as a memory context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This one saves about 2MB peak allocation in glsl-fs-algebraic-add-add-1,
with no performance difference on timing short shader-db runs (n=9/10,
warmup outlier removed).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This should use 1/8 the memory.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christoph Brill <egore911@gmail.com>
This is a little easier to read.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christoph Brill <egore911@gmail.com>
This isn't the GL API, so there's no reason to use GLboolean.
Using bool is safer: any non-zero value is treated as "true". When
converting a value to a GLboolean, all but the low byte is discarded,
which means that values like 256 will be incorrectly rendered as false.
Done via the following vim commands:
:%s/GLboolean/bool/g
:%s/GL_TRUE/true/g
:%s/GL_FALSE/false/g
and one line of manual whitespace tidying.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Ideally, we'd like to never even attempt the SIMD16 compile if we could
know ahead of time that it won't succeed---it's purely a waste of time.
This is especially important for state-based recompiles, which happen at
draw time.
The fragment shader compiler has a number of checks like:
if (dispatch_width == 16)
fail("...some reason...");
This patch introduces a new no16() function which replaces the above
pattern. In the SIMD8 compile, it sets a "SIMD16 will never work" flag.
Then, brw_wm_fs_emit can check that flag, skip the SIMD16 compile, and
issue a helpful performance warning if INTEL_DEBUG=perf is set. (In
SIMD16 mode, no16() calls fail(), for safety's sake.)
The great part is that this is not a heuristic---if the flag is set, we
know with 100% certainty that the SIMD16 compile would fail. (It might
fail anyway if we run out of registers, but it's always worth trying.)
v2: Fix missing va_end in early-return case (caught by Ilia Mirkin).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v1]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net>
This is just a matter of reusing the pull/push constant information set
up by the SIMD8 compile.
This gains us 78 SIMD16 programs in shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we don't renumber uniform registers, assign_constant_locations
and move_uniform_array_access_to_pull_constants use the same names.
So, they can share a single copy of the pull_constant_loc[] array.
This simplifies the code considerably. assign_constant_locations()
doesn't need to walk through pull_params[] to rediscover reladdr
demotions; it just has that information in pull_constant_loc[]. We also
only need to rewrite the instruction stream once, instead of twice.
Even better, we now have a single array describing the layout of
all pull parameters, which we can pass to the SIMD16 program.
This actually hurts a few shaders in Serious Sam 3, and one in KWin:
total instructions in shared programs: 1841957 -> 1842035 (0.00%)
instructions in affected programs: 1165 -> 1243 (6.70%)
Comparing dump_instructions() before and after the pull constant
transformations with and without this patch, it appears that there is
a uniform array with variable indexing (reladdr) and constant indexing
(of array element 0). Previously, we uploaded array element 0 as both
a pull constant (for reladdr) /and/ a push constant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, remove_dead_constants() would renumber the UNIFORM registers
to be sequential starting from zero, and the resulting register number
would be used directly as an index into the params[] array.
This renumbering made it difficult to collect and save information about
pull constant locations, since setup_pull_constants() and
move_uniform_array_access_to_pull_constants() used different names.
This patch generalizes setup_pull_constants() to decide whether each
uniform register should be a pull constant, push constant, or neither
(because it's unused). Then, it stores mappings from UNIFORM register
numbers to params[] or pull_params[] indices in the push_constant_loc
and pull_constant_loc arrays. (We already did this for pull constants.)
Then, assign_curb_setup() just needs to consult the push_constant_loc
array to get the real index into the params[] array.
This effectively folds all the remove_dead_constants() functionality
into assign_constant_locations(), while being less irritable to work
with.
v2: Add assert(remapped <= i), requested by Topi.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
move_uniform_array_access_to_pull_constants() and setup_pull_constants()
both have two parts:
1. Decide which UNIFORM registers to demote to pull constants, and
assign locations.
2. Mechanically rewrite the instruction stream to pull the uniform
value into a temporary VGRF and use that, eliminating the UNIFORM
file access.
In order to support pull constants in SIMD16 mode, we will need to make
decisions exactly once, but rewrite both instruction streams.
Separating these two tasks will make this easier.
This patch introduces a new helper, demote_pull_constants(), which
takes care of rewriting the instruction stream, in both cases.
For the moment, a single invocation of demote_pull_constants can't
safely handle both reladdr and non-reladdr tasks, since the two callers
still use different names for uniforms due to remove_dead_constants()
remapping of things. So, we get an ugly boolean parameter saying
which to do. This will go away.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When demoting a variably indexed uniform array to pull constants, we
only recorded the location for the base of the array (element 0).
Recording locations for all array elements is a trivial amount of code
and will make subsequent refactoring easier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>