The setup shaders were composed of both a fs shader number and a variant
number. But since they aren't tied to a particular fragment shader, the
former was a fixed zero while the latter was also always zero because
it was never assigned. So, similar to what the fs code does, use a ever
increasing number to give it a more catchy name (unlike fragment shaders
though where this number is for each explicitly created shader, we just use
it for the implicitly created variants).
And while here, fix whitespace a bit.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Unused except it was increased for both fs and setup shader variants created.
Probably some leftover from ages ago.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously the code used the total number of ubos being declared in the
linked program (so the ubos of all shaders combined), use the number
from the particular shader instead.
This fixes an assertion failure with piglit arb_uniform_buffer_object-maxblocks
seen in llvmpipe since 8a9f5ecdb1 as it now emits
code for each declared buffer, not just the ones actually used.
CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The big missing part here is proper sched data calculations, but
hopefully the chosen placeholder will be sufficient for now.
Passes piglit as well as GK107 does.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Maxwell no longer has the methods to set constant attributes, and we'll
want to be treating stride 0 vtxbufs the same as for stride > 0.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Both changes landed in 10.2, and for people not following the
development cycle these will come as a surprise. Note that the
pipe_* interface is not stable.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Use "@GL_LIB@" in src/gallium/targets/libgl-xlib/Makefile.am to produce
the library name specified by the configure --with-gl-lib-name option.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Commit 11623be934 was meant to have this hunk, which
I accidently dropped during git rebase.
Cc: 10.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Julien Cristau <jcristau@debian.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jonathan Gray <jsg@jsg.id.au>
In 1d35f77228 support for multiple constant
buffers was introduced. This meant we had another indirection, and we did
resolve the indirection for each constant buffer access. This looks very
reasonable since llvm can figure out if it's the same pointer, however it
turns out that this can cause llvm compilation time to go through the roof
and beyond (I've seen cases in excess of factor 100, e.g. from 50 ms to more
than 10 seconds (!)), with all the additional time spent in IR optimization
passes (and in the end all of it in DominatorTree::dominate()).
I've been unable to narrow it down a bit more (only some shaders seem affected,
seemingly without much correlation to overall shader complexity or constant
usage) but it is easily avoidable by doing the buffer lookups themeselves just
once (at constant buffer declaration time).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Bring it back in line with r600g. I broke this in the original radeonsi
bringup. :(
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78537
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Previously, ir_unop_any was implemented via a dot-product call, which
uses floating point multiplication and addition. The multiplication was
completely pointless, and the addition can just as well be done with an
or. Since we know that the inputs are booleans, they must already be in
canonical 0/~0 format, and the final SNE can also be avoided.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It wasn't completely clear from the docs, so I had to figure out by
looking at piglit results. Hopefully this saves the next driver writer
implementing queries some time.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Not necessary, now that we will free the whole module (hence all
function bodies) immediately after compiling.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Free up unneeded LLVM stuff immediately after generating vertex shader
code. Saves about 500K per shader.
v2: Don't bother calling gallivm_free_function (Jose)
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Split free_gallivm_state() into two steps. First step is
gallivm_free_ir() which cleans up the LLVM scaffolding used to generate
code while preserving the code itself. Second step is
gallivm_free_code() to free the memory occupied by the code.
v2: s/gallivm_teardown/gallivm_free_ir/ (Jose)
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Provide a JITMemoryManager derivative which puts all generated code into
one memory pool instead of creating a new one each time code is generated.
This saves significant memory per shader as the pool size is 512K and
a small shader occupies just several K.
This memory manager also defers freeing generated code until you tell
it to do so, making it possible to destroy the LLVM engine while keeping
the code, thus enabling future memory savings.
v2: Fix compilation errors with LLVM 3.4 (Jose)
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
I saw that LLVM internally uses its global context for some things, even
when we use our own. Given ours is also global, might as well use
LLVM's.
However, sepearate contexts can still be enabled with a simple source
code modification, for when the need/benefit arises.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Nowadays LLVMModuleProviderRef is just an alias for LLVMModuleRef, so
its use just causes unnecessary confusion.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Older versions haven't been tested probably don't work anyway. But more
importantly, code supporting it is hindering further work.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It should compare with it's own size.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Many instructions implicitly update the accumulator on Gen < 6. The instruction
scheduling code just calls add_barrier_deps() for each accumulator access on
these platforms, but a large class of operations don't actually update the
accumulator -- mostly move and logical instructions. Teaching the scheduling
code about this would allow more flexibility to schedule instructions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77740
Reviewed-by: Matt Turner <mattst88@gmail.com>
The M_PI*f macros used a preprocessor paste to append 'f'
to M_PI defines, which works if the values are only numbers
but breaks on OpenBSD where M_PI definitions have casts
and brackets to meet requirements of a future version of POSIX,
http://austingroupbugs.net/view.php?id=801http://austingroupbugs.net/view.php?id=828
Simplify the M_PI*f macros by using casts directly in the defines
as suggested by Kenneth Graunke.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78665
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Real GPU queries need some infrastructure to track samples per tile and
accumulate the results. But fortunately this can be shared across GPU
generation.
See:
https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Split out fd_query into an abstract base class, to allow multiple
implementations. The current sw based queries are moved into
fd_sw_query.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
As far as I can tell, Mesa hasn't had a convenient way to dump ARB_vp/fp
source until now. Using MESA_GLSL=dump is convenient, since it means
you can use a single environment variable to dump a program's shaders,
no matter which language they're written in.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The point of copytexsubimage_using_blit_framebuffer is to use a hardware
accelerated BlitFramebuffer path. If that fails, we shouldn't do a
swrast blit---we should try our CTSI fallback code.
This is especially important for i965 and GLES, where we don't even
create a swrast context.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77705
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
The depth extent field is used to limit the allowed slice range that
can be rendered to.
With the previous setting, only slice 0 could be rendered.
This fixes piglit amd_vertex_shader_layer-layered-depth-texture-render.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Fixes piglit's
'gl-3.2-layered-rendering-clear-color-all-types 3d mipmapped'
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>