Fix build for darwin, when ./configured --disable-driglx-direct
- darwin ld doesn't support -Bsymbolic or --version-script, so check if ld
supports those options before using them
- define GLX_ALIAS_UNSUPPORTED as config/darwin used to, as aliasing of non-weak
symbols isn't supported
- default to -with-dri-drivers=swrast
v2:
Use -Wl,-Bsymbolic, as before, not -Bsymbolic
Test that ld --version-script works, rather than just looking for it in ld --help
Don't use -Wl,--no-undefined on darwin, either
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
It works fine, though it requires using ELF objects.
With this change there is nothing preventing us to switch exclusively
to MCJIT, everywhere. It's still off though.
Seems the opcodes are slightly different from a2xx. Resync headers and
move blend_func() helper into hw generation specific code.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We already multiply by bytes per pixel for this, so f3ba7611 broke
mem2gmem for depth/stencil. Drop the now-redundant mutiply by cpp.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
In cases where there was no color buf bound, there were inconsistancies
in register settings related to position of depth/stencil inside GMEM.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
These aren't buffers we ever read back from CPU, so using incorrect
reloc fxn wasn't really harming anything. But might as well be correct.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
In commit 4be146b1, I neglected to add the new property to the strings
array. This leads to the string '(null)' to be printed instead when
converting a GS shader to text.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Make sure to normalize the z coordinates as well as the x/y ones when
there are mipmaps present. Fixes 3d mipmap generation, which now uses
the blit path.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
These instructions can come in either through IMUL_HI/UMUL_HI TGSI
opcodes, or from OP_DIV constant folding.
Also make sure that the constant foldings which delete the original
instruction still get counted as having done something.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Retrieving the high 32 bits of a signed multiply is rather annoying. It
appears that the simplest way to do this is to compute the absolute
value of the arguments, and perform a u32 x u32 -> u64 operation. If the
arguments' signs differ, then negate the result. Since there is no u64
support in the cvt instruction, we have the perform the 2's complement
negation "by hand".
This logic can come into use by the IMUL_HI instruction (very unlikely
to be seen), as well as from constant folding of division by a constant.
Fixes dolphin's divisions by 255.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
I think a3xx and later should support (it is part of GLES3), but this
isn't needed for the time being and still needs to be reversed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2ea923cf57 had the side effect of IR counting
now being done after IR optimization instead of before. Some quick analysis
shows that there's roughly 1.5 times more IR instructions before optimization
than after, hence the effective shader cache size got quite a bit smaller.
Could counter this with an increase of the instruction limit but it probably
makes more sense to count them after optimizations, so move that code.
Reviewed-by: Brian Paul <brianp@vmware.com>
UNION appears to expect that all of its sources are conditionally
defined. Otherwise it inserts an unpredicated mov instruction which
overwrites the desired result. This fixes tests that use UMUL_HI, and
much less directly, unsigned integer division by a constant, which uses
this functionality in a peephole pass.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Gallium already gives us height==1 for these, so the texture state is
already setup correctly to emulate 1D textures as a Nx1 2D texture. We
just need to supply the .y coord.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
That was easy. Turns out it is just a matter of setting one bit.
Enable sampling from sRGB texture, and therefore enable GL 2.1 :-)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Enabled with GALLIVM_DEBUG=perf (which up to now was only used to print
warnings for unoptimized code).
While some unexpectedly long shader compile times for some shaders were fixed
with 8a9f5ecdb1 this should help recognize such
problems in the future. For now though only available in debug builds (which
are not always suitable for such analysis). And since this uses system time,
it might not be all that accurate (even llvmpipe's own rasterization threads
might be running at the same time, or just other tasks).
(llvmpipe also has LP_DEBUG=counters but this only gives an average per shader
and the the total time for all shaders.)
This prints information like this:
optimizing module fs17_variant0 took 1 msec
optimizing module setup_variant_0 took 0 msec
optimizing module draw_llvm_vs_variant0 took 9 msec
optimizing module draw_llvm_vs_variant0 took 12 msec
optimizing module fs17_variant1 took 2 msec
v2: rebase for recent gallivm compilation changes, and print time for whole
modules instead of functions (otherwise it would be very spammy since it would
include all trivial inline sse2 functions), using the shiny new module names,
prying them off LLVM using new helper (not available through C bindings).
Per function timings, while possibly giving more information (if there'd be
a problem only in for instance the partial not the whole function), don't seem
all that useful for now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When we had just one module "gallivm" was an appropriate name. But now we have
modules containing all functions for a particular variant, so give it a
corresponding name (this is really just for helping debugging).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This workaround doesn't list any llvm version, but it was introduced
2010-06-10 (e277d5c1f6). It is unlikely
this bug is still present in llvm versions we support (3.1+).
There's no specific test listed, but I ran lp_test_arit (which uses
the mentioned functions) on llvm 3.1 and 3.3 with sse41 disabled and
this pass enabled without issues.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
32bit code generation and llvm >= 2.7 used a different optimization pass
order - this code was initially introduced (2010-07-23) by
815e79e72c, apparently due to buggy code being
generated with then brand new llvm versions (which was llvm 2.7 plus pre 2.8
devel).
It seems very highly likely that whatever this bug was it has been fixed in
newer llvm versions, though there's no easy way to test this - the mentioned
piglit test has been removed years ago, and even if you'd build it I'm
sceptical the glsl compiler would still produce the required code to trigger
it.
I have no idea what a good order of passes is, but just remove the workaround
and use the same order everywhere.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
With this and the previous patch, we no longer have multiple
definitions in the final egl_gallium.so.
v2: Drop duplicate libloader link.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com> (v1)
Reviewed-by: Tom Stellard <thomas.stellard@amd.com> (v1)
It makes more sense to link the core and common parts of the driver as the
target is build. Additionally this will help us drop duplicating symbols
for targets that static link mulitple pipe-drivers. Only egl-static needs
that currently with more to come.
To simplify things a bit add HAVE_GALLIUM_RADEON_COMMON variable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The loops for updating the multiple packed fields in SP_VS_OUT[] and
SP_VS_VPC_DST[] will zero out one register beyond the last that on
required. Which is normally not a problem (and is kinda convenient
when looking at cmdstream dumps) unless we have maximum (16) varyings.
Fix loop termination condition so that this does not happen.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need to size input/output tables big enough for special inputs/
outputs (gl_Position, gl_FrontFacing, etc) which, while they don't
count towards the hw limit of 16 attributes or 16 varyings, we do
still need to track them all the same.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Hardware only supports 16. Which fd3_shader_variant properly reflected,
but the pipe cap did not, leading to array overflow (and shaders that
could not possibly work).
Also a bunch of asserts to make problems like this easier to see.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We are starting to add integer support to the compiler, which does not
get exercised with glsl feature level 120 and without advertising
integer support. But doing so breaks too many things right now. So
for now use a debug flag to conditionally expose the functionality
while it is in development.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The KILL_IF opcode could potentially be merged in to the regular KILL
opcode function. It was a pain to do so, so I've left is separated
for cleanliness.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Adds a large sum of TGSI opcodes to the a3xx compiler.
For integer opcodes we have 28 opcodes added.
Adds 4 floating point compare opcodes
If GLSL 1.30 is enabled, this allows the GLSL 1.30 piglits to have a
completion amount of 432/641.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
All shaders had the same name.
We could probably use some identifier per shader too, but for now only use
the variant number.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The setup shaders were composed of both a fs shader number and a variant
number. But since they aren't tied to a particular fragment shader, the
former was a fixed zero while the latter was also always zero because
it was never assigned. So, similar to what the fs code does, use a ever
increasing number to give it a more catchy name (unlike fragment shaders
though where this number is for each explicitly created shader, we just use
it for the implicitly created variants).
And while here, fix whitespace a bit.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Unused except it was increased for both fs and setup shader variants created.
Probably some leftover from ages ago.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The big missing part here is proper sched data calculations, but
hopefully the chosen placeholder will be sufficient for now.
Passes piglit as well as GK107 does.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>