This makes things a bit nicer, and more importantly it fixes an issue
where a "downgraded" array texture (due to view reduced to 1 layer and
addressed with (non-array) samplec instruction) would use the wrong
coord as shadow reference value. (This could also be fixed by passing
target through the sampler interface much the same way as is done for
size queries, might do this eventually anyway.)
And if we'd ever want to support (shadow) cube map arrays, we'd need
5 coords in any case.
v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex
using wrong shadow coord for 2d...). Plus need to project the shadow
coord, and just for fun keep projecting the layer coord too.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Instead of passing s,t,r coordinates pass a coord array - the reason is that
I need to pass more coords (in particular for shadow "coord", future will also
need another one for cube map arrays) so just pass them as an array.
Also, to simplify things, use fixed location for the shadow reference value I
want to get rid of the silly "where is the right coord value" game.
Keep old-style however for aos sampling (which is not going to need shadow
coord, though for cube map arrays it still would need fixing).
(Next patch will pass those through using the new arrangement directly from
sampler interface.)
v2: fix up soa split path (unreachable currently but still...)
Reviewed-by: Zack Rusin <zackr@vmware.com>
We need to put border color into texture format color space which
essentially means clamping for non-float, normalized formats (not entirely
sure if we're also meant to quantize the float but it's probably ok not to
do it thankfully).
For OpenGL we could do this easily outside generated code due to the
1:1 sampler/texture correspondence but not for d3d10 which is terrible
(as we recalculate a constant over and over again per shader invocation).
Fortunately border color should be rare enough that we don't care THAT much.
Reviewed-by: Zack Rusin <zackr@vmware.com>
If the fragment shader is null then pixel shader invocations have
to be equal to zero. And if we're running a null ps then clipper
invocations and primitives should be equal to zero but only
if both stancil and depth testing are disabled.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Calling the prepare outputs cleans up the slot assignments
for outputs, unfortunately aapoint and aaline didn't have
code to reset their slots after the initial setup, this
was messing up our slot assignments. The unfilled stage
was just missing the initial assignment of the face slot.
This fixes all of the reported piglit failures.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This patch adds the level query support to the video decoders
and uses some more reasonable defaults.
v2: (ck) add commit message
Reviewed-by: Christian König <christian.koenig@amd.com>
This fixes a compilation warning with -Wformat-security.
CC: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Looks like the same issue that was seen with MULADD in trans slot on
R7xx also affects MULADD_IEEE (maybe all OP3 instructions and MULADD is
just a most frequently used?). So the workaround is to not allow affected
instructions to be placed into the trans slot.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=67927
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
FSEQ/FSGE/FSLT/FSNE work just the same as SEQ/SGE/SLT/SNE except skip the
select.
And just for consistency use the same appropriate ordered/unordered comparisons
for the old opcodes as well.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Also while here add a bunch of other forgotten (integer) instructions to
tgsi_util_get_inst_usage_mask() (which isn't used for much except optimizing
away unused input components), though it may still be incomplete.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Newer graphic languages don't want messy float mask results but instead true
"boolean" mask results for float comparisons. Otherwise just need to convert
the floats back to integers. Need to keep the old opcodes however due to both
legacy (gl and d3d9) needing them and because older hw can't really deal with
integers. These new FSEQ/FSGE/FSLT/FSNE opcodes are part of integer API and
hence must be supported if a driver claims to support glsl 1.30 (or
PIPE_SHADER_CAP_INTEGERS).
Reviewed-by: Zack Rusin <zackr@vmware.com>
Because we must maintain an exec_mask even if there's currently nothing
on the mask stack, we can still have an exec_mask at the end of the program.
Effectively, this mask should be set back to default when returning from main.
Without relying on END/RET opcode (I think it's valid to have neither) it is
actually difficult to do this, as there doesn't seem any reasonable place to
do it, so instead let's just say the exec_mask is invalid outside main (which
it really is effectively).
The problem is that geometry shader called end_primitive outside the shader
(in the epilogue), and as a result used a bogus mask, leading to bugs if we
had to set the (somewhat misnamed) ret_in_main bit anywhere. So just avoid
the mask combining function when called from outside the shader.
Reviewed-by: Zack Rusin <zackr@vmware.com>
The code was quite weird, the second comparison was in fact a complete no-op
and we can also do the comparison with the vector directly instead of scalar,
which should not also be faster but it is way more obvious how that mask
is actually going to look like.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Instead of reducing masks to 0/1 simply use the mask directly as -1.
Also use some signed comparison instead of unsigned (as far as I understand
these values have to be (very) small and signed means llvm doesn't have to
apply additional logic to do the unsigned comparisons the cpu can't do).
Saves a couple of instructions in some test geometry shader here.
v2: that was a bit to much optimization, don't skip combining the masks...
Reviewed-by: Zack Rusin <zackr@vmware.com>
Fix ilo_gpe_init_view_surface_for_buffer to allow buffer to be NULL, and add
ilo_gpe_set_view_surface_bo to set it later. This allows us to set up
SURFACE_STATE early for constant buffers backed by user buffers.
In finalize_index_buffer(), when the current index buffer was destroyed due to
u_upload_data(), it may happen that the new index buffer is at the same
address as the old one. Comparing the pointers to the two buffers could fail
to work, and 3DSTATE_INDEX_BUFFER would be incorrectly skipped.
Holding a reference to the current index buffer before calling u_upload_data()
should fix the problem.
My previous attempt at doing so double-failed miserably (minification of
zero still gives one, and even if it would not the value was never written
anyway).
While here also rename the confusingly named int_vec bld as we have int vecs
of different sizes, and rename need_nr_mips (as this also changes out-of-bounds
behavior) to is_sviewinfo too.
Reviewed-by: Zack Rusin <zackr@vmware.com>
d3d10 has no notion of distinct array resources neither at the resource nor
sampler view level. However, shader dcl of resources certainly has, and
d3d10 expects resinfo to return the values according to that - in particular
a resource might have been a 1d texture with some array layers, then the
sampler view might have only used 1 layer so it can be accessed both as 1d
or 1d array texture (I think - the former definitely works). resinfo of a
resource decleared as array needs to return number of array layers but
non-array resource needs to return 0 (and not 1). Hence fix this by passing
the target from the shader decl to emit_size_query and use that (in case of
OpenGL the target will come from the instruction itself).
Could probably do the same for actual sampling, though it may not matter there
(as the bogus components will essentially get clamped away), possibly could
wreak havoc though if it REALLY doesn't match (which is of course an error
but still).
Reviewed-by: Zack Rusin <zackr@vmware.com>
Specifically, must return 0 for non-existent mip levels (and non-existent
textures which is an unsolved problem) for everything but total mip count.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Cayman and trinity systems still seem to suffer from
stability problems with GPUVM. This also fixes compute
on these asics. It can still be enabled for testing
by setting env var RADEON_VA=true.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=65958
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: "9.2" <mesa-stable@lists.freedesktop.org>
CC: "9.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
We can't be injecting the primitive id's in the pipeline because
by that time the primitives have already been decomposed. To
properly number the primitives we need to handle the adjacency
primitives by hand. This patch moves the prim id injection into
the original primitive assembler and completely removes the
useless pipeline stage.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Without reseting the vertex id, with primitives where the same
vertex is used with different primitives (e.g. tri/lines strips)
our vbuf module won't re-emit those vertices with the changed
primitive id. So lets reset the vertex id whenever injecting
new primitive id to make sure that the vertex data is correctly
emitted.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Before inserting new front face and prim id outputs cleanup
the old extra outputs, otherwise our cache will use previous
output slots which will break as soon as outputs of the current
shader don't match the last.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This is wrong both for OpenGL and d3d. (In fact clamping is a side effect
of converting to depth format, so this should really do quantization too
at least in d3d10 for the comparisons to be truly correct.)
Reviewed-by: Zack Rusin <zackr@vmware.com>
Clearly the returned values need to be per-element if the lod is per element.
Does not actually change behavior yet.
Reviewed-by: Zack Rusin <zackr@vmware.com>
This opcode is quite problematic in tgsi, while it tries to mirror
d3d10 resinfo it can't really do what's stated there due to missing
the crazy return type modifiers. Hence specify this is ignored along
with the swizzle.
(Other options would be to have multiple opcodes or specify the ret
type modifier maybe in dst_reg as there's padding bits left there but
it is the only instruction allowing this.)
Reviewed-by: Zack Rusin <zackr@vmware.com>
For d3d10 and ARB_robust_buffer_access_behavior, we are required to return
0 for out-of-bounds coordinates (for which we can just enable the code already
there was just disabled). Additionally, also need to return 0 for
out-of-bounds mip level and out-of-bounds layer. This changes the logic
so instead of clamping the level/layer, an out-of-bound mask is computed
instead in this case (actual clamping then can be omitted just like with
coordinates, since we set the fetch offset to zero if that happens anyway).
Reviewed-by: Zack Rusin <zackr@vmware.com>
While so far this only causes some harmless test failures, there's lots more
cpus with DAZ. All 64bit capable ones can do it (particularly relevant for
AMD cpus as they supported sse3 very very late) but if really necessary we
can check support for that for real with some more magic.
(In fact just about ANY cpu with sse2 can support DAZ, I believe the only
exception are first gen P4 (Willamette) and from those only early steppings
which can't do it it's almost like intel forgot to add it... - a real pity
though docs say you can't just try to set it as they will throw a GPF.)
While this was meant to address https://bugs.freedesktop.org/show_bug.cgi?id=67672
it does not fix it. Most likely the tests need fixing as I don't think
there's any guarantee about denorm handling in the reference math library
functions if the flags aren't set to standard values. Nevertheless enabling
DAZ on all cpus which can do it should be the right thing to do.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Should be much faster, seems to work in softpipe.
While here (also it's now disabled) fix up the pow factor - the former value
is what is in GL core it is however not actually accurate to fp32 standard
(as it is 1.0/2.4), and if someone would do all the accurate math there's no
reason to waste 8 mantissa bits or so...
v2: use real table generating function instead of just printing the values
(might take a bit longer as it does calculations on some 3+ million floats
but much more descriptive obviously).
Also fix up another inaccurate pow factor (this time in the python code) -
wondering where the couple one bit errors came from :-(.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>