This probably got broken when the samplers were converted to be indexed
by shader type.
Seen when looking at bug 89819 though I'm not sure if that really was what
the bug was about...
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
I just botched this when writing the original code.
From the ARB_vertex_program specification:
"The RSQ instruction approximates the reciprocal of the square root of
the absolute value of the scalar operand and replicates it to all four
components of the result vector."
Fixes a Glean vertProg1 subtest:
RSQ test 2 (reciprocal square root of negative value)
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90547
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The comment was accurate but the condition was reversed...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
This change introduces a new field in gl_uniform_storage to
explicitely say that a uniform is built-in. In the case where it is,
no storage is defined to make it clear that it is read-only from the
mesa side. I fixed all the places in the code that made use of the
structure that I changed. Any place making a wrong assumption and using
the storage straight away will just crash.
This patch seems to implement the path of least resistance towards
listing built-in uniforms in GL_ACTIVE_UNIFORM (and other APIs).
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Pretty trivial, fixes the issue that we're expected to be able to blit
stencil surfaces (as the blit just relies on util blitter code which needs
stencil export to do it).
2 piglits skip->pass, 11 fail->pass
v2: prettify, keep different stencil ref value handling out of depth/stencil
test itself.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Some hardware reads only the low 16-bits even if the type is UD, but
other hardware like Cherryview can't handle this.
Fixes spec@arb_gpu_shader5@execution@sampler_array_indexing@fs-simple on
Cherryview.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90830
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
It was 2 bits to accommodate SATURATE_PLUS_MINUS_ONE (removed by commit
09b566e1). A similar change was made to TGSI recently in commit
e1c4e8aa.
Reducing the size from 2 bits to 1 reduces the size of the bit fields
from 17 bits to 16, which is a much nicer number.
Reviewed-by: Brian Paul <brianp@vmware.com>
If the incoming func matches the current state it must be a legal
value so we can do this before the switch statement.
Signed-off-by: Brian Paul <brianp@vmware.com>
If the glPushAttrib() mask value was zero we didn't actually push
anything onto the attribute stack. A subsequent glPopAttrib() call
would generate a GL_STACK_UNDERFLOW error. Now push a dummy attribute
in that case to prevent the error.
Mesa now matches nvidia's behavior.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
lower_phis_to_scalar() pass recurses the instruction dependence graph to
determine if all the sources of a given instruction are scalarizable.
To prevent cycles, it temporary marks the phi instruction before recursing in,
then updates the entry with the resulting value. However, it does not consider
that the entry value may have changed after a recursion pass, hence causing
a use-after-free situation and a crash.
This patch fixes this by reloading the entry corresponding to the 'phi'
after recursing and before updating its value.
The crash can be reproduced ~20% of times with the dEQP test:
dEQP-GLES3.functional.shaders.loops.while_constant_iterations.nested_sequence_fragment
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Now that Jason's LOAD_PAYLOAD improvements have landed, we don't need
this. Passing 1 for the number of header registers already takes care
of setting force_writemask_all on the header copy.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
See the corresponding code in brw_vs_surface_state.c.
v2: const more things (requested by Topi Pohjolainen)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
We used to store the GS dispatch mode in brw_gs_prog_data while
separately storing the VS dispatch mode in brw_vue_prog_data::simd8.
This patch introduces an enum to represent all possible dispatch modes,
and stores it in brw_vue_prog_data::dispatch_mode, unifying the two.
Based on a suggestion by Matt Turner.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The documentation makes it pretty clear that we shouldn't use this:
"Under normal conditions SW shall specify DMask, as the GS stage
will provide a Dispatch Mask appropriate to SIMD4x2 or SIMD8 thread
execution (as a function of dispatch mode). E.g., for SIMD4x2
execution, the GS stage will generate a Dispatch Mask that is equal
to what the EU would use as the Vector Mask. For SIMD8 execution
there is no known usage model for use of Vector Mask (as there is
for PS shaders)."
I also managed to find descriptions of DMask and VMask, in the "State
Register" (sr0.2/3) field descriptions:
"Dispatch Mask (DMask). This 32-bit field specifies which channels
are active at Dispatch time."
"Vector Mask (VMask). This 32-bit field contains, for each 4-bit
group, the OR of the corresponding 4-bit group in the dispatch
mask."
SIMD4x2 shaders process one or two vec4 values, with each 4-bit group
corresponding to xyzw channel enables (either all on, or all off).
Thus, DMask = VMask in SIMD4x2 mode. But in SIMD8 mode, 4-bit groups
are meaningless, so it just messes up your values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
It's incompletete -- it wasn't filling ReferenceType so it was causing
garbagge on the disassembly. Furthermore it seems impossible to get the
jump information through this interface.
The solution for function size problem is to effectively book-keep the
machine code start and end address while JIT'ing.
When calculating the binding table index for non-constant sampler
array indexing it needs to add the base binding table index which is a
constant within the generated code. Often this base is zero so we can
avoid a redundant instruction in that case.
It looks like nothing in shader-db is doing non-constant sampler array
indexing so this patch doesn't make any difference but it might be
worth having anyway.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Previously when generating the send instruction for a sample
instruction with an indirect sampler it would use the destination
register as a temporary store. This breaks when used in combination
with the opt_sampler_eot optimisation because that forces the
destination to be null. This patch fixes that by avoiding the temp
register altogether.
The reason the temporary register was needed was because it was trying
to ensure the binding table index doesn't overflow a byte by and'ing
it with 0xff. The result is then or'd with samper_index<<8. This patch
instead just and's the whole thing by 0xfff. This will ensure that a
bogus sampler index won't overflow into the rest of the message
descriptor but unlike the previous code it won't ensure that the
binding table index doesn't overflow into the sampler index. It
doesn't seem like that should matter very much though because if the
shader is generating a bogus sampler index then it's going to just get
garbage out either way.
Instead of doing sampler_index<<8|(sampler_index+base_table_index) the
new code avoids one operation by doing
sampler_index*0x101+base_table_index which should be equivalent.
However if we wanted to avoid the multiply for some reason we could do
this by adding an extra or instruction still without needing the
temporary register.
This fixes a number of Piglit tests on Skylake that were using
indirect samplers such as:
spec@arb_gpu_shader5@execution@sampler_array_indexing@fs-simple
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
We were returning the most recently freed BO, without checking if it
was idle yet. This meant that we generally stalled immediately on the
previous frame when generating a new one. Instead, allocate new BOs
when the *oldest* BO is still busy, so that the cache scales with how
much is needed to keep some frames outstanding, as originally
intended.
Note that if you don't have some throttling happening, this means that
you can accidentally run the system out of memory. The kernel is now
applying some throttling on all execs, to hopefully avoid this.
AFAICT, there is no real way to make sure a send message with EOT is properly
ignored from compact, nor can I see a way to actually encode EOT while
compacting. Before the single send optimization we'd always bail because we hit
the is_immediate && !is_compactable_immediate case. However, with single send,
is_immediate is not true, and so we end up trying to compact the un-compactible.
Without this, any compacting single send instruction will hang because the EOT
isn't there. I am not sure how I didn't hit this when I originally enabled the
optimization. I didn't check if some surrounding code changed.
I know Neil and Matt were both looking into this. I did a quick search and
didn't see any patches out there to handle this. Please ignore if this has
already been sent by someone. (Direct me to it and I will review it).
Reported-by: Neil Roberts <neil@linux.intel.com>
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Pure integer formats cannot be sampled with linear tex / mip filters. In GL
such a setup would make the texture incomplete.
We shouldn't rely on the state tracker though to filter that out, just return
all zeros instead of dying in the lerp.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
gallivm now depends on it. And depending on particular LLVM version /
configure options, the build can fail without this change due to
undefined reference to `LLVM*Disasm*' symbols.
Trivial.
It doesn't do everything we want. In particular it doesn't allow to
detect jumps or return opcodes. Currently we detect the x86's RET
opcode.
Even though it's worse for LLVM 3.3, it's an improvement for LLVM 3.7,
which was totally busted.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>