We would like to be able to combine
result.x = bitfieldExtract(src0.x, src1.x, src2.x);
result.y = bitfieldExtract(src0.y, src1.y, src2.y);
result.z = bitfieldExtract(src0.z, src1.z, src2.z);
result.w = bitfieldExtract(src0.w, src1.w, src2.w);
into a single ivec4 bitfieldInsert operation. This should be possible
with most drivers.
This patch changes the offset and bits parameters from scalar ints
to ivecN or uvecN. The type of all three operands will be the same,
for simplicity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
We would like to be able to combine
result.x = bitfieldInsert(src0.x, src1.x, src2.x, src3.x);
result.y = bitfieldInsert(src0.y, src1.y, src2.y, src3.y);
result.z = bitfieldInsert(src0.z, src1.z, src2.z, src3.z);
result.w = bitfieldInsert(src0.w, src1.w, src2.w, src3.w);
into a single ivec4 bitfieldInsert operation. This should be possible
with most drivers.
This patch changes the offset and bits parameters from scalar ints
to ivecN or uvecN. The type of all four operands will be the same,
for simplicity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
TGSI doesn't use these - it just translates ir_quadop_bitfield_insert
directly. NIR can handle ir_quadop_bitfield_insert as well.
These opcodes were only used for i965, and with Jason's recent patches,
we can do this lowering in NIR (which also gains us SPIR-V handling).
So there's not much point to retaining this GLSL IR lowering code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The GLSL IR to TGSI/Mesa IR paths for any_nequal have the same
optimizations the ir_unop_any paths had.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We've assumed that we could lower per-component vector access from
vec[i] = scalar
to
vec = ir_triop_vector_insert(vec, scalar, i)
but with SSBOs (and compute shader SLM and tesselation outputs) this is
no longer valid. If a vector is "externally visible", multiple threads
can write independent components simultaneously. With lowering to
ir_triop_vector_insert, each thread read the entire vector, changes one
component, then writes out the entire vector. This is racy.
Instead of generating a ir_binop_vector_extract when we see v[i], we
generate ir_dereference_array. We then add a lowering pass to lower the
ir_dereference_array to ir_binop_vector_extract for rvalues and for to
vector_insert for lvalues in a separate lowering pass.
The resulting IR is the same as before, but we now have a window between
ast->ir conversion and the lowering pass where v[i] appears in the IR as
an array deref. This lets us run lowering passes that lower the vector
access to I/O (eg for SSBO load/store) before we lower the per-component
access to full vector writes.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
The unsized array length is computed with the following formula:
array.length() =
max((buffer_object_size - offset_of_array) / stride_of_array, 0)
Of these, only the buffer size needs to be provided by the backends, the
frontend already knows the values of the two other variables.
This patch identifies the cases where we need to get the length of an
unsized array, injecting ir_unop_ssbo_unsized_array_length expressions
that will be lowered (in a later patch) to inject the formula mentioned
above.
It also adds the ir_unop_get_buffer_size expression that drivers will
implement to provide the buffer length.
v2:
- Do not define a triop that will force backends to implement the
entire formula, they should only need to provide the buffer size
since the other values are known by the frontend (Curro).
v3:
- Call state->has_shader_storage_buffer_objects() in ast_function.cpp instead
of using state->ARB_shader_storage_buffer_object_enable (Tapani).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This type will be used to store the name of subroutine types
as in subroutine void myfunc(void);
will store myfunc into a subroutine type.
This is required to the parser can identify a subroutine
type in a uniform decleration as a valid type, and also for
looking up the type later.
Also add contains_subroutine method.
v2: handle subroutine to int comparisons, needed
for lowering pass.
v3: do subroutine to int with it's own IR
operation to avoid hacking on asserts (Kayden)
v3.1: fix warnings in this patch, fix nir,
fix tgsi
v3.2: fixup tests
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
tests: fix warnings
When the new hash table implementation was added to Mesa it claimed to be much
faster, see commits 35fd61bd99 and 72e55bb688.
The set implementation follows the same implementation strategy so this should
be faster and there was no need to store a data field.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
These were added in commit f2616e56, presumably in preparation for
translating ARB vp/fp into GLSL IR. That never happened, and neither did
a lowering pass that actually generated these instructions.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Copy and pasted from the ir_if::condition handling, plus a NULL check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Specifically, ir_var_temporary variables constructed with a NULL name
will all have the name "compiler_temp" in static storage.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Also move num_state_slots inside ir_variable_data for better packing.
The payoff for this will come in a few more patches.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The payoff for this will come in a few more patches.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
All built-in uniforms are supposed to be backed by some GL state. The
state_slots field describes this backing state.
This helped me track down a bug in a later patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Will be used to implement interpolateAt*() from ARB_gpu_shader5
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Now that the constructors set a type, ir_type_unset is not very useful.
Move it to the end of the enum (specifically out of position 0) so that
enums checks for dereferences and rvalues can save an instruction.
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
The IR is not supposed to support implicit type conversions; we just
failed to validate it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
foreach_iter and exec_list_iterators have been deprecated for some time now;
we just hadn't ever bothered to convert code to the newer foreach_list
and foreach_list_safe macros.
In these cases, we aren't editing the list, so we can use foreach_list
rather than foreach_list_safe.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch moves following bitfields and variables to the data
structure:
explicit_location, explicit_index, explicit_binding, has_initializer,
is_unmatched_generic_inout, location_frac, from_named_ifc_block_nonarray,
from_named_ifc_block_array, depth_layout, location, index, binding,
max_array_access, atomic
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This patch moves following bitfields in to the data structure:
used, assigned, how_declared, mode, interpolation,
origin_upper_left, pixel_center_integer
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This patch replaces the ir_loop fields "from", "to", "increment",
"counter", and "cmp" with a single integer ("normative_bound") that
serves the same purpose.
I've used the name "normative_bound" to emphasize the fact that the
back-end is required to emit code to prevent the loop from running
more than normative_bound times. (By contrast, an "informative" bound
would be a bound that is informational only).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The compiler back-ends (i965's fs_visitor and brw_visitor,
ir_to_mesa_visitor, and glsl_to_tgsi_visitor) have been assuming this
for some time. Thanks to the preceding patch, the compiler front-end
no longer breaks this assumption.
This patch adds code to validate the assumption so that if we have
future bugs, we'll be able to catch them earlier.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These built-ins have two "out" parameters, which makes implementing them
efficiently with our current compiler infrastructure difficult. Instead,
implement them in terms of the existing ir_binop_mul IR (to return the
low 32-bits) and a new ir_binop_mul64 which returns the high 32-bits.
v2: Rename mul64 -> imul_high as suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Calculates the carry out of the addition of two values and the
borrow from subtraction respectively. Will be used in uaddCarry() and
usubBorrow() built-in implementations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This gives the compiler the chance to inline and not export class symbols
even in the absence of LTO. Saves about 60kb on disk.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@.intel.com>
It's a ?: that operates per-component on vectors. Will be used in
upcoming lowering pass for ldexp and the implementation of frexp.
csel(selector, a, b):
per-component result = selector ? a : b
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that all the places that used to generate array derefeneces of
vectors have been changed to generate either ir_binop_vector_extract or
ir_triop_vector_insert (or both), remove all support for dealing with
this deprecated construct.
As an added safeguard, modify ir_validate to reject ir_dereference_array
of a vector.
v2: Convert tabs to spaces. Suggested by Eric.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new opcode is used to generate a new vector with a single field from
the source vector replaced. This will eventually replace
ir_dereference_array of vectors in the LHS of assignments.
v2: Convert tabs to spaces. Suggested by Eric.
v3: Add constant expression handling for ir_triop_vector_insert. This
prevents the constant matrix inversion tests from regressing. Duh.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new opcode is used to get a single field from a vector. The field
index may not be constant. This will eventually replace
ir_dereference_array of vectors. This is similar to the extractelement
instruction in LLVM IR.
http://llvm.org/docs/LangRef.html#extractelement-instruction
v2: Convert tabs to spaces. Suggested by Eric.
v3: Add array index range checking to ir_binop_vector_extract constant
expression handling. Suggested by Ken.
v4: Use CLAMP instead of MIN2(MAX2()). Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
i965/Gen7+ and Radeon/Evergreen+ have bfm/bfi instructions to implement
bitfieldInsert() from ARB_gpu_shader5.
v2: Add ir_binop_bfm and ir_triop_bfi to st_glsl_to_tgsi.cpp.
Remove spurious temporary assignment and dereference.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Since half of ir_validate uses asserts() (the other using printf() then
abort()), there's not much use to calling it in a release build. Cuts
6.3% of the startup time of TF2.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Many GPUs have an instruction to do linear interpolation which is more
efficient than simply performing the algebra necessary (two multiplies,
an add, and a subtract).
Pattern matching or peepholing this is more desirable, but can be
tricky. By using an opcode, we can at least make shaders which use the
mix() built-in get the more efficient behavior.
Currently, all consumers lower ir_triop_lrp. Subsequent patches will
actually generate different code.
v2 [mattst88]:
- Add LRP_TO_ARITH flag to ir_to_mesa.cpp. Will be removed in a
subsequent patch and ir_triop_lrp translated directly.
v3 [mattst88]:
- Move changes from the next patch to opt_algebraic.cpp to accept
3-src operations.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
For each function {pack,unpack}{Snorm,Unorm}4x8, add a corresponding
opcode to enum ir_expression_operation. Validate the new opcodes in
ir_validate.cpp.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
For each function {pack,unpack}{Snorm,Unorm,Half}2x16, add a corresponding
opcode to enum ir_expression_operation. Validate the new opcodes in
ir_validate.cpp.
Also, add opcodes for scalarized variants of the Half2x16 functions. (The
code generator for the i965 fragment shader requires that all vector
operations be scalarized. A lowering pass, to be added later, will
scalarize the Half2x16 functions).
v2: Fix assertion message in ir_to_mesa [for idr].
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Tuner <mattst88@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This patch replaces the three ir_variable_mode enums:
- ir_var_in
- ir_var_out
- ir_var_inout
with the following five:
- ir_var_shader_in
- ir_var_shader_out
- ir_var_function_in
- ir_var_function_out
- ir_var_function_inout
This eliminates a frustrating ambiguity: it used to be impossible to
tell whether an ir_var_{in,out} variable was a shader in/out or a
function in/out without seeing where the variable was declared in the
IR. This complicated some optimization and lowering passes, and would
have become a problem for implementing varying structs.
In the lisp-style serialization of GLSL IR to strings performed by
ir_print_visitor.cpp and ir_reader.cpp, I've retained the names "in",
"out", and "inout" for function parameters, to avoid introducing code
churn to the src/glsl/builtins/ir/ directory.
Note: a couple of comments in the code seemed to indicate that we were
planning for a possible future in which geometry shaders could have
shader-scope inout variables. Our GLSL grammar rejects shader-scope
inout variables, and I've been unable to find any evidence in the GLSL
standards documents (or extensions) that this will ever be allowed, so
I've eliminated these comments.
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reported by coverity scan.
v2: fix second case
Note: This is a candidate for stable branches.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>