This commit splits portions of the existing brw_upload_vs_prog and
brw_upload_gs_prog functions into new brw_vs_populate_key and
brw_gs_populate_key functions. This follows the same style already
used for all other stages (see brw_wm_populate_key, etc.).
This commit is intended to have no functional change. It exists in
preparation for some upcoming code movement for the shader cache.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 09ee907266 added logic to fold immediates into mad operations,
but the emission code is only there for fmad. Only allow it on float
types.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Commit fb63df2215 added 4-byte mad support, but only supported
emission for floats. Disable it for ints for now.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The lifetime of the sources array needs to match the nir_tex_instr
itself. So, allocate it using the instruction itself as the context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These sets are part of the block, and their lifetime needs to match the
block itself. So, allocate them using the block itself as the context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The lifetime of each register's use/def/if_use sets needs to match the
register itself. So, allocate them using the register itself as the
context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
glsl_to_nir passes in the ir_function's name field; we were copying the
pointer, but not duplicating the memory.
We want to be able to free the linked GLSL IR program after translating
to NIR, so we'll need to create a copy of the function name that the NIR
shader actually owns.
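The difference between copying the pointer and duplicating the memory can be sketched in plain C. This is a minimal illustration that assumes malloc-based ownership. toy_function and toy_strdup are hypothetical names, and the Mesa code would presumably allocate the copy with ralloc under the NIR shader instead.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a NIR function that owns its name. */
struct toy_function {
   char *name;
};

/* Duplicate the string instead of aliasing the caller's pointer, so the
 * name stays valid after the source IR (and its strings) is freed.
 * Returns NULL if allocation fails. */
static char *toy_strdup(const char *s)
{
   size_t len = strlen(s) + 1;   /* include the terminating NUL */
   char *copy = malloc(len);
   if (copy)
      memcpy(copy, s, len);
   return copy;
}
```

If toy_function stored ir_name directly, freeing the GLSL IR would leave f.name dangling. With the copy, the two lifetimes are independent.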
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We can just pass a pointer to the list of variables, and reuse the code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
ralloc_adopt() reparents all children from one context to another.
Conceptually, ralloc_adopt(new_ctx, old_ctx) behaves like this
pseudocode:
    foreach child of old_ctx:
        ralloc_steal(new_ctx, child)
However, ralloc provides no way to iterate over a memory context's
children, and ralloc_adopt does this task more efficiently anyway.
One potential use of this is to implement a memory-sweeper pass: first,
steal all of a context's memory to a temporary context. Then, walk over
anything that should be kept, and ralloc_steal it back to the original
context. Finally, free the temporary context. This works when the
context is something that can't be freed (i.e. an important structure).
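The adopt operation and the sweeper pattern can be sketched with a toy hierarchical allocator. This is not Mesa's ralloc: toy_node, toy_alloc, toy_steal, toy_adopt and toy_free are hypothetical, and each node carries no payload. toy_adopt splices the old context's whole child list onto the new context in one step, instead of unlinking and relinking each child separately.

```c
#include <stdlib.h>

/* A node in the ownership tree: each node is owned by its parent. */
struct toy_node {
   struct toy_node *parent, *child, *prev, *next;
};

static int toy_live = 0; /* live node count, for demonstration only */

static void toy_unlink(struct toy_node *n)
{
   if (n->prev)
      n->prev->next = n->next;
   else if (n->parent)
      n->parent->child = n->next;
   if (n->next)
      n->next->prev = n->prev;
   n->parent = n->prev = n->next = NULL;
}

/* Insert n at the head of parent's child list (parent may be NULL). */
static void toy_link(struct toy_node *parent, struct toy_node *n)
{
   n->parent = parent;
   n->prev = NULL;
   n->next = parent ? parent->child : NULL;
   if (n->next)
      n->next->prev = n;
   if (parent)
      parent->child = n;
}

static struct toy_node *toy_alloc(struct toy_node *parent)
{
   struct toy_node *n = calloc(1, sizeof *n);
   if (!n)
      return NULL;
   toy_live++;
   toy_link(parent, n);
   return n;
}

/* Move a single node (and its subtree) under new_parent. */
static void toy_steal(struct toy_node *new_parent, struct toy_node *n)
{
   toy_unlink(n);
   toy_link(new_parent, n);
}

/* Move every child of old_parent under new_parent. */
static void toy_adopt(struct toy_node *new_parent, struct toy_node *old_parent)
{
   struct toy_node *first = old_parent->child, *last = first;
   if (!first)
      return;
   for (;;) {                      /* reparent, and find the list tail */
      last->parent = new_parent;
      if (!last->next)
         break;
      last = last->next;
   }
   last->next = new_parent->child; /* splice the whole list at the head */
   if (new_parent->child)
      new_parent->child->prev = last;
   new_parent->child = first;
   old_parent->child = NULL;
}

/* Free n and everything it owns. */
static void toy_free(struct toy_node *n)
{
   if (!n)
      return;
   while (n->child)
      toy_free(n->child);
   toy_unlink(n);
   free(n);
   toy_live--;
}
```

A sweep then follows the steps described above. Call toy_adopt(tmp, ctx) and toy_steal(ctx, keep) for each object to keep, then toy_free(tmp). Only the objects stolen back survive under ctx, and ctx itself is never freed.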
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The hardware only supports 4 MRTs. It should be possible to emulate
support for 8, but it doesn't seem worth the trouble.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This extra complexity is unnecessary; it makes MRT handling more
complicated and is likely to generate tons of variants.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The thing we want to avoid is int/float comparisons, but int/unsigned
comparisons with 0 are equivalent.
total instructions in shared programs: 6194829 -> 6193996 (-0.01%)
instructions in affected programs: 117192 -> 116359 (-0.71%)
helped: 471
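For == and != against 0, the signed and unsigned readings of a 32-bit value agree, because both only test whether every bit is zero. Below is a minimal sketch of that claim. zero_tests_agree is a hypothetical helper, and the sketch assumes the comparisons in question are equality tests. Ordered comparisons such as x < 0 do not share this property, since no unsigned value is less than 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the signed and unsigned interpretations of x give
 * the same answer for both == 0 and != 0. The int32_t -> uint32_t
 * conversion is well defined (modulo 2^32) and maps only 0 to 0. */
static bool zero_tests_agree(int32_t x)
{
   uint32_t u = (uint32_t)x;
   return (x == 0) == (u == 0u) && (x != 0) == (u != 0u);
}
```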
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
No shader-db changes, probably because they're all removed by the GLSL
compiler optimization added in commit 69ad5fd4.
Reviewed-by: Eric Anholt <eric@anholt.net>
This doesn't work for the analogous && cases because of NaNs.
total instructions in shared programs: 6195712 -> 6194829 (-0.01%)
instructions in affected programs: 42000 -> 41117 (-2.10%)
helped: 403
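The commit text does not say which || pattern is being rewritten. Assume, purely for illustration, a rewrite of the form (a < b) || (a < c) -> a < max(b, c). The sketch below shows how a NaN breaks the && analogue (a < b) && (a < c) -> a < min(b, c). The hypothetical helpers toy_max and toy_min model IEEE 754 maxNum/minNum, which return the non-NaN operand, as C's fmaxf and fminf do.

```c
#include <math.h>
#include <stdbool.h>

/* maxNum/minNum: a NaN operand is ignored in favor of the other one. */
static float toy_max(float x, float y)
{
   if (isnan(x)) return y;
   if (isnan(y)) return x;
   return x > y ? x : y;
}

static float toy_min(float x, float y)
{
   if (isnan(x)) return y;
   if (isnan(y)) return x;
   return x < y ? x : y;
}

/* Assumed || rewrite: both sides agree even when b or c is NaN,
 * because a < NaN is false and max() simply ignores the NaN. */
static bool or_form(float a, float b, float c)    { return (a < b) || (a < c); }
static bool or_rewrite(float a, float b, float c) { return a < toy_max(b, c); }

/* && analogue: a < NaN makes the original false, but min() discards
 * the NaN, so the rewrite can still return true. */
static bool and_form(float a, float b, float c)    { return (a < b) && (a < c); }
static bool and_rewrite(float a, float b, float c) { return a < toy_min(b, c); }
```

With a = 0, b = 1 and c = NaN, or_form and or_rewrite both return true. and_form returns false, but and_rewrite returns true.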
Reviewed-by: Eric Anholt <eric@anholt.net>
InputsRead is a 64-bit bitfield. Using _mesa_fls would silently
truncate off the high bits, claiming inputs 32..56 (VARYING_SLOT_MAX)
were never read.
Using <= here was a hack I threw in at the last minute to fix programs
which happened to use input slot 32. Switch back to using < now that
the underlying problem is fixed.
Fixes crashes in "Euro Truck Simulator 2", which uses input slot 33,
when using prog->nir.
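Below is a sketch of the truncation and of why < is enough once a 64-bit version is used. toy_fls32 and toy_fls64 are hypothetical stand-ins for a find-last-set helper that returns the 1-based index of the highest set bit. The assumption that _mesa_fls takes a 32-bit argument comes from the description above.

```c
#include <stdint.h>

/* Return the 1-based index of the most significant set bit, or 0 when
 * no bit is set. */
static unsigned toy_fls32(uint32_t x)
{
   unsigned n = 0;
   while (x) {
      n++;
      x >>= 1;
   }
   return n;
}

static unsigned toy_fls64(uint64_t x)
{
   unsigned n = 0;
   while (x) {
      n++;
      x >>= 1;
   }
   return n;
}
```

Passing a 64-bit InputsRead with only bit 33 set to toy_fls32 compiles without error. The implicit conversion drops bits 32..63, so the result is 0 and the input looks unread. toy_fls64 returns 34, and a loop such as for (i = 0; i < 34; i++) visits slot 33 without needing <=.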
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ptn_move_dest and nir_fadd already take care of replicating the last
channel out, so we can just use a scalar and skip splatting it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We run lowering and optimization passes that might leave garbage lying
around. This keeps the FS CSE pass from having to clean it up.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The idea here is that fusing multiply-add combinations too early can reduce
our ability to perform CSE and value-numbering. Instead, we split ffma
opcodes up-front, hope CSE cleans up, and then fuse after-the-fact.
Unless an algebraic pass does something silly where it inserts something
between the multiply and the add, splitting and re-fusing should never
cause a problem. We run the late algebraic optimizations after this so
that things like compare-with-zero don't hurt our ability to fuse things.
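The claim that early fusion hides multiplies from CSE can be illustrated with a toy example. This is a minimal sketch, not NIR's CSE pass: a hypothetical three-opcode IR and a naive CSE that redirects uses of an instruction to an identical earlier one. Take x = a * b and y = ffma(a, b, c). The multiply inside the ffma is invisible to CSE, so two multiplies survive. Once the ffma is split into an fmul plus an fadd, CSE merges the two fmuls.

```c
enum toy_op { OP_FMUL, OP_FADD, OP_FFMA };

struct toy_instr {
   enum toy_op op;
   int src[3]; /* value ids; unused sources are -1 */
};

/* Value ids 0..TOY_NUM_INPUTS-1 are shader inputs; instruction i
 * defines value TOY_NUM_INPUTS + i. */
#define TOY_NUM_INPUTS 3
#define TOY_MAX_VALUES 64

/* Naive CSE: rewrite sources through the remap table, then drop any
 * instruction identical to an earlier surviving one, redirecting its
 * uses to that earlier result. Returns the number of surviving
 * instructions (or -1 if the program is too large) and stores in *muls
 * how many surviving instructions perform a multiply (FMUL or FFMA). */
static int toy_cse(struct toy_instr *prog, int n, int *muls)
{
   int remap[TOY_MAX_VALUES];
   int alive = 0;

   *muls = 0;
   if (n < 0 || TOY_NUM_INPUTS + n > TOY_MAX_VALUES)
      return -1;
   for (int v = 0; v < TOY_NUM_INPUTS + n; v++)
      remap[v] = v;

   for (int i = 0; i < n; i++) {
      struct toy_instr *in = &prog[i];
      int dup = -1;

      for (int s = 0; s < 3; s++)
         if (in->src[s] >= 0)
            in->src[s] = remap[in->src[s]];

      for (int j = 0; j < i && dup < 0; j++) {
         const struct toy_instr *prev = &prog[j];
         if (remap[TOY_NUM_INPUTS + j] != TOY_NUM_INPUTS + j)
            continue; /* j was itself removed as a duplicate */
         if (prev->op == in->op && prev->src[0] == in->src[0] &&
             prev->src[1] == in->src[1] && prev->src[2] == in->src[2])
            dup = j;
      }

      if (dup >= 0) {
         remap[TOY_NUM_INPUTS + i] = TOY_NUM_INPUTS + dup;
      } else {
         alive++;
         if (in->op != OP_FADD)
            (*muls)++;
      }
   }
   return alive;
}
```

Both programs end up with two instructions, but the split version computes a * b only once. A later fusing step can then decide whether to form an ffma again; this sketch does not model that step.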
shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4390538 -> 4379236 (-0.26%)
instructions in affected programs: 989359 -> 978057 (-1.14%)
helped: 5308
HURT: 97
GAINED: 78
LOST: 5
This does, unfortunately, substantially hurt one shader in Kerbal
Space Program. However, the damage is caused by changing a single
instruction from an ffma to an add. This, in turn, *decreases* register
pressure in one part of the program, causing it to fail to register-allocate
and spill. Given the overwhelmingly positive results in other shaders and
the fact that the NIR for the Kerbal shaders is actually better, this
change should be considered a net positive.
Reviewed-by: Matt Turner <mattst88@gmail.com>
total instructions in shared programs: 4422307 -> 4422363 (0.00%)
instructions in affected programs: 4230 -> 4286 (1.32%)
helped: 0
HURT: 12
While this does hurt some things, the losses are minor, and it prevents the
compare-with-zero optimization from fighting with ffma, which is much more
important.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we couldn't generate two algebraic passes in the same file
because of multiple structure definitions. To solve this, we play the
age-old header file trick and wrap the definitions in an #ifndef/#define guard.
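Without a guard, a second copy of the same struct definition in one translation unit is normally rejected as a redefinition. Below is a minimal sketch of the trick. It assumes each generated pass emits its shared struct definitions wrapped in the same guard macro. TOY_ALGEBRAIC_STRUCT_DEFS and struct toy_transform are hypothetical names, not the generator's real output.

```c
#include <stddef.h>

/* --- output of the first generated pass --- */
#ifndef TOY_ALGEBRAIC_STRUCT_DEFS
#define TOY_ALGEBRAIC_STRUCT_DEFS
struct toy_transform {
   int search;
   int replace;
};
#endif

static const struct toy_transform toy_pass_a[] = { { 1, 2 } };

/* --- output of the second generated pass, in the same file --- */
#ifndef TOY_ALGEBRAIC_STRUCT_DEFS /* already defined: block is skipped */
#define TOY_ALGEBRAIC_STRUCT_DEFS
struct toy_transform {
   int search;
   int replace;
};
#endif

static const struct toy_transform toy_pass_b[] = { { 3, 4 }, { 5, 6 } };

static size_t toy_pass_a_len(void) { return sizeof toy_pass_a / sizeof toy_pass_a[0]; }
static size_t toy_pass_b_len(void) { return sizeof toy_pass_b / sizeof toy_pass_b[0]; }
```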
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, NIR would just print 4 swizzle components if the swizzle was
anything other than foo.xyzw. This creates lots of noise if, for example,
you have a one-component element with a swizzle of foo.xxxx.
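Below is a sketch of the quieter printing rule. It prints the swizzle only for the components the instruction actually reads, and omits it when those components are the identity. format_swizzle is a hypothetical helper, and the exact rule in nir_print may differ in detail.

```c
#include <stdbool.h>

/* Write ".<components>" into out (at least 6 bytes), or an empty string
 * when the first num_components (1..4) entries of swizzle are the
 * identity. Each swizzle entry is a channel index 0..3. */
static void format_swizzle(char *out, const unsigned char swizzle[4],
                           unsigned num_components)
{
   static const char comps[] = "xyzw";
   bool identity = true;
   unsigned n = 0;

   for (unsigned i = 0; i < num_components; i++)
      if (swizzle[i] != i)
         identity = false;

   if (!identity) {
      out[n++] = '.';
      for (unsigned i = 0; i < num_components; i++)
         out[n++] = comps[swizzle[i]];
   }
   out[n] = '\0';
}
```

Under this rule a one-component use of foo with swizzle .xxxx prints as plain foo, and a two-component use prints foo.xx rather than foo.xxxx.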
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unused as of commit 630ab0d27ba (mesa: remove last of MAX_WIDTH,
MAX_HEIGHT). Update all the remaining references to the defines.
v2: Use the correct variable name in the comments
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>