Before commit 04895f5c we would only reswizzle dot product instructions
(since they wrote the same value into all channels, and we didn't have
to think about anything else). That commit extended reswizzling to cases
when the swizzle was single valued -- i.e., writing the same result into
all channels.
But allowing reswizzling of arbitrary things is actually really easy and
is even less code. (Why didn't we do this in the first place?!)
total instructions in shared programs: 4266079 -> 4261000 (-0.12%)
instructions in affected programs: 351933 -> 346854 (-1.44%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Despite the comment above the function claiming otherwise, the function
did not reswizzle sources, which would lead to bad code generation since
commit 04895f5c, which began claiming we could do such swizzling when we
could not.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82932
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The 'start' instruction is always in the current block, except for the
case of shader time, which emits code in a pattern seen nowhere else.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise, the basic block start/end IPs don't get updated properly,
leading to a broken CFG. This usually results in the following
assertion failure:
brw_fs_live_variables.cpp:141:
void brw::fs_live_variables::setup_def_use():
Assertion `ip == block->start_ip' failed.
Fixes KWin, WebGL demos, and a score of Piglit tests on Sandybridge and
earlier hardware.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dump() methods don't alter the CFG or basic blocks, so we should
mark them as const. This lets you call them even if you have a const
cfg_t - which is the case in certain portions of the code (such as live
interval handling).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the ENDIF instruction was the only instruction in its block, we'd
leave the successors of the merged if+jump block in a bad state.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83080
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Older versions of GNU ld did not support --dynamic-list, so check
whether it is supported before using it. Non-GNU linkers, such as the
Apple one, likely lack this option as well.
Fixes the build on OpenBSD which has binutils 2.15 and 2.17.
The --dynamic-list option seems to have been introduced sometime after
binutils 2.17 was released, as it is present in 2.18.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Allows using prime fds as display target and from display target.
Test for PRIME capability after initializing kms_swrast screen.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Andreas Pokorny <andreas.pokorny@canonical.com>
Fixes binary garbage in the compilation logs caused by
compat::string::c_str() not being null-terminated (which is a bug on
its own that will be fixed in another commit).
Reported-by: EdB <edb+mesa@sigluy.net>
Reverts
* "i965: Modify state upload to allow 2 different sets of state atoms."
8e27a4d2b3
* "i965: Modify dirty bit handling to support 2 pipelines."
373143ed91
* "i965: Create a macro for checking a dirty bit."
c5bdf9be1e
Conflicts:
src/mesa/drivers/dri/i965/brw_context.h
* "i965: Create a macro for setting all dirty bits."
6f56e1424d
Conflicts:
src/mesa/drivers/dri/i965/brw_blorp.cpp
src/mesa/drivers/dri/i965/brw_state_cache.c
src/mesa/drivers/dri/i965/brw_state_upload.c
* "i965: Create a macro for setting a dirty bit."
88e3d404da
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
We can't rely on the value from the assembler if relative addressing is
used. So instead use the max of declared-consts (which does not include
compiler immediates) and what we get from the assembler (which does).
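A minimal sketch of the calculation described above. The function and
parameter names are hypothetical and are not taken from the driver; the
sketch only illustrates taking the larger of the two counts.

```c
#include <stdint.h>

/*
 * Hypothetical helper: choose how many constants to account for.
 *
 * declared_consts:  constants declared by the shader (this count does
 *                   not include compiler immediates).
 * assembler_consts: count reported by the assembler (includes
 *                   immediates, but cannot be trusted on its own when
 *                   relative addressing is used).
 */
static uint32_t
effective_const_count(uint32_t declared_consts, uint32_t assembler_consts)
{
   return declared_consts > assembler_consts ? declared_consts
                                             : assembler_consts;
}
```

For instance, effective_const_count(8, 3) yields 8 and
effective_const_count(2, 5) yields 5.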
Signed-off-by: Rob Clark <robclark@freedesktop.org>
all_delayed will also be true if we didn't attempt to schedule anything
due to no more instructions using current addr/pred. We rely on coming
in to block_sched_undelayed() to detect and clean up when there are no
more uses of the current addr/pred, which isn't necessarily an error.
This fixes a regression introduced in b823abed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The split between these two didn't make much sense. I'm going to want the
chance to look at uniform contents in optimization passes, and the QPU
emit I think is going to end up rewriting the uniforms stream.
This common init routine can be used by constructors for multiple program
types.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Debugging a regression in discard support was just too full of duplicate
instructions, so I decided to remove them instead of re-analyzing each of
them as I dumped their outputs in simulation.
There were troubles with bools without using native integers
(st_glsl_to_tgsi seemed to think bool true was 1.0f sometimes, when as a
uniform it's stored as ~0), and since I've got native integers other than
divide, I might as well just support them.
Before, we had some special opcodes like CMP and SNE that emitted multiple
instructions. Now, we reduce those operations significantly, giving
optimization more to look at for reducing redundant operations.
The downside is that QOP_SF is pretty special -- we're going to have to
track it separately when we're doing instruction scheduling, and we want
to peephole it into the instruction generating the destination write in
most cases (and not allocate the destination reg, probably. Unless it's
used for some other purpose, as well).
A bool is 0 or ~0, and KILL_IF takes a float arg that's <0 for discard or
>= 0 for not. By negating it, we ended up doing a floating point subtract
of (0 - ~0), which ended up as an inf. To make this actually work, we
need to convert the bool to a float.
Reviewed-by: Brian Paul <brianp@vmware.com>
While similar in layout, the size of the SVGA3dSize type may be smaller than
the struct drm_vmw_size type that is part of the ioctl interface. The kernel
driver could accordingly overwrite a memory area following the size variable
on the stack. Typically that would be another local variable, causing
breakage in, for example, Ubuntu 12.04.5 where the handle local variable
becomes overwritten.
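The hazard can be sketched as follows. The two structs are illustrative
stand-ins with assumed layouts, not the real SVGA3dSize or
drm_vmw_size definitions, and fake_ioctl_fill() is a hypothetical
substitute for the kernel writing its result. Passing a correctly sized
temporary and copying the fields out is one way to avoid the overrun; it
is not necessarily the exact change made in this commit.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative stand-ins only; the layouts are assumptions. */
struct small_size { uint32_t width, height, depth; };
struct ioctl_size { uint32_t width, height, depth, pad64; };

/* Hypothetical stand-in for the kernel: always writes a full ioctl_size. */
static void
fake_ioctl_fill(struct ioctl_size *out)
{
   out->width = 64;
   out->height = 32;
   out->depth = 1;
   out->pad64 = 0;
}

/*
 * Safe pattern: hand the "kernel" a buffer of the size it expects, then
 * copy field by field into the smaller type.  Casting a small_size
 * pointer to ioctl_size instead would let the write to pad64 land on
 * whatever follows the variable on the stack.
 */
static struct small_size
query_size(void)
{
   struct ioctl_size tmp;
   struct small_size result;

   memset(&tmp, 0, sizeof(tmp));
   fake_ioctl_fill(&tmp);

   result.width = tmp.width;
   result.height = tmp.height;
   result.depth = tmp.depth;
   return result;
}
```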
v2: Fix whitespace errors
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Cc: "10.1 10.2 10.3" <mesa-stable@lists.freedesktop.org>
As explained in the previous commit, we want to avoid the possibility of
integer-multiplication overflow while allocating buffers.
In these two cases, the final allocation size is the product of three values:
one variable and two that are fixed constants at compile time.
In this commit, we move the explicit multiplication to involve only the
compile-time constants, preventing any overflow from that multiplication (and
allowing calloc to catch any potential overflow from the remaining implicit
multiplication).
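A sketch of the pattern, using hypothetical constants and a hypothetical
function name rather than the actual call sites touched by this commit:

```c
#include <stdlib.h>

/* Hypothetical compile-time constants, for illustration only. */
#define COMPONENTS_PER_TEXEL 4
#define BYTES_PER_COMPONENT  4

/*
 * The product of the two constants is evaluated at compile time and
 * cannot overflow.  The remaining multiplication by n is left to
 * calloc, which is expected to fail the allocation if it would overflow.
 */
static void *
alloc_texels(size_t n)
{
   return calloc(n, (size_t)COMPONENTS_PER_TEXEL * BYTES_PER_COMPONENT);
}
```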
Reviewed-by: Matt Turner <mattst88@gmail.com>
In commit 32f2fd1c5d, several calls to
_mesa_calloc(x) were replaced with calls to calloc(1, x). This is strictly
equivalent to what the code was doing previously.
But for cases where "x" involves multiplication, now that we are explicitly
using the two-argument calloc, we can do one step better and replace:
calloc(1, A * B);
with:
calloc(A, B);
The advantage of the latter is that calloc will detect any overflow that would
have resulted from the multiplication and will fail the allocation (whereas
the former would return a small allocation). So this fix can change
potentially exploitable buffer overruns into segmentation faults.
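The difference between the two forms can be sketched as below. The
function names are hypothetical. The sketch assumes a calloc
implementation that checks the count-times-size product, as current
C libraries do.

```c
#include <stdint.h>
#include <stdlib.h>

/* Risky form: count * size can wrap around before calloc ever sees it. */
static void *
alloc_unchecked(size_t count, size_t size)
{
   return calloc(1, count * size);
}

/* Preferred form: calloc performs the multiplication and can reject it. */
static void *
alloc_checked(size_t count, size_t size)
{
   return calloc(count, size);
}
```

With count = SIZE_MAX / 2 + 2 and size = 2, the product wraps around to
2, so alloc_unchecked() hands back a tiny buffer while the caller
believes it holds a huge one. alloc_checked() is expected to return NULL
for the same arguments.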
Reviewed-by: Matt Turner <mattst88@gmail.com>
It's been altering the tree and reporting "false" since January 2011.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, opt_copy_propagation_elements would always rewrite the
instruction stream, even if it was the same thing as before. In order to
report progress correctly, we'll need to bail if the suggested
replacement is identical (or equivalent) to the original code.
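The bail-out can be sketched in plain C. The struct and function names
here are hypothetical stand-ins and do not reflect the actual IR classes
used by opt_copy_propagation_elements; the sketch only checks for
identical references, not merely equivalent ones.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a value reference: variable id plus swizzle. */
struct ref {
   int var;
   unsigned swizzle[4];
};

static bool
ref_equal(const struct ref *a, const struct ref *b)
{
   if (a->var != b->var)
      return false;
   for (int i = 0; i < 4; i++) {
      if (a->swizzle[i] != b->swizzle[i])
         return false;
   }
   return true;
}

/*
 * Rewrite *slot only when the replacement differs, and report progress
 * only when something actually changed.
 */
static bool
try_replace(struct ref *slot, const struct ref *replacement)
{
   if (ref_equal(slot, replacement))
      return false;   /* identical: leave the stream alone, no progress */

   *slot = *replacement;
   return true;
}
```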
This also introduced unnecessary noop swizzles, as far as I can tell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, if chans < 4, we passed uninitialized stack garbage to the
ir_swizzle constructor for the excess components. Thankfully, it
ignores that data, as it's unnecessary, so no harm actually comes of it.
However, it's obviously better to initialize it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ir_triop_csel can return a boolean expression, so we need to handle it
here; we simply forgot when we added it.
Fixes Piglit's EXT_shader_integer_mix/{vs,fs}-mix-if-bool.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
All shader stages have these fields, so it makes sense to store them in
the common base structure, rather than duplicating them in each.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In commit 46d03d37bf I renamed a Makefile target
from md5 to checksums (as we switched from MD5 checksums to SHA-256
checksums, so the more general name is more future-proof).
But that commit missed one mention of "md5" as a dependency of the .PHONY
target. Rename that here as well.