With everything in place, we can now use the scalar backend compiler for
vertex shaders on BDW+. We make scalar vertex shaders the default on
BDW+ but add a new vec4vs debug option to force the vec4 backend.
No piglit regressions.
Performance impact is minimal, I see a ~1.5 improvement on the T-Rex
GLBenchmark case, but in general it's in the noise. Some of our
internal synthetic, vs bounded benchmarks show great improvement, 20%-40%
in some cases, but real-world cases are mostly unaffected.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that fs_visitor::run is back to being only fragment
shader compilation, we can clean up a few stage == MESA_SHADER_FRAGMENT
conditions and rename it to run_fs.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch uses the previous refactoring to add a new run_vs() method
that generates vertex shader code using the scalar visitor and
optimizer.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These structs aren't vec4 specific, they are shared by shader stages
operating on Vertex URB Entries (VUEs). VUEs are the data structures in
the URB that hold vertex data between the pipeline geometry stages.
Using vue in the name instead of vec4 makes a lot more sense, especially
when we add scalar vertex shader support.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The scalar vertex shader will use the ATTR register file for vertex
attributes. This patch adds support for the ATTR file to fs_visitor.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This chunk of code is repeated in a few places, and we're going to add
a MESA_SHADER_VERTEX case to it soon.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This flag signals that we have a SIMD8 VS shader so we can set up the
corresponding state accordingly. This boils down to setting
the BDW+ SIMD8 enable bit in 3DSTATE_VS and making UBO and pull
constant buffers use dword pitch.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is all we need from the generator for SIMD8 vertex shaders. This
opcode is just the send instruction, all the hard work will happen
in the visitor using LOAD_PAYLOAD.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that the caller passes in the shader debug name, we don't need this
anymore.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
fs_generator no longer knows what stage it's generating code for, so
we have to set the debug name of the shader from the call site.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This removes all stage specific data from the generator, and lets us
create a generator for any stage.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't propagate the saturate bit and some instructions can't
saturate at all. If the source has saturate set, just skip propagation.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Back to the original commit (8313f444) adding the workaround, we were
enabling it on gens <= 7, even though gens <= 5 can't do multisampling.
I cannot find documentation that says that Sandybridge needs this
workaround but in practice disabling it causes these piglit tests to
fail:
EXT_framebuffer_multisample/interpolation {2,4} centroid-deriv{,-disabled}
On Ironlake:
total instructions in shared programs: 4358478 -> 4349671 (-0.20%)
instructions in affected programs: 117680 -> 108873 (-7.48%)
A bunch of shaders in TF2, Portal 2, and L4D2 are cut by 25~30%.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This way we get a warning if an enum value is not handled.
v2: codestyle
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On evergreen there are 4 regs, on r600/700 there is only one.
Don't initialise regs and trash someone elses state.
Not sure this fixes anything, but hey one less stupid.
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.3 10.4" mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
This means another pass of reordering the uniform data store, but it lets
us pair up a lot more instructions.
total instructions in shared programs: 44639 -> 43176 (-3.28%)
instructions in affected programs: 36938 -> 35475 (-3.96%)
This is a standard scheduling heuristic, and clearly helps.
total instructions in shared programs: 46418 -> 44467 (-4.20%)
instructions in affected programs: 42531 -> 40580 (-4.59%)