Preload exactly what the shader needs, based on the compiler's mask of
uninitialized registers, rather than trying to sync pan_shader.h with
the behaviour of code gen. Would've saved me some debugging over the
years...
As a bonus this avoids preloading unnecessary registers, particularly in
compute shaders. In theory this should reduce power consumption.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14154>
On XeHP+, Binding Table Pointers are an offset relative to the Surface
State Base Address anymore. Instead, they are relative to the State
Binding Table Pool Address, which is set by the command above.
We emit that command (pointing to the same address as the Surface
State Base Addresss), and everything should stay working as before.
Reworks:
* Jordan: Add iris
* Jordan: Drop i965
* Ken: Set MOCS to avoid a major perf impact. (Found by Felix DeGrood.)
* Jordan: Shrink size from 2MiB to actual iris, anv usage
* Lionel: Add BINDING_TABLE_POOL_BLOCK_SIZE
Ref: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4995
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
[jordan.l.justen@intel.com: Add Iris, adjust sizes]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13992>
- Default c_std is now c11, no need to workaround
89b4f337d5 ("c_std=c11 in meson default_options")
- gallium-xlib has been renamed to xlib:
76791db088 ("mesa/x11: Remove the swrast-classic-based fake libGL")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14216>
This isn't something that ANV or RADV have cared about in a long time
but, as people bring up new Vulkan drivers, shipping Vulkan 1.0 is still
a thing that happens in Mesa. The common code should also implement the
1.0 rules.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14150>
Before meson 0.48 the cpu_family() would return 'ppc64le' on little
endian power8. In newer versions it returns 'ppc64' and endianness
should be checked with endian()
We now require meson >= 0.53 so we can drop the compatability with
older versions.
The old behavior was added in e430a034b9
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14240>
While our LIFO scheduling mode attempts to optimize for register
pressure, it's often hard for a scheduling algorithm to do better than
the instruction order provided by the shader author. Shader authors
often do perfectly reasonable things like using texture results
immediately after fetching them or constructing texture coordinates
immediately before the texture op. When we throw all the instruction
ordering information away, we loose any help the author may have given
us. By attempting NONE before we fall back to the worst case LIFO mode.
And, yes, I tried this with NONE both before and after LIFO and doing
NONE before LIFO is substantially better, according to shader-db.
total instructions in shared programs: 19673152 -> 19665202 (-0.04%)
instructions in affected programs: 33669 -> 25719 (-23.61%)
helped: 20
HURT: 0
helped stats (abs) min: 15 max: 4609 x̄: 397.50 x̃: 107
helped stats (rel) min: 2.33% max: 67.50% x̄: 14.60% x̃: 9.12%
95% mean confidence interval for instructions value: -867.61 72.61
95% mean confidence interval for instructions %-change: -21.74% -7.46%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 935562500 -> 935020920 (-0.06%)
cycles in affected programs: 18620349 -> 18078769 (-2.91%)
helped: 104
HURT: 48
helped stats (abs) min: 88 max: 60986 x̄: 8031.48 x̃: 3680
helped stats (rel) min: 0.61% max: 51.44% x̄: 14.95% x̃: 8.87%
HURT stats (abs) min: 10 max: 54724 x̄: 6118.62 x̃: 1530
HURT stats (rel) min: 0.13% max: 46.45% x̄: 10.28% x̃: 6.46%
95% mean confidence interval for cycles value: -5724.34 -1401.71
95% mean confidence interval for cycles %-change: -9.86% -4.10%
Cycles are helped.
total spills in shared programs: 12158 -> 10327 (-15.06%)
spills in affected programs: 1831 -> 0
helped: 20
HURT: 0
total fills in shared programs: 14749 -> 12635 (-14.33%)
fills in affected programs: 2114 -> 0
helped: 20
HURT: 0
LOST: 8
GAINED: 649
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
The way the current scheduler loop is implemented, each scheduling pass
starts with what the previous pass had. This means that, if PRE screwed
everything up majorly, PRE_NON_LIFO would have to try to fix it. It
also meant that tiny changes to one pass would affect every later pass.
Instead, reset the order of the instructions before each scheduling
pass. This makes the passes entirely independent of each other.
Shader-db results on Ice Lake:
total instructions in shared programs: 19670486 -> 19670648 (<.01%)
instructions in affected programs: 25317 -> 25479 (0.64%)
helped: 2
HURT: 7
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.07% max: 0.07% x̄: 0.07% x̃: 0.07%
HURT stats (abs) min: 8 max: 70 x̄: 24.29 x̃: 12
HURT stats (rel) min: 0.41% max: 4.95% x̄: 1.47% x̃: 0.87%
95% mean confidence interval for instructions value: -1.28 37.28
95% mean confidence interval for instructions %-change: -0.04% 2.30%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 935535948 -> 935490243 (<.01%)
cycles in affected programs: 421994824 -> 421949119 (-0.01%)
helped: 1269
HURT: 879
helped stats (abs) min: 1 max: 12008 x̄: 259.38 x̃: 52
helped stats (rel) min: <.01% max: 28.02% x̄: 1.12% x̃: 0.14%
HURT stats (abs) min: 1 max: 29931 x̄: 322.46 x̃: 20
HURT stats (rel) min: <.01% max: 32.17% x̄: 1.74% x̃: 0.22%
95% mean confidence interval for cycles value: -71.37 28.81
95% mean confidence interval for cycles %-change: -0.11% 0.21%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 12403 -> 12430 (0.22%)
spills in affected programs: 1355 -> 1382 (1.99%)
helped: 2
HURT: 7
total fills in shared programs: 15128 -> 15182 (0.36%)
fills in affected programs: 3294 -> 3348 (1.64%)
helped: 2
HURT: 7
LOST: 21
GAINED: 28
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
This reverts commit ba2fa1ceaf. Doing
optimizations after scheduling but before RA means doing them in the
middle of the scheduling loop which introduces additional dependencies
between one scheduling iteration and the next. That won't work if we
want to make the scheduling modes independent, at least not unless we
have some way of fully cloning the IR.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
brw_find_next_block_end() scans through the instructions to find the end
of the block. We were calling it for every instruction in the program
which is, if you have a single basic block, makes the whole mess a nice
clean O(n^2) when it really doesn't need to be. Instead, only call
brw_find_next_block_end() as-needed. This brings it back to O(n) like
it should have been.
This cuts the runtime of the following Vulkan CTS on my SKL box by 5%
from 1:51 to 1:45: dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
Now that we're being conservative in the pass, it's easy to tell when it
makes progress and we can put it in the OPT() macro. This way, we get
nice INTEL_DEBUG=optimizer dumps for it. While we're here, fix the
header comment which is massively out-of-date.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
Instead of modifying every single instruction, keep track of which VGRFs
are actually split in a bit-set, and only modify the instructions that
actually touch split regs.
This cuts the runtime of the following Vulkan CTS on my SKL box by 45%
from 3:21 to 1:51: dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
SPV_EXT_demote_to_helper_invocation added OpDemoteToHelperInvocation
operation to turn an invocation into a helper invocation, but the
value of HelperInvocation (a builtin from Input storage class)
couldn't be modified dynamically without breaking compatibility.
For the extension the operation OpIsHelperInvocation was added to get
the dynamic value.
For SPIR-V 1.6, the demote operation was promoted, but now to get the
dynamic value the shader must issue a load to HelperInvocation with
Volatile memory access semantics.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14209>