Reject the new formats in swr to prevent crashes because it doesn't
know how to handle the new formats.
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
gl_Viewport is also in the VUE header so we need to whack the read
offset to 0 and emit a default (no overrides) SBE_SWIZ entry in that
case as well.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes some CTS regressions.
Fixes: e61a826f39 ("ac/llvm: fix pointer type for global atomics")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This wires up the front facing value as a sysval, I'd like to
remove the other facing code but I'd need to confirm VMware
don't use it first.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Now that the Mali T720 GPU is supoprted at the same level as the T760,
test it on PINE64 H64 boards.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
During the past months, Panfrost has matured considerably and several
tests stopped being flaky or failing at all.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Support for this GPU is equal now to that of T760, so whitelist it.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Make sure that the fragment is complete when writing it out.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We need to always upload anyway.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes dEQP-GLES3.functional.primitive_restart.*. Note the 0x18000 value
is accidentally somehow enabling primitive restart for some reason.
I'm not sure where this value came from but let's not.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
The algorithm is as described. Nothing fancy here, just need to add some
new code paths depending on which model we're running on.
Tomeu:
- Also disable tiling when !hierarchy and !vertex_count
- Avoid creating polygon lists smaller than the minimum when
vertex_count > 0 but tile size smaller than 16 byte
- Take into account tile size when calculating polygon list size for
!hierarchy
- Allow 0-sized tiles in a single dimension
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We've figured out most of the big pieces, and though it looks faintly
like other Midgards, it's much simpler.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Similarly to how it's already done in the compiler, add a way to express
differences between GPU models that need to be taken into account when
assembling the cmdstream.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Since a is non-negative, neither fsqrt nor frsq should return NaN. frsq
should only return Inf when fsqrt returns 0.
The changes are pretty small, but this turns a few hundred hurt shaders
in the next patch into helped shaders.
An alternative to the intBitsToFloat is to import numpy and do
np.finfo(np.float32).max. That's more explicit, but we may also want to
have specific bit encodings of float values later. I could be convinced
either way, but intBitsToFloat(0x7f7fffff) was what I implemented first.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14661140 -> 14661104 (<.01%)
instructions in affected programs: 7520 -> 7484 (-0.48%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.52% -0.47%
Instructions are helped.
total cycles in shared programs: 228585416 -> 228584806 (<.01%)
cycles in affected programs: 56321 -> 55711 (-1.08%)
helped: 32
HURT: 0
helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10
helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65%
95% mean confidence interval for cycles value: -28.32 -9.80
95% mean confidence interval for cycles %-change: -1.63% -0.54%
Cycles are helped.
Sandy Bridge
total cycles in shared programs: 152991077 -> 152991075 (<.01%)
cycles in affected programs: 11525 -> 11523 (-0.02%)
helped: 2
HURT: 2
helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -5.27 4.27
95% mean confidence interval for cycles %-change: -0.16% 0.15%
Inconclusive result (value mean confidence interval includes 0).
No changes on Iron Lake or GM45.
In many cases, fsat, fneg, fabs, ineg, and iabs will get folded into
another instruction as either source or destination modifiers.
Counting them as instructions means that some if-statements won't get
converted to selects. For example,
vec1 32 ssa_25 = flt32 ssa_0, ssa_23.x
/* succs: block_1 block_2 */
if ssa_25 {
block block_1:
/* preds: block_0 */
vec1 32 ssa_26 = fabs ssa_24
vec1 32 ssa_27 = fneg ssa_26
vec1 32 ssa_28 = fabs ssa_20
vec1 32 ssa_29 = fneg ssa_28
vec1 32 ssa_30 = fmul ssa_27, ssa_29
vec1 32 ssa_31 = fsat ssa_30
/* succs: block_3 */
} else {
block block_2:
/* preds: block_0 */
/* succs: block_3 */
}
block block_3:
/* preds: block_1 block_2 */
block_1 isn't really 6 instructions, but it will be counted that way.
Most callers of the peephole_select pass use either 1 or 8. It's very
easy to blow way past either of these limits with things that are really
only one or two actual instructions.
I also tried some fancier things like making sure the fsat was of
another SSA def from the same block, but the simple test was actually
better.
The i965 back-end SEL peephole pass still helps ~700 shaders in
shader-db with this change.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Gen6+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14743694 -> 14738910 (-0.03%)
instructions in affected programs: 156575 -> 151791 (-3.06%)
helped: 1204
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 3.97 x̃: 3
helped stats (rel) min: 0.15% max: 19.57% x̄: 5.15% x̃: 4.55%
95% mean confidence interval for instructions value: -4.12 -3.82
95% mean confidence interval for instructions %-change: -5.35% -4.95%
Instructions are helped.
total cycles in shared programs: 231749141 -> 231602916 (-0.06%)
cycles in affected programs: 2818975 -> 2672750 (-5.19%)
helped: 876
HURT: 322
helped stats (abs) min: 2 max: 788 x̄: 180.99 x̃: 220
helped stats (rel) min: <.01% max: 43.82% x̄: 20.75% x̃: 19.44%
HURT stats (abs) min: 1 max: 1188 x̄: 38.27 x̃: 20
HURT stats (rel) min: 0.09% max: 102.67% x̄: 5.17% x̃: 1.70%
95% mean confidence interval for cycles value: -130.47 -113.64
95% mean confidence interval for cycles %-change: -14.85% -12.72%
Cycles are helped.
total sends in shared programs: 730495 -> 730491 (<.01%)
sends in affected programs: 46 -> 42 (-8.70%)
helped: 2
HURT: 0
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8122757 -> 8122617 (<.01%)
instructions in affected programs: 14716 -> 14576 (-0.95%)
helped: 46
HURT: 1
helped stats (abs) min: 1 max: 8 x̄: 3.07 x̃: 3
helped stats (rel) min: 0.36% max: 10.00% x̄: 2.54% x̃: 1.06%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 1.59% max: 1.59% x̄: 1.59% x̃: 1.59%
95% mean confidence interval for instructions value: -3.42 -2.54
95% mean confidence interval for instructions %-change: -3.28% -1.62%
Instructions are helped.
total cycles in shared programs: 188510100 -> 188509780 (<.01%)
cycles in affected programs: 58994 -> 58674 (-0.54%)
helped: 32
HURT: 1
helped stats (abs) min: 2 max: 96 x̄: 10.06 x̃: 6
helped stats (rel) min: 0.05% max: 15.29% x̄: 1.37% x̃: 0.31%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.68% max: 0.68% x̄: 0.68% x̃: 0.68%
95% mean confidence interval for cycles value: -16.34 -3.06
95% mean confidence interval for cycles %-change: -2.46% -0.15%
Cycles are helped.
Reworks:
* Adjust comment to list the state packets that curro found to be
affected.
Fixes: 8125d7960b ("intel/dev: Add preliminary device info for Tigerlake")
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
pthread_getcpuclockid() and clock_gettime() are also available on at least
OpenBSD, FreeBSD, NetBSD, DragonFly, Cygwin.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Enabling this option makes Intel Gen8-11 hardware load the 'iris'
driver by default instead of the older 'i965' driver.
Regardless of how this option is set, users can still override which
driver the loader selects via two methods. The first is to create a
~/.drirc or /etc/drirc file with the following snippet:
<driconf>
<device driver="loader" kernel_driver="i915">
<option name="dri_driver" value="i965" />
</device>
</driconf>
The other option is to set an environment variable:
export MESA_LOADER_DRIVER_OVERRIDE=i965
For now, "prefer_iris" defaults to i965 (the historical choice).
A separate future patch will change the default driver to iris.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1893
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Some backends require that there are no array varyings.
If there were no arrays in the input shader, the pass shouldn't have to
create new ones.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2103
Fixes: bcd14756ee ('nir/lower_io_to_vector: add flat mode')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Improves generated code of dEQP-VK.graphicsfuzz.disc-and-add-in-func-in-loop
because a loop exit phi can then be fixed to exec, removing copies and
improving jump threading.
No pipeline-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
ACO considers discards jumps and creates edges in the CFG for them but NIR
does neither of these.
This can be fixed instead by keeping track of whether a side of an IF had
a break/discard, but this doesn't solve the issue with discards affecting
loop exit phis. So this reworks phi handling a bit.
Fixes these tests:
dEQP-VK.graphicsfuzz.disc-and-add-in-func-in-loop
dEQP-VK.graphicsfuzz.loop-call-discard
dEQP-VK.graphicsfuzz.complex-nested-loops-and-call
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Right now there are two copies of mm:
* mesa/main/mm.[ch]
* gallium/auxiliary/util/u_mm.[ch]
At some point they splitted, and from the commit message it was not
clear why it was not possible to have only one copy at a common place.
Taking into account that was several years ago, Im assuming that it
was not possible then.
This change would allow to have one copy of the same code, and also
being able to use that code out of mesa/main or gallium, if needed.
This commit moves u_mm and removes mm, as u_mm has slightly more
changes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes: 13ab63bb62 ('radv: Implement VK_EXT_buffer_device_address.')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Stronger ordering is implemented in SPIRV->NIR with barriers.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This exposes GL_TDFX_texture_compression_FXT1 support. It's ancient,
only Intel GPUs appear to support it, and I seriously doubt anybody
uses it. But i965 supports it, and it's trivial to do, so we may as
well support it in the new iris driver as well.
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric recently added PIPE_FORMAT_FXT1_RGB[A] as part of his format
unification work. This was really most of the work of implementing
the extension. We just need to handle it in a couple of places and
expose the extension.
v2: Reject the new formats in llvmpipe_is_format_supported to prevent
crashes because it doesn't know how to handle the new formats.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
This allows this pass to be run multiple times and the results are
just or'ed together.
It fixes on test on llvmpipe nir, and regresses none.
Suggested by Kenneth
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Moved to RADV. No pipeline-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This shouldn't introduce any functional changes for RadeonSI
when NIR is enabled because these operations are already lowered.
pipeline-db (NAVI10/LLVM):
SGPRS: 9043 -> 9051 (0.09 %)
VGPRS: 7272 -> 7292 (0.28 %)
Code Size: 638892 -> 621628 (-2.70 %) bytes
LDS: 1333 -> 1331 (-0.15 %) blocks
Max Waves: 1614 -> 1608 (-0.37 %)
Found this while glancing at some F12019 shaders.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The reference guide is incorrect and SADDR is actually used with FLAT on
GFX10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
They can take both a definition and data operand
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>