ALIGN() brilliantly uses uintptr_t, making it unsafe for use with 64-bit
GPU addresses in 32-bit builds of the driver. Use align64() instead,
which uses uint64_t.
Fixes assertion failures when running any 32-bit program on Tigerlake.
Fixes: 2e6a7ced4d ("iris/gen12: Write GFX_AUX_TABLE base address register")
Fixes: 0d0290bb3f ("intel/common: Add surface to aux map translation table support")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3507>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3507>
Currently only implemented in the scalar backend, so only enable for
Gen8+. If support for the other opcodes is added to the vec4 backend,
Gen7 could be supported.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Use new lower_hadd64 and lower_usub_sat64 flags.
v3: Enable SPIR-V capability.
v4: Move lowering options to COMMON_SCALAR_OPTIONS. Suggested by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Remove smashing type to D for nir_op_irhadd. Caio noticed it was
odd, and removing it fixes an assertion failure in the crucible
func.shader.averageRounded.int64_t test (because the source should be
W).
v3: Emit BRW_OPCODE_MUL directly for nir_op_umul_32x16 and
nir_op_imul_32x16. Suggested by Curro.
v4: Smash types of MUL instruction generated for nir_op_umul_32x16 and
nir_op_imul_32x16. With this change, I get the same assembly now as I
did with v2.
v5: Remove support for pre-Gen7. The integer multiply path was
incorrect, and, since the extension isn't enabled pre-Gen7, there's no
way to test it.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Add a big comment explaining the [IU]SUB_SAT lowering. Suggested by
Caio.
v3: Use get_fpu_lowered_simd_width in get_lowered_simd_width. Suggested
by Ken on IRC.
v4: Fix a typo in a comment. Noticed by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Move the check to fs_visitor::lower_integer_multiplication.
Previously the cases where lowering was skipped, the original
instruction was removed by fs_visitor::lower_integer_multiplication.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
Gen4/5's rounding instructions operate differently than later Gens'.
They all return the floor of the input and the "Round-increment"
conditional modifier answers whether the result should be incremented by
1.0 to get the appropriate result for the operation (and thus its
behavior is determined by the round opcode; e.g., RNDZ vs RNDE).
Since this requires a second instruciton (a predicated ADD) that
consumes the result of the round instruction, the round instruction
cannot write its result directly to the (write-only) message registers.
By emitting the ADD in the generator, the backend thinks it's safe to
store the round's result directly to the message register file.
To avoid this, we move the emission of the ADD instruction to the NIR
translator so that the backend has the information it needs.
I suspect this also fixes code generated for RNDZ.SAT but since
Gen4/5 don't support GLSL 1.30 which adds the trunc() function, I
couldn't write a piglit test to confirm. My thinking is that if x=-0.5:
sat(trunc(-0.5)) = 0.0
But on Gen4/5 where sat(trunc(x)) is implemented as
rndz.r.f0 result, x // result = floor(x)
// set f0 if increment needed
(+f0) add result, result, 1.0 // fixup so result = trunc(x)
then putting saturate on both instructions will give the wrong result.
floor(-0.5) = -1.0
sat(floor(-0.5)) = 0.0
// +1 increment would be needed since floor(-0.5) != trunc(-0.5)
sat(sat(floor(-0.5)) + 1.0) = 1.0
Fixes: 6f394343b1 ("nir/algebraic: i2f(f2i()) -> trunc()")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2355
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3459>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3459>
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
[Eric: factor out the is_dir_or_link() check and fix a bug in v1]
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
v3: include directory path when lstat'ing files
v4: fix inverted check in enumerate_sysfs_metrics()
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2258>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2258>
As a result of 9baa33cef0 our backend compiler leaves params pretty
much untouched. So in order to avoid storing uninitialized values in
the shader cache blobs, just 0 out this array.
I've considered not even allocating this array which works on gen8+
but the vec4 backend still makes a copy of this array and so it
crashes on memcpy on HSW.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9baa33cef0 ("anv: Rework push constant handling")
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3516>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3516>
Specifically, execution size, register file, and register type. I did
not add validation for vertical stride and width because I don't believe
it's possible to have an otherwise valid instruction with an invalid
vertical stride or width, due to all of the other regioning
restrictions.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
In order to fuzz test instructions, we first need to do some sanity
checking first. Factoring out this function allows us an easy way to
validate a single instruction.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
16-bit immediates need to be replicated through the 32-bit immediate
field, so we should never see one that isn't.
This does happen however in the fuzzer unit test, so returning false
allows the fuzzer to reject this case.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Previously we were sharing tables between generations that were nearly
identical (i.e., Gen8 3-src adds HF support) and used a small bit of
code to handle the differences. This is kind of a mess if you want to
reject 64-bit types on platforms that don't support 64-bit types, so
split the tables, allowing each generation's table to list exactly what
it supports.
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Since the enum brw_reg_type is packed, comparisons with -1 don't work
directly, necessitating the cast. Add a macro to avoid this confusion.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Two of the tests emit instructions with MRF destinations, and MRFs
aren't present on Gen7+. I think we were just lucky that this didn't
cause a problem earlier since we were running the tests on Gen7-9.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Since the platforms don't support align1 3-src instructions, the
contents of these operands are not going to be meaningful. Just don't
print them to avoid hitting some assertions in brw_inst functions.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
When there's only one hardware thread (i.e. the dispatch width greater
or equal to the workgroup size), there's no need to use a barrier to
ensure all the invocations reach the same point in the shader, because
they are already running lock-step.
Results for SKL running Iris for shader-db tests with compute shaders
total sends in shared programs: 18361 -> 18339 (-0.12%)
sends in affected programs: 904 -> 882 (-2.43%)
helped: 9
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 2.44 x̃: 2
helped stats (rel) min: 0.84% max: 21.43% x̄: 7.82% x̃: 2.67%
95% mean confidence interval for sends value: -3.31 -1.58
95% mean confidence interval for sends %-change: -14.67% -0.97%
Sends are helped.
Shaders from Aztec Ruins, Car Chase, Manhattan and DeusEx are helped.
Results for ICL and TGL are similar to SKL.
Results for BDW are similar to SKL except for DeusEx shader that has a
workgroup size 16 but in BDW picks the SIMD8.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
When there's only one hardware thread (i.e. the dispatch width greater
or equal to the workgroup size), there's no need to synchronize shared
memory access (SLM) since all the requests from a single thread are
already synchronized. In such case, we just add a scheduling fence.
To be able to identify that case for all platforms, move the handling
of platforms prior to Gen11 (which don't have a separate SLM fence)
after the optimization.
Results for SKL running Iris for shader-db tests with compute shaders
total sends in shared programs: 18395 -> 18361 (-0.18%)
sends in affected programs: 938 -> 904 (-3.62%)
helped: 9
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 3.78 x̃: 4
helped stats (rel) min: 1.56% max: 26.32% x̄: 10.33% x̃: 2.60%
95% mean confidence interval for sends value: -4.85 -2.71
95% mean confidence interval for sends %-change: -19.12% -1.54%
Sends are helped.
Shaders from Aztec Ruins, Car Chase, Manhattan and DeusEx are helped.
Results for ICL and TGL are similar to SKL.
Results for BDW are similar to SKL except for DeusEx shader that has a
workgroup size 16 but in BDW picks the SIMD8.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
Like a SHADER_OPCODE_MEMORY_FENCE but doesn't doesn't generate any
assembly code.
Will be used when the compiler shouldn't reorder certain instructions
but there's no need to generate code for the HW to do it -- as the
ordering will be guaranteed by other means.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
Instead of having a single physical device in anv_instance, have a
linked list of them. What we have now works today because we our GPUs
are build into the CPU and so you're guaranteed to only ever have one of
them. One day, that will change and we want ANV to be ready.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
This commit simply moves fetching the device info and checking if ANV
supports the device a bit higher up. This way we fail earlier and it'll
make error checking easier in the next commit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
We don't actually have genX versions of any physical device level
commands so we don't need the trampoline versions and we don't need to
have a separate table per physical device.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
There are very few times when we actually want to fetch the instance
from the anv_device. We can put up with a bit of pain there in exchange
for strongly discouraging people from doing this in general.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Having to always pull the physical device from the instance has been
annoying for almost as long as the driver has existed. It also won't
work in a world where we ever have more than one physical device. This
commit adds a new field called "physical" to anv_device and switches
every location where we use device->instance->physicalDevice to use the
new field instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Some formats, in particular YCbCr formats and ASTC have additional
restrictions. We already whack ASTC formats to RGBA32_UINT because the
hardware doesn't allow LINEAR with ASTC. However, we need to fix YCbCr
formats as well because they come with alignment restrictions that we
can't guarantee are satisfied. We're using blorp_copy to do the copies
so we may as well just stomp formats for everything.
Fixes: b24b93d584 "anv: enable VK_KHR_sampler_ycbcr_conversion"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3460>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3460>
This involves permuting the registers of barycentric vectors to have
the standard X[0-n] Y[0-n] layout at NIR translation time.
Barycentrics are converted to the format expected by the PLN
instruction in the lower_barycentrics() pass run after the
optimization loop.
Main reason is correctness of SIMD32 fragment shaders. The
shuffle_from_pln_layout() and shuffle_to_pln_layout() helpers used
during NIR translation are busted for SIMD32. This leads to serious
corruption at present with INTEL_DEBUG=do32, especially on Gen11+
where these helpers are hit more frequently due to the lack of a
hardware PLN instruction.
Of course one could have chosen to fix those helpers instead, but
there is another far more subtle issue that was reported during review
of the SIMD32 fragment shader codegen changes: The SIMD splitting pass
currently handles SIMD32 barycentric vectors as if they had the
standard X[0-n] Y[0-n] layout, even though they are interleaved for
the PLN instruction, which causes incorrect execution masks to be
applied to the MOVs unzipping barycentric vectors in cases where a
LINTERP instruction occurs under non-uniform control flow.
I'm not aware of any conformance regressions due to the latter issue
at present, but for our peace of mind let's move the conversion to the
PLN layout into the lower_barycentrics() pass run after
lower_simd_width().
This leads to the following shader-db improvements (including SIMD32
shaders) in combination with the previous back-end preparation changes
-- Without them (especially the copy propagation changes) this would
lead to a massive number of regressions. On ICL:
total instructions in shared programs: 20662316 -> 20466903 (-0.95%)
instructions in affected programs: 10538474 -> 10343061 (-1.85%)
helped: 68775
HURT: 6
total spills in shared programs: 8938 -> 8748 (-2.13%)
spills in affected programs: 376 -> 186 (-50.53%)
helped: 9
HURT: 5
total fills in shared programs: 8965 -> 8663 (-3.37%)
fills in affected programs: 965 -> 663 (-31.30%)
helped: 9
HURT: 6
LOST: 146
GAINED: 43
On SKL:
total instructions in shared programs: 18725867 -> 18614912 (-0.59%)
instructions in affected programs: 3876590 -> 3765635 (-2.86%)
helped: 27492
HURT: 2
LOST: 191
GAINED: 417
On SNB:
total instructions in shared programs: 14573613 -> 13980646 (-4.07%)
instructions in affected programs: 5199074 -> 4606107 (-11.41%)
helped: 29998
HURT: 0
LOST: 21
GAINED: 30
Results are somewhat less impressive but still significant without
SIMD32 fragment shaders enabled. On ICL:
total instructions in shared programs: 16148728 -> 16061659 (-0.54%)
instructions in affected programs: 6114788 -> 6027719 (-1.42%)
helped: 42046
HURT: 6
total spills in shared programs: 8218 -> 8028 (-2.31%)
spills in affected programs: 376 -> 186 (-50.53%)
helped: 9
HURT: 5
total fills in shared programs: 8953 -> 8651 (-3.37%)
fills in affected programs: 965 -> 663 (-31.30%)
helped: 9
HURT: 6
LOST: 0
GAINED: 3
On SKL:
total instructions in shared programs: 14927994 -> 14926738 (-0.01%)
instructions in affected programs: 168850 -> 167594 (-0.74%)
helped: 711
HURT: 2
On SNB:
total instructions in shared programs: 10770538 -> 10734403 (-0.34%)
instructions in affected programs: 2702172 -> 2666037 (-1.34%)
helped: 17818
HURT: 0
All of the hurt shaders are either spilling slightly more or emitting
additional NOP instructions due to the SIMD16 POW workaround for
Gen8-9 combined with differences in scheduling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>