To clean up compilation warnings about unused variables
when asserts are disabled.
v2: UNUSED -> ASSERTED (Eric Engestrom)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19016>
Since divergence is a lot more likely in RT than compute, it makes
sense to limit ourselves to SIMD8.
The trampoline shader defaults to SIMD16 since this one is uniform.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
There will be situations where we will want to use a local builder
rather than the one associated with NIR->backend translation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
VS, TCS, TES, and GS threads must end with a URB write message with the
EOT (end of thread) bit set. For VS and TES, we shadow output variables
with temporaries and perform all stores at the end of the shader, giving
us an existing message to do the EOT.
In tessellation control shaders, we don't defer output stores until the
end of the thread like we do for vertex or evaluation shaders. We just
process store_output and store_per_vertex_output intrinsics where they
occur, which may be in control flow. So we can't guarantee that there's
a URB write being at the end of the shader.
Traditionally, we've just emitted a separate URB write to finish TCS
threads, doing a writemasked write to an single patch header DWord.
On Broadwell, we need to set a "TR DS Cache Disable" bit, so this is
a convenient spot to do so. But on other platforms, there's no such
field, and this write is purely wasteful.
Insetad of emitting a separate write, we can just look for an existing
URB write at the end of the program and tag that with EOT, if possible.
We already had code to do this for geometry shaders, so just lift it
into a helper function and reuse it.
No changes in shader-db.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
And run algebraic when either int64 for float64 are not supported so
those don't end up in the generated code.
Cc: mesa-stable
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
Move subgroup_id, that's only used by CS for verx10 < 125, as part of
the payload too -- even though is not, strictly speaking.
Note the thread execution of Task/Mesh is similar enough, so we make
their common struct inherit from cs_thread_payload.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
Move the setup into the FS thread payload constructor. Consolidate
payload setup for that in brw_fs_thread_payload.cpp file.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
We're now able to load up to 8 GRFs in one send.
v2: Switch to use transpose + vector of up to 64 (Thanks Curro!)
v3: Increase parallelism by not reusing the same register for push
constant offset (Curro)
v4: Drop dead ADD() instruction (Curro)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>
Make it clearer we are dealing with multiple patches,
works better in constrast with SINGLE_PATCH.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18151>
This backend lowering code has been dead since the removal of i965 -
nothing in the current source tree ever sets the flag.
This is handled by iris_setup_uniforms() and crocus_setup_uniforms().
Variable group size does not appear to be a feature in anv.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18055>
That way we can look at the SBT entries for debug purposes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17908>
Bspec 53421 says:
"A URB fence memory is typically performed prior the thread
exit message, so that the next thread dispatch that reads
that URB memory will see it."
Cc: 22.1 <mesa-stable>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16665>
This allows the software scoreboarding pass, scheduler, and so on
to handle the individual instructions and handle them, rather than
trusting in the generator to do scoreboarding correctly when expanding
the virtual instruction to multiple actual instructions.
By using SHADER_OPCODE_READ_SR_REG, we also correctly handle the
software scoreboarding workaround when reading DMask/VMask.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17530>
No shader-db changes on any Intel platform
Fossil-db results:
Tiger Lake
Instructions in all programs: 156926440 -> 156926470 (+0.0%)
Instructions hurt: 15
Cycles in all programs: 7513099349 -> 7513099402 (+0.0%)
Cycles hurt: 15
Ice Lake and Skylake had similar results. (Ice Lake shown)
Cycles in all programs: 9099036492 -> 9099036489 (-0.0%)
Cycles helped: 1
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
The changes to fs_visitor::validate() helped track down a place where I
initially forgot to convert a message to the new sources layout. This
had caused a different validation failure in
dEQP-GLES31.functional.tessellation.tesscoord.triangles_equal_spacing,
but this were not detected until after SENDs were lowered.
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 19951145 -> 19951133 (<.01%)
instructions in affected programs: 2429 -> 2417 (-0.49%)
helped: 8 / HURT: 0
total cycles in shared programs: 858904152 -> 858862331 (<.01%)
cycles in affected programs: 5702652 -> 5660831 (-0.73%)
helped: 2138 / HURT: 1255
Broadwell
total cycles in shared programs: 904869459 -> 904835501 (<.01%)
cycles in affected programs: 7686744 -> 7652786 (-0.44%)
helped: 2861 / HURT: 2050
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 141442369 -> 141442032 (-0.0%)
Instructions helped: 337
Cycles in all programs: 9099270231 -> 9099036492 (-0.0%)
Cycles helped: 40661
Cycles hurt: 28606
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
[anholt: changed to make all drivers do the right thing by moving the
payload barycentric check into the compiler]
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17381>
Now that this information is accurately gathered by spirv_to_nir, we no
longer need the hack. We just need to fix up the way we handle some of
the key bits.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
Thanks to the previous commit, we no longer need this check.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
This lets us drop demote_sample_qualifiers as well as a back-end check
for key->multisample_fbo.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
NIR constructs this information for us as part of nir_gather_info these
days so we can simplify our logic a bit. This will also let us be more
correct once we move uses_sample_shading scraping earlier.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
The lowering is currently fake. It just changes the opcode from the
_LOGICAL version to the non-_LOGICAL version.
v2: Remove some rebase cruft. 's/gfx8_//;s/simd8_/' in
brw_instruction_name. Both suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
With stages dispatching with a mask, we can run into situations where
we don't have the global address in all lanes. The existing code
always assumed we had the addres in at least lane0.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: bb40e999d1 ("intel/nir: use a single intel intrinsic to deal with ray traversal")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17330>
While we've taken advantage of split-sends in select situations, there
are many other cases (such as sampler messages, framebuffer writes, and
URB writes) that have never received that treatment, and continued to
use monolithic send payloads.
This commit introduces a new optimization pass which detects SEND
messages with a single payload, finds an adjacent LOAD_PAYLOAD that
produces that payload, splits it two, and updates the SEND to use both
of the new smaller payloads.
In places where we manually used split SENDS, we rely on underlying
knowledge of the message to determine a natural split point. For
example, header and data, or address and value.
In this pass, we instead infer a natural split point by looking at the
source registers. Often times, consecutive LOAD_PAYLOAD sources may
already be grouped together in a contiguous block, such as a texture
coordinate. Then, there is another bit of data, such as a LOD, that
may come from elsewhere. We look for the point where the source list
switches VGRFs, and split it there. (If there is a message header, we
choose to split there, as it will naturally come from elsewhere.)
This not only reduces the payload sizes, alleviating register pressure,
but it means that we may be able to eliminate some payload construction
altogether, if we have a contiguous block already and some extra data
being tacked on to one side or the other.
shader-db results for Icelake are:
total instructions in shared programs: 19602513 -> 19369255 (-1.19%)
instructions in affected programs: 6085404 -> 5852146 (-3.83%)
helped: 23650 / HURT: 15
helped stats (abs) min: 1 max: 1344 x̄: 9.87 x̃: 3
helped stats (rel) min: 0.03% max: 35.71% x̄: 3.78% x̃: 2.15%
HURT stats (abs) min: 1 max: 44 x̄: 7.20 x̃: 2
HURT stats (rel) min: 1.04% max: 20.00% x̄: 4.13% x̃: 2.00%
95% mean confidence interval for instructions value: -10.16 -9.55
95% mean confidence interval for instructions %-change: -3.84% -3.72%
Instructions are helped.
total cycles in shared programs: 848180368 -> 842208063 (-0.70%)
cycles in affected programs: 599931746 -> 593959441 (-1.00%)
helped: 22114 / HURT: 13053
helped stats (abs) min: 1 max: 482486 x̄: 580.94 x̃: 22
helped stats (rel) min: <.01% max: 78.92% x̄: 4.76% x̃: 0.75%
HURT stats (abs) min: 1 max: 94022 x̄: 526.67 x̃: 22
HURT stats (rel) min: <.01% max: 188.99% x̄: 4.52% x̃: 0.61%
95% mean confidence interval for cycles value: -222.87 -116.79
95% mean confidence interval for cycles %-change: -1.44% -1.20%
Cycles are helped.
total spills in shared programs: 8387 -> 6569 (-21.68%)
spills in affected programs: 5110 -> 3292 (-35.58%)
helped: 359 / HURT: 3
total fills in shared programs: 11833 -> 8218 (-30.55%)
fills in affected programs: 8635 -> 5020 (-41.86%)
helped: 358 / HURT: 3
LOST: 1 SIMD16 shader, 659 SIMD32 shaders
GAINED: 65 SIMD16 shaders, 959 SIMD32 shaders
Total CPU time (seconds): 1505.48 -> 1474.08 (-2.09%)
Examining these results: the few shaders where spills/fills increased
were already spilling significantly, and were only slightly hurt. The
applications affected were also helped in countless other shaders, and
other shaders stopped spilling altogether or had 50% reductions. Many
SIMD16 shaders were gained, and overall we gain more SIMD32, though many
close to the register pressure line go back and forth.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17018>
This structure will contain the opcode mapping tables in the next
commit. For now, this is the mechanical change to plumb it into all
the necessary places, and it continues simply holding devinfo.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
fs_visitor::assign_curb_setup() maps UNIFORM registers to HW regs,
and contains the following assert:
assert(inst->src[i].stride == 0);
emit_a64_oword_block_header's striding tricks run afoul of this
restriction, by producing stride 1 values on a 64-bit UNIFORM source.
Work around this by copying the UNIFORM value to a VGRF first.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16938>
In opt_algebraic(), handle TYPE_DF in a different check than TYPE_Q. We have a
separate flag for each type, use separate checks so platforms where one is true
and the other is not can work properly.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15835>
For cases with lots of very small primitives, this may improve
performance because we're not executing those dead channels all the
time.
Shader-db reports no instruction or cycle-count changes. However, by
hacking up the driver to report when this optimization triggers, it
appears to affect about 10% of shader-db.
v2 (Kenneth Graunke): Always enable VMask prior to XeHP for now,
because using VMask on those platforms allows us to perform the
eliminate_find_live_channel() optimization. However, XeHP doesn't
seem to have packed fragment shader dispatch, so we lose that
optimization regardless, and there's no reason not to avoid vmask.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1054>
If we saw a HALT instruction, we would forget to invalidate our analysis
pass information before returning progress.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16677>