Just assert that the array will fit whatever the MAX is for a given
Gfx version.
Fixes: 172c1ab984 ("intel/elk: Add ELK_MAX_MRF_ALL for static allocating arrays")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32978>
This patch exposes shader hashes (computes and draws) to Perfetto and
utrace. By including these hashes in traces, developers can correlate
compute and draw calls with their assoicated ASM dumps when analyzing
the traces.
To achieve this, intel_tracepoint.py has been reworked to preprocess
tracepoint arguments dynamically. Any argument containing "hash" in its
variable name is now forrmated as hexadecimal before being passed to the
tracepoint definition.
Signed-off-by: Michael <michael.cheng@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32708>
Was causing trouble in some build configurations, we don't really need
them. Unless there's a good reason, defaults to use ralloc for
consistency with the larger codebase.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Antonio Ospite <None>
Reviewed-by: Kenneth Graunke <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>
Was causing trouble in some build configurations, we don't really need
them. Use ralloc for consistency.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Antonio Ospite <None>
Reviewed-by: Kenneth Graunke <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>
Use 3DSTATE_URB_ALLOC_* instruction to program URB for multislice device
config.
In case only one slice is available in the device, SliceN fields will be
ignored by HW.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32736>
Use 3DSTATE_URB_ALLOC_* instruction to program URB for multislice device
config.
In case only one slice is available in the device, SliceN fields will be
ignored by HW.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32736>
It could run the companion batch buffer even if the main batch buffer
failed, that was possible to happen in i915 and Xe KMD.
In case the main context/queue is banned and companion is not it could
still return that submission was properly start what was not.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32850>
On i915 it could be executing the main batch buffer in
i915_queue_exec_locked() even if the perf query batch buffer failed.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32850>
Fix defect reported by Coverity Scan.
Side effect in assertion (ASSERT_SIDE_EFFECT)
assert_side_effect: Argument ++eot_count of assert() has a side effect.
The containing function might work differently in a non-debug build.
Fixes: ebd6738260 ("intel/elk/chv: Implement WaClearArfDependenciesBeforeEot")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32884>
Marek recently changed hole_size to be signed, rather than unsigned.
A negative hole_size means that the two loads overlap - and thus are
prime candidates to be combined.
My original hole_size handling was:
if hole_size > 4 * (8 - low->num_components) then don't vectorize
For non-overlapping loads, this worked: NIR's largest vector is vec16,
and if low was already a vec16, combining it with anything would exceed
that, so it'd never be considered. That meant low would always be a
vec8 or less, so (8 - low->num_components) was a positive number.
Now that we see overlapping loads, we can see a vec16 low, vec4 high,
and also a negative hole size, giving us fun comparisons like:
-16 > 4 * (8 - 16) => -16 > -32 => true, don't vectorize
Which is absolutely the wrong thing to do, because the high load's data
is entirely included within the former load's data.
The idea here was to make sure the second load would be able to pack at
least one component into the first's V8 result. But even this isn't the
best, because...even if it's simply adjacent, doing one V16 load is more
efficient than requesting two back to back V8 loads.
So, we just simplify down to a static check: if there's an entire V8 of
hole, don't vectorize. This already won't happen because the core pass
has max_hole set to 28 bytes (7 32-bit components), but that could
change based on the needs of other drivers, so let's be defensive.
fossil-db results on Alchemist:
Instrs: 161533978 -> 161295137 (-0.15%); split: -0.20%, +0.05%
Subgroup size: 8092544 -> 8092568 (+0.00%)
Send messages: 7915233 -> 7844503 (-0.89%); split: -0.94%, +0.05%
Cycle count: 16577700697 -> 16702609256 (+0.75%); split: -0.59%, +1.35%
Spill count: 72338 -> 67226 (-7.07%); split: -7.36%, +0.29%
Fill count: 134058 -> 125980 (-6.03%); split: -6.83%, +0.80%
Scratch Memory Size: 4092928 -> 3786752 (-7.48%); split: -7.53%, +0.05%
Max live registers: 33031460 -> 32945994 (-0.26%); split: -0.27%, +0.01%
Max dispatch width: 5778384 -> 5778536 (+0.00%); split: +0.26%, -0.26%
Non SSA regs after NIR: 179809505 -> 152735471 (-15.06%); split: -15.08%, +0.03%
Fixes: c21bc65ba7 ("nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32932>
3DSTATE_CPS_POINTERS is deprecated on PTL, so let's switch to
3DSTATE_COARSE_PIXEL to deliver CPS state as pipelined state.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32737>
This change adds CPS related new state instruction, structure and
enum.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32737>
When a SEND instruction is a EOT, the scoreboard lowering will not
allocate a new SBID for it, since nothing needs to wait for it. In
Gfx12 this allowed the SEND to get out-of-order $.dst or $.src
dependencies.
Starting on Xe2+ this is not supported anymore, in favor of supporting
more combined modes.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32712>
For Xe3+, set preferred SLM and SLM per threadgroup size.
Bspec: 73211
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32872>
It was hard-coded to 64k but Xe2 platforms and newer supports
larger SLM sizes.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Cc: mesa-stable
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32874>
The push_constant_loc[] array is always an identity mapping these days,
so it's kind of pointless. Just use the original uniform number and
skip the unnecessary "remap" step. With that gone, and shrinking UBO
ranges gone, assign_constant_locations() is now empty and can be removed
as well.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
Now that we never shrink ranges in the backend, we never lower push
constants to pull constants late in the backend either. get_pull_loc
will never return true, and so all of brw_lower_constant_loads becomes
a noop.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
Back in the bad old days (vec4?) we had a bunch of smarts in the backend
to dead code eliminate unused vector components and re-pack regular
uniforms, so we really couldn't decide how much data we were pushing
until very late in the backend. Nowadays we have none of that - we do
all of our elimination and packing in NIR. anv shrinks ranges to deal
with Vulkan API push constants, and iris treats everything as a UBO and
as of the previous commit will also shrink appropriately.
So we don't need to do this anymore...which will let us simplify quite
a bit of code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
anv already does this limiting, since it needs to handle non-UBO push
constants as well. iris treats everything as a UBO, but doesn't have
a limiter and was relying on the backend to handle it.
Do this in the NIR pass so that we can eliminate the backend code.
It's not necessary for anv, but handling it here is simple and less
error prone for iris, which calls this in a number of places. We know
we need to limit things to this much; anv can limit more if needed.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
Fix to handle 16 refs.
v1. handle the case where a slot index is negative.
(Lionel Landwerlin <lionel.g.landwerlin@intel.com>)
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32823>
We'll want to know about the empty push constant for device generated
commands. It's easier if the information is stored in
anv_pipeline_bind_map::push_ranges[].
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32828>
When the runtime is going to potentially emit no dispatch, we need to
have a way to capture a timestamp. Add a new flag for this to tell
whether we don't have a HW instruction to capture the timestamp and
rely on MI_STORE_REGISTER_MEM instead.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: de00fe3f66 ("anv: add BVH building tracking through u_trace")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12382
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32835>
Coverity notices that `err` might be used uninitialized, which is true
as we don't assign the value we want to check! Fix that assignment so
the EXPECT_EQ macro does what we expect.
CID: 1635272
Fixes: 6b931a68c7 ("intel/common: Implement Xe KMD in mi_builder tests")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32849>
If the assert were to fail the memory would leak, which is pretty
harmless in a unit test, but the fix is trivial.
CID: 1635429
Fixes: 6b931a68c7 ("intel/common: Implement Xe KMD in mi_builder tests")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32849>
A negative hole size means the loads overlap. This will be used by drivers
to handle overlapping loads in the callback easily.
Reviewed-by: Mel Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32699>
Same thing as ANISOTROPIC including all restrictions except HW is
allowed to take liberties with precision to speed things up, Currently
only has an affect on formats of type *_sRGB.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32738>
Add new ANISOTROPIC_FAST filter mode value to the Min/MagModeFilter
field.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32738>