Commit graph

13874 commits

Author SHA1 Message Date
Konstantin Seurer
cb31b5a958 clc,libcl: Clean up CL includes
This patch does a couple of things to make CL integration with drivers
as seamless as possible:
- We pull in opencl-c.h and opencl-c-base.h to stop relying on system
  headers.
- Parts of libcl.h are moved to new headers that are incomplete CL-safe
  variants of libc headers.
- A couple of util headers are changed to remove now unnecessary
  __OPENCL_VERSION__ guards and make more headers CL safe.
- Drivers now include src/compiler/libcl and use headers like
  macros.h,u_math.h instead of libcl.h.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33576>
2025-04-11 21:27:37 +00:00
Kenneth Graunke
eb1ec9cf8e brw: Don't assert about MAX_VGRF_SIZE in brw_opt_split_virtual_grfs()
This allows us to create temporary VGRFs that are larger than
MAX_VGRF_SIZE(devinfo), which will be split eventually.  They may not
be split on the initial pass, because we may need LOAD_PAYLOAD lowering,
copy propagation, and so on to occur first.  So we allow registers to
exceed that size initially.

The "Register allocation relies on split_virtual_grfs()" assertion in
brw_reg_allocate.cpp still asserts that all VGRFs which reach the
register allocator have been properly split.

One case where this is useful is for vectorizing convergent block loads.
We create temporaries to splat the SIMD1 values out to SIMD(N), which
can lead to some very large temporaries.  However, copy propagation and
so on ultimately eliminate these and they'll get split down to proper
sizes or elided entirely in the end.

(Note: both this and the prior commits from this merge request are
 needed to close the linked issue.)

Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12324
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
2025-04-11 20:34:51 +00:00
Kenneth Graunke
a45583f078 brw: Use live->max_vgrf_size in pre-RA scheduling
Post-RA scheduling doesn't use liveness analysis, so we continue using
MAX_VGRF_SIZE(devinfo).  But for pre-RA scheduling, we now use
live->max_vgrf_size.

This helps get us to a place where we can emit arbitrarily large VGRFs
early on in compilation, but which will be split and cleaned up prior to
register allocation.  It may also allocate smaller arrays in practice
since MAX_VGRF_SIZE(devinfo) assumes the worst case scenario for things
we actually could need to allocate.

Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
2025-04-11 20:34:51 +00:00
Kenneth Graunke
4b27b5895c brw: Use live->max_vgrf_size in register coalescing
We already require liveness, so just use the actual maximum size we saw
instead of a hardcoded pessimal size.

Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
2025-04-11 20:34:51 +00:00
Kenneth Graunke
ea468412f6 brw: Track the largest VGRF size in liveness analysis
We're already looking at this data to calculate the per-component
vars_from_vgrf[] and vgrf_from_vars[] mappings, so just record the
largest VGRF size while we're here.  This will allow passes to size
arrays based on the actual size needed, rather than hardcoding some
fixed size.  In many cases, MAX_VGRF_SIZE(devinfo) is larger than
necessary, because e.g. vec5 sparse sampling results aren't used.
Not hardcoding this means we can also temporarily handle very large
VGRFs which we know will be split eventually, without having to
increase the maximum which is ultimately used for RA classes.

Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
2025-04-11 20:34:51 +00:00
José Roberto de Souza
68a617076d intel/perf: Update intel_perf to match xe_drm.h
There was a mismatch between drm-next version of xe_drm.h and the one
in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142.
So this does the necessary changes to build with current and new
xe_drm.h

Fixes: 2a828c35a1 ("intel/perf: add eu stall sampling support")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34457>
2025-04-11 18:35:49 +00:00
Lionel Landwerlin
243c01c703 anv/iris: implement Wa_18040903259
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34433>
2025-04-11 13:54:35 +00:00
Lionel Landwerlin
d123aedfc7 anv: remove ALWAYS_INLINE from globally visible functions
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34433>
2025-04-11 13:54:35 +00:00
Lionel Landwerlin
bcaf08b47c intel/dev: remove ADLN references
Not used anymore, just use the existing ADL definitions.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34433>
2025-04-11 13:54:35 +00:00
Lionel Landwerlin
938f79ed82 anv: update Wa_1607156449 to use WA infrastructure
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34433>
2025-04-11 13:54:35 +00:00
Valentine Burley
b49eaf0966 ci/lava: Consolidate piglit trace job definitions
Clean up LAVA job definitions.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34424>
2025-04-11 07:05:07 +00:00
Valentine Burley
1aeedddbb6 ci/piglit: Drop redundant PIGLIT_PROFILES variable
PIGLIT_PROFILES was only used with the piglit-runner.sh script, which no
jobs were using anymore.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34424>
2025-04-11 07:05:06 +00:00
Valentine Burley
09f86df938 intel/ci: Convert iris-kbl-piglit to deqp-runner suite
This was the last job using the piglit-runner.sh script.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34424>
2025-04-11 07:05:06 +00:00
Lionel Landwerlin
06ad9a25e5 brw: fix Wa_22013689345 emission
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
2 problems :
  - not detecting null destination correctly
  - applied too late using SHADER_OPCODE_MEMORY_FENCE, when lowering
    already happened

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34319>
2025-04-10 16:44:28 +00:00
Lionel Landwerlin
e321c438dc anv: fix self dependency computation
Some upcoming changes in the runtime will make it impossible to rely
on the pipeline or runtime information to know whether a fragment
shader has input attachments.

Instead we gather that information at compile time and store it in our
shader bind_map.

At runtime we check whether the fragment shader has input attachments
and whether those map to the runtime depth/stencil input attachments
to set the 3DSTATE_PS_EXTRA::PixelShaderKillsPixel.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d2f7b6d5a7 ("anv: implement VK_KHR_dynamic_rendering_local_read")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32540>
2025-04-10 13:17:53 +00:00
Paulo Zanoni
fdbdfaed01 anv: add ANV_SYS_MEM_LIMIT for debugging system memory restrictions
If you suspect a workload is failing because it needs more memory, you
can set ANV_SYS_MEM_LIMIT=100 to give it all the memory available.
This could make, for example, certain games start working (it really
depends on how much RAM you have and how much the game wants).

If you suspect a workload is too resource hungry, you can try to limit
it with ANV_SYS_MEM_LIMIT=30 (or some other value) to see if it can
deal with the more restricted environment and behave accordingly.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28513>
2025-04-09 22:48:18 +00:00
Paulo Zanoni
ec4b2ce664 anv: restore the old behavior of up to 75% of RAM for the system heap
"We paid for sixteen gigs of RAM, so we gonna use the whole damn
   sixteen gigs of RAM!"
     - My Mom

First, some history:

The Anv 50%-or-75% rule was originally added in 2017 by 060a6434ec
("anv: Advertise larger heap sizes"). When i915.ko started reporting
memory sizes in its ioctls, it didn't impose any restrictions: 100% of
SRAM was reported as available, so the restriction was in Mesa. When
xe.ko was introduced, it only reported 50% of the SRAM as available
through its ioctls, so commit b571ae6e7a ("intel: Make memory heaps
consistent between KMDs") adapted the code to not take an extra 25% of
the 50% that was already cut, and restricted i915.ko to 50% instead of
the 50%-or-75%. In Kernel commit d2d5f6d57884 ("drm/xe: Increase the
XE_PL_TT watermark"), xe.ko changed to reporting 100% of SRAM through
its ioctls, so we adapted Mesa to do the right thing depending on
which Kernel version was running.

While this was all happening, we were discussing about which behavior
was actually the best: restrict everything to 50% in order to avoid
issues when many things are running in parallel, or keep the
restriction only at 75% in order to allow high demanding workloads to
make full use of the hardware.

The way I see, if parallel applications are causing the system to run
out of resources, the user always has the option to kill applications
and use one thing at a time. On the other hand, if a single
application needs more than 50% of the SRAM and we don't allow it in
our heaps, the application will never work (unless, of course, the
user patches Mesa). So in this commit we go back to allowing
high-demanding applications to work by restoring the 50%-or-75% rule.

This commit is especially useful in systems with integrated graphics,
like LNL, where the option to upgrade RAM is not present.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28513>
2025-04-09 22:48:18 +00:00
Paulo Zanoni
02e896bc49 anv/xe: detect the newer xe.ko memory reporting model and act accordingly
Kernel commit d2d5f6d57884 ("drm/xe: Increase the XE_PL_TT watermark")
changed how xe.ko reportes memory: its ioctls now report 100% of the
system RAM as available. Since our policy is to report 50% of the SRAM
as available for the heaps, add some code to check the amount reported
by xe.ko against the amount reported by the system, then act
accordingly.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28513>
2025-04-09 22:48:18 +00:00
Paulo Zanoni
3db8931d4a intel/i915: restrict the RAM size restrictions to Anv
Before commit b571ae6e7a ("intel: Make memory heaps consistent
between KMDs"), we had the following policy for reporting Sytem RAM
memory sizes:

- For OpenGL, we reported the total available RAM.
- For Vulkan, we reported the total available RAM as:
  - 50% of the total RAM if the total RAM was <= 4GB,
  - 75% otherwise
  - In addition, the Memory Budget (for VK_EXT_memory_budget) is 90%
    of the "free" memory, which can be an extra 10% off of the 50% or
    75%.

When xe.ko was added, one key difference was noted: while i915.ko
reported the "real" RAM memory sizes in its ioctls, xe.ko reported
only 50% of the system RAM as available. Because of that (and other
reasons, see this discussion on MR 28513), commit b571ae6e7a decided
to unify the behavior by changing the Anv i915.ko rule to "always 50%"
instead of "50% or 75%". This also changed the Iris rule to 50%
instead of 100%.

In my research, I couldn't find any reason why this restriction should
also apply to Iris, so here we revert back to handling these size
restrictions on Anv only.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28513>
2025-04-09 22:48:18 +00:00
Ian Romanick
cb69d019cf brw/nir: Use offset() for all uses of offs in emit_pixel_interpolater_alu_at_offset
This is necessary to appropriately uniformize the first component
access of a convergent vector. Without this, this is produced:

    load_payload(16) %18:D, 0d, 0d NoMask group0
    add(32) %21:F, %18+0.0:F, 0.5f
    add(32) %22:F, %18+2.0<0>:F, 0.5f

This is the correct code:

    load_payload(16) %18:D, 0d, 0d NoMask group0
    add(32) %21:F, %18+0.0<0>:F, 0.5f
    add(32) %22:F, %18+2.0<0>:F, 0.5f

Without 38b58e286f, the code generated was more incorrect, but happened
to work for this test case:

    load_payload(16) %18:D, 0d, 0d NoMask group0
    add(32) %21:F, %18+0.0<0>:F, 0.5f
    add(32) %22:F, %18+0.4<0>:F, 0.5f

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 38b58e286f ("brw/nir: Fix source handling of nir_intrinsic_load_barycentric_at_offset")
Closes: #12969
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34427>
2025-04-09 22:21:18 +00:00
Caleb Callaway
64b5ee3001 intel/tools: fix 32b build for EU stall tool
Fixes: 610ad8d3 ("intel/tools: create intel_monitor for sampling eu stalls")

Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34439>
2025-04-09 21:40:46 +00:00
Caio Oliveira
7457c4ecfd brw: Make brw_range use half-open ranges
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>
2025-04-09 19:06:49 +00:00
Caio Oliveira
6509f8139d brw: Use brw_range::last() to explicit get the last valid IP
This is a preparation to change what is stored in brw_range::end.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>
2025-04-09 19:06:49 +00:00
Caio Oliveira
596bbb2c95 brw: Use brw_range to store Vars ranges
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>
2025-04-09 19:06:49 +00:00
Caio Oliveira
0b4a3c0ff6 brw: Use brw_range to store VGRF ranges
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>
2025-04-09 19:06:49 +00:00
Caio Oliveira
e644b42e59 brw: Use brw_range when operating with live ranges
Makes the intention of some comparisons clearer by using the named
helper functions.  Add commentary when the straightforward range is not
the one used, e.g. VGRF interference.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>
2025-04-09 19:06:49 +00:00
Caio Oliveira
f56a5cf1eb brw: Use brw_range in IP ranges analysis
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>
2025-04-09 19:06:49 +00:00
Caio Oliveira
fb50461220 brw: Add brw_range struct
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>
2025-04-09 19:06:48 +00:00
Caio Oliveira
8d9155e34d brw: Clean up saturate propagation after non-defs version removal
Remove now unused analysis and no need to walk blocks in reverse
after the non-defs version of the pass was removed.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>
2025-04-09 19:06:48 +00:00
Caio Oliveira
cfc4067b0e brw: Add a few basic tests for register coalesce
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>
2025-04-09 19:06:48 +00:00
Tapani Pälli
0750c4c5f1 intel/dev: update mesa_defs.json from internal database
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34430>
2025-04-09 15:44:22 +00:00
Lionel Landwerlin
76096d04bb anv: relax restriction on variable count descriptors
VUID-VkDescriptorSetAllocateInfo-pSetLayouts-09380 says that :

   "If pSetLayouts[i] was created with an element of pBindingFlags
   that includes VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT,
   and VkDescriptorSetVariableDescriptorCountAllocateInfo is included
   in the pNext chain, and
   VkDescriptorSetVariableDescriptorCountAllocateInfo::descriptorSetCount
   is not zero, then
   VkDescriptorSetVariableDescriptorCountAllocateInfo::pDescriptorCounts[i]
   must be less than or equal to
   VkDescriptorSetLayoutBinding::descriptorCount for the corresponding
   binding used to create pSetLayouts[i]"

But applications like are not following the spec. RADV doesn't apply
that limit and allocates if there is enough space in the pool. Let's
just do the same.

Note that this issue got resolved with a vkd3d-proton change :

a7ac1a7d2f

But since this change is deleting more code than it adds, might as
well go with it.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12185
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32305>
2025-04-09 16:29:21 +03:00
Lionel Landwerlin
19e4dda9a2 brw: fix shuffle with scalar/uniform index
The fixes commit isn't actually the source of the bug but likely the
biggest enabler because it creates scalar values that more easily end
up in the shuffle operations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1b24612c57 ("brw/nir: Treat load_*_uniform_block_intel as convergent")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12927
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12688
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12570
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12905
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12734
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34393>
2025-04-08 20:14:11 +00:00
Felix DeGrood
610ad8d378 intel/tools: create intel_monitor for sampling eu stalls
Created stand alone tool for sampling gfx data on regular
intervals. Tool has inner loop that performs sampling every N
useconds. Press any key to end sampling. Results will be dumped
when intel_monitor exits.

First application of intel_monitor will be to collect eu stall
data. Perhaps more applications can be added at a later date.

How to use:
 0. Set sysctl dev.xe.observation_paranoid=0
 1. Clean shader cache and launch gfx INTEL_DEBUG=shaders-lineno.
    Redirect stderr to asm.txt.
 2. When gfx app ready to monitor, begin capturing eustall data by
    launching `intel_monitor -e > eustall.csv` in separate console.
 3  When done collected, close intel_monitor by pressing any key.
 4. Correlate eustall data in eustall.csv with shader instructions in
    asm.txt by matching instruction offsets. Use data to determine which
    instructions are stalling and why.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142>
2025-04-08 19:39:53 +00:00
Felix DeGrood
2a828c35a1 intel/perf: add eu stall sampling support
Xe2+ GPUs have support for eu stall sampling perf debug feature.
This feature allows driver to collect count and reasons for why
EUs are stalled on GPU. Stall data is cross referenced with ip
address within individual shaders so it is possible to know which
instructions in which shaders are generating stalls. This should
be a very useful feature for debugging performance of slow shaders.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142>
2025-04-08 19:39:53 +00:00
Felix DeGrood
d6a379f7a7 intel/perf: remove unnused argument from xe_perf_stream_read_error
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142>
2025-04-08 19:39:53 +00:00
Felix DeGrood
a09ddc3b77 anv: add INTEL_DEBUG=shaders-lineno
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142>
2025-04-08 19:39:53 +00:00
Felix DeGrood
7a3de9e877 intel/brw: support for dumping shader line numbers
Add support for dumping shader asm containing instruction line numbers
matching offsets within instruction state pool buffer. Offsets
should match values collected from eu stall sampling. This is
required for match eu stall data with individual shader instructions.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30142>
2025-04-08 19:39:53 +00:00
Renato Pereyra
7190949927 perfetto/android: align datasource names with tooling expectations
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
A few Android tools are based on/assume the datasource names
gpu.renderstages and gpu.counters. It is less effort to align with that
naming for Android builds than to chase down those tools and fix them,
not to mention account for new tools that may be created in the future.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34330>
2025-04-08 18:29:10 +00:00
Faith Ekstrand
436f175187 intel/compiler: Use nir_split_conversions()
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34266>
2025-04-07 17:45:21 -05:00
Caio Oliveira
bf9ad36f2d brw: Properly handle cooperative matrices created with constants
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Expand constant sources to cover the region read by DPAS, and also
use NULL register as accumulator when possible.

Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34373>
2025-04-07 14:27:43 -07:00
Collabora's Gfx CI Team
fcf19bf335 Uprev ANGLE to 3818d37d5e94317f01810053b8f28c1f1e8b98e6
1b34d2a18a...3818d37d5e

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34378>
2025-04-07 18:16:00 +00:00
Ian Romanick
f33faa4648 brw/nir: Allow b2f(not(X)) optimization on Gfx12.5+
Since there are no type conversions, no restrictions are violated.

No shader-db or fossil-db changes on any Gfx12 or older Intel
platforms.

shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 16956077 -> 16944933 (-0.07%)
instructions in affected programs: 1957573 -> 1946429 (-0.57%)
helped: 4629 / HURT: 35

total cycles in shared programs: 915668518 -> 915684808 (<.01%)
cycles in affected programs: 341925598 -> 341941888 (<.01%)
helped: 3040 / HURT: 1305
helped stats (abs) min: 2 max: 23034 x̄: 205.36 x̃: 16
helped stats (rel) min: <.01% max: 41.21% x̄: 1.28% x̃: 0.48%
HURT stats (abs)   min: 2 max: 68820 x̄: 490.88 x̃: 22
HURT stats (rel)   min: <.01% max: 103.69% x̄: 2.29% x̃: 0.37%
95% mean confidence interval for cycles value: -50.28 57.78
95% mean confidence interval for cycles %-change: -0.35% -0.07%
Inconclusive result (value mean confidence interval includes 0).

LOST:   40
GAINED: 42

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 209828027 -> 209790349 (-0.02%); split: -0.03%, +0.01%
Cycle count: 30504938008 -> 30514045408 (+0.03%); split: -0.06%, +0.09%
Spill count: 512182 -> 512168 (-0.00%)
Fill count: 623432 -> 623426 (-0.00%); split: -0.00%, +0.00%
Max live registers: 65465029 -> 65464959 (-0.00%)

Totals from 57895 (8.19% of 706589) affected shaders:
Instrs: 50144907 -> 50107229 (-0.08%); split: -0.11%, +0.03%
Cycle count: 7549692606 -> 7558800006 (+0.12%); split: -0.25%, +0.37%
Spill count: 58834 -> 58820 (-0.02%)
Fill count: 102324 -> 102318 (-0.01%); split: -0.01%, +0.01%
Max live registers: 9129045 -> 9128975 (-0.00%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>
2025-04-07 17:42:05 +00:00
Ian Romanick
853ead2073 brw/nir: Optimize b2f(not(X)) using logical operations instead of arithmetic
Funny story... this is how regular b2f was implemented before Curro
implmented the `MOV dst:F -src:D` method 9 years ago (see
3ee2daf23d).

Eliminating the type conversion in the arithmetic operation enables the
next commit.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>
2025-04-07 17:42:05 +00:00
Ian Romanick
3d23496fd9 brw/copy: Copy prop -X into Y&1
This commit prevents code quality regressions in the next
commit. Without this, some fragment shaders in Batman: Arkham Origins
have code like:

    shr(8)          g51<1>UW        g1.28<1,8,0>UB  0x76543210V
    ...
    and(8)          g52<1>UD        ~g51<8,8,1>UW   0x0001UW
    ...
    add(8)          g56<1>D         -g52<8,8,1>D    1D

transformed to

    shr(8)          g51<1>UW        g1.28<1,8,0>UB  0x76543210V
    ...
    and(8)          g52<1>UD        ~g51<8,8,1>UW   0x0001UW
    ...
    mov(8)          g56<1>D         -g52<8,8,1>D
    ...
    and(8)          g57<1>UD        ~g56<8,8,1>D    0x00000001UD

Propagating through the negation allows the added MOV to be deleted.

shader-db:

All Intel platforms had simlar results. (Lunar Lake shown)
total instructions in shared programs: 16968020 -> 16968019 (<.01%)
instructions in affected programs: 281 -> 280 (-0.36%)
helped: 1 / HURT: 0

total cycles in shared programs: 914598850 -> 914598832 (<.01%)
cycles in affected programs: 5398 -> 5380 (-0.33%)
helped: 1 / HURT: 0

A single Blender vertex shader was affected.

fossil-db:

Lunar Lake, Tiger Lake, Ice Lake, and Skylake had similar results. (Lunar Lake shown)
Totals:
Instrs: 209894650 -> 209894651 (+0.00%)
Cycle count: 30545958586 -> 30545952860 (-0.00%)

Totals from 2 (0.00% of 706657) affected shaders:
Instrs: 3582 -> 3583 (+0.03%)
Cycle count: 1875100 -> 1869374 (-0.31%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Subgroup size: 9906400 -> 9906416 (+0.00%)

Totals from 2 (0.00% of 805770) affected shaders:
Subgroup size: 16 -> 32 (+100.00%)

Two compute shaders in Hogwarts Legacy were affected.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>
2025-04-07 17:42:05 +00:00
Ian Romanick
e82464e6e0 brw/copy: Refactor source modifier type checking
This simplifies the next commit.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33931>
2025-04-07 17:42:05 +00:00
Ian Romanick
dee49f4206 brw/algebraic: Optimize derivative of convergent value
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This is mostly defensive. If a convergent value ever ended up as a
source of a DDX or DDY, the eu_emit code will ignore the stride. This
will result in bad code being generated.

No shader-db or fossil-db changes on any Intel platform.

v2: DDX and DDY will always be float, but brw_imm_for_type only works
with integer types.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested-by: Ken
Fixes: d5d7ae22ae ("brw/nir: Fix up handling of sources that might be convergent vectors")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>
2025-04-07 17:16:34 +00:00
Ian Romanick
5656682344 brw/nir: Eliminate default parameter to get_nir_src
The vast majority of the callers want channel = 0. During the
development process, using this default parameter value saved a lot of
pain in rebasing. However, it seems to be more trouble than it's worth.

Issue #12464 occurred because LNL was merged while this code was in
review. As a result, one caller of get_nir_src that wanted channel = -1
was not inspected closely, and it got the default channel = 0 instead.

To prevent this happening in the future (with possible branches still
yet to be merged, for example), remove the default parameter. This will
force the inspection of any callers that don't have an explicit channel
parameter. Hopefully that will prevent more problems.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>
2025-04-07 17:16:34 +00:00
Ian Romanick
38b58e286f brw/nir: Fix source handling of nir_intrinsic_load_barycentric_at_offset
The source of nir_intrinsic_load_barycentric_at_offset is a vector, so
-1 should be passed to get_nir_src. This is also done for texture
sampling intrinsics.

I skimmed the other user of get_nir_src, and I believe they are
correct. This one was just missed as LNL support landed an many, many
rebases of the original MR occurred.

v2: Fix another get_nir_src call. Suggested by Lionel.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: d5d7ae22ae ("brw/nir: Fix up handling of sources that might be convergent vectors")
Closes: #12464
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33007>
2025-04-07 17:16:34 +00:00
Caio Oliveira
6a55581d41 intel/executor: Fix check for open() failure
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Fixes: 71ae31dbd8 ("intel/executor: Allow selecting a device to use")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34400>
2025-04-06 19:43:51 -07:00