There are a few structures located in the dynamic state heap that
blorp also emits. Instead of repacking them after a blorp operation,
just reemit the old dynamic state heap offset.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28368>
Of the dynamic states we have blorp reemit for each operations, a few
actually never change :
* BLEND_STATE (it looks like it does, but actually for anv no)
* COLOR_CALC_STATE
* CC_VIEWPORT
* SAMPLER_STATE
We add infrastructure here to upload into the driver and retrieve the
state offset later.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28368>
In geometry shaders, calling EmitVertex() makes the contents of all
output variables undefined. We need to rewrite our layer ID and view
index outputs before each EmitVertex() call; assuming they'll preserve
their values is undefined behavior.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>
Currently, build_view_index and build_instance_id emit load_instance_id
intrinsics, which want the instance ID coming into the program, which is
the true instance ID multiplied by the view count.
The loop also remaps any load_instance_id in the original program back
to the true instance ID, which is the one coming in divided by the view
count. Because we call build_view_index and build_instance_id as part
of the loop, and emit the new load_instance_id instructions earlier in
the shader, we successfully avoid seeing those.
However, this is a bit fragile as it means you can't call
build_view_index or build_instance_id prior to the loop without
accidentally remapping things we don't intend to. To fix this
fragility, we save off the original instance ID (including the view
count) and directly reference that.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>
We were asserting that entry->dst.offset % REG_SIZE == 0, which is
easily tripped by a simple LOAD_PAYLOAD that writes a 16-bit vec2:
load_payload(8) vgrf1:UW, vgrf2+0.0:UW, vgrf3+0.0:UW
We create separate ACP entries corresponding to the values coming from
vgrf2 and vgrf3, with entry->dst set to the location within vgrf1 where
those sources get written to. So the second entry will have offset 16,
which is not REG_SIZE aligned.
It looks like this assert was originally added back in 2014 (see commit
1728e74957) and adjusted through the ages,
including at a point when we combined reg and subreg offsets into a
single byte offset, and over time also extended copy propagation.
Here the destination offset is already accounted for via rel_offset,
at the byte offset level, so things ought to work and there is no need
to assert that this is the case. Ian had already noted that the assert
tripped in commit e3f502e007, but checking
for inst->opcode == MOV here doesn't really make sense - it's just the
case that he found that broke.
Remove the erroneous assertion.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>
opt_copy_propagation() can sometimes propagate FIXED_GRF sources into
SHADER_OPCODE_SENDs as the message payload. For example, GS input
reads, which simply take a URB handle and have the offset in the
descriptor. For non-VGRFs, there isn't a payload to split, so just
skip past such send messages.
Fixes: 589b03d02f ("intel/fs: Opportunistically split SEND message payloads")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>
If we did clear a query buffer in compute mode, the flushing needs to
match the engine used for clearing.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6823ffe70e ("anv: try to keep the pipeline in GPGPU mode when buffer transfer ops")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28285>
The right one is a few lines below.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 44bf552704 ("anv: allocate border colors for descriptor buffers")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28387>
If entry goes until the last byte of the bo it was being marked as
not valid while it is valid.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28376>
Now that these can come from different releases, with different sets of
patches backported to them, it matters that we use the correct one.
Fixes: 78ea3bb43d ("ci/deqp: use the proper gl/gles releases for deqp-gl*, deqp-gles*, deqp-egl")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28343>
According to 00-mesa-defaults.conf, the only game that seems to care
about fp64_workaround_enabled right now is Doom Eternal. After some
brief testing I couldn't spot any performance difference by setting
shaderFloat64 to true.
We want to set this to true so that DIRT 5 can work, as it looks at
shaderFloat64 and then refuses to launch today.
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9882
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28213>
This is achieved by the following steps:
#ifndef DEBUG => #if !MESA_DEBUG
defined(DEBUG) => MESA_DEBUG
#ifdef DEBUG => #if MESA_DEBUG
This is done by replace in vscode
excludes
docs,*.rs,addrlib,src/imgui,*.sh,src/intel/vulkan/grl/gpu
These are safe because those files should keep DEBUG macro is already excluded;
and not directly replace DEBUG, as we have some symbols around it.
Use debug or NDEBUG instead of DEBUG in comments when proper
This for reduce the usage of DEBUG,
so it's easier migrating to MESA_DEBUG
These are found when migrating DEBUG to MESA_DEBUG,
these are all comment update, so it's safe
Replace comment /* DEBUG */ and /* !DEBUG */ with proper /* MESA_DEBUG */ or /* !MESA_DEBUG */ manually
DEBUG || !NDEBUG -> MESA_DEBUG || !NDEBUG
!DEBUG && NDEBUG -> !(MESA_DEBUG || !NDEBUG)
Replace the DEBUG present in comment with proper new MESA_DEBUG manually
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28092>
The compiler only references `intel_device_info->subslice_masks` for
ray tracing workloads. Platforms which lack raytracing support can
share a cache even if they differ on this field.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28311>
The 64-bit type lowering for SEL in opt_algebraic had a pre-existing bug
where it only triggered when 64-bit float _and_ integer types were
unsupported. Meteorlake supports 64-bit float but not integer, so we
need to lower Q/UQ in that case still.
When I moved this to a later pass, opt_peephole_sel started generating
Q/UQ SEL instructions which were failing to be lowered.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10867
Fixes: ea423aba1b ("intel/brw: Split out 64-bit lowering from algebraic optimizations")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28328>
This is similar to what RADV implements using the NIR_LOOP_PASS
helpers. I have not used those helpers for a couple of reasons:
1. They use the pointer to the optimization function, which doesn't
work if the same function is called multiple times in one invocation
of the loop (fixable)
2. After fixing them, due to Intel's use of sub-expressions, the amount
of code added to wrap the shared macro becomes more than simply
reimplementing them for the Intel compiler
On most workloads the results are a wash, but on compile heavy
workloads like Cyberpunk 2077 and Rise of the Tomb Raider, I saw
fossil-db runtimes fall by 1-2% on my ICL, with no changes to the
compiled shaders. Caio saw closer to 2.5% on TGL.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27510>
The mlen tracking is in REG_SIZE units, but in Xe2 each GRF has
doubled the size. The optimization can only elide full GRFs, so
round down the amount of trailing zeros to ensure the optimization
will remove only full GRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28279>
Folks, there's more than one accumulator. In general, when the
register file is ARF, the upper 4 bits of the register number specify
which ARF, and the lower 4 bits specify which one of that ARF. This
can be further partitioned by the subregister number.
This is already mostly handled correctly for flags register, but lots
of places wanted to check the register number for equality with
BRW_ARF_ACCUMULATOR. If acc1 is ever specified, that won't work.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28281>
Make the filename more descriptive and since the file is used by
multiple drivers, move it into appropriate util/ directory.
Cosmetics:
- use SPDX license tag
- add newline before main function
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27804>
This will be useful to decode HW context in the next patch.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27888>
xe_topic can't be inside of the for loop otherwise it will be set to
TOPIC_INVALID at every iteration.
TOPIC_INVALID was added after it was reviewed by Lionel because CI
complained that xe_topic may be not initialized, turns out leaving it
not initialized was causing the xe_topic value to keep the value set
in the previous interation makeing the parser to work by luck.
Fixes: 90e38bbb3b ("intel/tools/error_decode: Parse Xe KMD error dump file")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27888>
In what is probably the most common case cross of compilation, x86_64
-> x86, it should be possible to build intel-clc for the host machine
and run it. Doing so simplifies the build by not needing to be able to
cross compile half of mesa, and should ease developer and distro strain
for building Intel drivers for x86.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28222>