The Gfx 12.5 struct has only one major difference from Gfx 9: the OaCntr length.
On Gfx 9 it is 36 uint64_t long, while on Gfx 12.5 it is 38 uint64_t long.
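A minimal C sketch of the size difference described above. The struct names are hypothetical and only the OaCntr array is modeled; the real structs carry other fields, assumed here to be identical between the two generations.

```c
#include <assert.h> /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical stand-ins: only the OaCntr array is modeled, since it is
 * the one major difference between the two generations. */
struct oa_report_gfx9 {
   uint64_t OaCntr[36];
};

struct oa_report_gfx125 {
   uint64_t OaCntr[38];
};

/* Gfx 12.5 carries two extra 64-bit counters (16 more bytes). */
static_assert(sizeof(struct oa_report_gfx125) ==
              sizeof(struct oa_report_gfx9) + 2 * sizeof(uint64_t),
              "Gfx 12.5 OaCntr has two more entries than Gfx 9");
```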
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
We are missing handling for Gfx 12.5, so to add it we need a switch
statement over verx.
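A sketch of what such a switch could look like, assuming a verx10-style value (90 for Gfx 9, 125 for Gfx 12.5). The function name is hypothetical and it only returns the OaCntr length; the actual code presumably selects a whole struct layout per case.

```c
#include <assert.h>

/* Hypothetical helper: number of 64-bit OaCntr entries for a given
 * verx10 value, or 0 for generations this sketch does not handle. */
static unsigned
oa_cntr_len(int verx10)
{
   switch (verx10) {
   case 90:   /* Gfx 9 */
      return 36;
   case 125:  /* Gfx 12.5 */
      return 38;
   default:
      return 0;
   }
}
```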
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
Looking at the reference code, there is no new struct for Xe3, so it should
use the same struct as Xe2.
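Reusing the Xe2 struct for Xe3 maps naturally onto a case fall-through. A hypothetical sketch, assuming verx10 values of 200 for Xe2 and 300 for Xe3; the enum and function names are made up for illustration.

```c
#include <assert.h>

/* Hypothetical identifiers for the struct layout to use. */
enum oa_layout {
   OA_LAYOUT_UNKNOWN,
   OA_LAYOUT_XE2,
};

static enum oa_layout
oa_layout_for_verx10(int verx10)
{
   switch (verx10) {
   case 200: /* Xe2 */
   case 300: /* Xe3: no new struct, so reuse the Xe2 one */
      return OA_LAYOUT_XE2;
   default:
      return OA_LAYOUT_UNKNOWN;
   }
}
```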
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
With no more users of intel_perf_load_configuration(), it can be
removed along with the other i915 functions around it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
We make no use of the information returned by
intel_perf_load_configuration(). It is only used to add a copy of the
configuration so that we have the metric ID, but we could instead read
from sysfs the metric ID of the configuration that mdapi added.
The Xe KMD doesn't have a uAPI to query the metrics configuration, so
using sysfs also fixes the integration of mdapi with the Xe KMD.
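A rough sketch of reading a metric ID from a sysfs file instead of keeping a copy of the configuration. The helper name is hypothetical, and the exact sysfs path is assumed to be known by the caller; the sketch only parses a decimal ID from whatever file it is given.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reads a decimal metric ID from a sysfs-style
 * file. Returns false if the file can't be opened or parsed. */
static bool
read_metric_id(const char *path, uint64_t *id)
{
   FILE *f = fopen(path, "r");
   if (!f)
      return false;

   bool ok = fscanf(f, "%" SCNu64, id) == 1;
   fclose(f);
   return ok;
}
```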
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
Nothing uses register_config outside of
anv_AcquirePerformanceConfigurationINTEL(), so we don't need to store
it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
Not calling nir_opt_algebraic_late after nir_opt_algebraic is not only
questionable for code quality, it also breaks correctness for late lowerings.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39180>
We can have only one of those calls before the 'if GFX_VERx10 >= 125' block.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39362>
Fixes:
dEQP-VK.api.copy_and_blit.dedicated_allocation.resolve_image.whole_copy_before_resolving_transfer.2_bit
Otherwise we attempt to use blorp and hit various asserts later in:
- blorp_copy_supports_blitter
- blorp_xy_block_copy_blt
Fixes: 61287b00f3 ("anv: Stop using RCS companion for MSAA copy/clear on Xe3+")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39346>
Fixes a crash with:
dEQP-VK.api.external.semaphore.opaque_fd.signal_export_import_wait_temporary
when the driver calls genX(CmdSetEvent2) -> emit_apply_pipe_flushes
with NULL as emitted_flush_bits.
Fixes: 8834ef8bcd ("anv: use flushing PIPE_CONTROL for event signaling")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39343>
Allows a shader to have multiple ray queries without spilling them to a shadow
stack. Instead, the driver provides the shader with an array of multiple
RTDispatchGlobals structs to give each query its own dedicated stack.
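A simplified sketch of the idea, assuming each query's stack is carved out of one buffer at a fixed stride and the shader picks its globals entry by query index. The struct below is a hypothetical stand-in with only two fields; the real RTDispatchGlobals layout has many more, and the real stack sizing is not described here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for RTDispatchGlobals. */
struct rt_dispatch_globals {
   uint64_t stack_base;  /* base address of this query's stack */
   uint32_t stack_size;  /* bytes reserved for this stack */
};

/* Fill one globals entry per ray query, each pointing at its own
 * dedicated stack inside a single buffer. */
static void
fill_rt_globals(struct rt_dispatch_globals *globals, unsigned num_queries,
                uint64_t buffer_base, uint32_t stack_size)
{
   for (unsigned q = 0; q < num_queries; q++) {
      globals[q].stack_base = buffer_base + (uint64_t)q * stack_size;
      globals[q].stack_size = stack_size;
   }
}
```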
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38778>
Since the move to MEMORY_*_LOGICAL, the result value was being ignored, so
change the code to use it.
Since the conversion to using new registers, some issues were introduced:
- Even with `has_64bit_int`, ADD with a 64-bit immediate value is not supported;
- `dst_high` was not being filled if there was no overflow;
- Only `dst_low` was returned.
Found when writing some new code involving large block loads.
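A scalar C model of the corrected arithmetic, assuming the address is kept as two 32-bit halves because a 64-bit immediate ADD is unavailable. It is not the brw code: the function name is hypothetical, and it only demonstrates that the high half must always be written (and both halves returned) whether or not the low add carries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: adds a 64-bit immediate to an address held as
 * two 32-bit halves. Both halves are always written. */
static void
add64_split(uint32_t src_low, uint32_t src_high, uint64_t imm,
            uint32_t *dst_low, uint32_t *dst_high)
{
   uint32_t imm_low = (uint32_t)imm;
   uint32_t imm_high = (uint32_t)(imm >> 32);

   uint32_t lo = src_low + imm_low;        /* wraps modulo 2^32 */
   uint32_t carry = lo < src_low ? 1 : 0;  /* unsigned overflow check */

   *dst_low = lo;
   *dst_high = src_high + imm_high + carry; /* written even if carry == 0 */
}
```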
Fixes: b79e85a93f ("brw: always use new registers for load address increments")
Fixes: b55f77161d ("intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39282>
The previous approach does ensure that all entries are zeroed, but that
may not be clear to the reader (i.e., me). Using `{ 0 }` is clearer.
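For reference, a small example of the `{ 0 }` idiom (the struct is hypothetical; the previous approach is not shown in this commit message). In C, members and elements without an explicit initializer in a brace-enclosed initializer are zero-initialized, so every entry ends up zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry type. */
struct range {
   unsigned start;
   unsigned length;
};

/* Returns true if every entry of a `{ 0 }`-initialized array is zero. */
static bool
zero_init_is_all_zero(void)
{
   /* Only the first member gets an explicit 0; everything else is
    * zero-initialized by the language rules. */
   struct range ranges[4] = { 0 };

   for (size_t i = 0; i < 4; i++) {
      if (ranges[i].start != 0 || ranges[i].length != 0)
         return false;
   }
   return true;
}
```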
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39245>
This gives us the infrastructure that allows us to slowly migrate
pieces of blorp shaders from NIR to OpenCL, which, IMHO, are much
easier to read. We can't fully migrate everything due to all the
conditional building we do with these shaders, but I'm sure we'll find
opportunities to replace some NIR with OpenCL eventually.
The conversion of blorp_check_in_bounds() serves as the first example.
I also plan to have the shaders from the new indirect copy extension
be OpenCL shaders (mixed with some NIR as well), so having this patch
merged now will reduce the diff for the extension later.
Thanks to Alyssa Rosenzweig for her help here.
v2:
- Use SPDX (Alyssa).
- Use nir_trim_vector() (Alyssa).
- Adjust CL variable declaration (Alyssa).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39046>
We have two distinct code paths sharing blorp_params->wm_inputs for
different purposes: the code from blorp_blit.c and the code from
blorp_clear.c. While blorp_blit.c uses most of the parameters (all
except clear_color), blorp_clear.c only uses clear_color and
bounds_rect. Split the parameters into two structs: one for blits and
the other for clears.
This not only helps save some space in the shader inputs, but it also
organizes things so it's clearer which parameters are used by what.
In addition, my plan is to later add struct blorp_wm_inputs_indirect,
which won't share anything that the others use, and would otherwise
grow the struct even more.
This change would reduce the size of struct blorp_wm_inputs from 96 to
80 bytes, but we have to add padding due to the assertion that compares it
to cs_prog_data->push.cross_thread.size. Still good, though.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39046>
Because blorp_params_get_clear_kernel() calls
blorp_params_get_clear_kernel_cs(), which reads params->num_samples,
which we have not properly set yet at this point.
I am also planning to make the functions that create the shader
rely on params.op, which we have not set yet either.
I found this by inspection (while writing another patch). I'm not sure
whether this fixes anything relevant, but it may matter for ver >= 30
multisampled cases.
Fixes: de0c547448 ("blorp: Handle 2D MSAA array image copies on compute shader")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39046>
When I first looked at this struct, my tiny little brain felt
overwhelmed.
- Add some whitespace in order to group the parameters into
"logical" groups so it's easier to reason about everything.
- Change the parameter order just a little bit - without breaking the
logical groups - so the struct size decreases by 1.7% to 1864 bytes.
- Add a comment explaining what the void * pointers point to.
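A generic illustration of why reordering members can shrink a struct, using two hypothetical structs with identical members (not the actual struct from this patch). On a typical LP64 target the first is 24 bytes (padding after `a` and after `b`) and the second is 16 bytes; exact sizes depend on the ABI.

```c
#include <assert.h> /* static_assert (C11) */
#include <stdint.h>

/* Small members interleaved with a pointer: each one drags in padding. */
struct interleaved {
   uint8_t a;   /* 1 byte, then padding up to the pointer's alignment */
   void   *p;
   uint8_t b;   /* 1 byte, then tail padding */
};

/* Same members, with the small ones grouped after the pointer. */
struct grouped {
   void   *p;
   uint8_t a;
   uint8_t b;
};

static_assert(sizeof(struct grouped) <= sizeof(struct interleaved),
              "grouping small members never adds padding here");
```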
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39046>
If we ever add more entries, things won't explode.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39046>
I'm sorry, but I have OCD and the rest of the file is nicely aligned.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39046>
On Broadwell, with a debug build, you can't create even a single
VkImage:
createimage: ../../src/util/u_math.h:829: util_is_aligned: Assertion `(a != 0) && ((a & (a - 1)) == 0)' failed.
Thread 1 "createimage" received signal SIGABRT, Aborted.
Download failed: Invalid argument. Continuing without source file ./nptl/./nptl/pthread_kill.c.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
warning: 44 ./nptl/pthread_kill.c: No such file or directory
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007ffff789573f in __pthread_kill_internal (threadid=<optimized out>, signo=6) at ./nptl/pthread_kill.c:89
#2 0x00007ffff7840462 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007ffff78284ac in __GI_abort () at ./stdlib/abort.c:77
#4 0x00007ffff7828420 in __assert_fail_base (fmt=<optimized out>, assertion=<optimized out>, file=<optimized out>, line=829, function=<optimized out>) at ./assert/assert.c:118
#5 0x00007ffff5a5fb0c in util_is_aligned (n=0, a=0) at ../../src/util/u_math.h:829
#6 0x00007ffff5a6060d in memory_range_end (memory_range=...) at ../../src/intel/vulkan_hasvk/anv_image.c:51
#7 0x00007ffff5a61c52 in check_memory_range_s (p=0x7fffffffd800) at ../../src/intel/vulkan_hasvk/anv_image.c:779
#8 0x00007ffff5a61ef3 in check_memory_bindings (device=0x555555654d50, image=0x55555566e050) at ../../src/intel/vulkan_hasvk/anv_image.c:830
#9 0x00007ffff5a62ea3 in anv_image_init (device=0x555555654d50, image=0x55555566e050, create_info=0x7fffffffd9d0) at ../../src/intel/vulkan_hasvk/anv_image.c:1263
#10 0x00007ffff5a63147 in anv_image_init_from_create_info (device=0x555555654d50, image=0x55555566e050, pCreateInfo=0x7fffffffda80) at ../../src/intel/vulkan_hasvk/anv_image.c:1333
#11 0x00007ffff5a63211 in anv_CreateImage (_device=0x555555654d50, pCreateInfo=0x7fffffffda80, pAllocator=0x0, pImage=0x7fffffffdd20) at ../../src/intel/vulkan_hasvk/anv_image.c:1356
#12 0x00007ffff44ff376 in vvl::dispatch::Device::CreateImage (this=0x55555562c480, device=0x555555654d50, pCreateInfo=0x7fffffffdcb8, pAllocator=0x0, pImage=0x7fffffffdd20)
at ./layers/vulkan/generated/dispatch_object.cpp:1160
#13 0x00007ffff43e8214 in vulkan_layer_chassis::CreateImage (device=0x555555654d50, pCreateInfo=0x7fffffffdcb8, pAllocator=0x0, pImage=0x7fffffffdd20) at ./layers/vulkan/generated/chassis.cpp:2181
#14 0x0000555555560af4 in vks::Image::init (this=0x7fffffffdcb0) at /home/przanoni/git/random-stuff/vk/vks/libvulkanscript.hpp:1298
#15 0x000055555556557d in main () at createimage.cpp:36
Since we haven't noticed this issue as quickly as I imagined we would,
let's opt for what's mostly a revert of the behavior change in the
original commit.
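The backtrace shows util_is_aligned() being called with `a=0`, which its power-of-two assertion rejects. One way to tolerate that is a check that treats an alignment of 0 as "no constraint". This is a hypothetical helper for illustration, not necessarily what the fix does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a power-of-two alignment check that accepts an
 * alignment of 0 (e.g. an unused, empty memory range) instead of
 * tripping an assertion. */
static bool
is_aligned_or_unconstrained(uint64_t n, uint64_t a)
{
   if (a == 0)
      return true;

   /* Non-zero alignments must still be powers of two. */
   assert((a & (a - 1)) == 0);
   return (n & (a - 1)) == 0;
}
```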
Fixes: 7be63ef956 ("intel: do not NIH util_is_aligned")
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39045>
Each group of 16 lanes inside a SIMD32 shader will load different globals.
In SIMD8/16 shaders, the divergence analysis will turn this load into
nir_load_global_constant_uniform_block_intel.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36181>
If it weren't for the workaround, it wouldn't be necessary to track
whether instructions are exec_all or not. The workaround affects the
results when mixing a dep and an inst with different exec_all values.
Add the predicate so that, when the workaround is disabled, none of
the effects of having different exec_all values kick in: all
instructions are considered `exec_all = true`.
This patch doesn't change any behavior; it just adds the predicate.
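A minimal sketch of the predicate's effect, with a hypothetical one-field instruction struct and made-up function names: when the workaround is disabled, every instruction reports exec_all = true, so a dep and an inst can never disagree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical minimal instruction record. */
struct inst {
   bool exec_all;
};

/* With the workaround disabled, everything is treated as exec_all. */
static bool
effective_exec_all(const struct inst *inst, bool wa_enabled)
{
   return wa_enabled ? inst->exec_all : true;
}

/* Only a mismatch in the effective values can affect the results. */
static bool
exec_all_mismatch(const struct inst *dep, const struct inst *inst,
                  bool wa_enabled)
{
   return effective_exec_all(dep, wa_enabled) !=
          effective_exec_all(inst, wa_enabled);
}
```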
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36659>
nr_params and the params array are gone.
brw_ubo_range is no longer stored in the prog_data structure (Anv
already stores its own copy with additional information).
The backend now only deals with load_push_data_intel. load_uniform and
load_push_constant have to be lowered by the driver.
Pre-Gfx12.5 platforms have to provide a subgroup_id_param to specify
where the subgroup_id value is located in the push constants.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
Anv already manages this itself. This allows removing the logic from
the compiler.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
Drivers can do all the lowering to push constants to find the only
useful value in that array (subgroup_id). Drivers then call into
brw_cs_fill_push_const_info() to get the cross/per-thread constant
layout computed in the prog_data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
Given the way we build our ranges, the first empty one marks the end of
the ranges.
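A sketch of that termination rule, assuming a hypothetical range struct in which a length of 0 means empty. Since ranges are filled from the front, counting can stop at the first empty entry.

```c
#include <assert.h>

/* Hypothetical range record: length 0 means "empty". */
struct push_range {
   unsigned start;
   unsigned length;
};

/* Ranges are filled from the front, so the first empty entry marks the
 * end of the used ranges; stop counting there. */
static unsigned
count_used_ranges(const struct push_range *ranges, unsigned max_ranges)
{
   unsigned n = 0;
   while (n < max_ranges && ranges[n].length != 0)
      n++;
   return n;
}
```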
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
The current code walks the instructions and, when needed,
scans ahead to find the next "end of scope" and sometimes
the next "end of block". It also has separate patching
logic for HALTs.
The new code collects the necessary scope information up front,
then walks the instructions backwards, avoiding the need
to scan for the end of scope. It also walks only the
relevant instructions that were previously collected, and it
replaces the previous HALT-specific patching logic.
With this change, many cases that used to jump to
intermediate HALTs now jump straight to the end of the
scope (or to the "end of the program" section). E.g. in
```
if
...
(...) HALT
...
(...) HALT
endif
```
both HALTs now jump to the end of the scope, instead of the
first HALT jumping to the second one.
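A heavily simplified sketch of a single backward walk that points each HALT at the end of its innermost enclosing scope. Assumptions: only IF/ENDIF scopes are modeled, the opcode enum, MAX_DEPTH and the function are hypothetical, and unlike the new code it visits every instruction rather than a pre-collected list. Index `n` stands for the end of the program.

```c
#include <assert.h>

enum op { OP_OTHER, OP_IF, OP_ENDIF, OP_HALT };

#define MAX_DEPTH 32

/* Walks the program backwards once, keeping a stack of enclosing scope
 * ends, and sets targets[i] for each HALT to its innermost scope end
 * (or to n at top level). Non-HALT entries of targets are untouched. */
static void
resolve_halt_targets(const enum op *ops, int n, int *targets)
{
   int scope_end[MAX_DEPTH];
   int depth = 0;

   for (int i = n - 1; i >= 0; i--) {
      switch (ops[i]) {
      case OP_ENDIF:
         assert(depth < MAX_DEPTH);
         scope_end[depth++] = i;   /* entering a scope, walking backwards */
         break;
      case OP_IF:
         assert(depth > 0);
         depth--;                  /* leaving that scope */
         break;
      case OP_HALT:
         targets[i] = depth > 0 ? scope_end[depth - 1] : n;
         break;
      default:
         break;
      }
   }
}
```

With `ops = { IF, OTHER, HALT, OTHER, HALT, ENDIF, HALT }`, both HALTs inside the IF resolve to the ENDIF at index 5, as in the example above, and the top-level HALT resolves to 7 (end of program).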
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38914>