Each group of 16 lanes inside a SIMD32 shader will load different globals.
In SIMD8/16 shaders, the divergence analysis will turn this load into
nir_load_global_constant_uniform_block_intel.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36181>
If it wasn't for the workaround, it wouldn't be necessary to track the
whether instructions are exec_all or not. The workaround affects
results when mixing a dep and inst with different exec_all.
Add the predicate so that, when the workaround is disabled, none of
the effects of having different exec_all will kick in, all them will
be considered `exec_all = true`.
This patch don't change any behavior, just adds the predicate.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36659>
nr_params & params array are gone.
brw_ubo_range is not stored on the prog_data structure anymore (Anv
already stored a copy of that with its own additional information)
The backend now only deals with load_push_data_intel. load_uniform &
load_push_constant have to be lowered by the driver.
Pre Gfx12.5 platforms have to provide a subgroup_id_param to specify
where the subgroup_id value is located in the push constants.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
Anv already manages this itself. This allows removing the logic from
the compiler.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
Drivers can do all the lowering to push constants to find the only
value useful in that array (subgroup_id). Then drivers call into
brw_cs_fill_push_const_info() to get the cross/per thread constant
layout computed in the prog_data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
The way we build our ranges, the first empty one is the end of the
ranges.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
The current code walks the instructions, and when needed,
it will scan to find the next "end of scope" and sometimes
the next "end of block". It also has a separate patching
logic for HALTs.
The new code collects the necessary scope information up front,
then walks the instruction backwards, making avoiding the need
to scan for the end of scope. It will also walk only the
relevant instructions that were previously collected. It also
replaces the previous HALT-specific patching logic.
With this new change, many cases that were jumping to
intermediate HALTs, will now jump straight to the end of
scope (or the "end of the program" section). E.g. in
```
if
...
(...) HALT
...
(...) HALT
endif
```
both HALTs now will jump to the end of the scope, instead of the
first HALT jumping into the second one.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38914>
Most of instructions follow the basic formats (1, 2 and 3 src), so
consolidate their emission code in generator.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38878>
Move validation, noting that LRP only supports BRW_TYPE_F -- the
previous assert had DF because it also was used by MAD in the past.
With that change, ALU3F can be replaced by ALU3 for LRP.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38878>
When repctrl is used, the swizzle/chansel is ignored. Instead of setting
a swizzle that has all zeros and encode that, don't encode anything.
For context see e7598c5a62 ("intel/compiler: Set swizzle to BRW_SWIZZLE_XXXX
for scalar region").
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38878>
Replaying a dump file requires the VM state in order to feed the
replay tool with the necessary VMA properties that described the hang,
however, these properties are not necessarily useful once the replay
tool re-runs said traces, however, this patch makes this optional.
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34829>
The tool now can seamlessly support GPU hang dump files from
either the i915 or the Xe drivers.
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34829>
The changes as part of the Contexts state now include:
**** Contexts ****
[HWCTX].replay_offset: 0x0
[HWCTX].replay_length: 0xd000
and the changes as part of the VM state now include:
**** VM state ****
VM.uapi_flags: 0x1
[40000].length: 0x2000
[40000].properties: read_write|bo|mem_region=0x1|pat_index=2|cpu_caching=1
[40000].data: &-)\3!!E9mzzzzzzzzzz
In order to be able to replay a GPU hang from a devcore dump file
new properties have been added describing the offset and the length
of the affected hw context as well as a global VM flag and
several VMA property types: memory region, bo caching, pat index,
memory permission and memory type.
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34829>
Before bringing support for Xe let's create a lib so that
the common code can live there.
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34829>
initial refactoring of the i915 code in preparation
for Xe. No functional changes.
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34829>
That whole comment about Ivy Bridge is not relevant as ANV don't support IVB.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39175>
There is no side affects for this shadowing but better fix it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39175>
When this flag is set, it gives a hint to KMD to skip some operations around
compressed buffers, like copying the auxiliary buffer to smem during eviction.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38425>
There is no users for that function, is_volatile is only used in
brw_opt_cse.cpp is_expression() but it access the information using brw_send_inst
struct.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39104>
We don't need one bit per bitsize per instruction if only one actually
matters in the end.
First step towards moving NIR in the direction of full float_controls2
only.
Also rename this from fp_fast_math, because that name implied that 0 is
the no fast math mode, while the opposite was the case.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39026>
This patch adds compute_w_to_host_r barrier and also throw QueueWaitIdle
just to make sure all operations are completed before we fetch the BVH
data on the host.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39079>
The entire range of the allocated bo up to bo->size will be used, even if
alloc_size was way less, so to track the growth precisely for load factoring,
the allocated_batch_size should increase by bo->size.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38703>
Although a specific size is requested, the entire range of the returned bo up
to bo->size may end up being used by anv_batch_chain, spamming memcheck errors.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38703>
Decode logic in Gfx12+ has become complex with the new types, so Caio
suggested that we move to the table like other gens.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39007>
These type encodings were first were used in dpas instructions, but
continue to be used in more places.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39007>