This is done to be able to use ISL_AUX_USAGE_CCS_E with images.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4080>
Patch adds a new arg and modifies existing calls from i965, anv
pass NULL but iris stores this information for later use.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4080>
EXPECT_DEATH works by forking the process and letting the forked process
fail with an assertion. This process is evidently incredibly expensive,
taking ~30 seconds to run the whole isl_aux_info_test on a 2.8GHz
Skylake. Annoyingly all of the (expected) assertion failures also leaves
lots of messages in dmesg and potentially generates lots of coredumps.
Instead, avoid the expense of fork/exec by redefining assert() and
unreachable() in the code we're testing to return a unit-test-only
value. With this patch, the test takes ~1ms.
Also, while modifying the EXPECT_EQ() calls, reverse the arguments so
that the expected value comes first, as is intended. Otherwise gtest
failure messages don't make much sense.
Fixes: https://gitlab.freedesktop.org/mesa/mesa/issues/2567
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4174>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4174>
The batch associated with the compute pipeline only needs room for a
MEDIA_VFE_STATE. So this patch moves the batch_data to each pipeline
struct and cap the one in compute pipeline.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4040>
Add two new structs that use the anv_pipeline as base. Changed all
functions that work on a specific pipeline to use the corresponding
struct.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4040>
Explicitly pass the active stages and the array (and size) of shaders
to be processed. This will make easy to store only the shaders needed
for each pipeline.
The active stages can be identified by a non-NULL shader in the
shaders array, so stop using it and keep track of the flushed stages
as iteration happens.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4040>
Pass the `anv_shader_bin *` instead of expecting the helpers to peek
into the pipeline struct. Also reach for the device from the
cmd_buffer instead of the pipeline.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4040>
The caller has this information, so pass directly instead of making
each helper function call figure that one out. Also, since we can
reach the pipeline from pipe_state, drop that parameter from the
function.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4040>
This will be used to decouple the logic flush_descriptor_sets() from
the position in the shader array, allowing us to store just the
shaders needed for each pipeline.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4040>
Avoids waste for pipelines that don't use all the shaders, and is
flexible enough to cover cases where there are multiple variants per
shader (e.g. SIMD8/16/32 for fragment shader).
Even though we could pre-calculate the exact size of the array, this
is not a critical path so it is worth preventing the bug that will
likely happen when new variants are added but not accounted for.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4040>
This will avoid generating multiple identical fences in a row.
For Gen11+ we have multiple types of fences (affecting different
variable modes), but is still better to combine them in a single
scoped barrier so that the translation to backend IR have the option
of dispatching both fences in parallel.
This will clean up redundant barriers from various
dEQP-VK.memory_model.* tests.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3224>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3224>
In ISL, usage flags only carry intent and not semantic meaning. We
don't have a bulletproof way in ISL to specify that an image is of
depth/stencil type. The usage flags are great but blorp, for instance,
loves to disrespect them. One proposed solution to this problem is to
add explicit depth/stencil formats which are distinct from the
corresponding color formats.
Fortunately, however, empirical evidence suggests that this bit only
affects the sampler's interpretation of the CCS data. Therefore, we can
set the bit based off of the aux_usage which is now very specific and
does carry semantic meaning. In particular, aux_usage now makes a
distinction between color CCS and depth/stencil CCS which appears to be
exactly what the DepthStencilResource bit is for.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
Stencil CCS is slightly different from color CCS. Using a color CCS
resolve with stencil CCS doesn't do the right thing and you can't sample
from a stencil CCS image without the DepthStencilResource bit set or you
will get the wrong data. Stencil CCS also has it's own rules such as it
doesn't support fast-clear and has no partial resolve. This seems to
indicate that it should probably be its own isl_aux_usage. Now that
adding new isl_aux_usage values is pretty cheap, let's split stencil CCS
out on its own.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
We also delete the badly named isl_surf_supports_hiz_ccs_wt. The name
is misleading because it doesn't return whether or not the surface
supports HiZ+CCS in write-through mode (any single-sampled HiZ+CCS
capable surface does) but rather a heuristic decision about whether or
not we want to enable write-through mode based on the usage flags in the
isl_surf.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
Previously, we always set the aux_usage to ISL_AUX_USAGE_HIZ_CCS and let
ISL choose write-through based on isl_surf_supports_hiz_ccs_wt. This
commit makes us choose explicitly at surface creation time whether to
use HIZ_CCS or HIZ_CCS_WT based on the same set of conditions. This is
more explicit and should be more robust as it lets us choose WT mode in
one place rather than trusting isl_surf_supports_hiz_ccs_wt to return
the same thing every time.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
This is distinct from ISL_AUX_USAGE_HIZ_CCS in that the HiZ surface
operates in write-through mode which means that the HiZ surface is only
used for depth-testing acceleration and the CCS-compressed main surface
is always valid so we can texture from it.
Separating full HiZ from write-through mode at the isl_aux_usage level
has a couple of advantages:
1. It's more explicit. Instead of write-through mode depending on the
heuristic decision in isl_surf_supports_hiz_ccs_wt, it's now
something that's explicitly requested by the driver. This should be
more robust than hoping isl_surf_supports_hiz_ccs_wt always returns
the same thing every time. If someone (say BLORP) ever drops a
usage flag on the isl_surf, there's a chance it could return a
different value without us noticing leading to corruptions.
2. Because ISL_AUX_USAGE_HIZ_CCS_WT is it's own isl_aux_usage flag, we
can say inside the driver that HIZ_CCS does not support sampling but
HIZ_CCS_WT does. We can also pass HIZ_CCS_WT to isl_surf_fill_state
and it can do some validation for us beyond what we would be able to
do if we conflate HIZ_CCS_WT and CCS_E.
3. In the future, we can add new heuristics to the driver which do
things such as start all depth surfaces (regardless of usage flags)
off in HIZ_CCS and then do a full resolve and drop to HIZ_CCS_WT the
first time it gets used by the sampler. This would potentially let
us enable the faster HIZ_CCS mode even in cases where it technically
comes in through the API as a texture.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
The first check is redundant because the first thing we do in the "emit
the aux surface" section is assert that we actually have an aux_surf.
The second check involves an exclusion list of things which don't have
aux surfaces on Gen12 but an inclusion list is much simpler because it's
just "does it have MCS?".
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
Knowing following:
- CMP writes to flag register the result of
applying cmod to the `src0 - src1`.
After that it stores the same value to dst.
Other instructions first store their result to
dst, and then store cmod(dst) to the flag
register.
- inst is either CMP or MOV
- inst->dst is null
- inst->src[0] overlaps with scan_inst->dst
- inst->src[1] is zero
- scan_inst wrote to a flag register
There can be three possible paths:
- scan_inst is CMP:
Considering that src0 is either 0x0 (false),
or 0xffffffff (true), and src1 is 0x0:
- If inst's cmod is NZ, we can always remove
scan_inst: NZ is invariant for false and true. This
holds even if src0 is NaN: .nz is the only cmod,
that returns true for NaN.
- .g is invariant if src0 has a UD type
- .l is invariant if src0 has a D type
- scan_inst and inst have the same cmod:
If scan_inst is anything than CMP, it already
wrote the appropriate value to the flag register.
- else:
We can change cmod of scan_inst to that of inst,
and remove inst. It is valid as long as we make
sure that no instruction uses the flag register
between scan_inst and inst.
Nine new cmod_propagation unit tests:
- cmp_cmpnz
- cmp_cmpg
- plnnz_cmpnz
- plnnz_cmpz (*)
- plnnz_sel_cmpz
- cmp_cmpg_D
- cmp_cmpg_UD (*)
- cmp_cmpl_D (*)
- cmp_cmpl_UD
(*) this would fail without changes to brw_fs_cmod_propagation.
This fixes optimisation that used to be illegal (see issue #2154)
= Before =
0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
1: cmp.nz.f0.0(8) null:F, vgrf0:F, 0f
= After =
0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
Now it is optimised as such (note change of cmod in line 0):
= Before =
0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
1: cmp.nz.f0.0(8) null:F, vgrf0:F, 0f
= After =
0: linterp.nz.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
No shaderdb changes
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2154
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3348>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3348>
Avoid looping over all VARYING_SLOT_MAX urb_setup array
entries from genX_upload_sbe. Prepare an array indirection
to the active entries of urb_setup already in the compile
step. On upload only walk the active arrays.
v2: Use uint8_t to store the attribute numbers.
v3: Change loop to build up the array indirection.
v4: Rebase.
v5: Style fix.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/308>
The current workaround for this hardware bug involved marking the ADD
instruction used to initialize the address register as NoMask on
Gen12, which was based on the assumption that the problem was caused
by a hardware bug affecting the application of the execution mask to
the address register write.
However that doesn't seem to be the case: The address register write
was working correctly, the real problem leading to hangs on TGL is
that the indirect addressing logic is unable to deal with garbage
values in the address register (e.g. misaligned offsets), even for
channels which are currently inactive due to non-uniform control flow.
The current workaround isn't able to avoid that situation in general,
since the result of the NoMask ADD instruction for a dead channel is
calculated based on the corresponding (dead) component of the
indirect_byte_offset source, which would still be undefined in the
likely case that the source was initialized under control flow itself.
This would lead to hangs whenever MOV_INDIRECT was used under
non-uniform control flow in some scenarios like a tessellation shader
from GFXBench5/gl_4 (AKA Car Chase) on TGL. In addition I've managed
to reproduce the same issue on earlier platforms by initializing the
whole address register with garbage before the ADD instruction, so
this seems to be a long-standing issue we have avoided mostly by luck.
This patch fixes the problem and applies the workaround to all
platforms, since even when the hardware is able to deal with garbage
address values without hanging there might be a significant
performance cost from reading random GRF registers due to the useless
extra EU cycles spent fetching registers for dead channels and due to
the potential for unintended serialization with respect to other
random instructions that could be executed in parallel, which may have
had a cost of the order of hundreds of cycles in the worst case
scenario.
Fixes: f93dfb509c "intel/fs: Write the address register with NoMask for MOV_INDIRECT"
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Passing shader_stats to the fs_generator constructor means that the
SIMD8 shader stats from the visitor (such as the scheduler mode) will be
reported out for the SIMD16/SIMD32 versions as well.
As you can see, we are now passing 'shader_stats' and 'stats' to
generate_code(), which is obviously odd looking. Ian rebased and
committed an old patch of mine which added the shader_stats struct on
July 30 in commit dabb5d4bee (i965/fs: Add a shader_stats struct.) and
shortly after on August 12 Jason added the brw_compile_stats struct in
commit 134607760a (intel/compiler: Fill a compiler statistics struct).
I'd like to combine the two, but I'm not sure how. shader_stats is an
input to generate_code() while brw_compile_stats is an output and is
only used by the Vulkan driver. Leave it as is for now...
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>
While we're here, we add an assert that bind_map::push_ranges is tightly
packed. If it isn't, it breaks assumptions in the emit_push_constant*
functions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3777>
The compiler should be smart enough to figure out that it's unused on
Gen11 and earlier and delete the code which calculates. Us adding an
`if (GEN_GEN >= 12)` check is unnecessary and just dirties the code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3777>
The client may enable robustBufferAccess2 via either
pCreateInfo->pEnabledFeatures or via a chained-in
VkPhysicalDeviceFeatures2 struct. We need to parse both.
Fixes: 022e5c7e5a "anv: Implement VK_KHR_get_physical_device_properties2"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3777>