Use it instead of hard-coding f0.1 for the sample mask of programs
that use discard. This will make the task easier when we replace f0.1
with another flag register location in order to support discard with
SIMD32 shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's only really useful there. This will avoid confusion with another
helper with a similar purpose I'm about to introduce that will be
useful in multiple files from the FS back-end.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The SIMD8 dual-source blending framebuffer write messages seem to have
trouble releasing the pixel scoreboard dependency in SIMD32 dispatch
mode, which leads to hangs. I have a better workaround for this which
doesn't involve disabling SIMD32 when dual-source blending is enabled,
but I'm still investigating some issues with it. Limit the dispatch
width to SIMD16 in such cases for the moment in order to make the CI
happy on ICL with SIMD32 fragment shaders enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently the "Source0 Alpha Present to RenderTarget" bit of the RT
write message header is derived from brw_wm_prog_data::replicate_alpha.
However the src0_alpha payload is provided anytime it's specified to
the logical message. This could theoretically lead to an
inconsistency if somebody provided a src0_alpha value while
brw_wm_prog_data::replicate_alpha was false, as I'm planning to do in
a future commit in order to implement a hardware workaround.
Instead calculate the header bit based on whether a src0_alpha value
was provided to the logical message, which guarantees the same
behavior on pre-ICL and ICL+ (the latter used an extended descriptor
bit for this which didn't suffer from the same issue). Remove the
brw_wm_prog_data::replicate_alpha flag.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Together with the fixup_nomask_control_flow() pass introduced in a
previous patch, this implements a less invasive alternative to the
workaround documented in the hardware spec for GEN:BUG:1407528679,
which doesn't involve disabling structured control flow.
Under some conditions Gen12 hardware can end up executing a BB with
all channels disabled, which will lead to the execution of any NoMask
instructions in it, even though any execution-masked instructions will
be correctly shot down. This could break assumptions of the SWSB pass
if the data computed by a NoMask instruction is synchronized against
by using an SWSB annotation baked into a regular execution-masked
instruction, since the first (NoMask) instruction may be executed
redundantly by the hardware, even though the second will correctly be
shot down, potentially leading to a RaW or WaW hazard if a third
instruction subsequently accesses the destination register of the
first instruction.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
Found by inspection. Existing code was trying to avoid assuming that
an SBID had been assigned to the virtual instruction, but
synchronizing the header setup with respect to the previous SIMD16
SEND by using SYNC.ALLRD doesn't really seem possible unless the SEND
instruction had been assigned an SBID. Assert-fail instead if no SBID
has been allocated.
Fixes: 15e3a0d9d2 "intel/eu/gen12: Set SWSB annotations in hand-crafted assembly."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
This is a less invasive alternative to the workaround documented in
the hardware spec for GEN:BUG:1407528679, which doesn't involve
disabling structured control flow (it's unlikely that switching to
GOTO/JOIN would have actually fixed the problem anyway).
Under some conditions Gen12 hardware can end up executing a BB with
all channels disabled, which will lead to the execution of any NoMask
instructions in it, even though any execution-masked instructions will
be correctly shot down. This may break assumptions of some NoMask
SEND messages whose descriptor depends on data generated by live
invocations of the shader.
This avoids the problem by predicating certain instructions on an ANY
horizontal predicate that makes sure that their execution is omitted
when all channels of the program are disabled. The shader-db impact
of this patch seems to be minimal:
total instructions in shared programs: 17169833 -> 17169913 (0.00%)
instructions in affected programs: 30663 -> 30743 (0.26%)
helped: 0
HURT: 42
total cycles in shared programs: 336966176 -> 336968568 (0.00%)
cycles in affected programs: 2367290 -> 2369682 (0.10%)
helped: 0
HURT: 13
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
We need to pass a width of 32 since the opcode bashes the whole f1.0
register on IVB. This is unlikely to have caused problems since f1.0
is largely unused currently. That's likely to change soon though,
even on platforms other than Gen7.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
Found by inspection. This seems particularly likely to cause problems
with instructions dependent on the current execution mask like
SHADER_OPCODE_FIND_LIVE_CHANNEL or the FS_OPCODE_LOAD_LIVE_CHANNELS
instruction I'm about to introduce, but one could imagine it leading
to data corruption if CSE ever managed to combine two instructions
before and after the FS_OPCODE_PLACEHOLDER_HALT, since the one before
may not be executed for some channels.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
"ringbuffer" is now called only "ring" in the error state.
v2: Keep compatible with old error state (Lionel).
v3: Also update "gtt_offset" -> "batch".
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1206
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gen12 added CCS_E support for A8_UNORM. Intercept A8_UNORM format and
switch to R8_UNORM, as both share the same aux map format encoding so
they are compatible.
Fixes Piglit's ext_framebuffer_multisample-formats all_samples, which
was hitting an assert about A8_UNORM and R8_UINT not being CCS_E
compatible formats.
v2: Add gen check (Kenneth Graunke)
v3: Intercept A8_UNORM and set format to R8_UNORM (Jason Ekstrand)
v4:
- Remove gen check and move block little bit down (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3719>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3719>
6790397346 added code which attempts to reject modifiers on
depth/stencil formats but it was placed after the early return for depth
and stencil aspects. This commit moves it up so it actually works.
Of course, this doesn't actually matter because the only user of any of
the modifiers stuff is the WSI code and it will never do anything with
depth/stencil.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3794>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3794>
This param is only available starting on kernel 4.1. Use a default
value of 0 if it is not found instead.
v2: Update commit message (Lionel)
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Mark Janes <mark.a.janes@intel.com>
Fixes: 96e1c945f2 ("i965: Move device info initialization to common
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3727>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3727>
This was adding "#define reserved 2" to genxml includes, which is a
fairly mean lowercase word to redefine. It ends up breaking the build
on Android, which has __u32 reserved fields in headers.
Defining it also has no purpose. Just drop it.
Fixes: 5bea0cf779 ("intel/isl: Move iris's pipe-to-isl format function to isl.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3729>
This means you can directly use format utils on it without having to have
your own GL enum to number-of-components switch statement (or whatever) in
your vulkan backend.
Thanks to imirkin for fixing up the nouveau driver (and a couple of core
details).
This fixes the computed qualifiers for EXT_shader_image_load_store's
non-integer sizeNxM qualifiers, which we don't have tests for.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3d)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
This will get reused in the shader compiler once we switch it over to pipe
formats instead of GL enums. We can't easily deduplicate i965's
mesa-to-isl mapping because of cases like A32_FLOAT that are mapped
differently.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
We've been missing this workaround for a while and since it's required
for Gen12, let's implement it for Gen9 first.
v2: Update comment for Gen9.
v3: Fix clearing of bits... (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3405>
This patch fixes the way_size_per_bank for Gen12+
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge<sagar.ghuge@intel.com>
This patch is required to fix 11K+ vulkan CTS failures we were
getting with way_size_per_bank of 4 (see next patch).
Thanks to Sagar Ghuge and Jordan Justen for all the hard work of
debugging and testing.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge<sagar.ghuge@intel.com>
The old code would only break at stride boundaries if the stride was
less than 32B; otherwise it would just break every 32B. This commit
makes it break at stride boundaries and 32B boundaries (starting from
the last stride). This makes reading large vertex buffers in aubinator
much nicer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3642>
Otherwise, the validator tries to read the type of src1 of a SEND/SENDS
which doesn't actually have a type field. This prevents validation
issues in the next commit.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3642>
Previously, i965/iris tried to reuse the currently programmed URB config
if it was good enough for BLORP, rather than reprogramming it each time.
However, this will make some things harder on Gen12+ and we've not seen
any performance impact from emitting URB more frequently in ANV.
This makes the blorp <-> driver interface a bit simpler on Gen7+ because
now all the driver has to do is to provide the L3$ config rather than
trying to hand off URB re-config to blorp.
Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
SML is no longer in the L3$ on Gen11+. It's not incredibly clear from
the docs but no Gen11 platforms are in the list of platforms on which
this bit exists. Also, we've been always setting it false on Gen11 in
ANV and i965 thanks to GEN_L3P_SLM being zero with no ill effects.
Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
According to the BSpec, this should prevent hangs when using shaders
with large URB entries. A more precise fix can be done but it requires
re-arranging URB setup.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
Because softpin block pools are made up of a set of BOs with different
maps, it was possible for a single state to end up straddling blocks.
To fix this, we pass a contiguous size to anv_block_pool_grow and it
ensures that the next allocation in the pool will have at least that
size.
We also add an assert in anv_block_pool_map to ensure we always get
contiguous maps. Prior to the changes to anv_block_pool_grow, the unit
tests failed with this assert. With this patch, the tests pass.
This was causing problems on Gen12 where we allocate the pages for the
AUX table from the dynamic state pool. The first chunk, which gets
allocated very early in the pool's history, is 1MB which was enough that
it was getting multiple BOs. This caused the gen_aux_map code to write
outside of the map and overwrite the instruction state pool buffer which
lead to GPU hangs.
Fixes: 731c4adcf9 "anv/allocator: Add support for non-userptr"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>