The key change is to use a builder to write the expected shader result
and compare that. To make this less error prone, a few helper functions
were added
- a way to allocate VGRFs from both shaders in parallel, that way the
same brw_reg can be used in both of them;
- assertions that a pass will make progress or not, and proper output
when the unexpected happens;
- use a common brw_shader_pass_test class so to collect some of the helpers;
- make some helpers work directly with builder.
The idea is to improve the signal in tests, so that the disasm comments
are not necessary anymore. For example
```
TEST_F(saturate_propagation_test, basic)
{
brw_reg dst1 = bld.vgrf(BRW_TYPE_F);
brw_reg src0 = bld.vgrf(BRW_TYPE_F);
brw_reg src1 = bld.vgrf(BRW_TYPE_F);
brw_reg dst0 = bld.ADD(src0, src1);
set_saturate(true, bld.MOV(dst1, dst0));
/* = Before =
*
* 0: add(16) dst0 src0 src1
* 1: mov.sat(16) dst1 dst0
*
* = After =
* 0: add.sat(16) dst0 src0 src1
* 1: mov(16) dst1 dst0
*/
brw_calculate_cfg(*v);
bblock_t *block0 = v->cfg->blocks[0];
EXPECT_EQ(0, block0->start_ip);
EXPECT_EQ(1, block0->end_ip);
EXPECT_TRUE(saturate_propagation(v));
EXPECT_EQ(0, block0->start_ip);
EXPECT_EQ(1, block0->end_ip);
EXPECT_EQ(BRW_OPCODE_ADD, instruction(block0, 0)->opcode);
EXPECT_TRUE(instruction(block0, 0)->saturate);
EXPECT_EQ(BRW_OPCODE_MOV, instruction(block0, 1)->opcode);
EXPECT_FALSE(instruction(block0, 1)->saturate);
}
```
becomes
```
TEST_F(saturate_propagation_test, basic)
{
brw_builder bld = make_shader(MESA_SHADER_FRAGMENT, 16);
brw_builder exp = make_shader(MESA_SHADER_FRAGMENT, 16);
brw_reg dst0 = vgrf(bld, exp, BRW_TYPE_F);
brw_reg dst1 = vgrf(bld, exp, BRW_TYPE_F);
brw_reg src0 = vgrf(bld, exp, BRW_TYPE_F);
brw_reg src1 = vgrf(bld, exp, BRW_TYPE_F);
bld.ADD(dst0, src0, src1);
bld.MOV(dst1, dst0)->saturate = true;
EXPECT_PROGRESS(brw_opt_saturate_propagation, bld);
exp.ADD(dst0, src0, src1)->saturate = true;
exp.MOV(dst1, dst0);
EXPECT_SHADERS_MATCH(bld, exp);
}
```
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33936>
The problem occurs with a series of instructions build the subgroup
invocation value :
mov(8) g23<1>UW 0x76543210V
add(8) g23.8<1>UW g23<8,8,1>UW 0x0008UW
add(16) g23.16<1>UW g23<16,16,1>UW 0x0010UW
Our register spilling code operates on physical registers (64B on
Xe2+) and using the brw_inst::is_partial_write() helper only considers
32B registers. So the spiller doesn't see that the add(16) instruction
is doing a partial write and ends up discarding the previous value.
You can reproduce the issue by running a test like :
INTEL_DEBUG=spill_fs ./deqp-vk -n dEQP-VK.compute.pipeline.cooperative_matrix.khr_a.subgroupscope.constant.uint8_uint8.buffer.rowmajor.linear
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: aa494cbacf ("brw: align spilling offsets to physical register sizes")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33642>
With all the RT-enabled driver setting this field, we can now have the
runtime use it instead of calling into the driver's vfunc.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34026>
The premise of the change was wrong: the case where the defs analysis
was required was rare and requiring the analysis inside just the
case we care was being used for another analysis too. So for now,
the change doesn't really helps. I'll revisit this whole pass later on.
This backs out commit 6e19215810.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34039>
Instead of calling `require()` every instruction, call it once per pass.
Even though the defs are cached (i.e. we are not re-calculating them every
instruction), this prevents the extra check and the call to analysis
validation.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34010>
- brw_lower_3src_null_dest: Allocating a new destination, so include
INSTRUCTION_DATA_FLOW class.
- brw_lower_alu_restriction: Removing instruction, so include
INSTRUCTION_IDENTITY. No details are changed so remove
INSTRUCTION_DETAIL.
- brw_lower_vgrfs_to_fixed_grfs: Changing source and destination
numbers, so include INSTRUCTION_DETAIL.
- brw_lower_send_gather: Insert new instructions (scalar register) and
change sources and other information on existing ones. So include
INSTRUCTION_DETAIL and INSTRUCTION_IDENTITY. Promote to INSTRUCTIONS.
- brw_opt_eliminate_find_live_channel: Can change source, so include
INSTRUCTION_DATA_FLOW.
- brw_opt_copy_propagation_defs and brw_opt_cse_defs: Both can remove
instructions, so include INSTRUCTION_IDENTITY. Promote to
INSTRUCTIONS.
- brw_opt_saturate_propagation: Instruction can have `sat` modified,
and operands can have type modified, so include INSTRUCTION_DETAIL.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33993>
This simplifies the propagation of the protection value, we just set
it on buffer->address at creation time and forget about it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33909>
In 4064b5546b, the idea was to have the minimum value as if all
stages are active, however hwconfig does not follow that for the
tessellation control stage. Ignore min values from hwconfig.
Fixes: 4064b5546b ("intel/dev: reduce warning noise from urb settings");
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33554>
This workaround is similar to anv_assume_full_subgroups, but it applies
to the shaders that use shared memory. If they rely on the implicit
synchronization, and we choose a smaller group size than the
(broken) shader expects, it will produce incorrect results.
Cc: mesa-stable
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23408>
This was only needed on Sandybridge. We can delete the brw code,
and replace the generic devinfo bit with a helper inside the elk
compiler itself.
Thanks to Iván Briano for noticing we still had dead brw code for this.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
We were not using the minimum values from devinfo for anything. For
tessellation control, the minimum value is 0, so we continue taking
MAX2 of that with 1 when tessellation is enabled so we have at least
something guaranteed to be present. For geometry, the minimum value
is already non-zero (and updated by the previous patch).
This will have the side-effect of raising the minimum number of URB
entries for geometry stages. This is currently not known to fix
anything, but should be more closely following the documentation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
We've been programming our minimum number of URB entries for geometry
shaders to 2, but it appears that we should have been setting 8 on
Broadwell and later. Additionally, there's a workaround on Skylake
and later that requires us to add flushing (which we haven't) or use
a minimum of 16 URB entries.
This alone will not fix anything, as nothing reads this devinfo field
presently (will be fixed in the next commit).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
As we added new platforms, the device info macros evolved over time
Most platforms had a "FEATURES" macro, some had a "HW_INFO" macro,
a few had macros for URB entries - some with min entries only, some
with min and max, some including the .urb = { ... } braces, others not.
Thread counts or subslice info was sometimes considered FEATURES,
sometimes HW_INFO, sometimes inserted only in the final structure.
FEATURES macros often inherited from an ancestor platform, but not
necessarily the prior platform - many were based on GFX8_FEATURES.
Many redundantly set the same feature bits as prior platforms.
This patch aims to clean up the situation, so it's a little more
organized, especially if you look at multiple generations. Macros
are now split into several separate pieces:
1. The FEATURES macro only has architectural features, such as LSC,
ray tracing support, 64-bit integers, flat CCS, and so on. Thread
counts, subslice info, and URB sizes that may vary by SKU are not
included here. This makes it easy for one platform to inherit the
features from the previous, while not pulling in that extra data.
2. THREAD_COUNTS macros contain maximum thread counts from the
3DSTATE_VS documentation and so on.
3. URB_MIN_MAX_ENTRIES macros contain the entire URB configuration,
including .urb = { ... }.
4. PAT_ENTRIES macros (on modern platforms) contains our choice of which
PAT entries to use for various types of resources.
5. CONFIG macros combine all of the above into a tidy bundle for use
in defining various structures, and may also include the platform
macro or simulator ID for convenience.
On recent platforms where hwconfig tables exist, items #2-3 could
potentially be dropped and filled in from there instead. For XEHP+
where we require hwconfig, we instead have a PLACEHOLDER_THREADS_AND_URB
macro that makes it clear that these values are updated from hwconfig.
One nice thing is that the bits that could (or do) come from hwconfig
tables are now cleanly separate from those that do not (i.e. platform
feature support, PAT entry selection, and so on).
This patch does not touch GFX7 or earlier macros. We could probably
offer a similar treatment there, but they're generally working and not
quite as complex.
To verify that this commit does not have unintentional changes, I
recommend running
objdump -s build/src/intel/dev/libintel_dev.a.p/intel_device_info.c.o
before and after this commit, and diffing the output. The devinfo
structures produced are identical.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
intel_device_info_init_common calculates this for Gfx9+ based on
max_threads_per_psd and slice information. Mark it as zero in the
structures to make clear that the value there isn't useful, and make
it easier to diff binaries for the next commit's refactors.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
The documentation for 3DSTATE_URB_HS has 0 as the minimum number of HS
URB entries for all platforms. See BSpecs 32162, 47137, 56271 for
Gfx6-11, Xe, and Xe2-3, respectively.
This should silence warnings about our device info field not matching
the hwconfig tables.
Notably, nothing in our drivers currently uses this value so it cannot
have a functional impact.
Fixes: 4064b5546b ("intel/dev: reduce warning noise from urb settings")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
This is only needed for original 965G/GM clipper code, which only exists
in the legacy compiler. Send it off to live with the elk.
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
This is used in exactly one place in crocus, which already has a comment
indicating that this code is needed for original Gfx4 hardware. Just
replace that with a verx10 == 40 check.
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
This is used by a single place in ISL only for sanity checking the
decisions it has already made. The knowledge is already all centralized
in ISL these days, so we don't need a device info bit.
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
This code is a lot of mess for no real benefit. It's existed since
the dawn of isl, and serves to let you optimize out a single check
in release builds for Ironlake and Sandybridge systems. All other
uses are for asserts, which already get compiled out in release mode.
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
This code, since the dawn of time, has had a redundant check for gen5-6
separate stencil in the final else clause:
} else if (doing separate stencil on gen5-6) {
return compact
} else {
if (doing separate stencil on gen5-6)
return compact
...
}
We can eliminate that one. The else clause then has a single if, so it
can be folded into the "else if" ladder alongside the others.
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
Lowering the indirect derefs multiple times leads to very inefficient
shaders because of all the control flow inserted.
In particular on some DGC tests with mesh shaders, the tests can spin
for 1hour on an i7 and still not complete compilation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33809>
There are some jobs that were missing the FARM variable, which is useful
to lava_job_submitter.py to classify how it should interact with each
LAVA server and how it should assemble the job definition.
Right now, we use a set of regexex with the RUNNER_TAG variable, but
that is error-prone.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33888>
For each label store its offset and two lists of uses (for JIP and UIP).
Because the parser itself already restricts what opcodes can use labels
(and which ones), don't re-validate them.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33522>
The "JIP:" and "UIP:" markers were being ignored, so was possible
to switch their order in the text but the parser would act as the
same. Just fix the order now and enforce it through the parsing.
Since we are here, remove the "Jump:" and "Pop:" that are not used
for Gfx9+ anymore.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33522>
All remaining uses of that constructor would also use at_end(),
and vice-versa. So just implement that behavior in the constructor
itself.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33815>
And use brw_builder(brw_shader *) and brw_builder() constructors
where possible.
The way tests are written, it is necessary to initialize an "empty"
builder -- which is later replaced by a proper one. Default parameter
NULL make that initialization implicit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33815>
Since brw_inst now has the block it belongs and the block can
reach the shader, the only necessary information to create a
builder is the brw_inst itself.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33815>