on top of scheduler changes, compile-time of shaders/blender/1017.shader_test:
Difference at 95.0% confidence
-0.00173202 +/- 0.00116931
-0.791537% +/- 0.532384%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
We generalize the sample_mask_in lowering to handle this too.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
Modify empty initializer list to use a zero initializer so we aren't
relying on the gnu extension.
Fixes: 690d9b0d00 ("util/u_trace: Rework resource management")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41715>
Some video decoders spit out AFBC(32x8,sparse) images. Advertise
support for this modifier so we can import such images.
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40886>
AFBC has a number of superblock sizes and valid layouts, with differing
combinations allowed.
It's quite clear that 16x16 is ambivalent about whether or not
block-split mode is used. 64x4 prohibits block-split mode, and 32x8
either requires or prohibits it depending on the format.
Add proper handling so we filter out the right combinations.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40886>
Reorder the AFBC modifier checking code to first query whether the
device can do the mode at all, then to query whether or not the format +
modifier is supported at all, then to query whether the specific image
usage is OK, then to query whether or not it's optimal.
This will come in useful later when we want to split modifier queries
into: can this modifier ever be used, what can this modifier be used
for, and is this the best modifier for this usage.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40886>
Zink never sets the fp16 screen cap, but the caps also are initialized
after zink_screen_init_compiler. So just replicate the check to be safe
here.
Fixes: 2146e09962 ("zink: keep ffma_weak and use GLSLstd450Fma for it")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41722>
Metal provides straightforward ways to copy an image to/from memory,
and image-to-image copies can be implemented by chaining them.
Note that host copy of combined depth-stencil is not supported, as
Metal does not allow CPU copy for these formats. Additionally, GPU
optimized contents are not allowed with host image copy usage; CTS
directly initializes the raw memory of optimized images to random
invalid data, which appears to decompress differently on GPU vs CPU
and fail.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41714>
`linear` controls whether the created image is in linear layout, and
`optimized_layout` controls only the `allowGPUOptimizedContents`
Metal property.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41714>
The job runs the following modules with ANGLE:
- CtsGraphicsTestCases
- CtsNativeHardwareTestCases
- CtsSkQPTestCases
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41440>
Blender uses atomic operations as part of its virtual shadow mapping
implementation. Virtual shadow mapping page tagging in compute shaders
benefits from divergent atomics fusion, while fragment shaders doing the
atomic raster step in general have worse performance with this
optimization turned on.
Thus, an option is added to only apply divergent atomics fusion to compute
shaders in ANV, and this option is enabled for Blender.
Initial support for divergent atomics fusion optimization in ANV was added
in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40631.
Signed-off-by: Christoph Neuhauser <christoph.neuhauser@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41706>
In shader-db, with `-p skl`, shaders/0ad/12.shader_test does not
compact an instruction because precompact overwrites portions of the
instruction. (Treating the three source instruction as a two source
when accessing instruction fields.)
This instruction could be compacted:
mad(8) g65<1>F g61<4,4,1>F g64<4,4,1>F -g17<4,4,1>F { align16 1Q };
But, since precompact erroneously sets bits, the instruction isn't
compacted.
Fossil testing:
* Tested with 0a3f3fd193 ("brw: drop unused color_outputs_valid
key") reverted, as fossils are currently producing inconsitent
results otherwise.
* Tested skl, icl, dg2, mtl, lnl, bmg and ptl. Only skl had a change.
SKL:
Totals:
CodeSize: 8335219296 -> 8320248992 (-0.18%)
Totals from 359508 (14.42% of 2492689) affected shaders:
CodeSize: 2838254352 -> 2823284048 (-0.53%)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41588>
Instead of doing the iadd manually we can use the uniform slot of the
ld/st/atom instruction getting rid of the iadd altogether.
Additionally for global memory we can also consume a 32 bit offset instead
of requiring it to be 64 bit.
Totals from 158539 (13.07% of 1212873) affected shaders:
CodeSize: 2308216336 -> 2242231136 (-2.86%); split: -2.86%, +0.00%
Number of GPRs: 8682436 -> 8662675 (-0.23%); split: -0.26%, +0.04%
SLM Size: 238816 -> 238604 (-0.09%)
Static cycle count: 2169063422 -> 2147747544 (-0.98%); split: -0.99%, +0.01%
Spills to memory: 25845 -> 25799 (-0.18%); split: -0.20%, +0.02%
Fills from memory: 25845 -> 25799 (-0.18%); split: -0.20%, +0.02%
Spills to reg: 45053 -> 45273 (+0.49%); split: -0.04%, +0.53%
Fills from reg: 36385 -> 36757 (+1.02%); split: -0.04%, +1.06%
Max warps/SM: 6027232 -> 6034616 (+0.12%); split: +0.12%, -0.00%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
This tries to handle all combinations we might run into to. We should rely
on previous optimizations that the more difficult cases never happend.
As a side benefit instead of lowering a UGPR to a GPR, it will now be
moved to the UGPR slot.
Totals from 258010 (21.27% of 1212873) affected shaders:
CodeSize: 3742700224 -> 3576740928 (-4.43%); split: -4.44%, +0.01%
Number of GPRs: 13606055 -> 13496463 (-0.81%); split: -0.86%, +0.05%
SLM Size: 589740 -> 589660 (-0.01%)
Static cycle count: 3271547493 -> 3272550831 (+0.03%); split: -0.47%, +0.50%
Spills to memory: 56180 -> 56136 (-0.08%)
Fills from memory: 56180 -> 56136 (-0.08%)
Spills to reg: 108211 -> 110013 (+1.67%); split: -0.63%, +2.30%
Fills from reg: 99216 -> 100471 (+1.26%); split: -0.30%, +1.56%
Max warps/SM: 9921228 -> 9972060 (+0.51%); split: +0.52%, -0.00%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
Adding the zero constants have a minor impact on stats due to some unlucky
interactions with nir_opt_cse, opt_instr_sched_prepass and assign_regs.
Totals from 61 (0.01% of 1212873) affected shaders:
CodeSize: 1044720 -> 1047472 (+0.26%); split: -0.00%, +0.27%
Static cycle count: 1198932 -> 1198490 (-0.04%); split: -0.07%, +0.04%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
Stop passing drmVersionPtr to backends and make sure all
manual version checks are transitioned to
pan_kmod_driver_version_at_least() to encourage new checks
to do the same.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41704>