We need to store the driver location when we add it to the state
param list. Currently this code only works because
st_nir_assign_uniform_locations() later adds duplicate params but
this will be fixed in a following patch.
Unfortunatly we could not simply move nir_lower_clip to the state
tracker like we did in the reset of this MR as it is also called
directly from some drivers. Also to avoid making nir depend on the
gl_parameter defintions we simply loop over the results of the lowering
and fix up the locations.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37037>
This previously worked because the driver locations would later
be set when st_nir_assign_uniform_locations() was called for a
second time but we will be skipping the extra call in a later patch.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37037>
This is gl specific and a following fix will add more gl specific
params so here we move it to the st to avoid filling nir.h with
more junk.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37037>
This previously worked because the driver locations would later
be set when st_nir_assign_uniform_locations() was called for a
second time but we will be skipping the extra call in a later patch.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37037>
This is gl specific and a following fix will add more gl specific
params so here we move it to the st to avoid filling nir.h with
more junk.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37037>
We need to store the driver location when we add it to the state
param list. Currently this code only works because
st_nir_assign_uniform_locations() later adds duplicate params but
this will be fixed in a following patch.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37037>
This previously worked because the driver locations would later
be overwritten when st_nir_assign_uniform_locations() was called
for a second time but we will be skipping the extra call in a
later patch.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37037>
This will allow us to fix bugs with driver_location in following
patches while keeping code to a minimum.
For now we always set uniform packing to false, this will be
corrected as we go in following patches for now it doesn't matter
as the driver location is later overwritten.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37037>
Check if allocation is large enough to hold the
linear and gc contexts before probing for them.
Fixes: 7b5b164281 ("util: Add function print information about a ralloc tree")
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37017>
When disassembling and BRW IR is available (which happens in the
generator), there will be pointers to the BRW's basic block structures
that are used to print the block numbers and predecessor/successors
in the output.
There are two challenges:
- Because DO and FLOW instructions are not real instructions, they are
not emitted in the output but would still cause the output to contain
empty blocks. Previous code accounted for DO but still had problems.
- DO blocks have special physical links that don't make sense when the
DO is not emitted at the end, but they would be shown even if that
block was omitted.
These issues can be seen here (edited to remove non-essential bits)
```
START B0 (2 cycles)
mov(8) g126<1>UD 0x3f800000UD
END B0 ->B1
START B2 <-B1 <-B4 (0 cycles)
END B2 ->B3
START B3 <-B2 (260 cycles)
LABEL1:
mov(8) g1<1>D 0D
cmp.ge.f0.0(8) null<1>D g2<0,1,0>D 10D
sync nop(1) null<0,1,0>UB
send(1) g0UD g1UD nullUD
(+f0.0) break(8) JIP: LABEL0 UIP: LABEL0
END B3 ->B1 ->B5 ->B4
START B4 <-B3 (1000 cycles)
sync nop(1) null<0,1,0>UB
mov(8) g126<1>UD g0<0,1,0>UD
LABEL0:
while(8) JIP: LABEL1
END B4 ->B2
START B5 <-B1 <-B3 (20 cycles)
```
For example:
- Block 1 is missing (a skipped DO block)
- Block 2 is empty (it was a FLOW block)
- Block 3 ends with a link to Block 1 (the special links involving DO
blocks).
Two key changes were made to fix this. First, skip the DO and FLOW
blocks completely. The use_tail ensures that the instruction group is
reused to avoid empty blocks. Second, when printing, the successors and
predecessors, walk through the skipped blocks. And finally, don't print
the special blocks.
With the fix, here's the output. Note the blocks retain their original
BRW IR number.
```
START B0 (2 cycles)
mov(8) g127<1>UD 0x3f800000UD
END B0 ->B3
START B3 <-B0 <-B4 (260 cycles)
LABEL1:
mov(8) g1<1>D 0D
cmp.ge.f0.0(8) null<1>D g2<0,1,0>D 10D
sync nop(1) null<0,1,0>UB
send(1) g0UD g1UD nullUD
(+f0.0) break(8) JIP: LABEL0 UIP: LABEL0
END B3 ->B5 ->B4
START B4 <-B3 (1000 cycles)
sync nop(1) null<0,1,0>UB
mov(8) g127<1>UD g0<0,1,0>UD
LABEL0:
while(8) JIP: LABEL1
END B4 ->B3
START B5 <-B3 (20 cycles)
```
Issue was spotted by Ken.
Fixes: d2c39b1779 ("intel/brw: Always have a (non-DO) block after a DO in the CFG")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36226>
The sampled and color attachment bits don't actually affect image layout
in any meaningful way. They just cause us to create extra descriptors
in cases where we may not need them. However, now that meta always sets
view usage, we always create the usages meta needs, even if the client
doesn't request them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36957>
It alwways comes in through the create flags now.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36957>
This moves the bit into vk_image.h and handles it automatically in
vk_image_view_init() so drivers don't have to.
This also means that Meta is now hitting the driver_internal path for
all its images so we need to do the same format fixups there that we
sould normally do on the !driver_internal path. We don't want to do
them unconditionally because v3dv and other drivers override
depth/stencil color formats and we don't want to break that.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36957>
This is an optimization, but it also seems to be required because the HW
sometimes fails to set ViewIndex to 0. This fixes flakes with
dEQP-VK.renderpass2.fragment_density_map.*multiviewport where the VS for
the main renderpass is reused for the copy renderpass afterwards and it
copies ViewIndex to ViewportIndex expecting it to be 0 since multiview
is disabled for the copy renderpass.
Closes: #13534
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37206>
The shared memory settings in the QMD affect occupancy as the hardware
needs to manage the available shared memory across all workgroups.
We should set the target to the amount of shared memory used by all the
blocks that can run concurrently taking GPR usage and the local size into
account.
E.g. a shader using 88 gprs, 256 threads and a shared memory size of 18944
can have 2 blocks running concurrently, therefore on an Ampere we need to
set the target to 64kB to properly utilize the hardware.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37135>
It doesn't change the reported values, but it will allow us to easily
advertise real hardware limits in the future.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37135>
We can allocate more than 48k of shared memory, but the limits differ
across hardware, so we need to take it all into account to create the
shared memory splits the hardware can accept.
This does change behavior on Turing, but the assumption is, that the
hardware has simply rounded up. Might need performance testing on Turing
to verify nothing regresses here.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37135>
It's a bit of a disaster, but each generation supports a different set of
shared memory configurations.
Knowing the maximum is important for compute shader performance, knowing
all the legal sizes for QMD generation.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37135>
On hw we have up to 228k of available Shared memory so a 16 bit int isn't
enough for that.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37135>
Now that we're guaranteed the upper 32 bits are zero initialized,
there's no reason we need to do a 64-bit write here.
This is a 0.3% performance improvement on the Sascha Willems
conditionalrender demo with all rendering disabled (638 fps -> 640 fps)
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37187>
We can initialize this just once from the CPU side instead of
overwriting it each time using the copy engine.
This is a 5% performance improvement on the Sascha Willems
conditionalrender demo with all rendering disabled (607 fps -> 638 fps)
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37187>
Within a single command buffer, we know that our operations will happen
sequentially so we don't need to allocate a unique address per
vkCmdBeginConditionalRenderingEXT - we can re-use the same address
instead.
Improves perf on the Sascha Willems conditionalrender demo with all
rendering disabled by about 2% (595 fps -> 607 fps)
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37187>
This is a 41% performance improvement on the Sascha Willems
conditionalrender demo with all rendering disabled (422 fps -> 595 fps)
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37187>