The next two commits modify the destination regioning in a way that,
which still correct, trigger assertion failures if we try to fix the
regioning here.
Broadcast gets lowered in brw_eu_emit. For the purposes of region
restrictions, let's assume that the final code emission will do the
right thing. Doing a bunch of shuffling here is only going to make a
mess of things.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>
tracing_ctx is always non-NULL in issue_fragment_jobs. Check
tracing_ctx->enabled instead. This fixes GPU faults when the desc
ringbuf wraps.
Fixes: bd49fa68b0 ("panvk/csf: Use event-based CS tracing")
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32469>
Caterina already implemented this in ed64fa034b ("panvk: never prefer or
require dedicated allocation for buffers") and dbdaefb6ed ("panvk: never
require dedicated allocation for images"), so let's flip the switch.
We pass 4505 of the CTS tests, and fail a single one. Let's mark that
one as an expected failure and move on for now.
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32466>
The spec mandates that if the program object isn't created from IL, it
should not touch the buffer. Passing an empty slice would achieve that,
but it's better to be explicit here.
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32268>
The old way was quite annoying as it required to create a Vec to even get
the size of the result causing needless computations.
This also meant that copying into the result buffer always required to go
through a byte Vec even though we could just do the copy directly.
The main idea here is that instead of returning the result, we simply call
into a write function giving us more flexibility here.
Potentially this will also allow us to add overloads for Iterators or to
even use closures in case the size calculation is cheaper than creating
the value just to get the size.
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32268>
Since the leaf node pass doesn't run for updates, leaf_node_count never
got set. This resulted in updates always running on 0 leaves (i.e. being
no-ops).
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32451>
Backward inter-shader code motion left dead output stores in the producer
in rare cases. Those dead stores would then make their way into drivers
and hw.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
If input_value, index, index1 or index2 is an input, here are examples of
code that this commit moves from consumers to producers:
* input_value * uniform_array[index]
* uniform_array[index]
* ubo[0].array[index]
* ubo[index].var
* ubo[index1].array[index2]
If the array index is computed from an input, it must be flat or convergent
within a primitive to be moved. If the array index is not an input, it must
be a uniform expression.
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment
has UBO indexing that is moved to the producer by this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
Uniform and UBO loads with non-constant indices are now propagated.
The majority of this code implements cloning deref chains.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
They are a maintainenance burden since they would need changes to
support more instruction types that nir_opt_varyings will be able to
move between shaders, and they are almost identical to
default_varying_estimate_instr_cost, so just use that.
The cost threshold is adjusted for AMD because
default_varying_estimate_instr_cost is slightly different.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
Holes due to indirectly-indexed inputs were ignored, making the compaction
worse when such inputs were present alongside convergent inputs.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
Without this, compaction can put inputs into vec4 slots already occupied
by indirectly-accessed inputs while ignoring their interpolation qualifier,
which is incorrect.
All input components sharing the same vec4 slot must use interpolation
qualifiers that are compatible with each other.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
Most of these conditions are repeated below with a continue statement.
This just puts break at the end where all of them are false.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
Even the most flexible interpolation that we have in NIR options
(nir_io_has_flexible_input_interpolation_except_flat) doesn't allow
mixing flat and non-flat in the same vec4. This (legacy) optimization
can't promote interpolated inputs to flat if it doesn't consider
the interpolation mode of the whole vec4 slot.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>