As a result of 9baa33cef0 our backend compiler leaves params pretty
much untouched. So in order to avoid storing uninitialized values in
the shader cache blobs, just 0 out this array.
I've considered not even allocating this array which works on gen8+
but the vec4 backend still makes a copy of this array and so it
crashes on memcpy on HSW.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9baa33cef0 ("anv: Rework push constant handling")
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3516>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3516>
Store push_ranges in ascending order, and only "shift" them to the end
of the array during state packet emission.
We don't need this workaround with the new 3DSTATE_CONSTANT_ALL packet.
So instead of applying the workaround here just for GEN < 12 (which
requires and extra loop through all the ranges to figure out if we
should shift them or not), we simply move the whole logic to the state
emission code. At that point, in a later commit, we are already looping
through all of the ranges anyway to check which packet we will be using,
so we might as well implement the workaround there, where it is going to
be used.
v3: Move gen8+ workaround to the state emission code (Caio).
v4: Add explanation of why we moved the workaroudn (Caio).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Instead of blindly dirtying descriptors and push constants the moment we
see a pipeline change, check to see if it actually changes the bind
layout or push constant layout. This doubles the runtime performance of
one CPU-limited example running with the Dawn WebGPU implementation when
running on my laptop.
NOTE: This effectively reverts beca63c6c0. While it was a nice
optimization, it was based on prog_data and we can't do that anymore
once we start allowing the same binding table to be used with multiple
different pipelines.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This substantially reworks both the state setup side of push constant
handling and the pipeline compile side. The fundamental change here is
that we're no longer respecting the prog_data::param array and instead
are just instructing the back-end compiler to leave the array alone.
This makes the state setup side substantially simpler because we can now
just memcpy the whole block of push constants and don't have to
upload one DWORD at a time.
This also means that we can compute the full push constant layout
up-front and just trust the back-end compiler to not mess with it.
Maybe one day we'll decide that the back-end compiler can do useful
things there again but for now, this is functionally no different from
what we had before this commit and makes the NIR handling cleaner.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It turns off that emitting push constants is one of the hottest paths in
the driver and ANY work we do there costs us. By pre-computing things a
bit ahead of time, we shave 5% off the runtime of a CPU-limited example
running with the Dawn WebGPU implementation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>