Based the blorp_params, blorp_get_cs_local_y returns a recommended
local_y size for a compute shader.
blorp_set_cs_dims sets the compute program dims based on a given
local_y size.
Reworks:
* Add blorp_set_cs_dims (s-b Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
Reworks:
* Don't pack params, just memcpy param struct (s-b Jason)
* Old subject: "intel/blorp: Emit compute program if
params.cs_prog_data is set"
* Various cleanups of push-const size/alignment (s-b Jason)
* Fix subslice count by moving to devinfo (s-b Ken)
* Simplify cw.InterfaceDescriptor code (s-b Ken)
* Drop some comments from i965 (s-b Ken)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
Render will use blorp_setup_binding_table and blorp_emit_btp, but
compute will only use blorp_setup_binding_table.
Rework:
* Use blorp_setup_binding_table, blorp_emit_btp (s-b Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
The compute path will use blorp_emit_sampler_state, whereas the render
path will use blorp_emit_sampler_state_ps.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
Make INTEL_DEBUG=blorp dump the blorp compute shaders instead using
the general INTEL_DEBUG=cs which is now reserved for actual compute
programs.
Ref: 05933fb0f7 ("intel/compiler: Use INTEL_DEBUG=blorp to dump blorp shaders")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
When they re-arranged all the dataport stuff and added the LSC, doing
URB fencing through the dataport no longer makes sense. Instead, there
is now a fence message on the URB shared function.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13092>
Found this nugget while looking at the ACO driver. It seems sensible to
avoid SLM fences if there is no SLM. This also makes the check depend
on SLM usage rather than just shader stage which will be useful if we
ever implement task/mesh because task shaders also have SLM.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13092>
Start off making everything look like LSC where we have three types of
fences: TGM, UGM, and SLM. Then, emit the actual code in a generation-
aware way. There are three HW generation cases we care about:
XeHP+ (LSC), ICL-TGL, and IVB-SKL. Even though it looks like there's a
lot to deduplicate, it only increases the number of ubld.emit() calls
from 5 to 7 and entirely gets rid of the SFID juggling and other
weirdness we've introduced along the way to make those cases "general".
While we're here, also clean up the code for stalling after fences and
clearly document every case where we insert a stall.
There are only three known functional changes from this commit:
1. We now avoid the render cache fence on IVB if we don't need image
barriers.
2. On ICL+, we no longer unconditionally stall on barriers. We still
stall if we have more than one to help tie them together but
independent barriers are independent. Barrier instructions will
still operate in write-commit mode and still be scheduling barriers
but won't necessarily stall.
3. We now assert-fail for URB fences on LSC platforms. We'll be adding
in the new URB fence message for those platforms in a follow-on
commit.
It is a big enough refactor, however, that other minor changes may be
present.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13092>
This knowledge was repeated in multiple places so move the values to
intel_device_info struct.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13014>
Since only Anv uses the value, I'm only enabling this on anv.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 518693c3ec ("spirv: Handle the SubgroupSize execution mode")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13034>
The populate_base_prog_key will set VARYING depending if the pipeline
flag is used. Later, when full subgroups flag is set, it will flip to
UNIFORM -- which for compute shaders is effectively the same, so don't
bother setting it again.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12946>
The lowering passes will soon be moved to another function, so there
won't be any choice.
As a side benefit, this allows eliminating the uses_atomic_load_store
**pointer** parameter from brw_nir_lower_storage_image. For some reason
crocus was passing false instead of NULL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12858>