Changes in this patch include:
- Rename all files in src/intel/common path
- Update the filenames used in source and build files
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
The Android ones we put in anv_android.c. Maybe one day we'll want a
vk_android.h to put some common Android stuff but, for now, let's keep
it contained to ANV's android code.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
This commit switches ANV to using the new common physical device and
instance base structs as well as the new dispatch framework. This
should make code sharing between Vulkan drivers much easier.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>
Allowing the user to set custom sample locations, by filling the
extension structs and chaining them to the pipeline structs according
to the Vulkan specification section [26.5. Custom Sample Locations]
for the following structures:
'VkPipelineSampleLocationsStateCreateInfoEXT'
'VkSampleLocationsInfoEXT'
'VkSampleLocationEXT'
Once custom locations are used, the default locations are lost and need
to be re-emitted again in the next pipeline creation. For that, we emit
the 3DSTATE_SAMPLE_PATTERN at every pipeline creation.
v2: In v1, we used the custom anv_sample struct to store the location
and the distance from the pixel center because we would then use
this distance to sort the locations and send them in increasing
monotonical order to the GPU. That was because the Skylake PRM Vol.
2a "3DSTATE_SAMPLE_PATTERN" says that the samples must have
monotonically increasing distance from the pixel center to get the
correct centroid computation in the device. However, the Vulkan
spec seems to require that the samples occur in the order provided
through the API and this requirement is only for the standard
locations. As long as this only affects centroid calculations as
the docs say, we should be ok because OpenGL and Vulkan only
require that the centroid be some lit sample and that it's the same
for all samples in a pixel; they have no requirement that it be the
one closest to center. (Jason Ekstrand)
For that we made the following changes:
1- We removed the custom structs and functions from anv_private.h
and anv_sample_locations.h and anv_sample_locations.c (the last
two files were removed). (Jason Ekstrand)
2- We modified the macros used to take also the array as parameter
and we renamed them to start by GEN_. (Jason Ekstrand)
3- We don't sort the samples anymore. (Jason Ekstrand)
v3 (Jason Ekstrand):
Break the refactoring out into multiple commits
v4: Merge dynamic/non-dynamic changes into a single commit (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1887>
This commit adds a "locations" parameter to emit_multisample and
emit_sample_pattern which, if provided, will override the default
sample locations.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1887>
When calculating a URB configuration, we start with a notion of how
much space each stage /wants/ (to achieve the maximum amount of
concurrency), but sometimes fall back to giving it less than that,
because we don't have enough space. (Typically, this happens when
the per-stage size is large, or there are many stages, or both.)
We now output a "constrained" boolean which is true if we weren't
able to satisfy all the "wants" due to a lack of space.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8721>
Things work a little different on Gen7 than they do on Gen8+. In
particular, SOBufferEnable lives in 3DSTATE_STREAMOUT but BufferPitch
lives in 3DSTATE_SO_BUFFER. This leaves us having to marshal data
around a bit more than we did on Gen8. Still, it's not too bad.
Normally, I don't spend much time on Gen7 but XFB just became a hard
requirement for DXVK so it stopped working for all our Haswell users.
Let's get them happily playing their games again. 😸
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3532
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6997>
Emit 3DSTATE_CLIP during cmd_buffer_flush_state so that we can change
the max viewport count dynamically.
v2: use one common clip state as size is the same for all gens (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5604>
Leave default state values as zero so that when we OR them later
it is only the dynamic state value that matters.
v2: code cleanup + skip topology emit in base batch
when topology is dynamic (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5604>
v2:
* Set pPipeline to NULL in the corresponding
graphics/compute_create_pipeline function.
* Keep current ANV behavior of bailing on the first real error.
v3:
* Don't return early if the pipeline succeeded.
v:4(5?):
* Simplify return conditions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5136>
This cleans up pipline create/destroy a bit after the compute/gfx split.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5457>
It probably doesn't matter because that buffer should have a stride of
zero. However, it still seems like a good idea just to be safe.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5022>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>
We should keep this very minimal; I don't know that we need to go all
struct gl_context on it. However, this gives us at least a tiny base on
which we can start building some common functionality.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>
gen12 does away with the single patch dispatch mode for tcs, and
increases some limits so that 8_patch mode can always work. Make the
necessary changes so we don't try to fall back to single patch mode.
Fixes KHR-GL46.tessellation_shader.single.max_patch_vertices and others
Fixes: 44754279ac ("intel/fs/gen12: Use TCS 8_PATCH mode.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4843>
Move the calculation to helper functions -- similar to what GL already
needs to do.
This is a preparation for dropping this field since this value is
expected to be calculated by the drivers now for variable group size
case. And also the field would get in the way of brw_compile_cs
producing multiple SIMD variants (like FS).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
Identify if view_index is used only for position calculation, and use
Primitive Replication to implement Multiview in Gen12. This feature
allows storing per-view position information in a single execution of
the shader, treating position as an array.
The shader is transformed by adding a for-loop around it, that have an
iteration per active view (in the view_mask). Stores to the position
now store into the position array for the current index in the loop,
and load_view_index() will return the view index corresponding to the
current index in the loop.
The feature is controlled by setting the environment variable
ANV_PRIMITIVE_REPLICATION_MAX_VIEWS, which defaults to 2 if unset.
For pipelines with view counts larger than that, the regular
instancing will be used instead of Primitive Replication. To disable
it completely set the variable to 0.
v2: Don't assume position is set in vertex shader; remove only stores
for position; don't apply optimizations since other passes will
do; clone shader body without extract/reinsert; don't use
last_block (potentially stale). (Jason)
Fix view_index immediate to contain the view index, not its order.
Check for maximum number of views supported.
Add guard for gen12.
v3: Clone the entire shader function and change it before reinsert;
disable optimization when shader has memory writes. (Jason)
Use a single environment variable with _DEBUG on the name.
v4: Change to use new nir_deref_instr.
When removing stores, look for mode nir_var_shader_out instead
of the walking the list of outputs.
Ensure unused derefs are removed in the non-position part of the
shader.
Remove dead control flow when identifying if can use or not
primitive replication.
v5: Consider all the active shaders (including fragment) when deciding
that Primitive Replication can be used.
Change environment variable to ANV_PRIMITIVE_REPLICATION.
Squash the emission of 3DSTATE_PRIMITIVE_REPLICATION into this patch.
Disable Prim Rep in blorp_exec_3d.
v6: Use a loop around the shader, instead of manually unrolling, since
the regular unroll pass will kick in.
Document that we don't expect to see copy_deref or load_deref
involving the position variable.
Recover use_primitive_replication value when loading pipeline from
the cache.
Set VARYING_SLOT_LAYER to 0 in the shader. Earlier versions were
relying on ForceZeroRTAIndexEnable but that might not be
sufficient.
Disable Prim Rep in cmd_buffer_so_memcpy.
v7: Don't use Primitive Replication if position is not set, fallback
to instancing; change environment variable to be
ANV_PRIMITVE_REPLICATION_MAX_VIEWS and default it to 2 based on
experiments.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>
Lets specifiy maximum number of patches that will be accumulated before
a thread is dispatched.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3563>
The batch associated with the compute pipeline only needs room for a
MEDIA_VFE_STATE. So this patch moves the batch_data to each pipeline
struct and cap the one in compute pipeline.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4040>
Add two new structs that use the anv_pipeline as base. Changed all
functions that work on a specific pipeline to use the corresponding
struct.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4040>
Avoids waste for pipelines that don't use all the shaders, and is
flexible enough to cover cases where there are multiple variants per
shader (e.g. SIMD8/16/32 for fragment shader).
Even though we could pre-calculate the exact size of the array, this
is not a critical path so it is worth preventing the bug that will
likely happen when new variants are added but not accounted for.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4040>
Because we set the needs_data_cache bit from the NIR during compilation,
any time a shader was pulled out of the pipeline cache, we wouldn't set
the bit and the data cache was disabled. Fortunately, on Gen8+, this
bit is ignored because we always use the ALL section in the L3$ config
instead of separate DC and RO sections. On Gen7, however, this meant
that we were basically never running with the data cache enabled and our
compute performance was suffering massively because of it. This commit
improves Geekbench 5 scores on my Haswell GT3 by roughly 330% (no,
that's not a typo).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3912>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3912>
According to the BSpec, this should prevent hangs when using shaders
with large URB entries. A more precise fix can be done but it requires
re-arranging URB setup.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>