Setting the NIR options takes care of iris thanks to the common st/mesa
linking code, and updating brw_nir_link_shaders should handle anv.
The main effort here is updating remap_tess_levels, which needs to
handle vector stores, writemasking, and swizzling. Unfortunately,
we also need to continue handling the existing single-component
access because it's used for TES inputs, which we don't vectorize.
We could try to vectorize TES inputs too, but they're all pushed
anyway, so it wouldn't buy us much other than deleting this code.
Also, we do have opt_combine_stores, but not one for loads.
One limitation of using nir_vectorize_tess_levels is that it works
on variables, and so isn't able to combine outer/inner writes that
happen to live in the same vec4 slot (for triangle domains). That
said, it's still better than before.
For writes, we allow the intrinsics to supply up to the full size
of the variable (vec4 for outer, vec2 for inner) even if the domain
only requires a subset of those components (i.e. triangles needs 3).
shader-db results on Icelake:
total instructions in shared programs: 19605070 -> 19602284 (-0.01%)
instructions in affected programs: 65338 -> 62552 (-4.26%)
helped: 271 / HURT: 0
helped stats (abs) min: 6 max: 24 x̄: 10.28 x̃: 12
helped stats (rel) min: 1.30% max: 18.18% x̄: 5.80% x̃: 7.59%
95% mean confidence interval for instructions value: -10.71 -9.85
95% mean confidence interval for instructions %-change: -6.17% -5.43%
Instructions are helped.
total cycles in shared programs: 851854659 -> 851820320 (<.01%)
cycles in affected programs: 618749 -> 584410 (-5.55%)
helped: 271 / HURT: 0
helped stats (abs) min: 69 max: 540 x̄: 126.71 x̃: 108
helped stats (rel) min: 2.57% max: 37.97% x̄: 6.17% x̃: 5.06%
95% mean confidence interval for cycles value: -135.89 -117.54
95% mean confidence interval for cycles %-change: -6.72% -5.63%
Cycles are helped.
total sends in shared programs: 1025285 -> 1024355 (-0.09%)
sends in affected programs: 6454 -> 5524 (-14.41%)
helped: 271 / HURT: 0
helped stats (abs) min: 2 max: 8 x̄: 3.43 x̃: 4
helped stats (rel) min: 5.71% max: 25.00% x̄: 14.98% x̃: 17.39%
95% mean confidence interval for sends value: -3.57 -3.29
95% mean confidence interval for sends %-change: -15.42% -14.54%
Sends are helped.
According to Felix DeGrood, this results in a 10% improvement in
the draw call time for certain draw calls from Strange Brigade.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
VS, TCS, TES, and GS threads must end with a URB write message with the
EOT (end of thread) bit set. For VS and TES, we shadow output variables
with temporaries and perform all stores at the end of the shader, giving
us an existing message to do the EOT.
In tessellation control shaders, we don't defer output stores until the
end of the thread like we do for vertex or evaluation shaders. We just
process store_output and store_per_vertex_output intrinsics where they
occur, which may be in control flow. So we can't guarantee that there's
a URB write being at the end of the shader.
Traditionally, we've just emitted a separate URB write to finish TCS
threads, doing a writemasked write to an single patch header DWord.
On Broadwell, we need to set a "TR DS Cache Disable" bit, so this is
a convenient spot to do so. But on other platforms, there's no such
field, and this write is purely wasteful.
Insetad of emitting a separate write, we can just look for an existing
URB write at the end of the program and tag that with EOT, if possible.
We already had code to do this for geometry shaders, so just lift it
into a helper function and reuse it.
No changes in shader-db.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
For now we use an empty set of L3 config settings on DG2 & MTL, which
will cause the L3 programming to set L3FullWayAllocationEnable.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18770>
After each 3DPRIMITIVE, we need to send a dummy post sync op if point or
line list was used or if had only 1 or 2 vertices per primitive.
v2: add missing _3DPRIM_POINTLIST_BF (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18746>
Not sure if fixes anything because it's always 16 at least, but this
is more correct.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
And run algebraic when either int64 for float64 are not supported so
those don't end up in the generated code.
Cc: mesa-stable
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
Initially I tried to store ray query state in the RT scratch space but
got the offset wrong. In the end putting this in the scratch surface
makes more sense, especially for non RT stages.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
It's very unfortunate that we have the RT scratch being conflated with
the usual scratch. In our implementation those are 2 different buffers.
The usual scratch access are done through the scratch surface state
(delivered through thread payload), while RT scratch (which outlives
thread dispatch with shader calls) is its own buffer.
So checking the NIR scratch size makes no sense as we can have normal
scratch accesses completely unrelated to RT scratch accesses.
This change switches the validation by looking at whether the scratch
base pointer intrinsic is being used (which is what we use/abuse to
implement RT scratch).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
Initially the level is world (top level), then it's whatever level the
potential hit is.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
We need the traversal stack to saved/restored along with mem hits.
Total spill/fill is 256bytes.
We can potentially optimize this but we have to be very careful about
what state the query is in.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
This function copies the potential hit from its memory location to the
committed hit location. A couple of fields got their bit offset wrong.
Fixes some CTS tests in dEQP-VK.ray_query.*
v2: Copy primitive/instance leaf pointers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0465714790 ("intel/nir/rt: add more helpers for ray queries")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
MTL has different CS prefetch sizes for each CS type.
So here replacing the cs_prefetch_size in intel_device_info struct
by a function that takes as argument the i915 engine class.
Fixes:
- func.cmd-buffer.small-secondaries.q0
- dEQP-VK.multiview.secondary_cmd_buffer.*
- Several other VK CTS tests that uses secondary_cmd_buffer
v2:
- renamed to intel_device_info_get_engine_prefetch() (Jordan)
v3:
- renamed to intel_device_info_calc_engine_prefetch()
- store each engine class prefetch in intel_device_info
BSpec: 45718
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18597>
Field is ignored on BDW+, 3DSTATE_VF_TOPOLOGY is used to set topology.
We still want to preserve topology information in state because
of other upcoming changes that require it.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18698>
Applications may use out-of-range values, driver is responsible for
clamping to implementation-dependent sample location coordinate
range.
Without clamp we hit assert when packing 3DSTATE_SAMPLE_PATTERN if
application attempts to use bigger value than 0.9375.
util_bitpack_ufixed: Assertion `min <= v && v <= max' failed.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18696>
This was only necessary for gen7 platforms that no longer support by
anv.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18601>
GPUs supported by this driver don't have I915_ENGINE_CLASS_COMPUTE,
so we can drop all this code.
v2:
- keeping anv_override_engine_counts()
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18601>
If the pipeline does not use libraries and the shaders are all found in
the cache, we end up with empty groups and crash at pipeline emit time.
Fixes a bunch of tests under
dEQP-VK.pipeline.monolithic.shader_module_identifier.\*.ray_tracing\*
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18582>
It's already called in brw_postprocess_nir and calling it the second time
actually breaks shading rate.
Initially, when I added this call here in 9acb30c8c4, I was testing it
on an internal tree, which didn't have brw_nir_lower_shading_rate_output call
in brw_postprocess_nir.
Fixes: 9acb30c8c4 ("intel/compiler: implement primitive shading rate for mesh")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18702>
Intel has different Z interpolation float point rounding
than other mesa gpus
For example gl_Position.z = 0.0 will be interpolated to
gl_FragCoord.z = 0.5 for all gpus
gl_FragCoord = -0.00000001 will be interpolated to
gl_FragCoord.z = 0.4999999702 for Intel
and rounded to gl_FragCoord.z = 0.5 for other gpus
Games with LEQUAL depth func will fail depth test on Intel
and will pass it on other gpus in such case
This workaround lowers translated depth range
and several gl_FragCoord.z coords with extra small difference
will be translated to the same UINT16\UINT24\UINT32
value of an integer depth buffer
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7199
Signed-off-by: Illia Polishchuk <illia.a.polishchuk@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18412>
anv_pipeline_get_last_vue_prog_data (used by emit_3dstate_primitive_replication)
doesn't work for mesh stage.
Fixes: ae57628dd5 ("anv: Drop anv_pipeline::use_primitive_replication")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18495>
This was already been done to gen7 platforms, so now extending to all
platforms without has_64bit_int.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18577>
During the lower_regioning() optimization, required_exec_type() is
returning BRW_REGISTER_TYPE_UQ type when processing
SHADER_OPCODE_SHUFFLE instructions of type BRW_REGISTER_TYPE_DF but
MTL has float64 support but lacks int64 support causing shader
compilation to fail.
To fix that we could make required_exec_type() return
BRW_REGISTER_TYPE_DF in such case but SHADER_OPCODE_SHUFFLE virtual
instruction runs in the integer pipeline(inferred_exec_pipe()).
So here replacing the has_64bit check by has_64bit_int, this will
properly handle older and newer cases making this function return
BRW_REGISTER_TYPE_UD.
Then lower_exec_type() will take care to generate 2 32bits operations
to accomplish the same.
While at it also dropping the 'devinfo->verx10 == 70' check as
GFX7_FEATURES fall into the same category as MTL, has float64 but no
int64 support.
Fixes at least this crucible tests:
func.uniform-subgroup.exclusive.fadd64.q0
func.uniform-subgroup.exclusive.fmin64.q0
func.uniform-subgroup.exclusive.fmax64.q0
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18577>
This workaround disables batch level preemption for Polygon,
Trifan and Lineloop primitive topologies.
v2: cleanups (José)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18456>
This can be used to disable batch preemption on DG2+ either
completely or with selected primitive topologies.
Commit adds bit explicitly for Polygon, Trifan and LineLoop
topologies for Wa_14015207028.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18456>