We don't want to use it on gen5 and earlier because only RNDD can be
done with a single instruction and we can implement RNDU(x) as -RNDD(-x)
so it's better to just do that when we have the instruction. On gen6
and above, we may as well just use the right instruction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>
Some day we probably want to move it out of brw_shader if we're going to
share it with IBC but that can be another day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>
Reworks:
* Use device rather than physical_device for info. (Lionel)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5572>
Future (discrete) platforms won't have support for get/set tiling. This
function allows our drivers to query for that, by simply trying to get
the tiling from a dummy buffer.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4956>
We don't have any URB size set in the L3 config, since it's a fixed
value now. So just return the value that we know from gen_device_info.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4956>
If the platform's default L3 config is NULL, then it now gets
initialized only at context init time, and cmd_buffer_config_l3 will
always return immediately.
Rework:
* Remove unneeded check on !cfg in cmd_buffer_config_l3 (Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4956>
On some gen12 platforms we will use the L3FullWayAllocationEnable and
never reconfigure the L3 setup.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4956>
An example entry with URB size being 0 is in the cnl list.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4956>
Add an entry for this modifier in the modifier_info array.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5420>
Add a new aux usage which more accurately describes the behavior of
CCS_E on gen12. On this platform, writes using the 3D engine are either
compressed or substituted with fast-cleared blocks.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5363>
There's no good reason to have the "which table do I use?" code
duplicated twice. The only advantage to the way we were doing it before
was that we could move the switch statement outside the loop. If this
is ever an actual device initialization perf problem that someone cares
about, we can optimize that when the time comes. For now, the
duplicated cases are simply a platform-enabling pit-fall.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5530>
The docs say that these registers should only be read with a certain
type, and I'm inclined to believe that the hardware behaves that way,
but it makes the assembler a little more confusing and also confuses the
user of the assembler that some operands don't take types or regions.
Just always requiring regions and types seems like the sensible thing.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5514>
a0 is the only address register, and cr0 is the only control register,
so there's no need to return the register number, espcially since the
lexer explicitly consumes "a0" and "cr0".
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5514>
v2:
* Set pPipeline to NULL in the corresponding
graphics/compute_create_pipeline function.
* Keep current ANV behavior of bailing on the first real error.
v3:
* Don't return early if the pipeline succeeded.
v:4(5?):
* Simplify return conditions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5136>
Implement the VK_PIPELINE_CACHE_CREATE_EXTERNALLY_SYNCHRONIZED_BIT_EXT
bits of the VK_EXT_pipeline_creation_cache_control extension.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5136>
2 source instruction only support immediate for src1 operand, so no
point in adding optimization condition for src0 oprand.
v2:
- Update commit message and don't remove ADD optimization (Matt Turner)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5341>
Those are currently hurting Felix' ability to look at the batches.
We can probably detect this in the aubinator but that's a bit more
work than falling back to the previous behavior.
v2: Condition VK_KHR_performance_query to not using this variable (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5391>
This cleans up pipline create/destroy a bit after the compute/gfx split.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5457>
Old script created files in the source directory, which is generally
considered bad form.
The rewrite to python instead of duct-taping around in the shell script
goes towards the goal of only having cross-platform python scripts,
which is also harder to make mistakes in than shell scripts.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5155>
Python 2 is dead and this script is only run by devs, all of which have
had python3 available for basically forever.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5362>
Gen9 and Cherryview have the ability to mark texture instructions with
the End-of-thread bit under some conditions, which allows the texture
result to be written to the render target directly, rather than
returning to the EU.
In order to handle overlapping primitives correctly, we have to use the
'sendc' instruction which stalls until other threads potentially writing
to the same locations in the render target are retired. Unfortunately,
this stall happens before the texture is sampled (rather than in
parallel with stall), so for some literal edge cases (like the diagonal
edge between two triangles forming a rectangle) there can be a
performance penalty. As a result, it's probably not a good idea to use
this optimization in general.
I had planned to leave it enabled only for BLORP, where we use rectangle
primitives and are typically clearing/blitting an entire render target
without any overlapping primitives, but I noticed that the optimization
wasn't applied in some normal cases anyway. For example, in the piglit
test tests/shaders/glsl-fs-texture2d-bias.shader_test it is applied to
one BLORP-blit shader but not another due to some kind of mishandling of
register types (the destination register type of the texture operation
is UD while the color source of the render target write is F).
Additionally the instruction scheduler assumed that the combined texture
and render target write operation took 0 cycles, leading to cycle
estimates that are wildly inaccurate. Since the optimization was not
implemented for SIMD32 and our decision whether to use the SIMD32
program is made by comparing the estimated performance with that of the
SIMD16 shader, we wrongly threw out a bunch of SIMD32 programs that are
likely profitable.
total cycles in shared programs: 472807891 -> 473784245 (0.21%)
cycles in affected programs: 108277 -> 1084631 (901.72%)
helped: 0
HURT: 1290
total sends in shared programs: 998955 -> 1000245 (0.13%)
sends in affected programs: 1400 -> 2690 (92.14%)
helped: 0
HURT: 1290
LOST: 0
GAINED: 33
This patch shows no performance changes in Intel's Mesa performance CI.
Given the problems, the lack of evidence that the pass improves
performance, and the fact that the hardware feature was removed from
subsequent GPU generations, I think that the pass is not valuable and
should be removed.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5412>