This is needed for CL image hostptr support, but it's possible
it could hit these paths from GL/Vulkan
Fixes: 9a57dceeb7 ("llvmpipe: add support for user memory pointers")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13375>
This uses a C++ template to compute the memcmp size at compile time,
which is important for getting inlined memcmp.
There are 4 different key sizes now:
GE with inlined uniforms: 68 bytes
GE without inlined uniforms: 52 bytes
PS with inlined uniforms: 28 bytes
PS without inlined uniforms: 12 bytes
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13285>
ps is for the pixel shader, while ge is for VS, TCS, TES, and GS.
si_shader_key: 68 bytes
si_shader_key_ge: 68 bytes
si_shader_key_ps: 28 bytes
The only notable change is that si_shader_select_with_key is changed
to a C++ template. Other changes are trivial.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13285>
With the `gtest` protocol meson will add some extra arguments to the
test to generate better junit results, which may be useful. This
protocol is only available in meson 0.55.0+, so keep using the default
`exitcode` protocol for meson older than that.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8484>
INTEL_DEBUG is defined (since 4015e1876a) as:
#define INTEL_DEBUG __builtin_expect(intel_debug, 0)
which unfortunately chops off upper 32 bits from intel_debug
on platforms where sizeof(long) != sizeof(uint64_t) because
__builtin_expect is defined only for the long type.
Fix this by changing the definition of INTEL_DEBUG to be function-like
macro with "flags" argument. New definition returns 0 or 1 when
any of the flags match.
Most of the changes in this commit were generated using:
for c in `git grep INTEL_DEBUG | grep "&" | grep -v i915 | awk -F: '{print $1}' | sort | uniq`; do
perl -pi -e "s/INTEL_DEBUG & ([A-Z0-9a-z_]+)/INTEL_DBG(\1)/" $c
perl -pi -e "s/INTEL_DEBUG & (\([A-Z0-9_ |]+\))/INTEL_DBG\1/" $c
done
but it didn't handle all cases and required minor cleanups (like removal
of round brackets which were not needed anymore).
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13334>
originally, a slab attempts to reclaim a single bo. there are two outcomes
to this which can occur:
* the bo is reclaimed
* the bo is not reclaimed
if the bo is reclaimed, great.
if the bo is not reclaimed, it remains at the head of the list until it can
be reclaimed. this means that any bo with a "long" work queue which makes it
into a slab will effectively kill the entire slab. in a benchmarking scenario,
this can occur in rapid succession, and every slab will get 1-2 suballocations
before it reaches a bo that blocks long enough for a new slab to be needed.
the inevitable result of this scenario is that all memory is depleted almost instantly,
all because pb assumes that if the first bo in the reclaim list isn't ready, none of them
can be ready
for drivers like radeonsi, this happens to be a fine assumption
for drivers like zink, this is entirely not workable and explodes the gpu
Cc: mesa-stable
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Tested-by: Witold Baryluk <witold.baryluk@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13345>
Instead of making images have a well-defined size, insert a dummy
variable of the appropriate type which we can use for the parameter
block layout. This will work much better when we switch over to
nir_var_mem_image.
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
Use nir_foreach_image_variable for images so we survive the coming
refactor where they get their own mode.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
Scratch patching code in iris_upload_dirty_render_state (see MERGE_SCRATCH_ADDR
calls) assumes that in all shader stages derived_data field stores 3DSTATE_XS
packet first.
This is not true for TESS_EVAL (DS), so we end up patching 3DSTATE_TE
instead of 3DSTATE_DS leading to DWordLength becoming 11 instead of 9
(9 == 3DSTATE_DS.DWordLength, 2 == 3DSTATE_TE.DWordLength, and 9|2 == 11),
and hardware hanging on the next instruction.
Fix this by reversing the order of packets for TESS_EVAL stage.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5499
Fixes: 4256f7ed58 ("iris: Fill out scratch base address dynamically")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13358>
(ported from iris - airlied)
The MI_COPY_MEM_MEM version of resource_copy_region has known bugs:
It's failing to set valid_buffer_range correctly
It's missing iris_emit_buffer_barrier_for() for the source/destination, so there may be missing flushes.
There are some bad interactions with the tile cache and VF using L3.
Even with those fixed, if you expand the "no more than 16 bytes" restriction to allow copies up to 1024 bytes, then it starts failing Piglit tests on Icelake.
We could probably fix this. However, I had originally only measured a 0.689096% +/- 0.473968% (n=4) speedup in Shadow of Mordor's OpenGL port, which is already fairly small, especially before adding missing flushes. Further, some of that likely came from not switching between render and compute...which we'll soon be able to avoid thanks to BLOCS.
Folks were also worried that MI_COPY_MEM_MEM can't be pipelined, and that stalling the command streamer may actually slow things down, especially as the GPUs become more powerful. We aren't really sure about this, but it's another concern.
So, let's just get rid of this optimization. It seemed like a good idea at the time, but it's just causing issues for very little gain.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13374>
This patch adds Tessellation Distribution on top of Geometry
Distribution. Using recommended values based on performance studies
across a range of workloads.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12091>
Using recommended values based on performance studies across a range
of workloads.
Rework:
* Always enable geometry distribution
* Set ListCutIndexEnable if primitive restart is enabled
* Set distribution mode based on TEEnable
v2:
- Flag missing IRIS_DIRTY_VFG bit (Ken)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12091>
depending on the runner's load, we might see timeouts. The
subgroupbroadcast one has hit us a couple times this week.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13346>
Apparently, we've been requiring a 4K alignment for internally allocated
clear color addresses to work around some unknown issues. There's a
comment to that effect in iris_resource_create_with_modifiers().
When importing a dmabuf and tacking on an additional clear color BO, we
only required an alignment of 1. This wasn't a problem for a long time
because all BO allocations were naturally aligned to the 4K page size.
However, once we enabled suballocation, we were able to allocate "BOs"
at 256B granularity, making this no longer 4K aligned. Increase the
alignment requirement to 4K to match the behavior of our normal
allocations and also our previous behavior.
Fixes Piglit's ext_image_dma_buf_import-intel-modifiers.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5482
Fixes: ce2e2296ab ("iris: Suballocate BO using the Gallium pb_slab mechanism")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13326>
There's no danger of accidentally using these, the default pixel format
is integer and if you want float you need to have explicitly asked for
it in eglChooseConfig.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13182>
This bit seems like the control for line mode of rastrization.
That can be simply figured out by comparing
dEQP-VK.rasterization.primitives.no_stipple.bresenham_lines,
dEQP-VK.rasterization.primitives.no_stipple.rectangular_lines and
dEQP-VK.rasterization.primitives.no_stipple.lines.
For opengl, the value of bresenham lines mode, which is 0, is set
by default and the value of rectangular mode, which is 0x1, is set
when multi-sampled.
For vulkan, the bresenham lines are enabled when lineRasterizationMode is
VK_LINE_RASTERIZATION_MODE_BRESENHAM_EXT, which sets the bit to 0, while
the value is 1 when it's VK_LINE_RASTERIZATION_MODE_RECTANGULAR_EXT,
that seems to be default.
If both multi-sampled and bresenham-lines are used when primitive type is
line, the bit is to be set as 0 and makes msaa disabled.
Note that this is only tested on a6xx, but I guess it's likely the same
for a5xx.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6020>
This is to fix serious performance drop of texture_upload/
texture_resue relative items in chromeos glbench test.
Staging texture is not efficient for CPU uploading.
Signed-off-by: Jasber Chen <yipeng.chen@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13306>
This patch is to to remove PKT3_CONTEXT_REG_RMW from radeonsi.
and avoid multiple command buffer(PM4 packet)creation for R_02881C_PA_CL_VS_OUT_CNTL.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12789>
r300_vs.c:252:37: runtime error: left shift of negative value -1
r300_state.c:1824:66: runtime error: left shift of 63112 by 16 places
cannot be represented in type 'int'
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13308>
It eliminates "False Sharing" for atomic operations. (see wikipedia)
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11618>
This adds 60 bytes to both structures. It eliminates "False Sharing"
for atomic operations (see wikipedia).
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11618>
Instead of doing a tile cache flush after slow clears or when the clear
value changes, do it before every fast clear of a HIZ_CCS_WT surface.
This agrees with the Bspec.
Fixes: c85ea824bc ("iris: reduce redundant tile cache flushes")
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11539>
DCN only supports an extent < 4K on !64B && 128B.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13056>
It regresses the first snx test because it adds CPU overhead, and there is
no way to work around it. The average effect on viewperf is 0, meaning that
a few cases improve, while a few others regress.
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13279>