An address format representing a purely logical addressing model. In
this model, all deref chains must be complete from the dereference
operation to the variable. Cast derefs are not allowed. These
addresses will be 32-bit scalars but the format is immaterial because
you can always chase the chain. E.g. push constants in anv.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This "fixes" hangs seen w/ various android games. I think a similar
issue to with constant state, we need to avoid CP_LOAD_STATE until
previous draw completes.
It isn't entirely clear why blob doesn't need to do this, but it might
have a different way to accomplish the same thing.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Since we use same constant state for both binning pass program state and
draw pass state, and it is possible for binning pass shader to use fewer
consts, we need to make sure we program a large enough constlen.
Signed-off-by: Rob Clark <robdclark@chromium.org>
This fixes some CTS failures related to VK_EXT_sample_locations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
we validate assert entry just before this, but since that doesn't
stop execution, we need to check entry before the next validation
assert.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Some vector sysval can't be lowered to scaler, so need to break
it to scaler in nir to gpir convertion.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
This commit moves the register allocator out of midgard_compile.c and
into its own midgard_ra.c file. In doing so, a number of dependencies
are identified and moved into their own files in turn. midgard_compile.c
is still fairly monolithic, but this should help.
Code churn, but no functional changes should be introduced by this
commit.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This fixes a few miscellaneous issues with the fixed-function blending
programming, though it is far from complete. For cases known to be
buggy, we force a fallback to blend shaders.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This implements blend shaders via nir_lower_blend, by creating dummy
fragment shaders simply passing through the source color and using the
new lowering pass to inject blendability.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
To prepare for the new nir_lower_blend pass, we wire up the intrinsics
for tilebuffer reads and constant colour loading.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This new lowering pass implements the OpenGL ES blend pipeline in
shaders, applicable to hardware lacking full-featured blending hardware
(including Midgard/Bifrost and vc4). This pass is run on a fragment
shader, rewriting the store to a blended version, loading in the
framebuffer destination color and constant color via intrinsics as
necessary. This pass is sufficient for OpenGL ES 2.0 and is verified to
pass dEQP's blend tests. MIN/MAX modes are included and tested as well.
That said, at present it has the following limitations:
- MRT is not supported (ES3).
- sRGB support is missing (ES3).
- Extended blending is not yet ported from GLSL IR lowering (ES3.2)
- Dual-source blending is not supported. (N/A)
- Logic ops are not supported. (N/A)
v2: Fix code conventions (per Ian Romanick's feedback). Implement color
masks.
This pass should be in common nir/ space, but due to non-technical
reasons, for now it's in Panfrost space. In the future, depending if
other drivers need some of the functionality, we can move this back to
src/compiler/nir space.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This adds a forgotten decode line on Midgard and adds the field of a
blend constant on Bifrost. The Bifrost encoding is fairly weird; whereas
Midgard is just a regular 32-bit float, Bifrost uses a fancy
fixed-point-esque encoding. The decode logic here is experimentally
correct. The encode logic is a sort of "guesstimate", assuming that the
high byte is just int(f / 255.0) and then solving algebraicly for the
low byte. This might be slightly off in some cases.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
This eliminates one major source of #ifdef parity between Midgard and
Bifrost, better representing how the struct acts on Midgard and allowing
proper decodes on Bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
We already have the Bifrost disassembler in-tree, so now that panwrap is
able to dump Bifrost command streams, hook up the disassembler to
pandecode.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
For IMMEDIATE and FIFO, most games work in a pipelined manner where the
can produce frames at a rate of 1/MAX(CPU duration, GPU duration), but
the render latency is CPU duration + GPU duration.
This means that with scanout from pageflipping we need 3 frames to run
full speed:
1) CPU rendering work
2) GPU rendering work
3) scanout
Once we have a nonblocking acquire that returns a semaphore we can merge
1 and 3. Hence the ideal implementation needs only 2 images, but games
cannot tellwe currently do not have an ideal implementation and that
hence they need to allocate 3 images. So let us do it for them.
This is a tradeoff as it uses more memory than needed for non-fullscreen
and non-performance intensive applications.
Since this is pretty much a TODO that can use the context I added this as
a comment.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
EGL annoyingly defines a few variants of this token:
EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_EXT - 0x3138
EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_KHR - 0x31BD
EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY - 0x31BD
The EGL_EXT_create_context_robustness extension specifies that the EXT
token is only valid for ES contexts, not GL. The EGL_KHR_create_context
extension defines the KHR version, and says it is only allowed for GL
contexts, and specifically calls out that it's an error for ES contexts.
But EGL 1.5 includes the new suffixless token, which has the same value
as the KHR version, and specifically calls out that it's now valid to
use with both GL and ES contexts. So we should allow this.
Fixes KHR-NoContext.es32.robustness.no_reset_notification and
KHR-NoContext.es32.robustness.lose_context_on_reset on iris, which
apparently is exposing EGL 1.5.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
From the Vulkan 1.1.107 spec:
Sample shading is enabled for a graphics pipeline:
- If the interface of the fragment shader entry point of the
graphics pipeline includes an input variable decorated with
SampleId or SamplePosition. In this case minSampleShadingFactor
takes the value 1.0.
- Else if the sampleShadingEnable member of the
VkPipelineMultisampleStateCreateInfo structure specified when
creating the graphics pipeline is set to VK_TRUE. In this case
minSampleShadingFactor takes the value of
VkPipelineMultisampleStateCreateInfo::minSampleShading.
Otherwise, sample shading is considered disabled.
In other words, if sampleShadingEnable is set to VK_FALSE, we should
ignore minSampleShading.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This was an unintended artifact of my testing of bindless images. We
should be choosing bindless or not dynamically.
Fixes: c0d9926df7 "anv: Use bindless handles for images"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We need to free memory allocation PrimitiveOffsets in draw_gs_destroy().
This fixes memory leak found while running piglit on windows.
Fixes: 7720ce32a ("draw: add support to tgsi paths for geometry streams. (v2)")
Tested with piglit
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Now that we have the descriptor buffer mechanism, emulated texture
swizzle can be implemented in a very non-invasive way. Previous
attempts all tried to extend the push constant based image param
mechanism which was gross. This could, in theory, be done much faster
with a magic back-end instruction which does indirect MOVs but Vulkan on
IVB is already so slow this isn't going to matter much.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104355
Cc: "19.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The load/store optimizer pass doesn't handle WaW hazards correctly
and this is the root cause of the reflection issue with Monster
Hunter World. AFAIK, it's the only game that are affected by this
issue.
This is fixed with LLVM r361008, but we need a workaround for older
LLVM versions unfortunately.
Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The vmwgfx driver supports emulated coherent surface memory as of version
2.16. Add en environtment variable to enable this functionality for
texture- and buffer maps: SVGA_FORCE_COHERENT.
This environment variable should be used for testing only.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In order to be able to add access modes to a pb_validate_entry, update
the pb_validate_add_buffer function to take a pointer hash table and also
to return whether the buffer was already on the validate list.
Update the svga winsys accordingly.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The rendered-to flag indicates that the HW surface content is more recent
than the content of the mob. That's the case after a SurfaceDMA transfer
to the surface.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
SVGA_RELOC_INTERNAL indicates a transfer between surface and backing mob.
This means that if the GPU for example reads from the surface it writes
to the backing mob. But since the buffer mapping code allows for
simultaneous gpu- and cpu read access, a read from the surface to the mob
will not synchronize a subsequent map to the readback.
Fix this by inverting the mob access mode in a surface relocation with
SVGA_RELOC_INTERNAL set.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Instead unconditionally call SVGA3D_InvalidateGBSurface() since it's needed
also for Linux for dirty buffers and operation without SurfaceDMA.
For non-guest-backed operation, remove the surface cache surface invalidation
altogether.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This reverts commit 865b9ddae4.
The buffer always reports format PIPE_FORMAT_R8_UNORM so with this patch only
one component would be supported. The original issue is still relevant, but
the fix should be different.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
glsl_to_nir.cpp:276: uninit_member: Non-static class member "sig" is not initialized in this constructor nor in any functions that it calls.
Reported by coverity
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
imgui_draw.cpp:1781: error[shiftTooManyBitsSigned]: Shifting signed 32-bit value by 31 bits is undefined behaviour
Reported by coverity
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
link_uniforms.cpp:477: uninit_member: Non-static class member "shader_storage_blocks_write_access" is not initialized in this constructor nor in any functions that it calls.
Reported by coverity.
v2: fix 9->0 typo (Ilia)
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
src/compiler/glsl_types.cpp:577: uninit_member: Non-static class member "packed" is not initialized in this constructor nor in any functions that it calls.
from Coverity.
Fixes: 659f333b3a (glsl: add packed for struct types)
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Many of these are now patched; one of them we patch here. Regardless,
this is one less thing to worry about in the code, I suppose.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Don't attempt sampling with HiZ if the sampler lacks support for it. On
ICL, the HW docs state that sampling with HiZ is not supported and that
instances of AUX_HIZ in the RENDER_SURFACE_STATE object will be
interpreted as AUX_NONE.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
If the SSA def produced by this instruction is only in the block in
which it is defined and is not used by ifs or phis, then we don't have
a reason to convert it to a register in
nir_lower_ssa_defs_to_regs_block().
The special case for derefs is covered by the general case, so can be
removed: at this point all derefs in the block are
materialized (i.e. the whole deref chain is in the block) and derefs
are not used in phis.
v2: Fix wrong check for if_uses. If there's such an use, the def is
not "local_to_block". (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Normally gl_nir_lower_samplers_as_deref records info->textures_used
for us, but this pass runs after that, attempting to assign samplers
in the same order as st_atom_texture's external_samplers_used loop
so the stars align and we get the same locations.
Since we're adding textures late, we need to amend info->textures_used.
iris uses info->textures_used to set up texture bindings; this fixes
Piglit's ext_image_dma_buf_import-sample-{nv12,yuv420,yvu420} there.
Reviewed-by: Rob Clark <robdclark@gmail.com>
First, allow the case for negative powers of two. Then ensure that we
use the absolute value of the non-constant value to calculate the
quotient -- this was hinted in the code by the name 'uq'.
This fixes an issue when 'd' is positive and 'n' is negative. The
ishr will propagate the negative sign and we'll use nir_ineg() again,
incorrectly.
v2: First version used only ishr, but that isn't sufficient, since it
never can produce a zero as a result. (Jason)
Allow negative powers of two. (Caio)
Fixes: 74492ebad9 "nir: Add a pass for lowering integer division by constants"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
shader-db's report.py will use this to see when we've changed loop
unrolling behavior on a shader and skip including other stats like
instruction count from being considered for that shader, since they won't
be useful as a proxy for real world performance in that case.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
This lets us reuse their report.py, at the expense of fd-report.py no
longer working.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
It was more of a hindrance, as it pretended that we could compile in the
driver with a missing screen.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>