This implements blend shaders via nir_lower_blend, by creating dummy
fragment shaders simply passing through the source color and using the
new lowering pass to inject blendability.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
To prepare for the new nir_lower_blend pass, we wire up the intrinsics
for tilebuffer reads and constant colour loading.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This new lowering pass implements the OpenGL ES blend pipeline in
shaders, applicable to hardware lacking full-featured blending hardware
(including Midgard/Bifrost and vc4). This pass is run on a fragment
shader, rewriting the store to a blended version, loading in the
framebuffer destination color and constant color via intrinsics as
necessary. This pass is sufficient for OpenGL ES 2.0 and is verified to
pass dEQP's blend tests. MIN/MAX modes are included and tested as well.
That said, at present it has the following limitations:
- MRT is not supported (ES3).
- sRGB support is missing (ES3).
- Extended blending is not yet ported from GLSL IR lowering (ES3.2)
- Dual-source blending is not supported. (N/A)
- Logic ops are not supported. (N/A)
v2: Fix code conventions (per Ian Romanick's feedback). Implement color
masks.
This pass should be in common nir/ space, but due to non-technical
reasons, for now it's in Panfrost space. In the future, depending if
other drivers need some of the functionality, we can move this back to
src/compiler/nir space.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This adds a forgotten decode line on Midgard and adds the field of a
blend constant on Bifrost. The Bifrost encoding is fairly weird; whereas
Midgard is just a regular 32-bit float, Bifrost uses a fancy
fixed-point-esque encoding. The decode logic here is experimentally
correct. The encode logic is a sort of "guesstimate", assuming that the
high byte is just int(f / 255.0) and then solving algebraicly for the
low byte. This might be slightly off in some cases.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
This eliminates one major source of #ifdef parity between Midgard and
Bifrost, better representing how the struct acts on Midgard and allowing
proper decodes on Bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
We already have the Bifrost disassembler in-tree, so now that panwrap is
able to dump Bifrost command streams, hook up the disassembler to
pandecode.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
We need to free memory allocation PrimitiveOffsets in draw_gs_destroy().
This fixes memory leak found while running piglit on windows.
Fixes: 7720ce32a ("draw: add support to tgsi paths for geometry streams. (v2)")
Tested with piglit
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The vmwgfx driver supports emulated coherent surface memory as of version
2.16. Add en environtment variable to enable this functionality for
texture- and buffer maps: SVGA_FORCE_COHERENT.
This environment variable should be used for testing only.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In order to be able to add access modes to a pb_validate_entry, update
the pb_validate_add_buffer function to take a pointer hash table and also
to return whether the buffer was already on the validate list.
Update the svga winsys accordingly.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The rendered-to flag indicates that the HW surface content is more recent
than the content of the mob. That's the case after a SurfaceDMA transfer
to the surface.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
SVGA_RELOC_INTERNAL indicates a transfer between surface and backing mob.
This means that if the GPU for example reads from the surface it writes
to the backing mob. But since the buffer mapping code allows for
simultaneous gpu- and cpu read access, a read from the surface to the mob
will not synchronize a subsequent map to the readback.
Fix this by inverting the mob access mode in a surface relocation with
SVGA_RELOC_INTERNAL set.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Instead unconditionally call SVGA3D_InvalidateGBSurface() since it's needed
also for Linux for dirty buffers and operation without SurfaceDMA.
For non-guest-backed operation, remove the surface cache surface invalidation
altogether.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This reverts commit 865b9ddae4.
The buffer always reports format PIPE_FORMAT_R8_UNORM so with this patch only
one component would be supported. The original issue is still relevant, but
the fix should be different.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Many of these are now patched; one of them we patch here. Regardless,
this is one less thing to worry about in the code, I suppose.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
shader-db's report.py will use this to see when we've changed loop
unrolling behavior on a shader and skip including other stats like
instruction count from being considered for that shader, since they won't
be useful as a proxy for real world performance in that case.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
This lets us reuse their report.py, at the expense of fd-report.py no
longer working.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
It was more of a hindrance, as it pretended that we could compile in the
driver with a missing screen.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
The TTN path needs access to the screen to make the right decisions about
lowering, but we didn't have pctx->screen set up at fdN_prog_init time.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
The prim discard compute shader bakes InstanceID into the output index buffer.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If a prim discard compute shader hasn't finished compilation, we don't want
to any shader.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The primitive discard compute shader will get the position output this way.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This commit adjusts the capabilities returned
by the SWR driver and the documentation to correctly
report the following extensions:
GL_ARB_texture_query_lod, GL_ARB_texture_cube_map_array,
GL_ARB_gpu_shader_fp64, GL_ARB_texture_gather,
GL_ARB_vertex_attrib_64bit.
Reviewed-by: Alok Hota <alok.hota@intel.com>