Alignment units, i and j, match the compressed format block
width and height respectively.
v2: Don't assert against HALIGN* and VALIGN* enums (Chad)
Reviewed-by: Chad Versace <chad.versace@intel.com>
A non-compressed texture is a 1x1x1 block. Compressed
textures could have values which vary in different
dimensions WxHxD.
Reviewed-by: Chad Versace <chad.versace@intel.com>
cpp (chars-per-pixel) is an integer that fails to give useful data
about most compressed formats. Instead, rename it to "bs" which
stands for block size (in bytes).
v2: Rename vk_format_for_bs to vk_format_for_size (Chad)
Use "block size" instead of "bs" in error message (Chad)
Reviewed-by: Chad Versace <chad.versace@intel.com>
The new mechanism should be able to handle SSBOs as well as properly handle
emitting surface state on gen7 where we need different strides depending on
shader stage.
We never *actually* supported them, we just used them for binding UBOs.
Now that we have BufferInfo and we aren't supporting texture buffers yet,
we should get rid of them until we can do them properly.
Tested by Crucible "func.depthstencil.stencil_triangles.*" in
commit c194292d5eadb84e9d7489fc01ce0b653cdd4ca5 (HEAD -> master)
Author: Chad Versace <chad.versace@intel.com>
Date: Wed Nov 4 16:19:24 2015 -0800
Subject: func.depthstencil: Remove stencil clear workaround for Mesa
The anv_state is supposed to be a flyweight so we're not really saving
anything by using a pointer. Also, we were creating one, setting a pointer
to it, and then having it go out-of-scope which is bad.
Remove members
num_color_clear_attachments
has_depth_clear_attachment
has_stencil_clear_attachment
The new clear code in anv_meta_clear.c does not use them.
Fixes Crucible test "func.clear.load-clear.attachments-8".
The old clear code, when clearing attachments for
VK_ATTACHMENT_LOAD_OP_CLEAR, suffered from some fundamental bugs. The
bugs were not fixable with the old code's approach.
- It assumed that a VkRenderPass contained at most one depthstencil
attachment.
- It tried to clear all attachments (color and the sole
depthstencil) with a single instanced draw call, using the VUE
header's RenderTargetArrayIndex to specify the instance's target
color attachment. But the RenderTargetArrayIndex does not select
entries in the binding table; it only selects an array index of
a singled layered surface.
- If at least one attachment of VkRenderPass had
VK_ATTACHMENT_LOAD_OP_CLEAR,
then the old code cleared *all* attachments. This was
a consequence of using a single draw call and single pipeline for
the clear.
The new clear code fixes those bugs by making a separate draw call for
each attachment, and using one pipeline when clearing color attachments
and a different pipeline for depth attachments.
The new code, like the old code, does not clear stencil attachments. It
is left as a FINISHME.
Consistently rename bitmasks of Vulkan dynamic state to 'dynamic_mask'.
anv_meta_saved_state::dynamic_flags -> dynamic_mask
anv_meta_save(dynamic_state) -> dynamic_mask
As the functions are now exposed in anv_meta.h, let's rename them
to clarify that they are meta functions.
anv_cmd_buffer_save -> anv_meta_save
anv_cmd_buffer_restore -> anv_meta_restore
On future generation platforms the color clear value is stored elsewhere in the
surface state. By extracting this logic, we can cleanly implement the difference
in an upcoming patch.
Should have no functional impact.
v2: Move hunk from the next patch into this patch (Matt)
Whitespace fix (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
The allocate_surface_state already zeroes out the surface state, and doing it
later in the function is destructive for what we want to accomplish when we
split out support for gen9 fast clears (next patch).
NOTE: Only dword 12 actually needed to be fixed, but it seemed more consistent
to remove the other instances as well. I can make an argument both ways (open
coding it, vs. not). I can rework the next patch if requires.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
When emitting the binding table for the fragment shader stage, we no
longer "walk all of the attachments, [inserting only] the color
attachments into the binding table". Instead, we iterate only over the
subpass's color attachments, which is the minimal possible iteration.
While killing the comment, also rename the variable 'attachments' to
'color_count', as it's no longer a count of all framebuffer attachments
but only the subpass's color attachment count.
This fixes crashes with some piglit OpenCL tests.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
To get the size (in bytes) of a compute parameter, clover first calls
get_compute_param() with a NULL data pointer. The RET() macro is based
on nv50.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
A few new PCI ids are added here, and one is removed (0x190B) because it no
longer seems to exist anywhere.
v2-4:
Only use ascii characters (Ilia)
0x1921 is no longer marked as f
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Like other gen8+ hardware, the hardware automatically scales up thread counts.
We must be careful about the URB sizes since GT4 adds another slice.
One of the existing PCI IDs is actually mislabeled as GT3. Arguably this is a
real bug since the URB size will be wrong. Because this patch is simply meant to
add the missing IDs, that will be fixed in a later patch.
v2: No longer relevant.
v3: Update the wm thread count to support GT4. The WM thread count is used to
determine the maximum scratch space required. Currently the code always
allocates the maximum amount even though lower GT SKUs require less. The formula
is threads_per_psd * subslices_per_slice * slices
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Note: The OpenGL 4.3 - 4.5 specification language for DispatchCompute
appears to have an error regarding the max allowed values. When adding
the specification citation, we note why the code does not match the
specification language.
v2:
* Updates based on review from Iago
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
There is some discrepancy between the return values for some error
cases for the DispatchComputeIndirect call in the ARB_compute_shader
specification. Regarding the indirect parameter, in one place the
extension spec lists that the error returned for invalid values should
be INVALID_OPERATION, while later it specifies INVALID_VALUE.
The OpenGL 4.3 and OpenGLES 3.1 specifications appear to be consistent
in requiring the INVALID_VALUE error return in this case.
Here we update the code to match the main specifications, and update
the citations use the main specification rather than the extension
specification.
v2:
* Updates based on review from Iago
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Right now, Broadweel and Ivy Bridge are the only supported platforms.
Hopefully, this reduces the chances that someone will try the driver on
unsupported hardware and be confused that it doesn't work.
We've made a mistake in calling the Channel Enable bits "writemask",
because they do more than control which channels of the destination are
written -- they actually control which channels are enabled (surprise!
surprise!)
So, if we emit
cmp.z.f0(8) null.xy<1>D g10<4,4,1>.xyzzD g2<0,4,1>.xyzzD
mov(8) g12<1>.xUD 0x00000000UD
(+f0.all4h) mov(8) g12<1>.xUD 0xffffffffUD
where the CMP instruction has only .xy channel enables, it won't write
the .zw channels of the flag register, which are of course read by the
+f0.all4 predicate.
We need to always emit CMP instructions whose flag result might be read
by such a predicate with all channels enabled.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>