brw_finish_batch emits commands needed at the end of every batch buffer,
including any workarounds. In the past, we freed up some "reserved"
batch space before calling it, so we would never have to flush during
it. This was error prone and easy to screw up, so I deleted it a while
back in favor of growing the batch.
There were two problems:
1. We're in the middle of flushing, so brw->no_batch_wrap is guaranteed
not to be set. Using BEGIN_BATCH() to emit commands would cause a
recursive flush rather than growing the buffer as intended.
2. We already recorded the throttling batch before growing. Growing
replaces brw->batch.bo with a different (larger) buffer, so it
would break throttling.
These are easily remedied by shuffling some code around and whacking
brw->no_batch_wrap in brw_finish_batch(). This also now includes the
final workarounds in the batch usage statistics. Found by inspection.
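To illustrate the first problem, here is a minimal sketch with a
simplified, hypothetical batch structure (struct batch, batch_emit()
and friends are stand-ins, not the real i965 code). Emitting from the
finish step while no_wrap is clear would call back into the flush;
setting it makes the emit grow the buffer instead:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified batch; not the real i965 structures. */
struct batch {
   uint32_t *map;
   size_t used;       /* dwords emitted so far */
   size_t size;       /* capacity in dwords, always > 0 */
   bool no_wrap;      /* when set, running out of space grows the buffer */
   unsigned flushes;  /* number of completed flushes */
};

static void batch_flush(struct batch *b);

static void batch_init(struct batch *b, size_t size)
{
   assert(size > 0);
   b->map = malloc(size * sizeof(uint32_t));
   assert(b->map);
   b->used = 0;
   b->size = size;
   b->no_wrap = false;
   b->flushes = 0;
}

/* Make room for n dwords: grow if wrapping is forbidden, else flush. */
static void batch_require_space(struct batch *b, size_t n)
{
   if (b->used + n <= b->size)
      return;

   if (b->no_wrap) {
      size_t new_size = b->size;
      while (b->used + n > new_size)
         new_size *= 2;
      uint32_t *map = realloc(b->map, new_size * sizeof(uint32_t));
      assert(map);
      b->map = map;
      b->size = new_size;
   } else {
      batch_flush(b);
   }
}

static void batch_emit(struct batch *b, uint32_t dw)
{
   batch_require_space(b, 1);
   b->map[b->used++] = dw;
}

/* Stand-in for brw_finish_batch(): end-of-batch commands and workarounds. */
static void batch_finish(struct batch *b)
{
   bool saved = b->no_wrap;
   b->no_wrap = true;   /* already flushing: grow, never flush again */
   batch_emit(b, 0x1);  /* placeholder workaround dword */
   batch_emit(b, 0x2);  /* placeholder end-of-batch dword */
   b->no_wrap = saved;
}

static void batch_flush(struct batch *b)
{
   batch_finish(b);
   /* Statistics and throttling would be recorded here, after the
    * final emit, so they see the final usage and the final buffer. */
   b->flushes++;
   b->used = 0;
}
```

Without the no_wrap assignment in batch_finish(), flushing a full
batch in this sketch would recurse from batch_emit() back into
batch_flush() without end.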
Fixes: 2c46a67b41 (i965: Delete BATCH_RESERVED handling.)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested with AMD's Anvil OutOfOrderRasterization demo on a RX 560.
Signed-off-by: Nicholas Miell <nmiell@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes quite a few 'texwrap [12]d border color only' tests on NV20
(10de:0201). All told, 40 more tests pass.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Force T and R wrap modes to GL_CLAMP_TO_EDGE for 1D textures.
This fixes a regression in tex1d-2dborder. The test uses a 1D texture
but it provides S and T texture coordinates. Since the T wrap mode
would (correctly) be set to GL_CLAMP, the texture would gradually
blend (incorrectly) with the border color.
I also tried setting NV20_3D_TEX_FORMAT_DIMS_1D instead of
NV20_3D_TEX_FORMAT_DIMS_2D for 1D textures, but that did not help.
It is possible that the same problem exists for 2D textures with the
R-wrap mode, but I don't think there are any piglit tests for that.
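A rough sketch of the v2 behaviour, using hypothetical names
(enum wrap_mode, effective_wrap()) rather than the actual nv20 driver
code: for 1D textures the T and R modes are overridden so that stray
T/R coordinates cannot pull in the border color.

```c
#include <assert.h>

/* Hypothetical wrap modes for the sketch. */
enum wrap_mode { WRAP_REPEAT, WRAP_CLAMP, WRAP_CLAMP_TO_EDGE };

struct wrap_state {
   enum wrap_mode s, t, r;
};

/* Return the wrap modes to program for a texture with `dims` dimensions. */
static struct wrap_state effective_wrap(unsigned dims, struct wrap_state in)
{
   struct wrap_state out = in;

   assert(dims >= 1 && dims <= 3);
   if (dims == 1) {
      /* Only S is meaningful; never blend with the border along T or R. */
      out.t = WRAP_CLAMP_TO_EDGE;
      out.r = WRAP_CLAMP_TO_EDGE;
   }
   return out;
}
```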
No test changes on NV20 (10de:0201).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Denotes availability of 64-bit integer atomic instructions.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There's no reason to use va_copy here.
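As a general illustration (format_into() and format_message() are
hypothetical helpers, not the function this commit touches): va_copy
is only needed when the same argument list is traversed more than
once. A va_list that is consumed exactly once can be passed straight
through.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* The va_list is consumed exactly once here, so no va_copy is needed. */
static int format_into(char *buf, size_t size, const char *fmt, va_list args)
{
   return vsnprintf(buf, size, fmt, args);
}

static int format_message(char *buf, size_t size, const char *fmt, ...)
{
   va_list args;
   int n;

   assert(buf != NULL && size > 0);
   va_start(args, fmt);
   n = format_into(buf, size, fmt, args);
   va_end(args);
   return n;
}
```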
CID: 1418113
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes: e7fc664b91 ("winsys/amdgpu: add addrlib - texture addressing and alignment calculator")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The number of viewports/scissors can only be specified at pipeline
creation time, so make sure to copy them when binding a new one
because the dynamic state is cleared in BeginCommandBuffer().
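A minimal sketch of the idea, with simplified hypothetical structures
(not the real radv ones): the counts live in the pipeline and are
copied into the command buffer state on every bind, so they survive
the reset done when a command buffer begins.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_VIEWPORTS 16  /* assumed limit for the sketch */

struct dynamic_state {
   uint32_t viewport_count;
   uint32_t scissor_count;
};

struct pipeline {
   struct dynamic_state dynamic;  /* fixed at pipeline creation time */
};

struct cmd_buffer {
   struct dynamic_state dynamic;
};

/* Beginning a command buffer clears all dynamic state. */
static void begin_command_buffer(struct cmd_buffer *cmd)
{
   memset(&cmd->dynamic, 0, sizeof(cmd->dynamic));
}

/* Binding a pipeline re-establishes the counts it was created with. */
static void bind_pipeline(struct cmd_buffer *cmd, const struct pipeline *p)
{
   assert(p->dynamic.viewport_count <= MAX_VIEWPORTS);
   assert(p->dynamic.scissor_count <= MAX_VIEWPORTS);
   cmd->dynamic.viewport_count = p->dynamic.viewport_count;
   cmd->dynamic.scissor_count = p->dynamic.scissor_count;
}
```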
Fixes: dcf46e995d ("radv: do not update the number of scissors in vkCmdSetScissor()")
Fixes: 60878dd00c ("radv: do not update the number of viewports in vkCmdSetViewport()")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This is already done for other program stages. Fixes a leak when using
compute programs.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102844
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Having this separate just makes the code harder to follow, and
requires an extra walk of the IR.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
The Broadwell method of handling uncompressed views of compressed
textures was to make the texture linear and have a tiled shadow copy.
This isn't needed on Sky Lake because the HALIGN and VALIGN parameters
are specified in surface elements and required to be a multiple of 4.
This means that we can just use the X/Y Offset fields and we can avoid
the shadow copy song and dance. This also makes ASTC work because ASTC
can't be linear and so the shadow copy method doesn't work there.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In order to get support everywhere, this gets a bit complicated. On Sky
Lake and later, everything is fine because HALIGN/VALIGN are specified
in surface elements and are required to be at least 4 so any offsetting
we may need to do falls neatly within the heavy restrictions placed on
the X/Y Offset parameter of RENDER_SURFACE_STATE. On Broadwell and
earlier, HALIGN/VALIGN are specified in pixels and are hard-coded to
align to exactly the block size of the compressed texture. This means
that, when reinterpreted as a non-compressed texture, the tile offsets
may be anything and we can't rely on X/Y Offset.
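The arithmetic behind this can be sketched as follows. The helper
names are hypothetical, and the "multiple of 4" requirement on the
X/Y Offset is an assumption of the sketch taken from the description
above; the real per-generation and per-axis restrictions are in the
hardware documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for this sketch: X/Y Offset must be a multiple of 4 elements. */
#define XY_OFFSET_ALIGN 4u

/* Convert an offset in texels of the compressed image into an offset in
 * elements of the uncompressed view (one element per compressed block). */
static uint32_t px_to_elements(uint32_t offset_px, uint32_t block_dim)
{
   assert(block_dim > 0 && offset_px % block_dim == 0);
   return offset_px / block_dim;
}

/* Can this element offset be expressed through the X/Y Offset fields? */
static bool xy_offset_usable(uint32_t x_el, uint32_t y_el)
{
   return x_el % XY_OFFSET_ALIGN == 0 && y_el % XY_OFFSET_ALIGN == 0;
}
```

With a 4x4 block format and alignment equal to the block size, a
miplevel may start at texel x = 20, which is element 5 of the
uncompressed view and fails the check. With alignment given in
elements and at least 4, every miplevel starts at an element offset
that passes it.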
In order to work around this issue, we fall back to linear where we can
trivially offset to whatever element we so choose. However, since
linear texturing performance is terrible, we create a tiled shadow copy
of the image to use for texturing. Whenever the user does a layout
transition from anything to SHADER_READ_ONLY_OPTIMAL, we use blorp to
copy the contents of the texture from the linear copy to the tiled
shadow copy. This assumes that the client will use the image far more
for texturing than as a storage image or render target.
Even though we don't need the shadow copy on Sky Lake, we implement it
this way first to make testing easier. Due to the hardware restriction
that ASTC must not be linear, ASTC does not work yet.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This struct represents a full surface state including the addresses of
the referenced main and auxiliary surfaces (if any). This makes
relocation setup substantially simpler and allows us to move 100% of the
surface state setup logic into anv_image where it belongs. Before, we
were manually fishing data out of surface states when emitting
relocations so we knew how to offset aux address. It's best to keep all
of the surface state emit logic together. This also gets us closer, at
least cosmetically, to a world of no relocations where addresses are
placed in surface states up-front.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This gives us a single centralized place where we take an image view and
use it to fill out a surface state.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It's not SPIR-V that's backwards from GLSL, it's Vulkan that's backwards
from GL. Let's make NIR consistent with the source language and do the
flipping inside the Vulkan driver instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: wait in map_buffer and map_image as well
v3: use event::wait instead of wait (skips fence wait for hard_event)
v4: use wait_signalled()
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
And define a method for other threads to wait until the action
function associated with an event has been executed to completion.
For hard events, this will mean waiting until the corresponding
command has been submitted to the pipe driver, without necessarily
flushing the pipe_context and waiting for the actual command to be
processed by the GPU (which is what hard_event::wait() already does).
This weaker kind of event wait will allow implementing blocking memory
transfers efficiently.
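A minimal sketch of such a weaker wait, written with POSIX threads
and hypothetical names (struct event, event_trigger(),
event_wait_signalled()); the actual clover code is C++ and differs in
detail. The waiter only blocks until the action function has run to
completion, not until the GPU has processed the command.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct event {
   pthread_mutex_t lock;
   pthread_cond_t cond;
   bool signalled;                  /* action has run to completion */
   void (*action)(struct event *);  /* e.g. submits a command; may be NULL */
};

static void event_init(struct event *ev, void (*action)(struct event *))
{
   assert(ev != NULL);
   pthread_mutex_init(&ev->lock, NULL);
   pthread_cond_init(&ev->cond, NULL);
   ev->signalled = false;
   ev->action = action;
}

/* Run the action, then mark the event signalled and wake all waiters. */
static void event_trigger(struct event *ev)
{
   if (ev->action)
      ev->action(ev);

   pthread_mutex_lock(&ev->lock);
   ev->signalled = true;
   pthread_cond_broadcast(&ev->cond);
   pthread_mutex_unlock(&ev->lock);
}

/* Block until the action has been executed; no GPU-side wait implied. */
static void event_wait_signalled(struct event *ev)
{
   pthread_mutex_lock(&ev->lock);
   while (!ev->signalled)
      pthread_cond_wait(&ev->cond, &ev->lock);
   pthread_mutex_unlock(&ev->lock);
}
```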
Acked-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
We are really not going to use a winsys which does not need to store
the va, so might as well store it in a standard field.
Not sure this helps perf much though, as most of the cost is in the
cache miss accessing the bo anyway, which we still need to do.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Since most games use only a few, iterating through all of them is
a waste. Simplifies the code too.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Nothing too exciting, just adding the possibility for a pNext pointer,
and batch binding. Our binding is pretty much trivial.
It also adds VK_IMAGE_CREATE_ALIAS_BIT_KHR, but since we store no
state in radv_image, I don't think we have to do anything there.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reuses the existing code that calculates lod values for mip linear
filtering. We do have to disable the simplifications that apply when we know
some parts of the lod calculation won't matter for filtering purposes (due to
mip clamps etc.). For better or worse, we also always disable the lod
calculation hacks (this should mostly make a difference for cube maps):
per-pixel lod is difficult mainly because the actual texel fetch then needs
different mipmaps, which isn't a problem with lodq.
We still use an approximation for the log2 - for that reason I believe the float
part of the lod is only accurate to about 4-5 bits (and one bit less with 1d
textures actually) which is hopefully good enough (though d3d10 technically
requires 6 bits - could use quadratic interpolation instead of linear to get
8 bits or so).
Since lodq requires unclamped lod, we also have to move some sampler key
calculations to texture sampling code - even if we know we're going to access
mipmap 0 we still have to calculate lod and apply lod_bias for lodq.
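A small sketch of why the clamp shortcuts cannot be kept. The names
(struct lod_query, query_lod()) are hypothetical and base_lod stands
in for the already computed log2 of the scale factor; this is not the
llvmpipe code itself.

```c
#include <assert.h>

struct lod_query {
   float clamped;    /* lod a texel fetch would use */
   float unclamped;  /* biased lod before clamping, needed by lodq */
};

/* base_lod is assumed to be log2(rho), computed elsewhere. */
static struct lod_query query_lod(float base_lod, float lod_bias,
                                  float min_lod, float max_lod)
{
   struct lod_query q;

   assert(min_lod <= max_lod);
   q.unclamped = base_lod + lod_bias;
   q.clamped = q.unclamped;
   if (q.clamped < min_lod)
      q.clamped = min_lod;
   if (q.clamped > max_lod)
      q.clamped = max_lod;
   return q;
}
```

Even with min_lod == max_lod == 0, where filtering always ends up on
mipmap 0, the unclamped value still depends on base_lod and lod_bias,
so neither can be skipped.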
Passes piglit ARB_texture_query_lod tests (after having fixed the test).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Some DRI image properties weren't properly duplicated in the
new image. Some properties are still missing, but I'm not
certain if there was a good reason to leave them out in the first
place.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes a bug with nearest ("point") mip selection when the fractional
part of max_lod is in (0.5,1). In this case, the spec mandates that
we still select the mip level ceil(max_lod) in the clamping case. However,
MIP_POINT_PRECLAMP will clamp before the mip selection, which is wrong.
Supposedly this setting was originally copied from the closed Vulkan
driver, but as far as I can tell, closed Vulkan was actually changed back
recently :)
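For reference, a sketch of the selection the spec asks for, relative
to base level 0 and ignoring the clamp to the last available mip
level. The function names are hypothetical. The lod is clamped to
[min_lod, max_lod] first and the nearest level is chosen afterwards
as ceil(lod + 0.5) - 1.

```c
#include <assert.h>

/* ceil() for values known to be non-negative, to avoid needing libm. */
static int ceil_nonneg(float x)
{
   int i = (int)x;
   return (float)i < x ? i + 1 : i;
}

/* Nearest mip selection: clamp the lod, then round to a level. */
static int nearest_mip_level(float lod, float min_lod, float max_lod)
{
   assert(min_lod <= max_lod);
   if (lod < min_lod)
      lod = min_lod;
   if (lod > max_lod)
      lod = max_lod;

   if (lod <= 0.5f)
      return 0;
   return ceil_nonneg(lod + 0.5f) - 1;
}
```

With max_lod = 1.75 and a large lod this selects level 2, that is
ceil(max_lod), while max_lod = 1.25 selects level 1.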
Fixes dEQP-GLES3.functional.texture.mipmap.2d.max_lod.{nearest,linear}_nearest
Fixes: f7420ef5b4 ("radeonsi: enable some sampler fields to match the closed driver")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Like for cube map (array) gather, we need to round to nearest on <= VI.
Fixes tests in dEQP-GLES3.functional.shaders.texture_functions.texture.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Prevent an overflow caused by too many output variables. To limit the
scope of the issue, write to the assigned array only for the non-ES
fragment shader path, which is the only place where it's needed.
Since the function will bail with an error when output variables with
overlapping components are found, (max # of FS outputs) * 4 is an upper
limit to the space we need.
Found by address sanitizer.
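The bound can be sketched like this, with hypothetical names and an
assumed MAX_FS_OUTPUTS (this is not the linker code itself). Because
any overlap makes the function bail, every stored entry is a distinct
(location, component) pair, and there are at most MAX_FS_OUTPUTS * 4
of those.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FS_OUTPUTS 8  /* assumed limit for the sketch */

struct output_var {
   unsigned location;
   unsigned first_component;
   unsigned num_components;
};

struct assigned_slot {
   unsigned location;
   unsigned component;
};

/* Returns false on out-of-range or overlapping output components. */
static bool assign_output_components(const struct output_var *vars,
                                     size_t num_vars)
{
   struct assigned_slot assigned[MAX_FS_OUTPUTS * 4];
   size_t count = 0;

   for (size_t v = 0; v < num_vars; v++) {
      if (vars[v].location >= MAX_FS_OUTPUTS ||
          vars[v].first_component >= 4 ||
          vars[v].num_components > 4 - vars[v].first_component)
         return false;

      for (unsigned c = 0; c < vars[v].num_components; c++) {
         unsigned comp = vars[v].first_component + c;

         for (size_t i = 0; i < count; i++) {
            if (assigned[i].location == vars[v].location &&
                assigned[i].component == comp)
               return false;  /* overlapping components: bail */
         }

         /* Distinct pairs only, so this can never exceed the bound. */
         assert(count < MAX_FS_OUTPUTS * 4);
         assigned[count].location = vars[v].location;
         assigned[count].component = comp;
         count++;
      }
   }
   return true;
}
```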
Fixes dEQP-GLES3.functional.attribute_location.bind_aliasing.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Also add new define ETNA_SW_QUERY_BASE.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
This change makes etna_get_driver_query_info(..) more generic
and puts the knowledge of supported queries directly besides
the implementation.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
The Vulkan spec (1.0.61) says:
"The number of scissors used by a pipeline is still specified
by the scissorCount member of VkPipelineViewportStateCreateInfo."
So, the number of scissors is defined at pipeline creation
time and shouldn't be updated when they are set dynamically.
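In other words, the dynamic setter should only store rectangles. A
minimal sketch with hypothetical structures (not the radv ones):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SCISSORS 16  /* assumed limit for the sketch */

struct rect {
   int32_t x, y;
   uint32_t width, height;
};

struct scissor_state {
   uint32_t count;  /* owned by the bound pipeline */
   struct rect scissors[MAX_SCISSORS];
};

/* Store the rectangles; the count stays whatever the pipeline set. */
static void cmd_set_scissor(struct scissor_state *state, uint32_t first,
                            uint32_t num, const struct rect *rects)
{
   assert(first <= MAX_SCISSORS && num <= MAX_SCISSORS - first);
   for (uint32_t i = 0; i < num; i++)
      state->scissors[first + i] = rects[i];
   /* state->count is deliberately left untouched. */
}
```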
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The Vulkan spec (1.0.61) says:
"The number of viewports used by a pipeline is still specified
by the viewportCount member of VkPipelineViewportStateCreateInfo."
So, the number of viewports is defined at pipeline creation
time and shouldn't be updated when they are set dynamically.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>