The prim discard compute shader bakes InstanceID into the output index buffer.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If a prim discard compute shader hasn't finished compilation, we don't want
to any shader.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The primitive discard compute shader will get the position output this way.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This commit adjusts the capabilities returned
by the SWR driver and the documentation to correctly
report the following extensions:
GL_ARB_texture_query_lod, GL_ARB_texture_cube_map_array,
GL_ARB_gpu_shader_fp64, GL_ARB_texture_gather,
GL_ARB_vertex_attrib_64bit.
Reviewed-by: Alok Hota <alok.hota@intel.com>
The output optimization is used only in the back buffer case.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
For newcomers to GitLab, it is not evident that it is better to press
the "Resolve Discussion" button when you update your branch to address
feedback.
v2:
* Fix several grammar nits, reorder, use new corrected text (Connor
Abbott)
* Use "reviewers" instead of "reviewer" (Eric Engestrom)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Transform feedback draws get the number of vertices from the transform
feedback object. In draw, we compute this as the number of bytes written
divided by the stride. However, it is apparently possible to end up with
a stride of 0 there (not entirely sure this can happen with GL), probably
when nothing was ever actually written, so no stride was set. Avoid the
division by zero by setting the count to 0 in that case.
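
A minimal sketch of the guard described above. The helper name
xfb_vertex_count() and its parameters are hypothetical and are not taken
from the draw module; only the "zero stride means zero vertices" rule
comes from this message.

```c
#include <stdint.h>

/* Hypothetical helper (not the actual draw code): derive the vertex count
 * of a transform feedback draw from the bytes written and the stride. */
static inline uint32_t
xfb_vertex_count(uint32_t bytes_written, uint32_t stride)
{
   /* A stride of 0 most likely means nothing was ever written, so draw
    * nothing instead of dividing by zero. */
   if (stride == 0)
      return 0;

   return bytes_written / stride;
}
```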
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Use fstat() only to pre-allocate a big enough buffer.
This fixes a race: if the file grows between fstat() and read(), we
would miss the end of the file, and if the file shrinks, read() would
just fail.
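
The approach can be sketched roughly as follows. This is an illustration
under assumptions, not the actual os_read_file() code: the name
read_whole_fd(), the 4096-byte fallback and the doubling growth policy are
all made up for the example, and EINTR handling is left out for brevity.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: returns a malloc'ed, NUL-terminated buffer holding
 * everything readable from fd, or NULL on error. */
static char *
read_whole_fd(int fd, size_t *size_out)
{
   struct stat st;
   size_t cap = 4096; /* assumed fallback when fstat() gives no size */

   /* fstat() is only a hint for the initial allocation. */
   if (fstat(fd, &st) == 0 && st.st_size > 0)
      cap = (size_t)st.st_size + 1;

   char *buf = malloc(cap);
   if (!buf)
      return NULL;

   size_t len = 0;
   for (;;) {
      /* Keep one byte free for the terminator; grow when full. */
      if (len + 1 >= cap) {
         char *bigger = realloc(buf, cap * 2);
         if (!bigger) {
            free(buf);
            return NULL;
         }
         buf = bigger;
         cap *= 2;
      }

      ssize_t n = read(fd, buf + len, cap - len - 1);
      if (n < 0) {
         free(buf);
         return NULL;
      }
      if (n == 0)
         break; /* EOF decides where the file ends, not fstat() */
      len += (size_t)n;
   }

   buf[len] = '\0';
   if (size_out)
      *size_out = len;
   return buf;
}
```

Because the loop only stops at EOF, a file that grew after fstat() is
still read completely, and a file that shrank simply yields fewer bytes.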
Fixes: 316964709e "util: add os_read_file() helper"
Reported-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This pass moves instructions around and adds control-flow in the
middle of blocks. We need to use nir_foreach_instr_safe to ensure that
we iterate over instructions correctly anyway.
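
To illustrate why a "_safe" iterator matters here, the sketch below uses a
toy singly linked list rather than NIR itself. struct node,
foreach_node_safe() and detach_all_and_sum() are hypothetical names for
this example only; the point is that the next pointer is captured before
the loop body runs, so the body may move the current element elsewhere.

```c
#include <stddef.h>

/* Toy stand-in for an instruction list (not a NIR type). */
struct node {
   int value;
   struct node *next;
};

/* Hypothetical macro: remembers the successor before the body executes,
 * so the body is free to unlink or move the current node. */
#define foreach_node_safe(n, nxt, head)                        \
   for (struct node *n = (head), *nxt = n ? n->next : NULL;    \
        n != NULL;                                             \
        n = nxt, nxt = n ? n->next : NULL)

static int
detach_all_and_sum(struct node *head)
{
   int sum = 0;

   foreach_node_safe(n, nxt, head) {
      sum += n->value;
      /* "Move" the node away. A plain n = n->next loop would stop after
       * the first element because the link is gone. */
      n->next = NULL;
   }

   return sum;
}
```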
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3bd5457641 ("nir: Add a lowering pass for non-uniform resource access")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For a block with a contiguous chunk of 32 vars that don't need updating,
this lets us skip 32 vars at a time. Also, by using bitscan, we only
iterate for each set bit rather than testing them all one at a time.
Looking at perf (with -O0, which is unfortunately necessary to get
reasonable back-traces), this seems to cut about 50-60% of the time
spent in compute_start_end(), which is itself about 4-6% of the
run-time. In the real world, with a release driver build, this cuts
1.34% off a full shader-db run. (I ran shader-db 5 times in each
configuration.)
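
The word-at-a-time idea can be sketched as below. This is not the code
from the patch: collect_set_bits() and its parameters are hypothetical,
and __builtin_ctz() is the GCC/Clang builtin used here as a stand-in for
whatever bitscan helper the driver uses.

```c
#include <stdint.h>

/* Hypothetical sketch: write the index of every set bit in a bitset of
 * 32-bit words to out[], and return how many were written. */
static unsigned
collect_set_bits(const uint32_t *words, unsigned num_words,
                 unsigned *out, unsigned max_out)
{
   unsigned count = 0;

   for (unsigned w = 0; w < num_words; w++) {
      uint32_t bits = words[w];

      /* A zero word means 32 vars with nothing to do; they are skipped
       * with a single comparison. */
      while (bits != 0 && count < max_out) {
         unsigned b = (unsigned)__builtin_ctz(bits); /* lowest set bit */
         out[count++] = w * 32 + b;
         bits &= bits - 1; /* clear the bit that was just handled */
      }
   }

   return count;
}
```

The inner loop runs once per set bit, so sparse words cost far less than
testing all 32 positions individually.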
Reviewed-by: Matt Turner <mattst88@gmail.com>
Otherwise, we get an effectively random spill reg because we no longer
have the information from RA to guide us. Also, a completely clean
graph has undefined data in in_stack which is used for choosing the
spill reg so it really is non-deterministic.
Fixes: e99081e76d "intel/fs/ra: Spill without destroying the..."
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This line is no longer relevant now that booleans are 1-bit, and in fact
causes issues (infinite progress loop between algebraic optimizations
and copy prop) with constant vector masks.
No shader-db changes on Intel platforms (Jason).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This commit adds a bunch of new load/store opcodes, largely related to
OpenCL, and adjusts the names of existing opcodes to be more
uniform. The immediate effect is that compute shaders are substantially
easier to interpret now.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Midgard ALU features two types of constants: embedded constants (128-bit
chunk, zero/one per schedule bundle) and inline constants (16-bit
splattered into the op, second source if present). Inline constants are
much more efficient from a space and scheduling freedom standpoint, so
it's desirable to inline when possible. Now that integer ops are well
understood and in use, we enable inlining of integer constants in
addition to floats (which have been inlined since forever).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
By default, the "normal" output modifier is set on ALU ops. This is the
correct default for float outputs -- for floats, it preserves the semantic
value. Unfortunately, when used with integers, it does not preserve the
bitstream encoding, causing misbehaviour. (It's an open question what
happens when `normal` is used with integers -- does it apply some other
transformation? or does it do floating point normalization/etc on the
ints as if they were floats?).
Instead, we default to the "clamp to integer" output modifier for
ops writing integers. Semantically, this makes sense (clamping an
integer to the nearest integer is the identity function). In the
hardware with an integer opcode, this is the actual "normal".
This fixes numerous sporadic and sometimes bizarre bugs relating to
integers, especially integer moves. With this in place, we no longer
care about the types involved; it's just bits on the wire again.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
From Gallium (and our) perspective, the stride of a BO is arbitrary. For
internal buffers, we can make it something nice, but for imported linear
buffers (e.g. EGL clients), we don't always have that luxury. To cope,
we calculate the expected stride of a texture, compare it to the BO's
actual reported stride, and if they differ, set the latter as a custom
stride.
Fixes rendering of windows not on tile boundaries (noticeable in Weston
with es2gears_wayland, for instance). Also, this should fix stride
issues with buffer reloading.
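
The comparison can be sketched as below. Everything here is hypothetical:
needs_custom_stride() is not a driver function, and the expected stride is
assumed to be simply width times bytes per pixel, whereas the real driver
may apply additional alignment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: decide whether a BO's reported stride has to be
 * programmed as a custom stride in the texture descriptor. */
static bool
needs_custom_stride(uint32_t width, uint32_t bytes_per_pixel,
                    uint32_t bo_stride, uint32_t *custom_stride)
{
   /* Assumption: the "natural" stride is a tightly packed row. */
   uint32_t expected = width * bytes_per_pixel;

   if (bo_stride == expected)
      return false;

   /* Imported linear buffers may use any stride; pass it through. */
   *custom_stride = bo_stride;
   return true;
}
```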
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
With a special flag, texture descriptors can include custom stride(s).
We haven't seen a case of this used for mipmaps/cubemaps, so it's not
clear how that will be encoded, but this dumps correctly for single
one-level 2D textures.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
One field was not dumped for some reason. It's observed to be 0, but
it's still good to have it available.
Also, extra fields might be snuck into the bitmaps array (it's
variable-length and sits at the end), and we want to guard against that
possibility, so we dump a little more.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Dave Airlie <airlied@redhat.com>
We already use GFX9 and I don't want us to have confusing naming
in the driver. GFXn naming is better from the driver perspective,
because it's the real version of the gfx portion of the hw. Also,
CIK means Bonaire-Kaveri-Kabini; it doesn't mean CI.
It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have
nothing to do with GFXn and they have their own version numbers.
Handle PIPE_TRANSFER_DONT_BLOCK and PIPE_TRANSFER_MAP_DIRECTLY.
Make virgl_resource_transfer_prepare return an enum instead of a
bool for extensibility (e.g., instruct the callers to map
differently).
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
virgl_resource_transfer_prepare should be called before mapping to
prepare the resource. It does flush, readback, and wait as needed.
virgl_res_needs_flush and virgl_res_needs_readback become internal
helpers to the new function.
There should be no externally visible change.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>