The vulkan module is the final HAL. No need to export its headers
since none will import it.
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
This commit re-plumbs all of nir_loop_analyze to use nir_ssa_scalar for
all intermediate values so that we can properly handle swizzles. Even
though if conditions are required to be scalars, they may still consume
swizzles so you could have ((a.yzw < b.zzx).xz && c.xx).y == 0 as your
loop termination condition. The old code would just bail the moment it
saw its first non-zero swizzle, but we can now properly chase the scalar
from the if condition all the way to a, b, and c.
Shader-db results on Kaby Lake:
total loops in shared programs: 4388 -> 4364 (-0.55%)
loops in affected programs: 29 -> 5 (-82.76%)
helped: 29
HURT: 5
Shader-db results on Haswell:
total loops in shared programs: 4370 -> 4373 (0.07%)
loops in affected programs: 2 -> 5 (150.00%)
helped: 2
HURT: 5
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This commit reworks both get_induction_and_limit_vars() and
try_find_trip_count_vars_in_iand() to return true on success and not
modify their output parameters on failure. This makes their callers
significantly simpler.
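The calling convention described above can be sketched in isolation. The
function below is hypothetical and only illustrates the pattern: the
output parameter is written on success and left untouched on failure, so
a caller can simply bail on a false return.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: find the first element of arr greater than limit.
 * On success, store its index in *out_index and return true.
 * On failure, return false and leave *out_index untouched. */
static bool
find_first_greater(const int *arr, size_t len, int limit, size_t *out_index)
{
   for (size_t i = 0; i < len; i++) {
      if (arr[i] > limit) {
         *out_index = i;   /* only written on success */
         return true;
      }
   }
   return false;           /* *out_index keeps whatever the caller had */
}
```

A caller then needs no cleanup path: `if (!find_first_greater(...)) return;`.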
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
There are various cases in which we want to chase SSA values through ALU
ops ranging from hand-written optimizations to back-end translation
code. In all these cases, it can be very tricky to do properly because
of swizzles. This set of helpers lets you easily work with a single
component of an SSA def and chase through ALU ops safely.
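To illustrate what chasing a single component through swizzles involves,
here is a sketch on a hypothetical toy IR (toy_def and toy_scalar are
made up for this example and are not the NIR types). Each step maps the
component being tracked through the swizzle of the instruction that
produced it.

```c
#include <stddef.h>

/* Hypothetical toy IR: a value is either a leaf (named input vector) or a
 * mov that swizzles one source vector. */
struct toy_def {
   const char *name;              /* non-NULL for leaf values */
   const struct toy_def *src;     /* source of the mov, NULL for leaves */
   unsigned swizzle[4];           /* swizzle[c]: source component read for output component c */
};

/* One component of one def, in the spirit of a scalar handle. */
struct toy_scalar {
   const struct toy_def *def;
   unsigned comp;
};

/* Follow one component backwards through swizzling movs until a leaf. */
static struct toy_scalar
toy_scalar_chase(struct toy_scalar s)
{
   while (s.def->src != NULL) {
      s.comp = s.def->swizzle[s.comp];   /* remap through this swizzle */
      s.def = s.def->src;
   }
   return s;
}
```

For example, with b = a.yzwx and c = b.zzxy, chasing c.x yields b.z and
then a.w.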
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
None of the current code knows what to do with swizzles. Take the safe
option for now and bail if we see one. This does have a small shader-db
impact but it is at least safe.
Shader-db results on Kaby Lake:
total loops in shared programs: 4364 -> 4388 (0.55%)
loops in affected programs: 5 -> 29 (480.00%)
helped: 5
HURT: 29
Shader-db results on Haswell:
total loops in shared programs: 4373 -> 4370 (-0.07%)
loops in affected programs: 5 -> 2 (-60.00%)
helped: 5
HURT: 2
Fixes: 6772a17acc "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The current code assumes everything is 32-bit which is very likely true
but not guaranteed by any means. Instead, use nir_eval_const_opcode to
do the calculations in a bit-size-agnostic way. We also use the new
constant constructors to build the correct size constants.
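The reason a fixed 32-bit assumption matters can be shown with a toy
constant adder (const_iadd is hypothetical, not nir_eval_const_opcode):
the same inputs give different results depending on the bit size the
operation is evaluated at, so the bit size must travel with the constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: constants live in a 64-bit container and the
 * operation's bit size decides where the result wraps. */
static uint64_t
const_iadd(uint64_t a, uint64_t b, unsigned bit_size)
{
   assert(bit_size == 8 || bit_size == 16 || bit_size == 32 || bit_size == 64);
   uint64_t sum = a + b;
   if (bit_size == 64)
      return sum;                                   /* unsigned 64-bit add already wraps */
   uint64_t mask = (UINT64_C(1) << bit_size) - 1;
   return sum & mask;                               /* truncate to the requested width */
}
```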
Fixes: 6772a17acc "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
One issue was that the original version didn't check that swizzles
matched when comparing ALU instructions so it could end up matching
very different instructions. Using the nir_instrs_equal function from
nir_instr_set.c which we use for CSE should be much more reliable.
Another was that the loop assumes it will only run two iterations which
may not be true. If there's something which guarantees that this case
only happens for phis after ifs, it wasn't documented.
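The first issue can be illustrated with a hypothetical toy ALU
instruction (toy_alu is made up here, not nir_alu_instr): two
instructions with the same opcode and the same sources still compute
different values if their swizzles differ, so an equality check must
compare the swizzles too.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy ALU instruction with two vector sources. */
struct toy_alu {
   int op;
   const void *src[2];            /* identity of each source def */
   unsigned char swizzle[2][4];   /* per-source swizzle */
   unsigned num_components;       /* assumed to be 1..4 */
};

static bool
toy_alu_equal(const struct toy_alu *a, const struct toy_alu *b)
{
   if (a->op != b->op || a->num_components != b->num_components)
      return false;
   for (unsigned s = 0; s < 2; s++) {
      if (a->src[s] != b->src[s])
         return false;
      /* Same source with a different swizzle reads different data. */
      if (memcmp(a->swizzle[s], b->swizzle[s], a->num_components) != 0)
         return false;
   }
   return true;
}
```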
Fixes: 9e6b39e1d5 "nir: detect more induction variables"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Now that we have the nir_const_value_as_* helpers, every one of these
functions is effectively the same except for the suffix they use so we
can easily define them with a repeated macro. This also means that
they're inline and the fact that the nir_src is being passed by-value
should no longer really hurt anything.
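A sketch of the repeated-macro approach, using hypothetical toy types
(toy_const_value and toy_src stand in for the real NIR types): one macro
stamps out a small inline accessor per suffix, each forwarding to the
matching per-type conversion helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a constant value union. */
typedef union {
   bool b;
   int64_t i64;
   uint64_t u64;
   double f64;
} toy_const_value;

/* Hypothetical per-type conversion helpers. */
static inline bool     toy_const_value_as_bool(toy_const_value v)  { return v.b; }
static inline int64_t  toy_const_value_as_int(toy_const_value v)   { return v.i64; }
static inline uint64_t toy_const_value_as_uint(toy_const_value v)  { return v.u64; }
static inline double   toy_const_value_as_float(toy_const_value v) { return v.f64; }

/* Hypothetical source wrapper, passed by value. */
typedef struct {
   toy_const_value value;
} toy_src;

/* Each expansion differs only in the return type and the suffix. */
#define DEFINE_TOY_SRC_AS(type, suffix)                  \
   static inline type                                     \
   toy_src_as_##suffix(toy_src src)                       \
   {                                                      \
      return toy_const_value_as_##suffix(src.value);      \
   }

DEFINE_TOY_SRC_AS(bool, bool)
DEFINE_TOY_SRC_AS(int64_t, int)
DEFINE_TOY_SRC_AS(uint64_t, uint)
DEFINE_TOY_SRC_AS(double, float)

#undef DEFINE_TOY_SRC_AS
```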
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
COMPLETED_LIST is always empty. We only need one list.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Now that virgl_transfer_queue_is_queued does not search
COMPLETED_LIST, we don't need to move transfers to that list.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Search only the pending list and return immediately on the first
hit.
When the transfer queue was introduced, the function was used to
deal with
write transfer -> draw -> write transfer
sequence. It was used to tell if the second transfer intersects
with the first transfer. If yes, the transfer queue avoided
reordering the second transfer to before the draw (by flushing) in
case the draw uses the transferred data.
With the recent changes to the transfer code, the function is used
to deal with
write transfer -> readback transfer
We want to avoid reordering the readback transfer to before the
first transfer (also by flushing).
In the old code, we needed to track the completed transfers as well
to avoid reordering. But in the new code, a readback transfer is
guaranteed to see the data from the completed transfers (in other
words, it cannot be reordered to before the already completed
transfers). We don't need to search the COMPLETED_LIST.
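The new lookup can be sketched with a hypothetical singly linked pending
list and a 1D byte-range overlap test (toy_transfer and its fields are
made up for this example): only the pending list is walked, and the walk
stops at the first intersecting transfer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal transfer record on a singly linked pending list. */
struct toy_transfer {
   const void *resource;          /* resource being transferred */
   unsigned offset, size;         /* byte range [offset, offset + size) */
   struct toy_transfer *next;
};

static bool
toy_ranges_intersect(const struct toy_transfer *a, const struct toy_transfer *b)
{
   return a->resource == b->resource &&
          a->offset < b->offset + b->size &&
          b->offset < a->offset + a->size;
}

/* Completed transfers are never consulted. */
static bool
toy_transfer_is_queued(const struct toy_transfer *pending,
                       const struct toy_transfer *xfer)
{
   for (const struct toy_transfer *t = pending; t != NULL; t = t->next) {
      if (toy_ranges_intersect(t, xfer))
         return true;             /* first hit is enough */
   }
   return false;
}
```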
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
We never use transfers_intersect with textures, but fix it anyway to
avoid confusion.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Rewrite the function and check z/depth more carefully. We
intentionally avoid u_box_test_intersection_2d because it returns
true when two boxes touch but do not intersect and can be confusing.
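A minimal sketch of a box test that treats touching boxes as
non-intersecting and also checks z/depth, assuming a hypothetical box
layout (toy_box, loosely modeled on an x/y/z plus width/height/depth
box). Each axis is treated as a half-open interval.

```c
#include <stdbool.h>

/* Hypothetical box layout. */
struct toy_box {
   int x, y, z;
   int width, height, depth;
};

/* Half-open overlap on one axis: [a0, a0 + alen) vs [b0, b0 + blen).
 * Boxes that merely touch (a0 + alen == b0) do not overlap. */
static bool
toy_axis_overlaps(int a0, int alen, int b0, int blen)
{
   return a0 < b0 + blen && b0 < a0 + alen;
}

static bool
toy_boxes_intersect(const struct toy_box *a, const struct toy_box *b)
{
   return toy_axis_overlaps(a->x, a->width,  b->x, b->width) &&
          toy_axis_overlaps(a->y, a->height, b->y, b->height) &&
          toy_axis_overlaps(a->z, a->depth,  b->z, b->depth);   /* z/depth too */
}
```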
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
It only works if there are no color and no Z exports.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
The highest used index determines the stride for shader outputs in shaders
that use LDS or memory for outputs.
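As a worked illustration, assume a hypothetical bitmask where bit i
means output slot i is written, and a slot size of 16 bytes (one vec4 of
32-bit values; an assumption for this example). The stride is then
(highest used index + 1) * 16, so unused slots below the highest one
still cost space.

```c
#include <stdint.h>

/* Hypothetical: bit i of used_mask is set when output slot i is written;
 * each slot is assumed to occupy 16 bytes. */
static unsigned
toy_output_stride_bytes(uint64_t used_mask)
{
   unsigned highest_plus_one = 0;
   while (used_mask) {            /* shift out bits to find the highest set bit */
      used_mask >>= 1;
      highest_plus_one++;
   }
   return highest_plus_one * 16u;
}
```

For slots 0, 1 and 3 (mask 0b1011) this gives 4 * 16 = 64 bytes, even
though slot 2 is unused.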
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
This can decrease LDS and/or memory usage for shader outputs when geometry
shaders or tessellation is used.
Only PS inputs support higher indices and those aren't eliminated by
kill_outputs.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
- don't pass it via a parameter if it can be derived from other parameters
- set shader_type for ac_rtld_open
- use enum pipe_shader_type instead of unsigned
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Useful for formats that would work with the same driver code path as
RGBA8 UNORM but that don't meet the util_format_is_rgba8_variant
criteria due to a smaller channel count.
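A sketch of such a check against a hypothetical, simplified format
descriptor (toy_format_desc is made up and is not the util_format
descriptor): accept one to four channels, each 8-bit unsigned
normalized, i.e. the RGBA8 UNORM layout possibly with fewer channels.

```c
#include <stdbool.h>

/* Hypothetical, simplified format descriptor. */
struct toy_format_desc {
   unsigned nr_channels;
   unsigned channel_bits[4];
   bool     channel_unorm[4];     /* unsigned normalized channel */
};

static bool
toy_format_is_rgba8_like(const struct toy_format_desc *d)
{
   if (d->nr_channels == 0 || d->nr_channels > 4)
      return false;
   for (unsigned i = 0; i < d->nr_channels; i++) {
      if (d->channel_bits[i] != 8 || !d->channel_unorm[i])
         return false;            /* wrong width or not UNORM */
   }
   return true;
}
```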
v2: Use simpler logic (suggested by Iago).
v3: Fix spelling error. boolean->bool (thank you airlied).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This gives more flexibility than the normal store_deref/store_output
versions (particularly, it allows us to abuse the type system in awful
ways, which is necessary for efficient format conversion in blend
shaders.)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Fix si_vid_is_format_supported to expose support
for 10-bit VP9 decode using P016 format. Without
this change, 10-bit decode will be exposed only
for HEVC even though newer hardware supports
10-bit decode for VP9.
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
We already have nir_imm_float16 and nir_imm_vec4; let's add the ability
to easily make immediate fp16 vectors as well, now that fp16 support is
maturing in NIR/GLSL.
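For reference, an fp16 vector immediate ultimately needs each component
encoded as IEEE 754 binary16 bits. The conversion below is a simplified,
hypothetical sketch for illustration only: it truncates the mantissa
instead of rounding to nearest even, flushes values below the normal
half range to signed zero and does not preserve NaN payloads.

```c
#include <stdint.h>
#include <string.h>

/* Simplified float32 -> float16 bit conversion (see caveats above). */
static uint16_t
toy_float_to_half_bits(float f)
{
   uint32_t x;
   memcpy(&x, &f, sizeof x);                   /* reinterpret bits safely */
   uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
   int32_t exp = (int32_t)((x >> 23) & 0xffu);
   uint32_t mant = x & 0x7fffffu;

   if (exp == 0xff)                            /* Inf or NaN */
      return (uint16_t)(sign | 0x7c00u | (mant ? 0x200u : 0u));
   exp = exp - 127 + 15;                       /* rebias the exponent */
   if (exp >= 0x1f)
      return (uint16_t)(sign | 0x7c00u);       /* overflow -> infinity */
   if (exp <= 0)
      return sign;                             /* underflow -> signed zero */
   return (uint16_t)(sign | (uint32_t)exp << 10 | mant >> 13);
}

/* Hypothetical vec4 packer: four fp16 components. */
static void
toy_make_vec4_16(uint16_t out[4], float x, float y, float z, float w)
{
   out[0] = toy_float_to_half_bits(x);
   out[1] = toy_float_to_half_bits(y);
   out[2] = toy_float_to_half_bits(z);
   out[3] = toy_float_to_half_bits(w);
}
```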
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>