Commit graph

92185 commits

Author SHA1 Message Date
Anuj Phogat
2c7e1165fa anv/gen7_pipeline: Use MSDISPMODE_PERSAMPLE for non-multisampled fbo
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 13:20:34 -07:00
Anuj Phogat
f75a93f610 anv/blorp: Handle zero width/height blits in blorp_copy()
V2: Move the check from copy_buffer_to_image() to blorp_copy(). (Nanley)

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-04 13:20:34 -07:00
Anuj Phogat
2c78b2ec90 intel/isl: Add an assert to check zero width/height surface
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 13:20:34 -07:00
Leo Liu
0e85ff3355 st/omx/dec/h265: add scaling list data
Specified by subclause 7.3.4

v2: get the loop optimized

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-10-04 11:09:59 -04:00
Leo Liu
ffb863fd2c st/omx/dec/h265: fix the skip for before and after list
For reference picture sets, there are cases where an rps will not
be used. Once we detect the unused flag in the encoded bitstream, we should
not add this rps to any list; otherwise we pass an incorrect reference
and skip the correct rps.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-04 11:09:59 -04:00
Leo Liu
c50b68e6a8 st/omx/dec/h265: set the default reference picture set for reference
This fixes corruption for frames that have only one short term ref
picture set. Previously we set a NULL rps for this case, which caused an
incorrect reference to be taken. Instead we should take that only short
term set as the reference.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-04 11:09:59 -04:00
Leo Liu
091aae0265 st/omx/dec/h265: decoder size should follow from sps
The video size from the format container is not always compatible with
the size from the codec bitstream. The HW decoder should take the size
information from the bitstream; otherwise corruption appears with clips
that have different size info in the bitstream and the format container.

So we are passing width(height)_in_samples from sequence parameter
set to video decoder.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-04 11:09:59 -04:00
Leo Liu
2371119db9 st/omx/dec/h265: increase dpb max size to 32
For clips with a frame delta poc over 16

Signed-off-by: Leo Liu <leo.liu@amd.com>
2016-10-04 11:09:59 -04:00
Eric Engestrom
66f85c3824 nir/spirv: Remove a duplicate spirv2nir from .gitignore
This reverts commit fc03ecfeaf.

Chad had already pushed the same change between me posting the patch and Jason
pushing it: 44bcf1ffcc (".gitignore: Ignore src/compiler/spirv2nir")

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 07:43:15 -07:00
Nicolai Hähnle
8b1f9fd3b3 radeonsi: optionally run the LLVM IR verifier pass
This is enabled automatically if shader printing is enabled, or separately
by R600_DEBUG=checkir. Catch malformed IR before it crashes in a later
pass.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:33 +02:00
Nicolai Hähnle
1e9476e8c5 gallium/radeon: fix argument type of llvm.{cttz,ctlz}.i32 intrinsics
Caught by R600_DEBUG=checkir (next commit).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:28 +02:00
Nicolai Hähnle
1b6fb88ab2 gallium/radeon: unify the creation of basic blocks
This changes the order of basic blocks to be equal to the order of code in the
original TGSI, which is nice for making sense of shader dumps.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:25 +02:00
Nicolai Hähnle
d377f4c1ca gallium/radeon: merge branch and loop flow control stacks
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:21 +02:00
Nicolai Hähnle
b0d50e157d gallium/radeon: simplify if/else/endif blocks
In particular, we no longer emit an else block when there is no ELSE
instruction.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:18 +02:00
Nicolai Hähnle
89e9de2ea6 gallium/radeon: label basic blocks by the corresponding TGSI pc
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:15 +02:00
Nicolai Hähnle
6f87d7a146 gallium/radeon: cleanup and fix branch emits
Some of the existing code is needlessly complicated. The basic principle
should be: control-flow opcodes emit branches to properly terminate the
current block, _unless_ the current block already has a terminator (which
happens if and only if there was a BRK or CONT).

This also fixes a bug where multiple terminators were created in a block.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97887
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:10 +02:00
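
The principle stated in the commit above (emit a branch to close the current block unless it already has a terminator) can be illustrated with a toy model. This is only a sketch: struct block and emit_default_branch are hypothetical names invented for this example and are not the LLVM or gallium/radeon API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a basic block; not an LLVM type. */
struct block {
    bool has_terminator; /* set once a BRK/CONT or a branch closed the block */
    int branch_target;   /* label of the block we branch to, -1 if none */
};

/* Close the current block with a branch to `target`, unless an earlier
 * BRK or CONT already terminated it. Returns true if a branch was emitted.
 * Skipping the emit is what prevents a second terminator in one block. */
static bool emit_default_branch(struct block *cur, int target)
{
    if (cur->has_terminator)
        return false;
    cur->branch_target = target;
    cur->has_terminator = true;
    return true;
}
```

Calling the helper twice on the same block leaves the first branch in place, which models the "multiple terminators" bug being avoided.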
Nicolai Hähnle
dfc1afda83 winsys/radeon: add buffer_get_reloc_offset
Really fix the bug that was supposed to be fixed by commits 3e7cced4b and
a48bf02d: even when virtual addresses are used, the legacy relocation-based
method with offsets relative to the kernel's buffer object is used for
video submissions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97969
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:37:44 +02:00
Marek Olšák
71a5cf6f3b radeonsi: don't declare LDS in PS when ds_bpermute is used
I guess this is not needed because dead code elimination removes
the declaration.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:16 +02:00
Marek Olšák
b2a694f079 radeonsi: use DDX/DDY directly in si_llvm_emit_ddxy_interp
We can finally do this, because the opcodes are scalar now.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:14 +02:00
Marek Olšák
b57aef8033 radeonsi: simplify si_llvm_emit_ddxy
si_llvm_emit_ddxy is called once per element, so we don't have to generate
code for 4 elements at once.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:12 +02:00
Marek Olšák
046c199c3a radeonsi: don't call build_gep0 in si_llvm_emit_ddxy on VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:11 +02:00
Marek Olšák
bcc55e1f32 radeonsi: use a helper function for BuildGEP(0, x)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:10 +02:00
Marek Olšák
e20f7142a3 radeonsi: remove obsolete shader definitions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:09 +02:00
Marek Olšák
8c6ea5a6ff radeonsi: remove unnecessary #includes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:07 +02:00
Marek Olšák
3388f27d84 radeonsi: clean up lucky #include dependencies
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:06 +02:00
Marek Olšák
53d2c8f00f radeonsi: don't re-create shader PM4 states after scratch buffer update
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:05 +02:00
Marek Olšák
6c01684393 gallium/radeon: move r600_common_context::texture_buffers to r600g
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:03 +02:00
Marek Olšák
7ce19d9014 radeonsi: don't set sampler buffer offsets in create_sampler_view
do it at bind time, so that pipe_sampler_view is immutable with regard to
buffer reallocations and we don't have to remember all existing buffer
views.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:01 +02:00
Marek Olšák
7e6428e0a8 radeonsi: optimize si_invalidate_buffer based on bind_history
Just enclose each section with: if (rbuffer->bind_history & PIPE_BIND_...)

Bioshock Infinite: +1% performance

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:00 +02:00
Marek Olšák
e43bd861e8 radeonsi: track buffer bind history
similar to gl_buffer_object::UsageHistory

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:58 +02:00
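
The two bind-history commits above describe a bitmask that records how a buffer has ever been bound, so that invalidation can skip binding types that were never used. A minimal sketch of that idea follows. Only the field name bind_history comes from the commit messages; the BIND_* flags, the struct, and both functions are assumptions made up for illustration.

```c
#include <stdint.h>

/* Hypothetical bind flags standing in for the PIPE_BIND_... values. */
enum {
    BIND_VERTEX   = 1u << 0,
    BIND_CONSTANT = 1u << 1,
    BIND_SAMPLER  = 1u << 2
};

struct buffer {
    uint32_t bind_history; /* OR of every bind flag this buffer has seen */
};

/* Record a bind; bits are only ever added, never cleared. */
static void buffer_note_bind(struct buffer *buf, uint32_t bind_flag)
{
    buf->bind_history |= bind_flag;
}

/* Walk only the binding sections the buffer was ever bound to.
 * Returns how many sections had to be scanned. */
static unsigned buffer_invalidate(const struct buffer *buf)
{
    unsigned sections_scanned = 0;

    if (buf->bind_history & BIND_VERTEX)
        sections_scanned++; /* would rescan vertex buffer bindings here */
    if (buf->bind_history & BIND_CONSTANT)
        sections_scanned++; /* would rescan constant buffer bindings here */
    if (buf->bind_history & BIND_SAMPLER)
        sections_scanned++; /* would rescan sampler view bindings here */

    return sections_scanned;
}
```

A buffer that was only ever bound as a sampler view costs one section scan instead of three.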
Marek Olšák
b523a9ddc5 radeonsi: drop support for NULL sampler views
not used anymore. It was used when the polygon stipple texture was constant.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:57 +02:00
Marek Olšák
82e51e8188 radeonsi: separate IA_MULTI_VGT_PARAM and VGT_PRIMITIVE_TYPE emission
We want to emit IA_MULTI_VGT_PARAM less often because it's a context reg.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:56 +02:00
Marek Olšák
3ee9be42ac radeonsi: move VGT_LS_HS_CONFIG to derived tess_state
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:53 +02:00
Marek Olšák
f92113c5a1 radeonsi: don't check PIPE_BARRIER_MAPPED_BUFFER
Caches are always flushed at IB boundary.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:51 +02:00
Marek Olšák
ca1d1e0e19 radeonsi: parse SURFACE_SYNC correctly on CIK-VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:49 +02:00
Marek Olšák
37065b0583 gallium/radeon: inline r600_context_add_resource_size
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:47 +02:00
James Legg
e33f31d61f radeonsi: Fix primitive restart when index changes
If primitive restart is enabled for two consecutive draws which use
different primitive restart indices, then the first draw's primitive
restart index was incorrectly used for the second draw.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98025

Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 15:57:37 +02:00
Timothy Arceri
338d3c0b0f spirv: replace assert() with unreachable()
This fixes an uninitialized warning for is_vertex_input.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 22:33:51 +11:00
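
Why replacing assert() with unreachable() can silence an uninitialized-variable warning is sketched below. When NDEBUG is defined, assert() expands to nothing, so the compiler sees a path on which the variable is never assigned. The UNREACHABLE macro, the enum, and the function are assumptions written for this example (only the variable name is_vertex_input comes from the commit), and __builtin_unreachable() assumes GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed macro for illustration: assert in debug builds, and tell the
 * compiler the path cannot be taken in all builds (GCC/Clang builtin). */
#define UNREACHABLE(msg) \
    do { assert(!msg); __builtin_unreachable(); } while (0)

/* Hypothetical enum; the real code switches on something else. */
enum stage { STAGE_VERTEX, STAGE_FRAGMENT };

static bool stage_is_vertex_input(enum stage s)
{
    bool is_vertex_input;

    switch (s) {
    case STAGE_VERTEX:
        is_vertex_input = true;
        break;
    case STAGE_FRAGMENT:
        is_vertex_input = false;
        break;
    default:
        /* With a plain assert() and NDEBUG, control would fall out of the
         * switch here and return an uninitialized value. */
        UNREACHABLE("invalid stage");
    }

    return is_vertex_input;
}
```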
Timothy Arceri
298c2e03d7 intel: use the correct format specifier for printing uint64_t
Fixes a bunch of warnings in 32-bit builds.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-10-04 22:32:57 +11:00
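
For context on the format-specifier commit above: uint64_t is unsigned long on typical 64-bit Linux targets but unsigned long long on 32-bit ones, so a hard-coded %lu or %llu warns on one of them. The portable spelling uses the PRIu64 macro from <inttypes.h>. The helper name format_u64 is an assumption for this sketch and does not appear in the commit.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: print a uint64_t into buf with a specifier that
 * matches the type on both 32-bit and 64-bit builds.
 * Returns the number of characters snprintf wanted to write. */
static int format_u64(char *buf, size_t len, uint64_t value)
{
    /* PRIu64 expands to the right length modifier for this platform. */
    return snprintf(buf, len, "%" PRIu64, value);
}
```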
Matt Whitlock
42ed8a6c9c gallium/winsys: replace calls to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:09:03 +02:00
Matt Whitlock
ac6064f918 st/xa: replace call to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:09:01 +02:00
Matt Whitlock
0c060f691c st/dri: replace calls to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:08:58 +02:00
Matt Whitlock
5d0069eca2 gallium/auxiliary: replace call to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:08:55 +02:00
Matt Whitlock
c8fd7d060d egl/android: replace call to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:08:50 +02:00
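
The dup(2) replacement described in this series of commits can be sketched as follows. This is a minimal illustration, not the code from these patches: the helper name dup_cloexec and the minimum descriptor number 3 are assumptions chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L /* exposes F_DUPFD_CLOEXEC */
#include <fcntl.h>
#include <unistd.h> /* close(), for callers releasing the duplicate */

/* Hypothetical helper: duplicate fd with close-on-exec already set. */
static int dup_cloexec(int fd)
{
    /* A plain dup(fd) returns a descriptor that child processes inherit
     * across exec. F_DUPFD_CLOEXEC picks the lowest free descriptor that
     * is >= 3 (an assumed floor) and sets FD_CLOEXEC in the same call.
     * Returns -1 and sets errno on failure. */
    return fcntl(fd, F_DUPFD_CLOEXEC, 3);
}
```

Setting FD_CLOEXEC afterwards with a separate F_SETFD call would leave a window in which another thread could fork and exec; doing it in one fcntl() call closes that window.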
Tapani Pälli
387e0af0b4 intel: fix compilation warning on gen_get_device_info
(warning: 'const' type qualifier on return type has no effect)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-10-04 07:38:45 +03:00
Kenneth Graunke
9d6ca7c3d0 i965: Only emit 1 viewport when possible.
In core profile, we support up to 16 viewports.  However, in the
majority of cases, only 1 of them is actually used - we only need
the others if the last shader stage prior to the rasterizer writes
gl_ViewportIndex.

Processing all 16 viewports adds additional CPU overhead, which hurts
CPU-intensive workloads such as Glamor.  This meant that switching to
core profile actually penalized Glamor to an extent, which is
unfortunate.

This patch tracks the number of relevant viewports, switching between
1 and ctx->Const.MaxViewports if gl_ViewportIndex is written.  A new
BRW_NEW_VIEWPORT_COUNT flag tracks this.  This could mean re-emitting
viewport state when switching, but hopefully this is offset by doing
1/16th of the work in the common case.  The new flag is also lighter
weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case.

According to Eric Anholt, x11perf -copypixwin10 performance improves by
11.5094% +/- 3.10841% (n=10) on his Skylake.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-03 18:41:10 -07:00
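
The viewport-count tracking described in the commit above can be modelled roughly as below. This is a sketch under assumptions: struct viewport_state and update_viewport_count are hypothetical names, and the count_dirty field merely stands in for the role the commit gives to the BRW_NEW_VIEWPORT_COUNT flag.

```c
#include <stdbool.h>

/* Hypothetical state; not the i965 driver's context structure. */
struct viewport_state {
    unsigned count;   /* number of viewports that need to be emitted */
    bool count_dirty; /* stands in for a "viewport count changed" flag */
};

/* Use a single viewport unless the last pre-rasterizer shader stage
 * writes gl_ViewportIndex, in which case all of them are relevant.
 * The dirty flag is raised only when the count actually changes. */
static void update_viewport_count(struct viewport_state *st,
                                  bool writes_viewport_index,
                                  unsigned max_viewports)
{
    unsigned n = writes_viewport_index ? max_viewports : 1;

    if (n != st->count) {
        st->count = n;
        st->count_dirty = true;
    }
}
```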
Dave Airlie
7eb7684818 spirv: translate cull distance semantic.
This just translates to the correct cull distance slot.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-04 10:16:23 +10:00
Dave Airlie
bd0157d542 compiler: add printable values for cull distance varyings.
We need these for spir-v/nir shaders.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-04 10:15:23 +10:00
Jason Ekstrand
6ffbfc760d nir/spirv/cfg: Use a nop intrinsic for tagging the ends of blocks
Previously, we were saving off the last nir_block in a vtn_block before
moving on so that we could find the nir_block again when it came time to
handle phi sources.  Unfortunately, NIR's control flow modification code is
inconsistent when it comes to how it splits blocks so the block pointer we
saved off may point to a block somewhere else in the shader by the time we
get around to handling phi sources.  In order to get around this, we insert
a nop instruction and use that as the logical end of our block.  Since the
control flow manipulation code respects instructions, the nop will keep
its place like any other instruction and we can easily find the end of our
block when we need it.

This fixes a bug triggered by a couple of vkQuake shaders.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97233
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-03 16:17:12 -07:00
Jason Ekstrand
7697b4b98b nir: Add a nop intrinsic
This intrinsic has no destination, no sources, no variables, and can be
eliminated.  In other words, it does nothing and will always get deleted by
dead code elimination.  However, it does provide a quick-and-easy way to
temporarily tag a particular location in a NIR shader.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-10-03 16:17:12 -07:00