Commit graph

85275 commits

Author SHA1 Message Date
Nicolai Hähnle
1e9476e8c5 gallium/radeon: fix argument type of llvm.{cttz,ctlz}.i32 intrinsics
Caught by R600_DEBUG=checkir (next commit).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:28 +02:00
Nicolai Hähnle
1b6fb88ab2 gallium/radeon: unify the creation of basic blocks
This changes the order of basic blocks to be equal to the order of code in the
original TGSI, which is nice for making sense of shader dumps.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:25 +02:00
Nicolai Hähnle
d377f4c1ca gallium/radeon: merge branch and loop flow control stacks
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:21 +02:00
Nicolai Hähnle
b0d50e157d gallium/radeon: simplify if/else/endif blocks
In particular, we no longer emit an else block when there is no ELSE
instruction.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:18 +02:00
Nicolai Hähnle
89e9de2ea6 gallium/radeon: label basic blocks by the corresponding TGSI pc
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:15 +02:00
Nicolai Hähnle
6f87d7a146 gallium/radeon: cleanup and fix branch emits
Some of the existing code is needlessly complicated. The basic principle
should be: control-flow opcodes emit branches to properly terminate the
current block, _unless_ the current block already has a terminator (which
happens if and only if there was a BRK or CONT).

This also fixes a bug where multiple terminators were created in a block.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97887
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:39:10 +02:00
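The rule stated in this message (emit a branch at a control-flow opcode unless the block already has a terminator) can be sketched with a toy block type. This is a minimal illustration, not the driver code: `toy_block`, `emit_jump`, and `emit_default_branch` are hypothetical names, and a real implementation would ask LLVM whether the block has a terminator instead of keeping its own flag.

```c
#include <stdbool.h>

/* Toy stand-in for a basic block; not an LLVM or Mesa type. */
struct toy_block {
    bool has_terminator;  /* set once a branch ends the block */
    int branch_target;    /* id of the block branched to */
    int terminator_count; /* how many terminators were emitted */
};

/* Emit an explicit jump, as a BRK or CONT would. */
void emit_jump(struct toy_block *cur, int target)
{
    cur->branch_target = target;
    cur->has_terminator = true;
    cur->terminator_count++;
}

/* Emit the default branch at a control-flow opcode, unless the block
 * was already terminated. Returns true if a branch was emitted. */
bool emit_default_branch(struct toy_block *cur, int target)
{
    if (cur->has_terminator)
        return false;
    emit_jump(cur, target);
    return true;
}
```

With this guard a block never ends up with more than one terminator, which is the failure the message describes.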
Nicolai Hähnle
dfc1afda83 winsys/radeon: add buffer_get_reloc_offset
Really fix the bug that was supposed to be fixed by commits 3e7cced4b and
a48bf02d: even when virtual addresses are used, the legacy relocation-based
method with offsets relative to the kernel's buffer object is used for
video submissions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97969
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 16:37:44 +02:00
Marek Olšák
71a5cf6f3b radeonsi: don't declare LDS in PS when ds_bpermute is used
I guess this is not needed because dead code elimination removes
the declaration.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:16 +02:00
Marek Olšák
b2a694f079 radeonsi: use DDX/DDY directly in si_llvm_emit_ddxy_interp
We can finally do this, because the opcodes are scalar now.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:14 +02:00
Marek Olšák
b57aef8033 radeonsi: simplify si_llvm_emit_ddxy
si_llvm_emit_ddxy is called once per element, so we don't have to generate
code for 4 elements at once.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:12 +02:00
Marek Olšák
046c199c3a radeonsi: don't call build_gep0 in si_llvm_emit_ddxy on VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:11 +02:00
Marek Olšák
bcc55e1f32 radeonsi: use a helper function for BuildGEP(0, x)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:10 +02:00
Marek Olšák
e20f7142a3 radeonsi: remove obsolete shader definitions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:09 +02:00
Marek Olšák
8c6ea5a6ff radeonsi: remove unnecessary #includes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:07 +02:00
Marek Olšák
3388f27d84 radeonsi: clean up lucky #include dependencies
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:06 +02:00
Marek Olšák
53d2c8f00f radeonsi: don't re-create shader PM4 states after scratch buffer update
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:05 +02:00
Marek Olšák
6c01684393 gallium/radeon: move r600_common_context::texture_buffers to r600g
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:03 +02:00
Marek Olšák
7ce19d9014 radeonsi: don't set sampler buffer offsets in create_sampler_view
Do it at bind time, so that pipe_sampler_view is immutable with regard to
buffer reallocations and we don't have to remember all existing buffer
views.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:01 +02:00
Marek Olšák
7e6428e0a8 radeonsi: optimize si_invalidate_buffer based on bind_history
Just enclose each section with: if (rbuffer->bind_history & PIPE_BIND_...)

Bioshock Infinite: +1% performance

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:12:00 +02:00
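The `if (rbuffer->bind_history & PIPE_BIND_...)` pattern mentioned above can be sketched as follows. This is a simplified illustration under assumptions: the `TOY_BIND_*` flags, `toy_buffer`, `note_bind`, and `invalidate_buffer` are hypothetical stand-ins, and the real function rebinds descriptors rather than counting sections.

```c
#include <stdint.h>

/* Hypothetical bind flags; the driver uses PIPE_BIND_* values. */
enum {
    TOY_BIND_VERTEX_BUFFER   = 1 << 0,
    TOY_BIND_CONSTANT_BUFFER = 1 << 1,
    TOY_BIND_SAMPLER_VIEW    = 1 << 2
};

struct toy_buffer {
    uint32_t bind_history; /* every way this buffer has ever been bound */
};

/* Record a binding; bits are only ever added, never cleared. */
void note_bind(struct toy_buffer *buf, uint32_t bind_flag)
{
    buf->bind_history |= bind_flag;
}

/* Invalidate a buffer, visiting only the binding categories it has
 * actually used. Returns the number of sections that had to run. */
unsigned invalidate_buffer(const struct toy_buffer *buf)
{
    unsigned sections = 0;

    if (buf->bind_history & TOY_BIND_VERTEX_BUFFER)
        sections++; /* would rescan vertex buffer slots here */
    if (buf->bind_history & TOY_BIND_CONSTANT_BUFFER)
        sections++; /* would rescan constant buffer slots here */
    if (buf->bind_history & TOY_BIND_SAMPLER_VIEW)
        sections++; /* would rescan sampler view slots here */

    return sections;
}
```

A buffer that was only ever bound one way skips the other scans entirely, which is where the saving would come from.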
Marek Olšák
e43bd861e8 radeonsi: track buffer bind history
Similar to gl_buffer_object::UsageHistory.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:58 +02:00
Marek Olšák
b523a9ddc5 radeonsi: drop support for NULL sampler views
Not used anymore. It was used when the polygon stipple texture was constant.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:57 +02:00
Marek Olšák
82e51e8188 radeonsi: separate IA_MULTI_VGT_PARAM and VGT_PRIMITIVE_TYPE emission
We want to emit IA_MULTI_VGT_PARAM less often because it's a context reg.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:56 +02:00
Marek Olšák
3ee9be42ac radeonsi: move VGT_LS_HS_CONFIG to derived tess_state
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:53 +02:00
Marek Olšák
f92113c5a1 radeonsi: don't check PIPE_BARRIER_MAPPED_BUFFER
Caches are always flushed at IB boundary.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:51 +02:00
Marek Olšák
ca1d1e0e19 radeonsi: parse SURFACE_SYNC correctly on CIK-VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:49 +02:00
Marek Olšák
37065b0583 gallium/radeon: inline r600_context_add_resource_size
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-04 16:11:47 +02:00
James Legg
e33f31d61f radeonsi: Fix primitive restart when index changes
If primitive restart is enabled for two consecutive draws which use
different primitive restart indices, then the first draw's primitive
restart index was incorrectly used for the second draw.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98025

Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-10-04 15:57:37 +02:00
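One way to picture the bug class described above is a state cache that compares only the enable bit. The sketch below is an assumption about the general shape of such a check, not the actual patch; `restart_state` and `restart_state_changed` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Last primitive restart state that was sent to the hardware. */
struct restart_state {
    bool enabled;
    uint32_t index;
};

/* Returns true when the restart state has to be emitted again.
 * Comparing only `enabled` would miss two consecutive draws that both
 * enable restart but use different indices. */
bool restart_state_changed(const struct restart_state *last,
                           bool enabled, uint32_t index)
{
    if (enabled != last->enabled)
        return true;
    /* The index only matters while restart is enabled. */
    return enabled && index != last->index;
}
```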
Timothy Arceri
338d3c0b0f spirv: replace assert() with unreachable()
This fixes an uninitialized warning for is_vertex_input.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-04 22:33:51 +11:00
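Why swapping assert() for unreachable() can silence an uninitialized-variable warning is easier to see in a small example. The sketch below is hypothetical: the enum, the function, and the `UNREACHABLE` macro are stand-ins written for illustration, and `__builtin_unreachable()` is a GCC/Clang builtin rather than standard C.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shader stages, for illustration only. */
enum toy_stage { TOY_STAGE_VERTEX, TOY_STAGE_FRAGMENT };

/* Stand-in for an unreachable() macro: still asserts in debug builds,
 * but also tells the compiler that control never continues past it. */
#define UNREACHABLE(msg) \
    do { assert(!msg); __builtin_unreachable(); } while (0)

bool stage_reads_vertex_input(enum toy_stage stage)
{
    bool is_vertex_input;

    switch (stage) {
    case TOY_STAGE_VERTEX:
        is_vertex_input = true;
        break;
    case TOY_STAGE_FRAGMENT:
        is_vertex_input = false;
        break;
    default:
        /* With a plain assert(), an NDEBUG build compiles this branch
         * to nothing, so the compiler sees a path that reaches the
         * return with is_vertex_input unset and may warn. */
        UNREACHABLE("invalid stage");
    }

    return is_vertex_input;
}
```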
Timothy Arceri
298c2e03d7 intel: use the correct format specifier for printing uint64_t
Fixes a bunch of warnings in 32-bit builds.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-10-04 22:32:57 +11:00
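The message does not say which specifier the patch switched to. A common portable choice is the `PRIu64` macro from `<inttypes.h>`, shown here as a sketch; `format_u64` is a hypothetical helper. A hard-coded `%lu` matches `uint64_t` only where `unsigned long` is 64 bits wide, which is typically not the case on 32-bit targets.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit value portably. PRIu64 expands to the right length
 * modifier for uint64_t on the target being compiled for. */
int format_u64(char *buf, size_t len, uint64_t value)
{
    return snprintf(buf, len, "offset = %" PRIu64, value);
}
```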
Matt Whitlock
42ed8a6c9c gallium/winsys: replace calls to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:09:03 +02:00
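The difference between dup(2) and fcntl(F_DUPFD_CLOEXEC) can be sketched as below. `dup_cloexec` and `has_cloexec` are hypothetical helpers, not Mesa functions, and the minimum descriptor number of 0 is an illustrative choice that mirrors dup()'s "lowest available" behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Duplicate fd with close-on-exec set atomically on the copy.
 * The third fcntl argument is the lowest descriptor number allowed
 * for the copy. Returns the new descriptor, or -1 on error. */
int dup_cloexec(int fd)
{
    return fcntl(fd, F_DUPFD_CLOEXEC, 0);
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

A descriptor returned by plain dup() has FD_CLOEXEC clear, so it stays open across exec and is inherited by the child. Setting the flag afterwards with a separate F_SETFD call leaves a window in which another thread can fork and exec.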
Matt Whitlock
ac6064f918 st/xa: replace call to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:09:01 +02:00
Matt Whitlock
0c060f691c st/dri: replace calls to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:08:58 +02:00
Matt Whitlock
5d0069eca2 gallium/auxiliary: replace call to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:08:55 +02:00
Matt Whitlock
c8fd7d060d egl/android: replace call to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance where the same fix was employed.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:08:50 +02:00
Tapani Pälli
387e0af0b4 intel: fix compilation warning on gen_get_device_info
(warning: 'const' type qualifier on return type has no effect)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-10-04 07:38:45 +03:00
Kenneth Graunke
9d6ca7c3d0 i965: Only emit 1 viewport when possible.
In core profile, we support up to 16 viewports.  However, in the
majority of cases, only 1 of them is actually used - we only need
the others if the last shader stage prior to the rasterizer writes
gl_ViewportIndex.

Processing all 16 viewports adds additional CPU overhead, which hurts
CPU-intensive workloads such as Glamor.  This meant that switching to
core profile actually penalized Glamor to an extent, which is
unfortunate.

This patch tracks the number of relevant viewports, switching between
1 and ctx->Const.MaxViewports if gl_ViewportIndex is written.  A new
BRW_NEW_VIEWPORT_COUNT flag tracks this.  This could mean re-emitting
viewport state when switching, but hopefully this is offset by doing
1/16th of the work in the common case.  The new flag is also lighter
weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case.

According to Eric Anholt, x11perf -copypixwin10 performance improves by
11.5094% +/- 3.10841% (n=10) on his Skylake.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-03 18:41:10 -07:00
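The tracking described above (switch between 1 and the maximum viewport count, and flag a change only when the count actually differs) can be sketched like this. The names are hypothetical: `viewport_tracker` stands in for the driver context, and its `dirty` field stands in for the BRW_NEW_VIEWPORT_COUNT flag.

```c
#include <stdbool.h>

/* Toy tracker; the real driver keeps this in its context and dirty bits. */
struct viewport_tracker {
    unsigned count; /* number of viewports that need processing */
    bool dirty;     /* set when count changed and state must be re-emitted */
};

/* Recompute the relevant viewport count. All viewports are needed only
 * if the last stage before the rasterizer writes gl_ViewportIndex. */
void update_viewport_count(struct viewport_tracker *t,
                           bool writes_viewport_index,
                           unsigned max_viewports)
{
    unsigned count = writes_viewport_index ? max_viewports : 1;

    if (count != t->count) {
        t->count = count;
        t->dirty = true;
    }
}
```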
Dave Airlie
7eb7684818 spirv: translate cull distance semantic.
This just translates to the correct cull distance slot.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-04 10:16:23 +10:00
Dave Airlie
bd0157d542 compiler: add printable values for cull distance varyings.
We need these for spir-v/nir shaders.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-04 10:15:23 +10:00
Jason Ekstrand
6ffbfc760d nir/spirv/cfg: Use a nop intrinsic for tagging the ends of blocks
Previously, we were saving off the last nir_block in a vtn_block before
moving on so that we could find the nir_block again when it came time to
handle phi sources.  Unfortunately, NIR's control flow modification code is
inconsistent when it comes to how it splits blocks so the block pointer we
saved off may point to a block somewhere else in the shader by the time we
get around to handling phi sources.  In order to get around this, we insert
a nop instruction and use that as the logical end of our block.  Since the
control flow manipulation code respects instructions, the nop will keep
its place like any other instruction and we can easily find the end of our
block when we need it.

This fixes a bug triggered by a couple of vkQuake shaders.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97233
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-03 16:17:12 -07:00
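The reasoning above (a saved block pointer can go stale after a split, while a marker instruction travels with the code) can be illustrated with a toy IR. This sketch uses made-up types and functions (`toy_block`, `toy_instr`, `toy_append`, `toy_split_before`) and is not the NIR API.

```c
#include <stddef.h>

struct toy_block;

struct toy_instr {
    struct toy_block *parent; /* block that currently owns this instruction */
    struct toy_instr *next;
};

struct toy_block {
    struct toy_instr *head;
};

/* Append an instruction to the end of a block. */
void toy_append(struct toy_block *block, struct toy_instr *instr)
{
    struct toy_instr **link = &block->head;
    while (*link)
        link = &(*link)->next;
    instr->parent = block;
    instr->next = NULL;
    *link = instr;
}

/* Split a block: `first` and everything after it move into `tail`. */
void toy_split_before(struct toy_block *block, struct toy_instr *first,
                      struct toy_block *tail)
{
    struct toy_instr **link = &block->head;
    while (*link && *link != first)
        link = &(*link)->next;
    *link = NULL;
    tail->head = first;
    for (struct toy_instr *i = first; i; i = i->next)
        i->parent = tail;
}
```

If a marker instruction is appended last and the block is later split in front of it, a pointer to the original block no longer identifies where the code ends, but the marker's `parent` field still does.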
Jason Ekstrand
7697b4b98b nir: Add a nop intrinsic
This intrinsic has no destination, no sources, no variables, and can be
eliminated.  In other words, it does nothing and will always get deleted by
dead code elimination.  However, it does provide a quick-and-easy way to
temporarily tag a particular location in a NIR shader.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-10-03 16:17:12 -07:00
Jason Ekstrand
0176c6a692 intel/isl: Allow non-2D HiZ surfaces
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Jason Ekstrand
4e397c6c75 intel/isl: Add a detailed comment about multisampling with HiZ
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Jason Ekstrand
c3bd711411 intel/isl: Remove tiling checks from choose_msaa_layout
We already do those checks in filter_tiling.  There's no good reason to
repeat them in choose_msaa_layout.  If anything they should have been
asserts and not "return false" checks.  Also, this check was causing us to
outright reject multisampled HiZ surfaces, which wasn't intended.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Jason Ekstrand
69d3bb9915 intel/isl: Handle HiZ and CCS tiling more directly
The HiZ and CCS tiling formats are always used for HiZ and CCS surfaces
respectively.  There's no reason why we should go through filter_tiling and
it's much easier to always get HiZ and CCS right if we just handle them
directly.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Jason Ekstrand
b1311a48e0 intel/isl: Allow multisampling with ISL_FORMAT_HiZ
HiZ buffers can be multisampled and, on Broadwell and earlier, simply using
interleaved multisampling with a compression block size of 8x4 samples
yields the correct HiZ surface size calculations.  Unfortunately,
choose_msaa_layout was rejecting multisampled HiZ buffers because of format
checks.  Now that we have a simple helper for determining if a format
supports multisampling, that's an easy enough issue to fix.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Jason Ekstrand
baade41a5c intel/isl: Allow creation of 1-D compressed textures
Compressed 1-D textures are not a well-defined thing in either GL or Vulkan.
However, auxiliary surfaces are treated as compressed textures in ISL and
we can do HiZ and CCS with 1-D so we need to be able to create them.  In
order to prevent actually using them (the docs say no), we assert in the
state setup code.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Jason Ekstrand
f82166578f intel/isl: Fix up asserts in calc_phys_level0_extent_sa
The assertion that a format is uncompressed in the multisample layouts
isn't quite right.  What we really want to assert is that the format
supports multisampling, which is a slightly more complicated query.  We also want
to assert that it has a block size of 1x1 since we do nothing with the
block size in the phys_level0_sa assignment.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Jason Ekstrand
5637f3f120 intel/isl: Add a format_supports_multisampling helper
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-10-03 14:53:01 -07:00
Nayan Deshmukh
b7a0f2e1f7 vl/dri3: fix warning about incompatible pointer type
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-10-03 12:51:30 -04:00
Bruce Cherniak
903d00cd32 swr: Removed stalling SwrWaitForIdle from queries.
A previous fundamental change in stats gathering added a temporary
SwrWaitForIdle to begin_query and end_query.  The code has been reworked to
remove the stall.

Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2016-10-03 09:57:45 -05:00