Commit graph

29991 commits

Author SHA1 Message Date
Samuel Pitoiset
af7fef12f7 r600: fix a compilation warning in r600_screen_create()
Should be r600_common_screen instead of r600_screen.

Fixes: 80157a2c20 ("gallium/radeon: clean up r600_query_init_backend_mask")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 18:13:18 +01:00
Marek Olšák
f8bc628b2c gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter
to simplify things in draw_vbo a little

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:45:29 +01:00
Marek Olšák
75c425e511 winsys/radeon: clamp vram_vis_size to 256MB
the value from the kernel is wrong

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:45:29 +01:00
Marek Olšák
eba9e9dd1d radeonsi: handle count_from_stream_output in a few IA_MULTI_VGT_PARAM cases
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:45:29 +01:00
Marek Olšák
a0740d59aa radeonsi: don't invoke DCC decompression in update_all_texture_descriptors
This fixes a bug uncovered by the 17-part patch series, specifically:
  "gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter"

If dirty_tex_counter has been updated and set_shader_image invokes DCC
decompression, the DCC decompression itself checks the counter and updates
descriptors, which in turn invokes the same DCC decompression. The blitter
can't handle the recursion and the driver eventually crashes.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:45:29 +01:00
Marek Olšák
f8dd2f5bac radeonsi: fold info->indirect conditionals into the last one in draw_vbo
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:29:36 +01:00
Marek Olšák
408f9a1584 radeonsi: atomize the scratch buffer state
The update frequency is very low.

Difference: Only account for the size when allocating a new one and when
            starting a new IB, and check for NULL. (v3)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:29:36 +01:00
Bartosz Tomczyk
a41f2527ae r600: Fix stack overflow
Commit 7b5878ee04 increased number of
outputs to 64, but left output array intact. This caused stack overflow
when number of outputs is bigger then 32. Found by ASAN.

Cc: "12.0 13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 15:30:03 +01:00
Samuel Pitoiset
e2c15ea092 gallium/radeon: add new HUD queries for monitoring the CP
There are even more counters in the CP_STAT register but I think
these ones are enough for now.

v2: only read (and expose) CP_STAT on VI+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-30 14:37:00 +01:00
Samuel Pitoiset
0e04a078c5 gallium/radeon: add new GPU-sdma-busy HUD query
For simplicity, GPU-sdma-busy will return 0 on previous gens.

v2: only read SRBM_STATUS2 on Evergreen+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-30 14:37:00 +01:00
Samuel Pitoiset
b0f7ddef4f gallium/radeon: rename grbm to mmio in the gpu load path
We also want to monitor other MMIO counters like SRBM_STATUS2 in
order to know if SDMA is busy.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-30 14:37:00 +01:00
Marek Olšák
2fc5fe0e85 winsys/amdgpu: add a fast exit path into amdgpu_cs_add_buffer
The time spent in the function dropped by 37% for torcs.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:57:09 +01:00
Samuel Pitoiset
86eb52adad winsys/amdgpu: do not iterate twice when adding fence dependencies
The perf difference is very small, 3.25->2.84% in amdgpu_cs_flush()
in the DXMD benchmark.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-30 13:44:25 +01:00
Samuel Pitoiset
5a6b1aadea winsys/amdgpu: add one likely() call in amdgpu_cs_flush()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-30 13:44:19 +01:00
Samuel Pitoiset
db2b0210b1 hud: fix compilation warnings in hud_nic_graph_install()
v2: use PRId64 instead of PRIx64

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-30 13:43:30 +01:00
Marek Olšák
62732ce263 gallium/radeon: remove r600_common_context::max_db
this cleanup is based on the vulkan driver, which seems to do the same thing

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
9327780da6 winsys/amdgpu: fix ADDR_REGISTER_VALUE::backendDisables
This would be a fix if the value was used anywhere.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
80157a2c20 gallium/radeon: clean up r600_query_init_backend_mask
This just needs to be done for r600g in the screen.
We don't need an IB submission for every new context created for GCN.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
5f99c49008 radeonsi: precompute IA_MULTI_VGT_PARAM values into a table
The perf difference is very small: 0.99% -> 0.40% for the time spent
in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
c78177fc64 radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for Polaris
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
ccecf79c2b radeonsi: state atom IDs don't have to be off by one
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
ac059f1c23 radeonsi: use a bitmask for looping over dirty PM4 states
also move it to draw_vbo, because it should be 0 in most cases

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
802fcdc0d2 radeonsi: atomize L2 prefetches
to move the big conditional statement out of draw_vbo

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
c99ba3eb47 radeonsi: unbind disabled shader stages to prevent useless L2 prefetches
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
4a4ff66dbe radeonsi: also prefetch compute shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
879c73fac8 radeonsi: update dirty_level_mask only after the first draw after FB change
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
cecc068774 gallium/radeon: allow VRAM-only placements again on APUs & recent amdgpu
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
0d0f357de6 radeonsi: don't set +fp64-denormals
it's the default and the name will change to +fp64-fp16-denormals.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
b177162489 radeonsi: remove si_shader_context::param_tess_offchip
we don't use on-chip tess.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Lucas Stach
e158b74971 etnaviv: force vertex buffers through the MMU
This fixes a vertex data corruption issue if some of the vertex streams
go through the MMU and some don't.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-01-30 12:40:57 +01:00
Bas Nieuwenhuizen
b8ee45ebdc llvmpipe: Use LLVMDumpModule, not DumpModule.
Forgot the prefix ...

Fixes: 0fca80b3db
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
2017-01-29 17:03:25 +01:00
Bas Nieuwenhuizen
0fca80b3db various: Fix missing DumpModule with recent LLVM.
Since LLVM revision 293359 DumpModule gets only implemented when
either a debug build or LLVM_ENABLE_DUMP is set.

This patch adds a direct replacement for the function for radv and
radeonsi, However, as I don't know a good place to put common LLVM
code for all three I inlined the implementation for LLVMPipe.

v2: Use the new code for LLVM 3.4+ instead of LLVM 5+ & fixed indentation

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-01-29 10:25:00 +01:00
Ilia Mirkin
ce7a045fee r600g: use ieee variants of multiplication instructions
This matches the behavior of most other drivers, including nouveau,
radeonsi, and i965.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-29 00:00:07 -05:00
Ilia Mirkin
bacbb01105 r600g: add support for optionally using non-IEEE mul ops
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-28 23:59:43 -05:00
Eric Anholt
5b7e2697dc vc4: Coalesce into TLB writes as well as VPM/tex.
This generally cuts an instruction when blending is enabled and we thus
have a single instruction generating the color value.

total instructions in shared programs: 91759 -> 91634 (-0.14%)
instructions in affected programs:     5338 -> 5213 (-2.34%)
2017-01-28 19:35:20 -08:00
Eric Anholt
c1299615fb vc4: Avoid an extra temporary and mov in ffloor/ffract/fceil.
shader-db results:

total instructions in shared programs: 92611 -> 91764 (-0.91%)
instructions in affected programs:     27417 -> 26570 (-3.09%)

The star is one shader in glmark2's terrain (drops 16% of its
instructions), but there are also wins in mupen64plus and glb2.7.
2017-01-28 19:35:20 -08:00
Eric Anholt
0079df0b2d vc4: Flip the switch to run the GLSL compiler optimization loop once.
This has almost no effect on shader-db:

total instructions in shared programs: 92572 -> 92611 (0.04%)
instructions in affected programs:     4486 -> 4525 (0.87%)

Looking at 2 of the 7 different shaders that were hurt (all of which were
in mupen64), they all appear to be just differences in order of
instructions at the NIR level.

The advantage is that this should significantly reduce time in the compiler.
2017-01-28 19:35:20 -08:00
Emil Velikov
1cfe97ff0e nouveau: remove explicit __STDC_FORMAT_MACROS define
Already handled by the build.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
027e04932a scons: swr: remove explicit __STDC_.*_MACROS defines
Analogous to previous commits.

Cc: George Kyriazis <george.kyriazis@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
e809fadb86 gallium: remove explicit __STDC_.*_MACROS defines
Analogous to previous commits.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
01e28c6cf5 gallivm: remove explicit __STDC_.*_MACROS defines
Correctly handled by the build systems.

Cc: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
c4862fa382 st/xa: automake: remove duplicate -Wall
Already handled by configure.ac

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
be3b5e015c svga: remove const qualifier from SVGA3D_vgpu10_GenMips() prototype
Does not match the function definition or how it's used. Triggers the
following warning in AppVeyor

svga_cmd_vgpu10.c(1301) : warning C4028: formal parameter 2 different from declaration

Cc: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:56 +00:00
Emil Velikov
d221bf9b91 d3dadapter9: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:55 +00:00
Emil Velikov
517f34b4be st/dri: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:55 +00:00
Emil Velikov
02f991c00d clover: automake: remove -I$(srcdir)
Already implicitly handled by the build system.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Aaron Watry <awatry@gmail.com>
2017-01-27 17:56:55 +00:00
Emil Velikov
65d5a60cac clover: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Cc: Aaron Watry <awatry@gmail.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:55 +00:00
Emil Velikov
a922c82125 freedreno: automake: correctly set MKDIR_GEN
Analogous to previous commit.

Fixes: 4610e5ef28 "freedreno/ir3: fix sin/cos"
Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Cc: Rob Clark <robclark@freedesktop.org>
Cc: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Reported-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
2017-01-27 17:56:55 +00:00
Nicolai Hähnle
c5e76a262a gallium: enable int64 on radeonsi, llvmpipe, softpipe
All of these have had support for the TGSI opcodes since before most of
the glsl compiler work landed.

Also update the docs accordingly, including the missing note about i965.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-27 10:19:48 +01:00
Dave Airlie
f804506d4d gallium: Add integer 64 capability
v1.1: move to using a normal CAP. (Marek)

v2: fill in the cap everywhere

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-27 10:19:25 +01:00