32650 commits

Author SHA1 Message Date
Marek Olšák
064550238e radeonsi: use CLEAR_STATE to initialize some registers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-28 08:03:24 +02:00
Dave Airlie
554aa09440 virgl: drop precise modifier.
The host doesn't understand this yet, so drop it for now.

Fixes: virgl regressions.

Fixes: af22adee4f (tgsi: add precise flag to tgsi_instruction)
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-28 11:04:35 +10:00
Nicolai Hähnle
06e20c4b8c radeonsi: bail out instead of crashing if the main shader part failed to compile
Reviewed: Marek Olšák <marek.olsak@amd.com>
2017-07-27 21:16:45 +02:00
Nicolai Hähnle
4dd86631f4 radeonsi: update a comment for merged shaders
Reviewed: Marek Olšák <marek.olsak@amd.com>
2017-07-27 21:16:45 +02:00
Nicolai Hähnle
4738dd9546 radeonsi/gfx9: dump previous stage LLVM IR for merged shaders
Reviewed: Marek Olšák <marek.olsak@amd.com>
2017-07-27 21:16:45 +02:00
Nicolai Hähnle
760876a7b1 radeonsi: make sure TCS main output VGPRs don't alias inputs
Avoids an unnecessary move introduced by "radeonsi/gfx9: always wrap GS
and TCS in an if-block (v2)"

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-27 21:16:42 +02:00
Nicolai Hähnle
081ac6e5c6 radeonsi/gfx9: always wrap GS and TCS in an if-block (v2)
With merged ESGS shaders, the GS part of a wave may be empty, and the
hardware gets confused if any GS messages are sent from that wave. Since
S_SENDMSG is executed even when EXEC = 0, we have to wrap even
non-monolithic GS shaders in an if-block, so that the entire shader and
hence the S_SENDMSG instructions are skipped in empty waves.

This change is not required for TCS/HS, but applying it there as well
simplifies the logic a bit.

Fixes GL45-CTS.geometry_shader.rendering.rendering.*

v2: ensure that the TCS epilog doesn't run for non-existing patches

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-27 21:16:32 +02:00
Nicolai Hähnle
873789002f radeonsi/gfx9: fix vertex idx in ES with multiple waves per threadgroup
Cc: mesa-stable@lists.freedesktop.org
Reviewed: Marek Olšák <marek.olsak@amd.com>
2017-07-27 21:16:32 +02:00
George Kyriazis
194ff5eed1 swr: fix transform feedback logic
The shader that is used to copy vertex data out of the vs/gs shaders to
the user-specified buffer (streamout or SO shader) was not using the
correct offsets.

Adjust the offsets that are used just for the SO shader:
- Make sure that position is handled in the same special way
  as in the vs/gs shaders
- Use the correct offset to be passed in the core
- Consolidate register slot mapping logic into one function, since it's
  been calculated in 2 different places (one for calculating the slot mask,
  and one for the register offsets themselves)

Also make room for all attributes in the backend vertex area.

Fixes:
- all vtk GL2PS tests
- 18 piglit tests (16 ext_transform_feedback tests,
  arb-quads-follow-provoking-vertex and primitive-type gl_points)

v2:

- take care of more SGV slots in slot mapping logic
- trim feState.vsVertexSize
- fix GS interface and incorporate GS while calculating vsVertexSize

Note that vsVertexSize is used in the core as the one parameter that
controls vertex size between all stages, so it has to be adjusted appropriately
for the whole vs/gs/fs pipeline.

Also note that GS and SO are not fully implemented.  This will be addressed
later.

Fixes:
- a total of 20 piglit tests

CC: 17.2 <mesa-stable@lists.freedesktop.org>

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-27 13:54:19 -05:00
Tim Rowley
e21fc2c625 swr/rast: non-regex knob fallback code for gcc < 4.9
gcc prior to 4.9 didn't implement <regex>, causing a startup crash
in the swr knob parameter reading code.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-27 08:31:21 -05:00
Dave Airlie
c4652a0a5b virgl: encode index buffer offset.
Fixes arb_vertex_buffer_object-combined-vertex-index

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-27 16:10:07 +10:00
Marek Olšák
ed2b3f5c81 radeonsi: decrease the number of compiler threads
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-26 19:53:26 +02:00
Marek Olšák
433f6f7ac9 gallium/radeon: make S_FIXED function signed and move it to shared code
This fixes a bug uncovered by:
    2412c4c81e
    util: Make CLAMP turn NaN into MIN.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-26 19:53:26 +02:00
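One way to see why a signed S_FIXED matters: once CLAMP returns its lower bound for NaN input, a value below zero can presumably reach the float-to-fixed conversion, and converting a negative float to an unsigned integer type is undefined behaviour in C. The helper below is a minimal sketch of a signed float-to-fixed conversion; the name `s_fixed` and its body are assumptions, not the shared Mesa code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical signed float-to-fixed helper (not the actual Mesa code).
 * Returns value scaled by 2^frac_bits, truncated toward zero. */
static inline int32_t s_fixed(float value, unsigned frac_bits)
{
    assert(frac_bits < 31);
    /* Converting to a signed type keeps negative inputs well defined;
     * converting a negative float to an unsigned type is undefined. */
    return (int32_t)(value * (float)(1u << frac_bits));
}
```

For example, `s_fixed(-1.5f, 8)` yields -384 instead of depending on undefined float-to-unsigned behaviour.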
Nicolai Hähnle
a0e6b9a2db radeonsi/gfx9: reduce max threads per block to 1024 on gfx9+
The number of supported waves per thread group has been reduced to 16
with gfx9. Trying to use 32 waves causes hangs, and barriers might
not work correctly with > 16 waves.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-26 11:51:00 +02:00
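The arithmetic behind the subject line, assuming GCN's 64-thread waves: 16 waves × 64 threads = 1024 threads, whereas 32 waves would be 2048. A trivial sketch with a hypothetical helper name:

```c
#include <assert.h>

/* GCN (including gfx9) executes waves of 64 threads. */
#define WAVE_SIZE 64u

/* Hypothetical helper: thread-group size limit for a given wave limit. */
static unsigned max_threads_per_block(unsigned max_waves_per_group)
{
    assert(max_waves_per_group > 0);
    return max_waves_per_group * WAVE_SIZE;
}
```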
Nicolai Hähnle
65fbaab0b7 radeonsi: fix detection of DRAW_INDIRECT_MULTI on SI
The firmware version numbers for SI were wrong. The new numbers are probably
too conservative (we don't have a definitive answer from the firmware team),
but DRAW_INDIRECT_MULTI has been confirmed to work with these versions on
Tahiti (by Gustaw) and on Verde (by myself).

While this is technically adding a feature, it's a feature we thought we had
for a long time. The change is small enough and we're early enough in the 17.2
release cycle that it should still go in.

Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-26 11:48:32 +02:00
Timothy Arceri
17f05e52e7 gallium/util: fix unused variable warning
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-26 10:39:52 +10:00
Eric Anholt
53492917e2 broadcom/vc4: Use the RA callback to improve register selection's choices.
We simply pick r4 if available (anything else would force a MOV), then
round-robin through accumulators (avoids physical regfile RAW delay
slots), then round-robin through the physical regfile.

The effect on instruction count is pretty impressive:

total instructions in shared programs: 76563 -> 74526 (-2.66%)
instructions in affected programs:     66463 -> 64426 (-3.06%)

and we could probably do better with a little heuristic of "if we're going
to choose a physical reg, and other operands of instructions using this as
a src have the same physical regfile, then use the other regfile".
2017-07-25 14:55:10 -07:00
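A minimal sketch of the selection order described in the register-selection commit above: r4 first, then the accumulators round-robin, then the physical register file round-robin. The register numbering, `choose_reg`, and `struct rr_state` are hypothetical; the real vc4 code plugs its choice into the register allocator's callback and uses its own register classes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register numbering for this sketch only:
 *   0..5        -> accumulators r0..r5 (r4 at index 4)
 *   6..NUM_REGS -> physical register file entries */
#define ACC_R4   4
#define NUM_ACC  6
#define NUM_REGS (NUM_ACC + 64)

struct rr_state {
    unsigned next_acc;  /* round-robin cursor over accumulators */
    unsigned next_phys; /* round-robin cursor over the physical regfile */
};

/* Returns the chosen register index, or -1 if none is available. */
static int choose_reg(const bool avail[NUM_REGS], struct rr_state *s)
{
    assert(avail && s);

    /* 1. Prefer r4: anything else would force a MOV. */
    if (avail[ACC_R4])
        return ACC_R4;

    /* 2. Round-robin through accumulators (no regfile RAW delay slots). */
    for (unsigned i = 0; i < NUM_ACC; i++) {
        unsigned r = (s->next_acc + i) % NUM_ACC;
        if (avail[r]) {
            s->next_acc = (r + 1) % NUM_ACC;
            return (int)r;
        }
    }

    /* 3. Round-robin through the physical register file. */
    const unsigned num_phys = NUM_REGS - NUM_ACC;
    for (unsigned i = 0; i < num_phys; i++) {
        unsigned r = NUM_ACC + (s->next_phys + i) % num_phys;
        if (avail[r]) {
            s->next_phys = (r - NUM_ACC + 1) % num_phys;
            return (int)r;
        }
    }
    return -1;
}
```

Keeping the cursors across calls is what spreads consecutive allocations over different registers, which is the point of the round-robin.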
Eric Anholt
16e17ce04b broadcom/vc4: Scissor blits performed using the rendering engine.
Without this, a BlitFramebuffer would mark the whole framebuffer as being
changed (so we emit loads/stores of all of it) rather than just the
modified subset.
2017-07-25 14:44:52 -07:00
Eric Anholt
93fec49a75 broadcom/vc4: Prefer blit via rendering to the software fallback.
I don't know how I managed to leave this here for so long.  Found when
working on a 1:1 overlapping blit extension for X11.

Cc: mesa-stable@lists.freedesktop.org
2017-07-25 14:44:52 -07:00
Eric Anholt
b3c78a51f3 broadcom/vc4: Switch the Viewport Center fields to a fixed-point representation.
This gets us automatic CL decoding to a floating-point value, and drops a
magic number from the emit code.  250x250 shader runner tests now say they
have a center of 125.0 instead of 2000.
2017-07-25 14:44:52 -07:00
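The numbers in the viewport commit above suggest the scale: a 250x250 viewport has a center of 125.0, previously shown as 2000, and 2000 / 125 = 16, i.e. 4 fractional bits. The sketch below assumes exactly that; the names are hypothetical and the real field width is not shown in the log.

```c
#include <assert.h>
#include <stdint.h>

/* Inferred from the commit text: 2000 / 125.0 = 16 = 2^4. */
#define VP_CENTER_FRAC_BITS 4

/* Hypothetical encoder: float center -> fixed-point field value. */
static int32_t vp_center_encode(float center)
{
    assert(center >= 0.0f); /* this sketch only handles non-negative centers */
    return (int32_t)(center * (float)(1 << VP_CENTER_FRAC_BITS));
}

/* Hypothetical decoder, as a CL dumper would use it. */
static float vp_center_decode(int32_t raw)
{
    return (float)raw / (float)(1 << VP_CENTER_FRAC_BITS);
}
```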
Eric Anholt
299c9a2db1 broadcom/vc4: Use the XML decoder for CL dumping.
The VC4_DEBUG_CL output goes from:

0x00000010 0x00000010: 0x06 VC4_PACKET_START_TILE_BINNING
0x00000011 0x00000011: 0x38 VC4_PACKET_PRIMITIVE_LIST_FORMAT
0x00000012 0x00000012: 0x12
0x00000013 0x00000013: 0x66 VC4_PACKET_CLIP_WINDOW
0x00000014 0x00000014: 0x00
0x00000015 0x00000015: 0x00
0x00000016 0x00000016: 0x00
0x00000017 0x00000017: 0x00
0x00000018 0x00000018: 0xfa
0x00000019 0x00000019: 0x00
0x0000001a 0x0000001a: 0xfa
0x0000001b 0x0000001b: 0x00

to:

0x00000010 0x00000010: 0x06 Start Tile Binning
0x00000011 0x00000011: 0x38 Primitive List Format
    Data Type: 1 (16-bit index)
    Primitive Type: 2 (Triangles List)
0x00000013 0x00000013: 0x66 Clip Window
    Clip Window Height in pixels: 250
    Clip Window Width in pixels: 250
    Clip Window Bottom Pixel Coordinate: 0
    Clip Window Left Pixel Coordinate: 0

v2: Squash in robher's fixes for Android
2017-07-25 14:44:52 -07:00
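The decoded dump in the CL-dumping commit above also documents the packet layouts. Assuming the XML decoder prints fields from the most significant bits down (which matches 0x12 decoding to Data Type 1 and Primitive Type 2), the two packets could be decoded by hand as below. Because width and height are both 250 and both coordinates are 0 in this dump, which 16-bit field is width versus height (and left versus bottom) is an assumption; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct prim_list_format {
    unsigned data_type;      /* e.g. 1 = 16-bit index */
    unsigned primitive_type; /* e.g. 2 = Triangles List */
};

struct clip_window {
    uint16_t left, bottom, width, height;
};

/* Payload byte of the 0x38 packet: high nibble data type, low nibble
 * primitive type (assumed from the dump). */
static struct prim_list_format decode_prim_list_format(uint8_t byte)
{
    struct prim_list_format f = {
        .data_type = byte >> 4,
        .primitive_type = byte & 0xf,
    };
    return f;
}

static uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* p points at the 8 payload bytes following the 0x66 opcode. */
static struct clip_window decode_clip_window(const uint8_t *p)
{
    assert(p);
    struct clip_window w = {
        .left = read_u16_le(p + 0),
        .bottom = read_u16_le(p + 2),
        .width = read_u16_le(p + 4),
        .height = read_u16_le(p + 6),
    };
    return w;
}
```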
Brian Paul
91735e2d4a svga: implement MSAA alpha_to_one feature
The device doesn't directly support this feature so we implement it with
additional shader code which sets the w component of the color output(s) to
1.0 (or max_int or max_uint).

Fixes 16 Piglit ext_framebuffer_multisample/*alpha-to-one* tests.

v2: only support unorm/float buffers, not int/uint, per Roland.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-25 15:40:24 -06:00
Brian Paul
71d3b69b23 svga: rework the FS white fragments code
When we forcibly write white to FS outputs (for XOR mode emulation)
we were using a temp register.  But that's not really necessary.
This also fixes the case of writing white to multiple color buffers.

Subsequent changes will build on this.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-25 15:40:23 -06:00
Brian Paul
1ab8901d6f gallium/util: s/unsigned/enum tgsi_texture_type/
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-07-25 15:40:23 -06:00
Daniel Stone
45383d32d4 st/dri2: Return invalid modifier when no driver support
Always initialise whandle.modifier for DRIImage modifier queries, so if
the driver doesn't support it then we return false for the query.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: d33fe8b84e ("st/dri: enable DRIimage modifier queries")
2017-07-25 18:40:07 +01:00
Daniel Stone
b4a18f13ce st/dri: Check get-handle return value in queryImage
In the DRIImage queryImage hook, check if resource_get_handle() failed
and return FALSE if so.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-25 18:40:06 +01:00
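Taken together, the two st/dri commits above describe a defensive query pattern: seed the modifier with an invalid value before calling the driver, fail if the get-handle call fails, and fail if the driver left the modifier untouched. A minimal sketch with a hypothetical `struct handle`, `query_modifier`, and example drivers; `MOD_INVALID` mirrors the value of `DRM_FORMAT_MOD_INVALID` from drm_fourcc.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as DRM_FORMAT_MOD_INVALID in drm_fourcc.h. */
#define MOD_INVALID ((UINT64_C(1) << 56) - 1)

/* Hypothetical stand-ins for the winsys handle and driver hook. */
struct handle {
    uint64_t modifier;
};

typedef bool (*get_handle_fn)(struct handle *h);

static bool query_modifier(get_handle_fn get_handle, uint64_t *out)
{
    assert(get_handle && out);
    struct handle h;
    h.modifier = MOD_INVALID;      /* always initialise before the call */

    if (!get_handle(&h))           /* propagate get-handle failure */
        return false;
    if (h.modifier == MOD_INVALID) /* driver doesn't report modifiers */
        return false;

    *out = h.modifier;
    return true;
}

/* Example drivers for the usage below (hypothetical). */
static bool driver_no_modifiers(struct handle *h) { (void)h; return true; }
static bool driver_fails(struct handle *h) { (void)h; return false; }
static bool driver_linear(struct handle *h)
{
    h->modifier = 0; /* DRM_FORMAT_MOD_LINEAR */
    return true;
}
```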
Michal Srb
e6d7937b86 r600: Add support for B5G5R5A1.
Fixes rendercheck errors when using glamor acceleration in X server.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-07-25 19:17:03 +02:00
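For reference, a sketch of packing one B5G5R5A1 pixel into 16 bits. It assumes Gallium's naming convention for packed formats, where the first-named channel (B) sits in the least significant bits; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packer: B in bits 0-4, G in 5-9, R in 10-14, A in bit 15. */
static uint16_t pack_b5g5r5a1(unsigned r, unsigned g, unsigned b, unsigned a)
{
    assert(r < 32 && g < 32 && b < 32 && a < 2);
    return (uint16_t)(b | (g << 5) | (r << 10) | (a << 15));
}
```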
Leo Liu
82fcf3142f radeon/vcn: move message buffer to vram for now
To work around an unknown bug.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2017-07-25 12:27:09 -04:00
Jose Fonseca
8d655263ca trace: Correct transfer box size calculation.
For textures we must not approximate the calculation with `stride *
height`, or `slice_stride * depth`, as that can easily lead to buffer
overflows, particularly for partial transfers.

This should address the issue that Bruce Cherniak found and diagnosed.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-07-25 17:18:04 +01:00
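A sketch of the exact byte span a box transfer touches, for uncompressed formats: every slice but the last contributes a full slice stride, every row but the last a full row stride, and the final row only `width * bytes_per_pixel`. The struct and function names are hypothetical, and compressed (block) formats would need the same calculation in units of blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transfer box, in pixels. */
struct box {
    unsigned width, height, depth;
};

/* Exact number of bytes covered by the box, starting at its origin. */
static size_t transfer_size(struct box b, size_t bytes_per_pixel,
                            size_t stride, size_t slice_stride)
{
    if (b.width == 0 || b.height == 0 || b.depth == 0)
        return 0;
    assert(stride >= b.width * bytes_per_pixel);
    return (b.depth - 1) * slice_stride +
           (b.height - 1) * stride +
           b.width * bytes_per_pixel;
}

/* The approximation the commit warns against. */
static size_t approx_size(struct box b, size_t stride)
{
    return stride * b.height;
}
```

For a 4x4 box with a 64-byte stride and 4-byte pixels, the exact span is 3 * 64 + 16 = 208 bytes, while `stride * height` claims 256, i.e. 48 bytes beyond what the box covers.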
Constantine Charlamov
dacb319777 r600g: constify some args at r600_asm.c
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-25 09:24:27 +02:00
Constantine Charlamov
3823e4905b r600g: remove unused "bc" args, and one unneeded forward declaration
To ease review, just highlight the "bc," string.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-25 09:24:17 +02:00
Charmaine Lee
bbc29393d3 st/mesa: create framebuffer iface hash table per st manager
With commit 5124bf9823, a framebuffer interface hash table is
created in st_gl_api_create(), which is called in
dri_init_screen_helper() for each screen. When the hash table is
overwritten with multiple calls to st_gl_api_create(), it can cause a
race condition. This patch fixes the problem by creating a
framebuffer interface hash table per state tracker manager.

Fixes crash with steam.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101876
Fixes: 5124bf9823 ("st/mesa: add destroy_drawable interface")
Tested-by: Christoph Haag <haagch@frickel.club>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-07-24 14:03:28 -07:00
Emil Velikov
4d53b16f55 swr: use the correct variable for no undefined symbols
The variable name was missing a leading LD_, which resulted in a missing
check for unresolved symbols in the backend binaries.

With the linking addressed in earlier patches, we can correct the typo.

Thanks to Laurent for the help spotting this.

v2: Split from a larger patch.

Cc: mesa-stable@lists.freedesktop.org
Cc: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Cc: Laurent Carlier <lordheavym@gmail.com>
Fixes: 9475251145 "swr: standardize linkage and check for unresolved symbols"
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reported-by: Laurent Carlier <lordheavym@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 10:23:45 +01:00
Emil Velikov
9fd23435c2 swr: don't forget to link KNL/SKX against pthreads
Analogous to previous commit but for the KNL/SKX backends.

Cc: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Cc: Laurent Carlier <lordheavym@gmail.com>
Fixes: 1cb5a6061c ("configure/swr: add KNL and SKX architecture targets")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 10:23:45 +01:00
Emil Velikov
33d397ada5 swr: don't forget to link AVX/AVX2 against pthreads
Seems like the backends have been using pthreads since day one, yet
we've been missing the link.

With a later commit we'll fix a typo, so the libraries will be built
with -Wl,--no-undefined, i.e. the build will fail on unresolved symbols.

v2: Split from a larger patch.

Cc: mesa-stable@lists.freedesktop.org
Cc: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Cc: Laurent Carlier <lordheavym@gmail.com>
Fixes: c6e67f5a93 "gallium/swr: add OpenSWR rasterizer"
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 10:23:45 +01:00
Wladimir J. van der Laan
15a1ceb127 etnaviv: Clear lbl_usage array correctly
Fill the entire array instead of just a quarter. This avoids
crashes with large shaders.
(currently this never causes a problem because shaders larger than 2048/4
instructions are not supported by this driver on any hardware, but it will
cause problems in the future)

Fixes: ec43605189 ("etnaviv: fix shader miscompilation with more than 16 labels")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-07-23 21:52:44 +02:00
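The commit doesn't show the code, but a common way to clear exactly a quarter of an `int` array is to hand `memset` an element count where it expects a byte count, and the "2048/4" figure fits that pattern. The sketch below assumes that failure mode; `struct compile_state`, `MAX_INSTS`, and the -1 fill value are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_INSTS 2048 /* hypothetical array length for this sketch */

struct compile_state {
    int lbl_usage[MAX_INSTS];
};

static void clear_lbl_usage(struct compile_state *c)
{
    assert(c);
    /* memset takes a byte count. Passing MAX_INSTS here would clear only
     * MAX_INSTS bytes, i.e. a quarter of the array when sizeof(int) == 4.
     * sizeof covers the whole array; every byte 0xff makes each int -1. */
    memset(c->lbl_usage, -1, sizeof(c->lbl_usage));
}
```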
Neha Bhende
1820ef64c9 svga: Limit number of immediates in shader
The immediate {128.0, -128.0, 2.0, 3.0} is used by the LIT instruction,
which is not used very frequently, so allocate it only if LIT is used.

Tested with mtt piglit and mtt glretrace

v2: As per Charmaine's comment

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-22 13:18:56 -06:00
Charmaine Lee
83ca6b9d31 svga: fix constant indices for texcoord scale factors and texture buffer size
This patch fixes the ordering of the constant indices for texcoord scale
factor and texture buffer size to match the order they were added to the
constant buffer in svga_get_extra_constants_common().

Tested with MTT piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-07-22 13:18:56 -06:00
Neha Bhende
acfb1583a5 svga: fix unnormalized->normalized texture coordinate conversion
Sometimes, converting unnormalized coordinates to normalized
coordinates requires an epsilon value to produce the right texels with
nearest filtering.  Adding 0.0001 to the coordinates when the min/mag
filter is nearest fixes the issue.
Fixes piglit test fbo-blit-scaled-linear

Tested with mtt-piglit, mtt-glretrace

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-22 13:18:56 -06:00
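A sketch of where the epsilon could go, assuming it is added to the unnormalized coordinate before dividing by the texture size, and only for nearest filtering. With nearest filtering, a coordinate that rounding left just below a texel boundary (1.9999999 instead of 2.0) selects the texel to the left; the epsilon nudges it back across. Names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Epsilon quoted in the commit message. */
#define NEAREST_EPSILON 0.0001f

/* Hypothetical conversion from texel space to [0, 1] space. */
static float to_normalized(float coord, unsigned size, bool nearest)
{
    assert(size > 0);
    if (nearest)
        coord += NEAREST_EPSILON;
    return coord / (float)size;
}
```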
Brian Paul
dc62ddfb39 svga: only support 4x, 8x, 16x msaa
Skip 2x MSAA, for example, since it's seldom used and just bloats
the list of pixel formats.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-22 13:18:56 -06:00
Karol Herbst
f98a221f2d nv50/ir: disable mul+add to mad for precise instructions
fixes
    misrendering in Tomb Raider
    KHR-GL44.gpu_shader5.precise_qualifier
    KHR-GL45.gpu_shader5.precise_qualifier

v4: disable opt only for MAD, it's fine for SAD

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2017-07-21 23:45:18 -04:00
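A minimal sketch of the guard this commit describes, on a hypothetical, simplified IR. The rationale, per the gallium docs change further down (a note about MAD's intermediate rounding step), is that a fused MAD may round differently from a separate MUL followed by an ADD, which `precise` forbids; SAD, which v4 leaves enabled, is an integer operation with no rounding to preserve. The struct, enum, and `can_fuse_mul_add` are assumptions, not the nv50 IR API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified IR instruction for this sketch. */
enum op { OP_MUL, OP_ADD };

struct insn {
    enum op op;
    bool precise;
};

/* Refuse to fuse mul+add into a MAD if either instruction is precise. */
static bool can_fuse_mul_add(const struct insn *mul, const struct insn *add)
{
    assert(mul && add);
    if (mul->op != OP_MUL || add->op != OP_ADD)
        return false;
    return !mul->precise && !add->precise;
}
```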
Karol Herbst
f9bfc93014 nv50/ir/tgsi: handle precise for most ALU instructions
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2017-07-21 23:45:18 -04:00
Karol Herbst
1d7c232fbd nv50/ir: add precise field to Instruction
v4: initialize field with NULL

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2017-07-21 23:45:18 -04:00
Karol Herbst
c5cbb9a543 gallium/docs: add precise instruction modifier
v4: add comment about intermediate rounding step to MAD

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
4611343bcc tgsi/text: parse _PRECISE modifier
v2: use str_match_no_case to fix _SAT_PRECISE detection
v4: use is_digit_alpha_underscore to match end of mods

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-21 23:45:18 -04:00
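A sketch of the suffix parsing the v2 and v4 notes describe: match `_SAT` and `_PRECISE` case-insensitively in any order, then require that the next character is not a digit, letter, or underscore, so `_SAT_PRECISE` is accepted while `_SATX` is rejected. The helpers are hypothetical stand-ins modeled on the names in the log, not the tgsi_text.c functions themselves.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

static bool is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Case-insensitive prefix match; advances *pcur only on success. */
static bool match_no_case(const char **pcur, const char *str)
{
    const char *cur = *pcur;
    while (*str) {
        /* Also stops at the end of *pcur, since '\0' never matches. */
        if (toupper((unsigned char)*cur) != toupper((unsigned char)*str))
            return false;
        cur++;
        str++;
    }
    *pcur = cur;
    return true;
}

/* Parses any sequence of _SAT / _PRECISE suffixes, in any order.
 * Returns false if identifier characters follow the modifiers. */
static bool parse_modifiers(const char **pcur, bool *sat, bool *precise)
{
    assert(pcur && *pcur);
    const char *cur = *pcur;
    *sat = *precise = false;
    for (;;) {
        if (match_no_case(&cur, "_SAT"))
            *sat = true;
        else if (match_no_case(&cur, "_PRECISE"))
            *precise = true;
        else
            break;
    }
    if (is_ident_char(*cur))
        return false;
    *pcur = cur;
    return true;
}
```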
Karol Herbst
d0dfdf704d tgsi: populate precise
Only implemented for glsl->tgsi. Other converters just set precise to 0.

v2: remove precise parameter from ureg_tex_insn and ureg_memory_insn

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
0341aea2f8 tgsi/dump: print _PRECISE modifier on Instructions
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
af22adee4f tgsi: add precise flag to tgsi_instruction
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-07-21 23:45:18 -04:00
Charmaine Lee
5124bf9823 st/mesa: add destroy_drawable interface
With this patch, the st manager will maintain a hash table for
the active framebuffer interface objects. A destroy_drawable interface
is added to allow the state tracker to notify the st manager to remove
the associated framebuffer interface object from the hash table,
so the associated framebuffer and its resources can be deleted
at framebuffer purge time.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101829
Fixes: 147d7fb772 ("st/mesa: add a winsys buffers list in st_context")
Tested-by: Brad King <brad.king@kitware.com>
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-07-20 17:34:34 -07:00
Roland Scheidegger
dbde58dd31 gallivm: handle call attributes for llvm < 4.0 in lp_add_function_attr
We had some callers using LLVMAddInstrAttributes, which couldn't be
converted to lp_add_function_attr because attributes were only handled
for functions in this case; fix this.
For llvm >= 4.0, this already works correctly.
(radeonsi seems to avoid setting call-site attributes prior to llvm 4.0;
the patch introducing that says it doesn't work when calling intrinsics. But at
least for calling external functions we always used that, albeit only
for actual call attributes, not call parameter attributes, though a
quick test shows llvm seems to handle that as well. The attribute index
is sort of iffy though, since attribute 0 of the call is the actual function,
attribute 1 corresponds to the first parameter of the called function.)
(Verified with GALLIVM_DEBUG=dumpbc plus llvm-dis that the correct
attributes are shown for calls, both for llvm 4.0 and 3.3.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-07-21 22:46:04 +02:00