Commit graph

33175 commits

Author SHA1 Message Date
Boyuan Zhang
d9727f31a8 vl: remove is idr flag
Remove is_idr flag since not being used anymore.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-12-15 16:04:05 -05:00
Boyuan Zhang
3181065b7f st/va: directly use idr pic flag
Remove is_idr flag, and use idr_pic_flag provided by vaapi directly

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-12-15 16:04:05 -05:00
Boyuan Zhang
130e1d142f radeon/vce: determine idr by pic type
Vaapi encode interface provides idr frame flags, where omx interface doesn't.
Therefore, change to use picture type to determine idr frame, which will
work for both interfaces.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2017-12-15 16:04:05 -05:00
Boyuan Zhang
c87d91b9d8 radeon/vcn: determine idr by pic type
Vaapi encode interface provides idr frame flags, where omx interface doesn't.
Therefore, change to use picture type to determine idr frame, which will
work for both interfaces.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-12-15 16:04:05 -05:00
Tim Rowley
f475ac3c40 swr/rast: Move more RTAI handling out of binner
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:57:12 -06:00
Tim Rowley
11a9d4f9b5 swr/rast: EXTRACT2 changed from vextract/vinsert to vshuffle
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:57:06 -06:00
Tim Rowley
12adf2c815 swr/rast: Fix cache of API thread event manager
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:57:01 -06:00
Tim Rowley
c68b2d5c79 swr/rast: Replace VPSRL with LSHR
Replace use of x86 intrinsic with general llvm IR instruction.

Generates the same final assembly.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:56:54 -06:00
Tim Rowley
20f9006603 swr/rast: Rework thread binding parameters for machine partitioning
Add BASE_NUMA_NODE, BASE_CORE, BASE_THREAD parameters to
SwrCreateContext.

Add optional SWR_API_THREADING_INFO parameter to SwrCreateContext to
control reservation of API threads.

Add SwrBindApiThread() function to allow binding of API threads to
reserved HW threads.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:56:46 -06:00
Tim Rowley
182cc51a50 swr/rast: Pull of RTAI gather & offset out of clip/bin code
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:56:40 -06:00
Tim Rowley
ca59b2e75c swr/rast: Remove no-op VBROADCAST of vID
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:56:36 -06:00
Tim Rowley
01a57c11cb swr/rast: SIMD16 Fetch - Fully widen 32-bit integer vertex components
Also widen the 16-bit a 8-bit integer vertex component gathers to SIMD16.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:56:30 -06:00
Tim Rowley
fa3105cdb5 swr/rast: Replace INSERT2 vextract/vinsert with JOIN2 vshuffle
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:56:25 -06:00
Tim Rowley
b38ac9dca1 swr/rast: SIMD16 Fetch - Fully widen 16-bit float vertex components
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:56:19 -06:00
Tim Rowley
df54678ba0 swr/rast: SIMD16 Fetch - Fully widen 32-bit float vertex components
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:56:03 -06:00
Tim Rowley
fbc27ff027 swr/rast: Pass prim to ClipSimd
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:54 -06:00
Tim Rowley
8b06920796 swr/rast: Pull most of the VPAI manipulation out of the binner/clipper
Move out of binner/clipper; hand them down from the frontend code instead.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:49 -06:00
Tim Rowley
f882891684 swr/rast: Move GatherScissors to header
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:42 -06:00
Tim Rowley
cdb61d45cd swr/rast: Rewrite Shuffle8bpcGatherd using shuffle
Ease future code maintenance, prepare for folding simd8 and simd16 versions.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:38 -06:00
Tim Rowley
3ec98ab5d4 swr/rast: Convert gather masks to Nx1bit
Simplifies calling code, gets gather function interface closer to llvm's
masked_gather.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:33 -06:00
Tim Rowley
36e276b6b0 swr/rast: WIP - Widen fetch shader to SIMD16
Widen vertex gather/storage to SIMD16 for all component types.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:28 -06:00
Tim Rowley
6d5275498a swr/rast: Corrections to multi-scissor handling
binner's GatherScissors() will be turned into a real gather in the not
too distant future.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:24 -06:00
Tim Rowley
0e9e247687 swr/rast: Binner fixes for viewport index offset handling
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:19 -06:00
Tim Rowley
f2e3900a1e swr/rast: Remove unneeded copy of gather mask
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:01 -06:00
Rob Clark
d1465b3aee freedreno: use u_transfer_helper
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-12-15 08:09:44 -05:00
Rob Clark
e94eb5e600 gallium/util: add u_transfer_helper
Add a new helper that drivers can use to emulate various things that
need special handling in particular in transfer_map:

 1) z32_s8x24.. gl/gallium treats this as a single buffer with depth
    and stencil interleaved but hardware frequently treats this as
    separate z32 and s8 buffers.  Special pack/unpack handling is
    needed in transfer_map/unmap to pack/unpack the exposed buffer

 2) fake RGTC.. GPUs designed with GLES in mind, but which can other-
    wise do GL3, if native RGTC is not supported it can be emulated
    by converting to uncompressed internally, but needs pack/unpack
    in transfer_map/unmap

 3) MSAA resolves in the transfer_map() case

v2: add MSAA resolve based on Eric's "gallium: Add helpers for MSAA
    resolves in pipe_transfer_map()/unmap()." patch; avoid wrapping
    pipe_resource, to make it possible for drivers to use both this
    and threaded_context.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-12-15 08:09:44 -05:00
Roland Scheidegger
1ae48963f7 gallivm: implement accurate corner behavior for textureGather with cube maps
The spec says the missing texel (when we wrap around both x and y axis)
should be synthesized as the average of the 3 other texels. For bilinear
filtering however we instead adjusted the filter weights (because, while
the complexity looks similar, there would be 4 times as many color values
to fix up than weights). Obviously this could not work for gather (hence
accurate corner filtering was disabled with gather).
Implement this by just doing it as the spec implies - calculate the 4th
texel as the average of the other 3. With gather of course there's only
one color to worry about, so it's not all that many instructions neither
(albeit surely the whole cube map filtering is hilariously complex).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-12-14 22:59:55 +01:00
Roland Scheidegger
a485ad0bcd gallivm: fix an issue with NaNs with seamless cube filtering
Cube texture wrapping is a bit special since the values (post face
projection) always are within [0,1], so we took advantage of that and
omitted some clamps.
However, we can still get NaNs (either because the coords already had NaNs,
or the face projection generated them), and in fact we didn't handle them
quite safely. I've seen -INT_MAX + 1 been propagated through as the final int
coord value, albeit I didn't observe a crash. (Not quite a coincidence, since
any stride mul with -INT_MAX or -INT_MAX+1 will turn up as a small positive
number - nevertheless, I'd rather not try my luck, I'm not entirely sure it
can't really turn up negative neither due to seamless coord swapping, plus
ifloor of a NaN is not guaranteed to return -INT_MAX by any standard. And
we kill off NaNs similarly with ordinary texture wrapping too.)
So kill off the NaNs by using the common max against zero method.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-12-14 22:59:55 +01:00
Samuel Pitoiset
225b198802 amd/common: add ac_build_waitcnt()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:24:44 +01:00
Samuel Pitoiset
d43e72fd8c radeonsi: make use of ac_build_fdiv()
And move the comment to amd/common.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:24:38 +01:00
Samuel Pitoiset
45872a0a6d radeonsi: make use of ac_get_spi_shader_z_format()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:23:25 +01:00
Bruce Cherniak
ea2ee9cd19 swr: Correct texture allocation and limit max size to 2GB
This patch fixes piglit tex3d-maxsize by correcting 4 things:

The total_size calculation was using 32-bit math, therefore a >4GB
allocation request overflowed and was not returning false (unsupported).

Changed AlignedMalloc arguments from "unsigned int" to size_t, to handle
>4GB allocations.

Added error checking on texture allocations to fail gracefully.

Finally, temporarily decreased supported max texture size from 4GB to 2GB.
The gallivm texture-sampler needs some additional work to correctly handle
larger than 2GB textures (offsets to LLVMBuildGEP are signed).

I'm working on a follow-on patch to allow up to 4GB textures, as this is
useful in HPC visualization applications.

Fixes piglit tex3d-maxsize.

v2: Updated patch description to clarify ">4GB".

Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
2017-12-13 14:44:04 -06:00
Bruce Cherniak
709f5bdc4a swr: Fix KNOB_MAX_WORKER_THREADS thread creation override.
Environment variable KNOB_MAX_WORKER_THREADS allows the user to override
default thread creation and thread binding.  Previous commit to adjust
linux cpu topology caused setting this KNOB to bind all threads to a single
core.

This patch restores correct functionality of override.

Cc: <mesa-stable@lists.freedesktop.org>

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-12-13 14:44:01 -06:00
Brian Paul
c27a6c45c2 gallium/docs: document behavior of set_sample_mask()
The sample mask is used even if msaa is not explicity enabled when we
have a framebuffer with multisampled surfaces.  That's DX behavior and
what the Radeon drivers do.  Not sure about other drivers at this point.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-12-13 08:38:07 -07:00
Timothy Arceri
3308f4b81a radeonsi: create get_tcs_tes_buffer_address helper
This will be shared between the NIR and TGSI backends.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-12-13 13:41:53 +11:00
Brian Paul
7469966ed2 cso: add point rasterization sanity check assertion
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-12-12 09:46:18 -07:00
Brian Paul
38a4fd8ad6 gallium/u_blitter: replace tabs with spaces
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-12-12 09:46:18 -07:00
Brian Paul
63b03dc924 gallium/util: don't pass a pipe_resource to util_resource_is_array_texture()
No need to pass a pipe_resource when we can just pass the target.
This makes the function potentially more usable.  Rename it too.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-12-12 09:44:59 -07:00
Brian Paul
dde8309cde gallium/aux: include nr_samples in util_resource_size() computation
This function is only used in two places:
1. VMware driver, but only for HUD reporting
2. st/nine state tracker, used for texture memory accounting

Fixes: a69efa9482 ("util: add new util_resource_size() function in
u_resource.[ch]")

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-12-12 09:44:59 -07:00
Brian Paul
09b69828a3 svga: trivial whitespace/formatting fixes in svga_pipe_rasterizer.c 2017-12-12 09:44:59 -07:00
Roland Scheidegger
84c363fb09 gallivm: fix texture wrapping for texture gather for mirror modes
Care must be taken that all coords end up correct, the tests are very
sensitive that everything is correctly rounded. This doesn't matter
for bilinear filter (since picking a wrong texel with weight zero is
ok), and we could also switch the per-sample coords mistakenly.
While here, also optimize the coord_mirror helper a bit (we can do the
mirroring directly by exploiting float rounding, no need for fixing up
odd/even manually).
I did not touch the mirror_clamp and mirror_clamp_to_border modes.
In contrast to mirror_clamp_to_edge and mirror_repeat these are legacy
modes. They are specified against old gl rules, which actually does
the mirroring not per sample (so you get swapped order if the coord
is in the mirrored section). I think the idea though is that they should
follow the respecified mirror_clamp_to_edge rules so the order would be
correct.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-12-12 04:23:02 +01:00
Marek Olšák
bf0904e31f winsys/amdgpu: disable local BOs again due to worse performance
Cc: 17.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-12-11 19:11:14 +01:00
Leo Liu
6d74cb2570 radeon/vce: move destroy command before feedback command
VCE processing IBs starts from session and task info at first level,
other commands processed subsequently. The task info for destroy is
embedded to destroy command, resulting that feedback command is not
properly procoessed. This is causing kernel spin VM fault messages on
Polaris and Vega10 card when running ends at encode application.

The fix is also verified on VCE physical mode card.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Christian König <christian.koenig@amd.com>
2017-12-08 12:56:48 -05:00
Dylan Baker
2adc3817c6 meson: Add lmsensors to gallium libgl-xlib target.
Fixes: 5e71efef44 ("meson: Add lmsensors support")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-12-07 10:20:58 -08:00
Eric Engestrom
4cba39331d meson: add dep_thread to every lib that includes threads.h
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104141
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-12-07 17:29:42 +00:00
Eric Engestrom
f0337f0f70 meson: fix pl111 dependency on vc4
src/gallium/winsys/pl111/drm/libpl111winsys.a(pl111_drm_winsys.c.o): In function `pl111_drm_screen_create':
pl111_drm_winsys.c:(.text+0x33): undefined reference to `vc4_drm_screen_create_renderonly'

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-12-07 17:21:03 +00:00
Gert Wollny
6c268ea79a r600/sb: do not convert if-blocks that contain indirect array access
If an array is accessed within an if block, then currently it is not known
whether the value in the address register is involved in the evaluation of the
if condition, and converting the if condition may actually result in
out-of-bounds array access. Consequently, if blocks that contain indirect array
access should not be converted.

Fixes piglits on r600/BARTS:
spec/glsl-1.10/execution/variable-indexing/
  vs-output-array-float-index-wr
  vs-output-array-vec3-index-wr
  vs-output-array-vec4-index-wr

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104143

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-07 09:48:41 +10:00
Dave Airlie
81683c3d42 r600: add support for compute grid/block sizes. (v2)
We just pass these in from outside in a constant buffer.

The shader side stores them once they are accessed once.

v2: fix to not use a temp_reg.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-06 23:21:09 +00:00
Dave Airlie
4525cdb751 r600: handle image/buffer sizes correctly.
This adds support to compute for the resq workarounds (buffer/cube sizes)

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-06 23:21:06 +00:00
Dave Airlie
f51458637c r600/compute: add support for emitting compute image/buffer atoms
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-06 23:21:02 +00:00