Commit graph

98594 commits

Author SHA1 Message Date
Tim Rowley
cdb61d45cd swr/rast: Rewrite Shuffle8bpcGatherd using shuffle
Ease future code maintenance, prepare for folding simd8 and simd16 versions.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:38 -06:00
Tim Rowley
3ec98ab5d4 swr/rast: Convert gather masks to Nx1bit
Simplifies calling code, gets gather function interface closer to llvm's
masked_gather.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:33 -06:00
Tim Rowley
36e276b6b0 swr/rast: WIP - Widen fetch shader to SIMD16
Widen vertex gather/storage to SIMD16 for all component types.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:28 -06:00
Tim Rowley
6d5275498a swr/rast: Corrections to multi-scissor handling
binner's GatherScissors() will be turned into a real gather in the not
too distant future.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:24 -06:00
Tim Rowley
0e9e247687 swr/rast: Binner fixes for viewport index offset handling
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:19 -06:00
Tim Rowley
f2e3900a1e swr/rast: Remove unneeded copy of gather mask
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-15 10:55:01 -06:00
Chris Wilson
a68873f668 i965: Allow old begin/end queryobj for gen4/5 with HW contexts
Since we have HW contexts on gen4/5, we could take advantage of them, as
done for gen6+ in commit e32cd5ffbb ("i965: Rely on hardware contexts
for query objects on Gen6+."), to only emit a pair of counters at
begin/end queryobj, rather than around every primitive. However, to keep
queryobj working in the meantime as we bringup support for HW ctx on
gen4/5, we can keep using the existing code.

References: e32cd5ffbb ("i965: Rely on hardware contexts for query objects on Gen6+.")
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-15 13:41:18 +00:00
Rob Clark
d1465b3aee freedreno: use u_transfer_helper
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-12-15 08:09:44 -05:00
Rob Clark
e94eb5e600 gallium/util: add u_transfer_helper
Add a new helper that drivers can use to emulate various things that
need special handling in particular in transfer_map:

 1) z32_s8x24.. gl/gallium treats this as a single buffer with depth
    and stencil interleaved but hardware frequently treats this as
    separate z32 and s8 buffers.  Special pack/unpack handling is
    needed in transfer_map/unmap to pack/unpack the exposed buffer

 2) fake RGTC.. GPUs designed with GLES in mind, but which can other-
    wise do GL3, if native RGTC is not supported it can be emulated
    by converting to uncompressed internally, but needs pack/unpack
    in transfer_map/unmap

 3) MSAA resolves in the transfer_map() case

v2: add MSAA resolve based on Eric's "gallium: Add helpers for MSAA
    resolves in pipe_transfer_map()/unmap()." patch; avoid wrapping
    pipe_resource, to make it possible for drivers to use both this
    and threaded_context.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-12-15 08:09:44 -05:00
Tapani Pälli
eac1aad624 i965: enable EXT_disjoint_timer_query extension
Following dEQP cases pass:
   dEQP-EGL.functional.get_proc_address.extension.gl_ext_disjoint_timer_query
   dEQP-EGL.functional.client_extensions.disjoint

Piglit test 'ext_disjoint_timer_query-simple' passes with these changes.

No changes/regression observed in Intel CI.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-15 08:42:48 +02:00
Tapani Pälli
33f73345da mesa: GL_EXT_disjoint_timer_query extension API bits
Patch adds GL_GPU_DISJOINT_EXT and enables to use timer queries when
EXT_disjoint_timer_query is enabled.

v2: enable extension only when EXT_disjoint_timer_query set

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-15 08:42:48 +02:00
Tapani Pälli
0a202dd5e8 glapi: add GL_EXT_disjoint_timer_query
Most entrypoints already available via other extensions like
GL_EXT_occlusion_query_boolean, GL_EXT_timer_query.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-15 08:42:48 +02:00
Tapani Pälli
80d96ca4c8 mesa: add DisjointOperation to gl_shared_state
This state will be used by EXT_disjoint_timer_query. As first
usage, patch sets DisjointOperation true when gpu reset happens.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-15 08:42:48 +02:00
Eric Anholt
49e2586bfc broadcom/vc5: Fix a typo in memcmp for sig unpack checking.
This shockingly ended up working out, because only the first byte of *sig
is used and (sizeof(*sig) != 0) == 1.  Fixes a compiler warning.

Link: https://bugs.freedesktop.org/show_bug.cgi?id=104183
2017-12-14 14:36:24 -08:00
Eric Anholt
1171f1749d broadcom/vc5: Enable NIR txd lowering on all txd instructions.
Fixes almost all of piglit's arb_shader_texture_lod grad tests, except for
the base -texgrad/texgradcube ones which fail on what appear to be
precision problems.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-14 14:36:17 -08:00
Eric Anholt
0bead224fe nir: Add a new lowering option to lower all txd to txl.
VC5 requires that all txd are lowered in the shader.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-14 14:36:17 -08:00
Eric Anholt
b08b628994 nir: Fix interaction of GL_CLAMP lowering with texture offsets.
We want the clamping of the coordinate to apply after the offset, so we
need to do math to lower the offset out of the instruction.  Fixes texwrap
offset cases for GL_CLAMP with GL_NEAREST on vc5.

Note: I moved the get_texture_size() verbatim, so that it was defined
before use.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-14 14:36:17 -08:00
Eric Anholt
52f024b052 broadcom/vc5: Fix shader input/outputs for gallium's new NIR linking. 2017-12-14 14:36:17 -08:00
Roland Scheidegger
1ae48963f7 gallivm: implement accurate corner behavior for textureGather with cube maps
The spec says the missing texel (when we wrap around both x and y axis)
should be synthesized as the average of the 3 other texels. For bilinear
filtering however we instead adjusted the filter weights (because, while
the complexity looks similar, there would be 4 times as many color values
to fix up than weights). Obviously this could not work for gather (hence
accurate corner filtering was disabled with gather).
Implement this by just doing it as the spec implies - calculate the 4th
texel as the average of the other 3. With gather of course there's only
one color to worry about, so it's not all that many instructions neither
(albeit surely the whole cube map filtering is hilariously complex).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-12-14 22:59:55 +01:00
Roland Scheidegger
a485ad0bcd gallivm: fix an issue with NaNs with seamless cube filtering
Cube texture wrapping is a bit special since the values (post face
projection) always are within [0,1], so we took advantage of that and
omitted some clamps.
However, we can still get NaNs (either because the coords already had NaNs,
or the face projection generated them), and in fact we didn't handle them
quite safely. I've seen -INT_MAX + 1 been propagated through as the final int
coord value, albeit I didn't observe a crash. (Not quite a coincidence, since
any stride mul with -INT_MAX or -INT_MAX+1 will turn up as a small positive
number - nevertheless, I'd rather not try my luck, I'm not entirely sure it
can't really turn up negative neither due to seamless coord swapping, plus
ifloor of a NaN is not guaranteed to return -INT_MAX by any standard. And
we kill off NaNs similarly with ordinary texture wrapping too.)
So kill off the NaNs by using the common max against zero method.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-12-14 22:59:55 +01:00
Jason Ekstrand
4b8c9ea46b intel/tools: Convert aubinator over to the common framework
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:24 -08:00
Jason Ekstrand
35f9c27be3 intel/batch-decoder: Decode registers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:22 -08:00
Jason Ekstrand
81e4ecbc19 intel/batch-decoder: Decode dynamic state
Unfortunately, in aubinator and aubinator_error_decode we don't always
know how many of a given state we have, so we must guess.  One day,
we'll come up with a way to annotate the batch to solve this problem.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:20 -08:00
Jason Ekstrand
4ac2ee9001 intel/batch-decoder: Decode constants, binding tables, and samplers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:18 -08:00
Jason Ekstrand
d374423eab intel/tools: Switch aubinator_error_decode over to the gen_print_batch
The shared framework can now do everything that aubinator_error_decode
ever did and more.  It's time to make the switch.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:16 -08:00
Jason Ekstrand
c86671c438 intel/batch-decoder: Decode graphics shaders
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:15 -08:00
Jason Ekstrand
d4081fb778 intel/batch-decoder: Decode vertex and index buffers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:13 -08:00
Jason Ekstrand
e27ec208ed intel/batch-decoder: Decode MEDIA_INTERFACE_DESCRIPTOR_LOAD
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:12 -08:00
Jason Ekstrand
be20043d00 intel/tools: Add the start of a generic batch decoder
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:10 -08:00
Jason Ekstrand
4cb96fbd91 intel/decoder: Expose the raw field value in the iterator
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:09 -08:00
Jason Ekstrand
79269e8f4b intel/disasm: Take a devinfo in gen_disasm_create
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:06 -08:00
Jason Ekstrand
a7ae72032f intel/decoder: Take a bit offset in gen_print_group
Previously, if a group was nested in another group such that it didn't
start on a dword boundary, we would decode it as if it started at the
start of its first dword.  This changes things to work even more in
terms of bits so that we can properly decode these structs.  This
affects MOCS, attribute swizzles, and several other things.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:04 -08:00
Jason Ekstrand
dca8f466ee intel/decoder: Stop rounding down to the nearest dword
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:03 -08:00
Jason Ekstrand
f264640693 intel/decoder: Convert the iterator to work entirely in bits
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:27:01 -08:00
Jason Ekstrand
ada705b671 intel/decoder: Drop gen_field_decode helper
It's unused

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-12-14 13:26:44 -08:00
Samuel Pitoiset
225b198802 amd/common: add ac_build_waitcnt()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:24:44 +01:00
Samuel Pitoiset
24601810e9 amd/common: more use of i32_1
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:24:42 +01:00
Samuel Pitoiset
ec4e566560 amd/common: more use of i32_0
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:24:41 +01:00
Samuel Pitoiset
d43e72fd8c radeonsi: make use of ac_build_fdiv()
And move the comment to amd/common.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:24:38 +01:00
Samuel Pitoiset
88522e2bcd radv: export SampleMask from pixel shaders at full rate
Use 16_ABGR instead of 32_ABGR if Z isn't written.

Ported from RadeonSI.

No CTS regressions on Polaris.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:23:28 +01:00
Samuel Pitoiset
45872a0a6d radeonsi: make use of ac_get_spi_shader_z_format()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:23:25 +01:00
Samuel Pitoiset
91f4d746e4 amd/common: add ac_get_spi_shader_z_format()
ac_shader_util.c will contain shader helpers for RadeonSI
and RADV.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:23:23 +01:00
Samuel Pitoiset
90c3bf0789 radv: do not load the local invocation index when it's unused
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:22:26 +01:00
Samuel Pitoiset
2294d35b24 radv: do not load unused gl_LocalInvocationID/gl_WorkGroupID components
We should also not load the input SGPRs and VGPRS, but
let's start with this for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:22:06 +01:00
Samuel Pitoiset
e001944410 amd/common: scan which components of gl_LocalInvocationID are used
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:22:04 +01:00
Samuel Pitoiset
42285ed8c3 amd/common: scan which components of gl_WorkGroupID are used
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:22:02 +01:00
Samuel Pitoiset
5a761167f5 radv: set FORCE_SIMD_DIST(1) for compute when profitable
Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:20:59 +01:00
Samuel Pitoiset
75b1c4997f radv: calculate best compute resource limits
Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:20:57 +01:00
Samuel Pitoiset
9fdc1437ba radv: store the dispatch initiator into the device
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:20:55 +01:00
Samuel Pitoiset
2e58ef46a8 radv: replace grid_components_used by uses_grid_size
Use a boolean instead because the number of needed SGPRs
is always 3.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:19:42 +01:00