Commit graph

18 commits

Author SHA1 Message Date
Matt Turner
b541945c20 i965/fs: Unpack count argument to 64-bit shift ops on Atom
64-bit operations on Atom parts have additional restrictions over their
big-core counterparts (validated by later patches).

Specifically, the restriction that "Source and Destination horizontal
stride must be aligned to the same qword" is violated by most shift
operations since NIR uses a 32-bit value as the shift count argument,
and this causes instructions like

   shl(8)          g19<1>Q         g5<4,4,1>Q      g23<4,4,1>UD

where src1 has a 32-bit stride, but the dest and src0 have a 64-bit
stride.

This caused ~4 pixels in the ARB_shader_ballot piglit test
fs-readInvocation-uint.shader_test to be incorrect. Unfortunately no
ARB_gpu_shader_int64 test hit this case because they operate on
uniforms, and their scalar regions are an exception to the restriction.

We work around this by effectively unpacking the shift count, so that we
can read it with a 64-bit stride in the shift instruction. Unfortunately
the unpack (a MOV with a dst stride of 2) is a partial write, and cannot
be copy-propagated or CSE'd.

Bugzilla: https://bugs.freedesktop.org/101984
2017-10-04 14:08:54 -07:00
Iago Toral Quiroga
47e527bd81 i965/fs: force pull model for 64-bit GS inputs
Triggering the push model when 64-bit inputs are involved is not easy due to
the constrains on the maximum number of registers that we allow for this mode,
however, for GS with 'points' primitive type and just a couple of double
varyings we can trigger this and it just doesn't work because the
implementation is not 64-bit aware at all. For now, let's make sure that we
don't attempt this model whith 64-bit inputs and we always fall back to pull
model for them.

Also, don't enable the VUE handles in the thread payload on the fly when we
find an input for which we need the pull model, this is not safe: if we need
to resort to the pull model we need to account for that when we setup the
thread payload so we compute the first non-payload register properly. If we
didn't do that correctly and we enable it on-the-fly here then we will end up
VUE handles on the first non-payload register which will probably lead to
GPU hangs. Instead, always enable the VUE handles for the pull model so we
can safely use them when needed. The GS is going to resort to pull model
almost in every situation anyway, so this shouldn't make a significant
difference and it makes things easier and safer.

v2: Always enable the VUE handles for pull model, this is easier and safer
    and the GS is going to fallback to pull model almost always anyway (Ken)

v3: Only clamp the URB read length if we are over the maximum reserved for
    push inputs as we were doing in the original code (Ken).

v4: No need to clamp the urb read length if invocations > 1

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-09-29 08:18:25 +02:00
Matt Turner
069bf7c907 i965/fs: Match destination type to size for ballot
No use in taking a 64-bit value when we know the high 32-bits are zero.
2017-07-20 16:56:50 -07:00
Matt Turner
782ef30451 i965/fs: Implement ARB_shader_ballot operations
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
43ef75b394 nir: Add system values from ARB_shader_ballot
We already had a channel_num system value, which I'm renaming to
subgroup_invocation to match the rest of the new system values.

Note that while ballotARB(true) will return zeros in the high 32-bits on
systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB
variables do not consider whether channels are enabled. See issue (1) of
ARB_shader_ballot.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
ee9fa4ac18 i965/fs: Implement ARB_shader_group_vote operations
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Kenneth Graunke
b2da123801 i965: Use pushed UBO data in the scalar backend.
This actually takes advantage of the newly pushed UBO data, avoiding
pull loads.

Improves performance in GLBenchmark Manhattan 3.1 by:

   HSW: ~1%, BDW/SKL/KBL GT2: 3-4%, SKL GT4: 7-8%, APL: 4-5%.
   (thanks to Eero Tamminen for these numbers)

shader-db results on Skylake, ignoring programs with spill/fill changes:

   total instructions in shared programs: 13963994 -> 13651893 (-2.24%)
   instructions in affected programs: 4250328 -> 3938227 (-7.34%)
   helped: 28527
   HURT: 0

   total cycles in shared programs: 179808608 -> 172535170 (-4.05%)
   cycles in affected programs: 79720410 -> 72446972 (-9.12%)
   helped: 26951
   HURT: 1248

   LOST:   46
   GAINED: 21

Many "Deus Ex: Mankind Divided" shaders which already spilled end up
spill a lot more (about 240 programs hurt, 9 helped).  The cycle
estimator suggests this is still overall a win (-0.23% in cycle counts)
presumably because we trade pull loads for fills.

v2: Drop "PULL" environment variable left in for initial debugging
    (caught by Matt).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-13 20:18:54 -07:00
Lionel Landwerlin
226fae7849 intel/compiler: no need to check unsigned is >= 0
CID: 1338342
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-07-13 22:50:45 +01:00
Lionel Landwerlin
030abc6109 intel: compiler/i965: fix is_broxton checks
In 5f2fe9302c is_geminilake was introduced for the differenciate
broxton from geminilake. Unfortunately I failed as verifying that
is_broxton is throughout the code base to mean Gen9lp.

Fixes: 5f2fe9302c ("intel: common: add flag to identify platforms by name")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-20 23:26:42 +01:00
Jason Ekstrand
e31042ab40 i965/fs: Move remapping of gl_PointSize to the NIR level
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:06 -07:00
Jason Ekstrand
ca4d192802 i965/fs: Lower gl_VertexID and friends to inputs at the NIR level
NIR calls these system values but they come in from the VF unit as
vertex data.  It's terribly convenient to just be able to treat them as
such in the back-end.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand
5e832302dc i965: Move multiply by 4 for VS ATTR setup into the scalar backend.
The vec4 backend will want to count in units of vec4s, not scalar
components.  The simplest solution is to move the multiplication by 4
into the scalar backend.  This also improves consistency with how we
count varyings.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand
b86dba8a0e nir: Embed the shader_info in the nir_shader again
Commit e1af20f18a changed the shader_info
from being embedded into being just a pointer.  The idea was that
sharing the shader_info between NIR and GLSL would be easier if it were
a pointer pointing to the same shader_info struct.  This, however, has
caused a few problems:

 1) There are many things which generate NIR without GLSL.  This means
    we have to support both NIR shaders which come from GLSL and ones
    that don't and need to have an info elsewhere.

 2) The solution to (1) raises all sorts of ownership issues which have
    to be resolved with ralloc_parent checks.

 3) Ever since 00620782c9, we've been
    using nir_gather_info to fill out the final shader_info.  Thanks to
    cloning and the above ownership issues, the nir_shader::info may not
    point back to the gl_shader anymore and so we have to do a copy of
    the shader_info from NIR back to GLSL anyway.

All of these issues go away if we just embed the shader_info in the
nir_shader.  There's a little downside of having to copy it back after
calling nir_gather_info but, as explained above, we have to do that
anyway.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand
3503b2714b i965/fs: Always provide a default LOD of 0 for TXS and TXL
We already provide a default LOD for textureQueryLevels and texture() on
non-fragment stages.  However, there are more cases where one is needed
such as textureSize(gsampler2DMS*) in SPIR-V.  Instead of trying to list
out all of the cases one at a time, just provide the default for all TXS
and TXL operations.  This fixes a shader validation error in the new
Sascha deferredmultisampling demo which uses textureSize(gsampler2DMS).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100391
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-04 18:33:35 -07:00
Jason Ekstrand
762a6333f2 nir: Rework conversion opcodes
The NIR story on conversion opcodes is a mess.  We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random.  This commit re-organizes things and makes them all
consistent:

 - All non-bool conversion opcodes now have the explicit size in the
   destination and are named <src_type>2<dst_type><size>.

 - Integer <-> integer conversion opcodes now only come in i2i and u2u
   forms (i2u and u2i have been removed) since the only difference
   between the different integer conversions is whether or not they
   sign-extend when up-converting.

 - Boolean conversion opcodes all have the explicit size on the bool and
   are named <src_type>2<dst_type>.

Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako.  This will make adding
int8, int16, and float16 versions much easier when the time comes.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
7107b32155 i965/fs: Re-arrange conversion operations
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
b377be9213 i965/fs: Use num_components from the SSA def in image intrinsics
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
700bebb958 i965: Move the back-end compiler to src/intel/compiler
Mostly a dummy git mv with a couple of noticable parts:
 - With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
 - Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
 - brw_util.[ch] is not really compiler specific, so it's moved to i965.

v2:
 - move brw_eu_defines.h instead of brw_defines.h
 - remove no-longer applicable includes
 - add missing vulkan/ prefix in the Android build (thanks Tapani)

v3:
 - don't list brw_defines.h in src/intel/Makefile.sources (Jason)
 - rebase on top of the oa patches

[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Renamed from src/mesa/drivers/dri/i965/brw_fs_nir.cpp (Browse further)