Commit graph

4083 commits

Author SHA1 Message Date
Iago Toral Quiroga
aaae24179f intel/compiler: fix ddy for half-float in Broadwell
Broadwell has restrictions that apply to Align16 half-float that
make the Align16 implementation of this invalid for this platform.
Use the gen11 path for this instead, which uses Align1 mode.

The restriction is not present in cherryview, gen9 or gen10, where
the Align16 implementation seems to work just fine.

v2:
 - Rework the comment in the code, move the PRM citation from the
   commit message to the comment in the code (Matt)
 - Cherryview isn't affected, only Broadwell (Matt)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
60c7c6d3ba intel/compiler: fix ddx and ddy for 16-bit float
We were assuming 32-bit elements. Also, In SIMD8 we pack 2 vector components
in a single SIMD register, so for example, component Y of a 16-bit vec2
starts is at byte offset 16B. This means that when we compute the offset of
the elements to be differentiated we should not stomp whatever base offset we
have, but instead add to it.

v2
 - Use byte_offset() helper (Jason)
 - Merge the fix for SIMD8: using byte_offset() fixes that too.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
8f40d392b9 intel/compiler: set correct precision fields for 3-source float instructions
Source0 and Destination extract the floating-point precision automatically
from the SrcType and DstType instruction fields respectively when they are
set to types :F or :HF. For Source1 and Source2 operands, we use the new
1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1
means half-precision. Since we always use the type of the destination for
all operands when we emit 3-source instructions, we only need set Src1Type
and Src2Type to 1 when we are emitting a half-precision instruction.

v2:
 - Set the bit separately for each source based on its type so we can
   do mixed floating-point mode in the future (Topi).

v3:
 - Use regular citation style for the comment referencing the PRM (Matt).
 - Decided not to add asserts in the emission code to check that only
   mixed HF/F types are used since such checks would break negative tests
   for brw_eu_validate.c (Matt)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
e6b7410187 intel/compiler: allow half-float on 3-source instructions since gen8
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
ee049f6b71 intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bits
We are now using these bits, so don't assert that they are not set. In gen8,
if these bits are set compaction is not possible. On gen9 and CHV platforms
set_3src_control_index() checks these bits (and others) against a table to
validate if the particular bit combination is eligible for compaction or not.

v2
 - Add more detail in the commit message explaining the situation for SKL+
   and CHV (Jason)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
120c970619 intel/compiler: add new half-float register type for 3-src instructions
This is available since gen8.

v2: restore previously existing assertion.

v3: don't use separate tables for gen7 and gen8, just assert that we
    don't use half-float before gen8 (Matt)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
4ab2b97a8f intel/compiler: add instruction setters for Src1Type and Src2Type.
The original SrcType is a 3-bit field that takes a subset of the types
supported for the hardware for 3-source instructions. Since gen8,
when the half-float type was added, 3-source floating point operations
can use use mixed precision mode, where not all the operands have the
same floating-point precision. While the precision for the first operand
is taken from the type in SrcType, the bits in Src1Type (bit 36) and
Src2Type (bit 35) define the precision for the other operands
(0: normal precision, 1: half precision).

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
a8d8b1a139 intel/compiler: drop unnecessary temporary from 32-bit fsign implementation
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
19cd2f5deb intel/compiler: implement 16-bit fsign
v2:
 - make 16-bit be its own separate case (Jason)

v3:
 - Drop the result_int temporary (Jason)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
4588f4a604 intel/compiler: handle extended math restrictions for half-float
Extended math with half-float operands is only supported since gen9,
but it is limited to SIMD8. In gen8 we lower it to 32-bit.

v2: quashed together the following patches (Jason):
  - intel/compiler: allow extended math functions with HF operands
  - intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
  - intel/compiler: extended Math is limited to SIMD8 on half-float

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
  (allow extended math functions with HF operands,
   extended Math is limited to SIMD8 on half-float)
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
114f4e6c29 intel/compiler: lower some 16-bit float operations to 32-bit
The hardware doesn't support half-float for these.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
b6a454791b intel/compiler: assert restrictions on conversions to half-float
There are some hardware restrictions that brw_nir_lower_conversions should
have taken care of before we get here.

v2:
 - rebased on top of regioning lowering pass

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
66806405af intel/compiler: handle b2i/b2f with other integer conversion opcodes
Since we handle booleans as integers this makes more sense.

v2:
 - rebased to incorporate new boolean conversion opcodes

v3:
 - rebased on top regioning lowering pass

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v2)
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
92f4761198 intel/compiler: split float to 64-bit opcodes from int to 64-bit
Going forward having these split is a bit more convenient since these two
groups have different restrictions.

v2:
 - Rebased on top of new regioning lowering pass.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
3e377c68f8 intel/compiler: add a NIR pass to lower conversions
Some conversions are not directly supported in hardware and need to be
split in two conversion instructions going through an intermediary type.
Doing this at the NIR level simplifies a bit the complexity in the backend.

v2:
 - Consider fp16 rounding conversion opcodes
 - Properly handle swizzles on conversion sources.

v3
 - Run the pass earlier, right after nir_opt_algebraic_late (Jason)
 - NIR alu output types already have the bit-size (Jason)
 - Use 'is_conversion' to identify conversion operations (Jason)

v4:
 - Be careful about the intermediate types we use so we don't lose
   range and avoid incorrect rounding semantics (Jason)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Juan A. Suarez Romero
b74e605cf4 meson: Add dependency on genxml to anvil genfiles
This fixes a race condition where anv_gen_files are executed before
genxml files, which causes a build failure

v2: add dependency on idep_genxml (Lionel)

Fixes: d1992255bb
       ("meson: Add build Intel "anv" vulkan driver")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-17 15:49:55 +02:00
Lionel Landwerlin
baf59e40cd intel/perf: constify accumlator parameter
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2019-04-17 14:10:42 +01:00
Lionel Landwerlin
93dbe52ab0 intel/perf: drop counter size field
We can deduct the size from another field, let's just save some space.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2019-04-17 14:10:42 +01:00
Lionel Landwerlin
a646485c28 i965: perf: add mdapi pipeline statistics queries on gen10/11
The Gen10+ expected format adds an additional counter which we can't
disclose yet. We can still make the size of the expected query result
match.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2019-04-17 14:10:42 +01:00
Lionel Landwerlin
d855906366 intel/perf: stub gen10/11 missing definitions
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2019-04-17 14:10:42 +01:00
Lionel Landwerlin
d47cc4acbf i965: move mdapi guid into intel/perf
One more thing we want to share between the different APIs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2019-04-17 14:10:42 +01:00
Lionel Landwerlin
b48d6d7471 i965: move mdapi result data format to intel/perf
We want to reuse this in Anv.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2019-04-17 14:10:42 +01:00
Lionel Landwerlin
2be07fc751 i965: move brw_timebase_scale to device info
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2019-04-17 14:10:42 +01:00
Lionel Landwerlin
41b54b5faf i965: move OA accumulation code to intel/perf
We'll want to reuse this in our Vulkan extension.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2019-04-17 14:10:42 +01:00
Lionel Landwerlin
f6bba7760f i965: move mdapi data structure to intel/perf
We'll want to reuse those structures later on.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2019-04-17 14:10:42 +01:00
Lionel Landwerlin
134e750e16 i965: extract performance query metrics
We would like to reuse performance query metrics in other APIs. Let's
make the query code dealing with the processing of raw counters into
human readable values API agnostic.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-17 14:10:42 +01:00
Lionel Landwerlin
603ddda622 i965: store device revision in gen_device_info
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-17 14:10:42 +01:00
Topi Pohjolainen
ea42ba36b9 intel/compiler/icl: Use tcs barrier id bits 24:30 instead of 24:27
Similarly to 1cc17fb731

Fixes gpu hangs with dEQP-VK.tessellation.shader_input_output.barrier

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2019-04-17 14:55:49 +03:00
Jason Ekstrand
583a4d9a27 intel/mi_builder: Disable mem_mem tests on IVB
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
2019-04-16 12:59:12 -05:00
Jason Ekstrand
56d9532316 intel/mi_builder: Re-order an initializer
The order doesn't matter in C99 but some C++ compilers seem to care.

Tested-by: Clayton Craft <clayton.a.craft@intel.com>
2019-04-16 12:07:15 -05:00
Kenneth Graunke
fad7801afd i965: Move program key debugging to the compiler.
The i965 driver has a bunch of code to compare two sets of program keys
and print out the differences.  This can be useful for debugging why a
shader needed to be recompiled on the fly due to non-orthogonal state
dependencies.  anv doesn't do recompiles, so we didn't need to share
this in the past - but I'd like to use it in iris.

This moves the bulk of the code to the compiler where it can be reused.
To make that possible, we need to decouple it from i965 - we can't get
at the brw program cache directly, nor use brw_context to print things.
Instead, we use compiler->shader_perf_log(), and simply pass in keys.

We put all of this debugging code in brw_debug_recompile.c, and only
export a single function, for simplicity.  I also tidied the code a
bit while moving it, now that it all lives in one file.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-04-16 09:01:15 -07:00
Tapani Pälli
624789e370 compiler/glsl: handle case where we have multiple users for types
Both Vulkan and OpenGL might be using glsl_types simultaneously or we
can also have multiple concurrent Vulkan instances using glsl_types.
Patch adds a one time init to track number of users and will release
types only when last user calls _glsl_type_singleton_decref().

This change fixes glsl_type memory leaks we have with anv driver.

v2: reuse hash_mutex, cleanup, apply fix also to radv driver and
    rename helper functions (Jason)

v3: move init, destroy to happen on GL context init and destroy

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-16 12:58:00 +03:00
Danylo Piliaiev
04508f57d1 intel/compiler: Do not reswizzle dst if instruction writes to flag register
If we write to the flag register changing the swizzle would change
what channels are written to the flag register.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110201
Fixes: 4cd1a0be
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: <ian.d.romanick@intel.com>
2019-04-16 09:42:08 +00:00
Dylan Baker
95aefc94a9 Delete autotools
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Matt Turner <mattst88@gmail.com>
2019-04-15 13:44:29 -07:00
Jason Ekstrand
90108deb27 anv: Update to use the new features struct names
These were updated in version 1.1.106 of vulkan.h to make more sense
with the extension names.  We may as well keep with the times.

Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-15 13:25:43 +00:00
Kenneth Graunke
8bf9b7b5b6 intel: Emit 3DSTATE_VF_STATISTICS dynamically
Pipeline statistics queries should not count BLORP's rectangles.

    (23) How do operations like Clear, TexSubImage, etc. affect the
         results of the newly introduced queries?

      DISCUSSION: Implementations might require "helper" rendering
      commands be issued to implement certain operations like Clear,
      TexSubImage, etc.

      RESOLVED: They don't. Only application submitted rendering
      commands should have an effect on the results of the queries.

Piglit's arb_pipeline_statistics_query-vert_adj exposes this bug when
the driver is hacked to always perform glBufferData via a GPU staging
copy (for debugging purposes).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-14 19:58:04 -07:00
Karol Herbst
14531d676b nir: make nir_const_value scalar
v2: remove & operator in a couple of memsets
    add some memsets
v3: fixup lima

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
2019-04-14 22:25:56 +02:00
Jason Ekstrand
9b1e4bab6b nir/builder: Add a nir_imm_zero helper
v2: replace nir_zero_vec with nir_imm_zero (Karol Herbst)

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-14 22:25:56 +02:00
Karol Herbst
daaf777376 nir/builder: Move nir_imm_vec2 from blorp into the builder
While we're here, fix a typo which caused it to actually return a vec4
with the third and fourth components zero.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-14 22:25:56 +02:00
Karol Herbst
bbf2ecaf35 intel/nir: use nir_src_is_const and nir_src_as_uint
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-14 22:25:56 +02:00
Jason Ekstrand
6b1c398bcb intel/nir: Take a nir_tex_instr and src index in brw_texture_offset
This makes things a bit simpler and it's also more robust because it no
longer has a hard dependency on the offset being a 32-bit value.
2019-04-14 22:25:56 +02:00
Lionel Landwerlin
9e7b0988d6 anv: leave the top 4Gb of the high heap VMA unused
In 628c9ca908 I forgot to apply the same -4Gb of the high address
of the high heap VMA. This was previously computed in the
HIGH_HEAP_MAX_ADDRESS.

Many thanks to James for pointing this out.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Xiong, James <james.xiong@intel.com>
Fixes: 628c9ca908 ("anv: store heap address bounds when initializing physical device")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-13 12:08:23 +00:00
Sagar Ghuge
066d2aebc0 intel/fs: Remove unused condition from opt_algebraic case
We will never hit a condition where we have src1 and src2 as immediate
operands.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-12 13:47:57 -07:00
Jason Ekstrand
7eaaff18cb anv/pipeline: Fix MEDIA_VFE_STATE::PerThreadScratchSpace on gen7
We were always programming it with the Broadwell convention which is too
large by a factor of two on Haswell and just plain wrong on IVB and BYT.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2019-04-12 16:08:35 +00:00
Karol Herbst
4a3c04a11f glsl/nir: add support for lowering bindless images_derefs
v2: handle atomics as well
    make use of nir_rewrite_image_intrinsic
v3: remove call to nir_remove_dead_derefs
v4: (Timothy Arceri) dont actually call lowering yet

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Timothy Arceri
035759b61b nir/i965/freedreno/vc4: add a bindless bool to type size functions
This required to calculate sizes correctly when we have bindless
samplers/images.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Karol Herbst
3b2a9ffd60 nir: move brw_nir_rewrite_image_intrinsic into common code
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Lionel Landwerlin
628c9ca908 anv: store heap address bounds when initializing physical device
We can then reuse those bounds to initialize the VMA heaps at logical
device creation.

This fixes an issue on EHL which has only 36bits of VMA. We were
incorrectly using the fixed 48bits upper bound to initialize the
logical device heap, resulting in addresses beyong the device's
limits.

v2: Don't confuse heap size (limited by system memory) and VMA size
   (limited by number of addressing bits the platform has)

v3: Fix low heap vma_size :( (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
2019-04-11 22:56:43 +01:00
Jason Ekstrand
316a98dec9 intel/common: Support bigger right-shifts with mi_builder
Because why not?
2019-04-11 18:04:09 +00:00
Jason Ekstrand
0d6dea0ac8 anv/cmd_buffer: Use gen_mi_sub instead of gen_mi_add with a negative
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-11 18:04:09 +00:00