Commit graph

92185 commits

Author SHA1 Message Date
Kenneth Graunke
6142c3e298 i965/drm: Remove dead return in brw_bo_busy()
If ret is 0, we return.  If ret is not 0, we return.  This is dead.

CID: 1405013 (Structurally dead code (UNREACHABLE))

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-16 22:58:22 -07:00
Mauro Rossi
8c79dbe94e android: amd/addrlib: trivial fix for gfx9 support
Fixes the following build error:

external/mesa/src/amd/addrlib/gfx9/gfx9addrlib.cpp:36:10: fatal error: 'gfx9_gb_reg.h' file not found
         ^
1 error generated.

Fixes: 7f160ef "amd/addrlib: import gfx9 support"
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-17 14:04:21 +10:00
Jason Ekstrand
4cf079f7f2 nir: Add GLSL_TYPE_[U]INT64 to some switch statements
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-16 20:14:42 -07:00
Marek Olšák
2769dadb0f gallium/radeon: always flush asynchronously and wait after begin_new_cs
This hides the overhead of everything in the driver after the CS flush and
before returning from pipe_context::flush.
Only microbenchmarks will benefit.

+2% FPS for glxgears.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-17 01:22:11 +02:00
Marek Olšák
f05f0bb5cb radeonsi: remove local variable 'mod' from si_compile_tgsi_shader
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-17 01:22:11 +02:00
Marek Olšák
bd2cde0c25 radeonsi: add si_shader_selector::vs_needs_prolog
cleanup

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-17 01:22:11 +02:00
Marek Olšák
777f305840 radeonsi: don't set VGT_GS_MODE as part of the GS state
The VS state sets it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-17 01:22:11 +02:00
Marek Olšák
5438e39fae radeonsi: don't allow user indices with indirect draws
Not possible with GL and it will make future gallium rework easier.
(also it's something I wouldn't like to support)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-17 01:22:11 +02:00
Marek Olšák
1c94d29984 radeonsi: merge two if (indirect) statements
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-17 01:22:11 +02:00
Marek Olšák
bdd6449769 radeonsi: don't mark non-dirty textures with CMASK as compressed
because the compression is skipped with non-dirty textures.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-17 01:22:11 +02:00
Bas Nieuwenhuizen
566f2ed571 docs: Document interaction Fixes tag and stable branches.
For the next time I forget.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-15 11:37:46 +02:00
Timothy Arceri
9f0dd85aa6 glsl: don't run the GLSL pre-processor when we are skipping compilation
This moves the hashing of shader source for the cache lookup to before
the preprocessor.  In our experience, shaders are unlikely to hash the
same after preprocessing if they didn't hash the same before, so we can
skip preprocessing for cache hits.

Improves Deus Ex start-up times with a warm cache from ~30 seconds to
~22 seconds.

Also fixes the leaking of state.

V2: fix indentation

v3: add the value of MESA_EXTENSION_OVERRIDE to the hash of the shader.

Tested-by (v2): Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-04-15 11:36:52 +10:00
Timothy Arceri
c2bc0aa7b1 glsl: delay optimisations on individual shaders when cache is available
Due to a max limit of 65,536 entries on the index table that we use to
decide if we can skip compiling individual shaders, it is very likely
we will have collisions.

To avoid doing too much work when the linked program may be in the
cache this patch delays calling the optimisations until link time.

Improves cold cache start-up times on Deus Ex by ~20 seconds.

When deleting the cache index to simulate a worst case scenario
of collisions in the index, warm cache start-up time improves by
~45 seconds.

V2: fix indentation, make sure to call optimisations on cache
fallback, make sure optimisations get called for XFB.

Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-15 11:36:44 +10:00
Jason Ekstrand
d2d6cf6c83 anv: Add the pci_id into the shader cache UUID
This prevents a user from using a cache created on one hardware
generation on a different one.  Of course, with Intel hardware, this
requires moving their drive from one machine to another but it's still
possible and we should prevent it.

Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
2017-04-14 17:41:07 -07:00
Philipp Zabel
36f2101723 etnaviv: native fence fd support
This adds native fence fd support to etnaviv, similarly to commit
0b98e84e9b ("freedreno: native fence fd"), enabled for kernel
driver version 1.1 or later.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-15 01:47:18 +02:00
Francisco Jerez
96dfc014fd docs: mark GL_ARB_vertex_attrib_64bit and OpenGL 4.2 as supported by i965/gen7+
v2 (Andreas Boll):
- Mark GL 4.1 as supported by i965/gen7+
- Mark GL_ARB_shader_precision as supported by i965/gen7+
- Update release notes

Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 16:13:21 -07:00
Juan A. Suarez Romero
1877982aca i965: enable OpenGL 4.2 in Ivybridge
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 16:13:21 -07:00
Samuel Iglesias Gonsálvez
92d4dc76ea i965: enable ARB_shader_precision in gen7+
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 16:13:21 -07:00
Juan A. Suarez Romero
0aed1212ae i965: enable ARB_vertex_attrib_64bit for gen7+
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 16:13:21 -07:00
George Kyriazis
b9d4256e11 swr: Fix swr osmesa build
Use GALLIUM_SWR to standardize

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-14 18:03:40 -05:00
Wladimir J. van der Laan
6a8d5ab932 etnaviv: SINGLE_BUFFER support on GC3000
This patch adds support for the SINGLE_BUFFER feature on GC3000
GPUs, which allows rendering to a single buffer using multiple pixel
pipes.

This feature is always used when it is available, which means that
multi-tiled formats are no longer being used in that case, and all
buffers will be normal (super)tiled. This mimics the behavior of the
blob on GC3000.

- Because the same format can be used to render to and texture from,
  this avoids an extra resolve pass when rendering to texture.

- i.MX6qp includes a PRE which can scan-out directly from tiled formats,
  avoiding untiling overhead.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-15 00:34:13 +02:00
Wladimir J. van der Laan
1dcb1d4925 etnaviv: Update includes from rnndb
Update to etna_viv commit 8486a97.

austriancoder: changed patch to include isa redefinition fix.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-15 00:34:08 +02:00
Wladimir J. van der Laan
9e4d049f40 etnaviv: Add chipMinorFeatures4 and 5
Request chipMinorFeatures bitfields 4 and 5 from the
drm driver.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-15 00:34:03 +02:00
Philipp Zabel
dda956340c etnaviv: resolve tile status when flushing resource
When passing render buffers from EGL clients to a wayland compositor,
the resource tile status must be resolved because otherwise the tile
status is lost in the transfer and cleared parts of the buffer will
contain old contents.

The same applies when sampling directly from a renderable resource.

lst: Add seqno tracking, to skip flush when not needed.

Fixes: aadcb5e94b35 ("etnaviv: enable TS, but disable autodisable")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-15 00:15:30 +02:00
Philipp Zabel
f30aab7696 etnaviv: stop repeatedly resolving an unchanged resource into its scanout prime buffer
Before resolving a resource into its scanout prime buffer, check that
the prime resource is actually older. If it is not, the resolve is an
expensive no-op, and we better skip it.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-15 00:15:27 +02:00
George Kyriazis
d7a1f01db3 swr: Add polygon stipple support
Add polygon stipple functionality to the fragment shader.

Explicitly turn off polygon stipple for lines and points, since we
do them using tris.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-14 17:08:12 -05:00
Samuel Iglesias Gonsálvez
8973ae3162 docs/relnotes: add GL_ARB_gpu_shader_fp64 support on i965/ivybridge
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:10 -07:00
Samuel Iglesias Gonsálvez
ef49dda2df docs: mark GL_ARB_gpu_shader_fp64 and OpenGL 4.0 as supported by i965/gen7+
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:10 -07:00
Samuel Iglesias Gonsálvez
a494afdb8e i965: enable OpenGL 4.0 to Ivybridge/Baytrail
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:10 -07:00
Samuel Iglesias Gonsálvez
cd0a6b2fc2 i965: enable ARB_gpu_shader_fp64 for Ivybridge/Baytrail
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Matt Turner
2eeb1b0ad9 i965: Use correct VertStride on align16 instructions.
In commit c35fa7a, we changed the "width" of DF source registers to 2,
which is conceptually fine. Unfortunately a VertStride of 2 is not
allowed by align16 instructions on IVB/BYT, and the regular VertStride
of 4 works fine in any case.

See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test
for example:

cmp.ge.f0(8)    g18<1>DF        g1<0>.xyxyDF    -g8<2>DF        { align16 1Q };
        ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed
cmp.ge.f0(8)    g19<1>DF        g1<0>.xyxyDF    -g9<2>DF        { align16 2N };
        ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed

v2:
- Add spec quote (Curro).
- Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro)

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez
d8441e2276 i965/vec4/dce: improve track of partial flag register writes
This is required for correctness in presence of multiple 4-wide flag
writes (e.g. 4-wide instructions with a conditional mod set) which
update a different portion of the same 8-bit flag subregister.

Right now we keep track of flag dataflow with 8-bit granularity and
consider flag writes to have killed any previous definition of the
same subregister even if the write was less than 8 channels wide,
which can cause live flag register updates to be dead
code-eliminated incorrectly.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez
c1fc8fad47 i965/vec4: don't do horizontal stride on some register file types
horiz_offset() shouldn't be doing anything for scalar registers,
because all channels of any SIMD instructions will end up reading or
writing the same component of the register, so shifting the register
offset would be wrong.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Re-implement in terms of is_uniform() for
  simplicity.  Pass argument by const reference.  Clarify commit
  message. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Matt Turner
21e8e3a848 i965/vec4: Fix exec size for MOVs {SET,PICK}_{HIGH,LOW}_32BIT.
Otherwise for a pack_double_2x32_split opcode, we emit:

   vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134
mov(8)          g5<1>UD         g5<4>.xUD                       { align16 1Q compacted };
mov(8)          g7<2>UD         g5<4,4,1>UD                     { align1 1Q };
        ERROR: When the destination spans two registers, the source must span two registers
               (exceptions for scalar source and packed-word to packed-dword expansion)
mov(8)          g8<2>UD         g5.4<4,4,1>UD                   { align1 2N };
        ERROR: The offset from the two source registers must be the same
mov(8)          g5<1>UD         g6<4>.xUD                       { align16 1Q compacted };
mov(8)          g7.1<2>UD       g5<4,4,1>UD                     { align1 1Q };
        ERROR: When the destination spans two registers, the source must span two registers
               (exceptions for scalar source and packed-word to packed-dword expansion)
mov(8)          g8.1<2>UD       g5.4<4,4,1>UD                   { align1 2N };
        ERROR: The offset from the two source registers must be the same

The intention was to emit mov(4)s for the instructions that have ERROR
annotations.

See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test
for example.

v2 (Samuel):
- Instead of setting the exec size to a fixed value, don't double it
(Curro).
- Add PICK_{HIGH,LOW}_32BIT to the condition.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Trivial rebase changes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez
f030aaf2fb i965/vec4: use vec4_builder to emit instructions in setup_imm_df()
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop useless vec4_visitor dependencies.  Demote to
  static stand-alone function.  Don't write unused components in the
  result.  Use vec4_builder interface for register allocation. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Juan A. Suarez Romero
a907c91e93 i965/vec4: consider subregister offset in live variables
Take into account offset values less than a full register (32 bytes)
when getting the var from register.

This is required when dealing with an operation that writes half of the
register (like one d2x in IVB/BYT, which uses exec_size == 4).

v2:
- Take in account this offset < 32 in liveness analysis too (Curro)

v3:
- Change formula in var_from_reg() (Curro)
- Remove useless changes (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Francisco Jerez
92649a3e67 i965/vec4: fix assert to detect SIMD lowered DF instructions in IVB
On IVB, DF instructions have lowered the SIMD width to 4 but the
exec_size will be later doubled. Fix the assert to avoid crashing in
this case.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Simplify assert.  Except for the 'inst->group % 4
  == 0' part the assertion was redundant with the previous assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez
6e3265eae5 i965/vec4: split VEC4_OPCODE_FROM_DOUBLE into one opcode per destination's type
This way we can set the destination type as double to all these new opcodes,
avoiding any optimizer's confusion that was happening before.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop no_spill workaround originally needed due to
  the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez
50a5217637 i965/vec4: split d2x conversion and data gathering from one opcode to two explicit ones
When doing a 64-bit to a smaller data type size conversion, the destination should
be aligned to 64-bits. Because of that, we need to gather the data after the
actual conversion.

Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but
now we split them explicitely in two different instructions:
VEC4_OPCODE_FROM_DOUBLE just do the conversion and
VEC4_OPCODE_PICK_LOW_32BIT will gather the data.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero
cfaf14a126 i965/vec4: fix VEC4_OPCODE_FROM_DOUBLE for IVB/BYT
In the generator we must generate slightly different code for
Ivybridge/Baytrail, because of the way the stride works in
this hardware.

v2:
- Use stride and don't need to fix dst (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero
be445d3ea3 i965/vec4: keep original type when dealing with null registers
Keep the original type when dealing with null registers. Especially
because we do no want to introduce an implicit conversion between
types that could affect the conditional flags.

This affects especially when the original type is DF, and we are working
on Ivybridge/Baytrail.

v2 (Curro)
- Fix typo.
- Use retype() instead of applying the type directly.
- Remove unneeded retype.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez
a21dc2b500 i965/vec4: split DF instructions and later double its execsize in IVB/BYT
We need to split DF instructions in two on IVB/BYT as it needs an
execsize 8 to process 4 DF values (one GRF in total).

v2:
- Rename helper and make it static inline function (Matt).
- Fix indention and add braces (Matt).

v3:
- Don't edit IR instruction when doubling exec_size (Curro)
- Add comment into the code (Curro).
- Manage ARF registers like the others (Curro)

v4:
- Add get_exec_type() function and use it to calculate the execution
  size.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check.  Take
  destination type as execution type where there is no valid source.
  Assert-fail if the deduced execution type is byte.  Clarify comment
  in get_lowered_simd_width().  Move SIMD width workaround outside of
  'if (...inst->size_written > REG_SIZE)' conditional block, since the
  problem should be independent of whether the amount of data written
  by the instruction is greater or lower than a GRF.  Drop redundant
  is_ivb_df definition.  Drop bogus inst->exec_size < 8 check.
  Simplify channel group assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez
a5399e8b1c i965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYT
The hardware applies the same channel enable signals to both halves of
the compressed instruction which will be just wrong under non-uniform
control flow. Fix this by splitting those instructions to SIMD4.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Francisco Jerez
ebfb703d44 i965/fs: Get 64-bit indirect moves working on IVB.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-04-14 14:56:08 -07:00
Matt Turner
630b84cdc8 i965: Use source region <1,2,0> when converting to DF.
Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead
of two.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero
3198ce3f96 i965/fs: fix lower SIMD width for IVB/BYT's MOV_INDIRECT
According to the IVB and HSW PRMs:

"2.When the destination requires two registers and the sources are
 indirect, the sources must use 1x1 regioning mode."

So for DF instructions the execution size is not limited by the number
of address registers that are available, but by the EU decompression
logic not handling VxH indirect addressing correctly.

This patch limits the SIMD width to 4 in this case.

v2:
- Fix typo (Matt).
- Fix condition (Curro)

v3:
- Add spec quote (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero
571cbd05eb i965/fs: fix dst stride in IVB/BYT type conversions
When converting a DF to 32-bit conversions, we set dst stride to 2,
to fulfill alignment restrictions because the upper Dword of every
Qword will be written with undefined value.

But in IVB/BYT, this is not necessary, as each DF conversion already
writes 2, the first one the real value, and the second one a 0.
That is, IVB/BYT already set stride = 2 implicitly, so we must set it to
1 explicitly to avoid ending up with stride = 4.

v2:
- Fix typo (Matt)

v3:
- Fix stride in the destination's brw_reg, don't modity IR (Curro)

v4:
- Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro)
- Fix comment (Curro).
- Relax hstride assert (Curro)

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Minor spelling fixes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez
af6fc3a8ea i965/fs: rename lower_d2x to lower_conversions
v2:
- Change the name to lower_conversions.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez
dee31311eb Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs."
This reverts commit 7dccd38b40.

d2x pass fixes SEL instructions when there is a type conversion
by doing a SEL without type conversion and then convert the result.
This pass also takes into account the non-uniform control flow.

Then, 7dccd38b40 is not needed anymore.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez
aeecc82d05 i965/fs: generalize the legalization d2x pass
Generalize it to lower any unsupported narrower conversion.

v2 (Curro):
- Add supports_type_conversion()
- Reuse existing intruction instead of cloning it.
- Generalize d2x to narrower and equal size conversions.

v3 (Curro):
- Make supports_type_conversion() const and improve it.
- Use foreach_block_and_inst to process added instructions.
- Simplify code.
- Add assert and improve comments.
- Remove redundant mov.
- Remove useless comment.
- Remove saturate == false assert and add support for saturation
  when fixing the conversion.
- Add get_exec_type() function.

v4 (Curro):
- Use get_exec_type() function to get sources' type.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00