Commit graph

116 commits

Author SHA1 Message Date
Jason Ekstrand
f58e0405b6 intel/fs: Drop the gl_program from fs_visitor
It's not used by anything anymore now that so much lowering has been
moved into NIR.  Sadly, we still need on in brw_compile_gs() for
geometry shaders on Sandy Bridge.  Short of a lot of pointless work,
that one's probably not going away.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-25 01:02:52 -05:00
Jason Ekstrand
134607760a intel/compiler: Fill a compiler statistics struct
This commit is all annoying plumbing work which just adds support for a
new brw_compile_stats struct.  This struct provides a binary driver
readable form of the same statistics we dump out to stderr when we
INTEL_DEBUG is set with a shader stage.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Matt Turner
dabb5d4bee i965/fs: Add a shader_stats struct.
It'll grow further, and we'd like to avoid adding an additional
parameter to fs_generator() for each new piece of data.

v2 (idr): Rebase on 17 months.  Track a visitor instead of a cfg.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-30 14:35:43 -07:00
Jason Ekstrand
c84b8eeeac intel/compiler: Be more conservative about subgroup sizes in GL
The rules for gl_SubgroupSize in Vulkan require that it be a constant
that can be queried through the API.  However, all GL requires is that
it's a uniform.  Instead of always claiming that the subgroup size in
the shader is 32 in GL like we have to do for Vulkan, claim 8 for
geometry stages, the maximum for fragment shaders, and the actual size
for compute.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Jason Ekstrand
f62227f2b7 intel/nir: Make brw_nir_apply_sampler_key more generic
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Jason Ekstrand
14781e2122 intel/compiler: Add a "base class" for program keys
Right now, all keys have two things in common: a program string ID and a
sampler_prog_key_data.  I'd like to add another thing or two and need a
place to put it.  This commit adds a new brw_base_prog_key struct which
contains those two common bits.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-10 19:35:55 +00:00
Ian Romanick
47c2aa5b48 intel/vec4: Reswizzle VF immediates too
Previously, an instruction like

mul(8) vgrf29.xy:F, vgrf25.yxxx:F, [-1F, 1F, 0F, 0F]

would get rewritten as

mul(8) vgrf0.yz:F, vgrf25.yyxx:F, [-1F, 1F, 0F, 0F]

The latter does not produce the correct result.  The VF immediate in the
second should be either [-1F, -1F, 1F, 1F] or [0F, -1F, 1F, 0F].  This
commit produces the former.

Fixes: 1ee1d8ab46 ("i965/vec4: Reswizzle sources when necessary.")
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:10 -07:00
Jason Ekstrand
bb67a99a2d intel/nir: Stop returning the shader from helpers
Now that NIR_TEST_* doesn't swap the shader out from under us, it's
sufficient to just modify the shader rather than having to return in
case we're testing serialization or cloning.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-05 20:07:28 +00:00
Danylo Piliaiev
04508f57d1 intel/compiler: Do not reswizzle dst if instruction writes to flag register
If we write to the flag register changing the swizzle would change
what channels are written to the flag register.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110201
Fixes: 4cd1a0be
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: <ian.d.romanick@intel.com>
2019-04-16 09:42:08 +00:00
Mark Janes
2393cc7f00 intel/common: move gen_debug to intel/dev
libintel_common depends on libintel_compiler, but it contains debug
functionality that is needed by libintel_compiler.  Break the circular
dependency by moving gen_debug files to libintel_dev.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-10 13:15:33 -07:00
Brian Paul
0de83bacf0 intel/compiler: silence unitialized variable warning in opt_vector_float()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-08 10:23:11 -07:00
Jason Ekstrand
e8f863e718 intel/compiler: Re-prefix non-logical surface opcodes with VEC4
The scalar back-end uses SHADER_OPCODE_SEND for all surface messages so
we no longer need the non-logical opcodes there.  Prefix them VEC4 so
it's clear that they're only used by the vec4 back-end.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand
aeaba24fcb intel/compiler: Drop unused surface opcodes
The unused typed surface read/write support in the vec4 back-end has
been dropped and the fs back-end now uses SHADER_OPCODE_SEND for all
image and buffer ops.  There's no reason to keep these opcodes around
anymore.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Matt Turner
7e4e9da90d intel/compiler: Prevent warnings in the following patch
The next patch replaces an unsigned bitfield with a plain unsigned,
which triggers gcc to begin warning on signed/unsigned comparisons.

Keeping this patch separate from the actual move allows bisectablity and
generates no additional warnings temporarily.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-01-09 16:42:41 -08:00
Ian Romanick
9a83c3d3b3 i965/fs: Eliminate unary op on operand of compare-with-zero
The (-abs(x) >= 0) => (x == 0) optimization is removed from the vec4 and
scalar parts. In the VS part, adding the new pattern was not
helpful. The pattern that is removed is really old, and it has been
handled by NIR for ages.

All Gen7+ platforms had similar results. (Broadwell shown)
total instructions in shared programs: 14715715 -> 14715709 (<.01%)
instructions in affected programs: 474 -> 468 (-1.27%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.12% max: 1.35% x̄: 1.28% x̃: 1.35%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.40% -1.15%
Instructions are helped.

total cycles in shared programs: 559569911 -> 559569809 (<.01%)
cycles in affected programs: 5963 -> 5861 (-1.71%)
helped: 6
HURT: 0
helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
helped stats (rel) min: 1.45% max: 1.88% x̄: 1.73% x̃: 1.85%
95% mean confidence interval for cycles value: -18.15 -15.85
95% mean confidence interval for cycles %-change: -1.95% -1.51%
Cycles are helped.

Iron Lake and Sandy Bridge had similar results. (Iron Lake shown)
total instructions in shared programs: 7780915 -> 7780913 (<.01%)
instructions in affected programs: 246 -> 244 (-0.81%)
helped: 2
HURT: 0

total cycles in shared programs: 177876108 -> 177876106 (<.01%)
cycles in affected programs: 3636 -> 3634 (-0.06%)
helped: 1
HURT: 0

GM45
total instructions in shared programs: 4799152 -> 4799151 (<.01%)
instructions in affected programs: 126 -> 125 (-0.79%)
helped: 1
HURT: 0

total cycles in shared programs: 122052654 -> 122052652 (<.01%)
cycles in affected programs: 3640 -> 3638 (-0.05%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Kenneth Graunke
562448b75a i965: Do NIR shader cloning in the caller.
This moves nir_shader_clone() to the driver-specific compile function,
rather than the shared src/intel/compiler code.  This allows i965 to do
key-specific passes before calling brw_compile_*.  Vulkan should not
need this cloning as it doesn't compile multiple variants.

We do need to continue cloning in the compute shader code because we
lower various things in NIR based on the SIMD width.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2018-11-20 15:53:46 -08:00
Jason Ekstrand
4ba445e011 intel: Don't propagate conditional modifiers if a UD source is negated
This fixes a bug uncovered by my NIR integer division by constant
optimization series.

Fixes: 19f9cb72c8 "i965/fs: Add pass to propagate conditional..."
Fixes: 627f94b72e "i965/vec4: adding vec4_cmod_propagation..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-10-10 13:13:12 -05:00
Dylan Baker
8396043f30 Replace uses of _mesa_bitcount with util_bitcount
and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem
in nir for platforms that don't have popcount or popcountll, such as
32bit msvc.

v2: - Fix additional uses of _mesa_bitcount added after this was
      originally written

Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-09-07 10:21:26 -07:00
Caio Marcelo de Oliveira Filho
7df5f62768 intel/compiler: silence -Wclass-memaccess warnings
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2018-07-18 08:29:51 -07:00
Ian Romanick
f8e54d02f7 intel/compiler: Relax mixed type restriction for saturating immediates
At the time of commit 7bc6e455e2 (i965: Add support for saturating
immediates.) we thought mixed type saturates would be impossible.  We
were only thinking about type converting moves from D to F, for
example.  However, type converting moves w/saturate from F to DF are
definitely possible.  This change minimally relaxes the restriction to
allow cases that I have been able trigger via piglit tests.

Fixes new piglit tests:
 - arb_gpu_shader_fp64/execution/built-in-functions/fs-sign-sat-neg-abs.shader_test
 - arb_gpu_shader_fp64/execution/built-in-functions/vs-sign-sat-neg-abs.shader_test

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-06 16:20:10 -07:00
Ian Romanick
fb6dc8e894 intel/compiler: Silence unused parameter warnings brw_nir.c
src/intel/compiler/brw_nir.c: In function ‘brw_nir_lower_vue_outputs’:
src/intel/compiler/brw_nir.c:464:32: warning: unused parameter ‘is_scalar’ [-Wunused-parameter]
                           bool is_scalar)
                                ^~~~~~~~~
src/intel/compiler/brw_nir.c: In function ‘lower_bit_size_callback’:
src/intel/compiler/brw_nir.c:610:57: warning: unused parameter ‘data’ [-Wunused-parameter]
 lower_bit_size_callback(const nir_alu_instr *alu, void *data)
                                                         ^~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-02 16:17:19 -07:00
Francisco Jerez
5b6e91dd35 intel/fs: Remove program key argument from generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Ian Romanick
22f9fbc0d9 i965/vec4: Optimize OR with 0 into a MOV
All of the affected shaders are geometry shaders... the same ones from
the similar fs changes.

The "No changes on any other platforms" comment below is not quite
right.  Without the previous change to register coalescing, this
optimization caused quite a few regressions in tests that either used
gl_ClipVertex or used different interpolation modes.  I observed that
with both patches applied,
glsl-1.10/execution/interpolation/interpolation-none-gl_BackSecondaryColor-smooth-vertex.shader_test
was one instruction shorter.  I suspect other shaders would be similarly
affected.  Since this is all based on NOS, shader-db does not reflect
it.

Haswell
total instructions in shared programs: 12954955 -> 12954918 (<.01%)
instructions in affected programs: 3603 -> 3566 (-1.03%)
helped: 37
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.21% max: 2.50% x̄: 1.99% x̃: 2.50%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.30% -1.69%
Instructions are helped.

total cycles in shared programs: 410012108 -> 410012098 (<.01%)
cycles in affected programs: 3540 -> 3530 (-0.28%)
helped: 5
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28%
95% mean confidence interval for cycles value: -2.00 -2.00
95% mean confidence interval for cycles %-change: -0.28% -0.28%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 11679387 -> 11679351 (<.01%)
instructions in affected programs: 3292 -> 3256 (-1.09%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.21% max: 2.50% x̄: 2.04% x̃: 2.50%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.34% -1.74%
Instructions are helped.

No changes on any other platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-15 17:22:27 -07:00
Ian Romanick
e6a9bd97b9 i965/vec4: Don't register coalesce into source of VS_OPCODE_UNPACK_FLAGS_SIMD4X2
This prevents regressions in a bunch of clipping and interpolation tests
caused by the next patch (i965/vec4: Optimize OR with 0 into a MOV).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-15 17:22:27 -07:00
Antia Puentes
3a1df14a7b intel: activate the gl_BaseVertex lowering
Surplus code related to the basevertex is removed.

The Vertex Elements contain now:
* VE 1: <firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <DrawID, is_indexed_draw, 0, 0>

Also fixes unreachable message.

Fixes OpenGL CTS tests:
* KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysInstancedParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters
* KHR-GL46.shader_draw_parameters_tests.MultiDrawArraysIndirectCountParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysIndirectParameters

Fixes Piglit tests:
* arb_shader_draw_parameters-drawid-indirect baseinstance
* arb_shader_draw_parameters-basevertex

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102678
2018-05-02 11:24:46 +02:00
Antia Puentes
0cbf29fa55 intel: emit is_indexed_draw in the same VE than gl_DrawID
The Vertex Elements are now:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <DrawID, is-indexed-draw, 0, 0>

VE1 is it kept as it was before, VE2 additionally contains the new
system value.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-02 11:23:34 +02:00
Antia Puentes
6ba9088d9c intel/compiler: Add uses_is_indexed_draw flag
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-02 11:20:48 +02:00
Antia Puentes
c32e1035cb intel: Handle firstvertex in an identical way to BaseVertex
Until we set gl_BaseVertex to zero for non-indexed draw calls
both have an identical value.

The Vertex Elements are kept like that:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <Draw ID, 0, 0, 0>

v2 (idr): Mark nir_intrinsic_load_first_vertex as "unreachable" in
emit_system_values_block and fs_visitor::nir_emit_vs_intrinsic.
2018-04-19 15:57:45 -07:00
Neil Roberts
0c8395e15d intel/compiler: Add a uses_firstvertex flag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-19 15:57:45 -07:00
Jason Ekstrand
2b977989f3 intel/vec4: Set channel_sizes for MOV_INDIRECT sources
Otherwise, any indirect push constant access results in an assertion
failure when we start digging through the channel_sizes array.  This
fixes dEQP-VK.pipeline.push_constant.graphics_pipeline.dynamic_index_vert
on Haswell.  It should be a harmless no-op for GL since indirect push
constants aren't used there.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: e69e5c7006 "i965/vec4: load dvec3/4 uniforms first in the..."
2018-03-30 17:20:27 -07:00
Ian Romanick
91225cb33f i965/vec4: Fix null destination register in 3-source instructions
A recent commit (see below) triggered some cases where conditional
modifier propagation and dead code elimination would cause a MAD
instruction like the following to be generated:

    mad.l.f0  null, ...

Matt pointed out that fs_visitor::fixup_3src_null_dest() fixes cases
like this in the scalar backend.  This commit basically ports that code
to the vec4 backend.

NOTE: I have sent a couple tests to the piglit list that reproduce this
bug *without* the commit mentioned below.  This commit fixes those
tests.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: ee63933a7 ("nir: Distribute binary operations with constants into bcsel")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105704
2018-03-26 08:50:44 -07:00
Ian Romanick
8f83eea71e i965: Add negative_equals methods
This method is similar to the existing ::equals methods.  Instead of
testing that two src_regs are equal to each other, it tests that one is
the negation of the other.

v2: Simplify various checks based on suggestions from Matt.  Use
src_reg::type instead of fixed_hw_reg.type in a check.  Also suggested
by Matt.

v3: Rebase on 3 years.  Fix some problems with negative_equals with VF
constants.  Add fs_reg::negative_equals.

v4: Replace the existing default case with BRW_REGISTER_TYPE_UB,
BRW_REGISTER_TYPE_B, and BRW_REGISTER_TYPE_NF.  Suggested by Matt.
Expand the FINISHME comment to better explain why it isn't already
finished.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-26 08:50:43 -07:00
Kenneth Graunke
70de61594d i965/fs: Add infrastructure for generating CSEL instructions.
v2 (idr): Don't allow CSEL with a non-float src2.

v3 (idr): Add CSEL to fs_inst::flags_written.  Suggested by Matt.

v4 (idr): Only set BRW_ALIGN_16 on Gen < 10 (suggested by Matt).  Don't
reset the access mode afterwards (suggested by Samuel and Matt).  Add
support for CSEL not modifying the flags to more places (requested by
Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-08 15:26:26 -08:00
Kenneth Graunke
9fa95359df intel: Drop program size pointer from vec4/fs assembly getters.
These days, we're just passing a pointer to a prog_data field, which
we already have access to.  We can just use it directly.

(In the past, it was a pointer to a separate value.)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-03-02 14:20:22 -08:00
Francisco Jerez
cc0fc8b8ac intel/ir: Allow representing additional flag subregisters in the IR.
This allows representing conditional mods and predicates on f1.0-f1.1
at the IR level by adding an extra bit to the flag_subreg
backend_instruction field.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-02 11:28:56 -08:00
Timothy Arceri
f63e05ae9e compiler: tidy up double_inputs_read uses
First we move double_inputs_read into a vs struct in the union,
double_inputs_read is only used for vs inputs so this will
save space and also allows us to add a new double_inputs field.

We add the new field because c2acf97fcc changed the behaviour
of double_inputs_read, and while it's no longer used to track
actual reads in i965 we do still want to track this for gallium
drivers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 09:08:47 +11:00
Kenneth Graunke
74e1d6e20c i965: Drop support for the legacy SNORM -> Float equation.
Older OpenGL defines two equations for converting from signed-normalized
to floating point data.  These are:

    f = (2c + 1)/(2^b - 1)                (equation 2.2)
    f = max{c/2^(b-1) - 1), -1.0}         (equation 2.3)

Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be
used in all scenarios, and remove equation 2.2.  DirectX uses equation
2.3 as well.  Intel hardware only supports equation 2.3, so Gen7.5+
systems that use the vertex fetcher hardware to do the conversions
always get formula 2.3.

This can make a big difference for 10-10-10-2 formats - the 2-bit value
can represent 0 with equation 2.3, and cannot with equation 2.2.

Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES.
Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use
the new rules, at least in core profile.  That would leave Gen4-6 doing
something different than all other hardware, which seems...lame.

With context version promotion, applications that requested a pre-4.2
context may get promoted to 4.2, and thus get the new rules.  Zero cases
have been reported of this being a problem.  However, we've received a
report that following the old rules breaks expectations.  SuperTuxKart
apparently renders the cars red when following equation 2.2, and works
correctly when following equation 2.3:

https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405

So, this patch deletes the legacy equation 2.2 support entirely, making
all hardware and APIs consistently use the new equation 2.3 rules.

If we ever find an application that truly requires the old formula, then
we'd likely want that application to work on modern hardware, too.  We'd
likely restore this support as a driconf option.  Until then, drop it.

This commit will regress Piglit's draw-vertices-2101010 test on
pre-Haswell without the corresponding Piglit patch to accept either
formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1):

    draw-vertices-2101010: Accept either SNORM conversion formula.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2018-01-02 16:51:42 -08:00
Kenneth Graunke
a1afef8de0 i965: Combine {VS,FS}_OPCODE_GET_BUFFER_SIZE opcodes.
These are the same, we don't need a separate opcode enum per backend.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-30 20:30:34 -08:00
Iago Toral Quiroga
f1873956db i965/vec4: fix splitting of interleaved attributes
When we split an instruction that reads an uniform value
(vstride 0) we need to respect the vstride on the second
half of the instruction (that is, the second half should
read the same region as the first).

We were doing this already, but we didn't account for
stages that have interleaved input attributes which also
have a vstride of 0 and need the same treatment.

Fixes the following on Haswell:
KHR-GL45.enhanced_layouts.varying_locations
KHR-GL45.enhanced_layouts.varying_array_locations
KHR-GL45.enhanced_layouts.varying_structure_locations

Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Andres Gomez <agomez@igalia.com>
2017-11-24 09:24:06 +01:00
Jordan Justen
3dcbc5cdaa intel/compiler: Remove final_program_size from brw_compile_*
The caller can now use brw_stage_prog_data::program_size which is set
by the brw_compile_* functions.

Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-10-31 23:36:54 -07:00
Carl Worth
540636045f intel/compiler: add new field for storing program size
This will be used by the on disk shader cache.

v2:
 * Set in brw_compile_* rather than brw_codegen_*. (Jason)

Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
[jordan.l.justen@intel.com: Only add to brw_stage_prog_data]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-10-31 23:36:54 -07:00
Eric Engestrom
ceaad79f85 i965: remove unused variable
Fixes: 2c873060d3 "i965: Delete unused
       brw_vs_prog_data::nr_attributes field."
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-10-30 16:32:05 +00:00
Kenneth Graunke
2c873060d3 i965: Delete unused brw_vs_prog_data::nr_attributes field.
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-10-27 02:53:38 -07:00
Jason Ekstrand
29737eac98 intel: Allocate prog_data::[pull_]param deeper inside the compiler
Now that we're always growing the param array as-needed, we can
allocate the param array in common code and stop repeating the
allocation everywere.  In order to keep things sane, we ralloc the
[pull_]param array off of the compile context and then steal it back
to a NULL context later.  This doesn't get us all the way to where
prog_data::[pull_]param is purely an out parameter of the back-end
compiler but it gets us a lot closer.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-10-12 22:39:31 -07:00
Jason Ekstrand
2975e4c56a intel: Rewrite the world of push/pull params
This moves us away to the array of pointers model and onto a model where
each param is represented by a generic uint32_t handle.  We reserve 2^16
of these handles for builtins that get generated by somewhere inside the
compiler and have well-defined meanings.  Generic params have handles
whose meanings are defined by the driver.

The primary downside to this new approach is that it moves a little bit
of the work that we would normally do at compile time to draw time.  On
my laptop this hurts OglBatch6 by no more than 1% and doesn't seem to
have any measurable affect on OglBatch7.  So, while this may come back
to bite us, it doesn't look too bad.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-10-12 22:39:29 -07:00
Matt Turner
f30902629c i965/vec4: Use 'class' src_reg, rather than 'struct' src_reg
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-08-21 14:45:44 -07:00
Matt Turner
6a2471b501 i965: Move brw_reg_type_letters() as well
And add "to_" to the name for consistency with the other functions in
this file.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner
8815b9677f i965: Reorder brw_reg_type enum values
These vaguely corresponded to the hardware encodings, but that is purely
historical at this point. Reorder them so we stop making things "almost
work" when mixing enums.

The ordering has been closen so that no enum value is the same as a
compatible hardware encoding.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Kenneth Graunke
4f586cd8f1 i965: Push UBO data, but don't use it just yet.
This patch starts uploading UBO data via 3DSTATE_CONSTANT_* packets,
and updates the compiler to know that there's extra payload data, so
things continue working.  However, it still issues pull loads for all
data.  I wanted to separate the two aspects for greater bisectability.

v2: Update for new intel_bufferobj_buffer parameter.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-13 20:18:30 -07:00
Lionel Landwerlin
030abc6109 intel: compiler/i965: fix is_broxton checks
In 5f2fe9302c is_geminilake was introduced for the differenciate
broxton from geminilake. Unfortunately I failed as verifying that
is_broxton is throughout the code base to mean Gen9lp.

Fixes: 5f2fe9302c ("intel: common: add flag to identify platforms by name")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-20 23:26:42 +01:00