Commit graph

56 commits

Author SHA1 Message Date
Brian Paul
0de83bacf0 intel/compiler: silence unitialized variable warning in opt_vector_float()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-08 10:23:11 -07:00
Jason Ekstrand
e8f863e718 intel/compiler: Re-prefix non-logical surface opcodes with VEC4
The scalar back-end uses SHADER_OPCODE_SEND for all surface messages so
we no longer need the non-logical opcodes there.  Prefix them VEC4 so
it's clear that they're only used by the vec4 back-end.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand
aeaba24fcb intel/compiler: Drop unused surface opcodes
The unused typed surface read/write support in the vec4 back-end has
been dropped and the fs back-end now uses SHADER_OPCODE_SEND for all
image and buffer ops.  There's no reason to keep these opcodes around
anymore.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Matt Turner
7e4e9da90d intel/compiler: Prevent warnings in the following patch
The next patch replaces an unsigned bitfield with a plain unsigned,
which triggers gcc to begin warning on signed/unsigned comparisons.

Keeping this patch separate from the actual move allows bisectablity and
generates no additional warnings temporarily.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-01-09 16:42:41 -08:00
Ian Romanick
9a83c3d3b3 i965/fs: Eliminate unary op on operand of compare-with-zero
The (-abs(x) >= 0) => (x == 0) optimization is removed from the vec4 and
scalar parts. In the VS part, adding the new pattern was not
helpful. The pattern that is removed is really old, and it has been
handled by NIR for ages.

All Gen7+ platforms had similar results. (Broadwell shown)
total instructions in shared programs: 14715715 -> 14715709 (<.01%)
instructions in affected programs: 474 -> 468 (-1.27%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.12% max: 1.35% x̄: 1.28% x̃: 1.35%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.40% -1.15%
Instructions are helped.

total cycles in shared programs: 559569911 -> 559569809 (<.01%)
cycles in affected programs: 5963 -> 5861 (-1.71%)
helped: 6
HURT: 0
helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
helped stats (rel) min: 1.45% max: 1.88% x̄: 1.73% x̃: 1.85%
95% mean confidence interval for cycles value: -18.15 -15.85
95% mean confidence interval for cycles %-change: -1.95% -1.51%
Cycles are helped.

Iron Lake and Sandy Bridge had similar results. (Iron Lake shown)
total instructions in shared programs: 7780915 -> 7780913 (<.01%)
instructions in affected programs: 246 -> 244 (-0.81%)
helped: 2
HURT: 0

total cycles in shared programs: 177876108 -> 177876106 (<.01%)
cycles in affected programs: 3636 -> 3634 (-0.06%)
helped: 1
HURT: 0

GM45
total instructions in shared programs: 4799152 -> 4799151 (<.01%)
instructions in affected programs: 126 -> 125 (-0.79%)
helped: 1
HURT: 0

total cycles in shared programs: 122052654 -> 122052652 (<.01%)
cycles in affected programs: 3640 -> 3638 (-0.05%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Kenneth Graunke
562448b75a i965: Do NIR shader cloning in the caller.
This moves nir_shader_clone() to the driver-specific compile function,
rather than the shared src/intel/compiler code.  This allows i965 to do
key-specific passes before calling brw_compile_*.  Vulkan should not
need this cloning as it doesn't compile multiple variants.

We do need to continue cloning in the compute shader code because we
lower various things in NIR based on the SIMD width.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2018-11-20 15:53:46 -08:00
Jason Ekstrand
4ba445e011 intel: Don't propagate conditional modifiers if a UD source is negated
This fixes a bug uncovered by my NIR integer division by constant
optimization series.

Fixes: 19f9cb72c8 "i965/fs: Add pass to propagate conditional..."
Fixes: 627f94b72e "i965/vec4: adding vec4_cmod_propagation..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-10-10 13:13:12 -05:00
Dylan Baker
8396043f30 Replace uses of _mesa_bitcount with util_bitcount
and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem
in nir for platforms that don't have popcount or popcountll, such as
32bit msvc.

v2: - Fix additional uses of _mesa_bitcount added after this was
      originally written

Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-09-07 10:21:26 -07:00
Caio Marcelo de Oliveira Filho
7df5f62768 intel/compiler: silence -Wclass-memaccess warnings
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2018-07-18 08:29:51 -07:00
Ian Romanick
f8e54d02f7 intel/compiler: Relax mixed type restriction for saturating immediates
At the time of commit 7bc6e455e2 (i965: Add support for saturating
immediates.) we thought mixed type saturates would be impossible.  We
were only thinking about type converting moves from D to F, for
example.  However, type converting moves w/saturate from F to DF are
definitely possible.  This change minimally relaxes the restriction to
allow cases that I have been able trigger via piglit tests.

Fixes new piglit tests:
 - arb_gpu_shader_fp64/execution/built-in-functions/fs-sign-sat-neg-abs.shader_test
 - arb_gpu_shader_fp64/execution/built-in-functions/vs-sign-sat-neg-abs.shader_test

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-06 16:20:10 -07:00
Ian Romanick
fb6dc8e894 intel/compiler: Silence unused parameter warnings brw_nir.c
src/intel/compiler/brw_nir.c: In function ‘brw_nir_lower_vue_outputs’:
src/intel/compiler/brw_nir.c:464:32: warning: unused parameter ‘is_scalar’ [-Wunused-parameter]
                           bool is_scalar)
                                ^~~~~~~~~
src/intel/compiler/brw_nir.c: In function ‘lower_bit_size_callback’:
src/intel/compiler/brw_nir.c:610:57: warning: unused parameter ‘data’ [-Wunused-parameter]
 lower_bit_size_callback(const nir_alu_instr *alu, void *data)
                                                         ^~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-02 16:17:19 -07:00
Francisco Jerez
5b6e91dd35 intel/fs: Remove program key argument from generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Ian Romanick
22f9fbc0d9 i965/vec4: Optimize OR with 0 into a MOV
All of the affected shaders are geometry shaders... the same ones from
the similar fs changes.

The "No changes on any other platforms" comment below is not quite
right.  Without the previous change to register coalescing, this
optimization caused quite a few regressions in tests that either used
gl_ClipVertex or used different interpolation modes.  I observed that
with both patches applied,
glsl-1.10/execution/interpolation/interpolation-none-gl_BackSecondaryColor-smooth-vertex.shader_test
was one instruction shorter.  I suspect other shaders would be similarly
affected.  Since this is all based on NOS, shader-db does not reflect
it.

Haswell
total instructions in shared programs: 12954955 -> 12954918 (<.01%)
instructions in affected programs: 3603 -> 3566 (-1.03%)
helped: 37
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.21% max: 2.50% x̄: 1.99% x̃: 2.50%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.30% -1.69%
Instructions are helped.

total cycles in shared programs: 410012108 -> 410012098 (<.01%)
cycles in affected programs: 3540 -> 3530 (-0.28%)
helped: 5
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28%
95% mean confidence interval for cycles value: -2.00 -2.00
95% mean confidence interval for cycles %-change: -0.28% -0.28%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 11679387 -> 11679351 (<.01%)
instructions in affected programs: 3292 -> 3256 (-1.09%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.21% max: 2.50% x̄: 2.04% x̃: 2.50%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.34% -1.74%
Instructions are helped.

No changes on any other platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-15 17:22:27 -07:00
Ian Romanick
e6a9bd97b9 i965/vec4: Don't register coalesce into source of VS_OPCODE_UNPACK_FLAGS_SIMD4X2
This prevents regressions in a bunch of clipping and interpolation tests
caused by the next patch (i965/vec4: Optimize OR with 0 into a MOV).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-15 17:22:27 -07:00
Antia Puentes
3a1df14a7b intel: activate the gl_BaseVertex lowering
Surplus code related to the basevertex is removed.

The Vertex Elements contain now:
* VE 1: <firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <DrawID, is_indexed_draw, 0, 0>

Also fixes unreachable message.

Fixes OpenGL CTS tests:
* KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysInstancedParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters
* KHR-GL46.shader_draw_parameters_tests.MultiDrawArraysIndirectCountParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysParameters
* KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysIndirectParameters

Fixes Piglit tests:
* arb_shader_draw_parameters-drawid-indirect baseinstance
* arb_shader_draw_parameters-basevertex

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102678
2018-05-02 11:24:46 +02:00
Antia Puentes
0cbf29fa55 intel: emit is_indexed_draw in the same VE than gl_DrawID
The Vertex Elements are now:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <DrawID, is-indexed-draw, 0, 0>

VE1 is it kept as it was before, VE2 additionally contains the new
system value.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-02 11:23:34 +02:00
Antia Puentes
6ba9088d9c intel/compiler: Add uses_is_indexed_draw flag
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-02 11:20:48 +02:00
Antia Puentes
c32e1035cb intel: Handle firstvertex in an identical way to BaseVertex
Until we set gl_BaseVertex to zero for non-indexed draw calls
both have an identical value.

The Vertex Elements are kept like that:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <Draw ID, 0, 0, 0>

v2 (idr): Mark nir_intrinsic_load_first_vertex as "unreachable" in
emit_system_values_block and fs_visitor::nir_emit_vs_intrinsic.
2018-04-19 15:57:45 -07:00
Neil Roberts
0c8395e15d intel/compiler: Add a uses_firstvertex flag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-04-19 15:57:45 -07:00
Jason Ekstrand
2b977989f3 intel/vec4: Set channel_sizes for MOV_INDIRECT sources
Otherwise, any indirect push constant access results in an assertion
failure when we start digging through the channel_sizes array.  This
fixes dEQP-VK.pipeline.push_constant.graphics_pipeline.dynamic_index_vert
on Haswell.  It should be a harmless no-op for GL since indirect push
constants aren't used there.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: e69e5c7006 "i965/vec4: load dvec3/4 uniforms first in the..."
2018-03-30 17:20:27 -07:00
Ian Romanick
91225cb33f i965/vec4: Fix null destination register in 3-source instructions
A recent commit (see below) triggered some cases where conditional
modifier propagation and dead code elimination would cause a MAD
instruction like the following to be generated:

    mad.l.f0  null, ...

Matt pointed out that fs_visitor::fixup_3src_null_dest() fixes cases
like this in the scalar backend.  This commit basically ports that code
to the vec4 backend.

NOTE: I have sent a couple tests to the piglit list that reproduce this
bug *without* the commit mentioned below.  This commit fixes those
tests.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: ee63933a7 ("nir: Distribute binary operations with constants into bcsel")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105704
2018-03-26 08:50:44 -07:00
Ian Romanick
8f83eea71e i965: Add negative_equals methods
This method is similar to the existing ::equals methods.  Instead of
testing that two src_regs are equal to each other, it tests that one is
the negation of the other.

v2: Simplify various checks based on suggestions from Matt.  Use
src_reg::type instead of fixed_hw_reg.type in a check.  Also suggested
by Matt.

v3: Rebase on 3 years.  Fix some problems with negative_equals with VF
constants.  Add fs_reg::negative_equals.

v4: Replace the existing default case with BRW_REGISTER_TYPE_UB,
BRW_REGISTER_TYPE_B, and BRW_REGISTER_TYPE_NF.  Suggested by Matt.
Expand the FINISHME comment to better explain why it isn't already
finished.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-26 08:50:43 -07:00
Kenneth Graunke
70de61594d i965/fs: Add infrastructure for generating CSEL instructions.
v2 (idr): Don't allow CSEL with a non-float src2.

v3 (idr): Add CSEL to fs_inst::flags_written.  Suggested by Matt.

v4 (idr): Only set BRW_ALIGN_16 on Gen < 10 (suggested by Matt).  Don't
reset the access mode afterwards (suggested by Samuel and Matt).  Add
support for CSEL not modifying the flags to more places (requested by
Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-08 15:26:26 -08:00
Kenneth Graunke
9fa95359df intel: Drop program size pointer from vec4/fs assembly getters.
These days, we're just passing a pointer to a prog_data field, which
we already have access to.  We can just use it directly.

(In the past, it was a pointer to a separate value.)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-03-02 14:20:22 -08:00
Francisco Jerez
cc0fc8b8ac intel/ir: Allow representing additional flag subregisters in the IR.
This allows representing conditional mods and predicates on f1.0-f1.1
at the IR level by adding an extra bit to the flag_subreg
backend_instruction field.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-02 11:28:56 -08:00
Timothy Arceri
f63e05ae9e compiler: tidy up double_inputs_read uses
First we move double_inputs_read into a vs struct in the union,
double_inputs_read is only used for vs inputs so this will
save space and also allows us to add a new double_inputs field.

We add the new field because c2acf97fcc changed the behaviour
of double_inputs_read, and while it's no longer used to track
actual reads in i965 we do still want to track this for gallium
drivers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 09:08:47 +11:00
Kenneth Graunke
74e1d6e20c i965: Drop support for the legacy SNORM -> Float equation.
Older OpenGL defines two equations for converting from signed-normalized
to floating point data.  These are:

    f = (2c + 1)/(2^b - 1)                (equation 2.2)
    f = max{c/2^(b-1) - 1), -1.0}         (equation 2.3)

Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be
used in all scenarios, and remove equation 2.2.  DirectX uses equation
2.3 as well.  Intel hardware only supports equation 2.3, so Gen7.5+
systems that use the vertex fetcher hardware to do the conversions
always get formula 2.3.

This can make a big difference for 10-10-10-2 formats - the 2-bit value
can represent 0 with equation 2.3, and cannot with equation 2.2.

Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES.
Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use
the new rules, at least in core profile.  That would leave Gen4-6 doing
something different than all other hardware, which seems...lame.

With context version promotion, applications that requested a pre-4.2
context may get promoted to 4.2, and thus get the new rules.  Zero cases
have been reported of this being a problem.  However, we've received a
report that following the old rules breaks expectations.  SuperTuxKart
apparently renders the cars red when following equation 2.2, and works
correctly when following equation 2.3:

https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405

So, this patch deletes the legacy equation 2.2 support entirely, making
all hardware and APIs consistently use the new equation 2.3 rules.

If we ever find an application that truly requires the old formula, then
we'd likely want that application to work on modern hardware, too.  We'd
likely restore this support as a driconf option.  Until then, drop it.

This commit will regress Piglit's draw-vertices-2101010 test on
pre-Haswell without the corresponding Piglit patch to accept either
formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1):

    draw-vertices-2101010: Accept either SNORM conversion formula.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2018-01-02 16:51:42 -08:00
Kenneth Graunke
a1afef8de0 i965: Combine {VS,FS}_OPCODE_GET_BUFFER_SIZE opcodes.
These are the same, we don't need a separate opcode enum per backend.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-30 20:30:34 -08:00
Iago Toral Quiroga
f1873956db i965/vec4: fix splitting of interleaved attributes
When we split an instruction that reads an uniform value
(vstride 0) we need to respect the vstride on the second
half of the instruction (that is, the second half should
read the same region as the first).

We were doing this already, but we didn't account for
stages that have interleaved input attributes which also
have a vstride of 0 and need the same treatment.

Fixes the following on Haswell:
KHR-GL45.enhanced_layouts.varying_locations
KHR-GL45.enhanced_layouts.varying_array_locations
KHR-GL45.enhanced_layouts.varying_structure_locations

Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Andres Gomez <agomez@igalia.com>
2017-11-24 09:24:06 +01:00
Jordan Justen
3dcbc5cdaa intel/compiler: Remove final_program_size from brw_compile_*
The caller can now use brw_stage_prog_data::program_size which is set
by the brw_compile_* functions.

Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-10-31 23:36:54 -07:00
Carl Worth
540636045f intel/compiler: add new field for storing program size
This will be used by the on disk shader cache.

v2:
 * Set in brw_compile_* rather than brw_codegen_*. (Jason)

Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
[jordan.l.justen@intel.com: Only add to brw_stage_prog_data]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-10-31 23:36:54 -07:00
Eric Engestrom
ceaad79f85 i965: remove unused variable
Fixes: 2c873060d3 "i965: Delete unused
       brw_vs_prog_data::nr_attributes field."
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-10-30 16:32:05 +00:00
Kenneth Graunke
2c873060d3 i965: Delete unused brw_vs_prog_data::nr_attributes field.
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-10-27 02:53:38 -07:00
Jason Ekstrand
29737eac98 intel: Allocate prog_data::[pull_]param deeper inside the compiler
Now that we're always growing the param array as-needed, we can
allocate the param array in common code and stop repeating the
allocation everywere.  In order to keep things sane, we ralloc the
[pull_]param array off of the compile context and then steal it back
to a NULL context later.  This doesn't get us all the way to where
prog_data::[pull_]param is purely an out parameter of the back-end
compiler but it gets us a lot closer.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-10-12 22:39:31 -07:00
Jason Ekstrand
2975e4c56a intel: Rewrite the world of push/pull params
This moves us away to the array of pointers model and onto a model where
each param is represented by a generic uint32_t handle.  We reserve 2^16
of these handles for builtins that get generated by somewhere inside the
compiler and have well-defined meanings.  Generic params have handles
whose meanings are defined by the driver.

The primary downside to this new approach is that it moves a little bit
of the work that we would normally do at compile time to draw time.  On
my laptop this hurts OglBatch6 by no more than 1% and doesn't seem to
have any measurable affect on OglBatch7.  So, while this may come back
to bite us, it doesn't look too bad.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-10-12 22:39:29 -07:00
Matt Turner
f30902629c i965/vec4: Use 'class' src_reg, rather than 'struct' src_reg
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-08-21 14:45:44 -07:00
Matt Turner
6a2471b501 i965: Move brw_reg_type_letters() as well
And add "to_" to the name for consistency with the other functions in
this file.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Matt Turner
8815b9677f i965: Reorder brw_reg_type enum values
These vaguely corresponded to the hardware encodings, but that is purely
historical at this point. Reorder them so we stop making things "almost
work" when mixing enums.

The ordering has been closen so that no enum value is the same as a
compatible hardware encoding.

Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
2017-08-21 14:05:23 -07:00
Kenneth Graunke
4f586cd8f1 i965: Push UBO data, but don't use it just yet.
This patch starts uploading UBO data via 3DSTATE_CONSTANT_* packets,
and updates the compiler to know that there's extra payload data, so
things continue working.  However, it still issues pull loads for all
data.  I wanted to separate the two aspects for greater bisectability.

v2: Update for new intel_bufferobj_buffer parameter.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-13 20:18:30 -07:00
Lionel Landwerlin
030abc6109 intel: compiler/i965: fix is_broxton checks
In 5f2fe9302c is_geminilake was introduced for the differenciate
broxton from geminilake. Unfortunately I failed as verifying that
is_broxton is throughout the code base to mean Gen9lp.

Fixes: 5f2fe9302c ("intel: common: add flag to identify platforms by name")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-20 23:26:42 +01:00
Anuj Phogat
f9e31a26d4 i965/cnl: Make URB {VS, GS, HS, DS} sizes non multiple of 3
v1: By Ben Widawsky <benjamin.widawsky@intel.com>
v2: v1 had an assert only for VS. Add the restriction for GS, HS and
    DS as well and make sure the allocated sizes are not multiple of 3.
v3: Move the entry_size checks in to compiler code (Ken)

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-09 16:02:59 -07:00
Samuel Iglesias Gonsálvez
e69e5c7006 i965/vec4: load dvec3/4 uniforms first in the push constant buffer
Reorder the uniforms to load first the dvec4-aligned variables in the
push constant buffer and then push the vec4-aligned ones. It takes
into account that the relocated uniforms should be aligned to their
channel size.

This fixes a bug were the dvec3/4 might be loaded one part on a GRF and
the rest in next GRF, so the region parameters to read that could break
the HW rules.

v2:
- Fix broken logic.
- Add a comment to explain what should be needed to optimise the usage
  of the push constant buffer slots, as this patch does not pack the
  uniforms.

v3:
- Implemented the push constant buffer usage optimization.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2017-05-18 06:49:54 +02:00
Jason Ekstrand
2e9916ea04 i965/vec4: Use NIR to do GS input remapping
We're already doing this in the FS back-end.  This just does the same
thing in the vec4 back-end.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:07 -07:00
Jason Ekstrand
0d5f89cdc3 i965/vec4: Use NIR remapping for VS attributes
The NIR pass already handles remapping system values to attributes for
us so we delete the system value code as part of the conversion.

We also change nir_lower_vs_inputs to take an explicit inputs_read
bitmask and pass in the inputs_read from prog_data instead from pulling
it out of NIR.  This is because the version in prog_data may get
EDGEFLAG added to it on some old platforms.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:06 -07:00
Jason Ekstrand
80aa6e9d32 intel/compiler/vs: Move inputs_read handling to generic code
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:08:03 -07:00
Jason Ekstrand
d2fe804d18 i965/vec4: Set VERT_BIT_EDGEFLAG based on the VUE map
We also add a nice little comment to make it more clear exactly what
happens with the edge flag copy.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand
24e6fba500 i965/vs: Set uses_vertexid and friends from brw_compile_vs
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand
b86dba8a0e nir: Embed the shader_info in the nir_shader again
Commit e1af20f18a changed the shader_info
from being embedded into being just a pointer.  The idea was that
sharing the shader_info between NIR and GLSL would be easier if it were
a pointer pointing to the same shader_info struct.  This, however, has
caused a few problems:

 1) There are many things which generate NIR without GLSL.  This means
    we have to support both NIR shaders which come from GLSL and ones
    that don't and need to have an info elsewhere.

 2) The solution to (1) raises all sorts of ownership issues which have
    to be resolved with ralloc_parent checks.

 3) Ever since 00620782c9, we've been
    using nir_gather_info to fill out the final shader_info.  Thanks to
    cloning and the above ownership issues, the nir_shader::info may not
    point back to the gl_shader anymore and so we have to do a copy of
    the shader_info from NIR back to GLSL anyway.

All of these issues go away if we just embed the shader_info in the
nir_shader.  There's a little downside of having to copy it back after
calling nir_gather_info but, as explained above, we have to do that
anyway.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Samuel Iglesias Gonsálvez
aaeb1c99be i965/vec4: fix register width for DF VGRF and UNIFORM
On gen7, the swizzles used in DF align16 instructions works for element
size of 32 bits, so we can address only 2 consecutive DFs. As we assumed that
in the rest of the code and prepare the instructions for this (scalarize_df()),
we need to set it to two again.

However, for DF align1 instructions, a width of 2 is wrong as we are not
reading the data we want. For example, an uniform would have a region of
<0, 2, 1> so it would repeat the first 2 DFs, when we wanted to access
to the first 4.

This patch sets the default one to 4 and then modifies the width of
align16 instruction's DF sources when we translate the logical swizzle
to the physical one.

v2:
- Remove conditional (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-05-03 15:32:39 +02:00
Samuel Iglesias Gonsálvez
7f728bce81 i965/vec4: fix vertical stride to avoid breaking region parameter rule
From IVB PRM, vol4, part3, "General Restrictions on Regioning
Parameters":

  "If ExecSize = Width and HorzStride ≠ 0, VertStride must
   be set to Width * HorzStride."

In next patch, we are going to modify the region parameter for
uniforms and vgrf. For uniforms that are the source of
DF align1 instructions, they will have <0, 4, 1> regioning and
the execsize for those instructions will be 4, so they will break
the regioning rule. This will be the same for VGRF sources where
we use the vstride == 0 exploit.

As we know we are not going to cross the GRF boundary with that
execsize and parameters (not even with the exploit), we just fix
the vstride here.

v2:
- Move is_align1_df() (Curro)
- Refactor exec_size == width calculation (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-05-03 15:32:39 +02:00