Commit graph

2290 commits

Author SHA1 Message Date
Marek Olšák
ad40715f35 nir/serialize: support any num_components for remaining instructions
Only NPOT vectors greater than vec4 use the extra uint32.

This is for instructions that share the dest code.
load_const and undef already support 1-16 in the header.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
c028449c01 nir/serialize: use 3 unused bits in intrinsic for packed_const_indices
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
3d44aed09e nir/serialize: don't serialize redundant nir_intrinsic_instr::num_components
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
a2df670b14 nir/serialize: serialize writemask for vec8 and vec16
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
a5c5388234 nir/serialize: serialize swizzles for vec8 and vec16
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
f1a48d54ea nir/serialize: reuse the writemask field for 2 src X swizzles of SSA ALU
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
487a495cc0 nir/serialize: remove up to 3 consecutive equal ALU instruction headers
vec4 scalarized ALUs typically have 4 equal instruction headers, so remove
the last 3.

There are no bits left in the ALU header for more flags, so future
extensions of NIR will have to use something like instr_type == 15
to describe more complex ALU instructions.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
c3fa9de2a9 nir/serialize: try to pack both deref array src into 32 bits
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
ed6b01d5e0 nir/serialize: cleanup - fold nir_deref_type_var cases into switches
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
a0cd67d292 nir/serialize: try to put deref->var index into the unused bits of the header
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
ca201bfe70 nir/serialize: don't serialize mode for deref non-cast instructions
It can be derived from src and var. This frees 10 bits in the header
that will be used later.

"mode" is moved in the structure, because those bits will be used for
something else later.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
2286340fde nir/serialize: don't store deref types if not needed
- type_cast: deduplicate types if the last one is the same
- derive the type from the parent for other derefs

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
70a7f85149 nir/serialize: try to pack two alu srcs into 1 uint32
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
ef4630cf4f nir/serialize: pack nir_intrinsic_instr::const_index[] better
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
d3346b275a nir/serialize: pack 1-component constants into 20 bits if possible
The majority of constants can be packed like this.

v2: - use enum for the packing encoding,
    - trim packed_value to 20 bits add 1 bit to last_component,
      which simplifies a later commit

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
75f7c38863 nir/serialize: pack load_const with non-64-bit constants better
v2: use blob_write_uint8/16

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
a572ba673b nir/serialize: try to store a diff in var data locations instead of var data
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
c8314678ee nir/serialize: deduplicate serialized var types by reusing the last unique one
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
545415f45f nir/serialize: don't serialize var->data for temporaries
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
c358c2b2bf nir/serialize: pack src better and limit the object count to 1M from 1G
We need to limit the object count to 1M to free 10 bits for the src
modifiers.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
35655865cb nir/serialize: pack instructions better
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Ian Romanick
ca353285cb nir/range_analysis: Make sure the table validation only occurs once
All of the tables are static const, so they only need to be validated
once.  As noted in the previous commit, the compiler should be able to
eliminate all of this code when the assertions would pass.  Even with
the help of the previous commit, this does not always occur.

-Og: -95.688 +/- 3.91935 (-24.9562% +/- 1.0222%) N=5
-O1: No difference proven at 95.0% confidence. N=5
-O2: -1.962 +/- 0.85001 (-0.860013% +/- 0.372589%) N=5

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-22 08:16:06 -08:00
Ian Romanick
ccefce46cb nir/range-analysis: Add pragmas to help loop unrolling
I was pretty liberal with these assertions when I wrote this code
because I had assumed that GCC would unroll the loops, inline the look ups
of static const arrays with now constant indices, and then elmininate
all the actuall assertions.  It seems none of this happens even at -O3.

Adding the pragmas helps encourage loop unrolling at some optimization
levels.  I tested by running shader-db with NIR_VALIDATE=false on a Core
i7 Haswell desktop system.

-Og: No difference proven at 95.0% confidence. N=5
-O1: -48.304 +/- 1.221 (-16.3343% +/- 0.412888%) N=5
-O2: -49.94 +/- 1.23521 (-17.9634% +/- 0.444303%) N=5

v2: Add a _Pragma to an inner loop that was accidentally dropped during
a rebase.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-22 08:16:06 -08:00
Alyssa Rosenzweig
deaebc82a7 nir: Add load_sampler_lod_paramaters_pan intrinsic
This loads in the <min_lod, max_lod, lod_bias> settings for a given
sampler, which is necessary for lowering clamps/biases on certain
Midgard chips.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 05:07:19 +00:00
Marek Olšák
0b1452ffdd nir/serialize: do ctx = {0} instead of manual initializations
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-21 18:49:57 -05:00
Marek Olšák
ff71fae440 nir: strip as we serialize to remove the nir_shader_clone call
Serializing stripped NIR is faster now.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-21 18:49:57 -05:00
Dave Airlie
cce07ea835 nir: fix deref offset builder
Use the correct bit size

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-22 04:37:41 +10:00
Dave Airlie
7325f6ac98 vtn/opencl: add clz support
This is needed for OpenCL

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-22 04:37:41 +10:00
Dave Airlie
d0d96053e6 nir: add 64-bit ufind_msb lowering support. (v2)
This adds the option to lower 64-bit ufind_msb opcodes.

v2: use split_x/y removes component loops (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-22 04:37:37 +10:00
Dave Airlie
12913bcf86 spirv/nir/opencl: handle some multiply instructions.
This adds support for some missing 24-bit and hi multiply
variants.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-22 04:37:25 +10:00
Karol Herbst
5934a53bfe nir/validate: validate num_components on registers and intrinsics
also make 8 and 16 compoments invalid. We will enable that later again
when we actually support it.

v2: fix validation of nir_intrinsic_instr::num_components
    correct validation of instr->num_components

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-21 01:10:24 +01:00
Rhys Perry
ca2de7ae9c nir/large_constants: use nir_index_vars and nir_variable::index
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-20 15:05:42 +00:00
Rhys Perry
9f92e8b721 nir: add nir_variable::index and nir_index_vars
This will be useful as a deterministic identifier/index for the variable.

v2: fix comment style

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v1)
2019-11-20 15:05:42 +00:00
Rhys Perry
45a0b53490 nir: make nir_variable::{num_members,num_state_slots} a uint16_t
Doesn't shrink it (at least, on x86-64) and leaves space for more members.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-20 15:05:42 +00:00
Neil Roberts
f6b5abe91a nir/lower_alu_to_scalar: Support lowering 8- and 16-bit reduce ops
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-20 14:09:43 +01:00
Neil Roberts
634eb9c04b nir: Add a 8-bit bool type
Adds nir_type_bool8 as well as 8-bit versions of all the bool
opcodes.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-20 14:09:43 +01:00
Neil Roberts
0f5640c577 nir: Add a 16-bit bool type
Adds nir_type_bool16 as well as 16-bit versions of all the bool
opcodes.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-20 14:09:43 +01:00
Neil Roberts
2ec97e78a9 nir/opcodes: Add a helper function to generate reduce opcodes
Adds binop_reduce_all_sizes which generates both 1-bit and 32-bit
versions of the reduce operation. This reduces the code duplication a
bit and will make it easier to later add 16-bit versions as well.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-20 14:09:43 +01:00
Neil Roberts
9a96afb97e nir/opcodes: Add a helper function to generate the comparison binops
Adds binop_compare_all_sizes which generates both 1-bit and 32-bit
versions of the comparison operation. This reduces the code
duplication a bit and will make it easier to later add 16-bit versions
as well.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-20 14:09:43 +01:00
Marek Olšák
654efd38bb nir: don't use GLenum16 in nir.h
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-19 18:20:12 -05:00
Marek Olšák
ec7d37c9c0 nir: move data.descriptor_set above data.index for better packing
4 bytes down

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-19 18:20:10 -05:00
Marek Olšák
193e2c9625 nir/print: only print image.format for image variables
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-19 18:20:07 -05:00
Marek Olšák
ebe7579655 nir: move data.image.access to data.access
The size of the data structure doesn't change.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-19 18:20:05 -05:00
Dave Airlie
1468a4f1f3 nir/serialize: fix serializing functions with no implementations.
Store a flag stating if there was an implmentation, and use
fxn->impl as a temporary flag between deserializsation stages.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-19 09:30:32 +10:00
Dave Airlie
0fd6b8aa98 nir/serialize: pack function has name and entry point into flags.
Suggested by Jason.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-19 09:30:12 +10:00
Jason Ekstrand
7260df5894 nir: Validate that variables are in the right lists
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-18 16:15:30 -06:00
Connor Abbott
f9fd04aca1 nir: Fix non-determinism in lower_global_vars_to_local
Using a hash-table walk means that variables will get inserted in
different orders on different runs. Just walk the list of globals
instead, even if some of them can't be turned into locals.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-14 13:10:58 +00:00
Brian Paul
bd49dedae0 nir: fix a couple signed/unsigned comparison warnings in nir_builder.h
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-12 11:44:02 -07:00
Rhys Perry
c877f4d320 nir/divergence: improve DA of shuffle
If the data is uniform, then it's really a uniform copy. If the index is
uniform, then it's really a read_invocation.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-12 17:21:38 +00:00
Erik Faye-Lund
9b8964d064 nir: patch up deref-vars when lowering clip-planes
Otherwise, we fail validation and potentially generate invalid code.
Let's fix up the mode of the accesses to the variable.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-11-12 09:13:22 +01:00