To have uniform behavior while disassembling send(c) instruction use
register type of unsigned doubleword for src1 when message descriptor is
immediate value. Bspec does not specifiy anything for src1 immediate
default type.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
This is valid NIR but you can't actually hit this case today. GLSL IR
doesn't have a bool to double opcode; it does f2d(b2f(x)). In SPIR-V we
don't have any to/from bool conversion opcodes at all. However, the
next commit will make us start generating it so we should be ready.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Several of the Atom GPUs have additional restrictions on alignment when
moving < 64-bit source to a 64-bit destination. All of the nir_op_*2*64
code generation paths respected this, but nir_op_b2[fi] did not.
Previous to commit a68dd47b91 it was not possible to generate such an
instruction from the GLSL path. It may have been possible from SPIR-V,
but it's not clear. The aforementioned patch converts a 64-bit
nir_op_fsign into a sequence of operations including a nir_op_b2f with a
64-bit result. This "just works" everywhere except these Atom parts.
This problem was not detected during normal CI testing because the Atom
parts are not included in developer builds.
v2 (idr): Make the patch compile, and make some cosmetic changes. Add a
commit message.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108319
Fixes: a68dd47b91 "nir/algebraic: Simplify fsat of fsign"
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Whiskey Lake uses the same gen graphics as Coffe Lake, including some
ids that were previously marked as reserved on Coffe Lake, but that
now are moved to WHL page.
This follows the ids and approach used on kernel's commit
b9be78531d27 ("drm/i915/whl: Introducing Whiskey Lake platform")
and commit c1c8f6fa731b ("drm/i915: Redefine some Whiskey Lake SKUs")
v2: Lionel noticed that GT{1,2,3} on kernel wasn't following
spec when looking to number of EUs, so kernel has been updated.
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This fixes a bug uncovered by my NIR integer division by constant
optimization series.
Fixes: 19f9cb72c8 "i965/fs: Add pass to propagate conditional..."
Fixes: 627f94b72e "i965/vec4: adding vec4_cmod_propagation..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
No shader-db or CI changes on any Intel platform.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
This fixes a bunch of Vulkan subgroup tests on little core platforms.
Fixes: 4150920b95 "intel/fs: Add a helper for emitting scan operations"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
No shader-db changes on any Intel platform... which probably explains
why no bugs have been bisected to this problem since it landed in Mesa
18.1. :( The commit mentioned below is in 18.2, so 18.1 would need a
slightly different fix (due to code refactoring).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 77f269bb56 "i965/fs: Refactor propagation of conditional modifiers from compares to adds"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (reviewed the original patch)
Cc: Matt Turner <mattst88@gmail.com> (reviewed the original patch)
and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem
in nir for platforms that don't have popcount or popcountll, such as
32bit msvc.
v2: - Fix additional uses of _mesa_bitcount added after this was
originally written
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It was very inconsistently handled; the only things that made use of it
were glsl_to_nir, glspirv, and nir_gather_info. In particular,
nir_lower_io completely ignored it so anyone using nir_lower_io on
64-bit vertex attributes was going to be in for a shock. Also, as of
the previous commit, it's set by every driver that supports 64-bit
vertex attributes. There's no longer any reason to have it be an option
so let's just delete it.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Unused since 09f1de97a7 "anv,i965: Lower away image derefs in
the driver".
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This is the second patch needed to fix the following piglit tests:
tests/spec/arb_gl_spirv/linker/uniform/multisampler.shader_test
tests/spec/arb_gl_spirv/linker/uniform/multisampler-array.shader_test
Although in this case it doesn't affect so many borrowed tests, as
there aren't too many tests using multisamplers on Intel.
It is worth to note that this patch is also needed when those tests
are run on GLSL mode (using the --glsl option). Although most Intel
drivers would not be able to run/execute tests using multisamplers, as
GL_MAX_IMAGE_SAMPLES is zero, technically those tests are expected to
link correctly, so linking tests should pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
To brw_nir_lower_gl_images, as it will be also used on the
ARB_gl_spirv codepath, that doesn't involves GLSL at all. So the
lowering is about images following the OpenGL semantics. In any case
"brw_nir_lower_opengl_images" seemed too long to me, so I just used
gl. That shortening is already used on other parts of the code.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
As of 07a2098a70, brw_nir_optimize calls nir_remove_dead_variables as
the last optimization. Doing it again is just pointless.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
We're hitting an assert in gfxbench because one of the local variable
is a sampler (according to Jason this isn't valid) :
testfw_app: ../src/compiler/nir_types.cpp:551: void glsl_get_natural_size_align_bytes(const glsl_type*, unsigned int*, unsigned int*): Assertion `!"type does not have a natural size"' failed.
Since this particular variable isn't used, it can be eliminated by
removing unused local variables at the end of the optimization loop.
This makes sense also for valid local variables.
v2: Move additional local variable removal out of optimization loop,
but before large constant removal (Jason/Lionel)
v3: Move the removal at the end of brw_nir_optimize()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107806
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Page 190 of "Volume 7: 3D Media GPGPU Engine (Haswell)" says the valid
range of the offset is [0, 0FFFFFFFh].
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Fixes failure in the new piglit test
tes-patch-input-array-vec2-index-invalid-rd.shader_test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Amber Lake uses the same gen graphics as Kaby Lake, including a id
that were previously marked as reserved on Kaby Lake, but that
now is moved to AML page.
This follows the ids and approach used on kernel's commit
e364672477a1 ("drm/i915/aml: Introducing Amber Lake platform")
Reported-by: Timo Aaltonen <timo.aaltonen@canonical.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This fixes the GL_ARB_fragment_shader_interlock piglit test on gen8
platforms where the lack of metadata dirtying was causing another pass
to accidentally delete a much needed loop.
https://bugs.freedesktop.org/show_bug.cgi?id=107745
Fixes: 37f7983bcc "intel/compiler: Do image load/store lowering..."
Jason Ekstrand <jason@jlekstrand.net> writes:
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Now that the drivers are lowering to surface indices themselves, we no
longer need to push the surface index into the shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, the back-end compiler turn image access into magic uniform
reads and there was a complex contract between back-end compiler and
driver about setting up and filling out those params. As of this
commit, both drivers now lower image_deref_load_param_intel intrinsics
to load_uniform intrinsics controlled by the driver and lower the other
image_deref_* intrinsics to image_* intrinsics which take an actual
binding table index. There are still "magic" uniforms but they are now
added and controlled entirely by the driver and that contract no longer
spans components.
This also has the side-effect of making most image use compile-time
binding table indices. Previously, all image access pulled the binding
table index from a uniform. Part of the reason for this was that the
magic uniforms made it difficult to decouple binding table indices from
the uniforms and, since they are indexed completely differently
(especially in Vulkan), it was hard to pull them apart. Now that the
driver is handling both, it's trivial to decouple the two and provide
actual binding table indices.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166872 -> 15164293 (-0.02%)
instructions in affected programs: 115834 -> 113255 (-2.23%)
helped: 191
HURT: 0
total cycles in shared programs: 571311495 -> 571196465 (-0.02%)
cycles in affected programs: 4757115 -> 4642085 (-2.42%)
helped: 73
HURT: 67
total spills in shared programs: 10951 -> 10926 (-0.23%)
spills in affected programs: 742 -> 717 (-3.37%)
helped: 7
HURT: 0
total fills in shared programs: 22226 -> 22201 (-0.11%)
fills in affected programs: 1146 -> 1121 (-2.18%)
helped: 7
HURT: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit expands the current memory access enum to contain the extra
two bits provided for images. We choose to follow the SPIR-V convention
of NonReadable and NonWriteable because readonly implies that you *can*
read so readonly + writeonly doesn't make as much sense as NonReadable +
NonWriteable.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Having the array length component stored in .z was a small convenience
for the ISL image param filling code and an annoyance in the NIR
lowering code. The only convenience of treating 1D arrays like 2D
arrays in the lowering code is in the address calculation code so let's
put all the complexity there as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit moves our storage image format conversion codegen into NIR
instead of doing it in the back-end. This has the advantage of letting
us run it through NIR's optimizer which is pretty effective at shrinking
things down. In the common case of rgba8, the number of instructions
emitted after NIR is done with it is half of what it was with the
lowering happening in the back-end. On the downside, the back-end's
lowering is able to directly use predicates and the NIR lowering has to
use IFs.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166910 -> 15166872 (<.01%)
instructions in affected programs: 5895 -> 5857 (-0.64%)
helped: 15
HURT: 0
Clearly, we don't have that much image_load_store happening in the
shaders in shader-db....
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No shader-db changes on any Intel platform.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This greatly simplifies the next patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
All of the other brw_*_desc functions take a devinfo parameter, and all
of the others at least have an assert that uses it. Keep the parameter,
but mark it as unused.
Silences 37 warnings like:
In file included from src/intel/common/gen_disasm.c:27:0:
src/intel/compiler/brw_eu.h: In function ‘brw_pixel_interp_desc’:
src/intel/compiler/brw_eu.h:377:53: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_pixel_interp_desc(const struct gen_device_info *devinfo,
^~~~~~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Adds suppport for INTEL_fragment_shader_ordering. We achieve
the fragment ordering by using the same instruction as for
beginInvocationInterlockARB() which is by issuing a memory
fence via sendc.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
INTEL_DEBUG=hex prints 32 bit hex value and due to endianness of CPU
byte order is reversed. In order to disassemble binary files, print
each byte instead of 32 bit hex value.
v2: Print blank spaces in order to vertically align output of compacted
instructions hex value with uncompacted instructions hex value.
(Matt Turner)
v3: Fix line wrap at correct length
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We have to be a bit careful with this one because we want it to run in
the optimization loop but only in the first brw_nir_optimize call.
Later calls assume that we've lowered away copy_deref instructions and
we don't want to introduce any more.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15176942 -> 15176942 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
In spite of the lack of any shader-db improvement, this patch completely
eliminates spilling in the Batman: Arkham City tessellation shaders.
This is because we are now able to detect that the temporary array
created by DXVK for storing TCS inputs is a copy of the input arrays and
use indirect URB reads instead of making a copy of 4.5 KiB of input data
and then indirecting on it with if-ladders.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Shader-db results on Kaby Lake:
total instructions in shared programs: 15177605 -> 15176765 (<.01%)
instructions in affected programs: 4259 -> 3419 (-19.72%)
helped: 1
HURT: 0
total spills in shared programs: 10954 -> 10855 (-0.90%)
spills in affected programs: 295 -> 196 (-33.56%)
helped: 1
HURT: 0
total fills in shared programs: 22222 -> 22117 (-0.47%)
fills in affected programs: 417 -> 312 (-25.18%)
helped: 1
HURT: 0
The helped shader is from the OglCSDof synmark test. On my Kaby Lake
laptop, the actual framerate of the benchmark didn't appear to improve
beyond the noise.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We call structure splitting once because it is guaranteed to split all
the structures in the entire shader in one go. We call array splitting
in the loop in case future optimizations turn indirects into direct
dereferences and we can split more arrays.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15177605 -> 15177605 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
This is unsurprising because nir_lower_vars_to_ssa already effectively
does structure and array splitting internally. It doesn't actually
split the variables but it's ability to reason about aliasing in the
presence of arrays and structures and pick out scalars or vectors to be
lowered to SSA values is fairly advanced.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v2: Split changes to the message type field to another patch. Suggested
by Caio.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This is necessary for a new Gen9 message type that will be added in the
next patch. There are also Gen8 message types that need the extra bit
(mostly for bindless).
v2: Split off from the next patch. Suggested by Caio.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Commit 4434591bf5 caused substantially more URB messages in
geometry and tessellation shaders. Before we can really enable this
sort of optimization, We either need some way of combining them back
together into vectors or we need to do cross-stage vector element
elimination without splitting everything into scalars.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
Fixes: 4434591bf5 "intel/nir: Call nir_lower_io_to_scalar_early"
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Now that all the build scripts are compatible with both Python 2 and 3,
we can flip the switch and tell Meson to use the latter.
Since Meson already depends on Python 3 anyway, this means we don't need
two different Python stacks to build Mesa.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
When the SIMD16 Gen4-5 fragment shader payload contains source depth
(g2-3), destination stencil (g4), and destination depth (g5-6), the
single register of stencil makes the destination depth unaligned.
We were generating this instruction in the RT write payload setup:
mov(16) m14<1>F g5<8,8,1>F { align1 compr };
which is illegal, instructions with a source region spanning more than
one register need to be aligned to even registers. This is because the
hardware implicitly does (nr | 1) instead of (nr + 1) when splitting the
compressed instruction into two mov(8)'s.
I believe this would cause the hardware to load g5 twice, replicating
subspan 0-1's destination depth to subspan 2-3. This showed up as 2x2
artifact blocks in both TIS-100 and Reicast.
Normally, we rely on the register allocator to even-align our virtual
GRFs. But we don't control the payload, so we need to lower SIMD widths
to make it work. To fix this, we teach lower_simd_width about the
restriction, and then call it again after lower_load_payload (which is
what generates the offending MOV).
Fixes: 8aee87fe4c (i965: Use SIMD16 instead of SIMD8 on Gen4 when possible.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107212
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=13728
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Diego Viola <diego.viola@gmail.com>
During code review, Jason pointed out that:
2b3064c073 "i965, anv: Use INTEL_DEBUG for disk_cache driver flags"
Didn't account for INTEL_SCALER_* environment variables.
To fix this, let the compiler return the disk_cache driver flags.
Another possible fix would be to pull the INTEL_SCALER_* into
INTEL_DEBUG bits, but as we are currently using 41 of 64 bits, I
didn't think it was a good use of 4 more of these bits. (5 since
INTEL_PRECISE_TRIG needs to be accounted for as well.)
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>