Drop it from x11_anv_wsi_image_create and x11_anv_wsi_image_free. The
functions are used by Wayland WSI too.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This commit just exposes the memory handle type. There's interesting we
need to do here for images. So long as the user doesn't set any crazy
environment variables such as INTEL_DEBUG=nohiz, all of the compression
formats etc. should "just work" at least for opaque handle types.
v2 (chadv):
- Rebase.
- Fix vkGetPhysicalDeviceImageFormatProperties2KHR when
handleType == 0.
- Move handleType-independency comments out of handleType-switch, in
vkGetPhysicalDeviceExternalBufferPropertiesKHX. Reduces diff in
future dma_buf patches.
Co-authored-with: Chad Versace <chadversary@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This cache allows us to easily ensure that we have a unique anv_bo for
each gem handle. We'll need this in order to support multiple-import of
memory objects and semaphores.
v2 (Jason Ekstrand):
- Reject BO imports if the size doesn't match the prime fd size as
reported by lseek().
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is the trivial implementation that just exposes the extension
string but exposes zero external handle types.
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is a complete but trivial implementation. It's trivial becasue We
support no external memory capabilities yet. Most of the real work in
this commit is in reworking the UUIDs advertised by the driver.
v2 (chadv):
- Fix chain traversal in vkGetPhysicalDeviceImageFormatProperties2KHR.
Extract VkPhysicalDeviceExternalImageFormatInfoKHX from the chain of
input structs, not the chain of output structs.
- In vkGetPhysicalDeviceImageFormatProperties2KHR, iterate over the
input chain and the output chain separately. Reduces diff in future
dma_buf patches.
Co-authored-with: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We're about to have more UUIDs for different things so this one really
needs to be properly labeled.
Reviewed-by: Chad Versace <chadversary@chromium.org>
The command is really operating on a Queue not a command buffer and the
nearest object to that with an allocator is VkDevice.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: "17.0 17.1" <mesa-dev@lists.freedesktop.org>
This fixes rendering corruptions in DOOM. Hopefully, it will also make
Jenkins a bit more stable as we've been seeing some random failures and
GPU hangs ever since turning on 48bit.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100620
Fixes: 651ec926fc "anv: Add support for 48-bit addresses"
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Just return earlier in that case. Also set prefix to an empty string, so
we don't get to use it undefined.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to emit BLEND_STATE, which size is 1 + 2 * nr_draw_buffers
dwords (on gen8+), but the BLEND_STATE struct length is always 17. By
marking it size 1, which is actually the size of the struct minus the
BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of
entries.
For gen6 and gen7 we set length to 0, since it only contains
BLEND_STATE_ENTRY's, and no other data.
With this change, we also change the code for blorp and anv to emit only
the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on
gen6-7 and 17 dwords on gen8+.
v2:
- Use designated initializers on blorp and remove 0 from
initialization (Jason)
- Default entries to disabled on Vulkan (Jason)
- Rebase code.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the 'dwords' dict is empty, max(dwords.keys()) throws an exception.
This case could happen when we have an instruction that is only an array
of other structs, with variable length.
v2:
- Add another clause for empty dwords and make it work with python 3
(Dylan)
- Set the length to 0 if dwords is empty, and do not declare dw
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
'start' parameter from Group.emit_pack_function() is useless.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before this commit, when a group with count="0" is found, only one field
is added to the struct representing the instruction. This causes only
one entry to be printed by aubinator, for variable length groups.
With this commit we "detect" that there's a variable length group
(count="0") and store the offset of the last entry added to the struct
when reading the xml. When finally reading the aubdump file, we check
the size of the group and whether we have variable number of elements,
and in that case, reuse the last field to add the remaining elements.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The section of the PRM mentioned in the code comment above this table
says that this format supports the render target write message. Internal
documentation says that this format also supports alpha blending. As a
side effect, this allows CCS_D buffers to be created for images with
this format.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Until now the spilling cost calculation was neglecting the amount of
data read from the register during the spilling cost calculation.
This caused it to make suboptimal decisions in some cases leading to
higher memory bandwidth usage than necessary.
Improves Unigine Heaven performance by ~4% on BDW, reversing an
unintended FPS regression from my previous commit
147e71242c with n=12 and statistical
significance 5%. In addition SynMark2 OglCSDof performance is
improved by an additional ~5% on SKL, and a Kerbal Space Program
apitrace around the Moho planet I can provide on request improves by
~20%.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is what we use later on to compute the number of registers that
will actually get spilled to memory, so it's more likely to match
reality than the current open-coded approximation.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Curro pointed out that I should not just check for MACH, but use
the reads_accumulator_implicitly() helper, which would also prevent
the same bug with MAC and SADA2 (if we ever decide to use them).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This shuffles constants down in the reverse of what the previous
patch does and applies some simpilifications that may be made
possible from doing so.
Shader-db results BDW:
total instructions in shared programs: 12980814 -> 12977822 (-0.02%)
instructions in affected programs: 281889 -> 278897 (-1.06%)
helped: 1231
HURT: 128
total cycles in shared programs: 246562852 -> 246567288 (0.00%)
cycles in affected programs: 11271524 -> 11275960 (0.04%)
helped: 1630
HURT: 1378
V2: mark float opts as inexact
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
opt_register_coalesce() was optimizing sequences such as:
mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D
mov(8) m4.zw:F, vgrf5.xxxy:F
into:
mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D
This doesn't work - if we're going to reswizzle MACH, we'd need to
reswizzle the MUL as well. Here, the MUL fills the accumulator's .zw
components with attr18.yy * attr19.yy. But the MACH instruction expects
.z to contain attr18.x * attr19.x. Bogus results ensue.
No change in shader-db on Haswell. Prevents regressions in Timothy's
patches to use enhanced layouts for varying packing (which rearrange
code just enough to trigger this pre-existing bug, but were fine
themselves).
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This breaks the guts of MI_MATH (the instruction part) out into its own
structure with proper named values.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed by: Iago Toral Quiroga <itoral@igalia.com>
The description under RENDER_SURFACE_STATE::RedClearColor says,
For Sampling Engine Multisampled Surfaces and Render Targets:
Specifies the clear value for the red channel.
For Other Surfaces:
This field is ignored.
This means that the sampler on BDW doesn't support CCS.
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Starting positions >= 32 are not part of the header, rather than >.
Caught by Coverity, which found that "bits <<= field->start" may shift
by 32, which has undefined behavior.
CID: 1404968
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This prevents a user from using a cache created on one hardware
generation on a different one. Of course, with Intel hardware, this
requires moving their drive from one machine to another but it's still
possible and we should prevent it.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
In commit c35fa7a, we changed the "width" of DF source registers to 2,
which is conceptually fine. Unfortunately a VertStride of 2 is not
allowed by align16 instructions on IVB/BYT, and the regular VertStride
of 4 works fine in any case.
See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test
for example:
cmp.ge.f0(8) g18<1>DF g1<0>.xyxyDF -g8<2>DF { align16 1Q };
ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed
cmp.ge.f0(8) g19<1>DF g1<0>.xyxyDF -g9<2>DF { align16 2N };
ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed
v2:
- Add spec quote (Curro).
- Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro)
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This is required for correctness in presence of multiple 4-wide flag
writes (e.g. 4-wide instructions with a conditional mod set) which
update a different portion of the same 8-bit flag subregister.
Right now we keep track of flag dataflow with 8-bit granularity and
consider flag writes to have killed any previous definition of the
same subregister even if the write was less than 8 channels wide,
which can cause live flag register updates to be dead
code-eliminated incorrectly.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
horiz_offset() shouldn't be doing anything for scalar registers,
because all channels of any SIMD instructions will end up reading or
writing the same component of the register, so shifting the register
offset would be wrong.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Re-implement in terms of is_uniform() for
simplicity. Pass argument by const reference. Clarify commit
message. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Otherwise for a pack_double_2x32_split opcode, we emit:
vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134
mov(8) g5<1>UD g5<4>.xUD { align16 1Q compacted };
mov(8) g7<2>UD g5<4,4,1>UD { align1 1Q };
ERROR: When the destination spans two registers, the source must span two registers
(exceptions for scalar source and packed-word to packed-dword expansion)
mov(8) g8<2>UD g5.4<4,4,1>UD { align1 2N };
ERROR: The offset from the two source registers must be the same
mov(8) g5<1>UD g6<4>.xUD { align16 1Q compacted };
mov(8) g7.1<2>UD g5<4,4,1>UD { align1 1Q };
ERROR: When the destination spans two registers, the source must span two registers
(exceptions for scalar source and packed-word to packed-dword expansion)
mov(8) g8.1<2>UD g5.4<4,4,1>UD { align1 2N };
ERROR: The offset from the two source registers must be the same
The intention was to emit mov(4)s for the instructions that have ERROR
annotations.
See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test
for example.
v2 (Samuel):
- Instead of setting the exec size to a fixed value, don't double it
(Curro).
- Add PICK_{HIGH,LOW}_32BIT to the condition.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Trivial rebase changes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop useless vec4_visitor dependencies. Demote to
static stand-alone function. Don't write unused components in the
result. Use vec4_builder interface for register allocation. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Take into account offset values less than a full register (32 bytes)
when getting the var from register.
This is required when dealing with an operation that writes half of the
register (like one d2x in IVB/BYT, which uses exec_size == 4).
v2:
- Take in account this offset < 32 in liveness analysis too (Curro)
v3:
- Change formula in var_from_reg() (Curro)
- Remove useless changes (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On IVB, DF instructions have lowered the SIMD width to 4 but the
exec_size will be later doubled. Fix the assert to avoid crashing in
this case.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Simplify assert. Except for the 'inst->group % 4
== 0' part the assertion was redundant with the previous assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This way we can set the destination type as double to all these new opcodes,
avoiding any optimizer's confusion that was happening before.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop no_spill workaround originally needed due to
the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When doing a 64-bit to a smaller data type size conversion, the destination should
be aligned to 64-bits. Because of that, we need to gather the data after the
actual conversion.
Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but
now we split them explicitely in two different instructions:
VEC4_OPCODE_FROM_DOUBLE just do the conversion and
VEC4_OPCODE_PICK_LOW_32BIT will gather the data.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
In the generator we must generate slightly different code for
Ivybridge/Baytrail, because of the way the stride works in
this hardware.
v2:
- Use stride and don't need to fix dst (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Keep the original type when dealing with null registers. Especially
because we do no want to introduce an implicit conversion between
types that could affect the conditional flags.
This affects especially when the original type is DF, and we are working
on Ivybridge/Baytrail.
v2 (Curro)
- Fix typo.
- Use retype() instead of applying the type directly.
- Remove unneeded retype.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We need to split DF instructions in two on IVB/BYT as it needs an
execsize 8 to process 4 DF values (one GRF in total).
v2:
- Rename helper and make it static inline function (Matt).
- Fix indention and add braces (Matt).
v3:
- Don't edit IR instruction when doubling exec_size (Curro)
- Add comment into the code (Curro).
- Manage ARF registers like the others (Curro)
v4:
- Add get_exec_type() function and use it to calculate the execution
size.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Take
destination type as execution type where there is no valid source.
Assert-fail if the deduced execution type is byte. Clarify comment
in get_lowered_simd_width(). Move SIMD width workaround outside of
'if (...inst->size_written > REG_SIZE)' conditional block, since the
problem should be independent of whether the amount of data written
by the instruction is greater or lower than a GRF. Drop redundant
is_ivb_df definition. Drop bogus inst->exec_size < 8 check.
Simplify channel group assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The hardware applies the same channel enable signals to both halves of
the compressed instruction which will be just wrong under non-uniform
control flow. Fix this by splitting those instructions to SIMD4.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
According to the IVB and HSW PRMs:
"2.When the destination requires two registers and the sources are
indirect, the sources must use 1x1 regioning mode."
So for DF instructions the execution size is not limited by the number
of address registers that are available, but by the EU decompression
logic not handling VxH indirect addressing correctly.
This patch limits the SIMD width to 4 in this case.
v2:
- Fix typo (Matt).
- Fix condition (Curro)
v3:
- Add spec quote (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When converting a DF to 32-bit conversions, we set dst stride to 2,
to fulfill alignment restrictions because the upper Dword of every
Qword will be written with undefined value.
But in IVB/BYT, this is not necessary, as each DF conversion already
writes 2, the first one the real value, and the second one a 0.
That is, IVB/BYT already set stride = 2 implicitly, so we must set it to
1 explicitly to avoid ending up with stride = 4.
v2:
- Fix typo (Matt)
v3:
- Fix stride in the destination's brw_reg, don't modity IR (Curro)
v4:
- Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro)
- Fix comment (Curro).
- Relax hstride assert (Curro)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Minor spelling fixes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2:
- Change the name to lower_conversions.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>