We need to split DF instructions in two on IVB/BYT as it needs an
execsize 8 to process 4 DF values (one GRF in total).
v2:
- Rename helper and make it static inline function (Matt).
- Fix indention and add braces (Matt).
v3:
- Don't edit IR instruction when doubling exec_size (Curro)
- Add comment into the code (Curro).
- Manage ARF registers like the others (Curro)
v4:
- Add get_exec_type() function and use it to calculate the execution
size.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Take
destination type as execution type where there is no valid source.
Assert-fail if the deduced execution type is byte. Clarify comment
in get_lowered_simd_width(). Move SIMD width workaround outside of
'if (...inst->size_written > REG_SIZE)' conditional block, since the
problem should be independent of whether the amount of data written
by the instruction is greater or lower than a GRF. Drop redundant
is_ivb_df definition. Drop bogus inst->exec_size < 8 check.
Simplify channel group assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The hardware applies the same channel enable signals to both halves of
the compressed instruction which will be just wrong under non-uniform
control flow. Fix this by splitting those instructions to SIMD4.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
According to the IVB and HSW PRMs:
"2.When the destination requires two registers and the sources are
indirect, the sources must use 1x1 regioning mode."
So for DF instructions the execution size is not limited by the number
of address registers that are available, but by the EU decompression
logic not handling VxH indirect addressing correctly.
This patch limits the SIMD width to 4 in this case.
v2:
- Fix typo (Matt).
- Fix condition (Curro)
v3:
- Add spec quote (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When converting a DF to 32-bit conversions, we set dst stride to 2,
to fulfill alignment restrictions because the upper Dword of every
Qword will be written with undefined value.
But in IVB/BYT, this is not necessary, as each DF conversion already
writes 2, the first one the real value, and the second one a 0.
That is, IVB/BYT already set stride = 2 implicitly, so we must set it to
1 explicitly to avoid ending up with stride = 4.
v2:
- Fix typo (Matt)
v3:
- Fix stride in the destination's brw_reg, don't modity IR (Curro)
v4:
- Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro)
- Fix comment (Curro).
- Relax hstride assert (Curro)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Minor spelling fixes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2:
- Change the name to lower_conversions.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This reverts commit 7dccd38b40.
d2x pass fixes SEL instructions when there is a type conversion
by doing a SEL without type conversion and then convert the result.
This pass also takes into account the non-uniform control flow.
Then, 7dccd38b40 is not needed anymore.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Generalize it to lower any unsupported narrower conversion.
v2 (Curro):
- Add supports_type_conversion()
- Reuse existing intruction instead of cloning it.
- Generalize d2x to narrower and equal size conversions.
v3 (Curro):
- Make supports_type_conversion() const and improve it.
- Use foreach_block_and_inst to process added instructions.
- Simplify code.
- Add assert and improve comments.
- Remove redundant mov.
- Remove useless comment.
- Remove saturate == false assert and add support for saturation
when fixing the conversion.
- Add get_exec_type() function.
v4 (Curro):
- Use get_exec_type() function to get sources' type.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On HSW+, scalar DF sources can be accessed using the normal <0,1,0>
region, but on IVB and BYT DF regions must be programmed in terms of
floats. A <0,2,1> region accomplishes this.
v2:
- Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro).
v3:
- Added comment explaining the reason (Curro).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Then the SIMD lowering pass will get rid of any compressed instructions with scalar
source (whether force_writemask_all or not) and we avoid hitting the Gen7 region
decompression bug.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
In IVB and BYT, both regioning parameters and execution sizes are measured as
32-bits element size.
So when we have something like:
mov(8) g2<1>DF g3<4,4,1>DF
We are not actually moving 8 doubles (our intention), but 4 doubles.
We need to double the parameters to cope with this issue. However,
horizontal strides don't behave as they're supposed to on IVB
for DF regions, they will cause each 32-bit half of DF sources to be
strided individually, and doubling the value won't make any difference.
v2:
- Use devinfo directly (Matt).
- Use Baytrail instead of Valleview (Matt).
- Use IvyBridge instead of Ivy (Matt)
- Double the exec_size in code emission (Curro)
v3:
- Change hstride doubling by an assert and fix commit log (Curro).
- Substitute remaining compiler->devinfo by devinfo (Curro).
v4:
- Fix comment (Curro).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The execution data size is the biggest type size of any instruction
operand.
We will use it to know if the instruction deals with DF, because in Ivy
we need to double the execution size and regioning parameters.
v2:
- Fix typo in commit log (Matt)
- Use static inline function instead of fs_inst's method (Curro).
- Define the result as a constant (Curro).
- Fix indentation (Matt).
- Add braces to nested control flow (Matt).
v3 (Curro):
- Add get_exec_type() and other auxiliary functions and use them to
calculate its size.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Fix deduced
execution type for integer vector types. Take destination type as
execution type where there is no valid source. Assert-fail if the
deduced execution type is byte. Move into brw_ir_fs.h header for
consistency with the VEC4 back-end. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On IVB/BYT, region parameters and execution size for DF are in terms of
32-bit elements, so they are doubled. For evaluating the validity of an
instruction, we halve them.
v2 (Sam):
- Add comments.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
4-wide DF operations where NibCtrl applies require and execsize of 8
in IvyBridge/BayTrail.
v2:
- Refactor NibCtrl printing (Matt)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Fix the accounting for memory usage of userptr buffers, which has been wrong
forever (or at least for a long time).
Also initialize flags. Without this initialization, the sparse buffer flag
might end up being set, which leads to staging buffers being used unnecessarily
(and incorrectly) in transfers to or from userptr buffers.
This works around VM faults that occur with the radeon kernel module when
running piglit ./bin/amd_pinned_memory decrement-offset map-buffer -auto
Fixes: e077c5fe65 ("gallium/radeon: transfers and invalidation for sparse buffers")
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Use push descriptors instead of temp descriptor sets.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This allows meta to use push descriptors without disturbing user
push descriptors.
radv_meta_push_descriptor_set differs from vkCmdPushDescriptorSetKHR
in that partial updates are not supported; all descriptors used in
subsequent draw commands must be pushed at the same time.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The Vulkan driver was originally written under the assumption that
VK_ATTACHMENT_UNUSED was basically just for depth-stencil attachments.
However, the way things fell together, VK_ATTACHMENT_UNUSED can be used
anywhere in the subpass description. The blorp-based clear and resolve
code has a bunch of places where we walk lists of attachments and we
weren't handling VK_ATTACHMENT_UNUSED everywhere. This commit should
fix all of them.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
We're about to start requiring it in yet another case and calculating
exactly when one is needed is starting to get prohibitively expensive.
A single surface state doesn't take up that much space so we may as well
create one all the time.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Depending on pipe caps they can be writable in all vertex processing
stages, but only the output of the last stage counts.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Enable code sanitizers by adding -fsanitize=$foo flags for the compiler
and linker.
In addition, this also disables checking for undefined symbols: running
the address sanitizer requires additional symbols which should be provided
by a preloaded libasan.so (preloaded for hooking into malloc & friends
globally), and the undefined symbols check gets tripped up by that.
Running the tests works normally via `make check`, but shows additional
failures with the address sanitizer due to memory leaks that seem to be
mostly leaks in the tests themselves. I believe those failures should
really be fixed. In the mean-time, you can set
export ASAN_OPTIONS=detect_leaks=0
to only check for more serious error types.
v2:
- fail reasonably when an unsupported sanitize flag is given (Eric Engestrom)
Reviewed-by: Bartosz Tomczyk <bartosz.tomczyk86@gmail.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This makes it much easier to throw together a bit of dynamic state. It
also automatically handles flushing so you don't accidentally forget.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This patch enables multisample antialiasing in the OpenSWR software renderer.
MSAA is a proof-of-concept/work-in-progress with bug fixes and performance
on the way. We wanted to get the changes out now to allow several customers
to begin experimenting with MSAA in a software renderer. So as not to
impact current customers, MSAA is turned off by default - previous
functionality and performance remain intact. It is easily enabled via
environment variables, as described below.
It has only been tested with the glx-lib winsys. The intention is to
enable other state-trackers, both Windows and Linux and more fully support
FBOs.
There are 2 environment variables that affect behavior:
* SWR_MSAA_FORCE_ENABLE - force MSAA on, for apps that are not designed
for MSAA... Beware, results will vary. This is mainly for testing.
* SWR_MSAA_MAX_SAMPLE_COUNT - sets maximum supported number of
samples (1,2,4,8,16), or 0 to disable MSAA altogether.
(The default is currently 0.)
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
Removed unnecessary and probably wrong PIPE_BIND_SCANOUT and PIPE_BIND_SHARED
flags in favor of check on single PIPE_BIND_DISPLAY_TARGET flag.
Reference llvmpipe change <bee4c7718a3bd57e3d99f0913d9081cd13fe5fd>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
The context now contains SIMD vectors which must be aligned (specifically
samplePositions in the rastState in the derived state). Failure to align
can result in segv crash on unaligned memory access in vector
instructions.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
The kernel returns frequency in kHz, so to convert to nanosecond
interval that Vulkan uses the dividend should be 1000000.0 and not
100000.0.
This fixes the GPU graph in DOOM and matches the amdgpu-pro blob.
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
These can operate on MEMORY[], in addition to BUFFER[] and IMAGE[]
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Probably should have flipped the switch a long time ago, since it
doesn't seem to cause any problems and is a nice perf boost in a number
of cases.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Want to move one of these under ir3_block, so that gives a reason to
migrate the remaining malloc/realloc to ralloc.
Signed-off-by: Rob Clark <robdclark@gmail.com>
When publishing this spec on the OpenGL ES registry, Jon Leech noticed
that it didn't actually mention what the ES dependencies and
interactions were. I looked at extensions_table.h and noted that we
expose it in ES 3.0 contexts, and he added the obvious spec texts.
The updated copy also contains our official extension number.
https://github.com/KhronosGroup/OpenGL-Registry/issues/3
Acked-by: Matt Turner <mattst88@gmail.com>