compute.c: In function ‘launch_grid’:
compute.c:435:20: warning: assignment discards ‘const’ qualifier from pointer target type [enabled by default]
info.input = input;
^
Maybe the pipe_grid_info::input field should be const void *?
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This fixes many CTS cases, but will require an update to the kernel
command parser register whitelist. (The CS GPRs and TIMESTAMP
registers need to be whitelisted.)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
From Section 4.4.5 (Uniform and Shader Storage Block Layout
Qualifiers) of the OpenGL 4.50 spec:
"The align qualifier makes the start of each block member have a
minimum byte alignment. It does not affect the internal layout
within each member, which will still follow the std140 or std430
rules. The specified alignment must be a power of 2, or a
compile-time error results.
The actual alignment of a member will be the greater of the
specified align alignment and the standard (e.g., std140) base
alignment for the member's type. The actual offset of a member is
computed as follows: If offset was declared, start with that
offset, otherwise start with the next available offset. If the
resulting offset is not a multiple of the actual alignment,
increase it to the first offset that is a multiple of the actual
alignment. This results in the actual offset the member will have.
When align is applied to an array, it affects only the start of
the array, not the array's internal stride. Both an offset and an
align qualifier can be specified on a declaration.
The align qualifier, when used on a block, has the same effect as
qualifying each member with the same align value as declared on
the block, and gets the same compile-time results and errors as if
this had been done. As described in general earlier, an individual
member can specify its own align, which overrides the block-level
align, but just for that member.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The old comment was for the location not the offset, we now use
the field for block members so mention that also.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
In this patch we also copy the offset value from the ast and
implement offset linking rules by adding it to the record_compare()
function.
From Section 4.4.5 (Uniform and Shader Storage Block Layout Qualifiers)
of the GLSL 4.50 spec:
"Two blocks linked together in the same program with the same block
name must have the exact same set of members qualified with
offset and their integral-constant-expression values must be the
same, or a link-time error results."
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This implements the rules for the offset qualifier on block members.
From Section 4.4.5 (Uniform and Shader Storage Block Layout Qualifiers)
of the GLSL 4.50 spec:
"The offset qualifier can only be used on block members of blocks
declared with std140 or std430 layouts."
...
"It is a compile-time error to specify an offset that is smaller than
the offset of the previous member in the block or that lies within the
previous member of the block."
...
"The specified offset must be a multiple of the base alignment of the
type of the block member it qualifies, or a compile-time error results."
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Global in validation is already handled, this will do the validation
for variables, blocks and block members.
This fixes some CTS tests for the new enhanced layouts transform
feedback qualifiers.
V2: add some more valid input flags
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Previously interface blocks were giving the global default flags of
uniform blocks. This meant we could not check for invalid qualifiers
on interface blocks because they always contained invalid flags.
This changes parsing so that interface blocks now get an empty
set of layouts.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
If the following patch we will stop setting these layouts by default
on interface blocks, so we need to do this to avoid hitting the
assert.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
v2: Subtract the baseMipLevel and baseArrayLayer (Jason)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
When BaseLevel > 0, we magnify the dimensions to fill out the size of
miplevels [0..BaseLevel). In particular, this was magnifying depth,
thinking that the depth doubles at each level. This is perfectly
reasonable for 3D textures, but dead wrong for array textures.
Changing the depth != 1 condition to a target == GL_TEXTURE_3D check
should make this only happen in the appropriate cases.
Fixes about 32 dEQP tests:
- dEQP-GLES31.functional.texture.gather.*.level_{1,2}
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
opt_vector_float() transforms several scalar MOV operations to a single
vectorial MOV.
This is done when those MOV covers all the components of the destination
register. So something like:
mov vgrf3.0.xy:D, 0D
mov vgrf3.0.w:D, 1065353216D
mov vgrf3.0.z:D, 0D
is transformed in:
mov vgrf3.0:F, [0F, 0F, 0F, 1F]
But there are cases where not all the components are written. For
example, in:
mov vgrf2.0.x:D, 1073741824D
mov vgrf3.0.xy:D, 0D
mov vgrf3.0.w:D, 1065353216D
mov vgrf4.0.xy:D, 1065353216D
mov vgrf4.0.w:D, 0D
mov vgrf6.0:UD, u4.xyzw:UD
Nor vgrf3 nor vgrf4 .z components are written, so the optimization is
not applied.
But it could be applied anyway with the components covered, using a
writemask to select the ones written. So we could transform it in:
mov vgrf2.0.x:D, 1073741824D
mov vgrf3.0.xyw:F, [0F, 0F, 0F, 1F]
mov vgrf4.0.xyw:F, [1F, 1F, 0F, 0F]
mov vgrf6.0:UD, u4.xyzw:UD
This commit does precisely that: opportunistically apply
opt_vector_float() when possible.
total instructions in shared programs: 7124660 -> 7114784 (-0.14%)
instructions in affected programs: 443078 -> 433202 (-2.23%)
helped: 4998
HURT: 0
total cycles in shared programs: 64757760 -> 64728016 (-0.05%)
cycles in affected programs: 1401686 -> 1371942 (-2.12%)
helped: 3243
HURT: 38
v2: change vectorize_mov() signature (Matt).
v3: take in account predicates (Juan).
v4 [mattst88]: Update shader-db numbers. Fix some whitespace issues.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
screen may still be used by other resources that are not yet freed.
To correctly fix this there will be a need to account for resources
differently, but this quick fix is not any worse than the original
code that leaked screens anyway.
Reviewed-by: Brian Paul <brianp@vmware.com>
Match the comment stated above the assignment.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
No shader-db changes, but does recognize some extract_u16 which enables
the next patch to optimize some code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack
three bytes from an integer and convert each into a float:
float((val >> 16u) & 0xffu)
float((val >> 8u) & 0xffu)
float((val >> 0u) & 0xffu)
Instead of shifting, masking, and type converting like this:
shr(8) g15<1>UD g25<8,8,1>UD 0x00000010UD
and(8) g16<1>UD g15<8,8,1>UD 0x000000ffUD
mov(8) g17<1>F g16<8,8,1>UD
shr(8) g18<1>UD g25<8,8,1>UD 0x00000008UD
and(8) g19<1>UD g18<8,8,1>UD 0x000000ffUD
mov(8) g20<1>F g19<8,8,1>UD
and(8) g21<1>UD g25<8,8,1>UD 0x000000ffUD
mov(8) g22<1>F g21<8,8,1>UD
i965 can simply extract a byte and convert to float in a single
instruction:
mov(8) g17<1>F g25.2<32,8,4>UB
mov(8) g20<1>F g25.1<32,8,4>UB
mov(8) g22<1>F g25.0<32,8,4>UB
This patch implements the first step: recognizing byte extraction. A
later patch will optimize out the conversion to float.
instructions in affected programs: 28568 -> 27450 (-3.91%)
helped: 7
cycles in affected programs: 210076 -> 203144 (-3.30%)
helped: 7
This patch decreases the number of instructions in the two Unigine
programs by:
#1721: 4520 -> 4374 instructions (-3.23%)
#1706: 3752 -> 3582 instructions (-4.53%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
sample_c is backwards from what GL and Vulkan expect.
See intel_state.c in i965.
v2: Drop unused vk_to_gen_compare_op.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This resolves some order dependencies between the already existing
callback the newly created one.
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is a limit of 10 display connections, which was a
problem for apps/tests that were continuously opening/closing display
connections.
This fix uses XAddExtension() and XESetCloseDisplay() to keep track
of the status of the display connections from the X server, freeing
mesa-related data as X displays get destroyed by the X server.
Poster child is the VTK "TimingTests"
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>