Stop using brw_compiler to lower the final binding table indices for
surface access. This is done by simply not setting the
'prog_data->binding_table.*_start' fields. Then make the driver
perform this lowering.
This is a better place to perform the binding table assignments, since
the driver has more information and will also later consume those
assignments to upload resources.
This also prepares us for two changes: use ibc without having to
implement binding table logic there; and remove unused entries from
the binding table.
Since the `block` field in brw_ubo_range now refers to the final
binding table index, we need to adjust it before using it to index
shs->constbuf.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We'll change iris to perform lowering of the binding table indices
earlier (before the backend kicks in), but the backend compiler uses
the result of the analysis to identify load_ubo intrinsics, so we do
the analysis after the lowering to have the right indices.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix comparing addresses from formats that have more than one
component by using nir_ball_iequal(). (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The program parser allocates a parameter list. In case of a parsing
error some variables would not be freed. This patch adds the missing
frees.
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Previously the loop for assigning registers was bailing out early if
the register had a null source. I think the intention is that in this
case it isn’t necessary to assign a register. However it was also
skipping the part that fixes up the types. This can happen if the
instruction is copy propagated to be a move from a constant half-float
input register. In that case it still needs to fix up the types.
Fixes assert in
dEQP-GLES3.functional.shaders.invariance.highp.subexpression_precision_mediump
when lowering the precision of the variables.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Though it should be fixed in the RA pass, it needs to be set correctly
from the beginning according to the bit size of the NIR dest.
v2: It would be better for mad, fddx and fddy to be fixed up later in
the RA pass.
[small cleanup of fallout from imov/fmov removal]
Signed-off-by: Rob Clark <robdclark@chromium.org>
According to the blob driver, the hardware seems to handle only 32-bit
values for half constant registers within floating point opcodes. So we
need to convert 16-bit values back to 32-bit values when a lower
precision pass is in effect.
Signed-off-by: Rob Clark <robdclark@chromium.org>
If the types of the dest reg and src reg of an absneg opcode are
different, it shouldn't be considered a same-type mov.
This patch becomes meaningful when we start using mediump information
to lower precision to 16 bits.
Signed-off-by: Rob Clark <robdclark@chromium.org>
e.g. uniform mediump vec4 f;
This patch has no effect for now, since there's no mediump lowering
pass yet, but it will be meaningful when that pass lands in the near
future.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Previously the A5XX_SP_FS_OUTPUT_REG_HALF_PRECISION was set depending
on whether half_precision was set in the shader key. With support for
mediump precision, it is possible to have different outputs use
different precisions. That means we can’t have a global shader state
to specify it. Instead it now tries to copy the half-float-ness
from the nir_variable for the output into the ir3_shader_variant. This
is then used to decide whether to set half-precision for each output.
The a6xx version is copied from the a5xx code but it has not been
tested.
v2. [Hyunjun Ko (zzoon@igalia.com)] Use the recently added half flag,
which represents precision based on IR3_REG_HALF, to avoid
duplication.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Previously the code to load from a constant instruction was always
using the u32 pointer. If the constant is actually a 16-bit source
this would end up with the wrong values because the pointer would be
offset by the wrong size. This fixes it to use the u16 pointer.
Signed-off-by: Rob Clark <robdclark@chromium.org>
These aren't real instructions and don't change the # of live values,
so there is no point in them competing with real instructions.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For instructions that increase the # of live values, apply a threshold
to avoid scheduling them too early. Also factor in the net change in
the # of live values that would result from scheduling an instruction,
to prioritize instructions that reduce the number of live values as
register pressure increases.
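A rough sketch of the kind of ranking this describes (purely
illustrative; the names, threshold value and weighting are assumptions,
not ir3_sched's actual code):

  #include <stdio.h>

  #define LIVE_THRESHOLD 16   /* assumed threshold, not the real value */

  /* Higher score = schedule sooner.  delta_live is the net change in the
   * number of live values if the candidate were scheduled now; cur_live
   * is the current number of live values. */
  static int rank_candidate(int delta_live, int cur_live)
  {
     /* Defer pressure-increasing instructions while pressure is low. */
     if (delta_live > 0 && cur_live < LIVE_THRESHOLD)
        return -1000;
     /* Weight pressure-reducing candidates more as pressure grows. */
     return -delta_live * cur_live;
  }

  int main(void)
  {
     printf("%d\n", rank_candidate(+2, 8));   /* -1000: deferred */
     printf("%d\n", rank_candidate(-3, 40));  /* 120: strongly preferred */
     return 0;
  }
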
For manhattan:
total instructions in shared programs: 27869 -> 28413 (1.95%)
instructions in affected programs: 26756 -> 27300 (2.03%)
helped: 102
HURT: 87
total full in shared programs: 1903 -> 1719 (-9.67%)
full in affected programs: 1390 -> 1206 (-13.24%)
helped: 124
HURT: 9
The reduction in register usage nets ~20% gain in manhattan. (So
getting mediump support should be a huge win for gles gfxbench.)
Also significantly helps some of the more complex shadertoy shaders,
like IQ's Piano (32 to 18 regs, doubles fps).
The effect is less pronounced on smaller shaders.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Account for shader outputs and values live in any direct/indirect
successor block.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Allows the legacy matrix stacks to be manipulated without disturbing the
matrix mode selector.
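For example, with the EXT_direct_state_access entry points the target
stack is an explicit parameter, so no glMatrixMode() round-trip is
needed (a minimal usage illustration, not code from this patch):

  /* Load identity into the projection stack without touching the
   * currently selected matrix mode. */
  glMatrixLoadIdentityEXT(GL_PROJECTION);
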
Adapted from a patch from Chris Forbes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Split this out from glMatrixMode since we're about to need it
independently for EXT_DSA.
Adapted from a Chris Forbes commit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Commit a1378639ab reordered context function initializations but broke
sctx->b.resource_copy_region init when using AMD_DEBUG=forcedma.
In this case sctx->dma_copy was assigned a value after being used in:
sctx->b.resource_copy_region = sctx->dma_copy;
This commit moves the FORCE_DMA special case after sctx->dma_copy initialization.
See https://bugs.freedesktop.org/show_bug.cgi?id=110422
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Recently PIPE_CAP_MAX_FRAMES_IN_FLIGHT was changed from 2
to 1:
20909284f2
No driver seems to override the default value.
One user reports severe regressions for some games.
For now, revert to the value 2 for nine.
Cc: "19.1" mesa-stable@lists.freedesktop.org
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Now that we have the util function for the default values, we can get
rid of the boilerplate.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
From the Vulkan spec 1.1.108:
"vkCmdCopyQueryPoolResults is guaranteed to see the effect of
previous uses of vkCmdResetQueryPool in the same queue, without any
additional synchronization."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: c1275052 "nir: add type information to load uniform/input and store output intrinsics"
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Tested-by: Erico Nunes <nunes.erico@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Now that the type gathering function looks at instructions that might
have other types, return an invalid type instead of crashing. That
invalid type will be properly ignored later.
Fixes: c12750527b "nir: add type information to load uniform/input and store output intrinsics"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Removes the bool_to_float logic from the int_to_float pass, so that both
can be used separately. By having separate passes we have better validation
and it makes it possible to use it with the lower_ftrunc option (int lowering
generates ftrunc, but lower_ftrunc generates bools, ftrunc lowering should
probably be reworked). For now we always expect lower_bool to come after
lower_int.
Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.
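As a worked illustration of the f2i32-to-ftrunc part (standalone C, not
the lowering pass itself): when integers live in float registers,
converting float to int is just truncation toward zero.

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
     float x = -3.7f;
     /* The "integer" result still lives in a float on int-less hardware. */
     float lowered_f2i32 = truncf(x);
     printf("%g %d\n", lowered_f2i32, (int)x);   /* -3 -3 */
     return 0;
  }
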
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It is treated like the vecN instructions which also have no type.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.
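A worked example of the constant-shift lowering (illustrative only): a
left shift by a constant c is the same as multiplying by 1 << c, which
maps onto a multiply after int_to_float lowering.

  #include <stdio.h>

  int main(void)
  {
     unsigned x = 13, c = 3;
     printf("%u %u\n", x << c, x * (1u << c));   /* both print 104 */
     return 0;
  }
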
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Consts and undefs can be used as different types (common with "0" constant)
so don't copy types from consts/undefs, only to them. It doesn't entirely
solve the problem that the type given to the const could be wrong, but
now the only realistic case is with "0", which is the same when cast to
float, so it doesn't matter for lower_int_to_float.
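A small standalone demonstration of why the "0" case is harmless (not
part of the pass itself): integer 0 and float 0.0 share the same bit
pattern, while most other constants do not.

  #include <stdint.h>
  #include <string.h>
  #include <stdio.h>

  int main(void)
  {
     float f;
     uint32_t zero = 0u, one = 1u;

     memcpy(&f, &zero, sizeof(f));
     printf("%g\n", f);   /* 0: int 0 and float 0.0 have identical bits */

     memcpy(&f, &one, sizeof(f));
     printf("%g\n", f);   /* ~1.4e-45, not 1.0 (which is 0x3f800000) */
     return 0;
  }
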
The other change is to get type information for load input/uniform and
store output, and use that to get correct results.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This type information will be used by gather_ssa_types to get usable results.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Improvements related to the patch that removed native_integers:
* In glsl_to_nir, special cases for i2f, u2f, etc. are no longer needed
* In prog_to_nir, use sge/slt and let lower_scmp lower it if needed
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We could have identical texture state for both VS and FS, which would
result in the VS state getting created first and the FS state mapping to
the identical cmdstream, resulting in the VS state getting emitted twice
and no FS state emitted.
Fixes:
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.basic_array.sampler2D_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.struct_in_array.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
If we access the address of the UBO indirectly, and there is no higher
const emitted w/ direct access (like an immediate lowered to a uniform),
the assembler won't figure out the correct constlen.
Fixes:
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_vertex
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_vertex
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment
Signed-off-by: Rob Clark <robdclark@chromium.org>
Fixes dEQP-GLES2.functional.multisampled_render_to_texture.readpixels
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
The blob is also setting the .l bit, and it seems to solve some
intermittent failures with a couple of deqp tests:
dEQP-GLES31.functional.image_load_store.2d.qualifiers.coherent_r32i
dEQP-GLES31.functional.image_load_store.2d.qualifiers.volatile_r32f
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
It seems like (ei) handling doesn't sync on (ss), so we could end up in
a situation where we release varying storage before an ldlv for flat
shaded varyings completes. Keep track of whether we've done an (ss)
since the last ldlv, and if not, add the (ss) flag to the last_input
which gets (ei).
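A minimal sketch of that bookkeeping, with hypothetical types and flag
names standing in for ir3's (this is not the driver code):

  #include <stdbool.h>
  #include <stdio.h>

  enum { FLAG_SS = 1 << 0, FLAG_EI = 1 << 1 };
  struct instr { bool is_ldlv; unsigned flags; };

  /* If no (ss) has happened since the last ldlv, force one on the
   * instruction that gets (ei), so varying storage isn't released while
   * an ldlv is still outstanding. */
  static void fixup_last_input(struct instr *instrs, int n,
                               struct instr *last_input)
  {
     bool ss_since_last_ldlv = true;
     for (int i = 0; i < n; i++) {
        if (instrs[i].is_ldlv)
           ss_since_last_ldlv = false;
        else if (instrs[i].flags & FLAG_SS)
           ss_since_last_ldlv = true;
     }
     if (!ss_since_last_ldlv)
        last_input->flags |= FLAG_SS;
  }

  int main(void)
  {
     struct instr prog[2] = { { .is_ldlv = true,  .flags = 0 },
                              { .is_ldlv = false, .flags = FLAG_EI } };
     fixup_last_input(prog, 2, &prog[1]);
     printf("%u\n", prog[1].flags);   /* 3 == FLAG_EI | FLAG_SS */
     return 0;
  }
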
Noticed with dEQP-GLES3.functional.fragment_out.random.24 and
dEQP-GLES3.functional.fragment_out.random.27, which previously passed by
luck because ir3_sched ordered instructions in a way that resulted in a
lucky (ss).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
The special handling for last_input assumes that all the varying loads
are in the first block. Add an assert to catch if anyone breaks that
assumption.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
While we're here, copy the size table from set.c to get rid of hard tabs
in the hash_table.c version.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Compilation times with my shader-db database:
Difference at 95.0% confidence
-1.22312 +/- 0.726033
-0.283979% +/- 0.168254%
(Student's t, pooled s = 1.02177)
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This should be at least as fast as using fast_idiv_by_const, and has the
advantage that the precomputation is simple enough to be evaluated at
Mesa-compile time for hash tables and sets which have a fixed table of
possible divisors.
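A self-contained sketch of the precomputed-reciprocal remainder being
referred to (illustrative only; assumes a compiler with unsigned
__int128, and the names are not Mesa's):

  #include <stdint.h>
  #include <stdio.h>

  /* Precompute once per divisor d (d > 1); cheap enough to do at
   * compile time for a fixed table of possible table sizes. */
  static inline uint64_t precompute_magic(uint32_t d)
  {
     return (UINT64_MAX / d) + 1;
  }

  /* n % d using only multiplies and a shift. */
  static inline uint32_t fast_urem(uint32_t n, uint32_t d, uint64_t magic)
  {
     uint64_t lowbits = magic * n;
     return (uint32_t)(((unsigned __int128)lowbits * d) >> 64);
  }

  int main(void)
  {
     uint32_t d = 37;   /* e.g. one entry in the fixed table of sizes */
     uint64_t magic = precompute_magic(d);
     printf("%u\n", fast_urem(1234567u, d, magic));   /* 25 == 1234567 % 37 */
     return 0;
  }
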
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>