The parameter lists were not being created nor filled since i965
doesn't use them. In Gallium they are used for uniform handling, so
add a way to fill them.
The gl_uniform_storage struct got two new fields that let us go
- from a Parameter to the matching UniformStorage and,
- from the variable to the *first* UniformStorage
without relying on names -- since they are optional for ARB_gl_spirv.
Later patches will make use of them.
v2: Do not fill parameters for i965. (Timothy)
Use uint32_t for the new attributes. (Marek)
v3: Serialize the new fields. (Timothy)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The gl_register_file doesn't need 16 bits, so shorten it and use the
extra room for 'Padded' (also mark it as a single bit). This shrinks
the struct size from 32 bytes to 24 bytes.
See also 4794fbc86e ("mesa: reduce the size of gl_program_parameter")
that shrinked from 40 to 24 and later 7536af670b ("glsl: fix shader
cache for packed param list") that added `Padded`.
v2: Use just 5 bits for gl_register_file. (Timothy)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Every uniform that have the "gl_" name also have some state slots. So
use the state_slots like we did in 57b6184931 ("i965: account for NIR
uniforms without name").
This removes the dependency on names, which are optional when using
ARB_gl_spirv.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Don't use the UNMAPPED_UNIFORM_LOC (-1) to set the unsigned
max_uniform_location. Those unmapped uniforms don't have to be
accounted at this point.
Fixes: 7a9e5cdfbb ("nir/linker: Add gl_nir_link_uniforms()")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
v5: - Move is windows check down to make code more robust
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This mirrors the haiku build which uses a platform.
v2: - Fix some rebase problems
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
v4: - Don't wrap a single file in a list to match mesa style
- Use null_dep instead of empty list
Reviewed-by: Eric Anholt <eric@anholt.net> (v3)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
v4: - Don't run checks on Windows that will always fail
Reviewed-by: Eric Anholt <eric@anholt.net> (v3)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Which will allow meson to build a shared glapi build with mingw.
v2: - Add symbol to symbol check test
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
It doesn't compile due to undefined symbols, which are in
libglapi_static, so I don't understand the problem.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Currently the praser for s expressions assumes that newlines will be \n,
resulting in incorrect parsing on windows, where the newline is \r\n.
This patch just adds \r? to the regular expression used to parse the s
expressions, which fixes at 1 test on windows.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Since the system value refactor, we've accidentally only been setting
cbuf->buffer_size in the UBO case, and not in the uploaded-constants
case. We use cbuf->buffer_size to fill out the SURFACE_STATE entry,
so it needs to be initialized in both cases.
Fixes: 3b6d787e40 ("iris: move sysvals to their own constant buffer")
Documentation list all of those as "UHD".
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111629
BSpec: 33266
Acked-by: Tapani Pälli <tapani.palli@intel.com>
This fixes some interactions when NGG GS is enabled. It fixes:
- dEQP-VK.clipping.user_defined.clip_cull_distance_dynamic_index.*geom*
- dEQP-VK.tessellation.geometry_interaction.passthrough.*
For some reasons, using the computed ESGS ring size randomly hangs
with CTS. For now, just use the maximum LDS size for ESGS.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This shouldn't be in NIR->LLVM because ACO also needs the shader
info. This will also help for computing some NGG values that are
necessary for declaring LDS symbols.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Only the pipeline layout and the shader keys are needed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
ac_surface computes it for amdgpu.
radeon_drm_surface computes it for radeon.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
The VBO module maps a buffer with GL_MAP_FLUSH_EXPLICIT, and keeps
appending data, and calling glFlushMappedBufferRange(). We were
invalidating the VF cache each time it flushed a new range, which
results in a ton of VF flushes.
If the contents of the destination in the target range are undefined
(never even possibly written), this patch makes us assume that it's
likely not in the cache and so cache invalidations are required. If
the destination range is defined, we continue cache flushing as we may
need to expunge stale data.
This eliminates 88% of the VF cache invalidates on Manhattan 3.0.
Improves performance in Manhattan 3.0 on my Icelake 8x8 with the GPU
frequency locked to 700Mhz by 0.376724% +/- 0.0989183% (n=10).
This cuts roughly 85% of the 3DSTATE_SAMPLER_STATE_POINTERS_PS calls in
the J2DBench images test. For some reason, the state tracker is calling
bind_sampler_state with the same sampler state in a bunch of cases.
The line stipple pattern and factor only matter if line stippling is
actually enabled. Otherwise, we can safely ignore it.
PBO upload may give us zero for line stipple information, while normal
drawing tends to give us an actual stipple pattern such as 0xffff. This
was causing us to flag IRIS_DIRTY_LINE_STIPPLE way too often, leading to
useless 3DSTATE_LINE_STIPPLE commands, which are non-pipelined and thus
very expensive.
Improves performance in Manhattan 3.0 on Skylake GT4e by
0.149261% +/- 0.0380796% (n=210). On an Icelake 8x8 with the GPU
frequency locked at 700Mhz, improves by 0.423756% +/- 0.222843% (n=3).