There's special logic around finding gl_FragData. It latches onto any
array with FRAG_RESULT_DATA0. However gl_SecondaryFragDataEXT[], added
by GL_EXT_blend_func_extended, fits those parameters as well. The real
frag data array should have index 0 though, so we can use that to
distinguish them.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96617
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 36ed1b695e)
The start instance is applied as an offset into the buffer directly,
ignoring the divisor, not as an instance id offset that respects the
divisor.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1f4bca798d)
The generic version gets this right already, but this was using an
incorrect formula in SSE.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 5b0d64886d)
Ever since c2581a9375, the binding table layout has depended on the
pipeline. This means that whenever we change pipelines we also need to
re-emit binding tables for the new layout.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 35b53c8d47)
It's tiny and fully generic so there's really no reason for it to be in a
gen7-specific file.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2bfe0c3374)
There's no good reason for recompiling it
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9df4d6bb36)
SPIR-V treats it as an input but NIR wants the system value. This
shouldn't have been too much of a surprise given that we have to do the
same conversion in the GLSL IR to NIR pass.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 295e03c980)
Found by inspection.
Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit f9ddd52317)
As far as I can tell, a sequence of glBitmap followed by texture functions
that refer to a texture bound as the framebuffer is well within what should
be allowed.
Found by inspection.
Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit e7fff3cfe1)
In the unlikely case that a program uses glBitmap to render to a framebuffer
whose texture is bound in a compute shader.
Found by inspection.
Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit c542b7e43d)
Cherryview and Broxton don't support DW x DW multiplication. We have
piles of code to handle this, but apparently weren't retyping in the
immediate case.
For example,
tests/spec/arb_tessellation_shader/execution/dvec3-vs-tcs-tes
makes the simulator angry about instructions such as:
mul(8) r18<1>:D r10.0<8;8,1>:D 0x00000003:D
Just retype to W or UW. It should be safe on all platforms.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit cd89c834a8)
It used to be based on the framebuffer which isn't quite right.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 20e95a746d)
Just setting builder->exact isn't sufficient because that only applies to
instructions that are built with the builder but instructions created
manually and only inserted using the builder are left alone.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit bec07b7292)
This pass is similar to propagate_invariance in the GLSL compiler. The
real "output" of this pass is that any algebraic operations which are
eventually consumed by an invariant variable get marked as "exact".
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 202751fbb7)
While mathematically correct, these two optimizations result in an
expression with substantially lower precision than the original. For any
positive finite floating-point value, log2(x) is well-defined and finite.
More precisely, it is in the range [-150, 150] so any sum of logarithms
log2(a) + log2(b) is also well-defined and finite as long as a and b are
both positive and finite. However, if a and b are either very small or
very large, their product may get flushed to infinity or zero causing
log2(a * b) to be nowhere close to log2(a) + log2(b).
This imprecision was causing incorrect rendering in Talos Principal because
part of its HDR rendering process involves doing 8 texture operations,
clamping the result to [0, 65000], taking a dot-product with a constant,
and then taking the log2. This is done 6 or 8 times and summed to produce
the final result which is written to a red texture. In cases where you
have a region of the screen that is very dark, it can end up getting a
result value of -inf which is not what is intended.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96425
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 68e308d853)
The old calculation treated too many RBs as disabled.
Cc: 11.0 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit c95175581e)
The old limit, introduced in commit afa752d3f0,
was exceeded by 4 SE configurations which hit si_write_harvested_raster_configs.
Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 6c2e636982)
Apparently, these are deprecated. There's some AutoUpgrade feature which
is supposed to promote these to cmp/select, which apparently doesn't work
with jit code. It is possible it's not actually even meant to work (see
the bug filed against llvm which couldn't provide an answer neither)
but in any case this is meant to be only temporary unless the intrinsics
are really illegal. So, just use the fallback code (which should be cmp/select,
we're actually doing cmp/sext/trunc/select, but in any case llvm 3.9 manages
to optimize this back to pmin/pmax in the end).
This addresses https://llvm.org/bugs/show_bug.cgi?id=28176
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Aaron Watry <awatry@gmail.com>
(cherry picked from commit b0cf99165a)
This makes the check match up what we do on nv50 as well - there's no
point in switching over the push path if everything's in managed
buffers. This can happen when a shader uses a vertex without an enabled
array - we end up passing it a constant attribute.
This also has the effect of "fixing" some flickering in Talos. I have no
idea why. I've stared at the push logic forwards, backwards, and
sideways. By always forcing the push path (which is slow), the
flickering also goes away, but other rendering is still wrong
(specifically draw 383068 as identified in the bug). However by not
switching over to the push path, draw 383068 is correct.
Note that other flickering remains in Talos, like the red/green
walls/floors. This takes care of the shadow flickering though.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90513
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 154c0a42a2)
If we have a loop, instructions before the tex might be added as tex
uses, and those may in fact dominate all other uses of the tex results.
This however doesn't mean that we don't need a texbar after the tex.
Only check if uses dominate each other they are dominated by the tex.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96565
Fixes: 7752bbc44 (gk104/ir: simplify and fool-proof texbar algorithm)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1804aa0b80)
From the Cherryview's PRM, Volume 7, 3D Media GPGPU Engine, Register Region
Restrictions, page 844:
"When source or destination datatype is 64b or operation is integer DWord
multiply, indirect addressing must not be used."
v2:
- Fix it for Broxton too.
v3:
- Simplify code by using subscript() and not creating a new num_components
variable (Kenneth).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit bdab572a86)
From the Cherryview PRM, Volume 7, 3D Media GPGPU Engine,
Register Region Restrictions:
"When source or destination is 64b (...), regioning in Align1
must follow these rules:
1. Source and destination horizontal stride must be aligned to
the same qword.
(...)"
v2:
- Fix it for Broxton too.
v3:
- Remove inst->regs_written change as it is not necessary (Ken)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 0177dbb6c2)
There are quite a few pipelines that desktop applications (including a
bunch of piglit test) can expect to have run but don't meet the GLES
requirements. Instead of failing validation, just emit a debug message.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96358
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 6bec55a780)
Previously some callers of precision_qualifier_allowed would strip the
arrayness from the type and some would not. As a result, some places
would not notice that float[6], for example, needed a precision
qualifier.
Fixes the new piglit test no-default-float-array-precision.frag.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96358
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 9c87282041)
We still need to recompile the passthrough shader when this value
changes, as it also affects the output vertex count. But otherwise,
we can eliminate recompiles on Gen8+.
We probably want to do this for Gen7 as well, but that requires
rewriting the input release code to use a loop, which is a trade-off
I'd need to consider in more detail.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit c319512e16)
i965 has no special hardware for this, so the best way to implement
this is to pass it in via a uniform.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 2b867264d2)
Fixes three GL44-CTS.tessellation_shader subtests:
- max_patch_vertices
- single.max_patch_vertices
- tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn
These use gl_PatchVerticesIn in the TES, but don't link against a
TCS (which would allow the linker to lower it to a constant). We had
no handling for the system value in the backend, so it would just
assert fail.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 1bc194cd64)
i965 has no special hardware for this, so we need to pass this value in
as a uniform (unless the TES is linked against a TCS, in which case the
linker can just replace this with a constant).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 0be2105137)
Found using -fsanitize=undefined.
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 6510e07345)
We've had some trouble in the past with copying integers around via
float pointers, as the C compiler sometimes uses x87 floating point
registers to load values on 32-bit systems. Passing the
gl_constant_value union should be safer.
To avoid churn, this patch creates a "GLfloat *value" variable so
existing uses can stay the same.
Not observed to fix anything, but I was in the area adding more integer
state vars, and thought it'd be wise.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 8b408972ff)
When a shader image view into a buffer texture can be written to, the buffer's
valid range must be updated, or subsequent transfers may incorrectly skip
synchronization.
This fixes a bug that was exposed in Xephyr by PBO acceleration for glReadPixels,
reported by Michel Dänzer.
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit a64c7cd2ba)
Back-ported from commit a64c7cd2ba:
- include util/u_format.h
- code was extracted to si_set_shader_image in master, move it back
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
--
src/gallium/drivers/radeonsi/si_descriptors.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
This effectively limits registers to 32 and 64 for fermi and kepler when
1024 threads are used, but allows the full amount to be used with
smaller thread sizes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1f895caba0)
The images struct is an uninitialized local variable on the stack. If the
callback returns 0, the struct might not have been updated and so should
be considered uninitialized. Currently the code ignores the return value,
which (depending on stack contents) might end up in reading a non-zero
value from images.image_mask and dereferencing further fields.
Another solution would be to initialize image_mask with 0, but checking
the return value seems more sensible and it is what Gallium is doing.
v2: fix typos in commit message,
fix indentation,
remove unnecessary parentheses and pointer dereference to keep line
length reasonable.
Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit e7ab358e81)
This replaces the current bash generator with a python based generator
using mako. It's quite fast and works with both python 2.7 and python
3.5, and should work with 3.3+ and maybe even 3.2.
It produces an almost identical file except for a minor layout changes,
and the addition of a "generated file, do not edit" warning.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 5a87bc7181)
This fixes a problem with the CE preamble and restoring only stuff in the
preamble when needed.
To illustrate suppose we have two graphics IB's 1 and 2, which are submitted in
that order. Furthermore suppose IB 1 does not use CE ram, but IB 2 does, and we
have a context switch at the start of IB 1, but not between IB 1 and IB 2.
The old code put the CE RAM loads in the preamble of IB 2. As the preamble of
IB 1 does not have the loads and the preamble of IB 2 does not get executed, the
old values are not load into CE RAM.
Fix this by always restoring the entire CE RAM.
v2: - Just load all descriptor set buffers instead of load and store the entire
CE RAM.
- Leave the ce_ram_dirty tracking in place for the non-preamble case.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Note: This commit differs from the one in master - 54f755fa0f
("radeonsi: Reinitialize all descriptors in CE preamble.")
Pulling DF uniforms from pull constant buffer generates messages like:
send(4) g12<1>DF g12<0,1,0>F
sampler ld SIMD4x2 Surface = 1 Sampler = 0 mlen 1 rlen 1
which produces GPU hangs in Cherryview/Braswell:
"For 64-bit Align1 operation or multiplication of dwords in CHV,
source horizontal stride must be aligned to qword."
This seems to be documented in the Cherryview PRM, Volume 7, Page 843:
"When source or destination datatype is 64b or operation is integer
DWord multiply, regioning in Align1 must follow these rules:
1. Source and Destination horizontal stride must be aligned to the
same qword."
We should set the destination type to UD, D, or F so that
the register stride checker doesn't notice. The destination type of
send messages is basically irrelevant anyway.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit a0ed8503b7)
Pulling DF inputs from the URB generates messages like:
send(8) g23<1>DF g1<8,8,1>UD
urb 3 SIMD8 read mlen 1 rlen 2 { align1 1Q };
which makes the simulator angry:
"For 64-bit Align1 operation or multiplication of dwords in CHV,
source horizontal stride must be aligned to qword."
This seems to be documented in the Cherryview PRM, Volume 7, Page 823:
"When source or destination datatype is 64b or operation is integer
DWord multiply, regioning in Align1 must follow these rules:
1. Source and Destination horizontal stride must be aligned to the
same qword."
Setting the source horizontal stride to QWord is insane, as it's the
message header containing 8 URB handles in a single 32-bit DWord.
Instead, we should whack the destination type to UD, D, or F so that
the register stride checker doesn't notice. The destination type of
send messages is basically irrelevant anyway.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit ed3ba651f6)
Cherryview/Broxton annoyingly have a minimum number of VS URB entries
of 34, which is not a multiple of 8. When the VS size is less than 9,
the number of VS entries has to be a multiple of 8.
Notably, BLORP programmed the minimum number of VS URB entries (34), with
a size of 1 (less than 9), which is invalid.
It seemed like this could be a problem in the regular URB code as well,
so I went ahead and updated that to be safe.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 9f37df06da)
Change MESA into Mesa in CL_PLATFORM_VERSION and CL_DEVICE_VERSION. For
both, always append git version suffix from git_sha1.h.
v5: move semicolon to same line as MESA_GIT_SHA1.
v4: drop #ifdef guards.
v3: add missing include.
v2: change CL_DEVICE_VERSION as well.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 4825264f75)
Squashed with commit
clover: Include generated sources in AM_CPPFLAGS
git_sha1.c is generated in $(top_builddir)/src.
Fixes out-of-tree builds since 4825264f75.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96516
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit fafe026dbe)
ISTR having suggested this during review of the recent FP64 changes to
the SIMD lowering pass, but it doesn't look like it was taken into
account in the end. Using the fs_reg::component_size helper instead
of this open-coded variant makes sure that the stride is taken into
account correctly. Fixes at least the following piglit tests with
spilling forced on (since otherwise regs_written would be calculated
incorrectly and the spilling code would be rather confused about how
much data needs to be spilled):
spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-shader
spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-mixed-shader
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit bd9f972651)