This accomodates a streaming pattern where the discard flag is set when the
application wraps back to the beginning of the buffer.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It makes sense to re-use pipe->invalidate_resource for the purpose of
glInvalidateBufferData, but this function is already implemented in vc4
where it doesn't have the expected behavior. So add a capability flag
to indicate that the driver supports the expected behavior.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Change the check to be in line with what the quoted spec fragment says.
I have sent out a piglit test for this as well.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The internal Mesa format used for a texture might not match the one
requested in the internalFormat when the texture was created, for
example if the driver is internally remapping RGB textures to RGBA.
Otherwise it can cause false positives for completeness if one mipmap
image is created as RGBA and the other as RGB because they would both
have an RGBA Mesa format. If we check the InternalFormat instead then
we are directly checking the API usage which I think better matches
the intention of the check.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93700
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I spotted this while looking for what needs updating in future platforms.
I'm too lazy to go through the git logs, but it was probably missed by Jason
when all the brw refactoring happened.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
SPIR-V makes a distinction between "modulus" and "remainder" for both
floating-point and signed integer variants. The difference is primarily
one of which source they take their sign from. The "remainder" opcode for
integers is equivalent to the C/C++ "%" operation while the "modulus"
opcode is more mathematically correct (at least for an unsigned divisor).
This commit adds corresponding opcodes to NIR.
Intel/AMD's hardware instructions do not handle arguments of 32.
Constant evaluation should not produce a result different from the
hardware instruction.
The s/1ull/1u/ change is intentional: previously we wanted defined
behavior for the "1 << 32" case, but we're making this case undefined so
we can make it 1u and save ourselves a 64-bit operation.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Shifting into the sign bit is undefined, as is shifting by 32.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If a Python codegen script failed, it would write a zero-byte file,
which on subsequent invocations of make would trick it into thinking the
file was appropriately generated.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We would like to be able to combine
result.x = bitfieldExtract(src0.x, src1.x, src2.x);
result.y = bitfieldExtract(src0.y, src1.y, src2.y);
result.z = bitfieldExtract(src0.z, src1.z, src2.z);
result.w = bitfieldExtract(src0.w, src1.w, src2.w);
into a single ivec4 bitfieldInsert operation. This should be possible
with most drivers.
This patch changes the offset and bits parameters from scalar ints
to ivecN or uvecN. The type of all three operands will be the same,
for simplicity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
We would like to be able to combine
result.x = bitfieldInsert(src0.x, src1.x, src2.x, src3.x);
result.y = bitfieldInsert(src0.y, src1.y, src2.y, src3.y);
result.z = bitfieldInsert(src0.z, src1.z, src2.z, src3.z);
result.w = bitfieldInsert(src0.w, src1.w, src2.w, src3.w);
into a single ivec4 bitfieldInsert operation. This should be possible
with most drivers.
This patch changes the offset and bits parameters from scalar ints
to ivecN or uvecN. The type of all four operands will be the same,
for simplicity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
TGSI doesn't use these - it just translates ir_quadop_bitfield_insert
directly. NIR can handle ir_quadop_bitfield_insert as well.
These opcodes were only used for i965, and with Jason's recent patches,
we can do this lowering in NIR (which also gains us SPIR-V handling).
So there's not much point to retaining this GLSL IR lowering code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
NIR's bfm, like Intel/AMD's hardware instructions and GLSL IR's
ir_binop_bfm takes <bits> as src0 and <offset> as src1.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
A shader in Unreal4 uses the result of divide by zero in its color
output, producing NaN and triggering this assertion since NaN is not
equal to itself.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93560
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I added this code right at the end, and got it wrong.
Only used by the WGL_ARB_render_texture code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The RGBX surface formats aren't renderable so we internally remap them
to RGBA when rendering. They are retained as RGBX when used as
textures. However since the previous patch fast clears are disabled
for surfaces that use a different format for rendering than for
texturing. To avoid this situation we can just pretend not to support
RGBX formats at all. This will cause the upper layers of mesa to pick
an RGBA format internally instead. This should be safe because we
always override the alpha component to 1.0 for RGBX in the texture
swizzle anyway. We could also do this for all gens except that it's a
bit more difficult when the hardware doesn't support texture
swizzling. Gens using the blorp have further problems because that
doesn't implement this swizzle override.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes regression on SSO tests that have both non-compute and
compute programs in a program pipeline.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93532
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Commit 8926dc8 added a check where we add packed varyings of output
stage only when we have multiple stages, however duplicates are already
handled by changes in commit 0508d950 and we want to add outputs also in
case where we have only one stage.
Fixes regression caused by 8926dc8 for following test:
ES31-CTS.program_interface_query.separate-programs-vertex
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The trick here is to recognize that in the c + n * dcdx calculations,
not only can the lower FIXED_ORDER bits not change (as the dcdx values
have those all zero) but that this means the sign bit of the calculations
cannot be different as well, that is
sign(c + n*dcdx) == sign((c >> FIXED_ORDER) + n*(dcdx >> FIXED_ORDER)).
That shaves off more than enough bits to never require 64bit masks.
A shifted plane c value could still easily exceed 32 bits, however since we
throw out planes which are trivial accept even before binning (and similarly
don't even get to see tris for which there was a trivial reject plane)) this
is never a problem.
The idea isnt't all that revolutionary, in fact something similar was tried
ages ago (9773722c2b) back when the values were
only 32 bit anyway. I believe now it didn't quite work then because the
adjustment needed for testing trivial reject / partial masks wasn't handled
correctly.
This still keeps the separate 32/64 bit paths for now, as the 32 bit one still
looks minimally simpler (and also because if we'd pass in dcdx/dcdy/eo unscaled
from setup which would be a good reason to ditch the 32 bit path, we'd need to
change the special-purpose rasterization functions for small tris).
This passes piglit triangle-rasterization (-fbo -auto -max_size
-subpixelbits 8) and triangle-rasterization-overdraw (with some hacks
to make it work correctly with large sizes) easily (full piglit as
well of course, but most tests wouldn't use triangles large enough to
be affected, that is tris with a bounding box over 128x128).
The profiler says indeed time spent in rast_tri functions is reduced
substantially, BUT of course only if the tris are large. I measured a 3%
improvement in mesa gloss demo when supersized to twice the screen size...
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Otherwise some planes we get in rasterization have subpixel precision, others
not. Doesn't matter so far, but will soon. (OpenGL actually supports viewports
with subpixel accuracy, so could even do bounding box calcs with that).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is quite a few less instructions, albeit still do the 2 64bit muls
with scalar c code (they'd need way more shuffles, plus fixup for the signed
mul so it totally doesn't seem worth it - x86 can do 32x32->64bit signed
scalar muls natively just fine after all (even on 32bit).
(This still doesn't have a very measurable performance impact in reality,
although profiler seems to say time spent in setup indeed has gone down by
10% or so overall. Maybe good for a 3% or so improvement in openarena.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Discovered by accident, valgrind was complaining (could have possibly caused
us to create redundant geometry shader variants).
v2: convinced by Brian and Jose, just use memset for both gs and vs keys,
just as easy and less error prone.
.length() on an unsized SSBO variable doesn't actually read any data
from the SSBO, and is allowed on variables marked 'writeonly'.
Fixes compute shader compilation in Shadow of Mordor.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>