According to the Vulkan spec, VkPipelineInputAssemblyStateCreateInfo's
primitiveRestartEnable flag should only apply to indexed draws, however
it was being enabled regardless of the type of draw. This could cause
problems for non-indexed draws with >=65535 vertices if the previous
indexed draw used 16-bit indices.
Fixes corruption of the credits text in Mad Max.
v2: Reset primitive restart state after executing a secondary command
buffer.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Makes more sense when we hash the layout for the pipeline cache.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Context: _mesa_add_parameter is sometimes[0] called with a
NULL name as a mean of an unnamed parameter.
Allowing NULL pointer as a name means that it must be NULL checked
each access. So far it isn't always[1] true.
Parameter name is only used for debug purpose (printf) and
to lookup the index/location of the program by the application.
Conclusion, there is no valid reason to use a NULL pointer instead of
an empty string. So it was decided to use an empty string which avoid all
issues related to NULL pointer
[0]: texture gather offsets glsl opcode and st_init_atifs_prog
[1]: at least shader cache, st_nir_lookup_parameter_index and some printfs
Issue found by piglit 'texturegatheroffsets' tests on Nouveau
v4: new patch based on Nicolai/Timothy/ilia discussion
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
unsigned long is a terrible type for a bitfield - if you need fewer
than 32 bits, it wastes 4 bytes. If you need more, things break on
32-bit builds. Just use unsigned.
Even that's a bit ridiculous as we only have one flag today.
Still, it's at least somewhat better.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The drm_i915_gem_create ioctl structure uses a __u64 for the size,
so we should probably use uint64_t to match. In theory, we could
probably have a BO larger than 4GB, using a 48-bit PPGTT - it just
wouldn't be mappable in the CPU's 32-bit address space.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Theoretically, with a 48-bit address space, we could have buffers
with an alignment of >= 4GB. It's a bit silly, but the exec_object
structs (drm_i915_gem_exec_object2) use a __u64 for this, so we may
as well use the same type as the kernel API.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
struct drm_i915_gem_set_tiling's stride field is a __u32.
intel_mipmap_tree::stride is a uint32_t. Using unsigned long just
doesn't make sense. Switching also lets us drop many pointless
locals that only existed to deal with the type mismatch.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The ioctl structs contain __u64 offset and size fields, so make them
uint64_t rather than unsigned long.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
For some reason we passed tiling by pointer, through several layers,
even though the functions only read the initial value, and never
actually change it. We even had a do-while loop that executed until
the tiling mode matched - except it always did, so it only ran once.
We then had bogus error handling in case it changed the tiling mode
to something nonsensical...which it never did.
Drop all this nonsense.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
These calls look like leftover from fallback texture support first
being added to the st in 8f6d9e12be and then later being added
to core mesa in 00e203fe17.
The piglit test fp-incomplete-tex continues to work with this
change.
Reviewed-by: Brian Paul <brianp@vmware.com>
The individual branches of an if/else/endif construct will be executed
some unknown number of times between 0 and 1 relative to the parent
block. Use some factor in between as weight while approximating the
cost of spill/fill instructions within a conditional if-else branch.
This favors spilling registers used within conditional branches which
are likely to be executed less frequently than registers used at the
top level.
Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on
my SKL GT4e. Should have a comparable effect on other platforms. No
significant regressions.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We check these bitfields when computing the Haswell max GL version.
We need to set them ahead of time, or they won't exist, and all our
checks will fail. That sets the max core profile GL version to 4.2.
This introduces the bizarre situation where asking for a GL context
with version 4.3+ fails, but asking for a GL core profile context
with version <= 4.2 actually promotes you a 4.5 context.
GLX_MESA_query_renderer also reported the bogus 4.2 value.
Now it shows 4.5.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reported-and-tested-by: Rafael Ristovski <rafael.ristovski@gmail.com>
Autodisable seems to cause missed rendering in some cases, but
otherwise TS seems to work properly.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Fixes a performance issue with imported winsys buffers as those are
marked with binding sampler view.
This might require a TS flush on single pipe chips that directly
sample from the rendered buffer, but otherwise seems to work fine.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
The TS surface gets cleared by a tiled RS fill. If the chip has
more than 1 pixel pipe the size of the TS surface needs to be
aligned so that each pipe address matches a tile start, otherwise
the RS will hang.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
The TS is only valid after it has been initialized by a fast
clear, so it should not be taken into account when blitting
resources that haven't been cleared. Also the blit itself
invalidates the destination TS, as it's not updated and will
retain data from the previous rendering after the blit.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
For both consistency and new bindless sampler types.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
For both consistency and new bindless sampler types.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
For both consistency and new bindless sampler types.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The devil is in the shader again, otherwise this is
fairly straightforward.
The CTS contains no pipeline statistics copy to buffer
testcases, so I did a basic smoketest.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
For using them with both occlusion and pipeline statistics queries.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The buffer sizes are specified just a few lines earlier, so don't
repeat ourselves.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Use the new occlusion query copy shader.
We don't use the shader for the waiting as a polling loop ineracts badly
with having caching enabled. I noticed on my GPU (Tonga) that the values
are written out in order, so I just use a WAIT_REG_MEM on the last value.
If it turns out other chips don't do that we may need to look a bit more
into this. Having 8 WAIT_REG_MEM packets per query doesn't sound ideal.
This also restricts the availability word in the pool to timestamp queries
only, as occlusion queries don't use it, and pipeline statistic queries
likely won't either.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Adds a shader for writing occlusion query results to a buffer, as the
CP packet isn't support on SI or secondary buffers, and doesn't handle
the availability bit (or partial results) nor truncation to 32-bit.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>