This commits implements code generation of the geometry shaders in
the SOA paths. All the code is there but bugs are likely present.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Allows executing gs on up to 4 primitives at a time. Will also be
required by the llvm code because there we definitely don't want
to flush with just a single primitive.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
To be able to add llvm paths later on we need to have some common
interface for them.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The member was never used and we'll need to handle it differently
because gs will also need samplers/textures setup.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
A few tests were missing this crucial property.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Same as on r600, trace cs execution by writting cs offset after each
states, this allow to pin point lockup inside command stream and
narrow down the scope of lockup investigation.
v2: Use WRITE_DATA packet instead of WRITE_MEM
v3: Remove useless nop packet
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Fixes various test fallout from 90b5a2425a on Pineview, which claims to
support ARB_internalformat_query but doesn't actually provide the
driverfunc.
That driver is still broken [GetInternalformativ will still segfault!]
but it was silly to be going through the sample count logic in the
nonmultisampling case at all.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The include isn't needed and the file has moved with LLVM master.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Even when we have arrays it is possible for simplify_cmp
to work on temps, just not on arrays.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=62696
Signed-off-by: Christian König <christian.koenig@amd.com>
The pipe query interface is reused. The list of available queries can be
obtained using pipe_screen::get_driver_query_info.
Reviewed-by: Brian Paul <brianp@vmware.com>
Virtual address is used for PIPE_QUERY_SO* queries in
r600_emit_query_begin, but not in r600_emit_query_end.
This will trigger a GPU fault when one of those queries is
made and virtual address is enabled.
Note: this is a candidate for the 9.1 branch
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Optimize out parts of the render target that are scissored out by taking
into account maximal scissor bounds in fd_gmem_render_tiles().
This is a big win on things like gnome-shell which frequently do partial
screen updates.
Signed-off-by: Rob Clark <robdclark@gmail.com>
target-specific variables are undefined when used as pre-requisites.
instead, use secondary-expansion.
I noticed this when building the patch:
i965: Add a driconf option to disable flush throttling
Signed-off-by: Adrian Marius Negreanu <adrian.m.negreanu@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Since half of ir_validate uses asserts() (the other using printf() then
abort()), there's not much use to calling it in a release build. Cuts
6.3% of the startup time of TF2.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes assign instead of compare defects reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This patch changes the arrays in brw_vue_map (which only ever contain
values from -1 to 58) from ints to signed chars. This reduces the
size of the struct from 488 bytes to 136 bytes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: fix STATIC_ASSERT to use 127 instead of 128.
Reviewed-by: Eric Anholt <eric@anholt.net>
With the introduction of geometry shaders, fragment inputs will no
longer come exclusively from the vertex shader; sometimes they come
from the geometry shader. So the name "vp_outputs_written" will
become a misnomer. This patch renames vp_outputs_written to
input_slots_valid, to reflect the true meaning of the bitfield from
the fragment shader's point of view: it indicates which of the
possible input slots contain valid data that was written by the
previous shader stage.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies post-GS pipeline stages (transform feedback, clip,
sf, fs) to refer to the VUE map through brw->vue_map_geom_out rather
than brw->vs.prog_data->vue_map. This ensures that when geometry
shader support is added, these pipeline stages will consult the
geometry shader output VUE map when appropriate, rather than the
vertex shader output VUE map.
v2: Fixed some stale "CACHE_NEW_VS_PROG" comments.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, the GPU pipeline has one active VUE map in effect at any
given time--the one representing the layout of vertex data coming from
the vertex shader. However, when geometry shaders are added, they
will have their own independent VUE map. Later pipeline stages (clip,
sf, fs) will need to consult the geometry shader VUE map if a geometry
shader is in use, and the vertex shader VUE map otherwise.
This patch adds a new field to brw_context, vue_map_geom_out, which
contains the VUE map that should be used by later pipeline stages. It
also adds a new state flag, BRW_NEW_VUE_MAP_GEOM_OUT, which is
signalled whenever the contents of the VUE map changes.
Since we don't support geometry shaders yet, vue_map_geom_out is
currently set only by the brw_vs_prog state atom.
v2: Don't set vue_map_geom_out in do_vs_prog--that's redundant and
possibly problematic for precompiles. Only set it in
brw_upload_vs_prog. Also, make a copy instead of using a
pointer--this makes it possible to detect when the VUE map hasn't
changed, so we can avoid redundant state uploads.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Future patches will allow for there to be separate VUE maps when both
a geometry shader and a vertex shader are in use. When this happens,
we will want to have correspondingly separate outputs_written
bitfields. Moving outputs_written into the VUE map will make this
easy.
For consistency with the terminology used in the VUE map, the bitfield
is renamed to "slots_valid" in the process.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen7 adds mask bits to the message header for a URB write which allow
the write to apply only to certain channels. We don't use this
functionality, so to ensure that the entire write always occurs, we
emit an OR instruction to set the mask bits.
With the advent of geometry shaders, URB writes won't just happen at
the end of a thread; they will happen in mid-thread too. Thus, we can
no longer rely on channel 0 being enabled, so we need to emit the OR
instruction in WE_all mode to ensure that it is executed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new name clarifies that it represents *one more* than the maximum
possible brw_varying_slot value.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch removes the terminology "vert_result" from the i965 driver,
replacing it with "varying". The old terminology, "vert_result", was
confusing because (a) it referred to the enum gl_vert_result, which no
longer exists (it was replaced with gl_varying_slot), and (b) it
implied a vertex output, but with the advent of geometry shaders, it
could be either a vertex or a geometry output, depending what shaders
are in use. The generic term "varying" is less confusing.
No functional change.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Whitespace fixes.
Bump MAX_DEPTH_TEXTURE_SAMPLES to match what GetInternalformativ is
claiming. Since that limit is what is actually enforced now, this
doesn't actually change anything except the queried value.
There's still no piglits verifying that multisample depth textures work,
but this works in the Unigine demos.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Extends _mesa_check_sample_count() to properly support the
TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY targets, which
have subtly different limits than renderbuffers.
This resolves the remaining TODO in the implementation of
TexImage*DMultisample.
V2: - Don't introduce spurious block.
- Do this in multisample.c instead.
- Fix typo in error message.
- Inline spec quotes
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>