Commit graph

29185 commits

Author SHA1 Message Date
Charmaine Lee
ec138d6237 svga: allow copy_region if sample counts match
With this patch, we allow a blit to use copy_region if the
source and destination textures have the same sample count.

Fixes failures in the following piglit tests:
 spec@arb_texture_float@multisample-formats 2 gl_arb_texture_float
 spec@arb_texture_rg@multisample-formats 2 gl_arb_texture_rg-float

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Charmaine Lee
a2d49c4b46 svga: set rendered-to flag after updating the texture using PredCopyRegion
This patch sets the rendered-to flag for the subresource after it is
updated using the PredCopyRegion command. This ensures that the GB surface
is synced properly before it is mapped directly.

Tested with MTT piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Charmaine Lee
59f14563a3 svga: add can_use_upload flag
This patch adds a "can_use_upload" flag to the svga_texture structure
so that upload availability does not have to be rechecked at every transfer map.

Tested with Lightsmark2008, Tropics, MTT glretrace, piglit.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Charmaine Lee
3dfb4243bd svga: fix texture upload path condition
As Thomas suggested, we first try to map the GB surface directly.
If that is blocked, we use the texture upload buffer instead.
Also, if a texture has already been "rendered to", that is, the GB surface
is already out of sync, we use the texture upload buffer
to avoid syncing the GB surface.

Tested with Lightsmark2008, Tropics, MTT piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Charmaine Lee
4750c4e543 svga: set rendered_to flag with texture uploaded using TransferFromBuffer command
This patch sets the rendered_to flag for the texture subresource that
is uploaded using the TransferFromBuffer command. This ensures that
the subresource is read back or invalidated before it is
mapped directly, so that the content of the GB surface
is not accidentally overwritten by the device at suspend/resume time.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Neha Bhende
03e1b7cacd svga: Add render_condition boolean flag in struct svga_context
Set the render_condition flag when the driver performs conditional rendering.
Blits using the DXPredCopyRegion command are affected by conditional rendering,
so we should check this flag when performing a blit operation.

Tested with piglit tests.

v2: As per Charmaine's comment, set the render_condition flag if svga_query is valid.
Tested with piglit tests.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-11-03 14:29:22 -06:00
Neha Bhende
2cff6f4512 svga: Allow DXPredCopyRegion for depth_and_stencil formats.
DXPredCopyRegion supports copies between src and dst for depth_and_stencil
formats if src and dst have the same format.

Tested with piglit.

v2: As per Brian's comment, allow DXPredCopyRegion for depth+stencil buffers
if the blit mask is PIPE_MASK_ZS.

Tested with piglit tests, and added a new piglit test,
arb_framebuffer_object-depth-stencil-blit, to cover this particular case.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Neha Bhende
9a9627a791 svga: fix memory leak in svga_clear_texture()
Piglit tests that use the arb_clear_texture extension leak memory.
The cause is that the pipe_surface created in svga_clear_texture() was never
destroyed.

Tested all arb_clear_texture-* piglit tests with valgrind.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-11-03 14:29:22 -06:00
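The leak came from a surface creation without a matching release. A minimal sketch of the balanced pattern follows; the surface type, the helpers and the live-surface counter are hypothetical stand-ins, not Gallium's struct pipe_surface (in Gallium itself, surfaces are released through pipe_surface_reference() or the context's surface_destroy hook).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical surface object; not Gallium's struct pipe_surface. */
struct surface {
    unsigned level;
};

static int live_surfaces;  /* outstanding surfaces, so a leak is observable */

static struct surface *surface_create(unsigned level)
{
    struct surface *s = malloc(sizeof(*s));
    if (s) {
        s->level = level;
        live_surfaces++;
    }
    return s;
}

static void surface_destroy(struct surface *s)
{
    assert(s != NULL);
    live_surfaces--;
    free(s);
}

/* Clear one level through a temporary surface and release it afterwards. */
static bool clear_texture_level(unsigned level)
{
    struct surface *s = surface_create(level);
    if (!s)
        return false;
    /* ... issue the clear through s ... */
    surface_destroy(s);  /* the release that the leaking version omitted */
    return true;
}
```

Running the clear repeatedly and checking that live_surfaces returns to zero is a cheap stand-in for the valgrind check described above.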
Thomas Hellstrom
d787ce7288 svga: Implement the pipe clear_render_target functionality v2
v2: Accounted for the fact that svga_try_clear_render_target also
honors conditional rendering.

Testing done: Exercised all functions in a separate feature branch. Forced
emission of conditional rendering commands when necessary.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Charmaine Lee
76f5f76468 svga: add SVGA_3D_CMD_INVALIDATE_GB_SURFACE support
This command will be used in a subsequent patch to invalidate a surface.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Nicolai Hähnle
27bd9c0f0a pipe-loader: add libamd_common for radeonsi
This fixes a build regression introduced by commit 7115e56c21.
Sorry for the breakage; this second location for link dependencies escaped
my build tests.

Bugzilla: https://patchwork.freedesktop.org/patch/119816/
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-11-03 16:54:55 +01:00
Nicolai Hähnle
908f92ad1f radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation
Fixes GL45-CTS.geometry_shader.adjacency.adjacency_indiced_triangle_strip and
others.

This leaves the case of triangle strips with adjacency and primitive restarts
open. It seems that the only thing that cares about that is a piglit test.
Fixing this efficiently would be really involved, and I don't want to use the
hammer of degrading to software handling of indices because there may well
be software that uses this draw mode (without caring about the precise
rotation of triangles).

v2:
- skip the GS prolog entirely if workaround is not needed
- only check for TES (TES is always non-null when tessellation is used)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:11:24 +01:00
Nicolai Hähnle
ffe4e829b0 radeonsi: remove si_shader_context::is_gs_copy_shader
It has become redundant.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:53 +01:00
Nicolai Hähnle
3b2516721b radeonsi: make the GS copy shader owned by the GS selector
The copy shader only depends on the selector. This change avoids creating
separate code paths for monolithic vs. non-monolithic geometry shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:50 +01:00
Nicolai Hähnle
9c6f7d66dc radeonsi: si_shader_vs only depends on the GS selector
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:48 +01:00
Nicolai Hähnle
693435d846 radeonsi: si_vgt_gs_mode only depends on the selector
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:45 +01:00
Nicolai Hähnle
2e1fb7e7fc radeonsi: make si_generate_gs_copy_shader usable as a standalone function
It really only depends on the shader selector.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:42 +01:00
Nicolai Hähnle
ba5de0d034 radeonsi: unify the si_compile_* functions for prologs and epilogs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:37 +01:00
Nicolai Hähnle
aa9583b0da radeonsi: get rid of no_{prolog,epilog}
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:34 +01:00
Nicolai Hähnle
75503b1904 radeonsi: get rid of si_llvm_emit_fs_epilogue
It is no longer used.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:31 +01:00
Nicolai Hähnle
611510038a radeonsi: get rid of get_interp_param
Replace it with a simple LLVMGetParam call, since ctx->no_prolog is always false.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:29 +01:00
Nicolai Hähnle
3f4439b6ba radeonsi: get rid of select_interp_param
The condition !ctx->no_prolog is now always true.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:26 +01:00
Nicolai Hähnle
858ac2228f radeonsi: use TCS epilog for monolithic shaders
For fixed function TCS, we keep the copying of VS outputs to TES inputs inside
the main function; the call to si_copy_tcs_inputs is moved accordingly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:23 +01:00
Nicolai Hähnle
3f1be54e53 radeonsi: extract si_build_tcs_epilog_function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:20 +01:00
Nicolai Hähnle
be6e31c6a0 radeonsi: use VS epilog for monolithic TES
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:17 +01:00
Nicolai Hähnle
06dcb2d2a9 radeonsi: use VS prolog and epilog for monolithic shaders
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:14 +01:00
Nicolai Hähnle
f9daa2f470 radeonsi: extract si_build_vs_{prolog,epilog}_function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:12 +01:00
Nicolai Hähnle
6f37e992a3 radeonsi: use PS prolog for monolithic shaders
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:09 +01:00
Nicolai Hähnle
15dd332e6a radeonsi: set num_input_vgprs for fragment shaders in create_function
So that the prolog generated for monolithic fragment shaders will have the
right signature.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:05 +01:00
Nicolai Hähnle
fec7ced211 radeonsi: extract si_build_ps_prolog_function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:02 +01:00
Nicolai Hähnle
7115e56c21 radeonsi: use PS epilog for monolithic shaders
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:00 +01:00
Nicolai Hähnle
bf86c56594 radeonsi: extract si_build_ps_epilog_function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:57 +01:00
Nicolai Hähnle
0b9bba7f6c radeonsi: pass the function name to si_llvm_create_func
We will use multiple functions in one module, so they should have
different names.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:54 +01:00
Nicolai Hähnle
96d60dd9ee radeonsi: split is_monolithic into no_prolog and no_epilog
This helps to achieve a gradual transition towards building monolithic shaders
via inlining.

no_prolog and no_epilog will be removed by the end of the series;
separate_prolog remains in use to control the PS input mapping.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:50 +01:00
Nicolai Hähnle
8db9d915cd radeonsi: free data structures when shader compiles fail
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:47 +01:00
Nicolai Hähnle
4c1504af6a radeonsi: move main TGSI translation into its own function
The idea is that the code that adds the prolog and epilog will be pulled out
into the caller.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:44 +01:00
Nicolai Hähnle
23dfb688ba radeonsi: add always-inline pass to si_llvm_finalize_module
Change the pass manager as well, since this is a module-level pass. No
noticeable run-time difference on shader-db.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:42 +01:00
Nicolai Hähnle
4ada1dabc4 radeonsi: fix signature of export intrinsic in VS epilog
The incompatible signature becomes an issue when the VS epilog gets merged
with the main vertex shader at the IR level.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:33 +01:00
Nicolai Hähnle
899b2f24a4 radeonsi: link against amd_common
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:30 +01:00
Samuel Pitoiset
548b5fee6b nv50,nvc0: stop limiting the number of active queries to 1
This limitation was initially here because AMD_performance_monitor
doesn't allow exposing the real number of hardware counters. But
this is really annoying when profiling with qapitrace.

Anyway, performance counters are mostly for developers, and
failures are expected if you try to monitor more queries than
are supported.

This breaks amd_performance_monitor_measure, but that is expected.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-11-02 23:42:09 +01:00
Samuel Pitoiset
b6137f226c nvc0: add new warp_nonpred_execution_efficiency metric on SM35
Event not_predicated_off_thread_inst_executed is SM35+.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-02 23:35:49 +01:00
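The warp efficiency metrics (this one, and warp_execution_efficiency in 524703da58) are ratios of a per-thread instruction count to a per-warp instruction count scaled by the warp size. A minimal sketch of that arithmetic follows, assuming NVIDIA's CUPTI-style definitions and 32-thread warps; the exact counters nouveau combines are not spelled out in these commits, so the parameter names are illustrative. For the nonpred variant, the per-thread count is assumed to come from the not_predicated_off_thread_inst_executed event named above.

```c
#include <assert.h>
#include <stdint.h>

#define WARP_SIZE 32  /* assumed threads per warp */

/* Average share of active lanes per issued warp instruction, in percent.
 * thread_inst: per-thread instruction count (e.g. thread_inst_executed, or
 *              not_predicated_off_thread_inst_executed for the nonpred metric)
 * warp_inst:   per-warp instruction count (e.g. inst_executed) */
static double warp_efficiency_pct(uint64_t thread_inst, uint64_t warp_inst)
{
    if (warp_inst == 0)
        return 0.0;  /* nothing executed: report 0 instead of dividing by zero */
    assert(thread_inst <= warp_inst * WARP_SIZE);
    return 100.0 * (double)thread_inst / ((double)warp_inst * WARP_SIZE);
}
```

For example, 100 warp instructions that together retired 1600 thread instructions correspond to half of each warp doing work on average, i.e. 50%.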
Samuel Pitoiset
98a382d013 nvc0: add missing metric-issue_slot on SM35
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-02 23:35:46 +01:00
Samuel Pitoiset
c32d7175aa nvc0: do not expose metric-inst_issued twice on SM35
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-02 23:35:44 +01:00
Samuel Pitoiset
524703da58 nvc0: add new warp_execution_efficiency metric on SM30+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-02 23:35:42 +01:00
Samuel Pitoiset
51fe48660a nvc0: respect 80-chars for perf metrics descriptions
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-02 23:35:39 +01:00
Samuel Pitoiset
b58d85bac8 nvc0: sort performance metrics alphabetically
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-02 23:35:28 +01:00
Samuel Pitoiset
1d75d681d3 nv50: add missing draw_calls_indexed driver stat
Spotted when glancing at the VBO push code.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-02 21:11:57 +01:00
Nicolai Hähnle
5aef14932a radeonsi: fix BFE/BFI lowering for GLSL semantics
Fixes spec/arb_gpu_shader5/execution/built-in-functions/*-bitfield{Extract,Insert}

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-02 12:30:11 +01:00
Nicolai Hähnle
6526977306 tgsi: align the definition of BFI & [UI]BFE with GLSL
As previously written, these opcodes use the SM5 semantics, which are
incompatible with GLSL when bits == 0, offset == 32.

At some point we may want to add BFI_SM5 etc. opcodes, but all users
currently either want (and expect!) the GLSL semantics or don't care.

Bitfield inserts are generated by the GLSL lower_instructions and
lower_packing_builtins passes with constant bits and offset arguments,
so any workaround code that drivers may have to emit to follow GLSL
semantics should be optimized away easily for those uses.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-02 12:30:07 +01:00
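For reference, the GLSL semantics that BFI and [UI]BFE now follow can be written out in plain C. This is a minimal sketch based on the GLSL bitfieldExtract()/bitfieldInsert() definitions, not Mesa's TGSI or radeonsi code; the function names are hypothetical. GLSL leaves the result undefined when offset + bits exceeds 32, which the sketch turns into an assertion, and bits == 0 is handled explicitly so that no C shift by 32 (undefined behavior in C) can occur.

```c
#include <assert.h>
#include <stdint.h>

/* GLSL bitfieldExtract() on uint: the field is zero-extended. */
static uint32_t glsl_ubfe(uint32_t value, unsigned offset, unsigned bits)
{
    assert(offset <= 32 && bits <= 32 && offset + bits <= 32);
    if (bits == 0)
        return 0;  /* empty field, including offset == 32 */
    /* Here bits >= 1 and offset <= 31, so both shifts stay below 32. */
    return (value >> offset) & (0xffffffffu >> (32 - bits));
}

/* GLSL bitfieldExtract() on int: the field's top bit is sign-extended. */
static int32_t glsl_ibfe(int32_t value, unsigned offset, unsigned bits)
{
    uint32_t field = glsl_ubfe((uint32_t)value, offset, bits);
    if (bits == 0)
        return 0;
    uint32_t sign = 1u << (bits - 1);
    return (int32_t)((field ^ sign) - sign);  /* two's-complement sign extension */
}

/* GLSL bitfieldInsert(): replace bits [offset, offset + bits) of base. */
static uint32_t glsl_bfi(uint32_t base, uint32_t insert,
                         unsigned offset, unsigned bits)
{
    assert(offset <= 32 && bits <= 32 && offset + bits <= 32);
    if (bits == 0)
        return base;  /* nothing to insert */
    uint32_t mask = (0xffffffffu >> (32 - bits)) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}
```

When bits and offset are compile-time constants, as in the lower_instructions and lower_packing_builtins uses mentioned above, the bits == 0 branches fold away, which is why the extra GLSL edge-case handling should cost nothing there.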
Marek Olšák
7786f8c635 gallium/radeon: add enum radeon_micro_mode
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00