58533 commits

Author SHA1 Message Date
Paul Berry
3a83b20dcc i965/fs: Stop wasting input attribute space on gl_FragCoord and gl_FrontFacing.
Previously, if a fragment shader accessed gl_FragCoord or
gl_FrontFacing, we would assign them their own slots in the fragment
shader input attribute array, using up space that could be made
available to real varyings.  This was not strictly necessary (since
these values are not true varyings, and are instead computed from
other data available in the FS payload).  But we had to do it anyway
because the SF/SBE setup code assumed that every 1 bit in the
gl_program::InputsRead bitfield corresponded to a genuine varying
variable.

Now that the SF/SBE code consults brw_wm_prog_data and only sets up
the attributes that the fragment shader actually needs, we don't have
to do this anymore.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-09-16 12:53:32 -07:00
Paul Berry
0af1252ae4 i965/sf: Consult brw_wm_prog_data when setting up SF/SBE state.
Previously, the SF/SBE setup code delivered varying inputs to the FS
in the order in which they appear in the gl_program::InputsRead
bitfield, since that's what the FS expects.

When we add support for more than 64 varying components, this will no
longer always be the case, because the Gen6+ SF/SBE stage is only
capable of performing arbitrary reorderings of 16 varying slots.  So,
when there are more than 16 vec4's worth of varying inputs, the FS
will have to adjust the order of its input varyings in order to partially
match the order of outputs from the geometry or vertex shader.

To allow extra flexibility in the ordering of FS varyings, this patch
causes the SF/SBE to deliver varying inputs to the FS in exactly the
order that the FS requests, by consulting brw_wm_prog_data::urb_setup
and brw_wm_prog_data::num_varying_inputs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-09-16 12:53:29 -07:00
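
A minimal sketch of the ordering idea in the commit above, not the actual i965 code. It assumes that `urb_setup` maps each varying slot to the FS input index it should occupy, with -1 meaning "not read by the FS". The slot count `NUM_SLOTS` and the function `build_delivery_order` are hypothetical names introduced only for illustration.

```c
#include <assert.h>

#define NUM_SLOTS 8 /* hypothetical; chosen small for the sketch */

/*
 * Fill order[i] with the varying slot that the FS wants as its i-th input.
 * urb_setup[slot] is the requested input index, or -1 if the slot is unused.
 * Returns 0 on success, -1 if the request is inconsistent.
 */
int
build_delivery_order(const int urb_setup[NUM_SLOTS], int num_varying_inputs,
                     int order[NUM_SLOTS])
{
   assert(num_varying_inputs >= 0 && num_varying_inputs <= NUM_SLOTS);

   for (int i = 0; i < num_varying_inputs; i++)
      order[i] = -1;

   for (int slot = 0; slot < NUM_SLOTS; slot++) {
      int index = urb_setup[slot];

      if (index < 0)
         continue; /* the FS does not read this slot */
      if (index >= num_varying_inputs || order[index] != -1)
         return -1; /* out of range, or two slots claim the same input */
      order[index] = slot;
   }

   /* Every input index below num_varying_inputs must have been claimed. */
   for (int i = 0; i < num_varying_inputs; i++) {
      if (order[i] == -1)
         return -1;
   }
   return 0;
}
```

The point of the sketch is that the consumer of the table walks it in the FS's requested order rather than in bitfield order, so the FS is free to choose any layout it can describe.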
Paul Berry
af84bbd2ca i965/sf: Consolidate common code for setting up gen6-7 attribute overrides.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-09-16 12:53:25 -07:00
Paul Berry
d5b4095356 i965/sf: Use BRW_SF_URB_ENTRY_READ_OFFSET rather than hardcoded values.
We always program the SF unit to start reading the vertex URB entry at
offset 1.  In upcoming patches, we'll be adding FS code that relies on
this.  So consistently use the constant BRW_SF_URB_ENTRY_READ_OFFSET
rather than hardcoding a 1.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-09-16 12:53:21 -07:00
Paul Berry
8c2b9bd1df i965/fs: Consult brw_wm_prog_data::num_varying_inputs when setting up WM state.
Previously, we assumed that the number of varying inputs consumed by
the fragment shader was equal to the number of bits set in
gl_program::InputsRead.  However, we'll soon be making two changes
that will cause that not to be true:

- We'll stop wasting varying input space for gl_FragCoord and
  gl_FrontFacing, which aren't varyings.

- For fragment shaders that have more than 16 varying inputs, we'll
  adjust the layout of the inputs to account for the fact that the
  SF/SBE pipeline stage can't reorder inputs beyond the first 16; if
  there are GS outputs that the FS doesn't use (or vice versa) this
  may cause the number of FS varying inputs to change.

So, instead of trying to guess the number of FS inputs from
gl_program::InputsRead, simply read it from
brw_wm_prog_data::num_varying_inputs, which is guaranteed to be correct
since it's populated by fs_visitor::calculate_urb_setup().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-09-16 12:53:18 -07:00
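
A small sketch of why counting bits in the inputs bitfield can overcount, covering only the first of the two cases listed in the commit above (gl_FragCoord and gl_FrontFacing). The bit positions `SLOT_FRAG_COORD` and `SLOT_FRONT_FACING` and both function names are assumptions made for this illustration; they are not taken from the Mesa headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define SLOT_FRAG_COORD   0
#define SLOT_FRONT_FACING 1

/* Count the set bits in a 64-bit input mask. */
int
count_bits(uint64_t mask)
{
   int n = 0;

   while (mask) {
      mask &= mask - 1; /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* Count only the bits that correspond to genuine varyings. */
int
count_real_varyings(uint64_t inputs_read)
{
   uint64_t non_varyings = (UINT64_C(1) << SLOT_FRAG_COORD) |
                           (UINT64_C(1) << SLOT_FRONT_FACING);

   return count_bits(inputs_read & ~non_varyings);
}
```

With a mask that has bits 0, 1, 2 and 4 set, the naive count is 4 while only 2 of the inputs are real varyings. Reading a stored count avoids having to repeat this kind of filtering in every consumer.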
Paul Berry
8c69eaba1a i965/fs: Change brw_wm_prog_data::urb_read_length to num_varying_inputs.
On gen4-5, the FS stage reads varying inputs from URB entries that
were output by the SF thread, where each register stores the
interpolation setup for two components of a vec4, therefore the FS
urb_read_length is twice the number of FS input varyings.  On gen6+,
varying inputs are directly deposited in the FS payload by the SF/SBE
fixed function logic, so urb_read_length is irrelevant.

However, in future patches, it will be nice to be able to consult
brw_wm_prog_data to determine how many varying inputs the FS expects
(rather than inferring it from gl_program::InputsRead).  So instead of
storing urb_read_length, we simply store num_varying_inputs in
brw_wm_prog_data.  On gen4-5, we multiply this by 2 to recover the URB
read length.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-09-16 12:53:14 -07:00
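
The relationship described in the commit above can be sketched as follows. The function name `fs_urb_read_length` is hypothetical, and returning 0 for gen6+ is a choice made for this sketch to signal "not used"; the commit only says the value is irrelevant there.

```c
#include <assert.h>

/*
 * Recover the FS URB read length from the number of varying inputs.
 * On gen4-5 each register holds the interpolation setup for two components
 * of a vec4, so each varying needs two registers.
 */
int
fs_urb_read_length(int gen, int num_varying_inputs)
{
   assert(gen >= 4);
   assert(num_varying_inputs >= 0);

   if (gen <= 5)
      return num_varying_inputs * 2;

   return 0; /* gen6+: inputs arrive in the FS payload; value unused here */
}
```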
Paul Berry
58f01bd17d i965/fs: Expose "urb_setup" as part of brw_wm_prog_data.
At the moment, for Gen6+, the FS assumes that all varying inputs are
delivered to it in the order in which they appear in the
gl_program::InputsRead bitfield, and the SF/SBE setup code ensures
that they are delivered in this order.

When we add support for more than 64 varying components, this will no
longer always be possible, because the Gen6+ SF/SBE stage is only
capable of performing arbitrary reorderings of 16 varying slots.

To allow extra flexibility in the ordering of FS varyings, this patch
causes the FS to advertise exactly what ordering it expects.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-09-16 12:53:05 -07:00
Chia-I Wu
4a6939edae ilo: make ilo_bind_sampler_states return void
So that it can be hooked up to pipe_context::bind_sampler_states, which
currently lives on another branch.
2013-09-17 00:20:50 +08:00
Kenneth Graunke
120d100627 glsl/tests: Update .gitignore for new unit test.
I rarely run 'git status', so I failed to notice this was missing.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2013-09-16 08:26:09 -07:00
Kenneth Graunke
1da3ff1b1c glsl/tests: Add a test for properties of sampler types.
For each sampler type, this tests that:
- The base type is GLSL_TYPE_SAMPLER.
- The dimensionality is set correctly.
- The returned data type is correct.
- The sampler_array and sampler_shadow flags are set correctly.
- sampler_coordinate_components() returns the correct value.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
2013-09-15 21:48:20 -07:00
Dave Airlie
2f508f244e st/mesa: don't dereference stObj->pt if NULL
It seems a user app can get us into this state.  I trigger the failure
by running fbo-maxsize inside virgl: it fails to create the backing
storage for the texture object, but then segfaults here when it
should fail the completeness test.

Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-09-16 08:33:02 +10:00
Dave Airlie
bbe3d6dc29 nouveau: fix regression since float comparison instructions (v2)
Fix the return type and allow src and dst types for comparison
to be separate.  This at least fixes the two test cases I've written.

v2: drop the u32->s32 change

Acked-by: Christoph Bumiller <christoph.bumiller@speed.at>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-09-16 08:32:42 +10:00
Rico Schüller
6f52295129 vdpau/decode: Check max width and max height.
Reviewed-by: Christian König <christian.koenig@amd.com>
2013-09-15 16:18:08 +02:00
Rob Clark
ffa3244534 freedreno: PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE
When the old contents do not need to be preserved, it is faster to
create a new backing bo rather than stall.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
Rob Clark
d7be322410 freedreno/a3xx: fix VFD_INDEX_MAX overflow
max_index may be 0xffffffff.  The hardware does not need 1 + max_index
(although it does not hurt unless max_index wraps around to zero).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
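
The wraparound mentioned in the commit above is ordinary unsigned 32-bit arithmetic. The two helper names below are hypothetical and exist only to contrast the values; they are not the driver's register-programming code.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour as described: program 1 + max_index. */
uint32_t
index_max_with_increment(uint32_t max_index)
{
   return max_index + 1; /* wraps to 0 when max_index is 0xffffffff */
}

/* New behaviour as described: program max_index unchanged. */
uint32_t
index_max_plain(uint32_t max_index)
{
   return max_index;
}
```

For any other input the two differ by exactly one, which matches the commit's remark that the increment is harmless until the value wraps.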
Rob Clark
c756a3ef70 freedreno: add debug option to disable GMEM bypass
Useful for debugging.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
Rob Clark
cdec879e38 freedreno/a3xx: handle front_ccw
Used by supertuxkart.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
Rob Clark
cda75253f7 freedreno/a3xx: stencil fixes
For mem->gmem we don't sample depth/stencil as its native type.  So we
need to set up the swizzle state for the sampler based on the format used
for sampling.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
Rob Clark
65ae4392ce freedreno/a3xx: alpha-test
Needed by some games, like etuxracer and supertuxkart which use alpha
test rather than blending, to handle texture transparency.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
Rob Clark
dbf041e61f freedreno/a3xx/compiler: implement SUB
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
Rob Clark
1a42d4ee34 freedreno/a3xx: use INDIRECT state load for shaders
With a debug option to force DIRECT (mainly to make it easier for
capturing cmdstream dumps).  Using INDIRECT for large shaders at least
makes a noticeable reduction in CPU load, which helps for CPU limited
games.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
Rob Clark
6e9c386d16 freedreno: avoid stalling at ringbuffer wraparound
Because of how the tiling works, we can't really flush at arbitrary
points very easily.  So wraparound is handled by resetting to top of
ringbuffer.  Previously this would stall until current rendering is
complete.  Instead cycle through multiple ringbuffers to avoid a stall.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
Rob Clark
ca505303a7 freedreno: emit markers to scratch registers
Emit markers by writing to scratch registers in order to "triangulate"
gpu lockup position from post-mortem register dump.  By comparing
register values in post-mortem dump to command-stream, it is possible to
narrow down which DRAW_INDX caused the lockup.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
Rob Clark
1e6d290f21 freedreno: split out WFI helper
Mostly just to give an easy debug/instrumentation point.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
Rob Clark
74052347f3 freedreno: fd_draw helper
Have a single helper that all draws come through, mainly for a
convenient debug and instrumentation point.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
Rob Clark
4712904ddc freedreno/a3xx: fix gpu lockup in some piglit tests
The varying-out config comes from the inputs of the frag shader (so that
we aren't exporting unneeded varyings).  The varyings-count should come
from the frag shader as well, to avoid a discrepancy in configuration
and resulting gpu lockup.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
Rob Clark
64c134cedb freedreno/a3xx/compiler: add LIT
Needed by glxgears and etuxracer ;-)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
Rob Clark
cb9e07aa84 freedreno: multi-slice resources (cubemap, mipmap, etc)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-09-14 13:31:58 -04:00
Paul Berry
71ffac691b glsl/builtins: Fix {texture1D,texture2D,shadow1D}ArrayLod availability.
These functions are defined in EXT_texture_array, which makes no
mention of what shader types they should be allowed in.  At the time
EXT_texture_array was introduced, functions ending in "Lod" were
available only in vertex shaders, however this restriction was lifted
in later spec versions and extensions.

We already have the function lod_exists_in_stage() for figuring out
whether functions ending in "Lod" should be available, so just re-use
that.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-09-13 14:59:06 -07:00
Kenneth Graunke
4b3c0a797f i965: Use brw_stage_state for WM data as well.
This gets the VS, GS, and PS all using the same data structure.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-09-13 14:26:52 -07:00
Kenneth Graunke
e6e5f88848 i965: Increase the size of brw_stage_state::surf_offset.
Since BRW_MAX_WM_SURFACES is greater than BRW_MAX_VEC4_SURFACES, the
existing array isn't large enough to be used by the WM.  Increasing it
will make it possible to share them.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-09-13 14:26:50 -07:00
Kenneth Graunke
3a835b699a i965: Add comments to the new brw_stage_state structure's fields.
These are largely based on the similar fields in brw->wm.

v2: Add a better comment than "Scratch buffer".

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-09-13 14:26:31 -07:00
Ian Romanick
ea373f03e8 mesa: Rename MESA_shader_integer_mix to EXT_shader_integer_mix
Everyone at the Khronos meeting was as surprised that GLSL didn't
already support this as we were.  Several vendors said they'd ship it,
but there didn't seem to be enough interest to put in the effort to make
it ARB or KHR.

v2: Fix a couple typos and rename the spec file to
EXT_shader_integer_mix.spec.  Suggested by Roland.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-09-13 09:56:36 -05:00
Marek Olšák
f4e35f897e radeonsi: fix and enable transform feedback for CIK
The CP_STRMOUT_CNTL register was moved again.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-09-13 01:08:04 +02:00
Marek Olšák
f317ce5c5d radeonsi: fix gl_InstanceID with non-zero start_instance
start_instance doesn't affect gl_InstanceID.

There's no piglit test, but it's kinda obvious the code was wrong.

Reviewed-by: Christian König <christian.koenig@amd.com>
2013-09-13 01:08:03 +02:00
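
A short sketch of the distinction the commit above relies on: the instance ID seen by the shader counts from zero for each draw, while the start instance only offsets which element of an instanced vertex attribute is fetched. Both function names are hypothetical, and the attribute formula follows the OpenGL base-instance rule (instance divided by divisor, plus base instance) rather than anything specific to radeonsi.

```c
#include <assert.h>
#include <stdint.h>

/* Value the shader sees as gl_InstanceID for the i-th instance of a draw. */
uint32_t
shader_instance_id(uint32_t start_instance, uint32_t i)
{
   (void)start_instance; /* deliberately ignored */
   return i;
}

/* Element of an instanced vertex attribute fetched for the i-th instance. */
uint32_t
instanced_attrib_index(uint32_t start_instance, uint32_t i, uint32_t divisor)
{
   assert(divisor > 0); /* divisor 0 means per-vertex, not per-instance */
   return start_instance + i / divisor;
}
```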
Marek Olšák
9c75d2f65b gallium: comment that INSTANCEID doesn't include start_instance
Reviewed-by: Christian König <christian.koenig@amd.com>
2013-09-13 01:08:03 +02:00
Marek Olšák
122a880b78 radeonsi: enable streamout AKA transform feedback for SI
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-09-13 01:07:56 +02:00
Marek Olšák
8d03d923b6 radeonsi: implement streamout shader support
The shader is responsible for writing to streamout buffers using
the TBUFFER_STORE_FORMAT_* instructions.

The locations of some input SGPRs and VGPRs are assigned dynamically, because
the input SGPRs controlling streamout are not declared if they are not needed,
decreasing the indices of all following inputs.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-09-13 01:04:44 +02:00
Marek Olšák
9d16e70b3f radeonsi: implement glDrawTransformFeedback functionality
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-09-13 01:04:44 +02:00
Marek Olšák
6cf29c7dab radeonsi: fix streamout queries
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-09-13 01:04:44 +02:00
Marek Olšák
91ede46222 radeonsi: implement streamout flush properly
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-09-13 01:04:44 +02:00
Marek Olšák
2993ccab38 radeonsi: bind streamout buffers to VGT and the vertex shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-09-13 01:04:44 +02:00
Marek Olšák
e4c5d3ee27 radeonsi: handle rasterizer_discard and set GS_OUT_PRIM_TYPE
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-09-13 01:04:44 +02:00
Marek Olšák
9eb3b9dc2b radeonsi: initialize the first CS like any other
So that the "init" state is always emitted first and not later in draw_vbo.

This fixes streamout where the "init" state, which disables streamout,
was emitted in draw_vbo after streamout was enabled.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-09-13 01:04:44 +02:00
Marek Olšák
2b0a54d6ec radeonsi: integrate shared streamout state
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-09-13 01:04:44 +02:00
Marek Olšák
4ea35023c5 radeon: don't emit streamout state if there are no streamout buffers
This could happen if set_stream_output_targets is called twice
in a row without a draw call in between.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-09-13 01:04:44 +02:00
Marek Olšák
60416cb173 radeon: don't emit VGT_STRMOUT_BUFFER_BASE on SI
The register doesn't exist on SI.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-09-13 01:04:44 +02:00
Kenneth Graunke
2b71b3d466 mesa: Disallow relinking if a program is used by an active XFB object.
Paused transform feedback objects may refer to a program other than the
current program.  If any active objects refer to a program, LinkProgram
must reject the request to relink.

The code to detect this is ugly since _mesa_HashWalk is awkward to use,
but unfortunately we can't use hash_table_foreach since there's no way
to get at the underlying struct hash_table (and even then, we'd need to
handle locking somehow).

Fixes the last subcase of Piglit's new ARB_transform_feedback2
api-errors test.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2013-09-12 10:19:10 -07:00
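
The relink rule in the commit above can be sketched with a plain array standing in for Mesa's hash table of transform feedback objects. All type and function names here (`struct program`, `struct xfb_object`, `program_used_by_active_xfb`, `can_relink`) are hypothetical; the sketch also ignores locking, which the commit notes a real implementation has to deal with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct program {
   int id;
};

struct xfb_object {
   bool active;                   /* stays true while paused */
   const struct program *program; /* program captured when XFB began */
};

/* Return true if any active (possibly paused) object refers to prog. */
bool
program_used_by_active_xfb(const struct xfb_object *objs, size_t count,
                           const struct program *prog)
{
   assert(prog != NULL);

   for (size_t i = 0; i < count; i++) {
      if (objs[i].active && objs[i].program == prog)
         return true;
   }
   return false;
}

/* Relinking is allowed only when no active object uses the program. */
bool
can_relink(const struct xfb_object *objs, size_t count,
           const struct program *prog)
{
   return !program_used_by_active_xfb(objs, count, prog);
}
```

Checking every object, rather than only the currently bound one, is what catches the paused case: a paused object may refer to a program that is no longer current.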
Kenneth Graunke
9cc74c93f8 mesa: Reject ResumeTransformFeedback if the wrong program is bound.
This is actually a pretty important error condition: otherwise, you
could set up transform feedback with one program, and resume it with
a program that generates a completely different set of outputs.

Fixes a subcase of Piglit's new ARB_transform_feedback2 api-errors test.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2013-09-12 10:19:09 -07:00
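
A minimal sketch of the resume check described above, assuming the transform feedback object remembers the program that was active when feedback began. The types, the `enum resume_result` values, and `try_resume` are hypothetical names for illustration; the real code reports a GL error instead of returning an enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct program {
   int id;
};

struct xfb_object {
   bool active;
   bool paused;
   const struct program *program; /* program active at begin time */
};

enum resume_result {
   RESUME_OK,
   RESUME_INVALID_OPERATION
};

/* Resume only a paused object, and only with its original program bound. */
enum resume_result
try_resume(struct xfb_object *obj, const struct program *current)
{
   assert(obj != NULL);

   if (!obj->active || !obj->paused)
      return RESUME_INVALID_OPERATION;

   if (obj->program != current)
      return RESUME_INVALID_OPERATION; /* outputs could differ entirely */

   obj->paused = false;
   return RESUME_OK;
}
```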
Kenneth Graunke
c732f68cf4 mesa: Track the vertex program active at BeginTransformFeedback() time.
The next few patches will use this for API error checking.

All of the drivers appear to CALLOC_STRUCT transform feedback objects,
so this should be properly NULL initialized on creation.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2013-09-12 10:19:07 -07:00