Commit graph

67792 commits

Author SHA1 Message Date
Topi Pohjolainen
97caf5fa04 meta/blit: Write depth only when asked for
Implementing an idea from Ken, on i965 the shader program for 2D
blits becomes significantly simpler.

Before:

pln(8)   g6<1>F    g4<0,1,0>F    g2<8,8,1>F  { align1 1Q compacted };
pln(8)   g7<1>F    g4.4<0,1,0>F  g2<8,8,1>F  { align1 1Q compacted };
send(8)  g2<1>UW   g6<8,8,1>F
         sampler (1, 0, 0, 1) mlen 2 rlen 4  { align1 1Q };
mov(8)   g123<1>F  g2<8,8,1>F                { align1 1Q compacted };
mov(8)   g124<1>F  g3<8,8,1>F                { align1 1Q compacted };
mov(8)   g125<1>F  g4<8,8,1>F                { align1 1Q compacted };
mov(8)   g126<1>F  g5<8,8,1>F                { align1 1Q compacted };
mov(8)   g127<1>F  g2<8,8,1>F                { align1 1Q compacted };
nop                                                             ;
sendc(8) null        g123<8,8,1>F
    render RT write SIMD8 LastRT Surface = 0 mlen 5 rlen 0 { align1 1Q EOT };

After:

pln(8)   g6<1>F     g4<0,1,0>F    g2<8,8,1>F   { align1 1Q compacted };
pln(8)   g7<1>F     g4.4<0,1,0>F  g2<8,8,1>F   { align1 1Q compacted };
send(8)  g124<1>UW  g6<8,8,1>F
         sampler (1, 0, 0, 1) mlen 2 rlen 4    { align1 1Q };
sendc(8) null        g124<8,8,1>F
   render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 1Q EOT };

v2 (Matt): Removed unintended white-space change

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-30 09:57:51 +02:00
Topi Pohjolainen
4c157d34c0 meta/blit: Add plumbing for shaders without depth
Currently all blit programs are unconditionally compiled with
gl_FragDepth.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-30 09:54:53 +02:00
Jason Ekstrand
604ae33c8b nir/opt_algebraic: Add some constant bcsel reductions
total instructions in shared programs: 5998190 -> 5997603 (-0.01%)
instructions in affected programs:     54276 -> 53689 (-1.08%)
helped:                                293

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:11:13 -08:00
Jason Ekstrand
7f19cd5a56 nir/opt_algebraic: Add some boolean simplifications
total instructions in shared programs: 5998321 -> 5998287 (-0.00%)
instructions in affected programs:     4520 -> 4486 (-0.75%)
helped:                                8

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:11:10 -08:00
Jason Ekstrand
70273c5cd5 nir/algebraic: Support specifying variable as constant or by type
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
Jason Ekstrand
81f77e4f3a nir/algebraic: Fail to compile if a variable is used in a replace but not the search
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
Jason Ekstrand
026b5cc792 nir/search: Allow for matching variables based on types
This allows you to match on an unknown value but only if it is of a given
type.  90% of the uses of this are for matching only booleans, but adding
the generality of arbitrary types is no more complex.

nir_algebraic.py doesn't handle this yet but that's ok because the C
language will ensure that the default type on all variables is void.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
Jason Ekstrand
d8999bcdce nir/search: Add support for matching unknown constants
There are some algebraic transformations that we want to do but only if
certain things are constants.  For instance, we may want to replace
a * (b + c) with (a * b) + (a * c) as long as a and either b or c is constant.
While this generates more instructions, some of it will get constant
folded.

nir_algebraic.py doesn't handle this yet, but that's ok because the C
language will make sure that false is the default for now.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
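The constant-only rewrite described in d8999bcdce can be illustrated with a minimal sketch. This is not the nir_search implementation: the tuple expression format and the helper names `is_const`, `fold`, and `distribute_if_constant` are assumptions made for this illustration only.

```python
# Expressions are nested tuples such as ('fmul', x, y) or ('fadd', x, y).
# Leaves are Python numbers (known constants) or strings (unknown values).
# This representation is hypothetical and only mimics the idea.

def is_const(expr):
    """A leaf counts as constant when it is a plain number."""
    return isinstance(expr, (int, float))


def fold(expr):
    """Constant-fold a binary expression when both operands are constant."""
    op, x, y = expr
    if is_const(x) and is_const(y):
        return x * y if op == 'fmul' else x + y
    return expr


def distribute_if_constant(expr):
    """Rewrite a * (b + c) to (a * b) + (a * c), but only when a and at
    least one of b or c are constant, so that part of the result folds."""
    if not (isinstance(expr, tuple) and expr[0] == 'fmul'):
        return expr
    _, a, rhs = expr
    if not (isinstance(rhs, tuple) and rhs[0] == 'fadd'):
        return expr
    _, b, c = rhs
    if is_const(a) and (is_const(b) or is_const(c)):
        return ('fadd', fold(('fmul', a, b)), fold(('fmul', a, c)))
    return expr
```

With `a = 2.0` and `b = 3.0` constant, `2.0 * (3.0 + x)` becomes `6.0 + (2.0 * x)`: one multiply is folded away. When `a` is unknown the expression is left untouched, since distributing would only add instructions.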
Jason Ekstrand
5ab1489ae6 nir: Add an invalid type
This allows us to indicate a concept of an invalid type.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
Roland Scheidegger
f01e8d3ba5 gallium/docs: fix docs wrt ARL/ARR/FLR
since the address reg holds integer values, ARL/ARR do an implicit float-to-int
conversion, so clarify that. Thus it is also incorrect to say that FLR really
does the same as ARL.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-01-29 22:08:12 +01:00
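The distinction drawn in f01e8d3ba5 can be sketched as follows, assuming ARL floors and ARR rounds before the implicit conversion, while FLR keeps a float result. The function names simply mirror the opcode names and are not part of any Gallium API.

```python
import math


def flr(x):
    """FLR: floor the value; the result remains a float."""
    return float(math.floor(x))


def arl(x):
    """ARL: floor the value, then convert to int for the address register."""
    return int(math.floor(x))


def arr(x):
    """ARR: round the value, then convert to int for the address register.
    Python's round() resolves exact ties to the nearest even integer."""
    return int(round(x))
```

For an input of -1.5, `flr` yields the float -2.0 while `arl` yields the integer -2, which is why the two opcodes cannot be documented as equivalent.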
Eric Anholt
fc884eadf1 nir: Add variants of some of the comparison simplifications.
We end up with these from TGSI-to-NIR because the pass generating the
comparisons doesn't know if the arg is actually a bool input or not.  vc4
results:

total instructions in shared programs: 41801 -> 41508 (-0.70%)
instructions in affected programs:     4253 -> 3960 (-6.89%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-29 11:44:06 -08:00
Eric Anholt
2b9c3bace7 vc4: Fix point size handling when it's the first output.
2015-01-29 11:43:33 -08:00
Eric Anholt
9a3a60cb13 nir: Don't try to to-SSA ALU instructions that are already SSA.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-29 11:43:33 -08:00
Eric Anholt
68d476167c nir: Fix a bit of broken indentation.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-29 11:42:08 -08:00
Eric Anholt
36c604c824 nir: Add a couple of helpers for glsl types.
This will be used by tgsi_to_nir, which needs to get vec4 types for
declaring shader input/output variables.

v2: Add a missing space.

Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-29 11:41:17 -08:00
Emil Velikov
765cfe9a90 docs: fix mesa 10.4.3 release date
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-01-29 14:02:48 +00:00
Kalyan Kondapally
e638841b87 Mesa: Advertise GL_OES_texture_*float* extensions support with i965.
This patch advertises support for GL_OES_texture_*float* extensions
when using i965 drivers.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-01-29 08:22:12 +02:00
Kalyan Kondapally
2c2a92d5b8 Mesa: Add support for HALF_FLOAT_OES type.
This patch adds needed support for accepting HALF_FLOAT_OES as valid type
for TexImage*D and TexSubImage*D when Texture Float extensions are supported.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-01-29 08:21:41 +02:00
Kalyan Kondapally
a63c8a524b Mesa: Add support for GL_OES_texture_*float* extensions.
This patch series adds support for following GLES2 Texture Float extensions:
1)GL_OES_texture_float,
2)GL_OES_texture_half_float,
3)GL_OES_texture_float_linear,
4)GL_OES_texture_half_float_linear.

This patch adds basic infrastructure and needed boolean flags to advertise
support for these extensions; by default the support is disabled. The next
patch in the series introduces support for the HALF_FLOAT_OES token.

v4: take assert away and make valid_filter_for_float conditional (Tapani),
    fix the alphabetical order (Emil)

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-01-29 08:16:47 +02:00
Eric Anholt
dd4d9a4e62 nir: Make vec-to-movs handle src/dest aliasing.
It now emits vector MOVs instead of a series of individual MOVs, which
should be useful to any vector backends.  This pushes the problem of
src/dest aliasing of channels on a scalar chip to the backend, but if
there are any vector operations in your shader then you needed to be
handling this already.

Fixes fs-swap-problem with my scalarizing patches.

v2: Rename to insert_mov(), and add a comment about what it does.
v3: Rewrite the comment.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v3)
2015-01-28 16:33:34 -08:00
Eric Anholt
d70eb38517 gallium: Replace u_simple_list.h with util/simple_list.h
The code was exactly the same, except util/ has c++ guards and a struct
simple_node declaration.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-01-28 16:33:34 -08:00
Eric Anholt
7c99187c6a mesa: Port a variant of 68afbe89c7 to util/
The idea is that after a remove_from_list(), you might want to be able to
do a remove_from_list() on it again or an is_empty_list().  This is
apparently relied on by r300g.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-01-28 16:33:34 -08:00
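The behaviour described in 7c99187c6a can be sketched with a circular doubly linked list in which a removed node is reset to point at itself. This is a minimal illustration, assuming that is how the removed element is left; the `Node` class is hypothetical and the function names only echo the simple_list.h macros.

```python
class Node:
    """A list node; a freshly created node is its own empty list."""

    def __init__(self):
        self.next = self
        self.prev = self


def insert_at_head(head, node):
    """Link node directly after head."""
    node.prev = head
    node.next = head.next
    head.next.prev = node
    head.next = node


def is_empty_list(head):
    """A list is empty when its sentinel points back at itself."""
    return head.next is head


def remove_from_list(node):
    """Unlink node from its neighbours, then make it an empty list so that
    a second removal or an emptiness check on it is harmless."""
    node.next.prev = node.prev
    node.prev.next = node.next
    node.next = node
    node.prev = node
```

Because the node is reset after unlinking, calling `remove_from_list()` twice on the same node only rewrites its own self-links and never touches the list it used to belong to.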
Eric Anholt
8ab6759cef mesa: Move simple_list.h to src/util.
We have two copies of it in the tree; I'm going to delete one.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-01-28 16:33:34 -08:00
Tom Stellard
2397a72129 radeonsi: Enable VGPR spilling for all shader types v5
v2:
  - Only emit write SPI_TMPRING_SIZE once per packet.
  - Use context global scratch buffer.

v3:
  - Patch shaders using WRITE_DATA packet instead of map/unmap.
  - Emit ICACHE_FLUSH, CS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH, and
    VS_PARTIAL_FLUSH when patching shaders.

v4:
  - Code cleanups.
  - Remove unnecessary multiplies.

v5:
  - Patch shaders in system memory and re-upload to vram.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28 21:03:47 +00:00
Tom Stellard
5dcd97f25c radeonsi/compute: Allocate the scratch buffer during state creation
This moves scratch buffer allocation from si_launch_grid() to
si_create_compute_state().  This helps to reduce the overhead of
launching a kernel and also fixes a bug in the code that would cause
the scratch buffer to be too small if a kernel with smaller scratch size
was launched before a kernel with a larger scratch size.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28 21:03:46 +00:00
Tom Stellard
32206c5e56 radeonsi: Add radeon_shader_binary member to struct si_shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28 21:03:46 +00:00
Tom Stellard
37559f8dfc radeonsi/compute: Rename si_compute::program to si_compute::shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28 21:03:46 +00:00
Marek Olšák
5935edd47c radeonsi: Avoid leaking memory when rebuilding shader states
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28 21:03:46 +00:00
Jason Ekstrand
bb26ebac13 nir/opcodes: Use a return type of tfloat for ldexp
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-28 13:21:40 -08:00
Jason Ekstrand
7ac79eea1a Revert "util: Move the alternate fpclassify implementation to util"
This reverts commits d6eb572905 and
58e8468d11.

This is no longer necessary as we aren't using it in NIR anymore.  Also, it
broke the build on some strange systems so let's put it back in querymatrix
where it came from.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88852

Acked-by: Matt Turner <mattst88@gmail.com>
2015-01-28 13:20:26 -08:00
Jason Ekstrand
f0340ff625 Revert "nir/opcodes: Use fpclassify() instead of isnormal() for ldexp"
This reverts commit d7d340fb2f.

We have an isnormal() implementation available, the only problem was that
we had the wrong return type (fixed in a later patch).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806

Acked-by: Matt Turner <mattst88@gmail.com>
2015-01-28 13:19:47 -08:00
Jason Ekstrand
58e8468d11 util: Predicate the fpclassify fallback on !defined(__cplusplus)
The problem is that the fallbacks we have at the moment don't work in C++.
While we could theoretically fix the fallbacks it would also raise the
issue of correctly detecting the fpclassify function.  So, for now, we'll
just disable it until we actually have a C++ user.

Reported-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: EdB <edb+mesa@sigluy.net>
2015-01-28 11:47:56 -08:00
Sven Arvidsson
3b7747c022 drirc: set allow_glsl_extension_directive_midshader for Dead Island.
Signed-off-by: Sven Arvidsson <sa@whiz.se>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87076
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-01-28 14:50:28 +01:00
Jason Ekstrand
d7d340fb2f nir/opcodes: Use fpclassify() instead of isnormal() for ldexp
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-01-28 03:42:41 -08:00
Jason Ekstrand
d6eb572905 util: Move the alternate fpclassify implementation to util
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-01-28 03:42:41 -08:00
Jason Ekstrand
5e8468e6da i965/tex: Don't create read-write textures with non-renderable formats
I haven't actually seen this bug in the wild, but it's possible that
someone could ask to do a S3TC PBO download or something.  This protects us
from accidentally creating a render target with a compressed or otherwise
non-renderable format.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-28 01:28:32 -08:00
Jason Ekstrand
34723c0861 i965/gen8: Include the buffer offset when emitting renderbuffer relocs
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88792
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-28 01:28:31 -08:00
Tapani Pälli
291d7ef84d mesa: improve error messaging for format CSV parser
This patch adds two error messages that direct the user to fix a
misspelled or impossible swizzle field for a format.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-28 10:40:15 +02:00
EdB
6ee5effac1 clover/llvm: Dump the OpenCL C code earlier.
[ Francisco Jerez: As discussed on the mailing list, this is intended
  to produce more useful debug output in cases where the compilation
  terminates unexpectedly. ]

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-01-28 02:27:41 +02:00
EdB
13d23a9a17 clover/llvm: Move CLOVER_DEBUG stuff into anonymous namespace.
[ Francisco Jerez: As we're at it make debug_options[] local to its
  only user and remove temporary. ]

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-01-28 02:27:41 +02:00
Dave Airlie
349df23eb0 r600g: add support for primitive id without geom shader (v2)
GLSL 1.50 specifies a fragment shader may have a primitive id
input without a geometry shader present.

On r600 hw there is a special GS scenario for this, you have
to enable GS_SCENARIO_A and pass the primitive id through
the vertex shader which operates in GS_A mode.

This is a first pass attempt at this, and passes the piglit
tests that test for this.

v1.1: clean up debug print + no need to assign
key value to setup output.
v2: add r600 support

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-01-28 09:51:21 +10:00
Dave Airlie
cc2fc095bf r600g: move selecting the pixel shader earlier.
In order to detect that a pixel shader has a prim id
input when we have no geometry shader we need to reorder
the shader selection so the pixel shader is selected
first, then the vertex shader key can take into account
the primitive id input requirement and lack of geom shader.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-01-28 09:51:02 +10:00
Michel Dänzer
5c83a0d2ce st/clover: Pass target instead of target.begin() to std::string()
Fixes reading beyond allocated memory:

==1936== Invalid read of size 1
==1936==    at 0x4C2C1B4: strlen (vg_replace_strmem.c:412)
==1936==    by 0x9E00C30: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20)
==1936==    by 0x5B44FAE: clover::compile_program_llvm(clover::compat::string const&, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&, pipe_shader_ir, clover::compat::string const&, clover::compat::string const&, clover::compat::string&) (invocation.cpp:698)
==1936==    by 0x5B39A20: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==1936==    by 0x5B20152: clBuildProgram (program.cpp:182)
==1936==    by 0x400F41: main (hello_world.c:109)
==1936==  Address 0x56fee1f is 0 bytes after a block of size 15 alloc'd
==1936==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==1936==    by 0x5B398F0: alloc (compat.hpp:59)
==1936==    by 0x5B398F0: vector<std::basic_string<char> > (compat.hpp:98)
==1936==    by 0x5B398F0: string<std::basic_string<char> > (compat.hpp:327)
==1936==    by 0x5B398F0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==1936==    by 0x5B20152: clBuildProgram (program.cpp:182)
==1936==    by 0x400F41: main (hello_world.c:109)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-01-27 16:55:29 +09:00
Michel Dänzer
ee31c8d706 r600g,radeonsi: Fix calculation of IR target cap string buffer size
Fixes writing beyond the allocated buffer:

==31855== Invalid write of size 1
==31855==    at 0x50AB2A9: vsprintf (iovsprintf.c:43)
==31855==    by 0x508F6F6: sprintf (sprintf.c:32)
==31855==    by 0xB59C7EC: r600_get_compute_param (r600_pipe_common.c:526)
==31855==    by 0x5B2B7DE: get_compute_param<char> (device.cpp:37)
==31855==    by 0x5B2B7DE: clover::device::ir_target() const (device.cpp:201)
==31855==    by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==31855==    by 0x5B20152: clBuildProgram (program.cpp:182)
==31855==    by 0x400F41: main (hello_world.c:109)
==31855==  Address 0x56fed5f is 0 bytes after a block of size 15 alloc'd
==31855==    at 0x4C29180: operator new(unsigned long) (vg_replace_malloc.c:324)
==31855==    by 0x5B2B7C2: allocate (new_allocator.h:104)
==31855==    by 0x5B2B7C2: allocate (alloc_traits.h:357)
==31855==    by 0x5B2B7C2: _M_allocate (stl_vector.h:170)
==31855==    by 0x5B2B7C2: _M_create_storage (stl_vector.h:185)
==31855==    by 0x5B2B7C2: _Vector_base (stl_vector.h:136)
==31855==    by 0x5B2B7C2: vector (stl_vector.h:278)
==31855==    by 0x5B2B7C2: get_compute_param<char> (device.cpp:35)
==31855==    by 0x5B2B7C2: clover::device::ir_target() const (device.cpp:201)
==31855==    by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==31855==    by 0x5B20152: clBuildProgram (program.cpp:182)
==31855==    by 0x400F41: main (hello_world.c:109)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-01-27 16:54:38 +09:00
Connor Abbott
f1a9252def nir: fix a bug with constant folding non-per-component instructions
Before, we were only copying the first N channels, where N is the size
of the SSA destination, which is fine for per-component instructions,
but non-per-component instructions like fdot3 can have more source
components than destination components. Fix this using the helper
function introduced in the last patch.

v2: use new helper name

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26 21:26:36 -05:00
Connor Abbott
816f0515a2 nir: add a helper function for getting the number of source components
Unlike with non-SSA ALU instructions, where if they're per-component
you have to look at the writemask to know which source channels are
being used, SSA ALU instructions always have all the possible channels
enabled so we can just look at the number of components in the SSA
definition for per-component instructions to say how many source
components are being used.

v2: use new name nir_ssa_alu_instr_src_components()

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26 21:26:36 -05:00
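The two commits above (816f0515a2 and f1a9252def) can be illustrated with a small sketch. The opcode table and function names below are assumptions for illustration and are not the NIR data structures; a size of 0 is used here to mean "per-component".

```python
# Hypothetical opcode table: the fixed number of components each source
# reads, or 0 when the opcode is per-component.
INPUT_SIZES = {
    'fadd': (0, 0),
    'fdot3': (3, 3),
}


def src_components(op, src, dest_components):
    """Number of channels read from source `src` of an SSA ALU instruction.
    Per-component opcodes read as many channels as the destination has;
    other opcodes read their fixed input size."""
    size = INPUT_SIZES[op][src]
    return size if size > 0 else dest_components


def fold_fdot3(a, b):
    """Constant-fold a 3-component dot product. The destination has one
    component, but all three source channels must be consumed."""
    n = src_components('fdot3', 0, dest_components=1)
    return sum(a[i] * b[i] for i in range(n))
```

Using the destination size (1) instead of `src_components()` would fold only the first channel of each operand, which is the kind of error the constant-folding fix addresses.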
Sisinty Sasmita Patra
90bd943f2a i965: Implement a tiled fast-path for glReadPixels and glGetTexImage
Added intel_readpixels_tiled_memcpy and intel_gettexsubimage_tiled_memcpy
functions. These are the fast paths for glReadPixels and glGetTexImage.

On Chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts
the amount of time spent in glReadPixels by more than half and reduces the
time of the entire test by 10%.

v2: Jason Ekstrand <jason.ekstrand@intel.com>
   - Refactor to make the functions look more like the old
     intel_tex_subimage_tiled_memcpy
   - Don't export the readpixels_tiled_memcpy function
   - Fix some pointer arithmetic bugs in partial image downloads (using
     ReadPixels with a non-zero x or y offset)
   - Fix a bug when ReadPixels is performed on an FBO wrapping a texture
     miplevel other than zero.

v3: Jason Ekstrand <jason.ekstrand@intel.com>
   - Better documentation for the *_tiled_memcpy functions
   - Add target restrictions for renderbuffers wrapping textures

v4: Jason Ekstrand <jason.ekstrand@intel.com>
   - Only check the return value of brw_bo_map for error and not bo->virtual

v5: Jason Ekstrand <jason.ekstrand@intel.com>
   - Don't unnecessarily repeat a comment

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-01-26 17:29:35 -08:00
Sisinty Sasmita Patra
b52959c602 i965/tiled_memcpy: Add tiled-to-linear paths
This commit adds tiled copy functions for copying from tiled memory to
linear memory.  These are very similar to the existing linear-to-tiled
paths.

v2: Jason Ekstrand <jason.ekstrand@intel.com>
   - New commit message
   - Various whitespace fixes
   - Added ptrdiff_t casts as done in commit 225a09790

v3: Jason Ekstrand <jason.ekstrand@intel.com>
   - Fixed a comment

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-01-26 17:29:34 -08:00
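A tiled-to-linear copy of the kind added in b52959c602 can be sketched as below. This assumes an X-tiled layout of 512-byte rows and 8 rows per tile, stored row-major within each tile, and ignores any address swizzling. The constants and function names are hypothetical and do not come from intel_tiled_memcpy.c.

```python
XTILE_WIDTH = 512    # bytes per tile row (assumed)
XTILE_HEIGHT = 8     # rows per tile (assumed)
XTILE_SIZE = XTILE_WIDTH * XTILE_HEIGHT


def xtiled_offset(x, y, pitch):
    """Byte offset in the tiled buffer of linear byte (x, y).
    `pitch` is the surface width in bytes, a multiple of XTILE_WIDTH."""
    tiles_per_row = pitch // XTILE_WIDTH
    tile_index = (y // XTILE_HEIGHT) * tiles_per_row + (x // XTILE_WIDTH)
    return (tile_index * XTILE_SIZE
            + (y % XTILE_HEIGHT) * XTILE_WIDTH
            + (x % XTILE_WIDTH))


def xtiled_to_linear(tiled, pitch, height):
    """Copy an X-tiled buffer into a linear one, one tile row span at a time."""
    linear = bytearray(pitch * height)
    for y in range(height):
        for x0 in range(0, pitch, XTILE_WIDTH):
            src = xtiled_offset(x0, y, pitch)
            dst = y * pitch + x0
            linear[dst:dst + XTILE_WIDTH] = tiled[src:src + XTILE_WIDTH]
    return bytes(linear)
```

Copying whole 512-byte spans keeps the inner loop a straight block copy, which is the same reason the linear-to-tiled direction works on spans rather than individual pixels.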
Sisinty Sasmita Patra
009be40b7d i965: Refactor tiled memcpy functions and move them into their own file
This commit refactors the tiled_memcpy code in intel_tex_subimage.c and
moves it into its own intel_tiled_memcpy files.  Also, xtile_copy and
ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled
respectively.  The *_faster functions are similarly renamed.

There was also a bit of logic to select between the libc provided
memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle.
This was moved into an intel_get_memcpy function so that rgba8_copy can
live (and be inlined) in intel_tiled_memcpy.c.

v2: Jason Ekstrand <jason.ekstrand@intel.com>
   - Better commit message
   - Fix up the copyright on the intel_tiled_memcpy files
   - Various whitespace fixes
   - Moved a bunch of stuff that did not need to be exposed from
     intel_tiled_memcpy.h to intel_tiled_memcpy.c
   - Added proper documentation for intel_get_memcpy
   - Incorporated the ptrdiff_t tweaks from commit 225a09790

v3: Jason Ekstrand <jason.ekstrand@intel.com>
   - Fixed a comment
   - Move the tile size constants into the .c file

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-01-26 17:29:34 -08:00
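The copy-function selection mentioned in 009be40b7d can be sketched as follows. This is only an illustration of choosing between a plain copy and an RGBA -> BGRA swizzling copy. `get_copy_fn` and its boolean parameter are hypothetical stand-ins and do not reflect the real intel_get_memcpy signature.

```python
def plain_copy(src):
    """Stand-in for the libc memcpy path: bytes are copied unchanged."""
    return bytes(src)


def rgba8_copy(src):
    """Copy 4-byte pixels while swapping the R and B channels."""
    if len(src) % 4 != 0:
        raise ValueError("length must be a multiple of 4 bytes")
    out = bytearray(src)
    out[0::4] = src[2::4]   # B takes the place of R
    out[2::4] = src[0::4]   # R takes the place of B
    return bytes(out)


def get_copy_fn(needs_swizzle):
    """Pick the copy routine once, outside the per-span copy loop."""
    return rgba8_copy if needs_swizzle else plain_copy
```

Selecting the function once up front means the tiled copy loop can call it without re-checking the format for every span.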
Jason Ekstrand
f883aac06e i965/tex_subimage: Use the fast tiled path for rectangle textures
There's no reason why we should be doing this for 2D textures and not
rectangles.  Just a matter of adding another hunk to the condition.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-01-26 17:29:34 -08:00