fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-07 02:48:06 +02:00

Author	SHA1	Message	Date
Kenneth Graunke	1eef0b73aa	i965: Rewrite FS input handling to use the new NIR intrinsics. This eliminates the need to walk the list of input variables, recurse into their types (via logic largely redundant with nir_lower_io), and interpolate all possible inputs up front. The backend no longer has to care about variables at all, which eliminates complications from trying to pack multiple variables into the same location. Instead, each intrinsic specifies exactly what's needed. This should unblock Timothy's work on GL_ARB_enhanced_layouts. Each load_interpolated_input intrinsic corresponds to PLN instructions, while load_barycentric_at_* intrinsics correspond to pixel interpolator messages. The pixel/centroid/sample barycentric intrinsics simply refer to payload fields (delta_xy[]), and don't actually generate any code. Because we use a single intrinsic for both centroid-qualified variables and interpolateAtCentroid(), they become indistinguishable. We stop sending pixel interpolator messages for those, and instead use the payload provided data, which should be considerably faster. On Broadwell: total instructions in shared programs: 9067751 -> 9067570 (-0.00%) instructions in affected programs: 145902 -> 145721 (-0.12%) helped: 422 HURT: 209 total spills in shared programs: 2849 -> 2899 (1.76%) spills in affected programs: 760 -> 810 (6.58%) helped: 0 HURT: 10 total fills in shared programs: 3910 -> 3950 (1.02%) fills in affected programs: 617 -> 657 (6.48%) helped: 0 HURT: 10 LOST: 3 GAINED: 3 The differences mostly appear to be slight changes in MOVs. v2: Use nir_shader_compiler_options::use_interpolated_input_intrinsics flag rather than passing it directly to nir_lower_io. Use the unreachable() macro rather than assert in one place. (Review feedback from Chris Forbes.) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisforbes@google.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-20 11:01:16 -07:00
Kenneth Graunke	a2dc11a781	i965: Move load_interpolated_input/barycentric_* intrinsics to the top. Currently, i965 interpolates all FS inputs at the top of the program. This has advantages and disadvantages, but I'd like to keep that policy while reworking this code. We can consider changing it independently. The next patch will make the compiler generate PLN instructions "on the fly", when it encounters an input load intrinsic, rather than doing it for all inputs at the start of the program. To emulate this behavior, we introduce an ugly pass to move all NIR load_interpolated_input and payload-based (not interpolator message) load_barycentric_* intrinsics to the shader's start block. This helps avoid regressions in shader-db for cases such as: if (...) { ...load some input... } else { ...load that same input... } which CSE can't handle, because there's no dominance relationship between the two loads. Because the start block dominates all others, we can CSE all inputs and emit PLNs exactly once, as we did before. Ideally, global value numbering would eliminate these redundant loads, while not forcing them all the way to the start block. When that lands, we should consider dropping this hacky pass. Again, this pass currently does nothing, as i965 doesn't generate these intrinsics yet. But it will shortly, and I figured I'd separate this code as it's relatively self-contained. v2: Dramatically simplify pass - instead of creating new instructions, just remove/re-insert their list nodes (suggested by Jason Ekstrand). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisforbes@google.com> [v1] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-20 11:01:11 -07:00
Kenneth Graunke	048a56c1fc	i965: Add a pass to demote sample interpolation intrinsics. When working with a non-multisampled render target, asking for "sample" interpolation locations doesn't make sense. We demote them to centroid. In a couple of patches, brw_compute_barycentric_modes will begin looking at these intrinsics to determine the barycentric modes. fs_visitor also will use them to code-generate pixel interpolator messages or payload references. Handling the "but what if it's not MSAA?" logic ahead of time in a NIR pass simplifies things and prevents duplicated logic. This patch doesn't actually do anything useful yet as we don't generate these intrinsics. I decided to keep it separate as it's self-contained, in the hopes of shrinking the "convert everything" patch for reviewers. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisforbes@google.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-20 11:01:08 -07:00
Kenneth Graunke	707ca00fce	nir: Add nir_load_interpolated_input lowering code. Now nir_lower_io can optionally produce load_interpolated_input and load_barycentric_* intrinsics for fragment shader inputs. flat inputs continue using regular load_input. v2: Use a nir_shader_compiler_options flag rather than ad-hoc boolean passing (in response to review feedback from Chris Forbes). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisforbes@google.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-20 11:01:00 -07:00
Kenneth Graunke	2496462479	nir: Add new intrinsics for fragment shader input interpolation. Backends can normally handle shader inputs solely by looking at load_input intrinsics, and ignore the nir_variables in nir->inputs. One exception is fragment shader inputs. load_input doesn't capture the necessary interpolation information - flat, smooth, noperspective mode, and centroid, sample, or pixel for the location. This means that backends have to interpolate based on the nir_variables, then associate those with the load_input intrinsics (say, by storing a map of which variables are at which locations). With GL_ARB_enhanced_layouts, we're going to have multiple varyings packed into a single vec4 location. The intrinsics make this easy: simply load N components from location <loc, component>. However, working with variables and correlating the two is very awkward; we'd much rather have intrinsics capture all the necessary information. Fragment shader input interpolation typically works by producing a set of barycentric coordinates, then using those to do a linear interpolation between the values at the triangle's corners. We represent this by introducing five new load_barycentric_* intrinsics: - load_barycentric_pixel (ordinary variable) - load_barycentric_centroid (centroid qualified variable) - load_barycentric_sample (sample qualified variable) - load_barycentric_at_sample (ARB_gpu_shader5's interpolateAtSample()) - load_barycentric_at_offset (ARB_gpu_shader5's interpolateAtOffset()) Each of these take the interpolation mode (smooth or noperspective only) as a const_index, and produce a vec2. The last two also take a sample or offset source. We then introduce a new load_interpolated_input intrinsic, which is like a normal load_input intrinsic, but with an additional barycentric coordinate source. The intention is that flat inputs will still use regular load_input intrinsics. This makes them distinguishable from normal inputs that need fancy interpolation, while also providing all the necessary data. This nicely unifies regular inputs and interpolateAt functions. Qualifiers and variables become irrelevant; there are just load_barycentric intrinsics that determine the interpolation. v2: Document the interp_mode const_index value, define a new BARYCENTRIC() helper rather than using SYSTEM_VALUE() for some of them (requested by Jason Ekstrand). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisforbes@google.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-20 11:00:45 -07:00
Kenneth Graunke	e614062e54	anv: Properly call gen75_emit_state_base_address on Haswell. This should fix MOCS values. Caught by Coverity. CID: 1364155 Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-20 10:59:44 -07:00
Kenneth Graunke	87660579f5	genxml: Rename "API Rendering Disable" to "Rendering Disable". Gen7/7.5 call it "Rendering Disable" while Gen8/9 prefix it with "API". Pick one for consistency, and so we can share code between generations. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-20 10:59:44 -07:00
Kenneth Graunke	bfd9942cdc	anv: Unify 3DSTATE_CLIP code across generations. The bulk of this is the same. There are just a couple fields that only exist on one generation or another, and we can easily handle those with an #ifdef. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-20 10:59:44 -07:00
Kenneth Graunke	44502afd82	anv: Enable early culling on Gen7. We set the cull mode, but forgot the enable bit. Gen8 uses this. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-20 10:59:44 -07:00
Kenneth Graunke	0d77f08042	anv: Fix near plane clipping on Gen7/7.5. The Gen7/7.5 clip code used APIMODE_OGL, while the Gen8+ clip code used APIMODE_D3D. The meaning hasn't changed, so one of these must be wrong. It appears that the hardware documentation is completely wrong. It claims that the "API Mode" bit means: 0h APIMODE_OGL NEAR_VP boundary == 0.0 (NDC) 1h APIMODE_D3D NEAR_VP boundary == -1.0 (NDC) However, DirectX typically uses 0.0 for the near plane, while unextended OpenGL uses -1.0. i965's gen6_clip_state.c uses APIMODE_D3D for the GL_ZERO_TO_ONE case, so I believe the meanings are backwards from what the documentation says. Section 23.2 ("Primitive Clipping") of the Vulkan 1.0.21 specification contains the following equations: -w_c <= x_c <= w_c -w_c <= y_c <= w_c 0 <= z_c <= w_c This means that Vulkan follows D3D semantics. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-20 10:59:44 -07:00
Kenneth Graunke	6b67270262	genxml: Add APIMODE_D3D missing enum values and improve consistency. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-20 10:59:44 -07:00
Kenneth Graunke	c31cf532af	genxml: Add CLIPMODE_* prefix to 3DSTATE_CLIP's "Clip Mode" enum values. Gen6-7.5 use CLIPMODE_REJECT_ALL, while Gen8+ just used REJECT_ALL. Being consistent will let me unify code, and I prefer having the prefix. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-20 10:59:44 -07:00
Tim Rowley	0f13a8f770	swr: [rasterizer core] introduce simd16intrin.h Refactoring to leave existing simd_* intrinsics in "simdintrin.h" unchanged, adding corresponding simd16_* intrinsics in "simd16intrin.h" on the side, with emulation, that we can use piecemeal, rather than the all-or-nothing approach to bring up avx512. Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-07-20 10:22:15 -05:00
Tim Rowley	5fe361e2c0	swr: [rasterizer core] fix for possible int32 overflow condition Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-07-20 10:22:15 -05:00
Tim Rowley	a123d12e14	swr: [rasterizer core] rename _MAX enum values to _COUNT Makes these names semantically correct. Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-07-20 10:22:15 -05:00
Tim Rowley	e41d9dd576	swr: [rasterizer core] centroid correction Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-07-20 10:22:15 -05:00
Tim Rowley	e0529a4668	swr: [rasterizer core] support range of values in TemplateArgUnroller Fixes Linux warnings. Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-07-20 10:22:15 -05:00
Tim Rowley	0363015964	swr: [rasterizer core] ensure adjacent topologies use the cut-aware PA Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-07-20 10:22:15 -05:00
Tim Rowley	efdaf5fa3e	swr: [rasterizer] attribute swizzling and linkage Add support for enhanced attribute swizzling. Currently supports constant source overrides to handle PrimitiveID support. No support yet for input select swizzling or wrap shortest. Removes obsoleted linkageMask and associated code. Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-07-20 10:22:15 -05:00
Tim Rowley	a5846fb75a	swr: [rasterizer common] icc declspec definitions Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-07-20 10:22:15 -05:00
Tim Rowley	0d13f2e801	swr: [rasterizer jitter] rework vertex/instance ID storage in fetch Moved the setting into the existing component control code. Fixes bad interaction between attribute/component setting for vertex/instance ID and component packing. Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-07-20 10:22:14 -05:00
Tim Rowley	1d09b3971a	swr: [rasterizer core] avx512 simd utility work Enabling KNOB_SIMD_WIDTH = 16 for AVX512 pre-work and low level simd utils Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-07-20 10:22:14 -05:00
Tim Rowley	98641f4e73	swr: [rasterizer core] viewport rounding for disabled scissor Adjust viewport rounding when scissor rect is disabled during macro tile scissor setup. Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-07-20 10:22:14 -05:00
Jason Ekstrand	96dfed49e4	i965: Stop muging cube array lengths by 6 From the Sky Lake PRM: "For SURFTYPE_CUBE: For Sampling Engine Surfaces and Typed Data Port Surfaces, the range of this field is [0,340], indicating the number of cube array elements (equal to the number of underlying 2D array elements divided by 6). For other surfaces, this field must be zero." In other words, the depth field for cube maps is in number of cubes not number of 2-D slices so we need to divide by 6. ISL will do this correctly for us assuming that we provide it with the correct array bounds which it expects to be in 2-D slices. It appears as if we've been doing this wrong ever since we first added cube map arrays for Sandy Bridge and the change to ISL made things slightly worse. While we're at it, we now need to remoe the shader hacks we've always done since they were only needed because we were setting the depth field six times too large. v2: Fix the vec4 backend as well (not sure how I missed this). Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Chris Forbes <chrisforbes@google.com>	2016-07-20 08:19:26 -07:00
Jason Ekstrand	e19b7f7f1b	i965/miptree: Set logical_depth0 == 6 for cube maps This matches what we do for cube maps where logical_depth0 is in number of face-layers rather than number of cubes. This does mean that we will temporarily be setting the surface bounds too loose for cube map textures but we are already setting them too loose for cube arrays and we will be fixing that in the next commit anyway. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Chris Forbes <chrisforbes@google.com> Cc: "12.0 11.2 11.1" <mesa-stable@lists.freedesktop.org>	2016-07-20 08:19:22 -07:00
Jason Ekstrand	d4d505d0b0	i965/miptree: Enforce that height == 1 for 1-D array textures The GL API and mesa internals do this differently than we do. In GL, there is no depth parameter for 1-D arrays and height is used. In the i965 miptree code we do the sane thing and make height == 1 and use depth for number of slices. This makes for a mismatch every time we create a 1-D array texture from GL. Instead of actually solving this problem, we just said "1-D is hard, let's make sure it works no matter which way we pass the parameters" and called it a day. This commit fixes the one GL -> i965 transition point where we weren't already handling 1-D array textures to do the right thing and then replaces the magic fixup code with an assert that you're doing the right thing. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Chris Forbes <chrisforbes@google.com> Cc: "12.0 11.2 11.1" <mesa-stable@lists.freedesktop.org>	2016-07-20 08:18:19 -07:00
Stefan Dirsch	27ef7bfd6c	Avoid overflow in 'last' variable of FindGLXFunction(...) This 'last' variable used in FindGLXFunction(...) may become negative, but has been defined as unsigned int resulting in an overflow, finally resulting in a segfault when accessing _glXDispatchTableStrings[...]. Fixed this by definining it as signed int. 'first' variable also needs to be defined as signed int. Otherwise condition for while loop fails due to C implicitly converting signed to unsigned values before comparison. Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Stefan Dirsch <sndirsch@suse.de> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-07-20 16:05:17 +01:00
Tomasz Figa	9e1248d075	egl/android: Stop leaking DRI images Current implementation of the DRI image loader does not free the images created in get_back_bo() and so leaks memory. Moreover, it creates a new image every time the DRI driver queries for buffers, even if the backing native buffer has not changed. leaking memory again. This patch adds missing call to destroyImage() in droid_enqueue_buffer() and a check if image is already created to get_back_bo() to fix the above. Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Tomasz Figa <tfiga@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-07-20 15:48:54 +01:00
Tomasz Figa	565fa6b748	egl/android: Add some useful error messages It is much easier to debug issues when the application gives some meaningful error messages. This patch adds few to the EGL Android platform backend. Signed-off-by: Tomasz Figa <tfiga@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-07-20 15:48:03 +01:00
Tomasz Figa	94282b6dd0	egl/android: Check return value of dri2_get_dri_config() It might return NULL if specific config variant is unsupported. Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Tomasz Figa <tfiga@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-07-20 15:47:23 +01:00
Emil Velikov	4f48674d51	i965: store reference to the context within struct brw_fence (v2) As the spec allows for {server,client}_wait_sync to be called without currently bound context, while our implementation requires context pointer. v2: Add a mutex and acquire it for the duration of brw_fence_client_wait() and brw_fence_is_completed() as suggested by Chad. Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Tomasz Figa <tfiga@chromium.org>	2016-07-20 15:45:20 +01:00
Nicolas Boichat	9bebef4034	egl/dri2: dri2_make_current: Set EGL error if bindContext fails Without this, if a configuration is, say, available only on GLES2/3, but not on GLES1, and is rejected by the dri module's bindContext call, eglMakeCurrent fails with error "EGL_SUCCESS". In this patch, we set error to EGL_BAD_MATCH, which is what CTS/dEQP dEQP-EGL.functional.surfaceless_context expect. Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-07-20 15:10:33 +01:00
Tomasz Figa	ccda100a5a	egl/android: Remove unused variables There are some unused variables left after previous clean-ups triggering compiler warnings. Let's remove them. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Tomasz Figa <tfiga@chromium.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-07-20 15:10:33 +01:00
Tomasz Figa	70a28afb29	gallium/dri: Add shared glapi to LIBADD on Android An earlier patch fixed the problem for classic drivers, however Gallium was still left broken. This patch applies the same workaround to Gallium, when compiled for Android. Following is a quote from the original patch: `0cbc90c57c` mesa: dri: Add shared glapi to LIBADD on Android /system/vendor/lib/dri/*_dri.so actually depend on libglapi: without this, loading the so file fails with: cannot locate symbol "__emutls_v._glapi_tls_Context" On non-Android (non-bionic) platform, EGL uses the following workflow, which works fine: dlopen("libglapi.so", RTLD_LAZY \| RTLD_GLOBAL); dlopen("dri/<driver>_dri.so", RTLD_NOW \| RTLD_GLOBAL); However, bionic does not respect the RTLD_GLOBAL flag, and the dri library cannot find symbols in libglapi.so, so we need to link to libglapi.so explicitly. Android.mk already does this. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Tomasz Figa <tfiga@chromium.org> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-07-20 15:10:33 +01:00
Emil Velikov	ae9a2baaa6	mesa: scons: remove left over src/glsl include The path no longer exists. Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2016-07-20 13:33:43 +01:00
Emil Velikov	1c7c0d77ac	mesa: scons: list builddir before srcdir Analogous to previous commit. Note: scons always uses OOT builds, while the in-tree generated files could be created either manually or by the autoconf build. Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org> Cc: Alexander von Gluck IV <kallisti5@unixzen.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2016-07-20 13:32:24 +01:00
Emil Velikov	eafa82e20e	mesa: automake: list builddir before srcdir In the case of building in out-of-tree fashion, while having generated in-tree sources, the latter [likely stale] files will be used. Flip the order to prevent that. Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2016-07-20 13:30:50 +01:00
Józef Kucia	14608ef920	radeonsi: advertise 8 bits subpixel precision for viewport bounds Signed-off-by: Józef Kucia <joseph.kucia@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2016-07-20 12:45:31 +02:00
Józef Kucia	98aa807188	r600: advertise 8 bits subpixel precision for viewport bounds Signed-off-by: Józef Kucia <joseph.kucia@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2016-07-20 12:45:31 +02:00
Józef Kucia	3cd28fe3de	gallium: add a cap for VIEWPORT_SUBPIXEL_BITS (v2) This allows Gallium drivers to advertise the subpixel precision for floating point viewports bounds. v2: - Set ViewportSubpixelBits in st_init_limits. Signed-off-by: Józef Kucia <joseph.kucia@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-20 12:45:31 +02:00
Samuel Pitoiset	3c78d89692	nvc0: disable MS images on GM107+ MS images have to be handled explicitly and I don't plan to implement them for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-20 11:11:33 +02:00
Samuel Pitoiset	8489f20689	nv50/ir: print OP_SUREDB subops in debug mode Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-20 11:11:30 +02:00
Samuel Pitoiset	1edc44bfd3	gm107/ir: add emission for SUREDx Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-20 11:11:26 +02:00
Samuel Pitoiset	4aaacd6dd0	gm107/ir: add emission for SUSTx and SULDx Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-20 11:11:21 +02:00
Samuel Pitoiset	e14cb05ce1	gm107/ra: fix constraints for surface operations Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-20 11:11:16 +02:00
Samuel Pitoiset	c68989b2c8	gm107/ir: lower surface operations Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-20 11:11:12 +02:00
Samuel Pitoiset	2ae4b5d622	nvc0: bind images for 3d/cp shaders on GM107+ On Maxwell, images binding is slightly different (and much better) regarding Fermi and Kepler because a texture view needs to be uploaded for each image and this is going to simplify the thing a lot. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-20 11:11:03 +02:00
Samuel Pitoiset	1da704a94c	nvc0: increase the tex handles area size in the driver cb Currently, we can store 32 tex handles of 32-bits integer each and that fits perfectly with the underlying hardware except on GM107+ which requires to upload a texture view for each images. This patch increases the number of storable texture handles in the driver constant buffer from 32 to 40 because we expose 8 images. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-07-20 11:10:56 +02:00
Kenneth Graunke	f0f466214e	nir: Fix uninitialized use of 'replacement'. For intrinsics we don't care about, just skip to the next loop iteration and process the next instruction. We don't want to execute the rest of the code. This was a bug in commit `cdfc05ea6e`. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2016-07-19 17:34:59 -07:00
Kenneth Graunke	89873c9b08	i965: Use tex_mocs instead of rb_mocs for GL images. Fixes a 10-20% performance regression in OglCSDof caused by commit `5a8c89038a`, which made images (in the image load/store sense) use BDW_MOCS_PTE instead of BDW_MOCS_WB. This seems sketchy, as the default PTE value is supposed to be WB LLC eLLC, which is the same as our MOCS WB setting. It's only supposed to change when using a surface for display, which won't ever happen for images. Something may be wrong in the kernel... Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-19 17:34:59 -07:00

1 2 3 4 5 ...

83410 commits