fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-03-18 16:40:34 +01:00

Author	SHA1	Message	Date
Francisco Jerez	fadf347735	i965: Fix stride field for the result of emit_uniformize(). This is essentially the same problem fixed in an earlier patch for immediates. Setting the stride to zero will be particularly useful for my future SIMD lowering pass, because we will be able to just check whether the stride of a source register is zero and skip emitting the copies required to unzip it in that case. Instead of setting stride to zero in every caller of emit_uniformize() I've changed the function to return the result as its return value (previously it was being written into a caller-provided destination register), because this way we can enforce that the result is used with the correct regioning from the function itself. The changes to the prototype of its VEC4 counterpart are mainly for the sake of symmetry, VEC4 registers don't have stride. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2015-07-21 17:54:00 +03:00
Francisco Jerez	9383664a9c	i965/fs: Fix stride field for uniforms. This fixes essentially the same problem as for immediates. Registers of the UNIFORM file are typically accessed according to the formula: read_uniform(r, channel_index, array_index) = read_element(r, channel_index * 0 + array_index * 1) Which matches the general direct addressing formula for stride=0: read_direct(r, channel_index, array_index) = read_element(r, channel_index * stride + array_index * max{1, stride * width}) In either case if reladdr is present the access will be according to the composition of two register regions, the first one determining the per-channel array_index used for the second, like: read_indirect(r, channel_index, array_index) = read_direct(r, channel_index, read(r.reladdr, channel_index, array_index)) where: read(r, channel_index, array_index) = if r.reladdr == NULL then read_direct(r, channel_index, array_index) else read_indirect(r, channel_index, array_index) In conclusion we can handle uniforms consistently with the other register files if we set stride to zero. After lowering to a GRF using VARYING_PULL_CONSTANT_LOAD in demote_pull_constant_loads() the stride of the source is set to one again because the result of VARYING_PULL_CONSTANT_LOAD is generally non-uniform. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2015-07-21 17:54:00 +03:00
Francisco Jerez	5f8d9ae5a5	i965/fs: Fix stride for immediate registers. When the width field was removed from fs_reg the BROADCAST handling code in opt_algebraic() started to miss a number of trivial optimization cases resulting in the ugly indirect-addressing sequence to be emitted unnecessarily for some variable-indexed texturing and UBO loads regardless of one of the sources of BROADCAST being immediate. Apparently the reason was that we were setting the stride field to one for immediates even though they are typically uniform. Width used to be set to one too which is why this optimization used to work previously until the "reg.width == 1" check was removed. The stride field of vector immediates is intentionally left equal to one, because they are strictly speaking not uniform. The assertion in fs_generator makes sure that immediates have the expected stride as consistency check. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2015-07-21 17:54:00 +03:00
Iago Toral Quiroga	b298311d51	i965/vec4: Fix liveness analysis with BRW_OPCODE_SEL We only consider a vgrf defined by a given block if the block writes to it unconditionally. So far we have been checking this by testing that the instruction is not predicated, however, in the case of BRW_OPCODE_SEL, the predication is used to select the value to write, not to decide if the write is actually done. The consequence of this was increased life spans for affected vgrfs, which could lead to additional register pressure. Since NIR generates selects for conditional writes this was causing massive register pressure in a handful of piglit and dEQP tests that had a large number of select operations with the NIR-vec4 backend. Fixes the following piglit tests with the NIR-vec4 backend: spec/glsl-1.50/execution/variable-indexing/vs-output-array-vec4-index-wr-before-gs spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd spec/glsl-1.50/execution/variable-indexing/vs-output-array-vec2-index-wr-before-gs spec/glsl-1.50/execution/variable-indexing/vs-output-array-vec3-index-wr-before-gs spec/glsl-1.50/execution/variable-indexing/vs-output-array-float-index-wr-before-gs Fixes 80 dEQP tests with the NIR-vec4 backend in the following category: dEQP-GLES3.functional.ubo.* Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-07-21 09:00:14 +02:00
Kenneth Graunke	2f11e92cef	mesa: Rename _mesa_lookup_enum_by_nr() to _mesa_enum_to_string(). Generated by sed; no manual changes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Brian Paul <brianp@vmware.com>	2015-07-20 16:45:37 -07:00
Samuel Pitoiset	cd0dec0d9d	nouveau: use bool instead of boolean Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-07-21 00:42:53 +02:00
Tom Stellard	4be30fcd05	gallivm: Initialize LLVM Modules's DataLayout to an empty string. This fixes crashes in llvmpipe with LLVM 3.8 and also some piglit tests on radeonsi that use the draw module. This is just a temporary solution. The correct solution will require creating a TargetMachine during gallivm initialization and pulling the DataLayout from there. This will be a somewhat invasive change, and it will need to be validated on multiple LLVM versions. https://llvm.org/bugs/show_bug.cgi?id=24172 Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2015-07-20 20:28:26 +00:00
Samuel Pitoiset	5b7dd4d419	nvc0: add a missing parameter to nvc0_set_shader_images() This fixes a compilation warning introduced in commit `05a12c5` (gallium: add interface for writable shader images). While we are at it, fix indentation and rename parameters according to the gallium interface. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-07-20 18:49:14 +02:00
Samuel Pitoiset	c2cb771354	nouveau: always align buffers to 0x100 Only constbufs must be aligned to 0x100, but since all buffers can be rebinded as constant buffers they must be also aligned. This patch prevents this behaviour by aligning everything to 256-byte increments at buffer creation. This fixes dmesg fails for the following piglit test: ext_transform_feedback-immediate-reuse-uniform-buffer -auto -fbo Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-07-20 18:48:27 +02:00
Samuel Pitoiset	19a6214b0f	nv50: limit the maximum number of samplers to 16 NV50_3D_BIND_TSC only allows to bind 16 samplers, and since we don't want to do anything with NV50_3D_BIND_TSC2, just limit the maximum number of samplers to 16 like for nvc0. This fixes dmesg fails with the following piglit test: max-samplers But the test still fails. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-07-20 18:45:56 +02:00
Samuel Pitoiset	6d207b8e35	nv50: turn samples counts off during blit Fixes the following piglit test: occlusion_query_meta_no_fragments Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-07-20 18:45:56 +02:00
Samuel Pitoiset	d246a96bbc	nv50: add nesting support for occlusion queries This is loosely based on nvc0. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-07-20 18:45:55 +02:00
Alejandro Piñeiro	8ba1982b1e	i965/nir/fs: removed unneeded support for global variables As functions are inlined, and nir_lower_global_vars_to_local gets run, all global variables are lowered to local variables. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-07-20 09:50:04 +02:00
Ilia Mirkin	801d41fa43	nv50: fix max level clamping on G80 It appears that the G80 did not have support for the sampler view first/last clamping. Put the view's last level in the place of the texture's so that it doesn't go past what the sampler view allows. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2015-07-20 00:59:37 -04:00
Ilia Mirkin	8c8a71f0d1	gm107/ir: fix indirect txq emission Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2015-07-18 19:03:07 -04:00
Ilia Mirkin	346ce0b988	nvc0/ir: don't worry about sampler in txq handling There's no need to deal with samplers for texture size queries. That code also was accidentally setting an invalid sIndirectSrc position, but it can now just be removed. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2015-07-18 18:48:14 -04:00
Ilia Mirkin	20e484afa4	nvc0/ir: fix txq on indirect samplers Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2015-07-18 17:34:48 -04:00
Abdiel Janulgue	670914ea7c	i965: Disable resource streamer in BLORP Switch off hardware-generated binding tables and gather push constants in the blorp. Blorp requires only a minimal set of simple constants. There is no need for the extra complexity to program a gather table entry into the pipeline. Cc: kenneth@whitecape.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>	2015-07-18 16:17:01 +03:00
Abdiel Janulgue	fc65b6eb61	i965: Upload binding tables in hw-generated binding table format. When hardware-generated binding tables are enabled, use the hw-generated binding table format when uploading binding table state. Normally, the CS will just consume the binding table pointer commands as pipelined state. When the RS is enabled however, the RS flushes whatever edited surface state entries of our on-chip binding table to the binding table pool before passing the command on to the CS. Note that the binding table pointer offset is relative to the binding table pool base address when the resource streamer is enabled, instead of the surface state base address. v2: Fix possible buffer overflow when allocating a chunk out of the hw-binding table pool (Ken). v3: Remove extra newline and add missing brace around if-statement (Matt). v4: Fix broken INTEL_DEBUG=shader_time for hw-generated binding tables. Document PRM WaStateBindingTableOverfetch workaround. Cc: kenneth@whitecape.org Cc: mattst88@gmail.com Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>	2015-07-18 16:16:59 +03:00
Abdiel Janulgue	2133980bc7	i965: Implement interface to edit binding table entries Unlike normal software binding tables where the driver has to manually generate and fill a binding table array which is then uploaded to the hardware, the resource streamer instead presents the driver with an option to fill out slots for individual binding table indices. The hardware accumulates the state for these combined edits which it then automatically flushes to a binding table pool when the binding table pointer state command is invoked. v2: Clarify binding table edit bit alignment (Topi). v3: Make comments and function names clearer (Ken). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>	2015-07-18 16:16:56 +03:00
Abdiel Janulgue	190756482e	i965: Enable hardware-generated binding tables on render path. This patch implements the binding table enable command which is also used to allocate a binding table pool where hardware-generated binding table entries are flushed into. Each binding table offset in the binding table pool is unique per each shader stage that is enabled within a batch. Also insert the required brw_tracked_state objects to enable hw-generated binding tables in normal render path. v2: - Use MOCS in binding table pool alloc for GEN8 - Fix spurious offset when allocating binding table pool entry and start from zero instead. v3: - Include GEN8 fix for spurious offset above. v4: - Fixup wrong packet length in enable/disable hw-binding table for GEN8 (Ville). - Don't invoke HW-binding table disable command when we don't have resource streamer (Chris). v5: - Reorder the state cache invalidate flush so it happens in-between enabling hw-generated binding tables and the previous sw-binding table GPU state (Chris). v6: - Do the same fix in v5 for gen7_disable_hw_binding_tables(). - Adhere to coding guidelines and make comments more informative. Cc: kenneth@whitecape.org Cc: syrjala@sci.fi Cc: chris@chris-wilson.co.uk Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>	2015-07-18 16:16:54 +03:00
Abdiel Janulgue	090529af18	i965: Enable resource streamer for the batchbuffer Check first if the hardware and kernel supports resource streamer. If this is allowed, tell the kernel to enable the resource streamer enable bit on MI_BATCHBUFFER_START by specifying I915_EXEC_RESOURCE_STREAMER execbuffer flags. v2: - Use new I915_PARAM_HAS_RESOURCE_STREAMER ioctl to check if kernel supports RS (Ken). - Add brw_device_info::has_resource_streamer and toggle it for Haswell, Broadwell, Cherryview, Skylake, and Broxton (Ken). v3: - Update I915_PARAM_HAS_RESOURCE_STREAMER to match updated kernel. v4: - Always inspect the getparam.value (Chris Wilson). v5: - Fold redundant devinfo->has_resource_streamer check in context create into init screen. Cc: kenneth@whitecape.org Cc: chris@chris-wilson.co.uk Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>	2015-07-18 16:16:52 +03:00
Abdiel Janulgue	ccf9598ad7	i965: Define HW-binding table and resource streamer control opcodes v2: Use macros for HW binding table edits (Topi) v3: Add Broadwell support. v4: Make hardware binding table bit definitions even clearer (Ken) Cc: kenneth@whitecape.org Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>	2015-07-18 16:16:50 +03:00
Eric Anholt	ff7896a398	vc4: Switch to using a separate ioctl for making shaders. This gives the kernel a chance to validate and lock down the data, without having to deal with mmap zapping. With this, GLBenchmark stops on a texture relocations, because we'd recycled a shader BO as another shader and failed to revalidate, since we weren't clearing the cached validation state on mmap faults.	2015-07-17 22:11:56 -07:00
Roland Scheidegger	e42cfe5d03	mesa: fix up some texture error checks In particular, we were incorrectly accepting s3tc (and lots of others) for CompressedTexSubImage3D (but not CompressedTexImage3D) calls with 3d targets. At this time, the only allowed formats for these calls are the bptc ones, since none of the specific extensions allow it (astc hdr would). Also, fix up a bug in _mesa_target_can_be_compressed - 3d target needs to be allowed for bptc formats. Reviewed-by: Brian Paul <brianp@vmware.com>	2015-07-18 02:35:24 +02:00
Eric Anholt	27aa31fab4	vc4: Fix printing of shader-db debug when shader-db isn't turned on.	2015-07-17 12:25:55 -07:00
Eric Anholt	5341349dde	vc4: Add debugging on texture relocation validation failures.	2015-07-17 12:25:55 -07:00
Eric Anholt	be7adc2eca	vc4: Also consider uniform 0 in uniform lowering. The hash table considers key 0 to be the empty key.	2015-07-17 12:25:55 -07:00
Eric Anholt	90dfabc3b5	vc4: Use the pure/const attributes on a bunch of our QPU functions. On a release build, this makes the rest of vc4_qpu_validate.c go away (the compiler didn't know that our qpu helper function calls had no side effects).	2015-07-17 12:25:55 -07:00
Eric Anholt	be1f49bda9	mesa: Detect and provide macros for function attributes pure and const. These are really useful hints to the compiler in the absence of link-time optimization, and I'm going to use them in VC4. I've made the const attribute be ATTRIBUTE_CONST unlike other function attributes, because we have other things in the tree #defining CONST for their own unrelated purposes. v2: Alphabetize. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)	2015-07-17 12:25:54 -07:00
Connor Abbott	bde4c8ec1f	i965/fs: don't make unused payload registers interfere Before, we were setting payload_last_use_ip for unused payload registers to 0, which made them interfere with whatever the first instruction wrote to due to the workaround for SIMD16 uniform arguments. Just use -1 to mean "unused" instead, and then skip setting any interferences for unused payload registers. instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 GAINED: 1 LOST: 0 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>	2015-07-17 10:10:57 -07:00
Connor Abbott	18e73bf7f8	i965/fs: remove special case in setup_payload_interference() regs_read() will handle LINTERP for us since the previous commit. In addition, we were being too conservative, since it will only read 2 registers on SIMD8. instructions in affected programs: 9061 -> 8893 (-1.85%) helped: 10 HURT: 0 GAINED: 0 LOST: 0 All of the changes were due to spills being eliminated, mostly in KSP shaders. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>	2015-07-17 10:10:51 -07:00
Jordan Justen	c4a2217e79	i965/fs: Mark last used ip for all regs read in the payload If a source register in the push constant registers uses more than one register, then we wouldn't update payload_last_use_ip for subsequent registers. Unlike most uniform data pushed into registers, the CS gl_LocalInvocationID data varies per execution channel. Therefore for SIMD16 mode, we have vec16 data in the payload. In this case we then need to mark 2 registers in payload_last_use_ip as last used by the instruction. There's a similar situation for the z and w coordinates of gl_FragCoord for fragment shaders, where it had only happened to work before because of some bogus interferences which the next commit removes. (Connor: added bit about gl_FragCoord to commit message) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>	2015-07-17 10:10:48 -07:00
Connor Abbott	9f344b908a	i965/fs: fix regs_read() for LINTERP The second source always stays within the same SIMD8 register. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>	2015-07-17 10:10:39 -07:00
Connor Abbott	eaf799ddff	nir: add nir_foreach_instr_safe_reverse() Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>	2015-07-17 09:49:53 -07:00
Connor Abbott	8eea091747	nir: add nir_instr_is_first() and nir_instr_is_last() helpers Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>	2015-07-17 09:47:22 -07:00
Jordan Justen	01cdbba341	i965/cs: Use dispatch width of 8 for cs terminate payload setup This prevents an assertion failure in brw_fs_live_variables.cpp, fs_live_variables::setup_one_write: Assertion `var < num_vars' failed. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-07-16 21:37:24 -07:00
Jordan Justen	7e337859ff	i965/cs: Return 1 for regs_read on CS_OPCODE_CS_TERMINATE This prevents an assertion failure in brw_fs_live_variables.cpp, fs_live_variables::setup_one_read: Assertion `var < num_vars' failed. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-07-16 21:37:07 -07:00
Kenneth Graunke	4b17f0d9f5	program: Allow redundant OPTION ARB_fog_* directives. A fragment program from "Pixel Piracy" contains redundant OPTION directives: !!ARBfp1.0 OPTION ARB_precision_hint_fastest; OPTION ARB_fog_exp2; OPTION ARB_precision_hint_fastest; OPTION ARB_fog_exp2; ... We already allow redundant ARB_precision_hint_fastest directives, but disallow the redundant (yet consistent) ARB_fog_exp2 directives, failing to compile the program. The specification seems to contradict itself - the main text says that only one fog application option may be specified, but then backpedals, indicating the intent is to disallow /contradictory/ flags. One of the issues suggests that specifying contradictory ones is stupid, but allowed, and only the last one should take effect. Accepting multiple redundant (but consistent) directives seems harmless, and like a reasonable interpretation of the specification. It also fixes a fragment program found in the wild. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-07-16 20:26:43 -07:00
Ben Widawsky	3a31876600	i965: Push miptree tiling request into flags With the last few patches a way was provided to influence lower layer miptree layout and allocation decisions via flags (replacing bools). For simplicity, I chose not to touch the tiling requests because the change was slightly less mechanical than replacing the bools. The goal is to organize the code so we can continue to add new parameters and tiling types while minimizing risk to the existing code, and not having to constantly add new function parameters. v2: Rebased on Anuj's recent Yf/Ys changes Fix non-msrt MCS allocation (was only happening in gen8 case before) v3: small fix in assertion requested by Chad v4: Use parens to get the order right from v3. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2015-07-16 17:02:35 -07:00
Ben Widawsky	ef42352ff4	Revert "i965: Push miptree tiling request into flags" This reverts commit `51e8d549e1`.	2015-07-16 16:52:08 -07:00
Ben Widawsky	51e8d549e1	i965: Push miptree tiling request into flags With the last few patches a way was provided to influence lower layer miptree layout and allocation decisions via flags (replacing bools). For simplicity, I chose not to touch the tiling requests because the change was slightly less mechanical than replacing the bools. The goal is to organize the code so we can continue to add new parameters and tiling types while minimizing risk to the existing code, and not having to constantly add new function parameters. v2: Rebased on Anuj's recent Yf/Ys changes Fix non-msrt MCS allocation (was only happening in gen8 case before) v3: small fix in assertion requested by Chad Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v2) Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v2) Reviewed-by: Chad Versace <chad.versace@intel.com> (v2)	2015-07-16 13:28:33 -07:00
Francisco Jerez	4bddd82bf3	i965/fs: Factor out universally broken calculation of the register component size. This in principle simple calculation was being open-coded in a number of places (in a series I haven't yet sent for review there will be a couple more), all of them were subtly broken in one way or another: None of them were handling the HW_REG case correctly as pointed out by Connor, and fs_inst::regs_read() was handling the stride=0 case rather naively. This patch solves both problems and factors out the calculation as a new fs_reg method. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-07-16 18:31:01 +03:00
Francisco Jerez	b00cd6e4a0	i965: Implement nir_op_uadd_carry and _usub_borrow without accumulator. This gets rid of two no16() fall-backs and should allow better scheduling of the generated IR. There are no uses of usubBorrow() or uaddCarry() in shader-db so no changes are expected. However the "arb_gpu_shader5/execution/built-in-functions/fs-usubBorrow" and "arb_gpu_shader5/execution/built-in-functions/fs-uaddCarry" piglit tests go from 40 to 28 instructions. The reason is that the plain ADD instruction can easily be CSE'ed with the original addition, and the b2i negation can easily be propagated into the source modifier of another instruction, so effectively both operations are performed with just one instruction. v2: Rely on carry_to_arith() and borrow_to_arith() to lower these (Ilia Mirkin). Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-07-16 18:29:32 +03:00
Francisco Jerez	3ee2daf23d	i965: Implement b2f and b2i using negation. Booleans are represented as 0/-1 on modern hardware which means we can just negate them to convert them into a numeric type. Negation has the benefit that it can be implemented using a source modifier which can easily be propagated into some other instruction. shader-db results on HSW: total instructions in shared programs: 6349082 -> 6346693 (-0.04%) instructions in affected programs: 40948 -> 38559 (-5.83%) helped: 123 HURT: 1 GAINED: 1 LOST: 0 Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-07-16 18:29:32 +03:00
Marek Olšák	8fba933ca2	gallium: add interface for writable shader buffers Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-07-16 16:52:21 +02:00
Marek Olšák	05a12c53a3	gallium: add interface for writable shader images PIPE_CAPs will be added some other time. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-07-16 16:52:20 +02:00
Marek Olšák	b73bec0ecd	gallium: add new limits for shader buffers and images Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-07-16 16:52:17 +02:00
Marek Olšák	f9f79d29ce	gallium: add BIND flags for R/W buffers and images PIPE_CAPs and TGSI support will be added later. The TGSI support should be straightforward. We only need to split TGSI_FILE_RESOURCE into TGSI_FILE_IMAGE and TGSI_FILE_BUFFER, though duplicating all opcodes shouldn't be necessary. The idea is: * ARB_shader_image_load_store should use set_shader_images. * ARB_shader_storage_buffer_object should use set_shader_buffers(slots 0..M-1) if M shader storage buffers are supported. * ARB_shader_atomic_counters should use set_shader_buffers(slots M..N) if N-M+1 atomic counter buffers are supported. PIPE_CAPs can describe various constraints for early DX11 hardware. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-07-16 16:52:02 +02:00
Marek Olšák	26222932c0	gallium: add PIPE_CAP_MAX_SHADER_PATCH_VARYINGS Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-07-16 16:09:20 +02:00
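
Note on commit `9383664a9c` (i965/fs: Fix stride field for uniforms): the addressing formulas quoted in that message can be checked with a small sketch. This is only an illustration of the arithmetic stated in the commit message, not code from Mesa; the function names `direct_offset` and `uniform_offset` are hypothetical, and `width=8` is an arbitrary choice.

```python
def direct_offset(channel_index, array_index, stride, width):
    """Element offset for the general direct addressing formula
    quoted in the commit message:
    channel_index * stride + array_index * max{1, stride * width}."""
    return channel_index * stride + array_index * max(1, stride * width)


def uniform_offset(channel_index, array_index):
    """Element offset for the UNIFORM file formula quoted in the
    commit message: channel_index * 0 + array_index * 1."""
    return channel_index * 0 + array_index * 1


# With stride == 0 the two formulas agree for every channel and index,
# which is the commit's argument for setting the stride of uniforms to zero.
for channel in range(8):
    for index in range(4):
        assert direct_offset(channel, index, stride=0, width=8) == uniform_offset(channel, index)
```

With a stride of one the direct formula gives a different offset per channel, which is consistent with the message's remark that the stride is set back to one once the value has been lowered to a generally non-uniform GRF.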

64582 commits