mesa/src/intel/compiler/brw_builder.h

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

1068 lines
32 KiB
C
Raw Normal View History

i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/* -*- c++ -*- */
/*
* Copyright © 2010-2015 Intel Corporation
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#pragma once
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
#include "brw_eu.h"
#include "brw_shader.h"
#include "brw_inst.h"
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
static inline brw_reg offset(const brw_reg &, const brw_builder &,
brw: Basic infrastructure to store convergent values as scalars In SIMD16 and SIMD32, storing convergent values in full 16- or 32-channel registers is wasteful. It wastes register space, and in most cases on SIMD32, it wastes instructions. Our register allocator is not clever enough to handle scalar allocations. It's fundamental unit of allocation is SIMD8. Start treating convergent values as SIMD8. Add a tracking bit in brw_reg to specify that a register represents a convergent, scalar value. This has two implications: 1. All channels of the SIMD8 register must contain the same value. In general, this means that writes to the register must be force_writemask_all and exec_size = 8; 2. Reads of this register can (and should) use <0,1,0> stride. SIMD8 instructions that have restrictions on source stride can us <8,8,1>. Values that are vectors (e.g., results of load_uniform or texture operations) will be stored as multiple SIMD8 hardware registers. v2: brw_fs_opt_copy_propagation_defs fix from Ken. Fix for Xe2. v3: Eliminte offset_to_scalar(). Remove mention of vec4 backend in brw_reg.h. Both suggested by Caio. The offset_to_scalar() change necessitates some trickery in the fs_builder offset() function, but I think this is an improvement overall. There is also some rework in find_value_for_offset to account for the possibility that is_scalar sources in LOAD_PAYLOAD might be <8;8,1> or <0;1,0>. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-02-09 17:12:11 -08:00
unsigned);
/**
* Toolbox to assemble an BRW IR program out of individual instructions.
*/
class brw_builder {
public:
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Construct an brw_builder that inserts instructions
* at the end of \p shader. The optional \p dispatch_width
* gives the execution width to be used instead of the
* shader original dispatch_width.
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
*/
brw_builder(brw_shader *shader,
unsigned dispatch_width = 0) :
shader(shader), block(NULL), cursor(NULL),
_dispatch_width(dispatch_width ? dispatch_width : shader->dispatch_width),
_group(0),
force_writemask_all(false),
annotation()
{
if (shader->cfg && shader->cfg->num_blocks > 0) {
block = shader->cfg->last_block();
cursor = &block->instructions.tail_sentinel;
} else {
cursor = (brw_exec_node *)&shader->instructions.tail_sentinel;
}
}
/**
* Construct an brw_builder that inserts instructions before
* instruction \p inst in the same basic block. The default
* execution controls and debug annotation are initialized from the
* instruction passed as argument.
*/
explicit brw_builder(brw_inst *inst) :
shader(inst->block->cfg->s), block(inst->block), cursor(inst),
_dispatch_width(inst->exec_size),
_group(inst->group),
force_writemask_all(inst->force_writemask_all)
{
#ifndef NDEBUG
annotation.str = inst->annotation;
#else
annotation.str = NULL;
#endif
}
brw_builder
at_start(bblock_t *block) const
{
brw_builder bld = *this;
bld.block = block;
bld.cursor = block->instructions.head_sentinel.next;
return bld;
}
brw_builder
at_end(bblock_t *block) const
{
brw_builder bld = *this;
bld.block = block;
bld.cursor = &block->instructions.tail_sentinel;
return bld;
}
brw_builder
before(brw_inst *ref) const
{
brw_builder bld = *this;
bld.block = ref->block;
bld.cursor = ref;
return bld;
}
brw_builder
after(brw_inst *ref) const
{
brw_builder bld = *this;
bld.block = ref->block;
bld.cursor = ref->next;
return bld;
}
brw_builder
after_block_before_control_flow(bblock_t *block) const
{
brw_builder bld = *this;
bld.block = block;
bld.cursor = block->last_non_control_flow_inst()->next;
return bld;
}
/**
* Construct a builder specifying the default SIMD width and group of
* channel enable signals, inheriting other code generation parameters
* from this.
*
* \p n gives the default SIMD width, \p i gives the slot group used for
* predication and control flow masking in multiples of \p n channels.
*/
brw_builder
group(unsigned n, unsigned i) const
{
brw_builder bld = *this;
if (n <= dispatch_width() && i < dispatch_width() / n) {
bld._group += i * n;
} else {
/* The requested channel group isn't a subset of the channel group
* of this builder, which means that the resulting instructions
* would use (potentially undefined) channel enable signals not
* specified by the parent builder. That's only valid if the
* instruction doesn't have per-channel semantics, in which case
* we should clear off the default group index in order to prevent
* emitting instructions with channel group not aligned to their
* own execution size.
*/
assert(force_writemask_all);
bld._group = 0;
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
}
bld._dispatch_width = n;
return bld;
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Alias for group() with width equal to eight.
*/
brw_builder
quarter(unsigned i) const
{
return group(8, i);
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Construct a builder with per-channel control flow execution masking
* disabled if \p b is true. If control flow execution masking is
* already disabled this has no effect.
*/
brw_builder
exec_all(bool b = true) const
{
brw_builder bld = *this;
if (b)
bld.force_writemask_all = true;
return bld;
}
/**
* Construct a builder for SIMD1 operations.
*/
brw_builder
uniform() const
{
return exec_all().group(1, 0);
}
/**
* Construct a builder for SIMD8-as-scalar
*/
brw_builder
scalar_group() const
{
return exec_all().group(8 * reg_unit(shader->devinfo), 0);
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Construct a builder with the given debug annotation info.
*/
brw_builder
annotate(const char *str) const
{
brw_builder bld = *this;
bld.annotation.str = str;
return bld;
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Get the SIMD width in use.
*/
unsigned
dispatch_width() const
{
return _dispatch_width;
}
/**
* Get the channel group in use.
*/
unsigned
group() const
{
return _group;
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Allocate a virtual register of natural vector size (one for this IR)
* and SIMD width. \p n gives the amount of space to allocate in
* dispatch_width units (which is just enough space for one logical
* component in this IR).
*/
brw_reg
vgrf(enum brw_reg_type type, unsigned n = 1) const
{
assert(dispatch_width() <= 32);
if (n > 0)
return brw_allocate_vgrf(*shader, type, n * dispatch_width());
else
return retype(null_reg_ud(), type);
}
brw_reg
vaddr(enum brw_reg_type type, unsigned subnr) const
{
brw_reg addr = brw_address_reg(subnr);
addr.nr = shader->next_address_register_nr++;
return retype(addr, type);
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Create a null register of floating type.
*/
brw_reg
null_reg_f() const
{
return brw_reg(retype(brw_null_reg(), BRW_TYPE_F));
}
brw_reg
null_reg_df() const
{
return brw_reg(retype(brw_null_reg(), BRW_TYPE_DF));
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Create a null register of signed integer type.
*/
brw_reg
null_reg_d() const
{
return brw_reg(retype(brw_null_reg(), BRW_TYPE_D));
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Create a null register of unsigned integer type.
*/
brw_reg
null_reg_ud() const
{
return brw_reg(retype(brw_null_reg(), BRW_TYPE_UD));
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Insert an instruction into the program.
*/
brw_inst *
emit(const brw_inst &inst) const
{
return emit(new(shader->mem_ctx) brw_inst(inst));
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Create and insert a nullary control instruction into the program.
*/
brw_inst *
emit(enum opcode opcode) const
{
return emit(brw_inst(opcode, dispatch_width()));
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Create and insert a nullary instruction into the program.
*/
brw_inst *
emit(enum opcode opcode, const brw_reg &dst) const
{
return emit(brw_inst(opcode, dispatch_width(), dst));
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Create and insert a unary instruction into the program.
*/
brw_inst *
emit(enum opcode opcode, const brw_reg &dst, const brw_reg &src0) const
{
return emit(brw_inst(opcode, dispatch_width(), dst, src0));
}
/**
* Create and insert a binary instruction into the program.
*/
brw_inst *
emit(enum opcode opcode, const brw_reg &dst, const brw_reg &src0,
const brw_reg &src1) const
{
return emit(brw_inst(opcode, dispatch_width(), dst,
src0, src1));
}
/**
* Create and insert a ternary instruction into the program.
*/
brw_inst *
emit(enum opcode opcode, const brw_reg &dst, const brw_reg &src0,
const brw_reg &src1, const brw_reg &src2) const
{
switch (opcode) {
case BRW_OPCODE_BFE:
case BRW_OPCODE_BFI2:
case BRW_OPCODE_MAD:
case BRW_OPCODE_LRP: {
brw_reg fixed0 = fix_3src_operand(src0);
brw_reg fixed1 = fix_3src_operand(src1);
brw_reg fixed2 = fix_3src_operand(src2);
return emit(brw_inst(opcode, dispatch_width(), dst,
fixed0, fixed1, fixed2));
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
default:
return emit(brw_inst(opcode, dispatch_width(), dst,
src0, src1, src2));
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
}
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Create and insert an instruction with a variable number of sources
* into the program.
*/
brw_inst *
emit(enum opcode opcode, const brw_reg &dst, const brw_reg srcs[],
unsigned n) const
{
/* Use the emit() methods for specific operand counts to ensure that
* opcode-specific operand fixups occur.
*/
if (n == 3) {
return emit(opcode, dst, srcs[0], srcs[1], srcs[2]);
} else {
return emit(brw_inst(opcode, dispatch_width(), dst, srcs, n));
}
}
/**
* Insert a preallocated instruction into the program.
*/
brw_inst *
emit(brw_inst *inst) const
{
assert(inst->exec_size <= 32);
assert(inst->exec_size == dispatch_width() ||
force_writemask_all);
inst->group = _group;
inst->force_writemask_all = force_writemask_all;
#ifndef NDEBUG
inst->annotation = annotation.str;
#endif
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
if (block)
block->insert_before(inst, cursor);
else
cursor->insert_before(inst);
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
return inst;
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Select \p src0 if the comparison of both sources with the given
* conditional mod evaluates to true, otherwise select \p src1.
*
* Generally useful to get the minimum or maximum of two values.
*/
brw_inst *
emit_minmax(const brw_reg &dst, const brw_reg &src0,
const brw_reg &src1, brw_conditional_mod mod) const
{
assert(mod == BRW_CONDITIONAL_GE || mod == BRW_CONDITIONAL_L);
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/* In some cases we can't have bytes as operand for src1, so use the
* same type for both operand.
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
*/
return set_condmod(mod, SEL(dst, fix_unsigned_negate(src0),
fix_unsigned_negate(src1)));
}
/**
* Copy any live channel from \p src to the first channel of the result.
*/
brw_reg
emit_uniformize(const brw_reg &src) const
{
/* Trivial: skip unnecessary work and retain IMM */
if (src.file == IMM)
return src;
/* FIXME: We use a vector chan_index and dst to allow constant and
* copy propagration to move result all the way into the consuming
* instruction (typically a surface index or sampler index for a
* send). Once we teach const/copy propagation about scalars we
* should go back to scalar destinations here.
*/
const brw_builder xbld = scalar_group();
const brw_reg chan_index = xbld.vgrf(BRW_TYPE_UD);
/* FIND_LIVE_CHANNEL will only write a single component after
* lowering. Munge size_written here to match the allocated size of
* chan_index.
*/
exec_all().emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index)
->size_written = chan_index.component_size(xbld.dispatch_width());
return BROADCAST(src, component(chan_index, 0));
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
brw_reg
move_to_vgrf(const brw_reg &src, unsigned num_components) const
{
brw_reg *const src_comps = new brw_reg[num_components];
brw: Basic infrastructure to store convergent values as scalars In SIMD16 and SIMD32, storing convergent values in full 16- or 32-channel registers is wasteful. It wastes register space, and in most cases on SIMD32, it wastes instructions. Our register allocator is not clever enough to handle scalar allocations. It's fundamental unit of allocation is SIMD8. Start treating convergent values as SIMD8. Add a tracking bit in brw_reg to specify that a register represents a convergent, scalar value. This has two implications: 1. All channels of the SIMD8 register must contain the same value. In general, this means that writes to the register must be force_writemask_all and exec_size = 8; 2. Reads of this register can (and should) use <0,1,0> stride. SIMD8 instructions that have restrictions on source stride can us <8,8,1>. Values that are vectors (e.g., results of load_uniform or texture operations) will be stored as multiple SIMD8 hardware registers. v2: brw_fs_opt_copy_propagation_defs fix from Ken. Fix for Xe2. v3: Eliminte offset_to_scalar(). Remove mention of vec4 backend in brw_reg.h. Both suggested by Caio. The offset_to_scalar() change necessitates some trickery in the fs_builder offset() function, but I think this is an improvement overall. There is also some rework in find_value_for_offset to account for the possibility that is_scalar sources in LOAD_PAYLOAD might be <8;8,1> or <0;1,0>. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-02-09 17:12:11 -08:00
for (unsigned i = 0; i < num_components; i++)
src_comps[i] = offset(src, *this, i);
const brw_reg dst = vgrf(src.type, num_components);
LOAD_PAYLOAD(dst, src_comps, num_components, 0);
delete[] src_comps;
return brw_reg(dst);
}
brw_inst *
emit_undef_for_dst(const brw_inst *old_inst) const
{
assert(old_inst->dst.file == VGRF);
brw_inst *inst = emit(SHADER_OPCODE_UNDEF,
retype(old_inst->dst, BRW_TYPE_UD));
inst->size_written = old_inst->size_written;
intel/fs: reduce liveness of variables in lowering passes When lowering a single instruction with a destination VGRF to 2 or more, the VGRF is now considered partially written by each generated instruction and that increases its liveness especially in loops. Thus potentially increasing the number of spills/fills due to register allocation. Putting an UNDEF instruction in front of the lowered instructions allows the IR to limit the liveness of the VGRF, reducing register pressure. This has a pretty dramatic effect on spills/fills for RT shaders. Here the stats on Q2RTX shaders on DG2 (wipping out any spills/fills due to register allocation) : Instructions in all programs: 26150 -> 24955 (-4.6%) SENDs in all programs: 1148 -> 1148 (+0.0%) Loops in all programs: 4 -> 4 (+0.0%) Cycles in all programs: 392179 -> 332787 (-15.1%) Spills in all programs: 132 -> 116 (-12.1%) Fills in all programs: 262 -> 154 (-41.2%) Shader-db results on TGL : total instructions in shared programs: 21158140 -> 21158377 (<.01%) instructions in affected programs: 76629 -> 76866 (0.31%) helped: 18 HURT: 20 helped stats (abs) min: 1 max: 60 x̄: 18.89 x̃: 12 helped stats (rel) min: 0.21% max: 3.61% x̄: 1.02% x̃: 0.77% HURT stats (abs) min: 1 max: 79 x̄: 28.85 x̃: 18 HURT stats (rel) min: 0.04% max: 2.81% x̄: 1.13% x̃: 0.79% 95% mean confidence interval for instructions value: -4.82 17.30 95% mean confidence interval for instructions %-change: -0.34% 0.57% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 5753 -> 5753 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 798856834 -> 798870688 (<.01%) cycles in affected programs: 6208395 -> 6222249 (0.22%) helped: 22 HURT: 17 helped stats (abs) min: 2 max: 8794 x̄: 1438.18 x̃: 782 helped stats (rel) min: 0.05% max: 2.28% x̄: 0.63% x̃: 0.44% HURT stats (abs) min: 2 max: 19178 x̄: 2676.12 x̃: 1358 HURT stats (rel) min: 0.04% max: 23.49% x̄: 2.25% x̃: 0.71% 95% mean confidence interval for cycles value: -952.19 1662.65 95% mean confidence interval for cycles %-change: -0.64% 1.90% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 4078 -> 4066 (-0.29%) spills in affected programs: 40 -> 28 (-30.00%) helped: 2 HURT: 0 total fills in shared programs: 2856 -> 2832 (-0.84%) fills in affected programs: 127 -> 103 (-18.90%) helped: 2 HURT: 0 total sends in shared programs: 998554 -> 998554 (0.00%) sends in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 0 GAINED: 0 Total CPU time (seconds): 2346.06 -> 2304.80 (-1.76%) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-09-14 02:40:01 +03:00
return inst;
}
intel/fs: reduce liveness of variables in lowering passes When lowering a single instruction with a destination VGRF to 2 or more, the VGRF is now considered partially written by each generated instruction and that increases its liveness especially in loops. Thus potentially increasing the number of spills/fills due to register allocation. Putting an UNDEF instruction in front of the lowered instructions allows the IR to limit the liveness of the VGRF, reducing register pressure. This has a pretty dramatic effect on spills/fills for RT shaders. Here the stats on Q2RTX shaders on DG2 (wipping out any spills/fills due to register allocation) : Instructions in all programs: 26150 -> 24955 (-4.6%) SENDs in all programs: 1148 -> 1148 (+0.0%) Loops in all programs: 4 -> 4 (+0.0%) Cycles in all programs: 392179 -> 332787 (-15.1%) Spills in all programs: 132 -> 116 (-12.1%) Fills in all programs: 262 -> 154 (-41.2%) Shader-db results on TGL : total instructions in shared programs: 21158140 -> 21158377 (<.01%) instructions in affected programs: 76629 -> 76866 (0.31%) helped: 18 HURT: 20 helped stats (abs) min: 1 max: 60 x̄: 18.89 x̃: 12 helped stats (rel) min: 0.21% max: 3.61% x̄: 1.02% x̃: 0.77% HURT stats (abs) min: 1 max: 79 x̄: 28.85 x̃: 18 HURT stats (rel) min: 0.04% max: 2.81% x̄: 1.13% x̃: 0.79% 95% mean confidence interval for instructions value: -4.82 17.30 95% mean confidence interval for instructions %-change: -0.34% 0.57% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 5753 -> 5753 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 798856834 -> 798870688 (<.01%) cycles in affected programs: 6208395 -> 6222249 (0.22%) helped: 22 HURT: 17 helped stats (abs) min: 2 max: 8794 x̄: 1438.18 x̃: 782 helped stats (rel) min: 0.05% max: 2.28% x̄: 0.63% x̃: 0.44% HURT stats (abs) min: 2 max: 19178 x̄: 2676.12 x̃: 1358 HURT stats (rel) min: 0.04% max: 23.49% x̄: 2.25% x̃: 0.71% 95% mean confidence interval for cycles value: -952.19 1662.65 95% mean confidence interval for cycles %-change: -0.64% 1.90% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 4078 -> 4066 (-0.29%) spills in affected programs: 40 -> 28 (-30.00%) helped: 2 HURT: 0 total fills in shared programs: 2856 -> 2832 (-0.84%) fills in affected programs: 127 -> 103 (-18.90%) helped: 2 HURT: 0 total sends in shared programs: 998554 -> 998554 (0.00%) sends in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 0 GAINED: 0 Total CPU time (seconds): 2346.06 -> 2304.80 (-1.76%) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-09-14 02:40:01 +03:00
/**
* Emit UNDEF for the given register if its data doesn't fully occupy
* the space we allocated.
*/
void
emit_undef_for_partial_reg(const brw_reg &reg) const
{
if (brw_type_size_bytes(reg.type) * dispatch_width() < REG_SIZE)
UNDEF(reg);
}
/**
* Assorted arithmetic ops.
* @{
*/
#define _ALU1(prefix, op) \
brw_inst * \
op(const brw_reg &dst, const brw_reg &src0) const \
{ \
assert(_dispatch_width == 1 || \
(dst.file >= VGRF && dst.stride != 0) || \
(dst.file < VGRF && dst.hstride != 0)); \
return emit(prefix##op, dst, src0); \
} \
brw_reg \
op(const brw_reg &src0, brw_inst **out = NULL) const \
{ \
brw_reg dst = vgrf(src0.type); \
emit_undef_for_partial_reg(dst); \
brw_inst *inst = op(dst, src0); \
if (out) *out = inst; \
return inst->dst; \
}
#define ALU1(op) _ALU1(BRW_OPCODE_, op)
#define VIRT1(op) _ALU1(SHADER_OPCODE_, op)
brw_inst *
alu2(opcode op, const brw_reg &dst, const brw_reg &src0, const brw_reg &src1) const
{
return emit(op, dst, src0, src1);
}
brw_reg
alu2(opcode op, const brw_reg &src0, const brw_reg &src1, brw_inst **out = NULL) const
{
enum brw_reg_type inferred_dst_type =
brw_type_larger_of(src0.type, src1.type);
brw_reg dst = vgrf(inferred_dst_type);
emit_undef_for_partial_reg(dst);
brw_inst *inst = alu2(op, dst, src0, src1);
if (out) *out = inst;
return inst->dst;
}
#define _ALU2(prefix, op) \
brw_inst * \
op(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1) const \
{ \
return alu2(prefix##op, dst, src0, src1); \
} \
brw_reg \
op(const brw_reg &src0, const brw_reg &src1, brw_inst **out = NULL) const \
{ \
return alu2(prefix##op, src0, src1, out); \
}
#define ALU2(op) _ALU2(BRW_OPCODE_, op)
#define VIRT2(op) _ALU2(SHADER_OPCODE_, op)
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
#define ALU2_ACC(op) \
brw_inst * \
op(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1) const \
{ \
brw_inst *inst = emit(BRW_OPCODE_##op, dst, src0, src1); \
inst->writes_accumulator = true; \
return inst; \
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
#define ALU3(op) \
brw_inst * \
op(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1, \
const brw_reg &src2) const \
{ \
return emit(BRW_OPCODE_##op, dst, src0, src1, src2); \
} \
brw_reg \
op(const brw_reg &src0, const brw_reg &src1, const brw_reg &src2, \
brw_inst **out = NULL) const \
{ \
enum brw_reg_type inferred_dst_type = \
brw_type_larger_of(brw_type_larger_of(src0.type, src1.type),\
src2.type); \
brw_inst *inst = op(vgrf(inferred_dst_type), src0, src1, src2); \
if (out) *out = inst; \
return inst->dst; \
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
ALU3(ADD3)
ALU2_ACC(ADDC)
ALU2(AND)
ALU2(ASR)
ALU2(AVG)
ALU3(BFE)
ALU2(BFI1)
ALU3(BFI2)
ALU1(BFREV)
ALU1(CBIT)
ALU2(DP2)
ALU2(DP3)
ALU2(DP4)
ALU2(DPH)
ALU1(FBH)
ALU1(FBL)
ALU1(FRC)
ALU3(DP4A)
ALU2(LINE)
ALU1(LZD)
ALU2(MAC)
ALU2_ACC(MACH)
ALU3(MAD)
ALU1(MOV)
ALU2(MUL)
ALU1(NOT)
ALU2(OR)
ALU2(PLN)
ALU1(RNDD)
ALU1(RNDE)
ALU1(RNDU)
ALU1(RNDZ)
ALU2(ROL)
ALU2(ROR)
ALU2(SEL)
ALU2(SHL)
ALU2(SHR)
ALU2_ACC(SUBB)
ALU2(XOR)
VIRT1(RCP)
VIRT1(RSQ)
VIRT1(SQRT)
VIRT1(EXP2)
VIRT1(LOG2)
VIRT2(POW)
VIRT2(INT_QUOTIENT)
VIRT2(INT_REMAINDER)
VIRT1(SIN)
VIRT1(COS)
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
#undef ALU3
#undef ALU2_ACC
#undef ALU2
#undef VIRT2
#undef _ALU2
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
#undef ALU1
#undef VIRT1
#undef _ALU1
/** @} */
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
brw_inst *
ADD(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1) const
{
return alu2(BRW_OPCODE_ADD, dst, src0, src1);
}
brw_reg
ADD(const brw_reg &src0, const brw_reg &src1, brw_inst **out = NULL) const
{
if (src1.file == IMM && src1.ud == 0 && !out)
return src0;
return alu2(BRW_OPCODE_ADD, src0, src1, out);
}
/**
* CMP: Sets the low bit of the destination channels with the result
* of the comparison, while the upper bits are undefined, and updates
* the flag register with the packed 16 bits of the result.
*/
brw_inst *
CMP(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1,
brw_conditional_mod condition) const
{
/* Take the instruction:
*
* CMP null<d> src0<f> src1<f>
*
* Original gfx4 does type conversion to the destination type
* before comparison, producing garbage results for floating
* point comparisons.
*/
const enum brw_reg_type type =
dst.is_null() ?
src0.type :
brw_type_with_size(src0.type, brw_type_size_bits(dst.type));
return set_condmod(condition,
emit(BRW_OPCODE_CMP, retype(dst, type),
fix_unsigned_negate(src0),
fix_unsigned_negate(src1)));
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* CMPN: Behaves like CMP, but produces true if src1 is NaN.
*/
brw_inst *
CMPN(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1,
brw_conditional_mod condition) const
{
/* Take the instruction:
*
* CMP null<d> src0<f> src1<f>
*
* Original gfx4 does type conversion to the destination type
* before comparison, producing garbage results for floating
* point comparisons.
*/
const enum brw_reg_type type =
dst.is_null() ?
src0.type :
brw_type_with_size(src0.type, brw_type_size_bits(dst.type));
return set_condmod(condition,
emit(BRW_OPCODE_CMPN, retype(dst, type),
fix_unsigned_negate(src0),
fix_unsigned_negate(src1)));
}
/**
* CSEL: dst = src2 <op> 0.0f ? src0 : src1
*/
brw_inst *
CSEL(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1,
const brw_reg &src2, brw_conditional_mod condition) const
{
return set_condmod(condition,
emit(BRW_OPCODE_CSEL,
retype(dst, src2.type),
retype(src0, src2.type),
retype(src1, src2.type),
src2));
}
/**
* Emit a linear interpolation instruction.
*/
brw_inst *
LRP(const brw_reg &dst, const brw_reg &x, const brw_reg &y,
const brw_reg &a) const
{
if (shader->devinfo->ver <= 10) {
/* The LRP instruction actually does op1 * op0 + op2 * (1 - op0), so
* we need to reorder the operands.
*/
return emit(BRW_OPCODE_LRP, dst, a, y, x);
} else {
/* We can't use the LRP instruction. Emit x*(1-a) + y*a. */
const brw_reg y_times_a = vgrf(dst.type);
const brw_reg one_minus_a = vgrf(dst.type);
const brw_reg x_times_one_minus_a = vgrf(dst.type);
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
MUL(y_times_a, y, a);
ADD(one_minus_a, negate(a), brw_imm_f(1.0f));
MUL(x_times_one_minus_a, x, brw_reg(one_minus_a));
return ADD(dst, brw_reg(x_times_one_minus_a), brw_reg(y_times_a));
}
}
/**
* Collect a number of registers in a contiguous range of registers.
*/
brw_inst *
LOAD_PAYLOAD(const brw_reg &dst, const brw_reg *src,
unsigned sources, unsigned header_size) const
{
brw_inst *inst = emit(SHADER_OPCODE_LOAD_PAYLOAD, dst, src, sources);
inst->header_size = header_size;
inst->size_written = header_size * REG_SIZE;
for (unsigned i = header_size; i < sources; i++) {
inst->size_written += dispatch_width() * brw_type_size_bytes(src[i].type) *
dst.stride;
}
return inst;
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
brw_inst *
VEC(const brw_reg &dst, const brw_reg *src, unsigned sources) const
{
return sources == 1 ? MOV(dst, src[0])
: LOAD_PAYLOAD(dst, src, sources, 0);
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
brw_inst *
SYNC(enum tgl_sync_function sync) const
{
return emit(BRW_OPCODE_SYNC, null_reg_ud(), brw_imm_ud(sync));
}
brw_inst *
UNDEF(const brw_reg &dst) const
{
assert(dst.file == VGRF);
assert(dst.offset % REG_SIZE == 0);
brw_inst *inst = emit(SHADER_OPCODE_UNDEF,
retype(dst, BRW_TYPE_UD));
inst->size_written = shader->alloc.sizes[dst.nr] * REG_SIZE - dst.offset;
return inst;
}
brw_inst *
DPAS(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1, const brw_reg &src2,
unsigned sdepth, unsigned rcount) const
{
assert(_dispatch_width == 8 * reg_unit(shader->devinfo));
assert(sdepth == 8);
assert(rcount == 1 || rcount == 2 || rcount == 4 || rcount == 8);
brw_inst *inst = emit(BRW_OPCODE_DPAS, dst, src0, src1, src2);
inst->sdepth = sdepth;
inst->rcount = rcount;
unsigned type_size = brw_type_size_bytes(dst.type);
assert(type_size == 4 || type_size == 2);
inst->size_written = rcount * reg_unit(shader->devinfo) * 8 * type_size;
return inst;
}
void
VARYING_PULL_CONSTANT_LOAD(const brw_reg &dst,
const brw_reg &surface,
const brw_reg &surface_handle,
const brw_reg &varying_offset,
uint32_t const_offset,
uint8_t alignment,
unsigned components) const
{
assert(components <= 4);
/* We have our constant surface use a pitch of 4 bytes, so our index can
* be any component of a vector, and then we load 4 contiguous
* components starting from that. TODO: Support loading fewer than 4.
*/
brw_reg total_offset = ADD(varying_offset, brw_imm_ud(const_offset));
/* The pull load message will load a vec4 (16 bytes). If we are loading
* a double this means we are only loading 2 elements worth of data.
* We also want to use a 32-bit data type for the dst of the load operation
* so other parts of the driver don't get confused about the size of the
* result.
*/
brw_reg vec4_result = vgrf(BRW_TYPE_F, 4);
brw_reg srcs[PULL_VARYING_CONSTANT_SRCS];
srcs[PULL_VARYING_CONSTANT_SRC_SURFACE] = surface;
srcs[PULL_VARYING_CONSTANT_SRC_SURFACE_HANDLE] = surface_handle;
srcs[PULL_VARYING_CONSTANT_SRC_OFFSET] = total_offset;
srcs[PULL_VARYING_CONSTANT_SRC_ALIGNMENT] = brw_imm_ud(alignment);
brw_inst *inst = emit(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_LOGICAL,
vec4_result, srcs, PULL_VARYING_CONSTANT_SRCS);
inst->size_written = 4 * vec4_result.component_size(inst->exec_size);
shuffle_from_32bit_read(dst, vec4_result, 0, components);
}
brw_reg
LOAD_SUBGROUP_INVOCATION() const
{
brw_reg reg = vgrf(shader->dispatch_width < 16 ? BRW_TYPE_UD : BRW_TYPE_UW);
exec_all().emit(SHADER_OPCODE_LOAD_SUBGROUP_INVOCATION, reg);
return reg;
}
brw_reg
BROADCAST(brw_reg value, brw_reg index) const
{
const brw_builder xbld = scalar_group();
const brw_reg dst = xbld.vgrf(value.type);
assert(is_uniform(index));
intel/brw: Use CSE for LOAD_SUBGROUP_INVOCATION Instead of emitting a single one at the top, and making reference to it, emit the virtual instruction as needed and let CSE do its job. Since load_subgroup_invocation now can appear not at the start of the shader, use UNDEF in all cases to ensure that the liveness of the destination doesn't extend to the first partial write done here (it was being used only for SIMD > 8 before). Note this option was considered in the past 6132992cdb858268af0e985727d80e4140be389c but at the time dismissed. The difference now is that the lowering of the virtual instruction happens earlier than the scheduling. The motivation for this change is to allow passes other than the NIR conversion to use this value. The alternative of storing a `brw_reg` in the shader (instead of NIR state) gets complicated by passes like compact_vgrfs, that move VGRFs around (and update the instructions). This and maybe other passes would have to care about the brw_reg. Fossil-db numbers, TGL ``` *** Shaders only in 'after' results are ignored: steam-native/shadow_of_the_tomb_raider/c683ea5067ee157d/fs.32/0, steam-native/shadow_of_the_tomb_raider/f4df450c3cef40b4/fs.32/0, steam-native/shadow_of_the_tomb_raider/94b708fb8e3d9597/fs.32/0, steam-native/shadow_of_the_tomb_raider/19d44c328edabd30/fs.32/0, steam-native/shadow_of_the_tomb_raider/8a7dcbd5a74a19bf/fs.32/0, and 366 more from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider *** Shaders only in 'before' results are ignored: steam-dxvk/octopath_traveler/aaa3d10acb726906/fs.32/0, steam-dxvk/batman_arkham_origins/e6872ae23569c35f/fs.32/0, steam-dxvk/octopath_traveler/fd33a99fa5c271a8/fs.32/0, steam-dxvk/octopath_traveler/9a077cdc16f24520/fs.32/0, steam-dxvk/batman_arkham_city_goty/fac7b438ad52f622/fs.32/0, and 12 more from 4 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-dxvk/octopath_traveler, steam-native/shadow_of_the_tomb_raider Totals: Instrs: 149752381 -> 149751337 (-0.00%); split: -0.00%, +0.00% Cycle count: 11553609349 -> 11549970294 (-0.03%); split: -0.06%, +0.03% Spill count: 42763 -> 42764 (+0.00%); split: -0.01%, +0.01% Fill count: 75650 -> 75651 (+0.00%); split: -0.00%, +0.01% Max live registers: 31725096 -> 31671792 (-0.17%) Max dispatch width: 5546008 -> 5551672 (+0.10%); split: +0.11%, -0.00% Totals from 52574 (8.34% of 630441) affected shaders: Instrs: 9535159 -> 9534115 (-0.01%); split: -0.03%, +0.02% Cycle count: 1006627109 -> 1002988054 (-0.36%); split: -0.65%, +0.29% Spill count: 11588 -> 11589 (+0.01%); split: -0.03%, +0.03% Fill count: 21057 -> 21058 (+0.00%); split: -0.01%, +0.02% Max live registers: 1992493 -> 1939189 (-2.68%) Max dispatch width: 559696 -> 565360 (+1.01%); split: +1.06%, -0.05% ``` and DG2 ``` *** Shaders only in 'after' results are ignored: steam-native/shadow_of_the_tomb_raider/1f95a9d3db21df85/fs.32/0, steam-native/shadow_of_the_tomb_raider/56b87c4a46613a2a/fs.32/0, steam-native/shadow_of_the_tomb_raider/a74b4137f85dbbd3/fs.32/0, steam-native/shadow_of_the_tomb_raider/e07e38d3f48e8402/fs.32/0, steam-native/shadow_of_the_tomb_raider/206336789c48996c/fs.32/0, and 268 more from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider *** Shaders only in 'before' results are ignored: steam-native/shadow_of_the_tomb_raider/0420d7c3a2ea99ec/fs.32/0, steam-native/shadow_of_the_tomb_raider/2ff39f8bf7d24abb/fs.32/0, steam-native/shadow_of_the_tomb_raider/92d7be2824bd9659/fs.32/0, steam-native/shadow_of_the_tomb_raider/f09ca6d2ecf18015/fs.32/0, steam-native/shadow_of_the_tomb_raider/490f8ffd59e52949/fs.32/0, and 205 more from 3 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider Totals: Instrs: 151597619 -> 151599914 (+0.00%); split: -0.00%, +0.00% Subgroup size: 7699776 -> 7699784 (+0.00%) Cycle count: 12738501989 -> 12739841170 (+0.01%); split: -0.01%, +0.02% Spill count: 61283 -> 61274 (-0.01%) Fill count: 119886 -> 119849 (-0.03%) Max live registers: 31810432 -> 31758920 (-0.16%) Max dispatch width: 5540128 -> 5541136 (+0.02%); split: +0.08%, -0.06% Totals from 49286 (7.81% of 631231) affected shaders: Instrs: 8607753 -> 8610048 (+0.03%); split: -0.01%, +0.04% Subgroup size: 857752 -> 857760 (+0.00%) Cycle count: 305939495 -> 307278676 (+0.44%); split: -0.28%, +0.72% Spill count: 6339 -> 6330 (-0.14%) Fill count: 12571 -> 12534 (-0.29%) Max live registers: 1788346 -> 1736834 (-2.88%) Max dispatch width: 510920 -> 511928 (+0.20%); split: +0.85%, -0.66% ``` Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30489>
2024-07-31 22:46:20 -07:00
/* A broadcast will always be at the full dispatch width even if the
* use of the broadcast result is smaller. If the source is_scalar,
* it may be allocated at less than the full dispatch width (e.g.,
* allocated at SIMD8 with SIMD32 dispatch). The input may or may
* not be stride=0. If it is not, the generated broadcast
*
* broadcast(32) dst, value<1>, index<0>
*
* is invalid because it may read out of bounds from value.
*
* To account for this, modify the stride of an is_scalar input to be
* zero.
*/
if (value.is_scalar)
value = component(value, 0);
/* Ensure that the source of a broadcast is always register aligned.
* See brw_broadcast() non-scalar case for more details.
*/
if (reg_offset(value) % (REG_SIZE * reg_unit(shader->devinfo)) != 0)
value = MOV(value);
/* BROADCAST will only write a single component after lowering. Munge
* size_written here to match the allocated size of dst.
*/
exec_all().emit(SHADER_OPCODE_BROADCAST, dst, value, index)
->size_written = dst.component_size(xbld.dispatch_width());
brw/build: Use SIMD8 temporaries in emit_uniformize The fossil-db results are very different from v1. This is now mostly helpful on older platforms. v2: When optimizing BROADCAST or FIND_LIVE_CHANNEL to a simple MOV, adjust the exec_size to match the size allocated for the destination register. Fixes EU validation failures in some piglit OpenCL tests (e.g., atomic_add-global-return.cl). v3: Use component_size() in emit_uniformize and BROADCAST to properly account for UQ vs UD destination. This doesn't matter for emit_uniformize because the type is always UD, but it is technically more correct. v4: Update trace checksums. Now amly expects the same checksum as several other platforms. v5: Use xbld.dispatch_width() in the builder for when scalar_group() eventually becomes SIMD1. Suggested by Lionel. shader-db: Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown) total instructions in shared programs: 18091701 -> 18091586 (<.01%) instructions in affected programs: 29616 -> 29501 (-0.39%) helped: 28 / HURT: 18 total cycles in shared programs: 919250494 -> 919123828 (-0.01%) cycles in affected programs: 12201102 -> 12074436 (-1.04%) helped: 124 / HURT: 108 LOST: 0 GAINED: 1 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20480808 -> 20480624 (<.01%) instructions in affected programs: 58465 -> 58281 (-0.31%) helped: 61 / HURT: 20 total cycles in shared programs: 874860168 -> 874960312 (0.01%) cycles in affected programs: 18240986 -> 18341130 (0.55%) helped: 113 / HURT: 158 total spills in shared programs: 4557 -> 4555 (-0.04%) spills in affected programs: 93 -> 91 (-2.15%) helped: 1 / HURT: 0 total fills in shared programs: 5247 -> 5243 (-0.08%) fills in affected programs: 224 -> 220 (-1.79%) helped: 1 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 220486064 -> 220486959 (+0.00%); split: -0.00%, +0.00% Subgroup size: 14102592 -> 14102624 (+0.00%) Cycle count: 31602733838 -> 31604733270 (+0.01%); split: -0.01%, +0.02% Max live registers: 65371025 -> 65355084 (-0.02%) Totals from 12130 (1.73% of 702392) affected shaders: Instrs: 5162700 -> 5163595 (+0.02%); split: -0.06%, +0.08% Subgroup size: 388128 -> 388160 (+0.01%) Cycle count: 751721956 -> 753721388 (+0.27%); split: -0.54%, +0.81% Max live registers: 1538550 -> 1522609 (-1.04%) Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 241601142 -> 241599114 (-0.00%); split: -0.00%, +0.00% Subgroup size: 9631168 -> 9631216 (+0.00%) Cycle count: 25101781573 -> 25097909570 (-0.02%); split: -0.03%, +0.01% Max live registers: 41540611 -> 41514296 (-0.06%) Max dispatch width: 6993456 -> 7000928 (+0.11%); split: +0.15%, -0.05% Totals from 16852 (2.11% of 796880) affected shaders: Instrs: 6303937 -> 6301909 (-0.03%); split: -0.11%, +0.07% Subgroup size: 323592 -> 323640 (+0.01%) Cycle count: 625455880 -> 621583877 (-0.62%); split: -1.20%, +0.58% Max live registers: 1072491 -> 1046176 (-2.45%) Max dispatch width: 76672 -> 84144 (+9.75%); split: +14.04%, -4.30% Tiger Lake Totals: Instrs: 235190395 -> 235193286 (+0.00%); split: -0.00%, +0.00% Cycle count: 23130855720 -> 23128936334 (-0.01%); split: -0.02%, +0.01% Max live registers: 41644106 -> 41620052 (-0.06%) Max dispatch width: 6959160 -> 6981512 (+0.32%); split: +0.34%, -0.02% Totals from 15102 (1.90% of 793371) affected shaders: Instrs: 5771042 -> 5773933 (+0.05%); split: -0.06%, +0.11% Cycle count: 371062226 -> 369142840 (-0.52%); split: -1.04%, +0.52% Max live registers: 989858 -> 965804 (-2.43%) Max dispatch width: 61344 -> 83696 (+36.44%); split: +38.42%, -1.98% Ice Lake and Skylake had similar results. (Ice Lake shown) Totals: Instrs: 236063150 -> 236063242 (+0.00%); split: -0.00%, +0.00% Cycle count: 24516187174 -> 24516027518 (-0.00%); split: -0.00%, +0.00% Spill count: 567071 -> 567049 (-0.00%) Fill count: 701323 -> 701273 (-0.01%) Max live registers: 41914047 -> 41913281 (-0.00%) Max dispatch width: 7042608 -> 7042736 (+0.00%); split: +0.00%, -0.00% Totals from 3904 (0.49% of 798473) affected shaders: Instrs: 2809690 -> 2809782 (+0.00%); split: -0.02%, +0.03% Cycle count: 182114259 -> 181954603 (-0.09%); split: -0.34%, +0.25% Spill count: 1696 -> 1674 (-1.30%) Fill count: 2523 -> 2473 (-1.98%) Max live registers: 341695 -> 340929 (-0.22%) Max dispatch width: 32752 -> 32880 (+0.39%); split: +0.44%, -0.05% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>
2024-10-15 15:51:22 -07:00
return component(dst, 0);
}
brw_reg
LOAD_REG(const brw_reg &src0, brw_inst **out = NULL) const
{
/* LOAD_REG is a raw, bulk copy of one VGRF to another. The type is
* irrelevant. The pass that inserts LOAD_REG to encourage results to be
* defs will force all types to be integer types. Forcing the type to
* always be integer here helps with uniformity, and it will also help
* implement unit tests that want to compare two shaders for equality.
*/
brw_reg_type t = brw_type_with_size(BRW_TYPE_UD,
brw_type_size_bits(src0.type));
brw_reg dst = retype(brw_allocate_vgrf_units(*shader,
shader->alloc.sizes[src0.nr]),
t);
assert(src0.file == VGRF);
assert(shader->alloc.sizes[dst.nr] == shader->alloc.sizes[src0.nr]);
brw_inst *inst = emit(SHADER_OPCODE_LOAD_REG, dst, retype(src0, t));
inst->size_written = REG_SIZE * shader->alloc.sizes[src0.nr];
assert(shader->alloc.sizes[inst->dst.nr] * REG_SIZE == inst->size_written);
assert(!inst->is_partial_write());
if (out) *out = inst;
return retype(inst->dst, src0.type);
}
brw_shader *shader;
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
brw_inst *BREAK() const { return emit(BRW_OPCODE_BREAK); }
brw_inst *ELSE() const { return emit(BRW_OPCODE_ELSE); }
brw_inst *ENDIF() const { return emit(BRW_OPCODE_ENDIF); }
brw_inst *NOP() const { return emit(BRW_OPCODE_NOP); }
brw_inst *CONTINUE() const { return emit(BRW_OPCODE_CONTINUE); }
brw_inst *
IF(brw_predicate predicate = BRW_PREDICATE_NORMAL) const
{
return set_predicate(predicate, emit(BRW_OPCODE_IF));
}
brw_inst *
WHILE(brw_predicate predicate = BRW_PREDICATE_NONE) const
{
return set_predicate(predicate, emit(BRW_OPCODE_WHILE));
}
void
DO() const
{
intel/brw: Always have a (non-DO) block after a DO in the CFG Make the "block after DO" more stable so that adding instructions after a DO doesn't require repairing the CFG. Use a new SHADER_OPCODE_FLOW instruction that is a placeholder representing "go to the next block" and disappears at code generation. For some context, there are a few facts about how CFG currently works - Blocks are assumed to not be empty; - DO is always by itself in a block, i.e. starts and ends a block; - There are no empty blocks; - Predicated WHILE and CONTINUE will link to the "block after DO"; - When nesting loops, it is possible that the "block after DO" is another "DO". Reasons and further explanations for those are in the brw_cfg.c comments. What makes this new change useful is that a pass might want to add instructions between two DO instructions. When that happens, a new block must be created and any predicated WHILE and CONTINUE must be repaired. So, instead of requiring a repair (which has proven to be tricky in the past), this change adds a block that can be "virtually" empty but allow instructions to be added without further changes. One alternative design would be allowing empty blocks, that would be a deeper change since the blocks are currently assumed to be not empty in various places. We'll save that for when other changes are made to the CFG. The problem described happens in brw_opt_combine_constants, and a different patch will clean that up. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33536>
2025-02-13 18:38:00 -08:00
emit(BRW_OPCODE_DO);
/* Ensure that there'll always be a block after DO to add
* instructions and serve as sucessor for predicated WHILE
* and CONTINUE.
*
* See more details in brw_cfg::validate().
*/
emit(SHADER_OPCODE_FLOW);
}
bool has_writemask_all() const {
return force_writemask_all;
}
brw: use transpose unspill messages when possible This simplifies the unspill messages quite a bit. A/B testing on DG2 : BlackOps3 : +0.96% TotalWarPharaoh: +0.31% DG2 shader changes : Assassin's Creed Valhalla: Totals from 19 (0.89% of 2131) affected shaders: Instrs: 70542 -> 64369 (-8.75%) Cycle count: 18810945 -> 18560169 (-1.33%); split: -1.40%, +0.06% Black Ops 3: Totals from 55 (3.41% of 1612) affected shaders: Instrs: 389549 -> 350646 (-9.99%) Cycle count: 344168275 -> 340652311 (-1.02%); split: -1.17%, +0.15% Control: Totals from 1 (0.11% of 878) affected shaders: Instrs: 3409 -> 3212 (-5.78%) Cycle count: 255991 -> 250411 (-2.18%) Cyberpunk 2077: Totals from 1 (0.08% of 1264) affected shaders: Instrs: 2363 -> 2337 (-1.10%) Cycle count: 69283 -> 69186 (-0.14%) Fallout 4: Totals from 1 (0.06% of 1601) affected shaders: Instrs: 27946 -> 20056 (-28.23%) Cycle count: 2391398 -> 2153658 (-9.94%) Fortnite: Totals from 273 (3.65% of 7470) affected shaders: Instrs: 634377 -> 601519 (-5.18%) Cycle count: 31870433 -> 31624089 (-0.77%); split: -0.78%, +0.01% Hogwarts Legacy: Totals from 50 (3.02% of 1656) affected shaders: Instrs: 110455 -> 103339 (-6.44%) Cycle count: 6613728 -> 6530832 (-1.25%); split: -1.28%, +0.03% Metro Exodus: Totals from 70 (0.16% of 43076) affected shaders: Instrs: 253847 -> 245321 (-3.36%) Cycle count: 13269473 -> 13209131 (-0.45%) Spill count: 1111 -> 1108 (-0.27%) Fill count: 2868 -> 2865 (-0.10%) Red Dead Redemption 2: Totals from 139 (2.38% of 5847) affected shaders: Instrs: 496551 -> 450180 (-9.34%) Cycle count: 43233944 -> 40947386 (-5.29%); split: -5.33%, +0.04% Spill count: 6322 -> 6326 (+0.06%) Fill count: 15558 -> 15568 (+0.06%) Rise Of The Tomb Raider: Totals from 1 (0.56% of 178) affected shaders: Instrs: 1682 -> 1437 (-14.57%) Cycle count: 603670 -> 586766 (-2.80%) Spiderman Remastered: Totals from 820 (11.77% of 6965) affected shaders: Instrs: 4622877 -> 3984893 (-13.80%) Cycle count: 235094963186 -> 234483925430 (-0.26%); split: -0.42%, +0.16% Spill count: 73414 -> 73581 (+0.23%); split: -0.02%, +0.25% Fill count: 215090 -> 215627 (+0.25%); split: -0.02%, +0.27% Scratch Memory Size: 3520512 -> 3528704 (+0.23%); split: -0.12%, +0.35% Some of stats show spilling changes which is telling of how our spill code is not adequate. Some of the spilled values are probably being respilled which shouldn't be the case. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32110>
2024-11-13 11:26:53 +02:00
private:
/**
* Workaround for negation of UD registers. See comment in
* brw_generator::generate_code() for more details.
*/
brw_reg
fix_unsigned_negate(const brw_reg &src) const
{
if (src.type == BRW_TYPE_UD &&
src.negate) {
brw_reg temp = vgrf(BRW_TYPE_UD);
MOV(temp, src);
return brw_reg(temp);
} else {
return src;
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
}
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/**
* Workaround for source register modes not supported by the ternary
* instruction encoding.
*/
brw_reg
fix_3src_operand(const brw_reg &src) const
{
switch (src.file) {
case FIXED_GRF:
/* FINISHME: Could handle scalar region, other stride=1 regions */
if (src.vstride != BRW_VERTICAL_STRIDE_8 ||
src.width != BRW_WIDTH_8 ||
src.hstride != BRW_HORIZONTAL_STRIDE_1)
break;
FALLTHROUGH;
case ATTR:
case VGRF:
case UNIFORM:
case IMM:
return src;
default:
break;
}
brw_reg expanded = vgrf(src.type);
MOV(expanded, src);
return expanded;
}
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
void shuffle_from_32bit_read(const brw_reg &dst,
const brw_reg &src,
uint32_t first_component,
uint32_t components) const;
bblock_t *block;
brw_exec_node *cursor;
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
unsigned _dispatch_width;
unsigned _group;
bool force_writemask_all;
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
/** Debug annotation info. */
struct {
const char *str;
} annotation;
};
i965/fs: Introduce FS IR builder. The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
brw: Basic infrastructure to store convergent values as scalars In SIMD16 and SIMD32, storing convergent values in full 16- or 32-channel registers is wasteful. It wastes register space, and in most cases on SIMD32, it wastes instructions. Our register allocator is not clever enough to handle scalar allocations. It's fundamental unit of allocation is SIMD8. Start treating convergent values as SIMD8. Add a tracking bit in brw_reg to specify that a register represents a convergent, scalar value. This has two implications: 1. All channels of the SIMD8 register must contain the same value. In general, this means that writes to the register must be force_writemask_all and exec_size = 8; 2. Reads of this register can (and should) use <0,1,0> stride. SIMD8 instructions that have restrictions on source stride can us <8,8,1>. Values that are vectors (e.g., results of load_uniform or texture operations) will be stored as multiple SIMD8 hardware registers. v2: brw_fs_opt_copy_propagation_defs fix from Ken. Fix for Xe2. v3: Eliminte offset_to_scalar(). Remove mention of vec4 backend in brw_reg.h. Both suggested by Caio. The offset_to_scalar() change necessitates some trickery in the fs_builder offset() function, but I think this is an improvement overall. There is also some rework in find_value_for_offset to account for the possibility that is_scalar sources in LOAD_PAYLOAD might be <8;8,1> or <0;1,0>. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-02-09 17:12:11 -08:00
/**
* Offset by a number of components into a VGRF
*
* It is assumed that the VGRF represents a vector (e.g., returned by
* load_uniform or a texture operation). Convergent and divergent values are
* stored differently, so care must be taken to offset properly.
*/
static inline brw_reg
offset(const brw_reg &reg, const brw_builder &bld, unsigned delta)
{
brw: Basic infrastructure to store convergent values as scalars In SIMD16 and SIMD32, storing convergent values in full 16- or 32-channel registers is wasteful. It wastes register space, and in most cases on SIMD32, it wastes instructions. Our register allocator is not clever enough to handle scalar allocations. It's fundamental unit of allocation is SIMD8. Start treating convergent values as SIMD8. Add a tracking bit in brw_reg to specify that a register represents a convergent, scalar value. This has two implications: 1. All channels of the SIMD8 register must contain the same value. In general, this means that writes to the register must be force_writemask_all and exec_size = 8; 2. Reads of this register can (and should) use <0,1,0> stride. SIMD8 instructions that have restrictions on source stride can us <8,8,1>. Values that are vectors (e.g., results of load_uniform or texture operations) will be stored as multiple SIMD8 hardware registers. v2: brw_fs_opt_copy_propagation_defs fix from Ken. Fix for Xe2. v3: Eliminte offset_to_scalar(). Remove mention of vec4 backend in brw_reg.h. Both suggested by Caio. The offset_to_scalar() change necessitates some trickery in the fs_builder offset() function, but I think this is an improvement overall. There is also some rework in find_value_for_offset to account for the possibility that is_scalar sources in LOAD_PAYLOAD might be <8;8,1> or <0;1,0>. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-02-09 17:12:11 -08:00
/* If the value is convergent (stored as one or more SIMD8), offset using
* SIMD8 and select component 0.
*/
if (reg.is_scalar) {
const unsigned allocation_width = 8 * reg_unit(bld.shader->devinfo);
brw_reg offset_reg = offset(reg, allocation_width, delta);
/* If the dispatch width is larger than the allocation width, that
* implies that the register can only be used as a source. Otherwise the
* instruction would write past the allocation size of the register.
*/
if (bld.dispatch_width() > allocation_width)
return component(offset_reg, 0);
else
return offset_reg;
}
/* Offset to the component assuming the value was allocated in
* dispatch_width units.
*/
return offset(reg, bld.dispatch_width(), delta);
}
brw_reg brw_sample_mask_reg(const brw_builder &bld);
void brw_emit_predicate_on_sample_mask(const brw_builder &bld, brw_inst *inst);
brw_reg
brw_fetch_payload_reg(const brw_builder &bld, uint8_t regs[2],
brw_reg_type type = BRW_TYPE_F,
unsigned n = 1);
brw_reg
brw_fetch_barycentric_reg(const brw_builder &bld, uint8_t regs[2]);
void
brw_check_dynamic_msaa_flag(const brw_builder &bld,
const struct brw_wm_prog_data *wm_prog_data,
enum intel_msaa_flags flag);