i965/fs: Introduce FS IR builder.
The purpose of this change is threefold: First, it improves the
modularity of the compiler back-end by separating the functionality
required to construct an i965 IR program from the rest of the visitor
god-object, which in turn will reduce the coupling between other
components and the visitor allowing a more modular design. This patch
doesn't yet remove the equivalent functionality from the visitor
classes, as it involves major back-end surgery.
Second, it improves consistency between the scalar and vector
back-ends. The FS and VEC4 builders can both be used to generate
scalar code with a compatible interface or they can be used to
generate natural vector width code -- 1 or 4 components respectively.
Third, the approach to IR construction is somewhat different to what
the visitor classes currently do. All parameters affecting code
generation (execution size, half control, point in the program where
new instructions are inserted, etc.) are encapsulated in a stand-alone
object rather than being quasi-global state (yes, anything defined in
one of the visitor classes is effectively global due to the tight
coupling with virtually everything else in the compiler back-end).
This object is lightweight and can be copied, mutated and passed
around, making helper IR-building functions more flexible because they
can now simply take a builder object as argument and will inherit its
IR generation properties in exactly the same way that a discrete
instruction would from the same builder object.
The emit_typed_write() function from my image-load-store branch is an
example that illustrates the usefulness of the latter point: Due to
hardware limitations the function may have to split the untyped
surface message in 8-wide chunks. That means that the several
functions called to help with the construction of the message payload
are themselves required to set the execution width and half control
correctly on the instructions they emit, and to allocate all registers
with half the default width. With the previous approach this would
require the used helper functions to be aware of the parameters that
might differ from the default state and explicitly set the instruction
bits accordingly. With the new approach they would get a modified
builder object as argument that would influence all instructions
emitted by the helper function as if it were the default state.
Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD()
method. It doesn't actually emit any instructions; they are simply
created and inserted into an exec_list which is returned for the
caller to emit at some location of the program. This sort of two-step
emission becomes unnecessary with the builder interface because the
insertion point is one more of the code generation parameters which
are part of the builder object. The caller can simply pass
VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the
location of the program where the effect of the constant load is
desired. This two-step emission (which pervades the compiler back-end
and is in most cases redundant) goes away: E.g. ADD() now actually
adds two registers rather than just creating an ADD instruction in
memory, so emit(ADD()) is no longer necessary.
v2: Drop scalarizing VEC4 builder.
v3: Take a backend_shader as constructor argument. Improve handling
of debug annotations and execution control flags.
v4: Drop Gen6 IF with inline comparison. Rename "instr" variable.
Initialize cursor to NULL by default and add method to explicitly
point the builder at the end of the program.
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-22 14:02:47 +03:00
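
The idea described in the commit message above can be illustrated with a minimal, self-contained sketch. It is not the Mesa implementation: `toy_inst`, `toy_program`, `toy_builder` and `emit_payload()` are hypothetical names invented for this example, and a `std::list` stands in for the real instruction list. It only shows how a small copyable builder can carry the execution width, channel group and insertion point, so that a helper inherits them from its argument and instructions are inserted as soon as they are created.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-ins for illustration only; none of these exist in Mesa.
struct toy_inst {
   std::string opcode;
   unsigned exec_size;  // SIMD width the instruction executes with
   unsigned group;      // index of the first channel it covers
};

using toy_program = std::list<toy_inst>;

class toy_builder {
public:
   // By default the builder appends at the end of the program.
   toy_builder(toy_program &program, unsigned width)
      : prog(&program), cursor(program.end()), width(width), grp(0) {}

   // Copy of this builder that inserts before `pos`.
   toy_builder at(toy_program::iterator pos) const {
      toy_builder bld = *this;
      bld.cursor = pos;
      return bld;
   }

   // Copy of this builder restricted to the i-th group of n channels.
   toy_builder group(unsigned n, unsigned i) const {
      assert(n <= width && i < width / n);
      toy_builder bld = *this;
      bld.grp += i * n;
      bld.width = n;
      return bld;
   }

   // Creates the instruction and inserts it at the cursor in a single step.
   toy_program::iterator emit(const std::string &opcode) const {
      return prog->insert(cursor, toy_inst{opcode, width, grp});
   }

private:
   toy_program *prog;
   toy_program::iterator cursor;
   unsigned width;
   unsigned grp;
};

// A helper only needs a builder; it inherits whatever width, group and
// insertion point the caller configured.
static void emit_payload(const toy_builder &bld) {
   bld.emit("MOV");
   bld.emit("ADD");
}
```

With a 16-wide `toy_builder bld`, a call such as `emit_payload(bld.at(pos).group(8, 1))` would insert both instructions before `pos` with width 8 and channel offset 8, without `emit_payload()` having to know about either setting.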
/* -*- c++ -*- */
/*
 * Copyright © 2010-2015 Intel Corporation
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 * IN THE SOFTWARE.
 */

#pragma once

#include "brw_eu.h"
#include "brw_shader.h"
#include "brw_inst.h"

static inline brw_reg offset(const brw_reg &, const brw_builder &,
brw: Basic infrastructure to store convergent values as scalars
In SIMD16 and SIMD32, storing convergent values in full 16- or
32-channel registers is wasteful. It wastes register space, and in most
cases on SIMD32, it wastes instructions. Our register allocator is not
clever enough to handle scalar allocations. Its fundamental unit of
allocation is SIMD8. Start treating convergent values as SIMD8.
Add a tracking bit in brw_reg to specify that a register represents a
convergent, scalar value. This has two implications:
1. All channels of the SIMD8 register must contain the same value. In
general, this means that writes to the register must be
force_writemask_all and exec_size = 8;
2. Reads of this register can (and should) use <0,1,0> stride. SIMD8
instructions that have restrictions on source stride can use <8,8,1>.
Values that are vectors (e.g., results of load_uniform or texture
operations) will be stored as multiple SIMD8 hardware registers.
v2: brw_fs_opt_copy_propagation_defs fix from Ken. Fix for Xe2.
v3: Eliminate offset_to_scalar(). Remove mention of vec4 backend in
brw_reg.h. Both suggested by Caio. The offset_to_scalar() change
necessitates some trickery in the fs_builder offset() function, but I
think this is an improvement overall. There is also some rework in
find_value_for_offset to account for the possibility that is_scalar
sources in LOAD_PAYLOAD might be <8;8,1> or <0;1,0>.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-02-09 17:12:11 -08:00
                             unsigned);

/**
 * Toolbox to assemble a BRW IR program out of individual instructions.
 */
class brw_builder {
public:
   /**
    * Construct a brw_builder that inserts instructions
    * at the end of \p shader. The optional \p dispatch_width
    * gives the execution width to be used instead of the
    * shader's original dispatch_width.
    */
   brw_builder(brw_shader *shader,
               unsigned dispatch_width = 0) :
      shader(shader), block(NULL), cursor(NULL),
      _dispatch_width(dispatch_width ? dispatch_width : shader->dispatch_width),
      _group(0),
      force_writemask_all(false),
      annotation()
   {
      if (shader->cfg && shader->cfg->num_blocks > 0) {
         block = shader->cfg->last_block();
         cursor = &block->instructions.tail_sentinel;
      } else {
         cursor = (brw_exec_node *)&shader->instructions.tail_sentinel;
      }
   }

   /**
    * Construct a brw_builder that inserts instructions before
    * instruction \p inst in the same basic block. The default
    * execution controls and debug annotation are initialized from the
    * instruction passed as argument.
    */
   explicit brw_builder(brw_inst *inst) :
      shader(inst->block->cfg->s), block(inst->block), cursor(inst),
      _dispatch_width(inst->exec_size),
      _group(inst->group),
      force_writemask_all(inst->force_writemask_all)
   {
#ifndef NDEBUG
      annotation.str = inst->annotation;
#else
      annotation.str = NULL;
#endif
   }

   brw_builder
   at_start(bblock_t *block) const
   {
      brw_builder bld = *this;
      bld.block = block;
      bld.cursor = block->instructions.head_sentinel.next;
      return bld;
   }

   brw_builder
   at_end(bblock_t *block) const
   {
      brw_builder bld = *this;
      bld.block = block;
      bld.cursor = &block->instructions.tail_sentinel;
      return bld;
   }

   brw_builder
   before(brw_inst *ref) const
   {
      brw_builder bld = *this;
      bld.block = ref->block;
      bld.cursor = ref;
      return bld;
   }

   brw_builder
   after(brw_inst *ref) const
   {
      brw_builder bld = *this;
      bld.block = ref->block;
      bld.cursor = ref->next;
      return bld;
   }

   brw_builder
   after_block_before_control_flow(bblock_t *block) const
   {
      brw_builder bld = *this;
      bld.block = block;
      bld.cursor = block->last_non_control_flow_inst()->next;
      return bld;
   }

   /**
    * Construct a builder specifying the default SIMD width and group of
    * channel enable signals, inheriting other code generation parameters
    * from this.
    *
    * \p n gives the default SIMD width, \p i gives the slot group used for
    * predication and control flow masking in multiples of \p n channels.
    */
   brw_builder
   group(unsigned n, unsigned i) const
   {
      brw_builder bld = *this;

      if (n <= dispatch_width() && i < dispatch_width() / n) {
         bld._group += i * n;
      } else {
         /* The requested channel group isn't a subset of the channel group
          * of this builder, which means that the resulting instructions
          * would use (potentially undefined) channel enable signals not
          * specified by the parent builder. That's only valid if the
          * instruction doesn't have per-channel semantics, in which case
          * we should clear off the default group index in order to prevent
          * emitting instructions with channel group not aligned to their
          * own execution size.
          */
         assert(force_writemask_all);
         bld._group = 0;
      }

      bld._dispatch_width = n;
      return bld;
   }


   /**
    * Alias for group() with width equal to eight.
    */
   brw_builder
   quarter(unsigned i) const
   {
      return group(8, i);
   }
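The channel arithmetic behind group() and quarter() can be sketched with a stand-alone toy type. This is only an illustration: `toy_builder` is a hypothetical name, and the first half of `group()` (the `_group += i * n` update and the assertion) is an assumption about code that is not shown in this section. Only the trailing `_dispatch_width = n` assignment and the `quarter()` alias come from the text above.

```cpp
#include <cassert>

// Hypothetical stand-in for the builder: only the channel-group state is modelled.
struct toy_builder {
   unsigned _dispatch_width = 16;    // SIMD width of the instructions to emit
   unsigned _group = 0;              // index of the first channel covered
   bool force_writemask_all = false; // true when per-channel masking is disabled

   // Select the i-th group of n channels relative to this builder.
   toy_builder
   group(unsigned n, unsigned i) const
   {
      toy_builder bld = *this;

      if (n <= _dispatch_width && i < _dispatch_width / n) {
         // Assumed behaviour: the requested group is a subset of ours,
         // so offset the first channel accordingly.
         bld._group += i * n;
      } else {
         // Assumed behaviour: anything else is only accepted when
         // per-channel masking is already disabled.
         assert(force_writemask_all);
      }

      bld._dispatch_width = n;
      return bld;
   }

   // Alias for group() with width equal to eight.
   toy_builder
   quarter(unsigned i) const
   {
      return group(8, i);
   }
};
```

Under these assumptions, calling `quarter(1)` on a 16-wide toy builder yields a copy covering channels 8 to 15, while the original builder keeps its width and group.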

   /**
    * Construct a builder with per-channel control flow execution masking
    * disabled if \p b is true.  If control flow execution masking is
    * already disabled this has no effect.
    */
   brw_builder
   exec_all(bool b = true) const
   {
      brw_builder bld = *this;
      if (b)
         bld.force_writemask_all = true;
      return bld;
   }
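The copy-and-modify pattern used by exec_all() can be tried in isolation. The sketch below uses a hypothetical `toy_exec_builder` with only the fields needed here, and its `group()` is a simplified assumption rather than the real implementation. It shows two properties stated in the comment above: the original builder is never mutated, and `exec_all(false)` does not re-enable masking.

```cpp
// Hypothetical stand-in for the builder: only masking and width state is modelled.
struct toy_exec_builder {
   unsigned _dispatch_width = 16;
   unsigned _group = 0;
   bool force_writemask_all = false;

   // Return a copy with per-channel masking disabled if b is true.
   toy_exec_builder
   exec_all(bool b = true) const
   {
      toy_exec_builder bld = *this;
      if (b)
         bld.force_writemask_all = true;
      return bld;
   }

   // Simplified assumption: pick the i-th group of n channels.
   toy_exec_builder
   group(unsigned n, unsigned i) const
   {
      toy_exec_builder bld = *this;
      if (n <= _dispatch_width && i < _dispatch_width / n)
         bld._group += i * n;
      bld._dispatch_width = n;
      return bld;
   }

   // SIMD1 builder, composed the same way as uniform() in this file.
   toy_exec_builder
   uniform() const
   {
      return exec_all().group(1, 0);
   }
};
```

Because every method returns a modified copy, calls chain naturally, which is how `uniform()` is composed from `exec_all()` and `group(1, 0)`.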

   /**
    * Construct a builder for SIMD1 operations.
    */
   brw_builder
   uniform() const
   {
      return exec_all().group(1, 0);
   }

   /**
    * Construct a builder for SIMD8-as-scalar
    */
   brw_builder
   scalar_group() const
   {
      return exec_all().group(8 * reg_unit(shader->devinfo), 0);
   }

   /**
    * Construct a builder with the given debug annotation info.
    */
   brw_builder
   annotate(const char *str) const
   {
      brw_builder bld = *this;
      bld.annotation.str = str;
      return bld;
   }

   /**
    * Get the SIMD width in use.
    */
   unsigned
   dispatch_width() const
   {
      return _dispatch_width;
   }

   /**
    * Get the channel group in use.
    */
   unsigned
   group() const
   {
      return _group;
   }

   /**
    * Allocate a virtual register of natural vector size (one for this IR)
    * and SIMD width.  \p n gives the amount of space to allocate in
    * dispatch_width units (which is just enough space for one logical
    * component in this IR).
    */
   brw_reg
   vgrf(enum brw_reg_type type, unsigned n = 1) const
   {
      assert(dispatch_width() <= 32);

      if (n > 0)
         return brw_allocate_vgrf(*shader, type, n * dispatch_width());
      else
         return retype(null_reg_ud(), type);
   }
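The sizing rule in vgrf() (request `n * dispatch_width()` components, or hand back a null register when `n` is zero) can be illustrated with a toy allocator. Everything named `toy_*` below is hypothetical. The real `brw_allocate_vgrf()` also takes the register type and may use different size units, so treat this as a sketch of the arithmetic only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical shader state: one entry per allocated register,
// holding the number of components requested for it.
struct toy_shader {
   std::vector<unsigned> vgrf_sizes;
};

// Hypothetical register handle.
struct toy_reg {
   bool is_null;
   unsigned nr;
};

struct toy_alloc_builder {
   toy_shader *shader;
   unsigned _dispatch_width;

   // Allocate room for n logical components at the current SIMD width.
   toy_reg
   vgrf(unsigned n = 1) const
   {
      assert(_dispatch_width <= 32);

      if (n > 0) {
         shader->vgrf_sizes.push_back(n * _dispatch_width);
         return toy_reg{false, unsigned(shader->vgrf_sizes.size() - 1)};
      } else {
         // Nothing requested: return a null register, allocate nothing.
         return toy_reg{true, 0};
      }
   }
};
```

With a 16-wide toy builder, `vgrf(2)` records a request for 32 components, and `vgrf(0)` leaves the allocation list unchanged.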

   brw_reg
   vaddr(enum brw_reg_type type, unsigned subnr) const
   {
      brw_reg addr = brw_address_reg(subnr);
      addr.nr = shader->next_address_register_nr++;
      return retype(addr, type);
   }

   /**
    * Create a null register of floating type.
    */
   brw_reg
   null_reg_f() const
   {
      return brw_reg(retype(brw_null_reg(), BRW_TYPE_F));
   }

   brw_reg
   null_reg_df() const
   {
      return brw_reg(retype(brw_null_reg(), BRW_TYPE_DF));
   }

   /**
    * Create a null register of signed integer type.
    */
   brw_reg
   null_reg_d() const
   {
      return brw_reg(retype(brw_null_reg(), BRW_TYPE_D));
   }

   /**
    * Create a null register of unsigned integer type.
    */
   brw_reg
   null_reg_ud() const
   {
      return brw_reg(retype(brw_null_reg(), BRW_TYPE_UD));
   }
   /**
    * Insert an instruction into the program.
    */
   brw_inst *
   emit(const brw_inst &inst) const
   {
      return emit(new(shader->mem_ctx) brw_inst(inst));
   }

   /**
    * Create and insert a nullary control instruction into the program.
    */
   brw_inst *
   emit(enum opcode opcode) const
   {
      return emit(brw_inst(opcode, dispatch_width()));
   }

   /**
    * Create and insert a nullary instruction into the program.
    */
   brw_inst *
   emit(enum opcode opcode, const brw_reg &dst) const
   {
      return emit(brw_inst(opcode, dispatch_width(), dst));
   }

   /**
    * Create and insert a unary instruction into the program.
    */
   brw_inst *
   emit(enum opcode opcode, const brw_reg &dst, const brw_reg &src0) const
   {
      return emit(brw_inst(opcode, dispatch_width(), dst, src0));
   }

   /**
    * Create and insert a binary instruction into the program.
    */
   brw_inst *
   emit(enum opcode opcode, const brw_reg &dst, const brw_reg &src0,
        const brw_reg &src1) const
   {
      return emit(brw_inst(opcode, dispatch_width(), dst,
                           src0, src1));
   }

   /**
    * Create and insert a ternary instruction into the program.
    */
   brw_inst *
   emit(enum opcode opcode, const brw_reg &dst, const brw_reg &src0,
        const brw_reg &src1, const brw_reg &src2) const
   {
      switch (opcode) {
      case BRW_OPCODE_BFE:
      case BRW_OPCODE_BFI2:
      case BRW_OPCODE_MAD:
      case BRW_OPCODE_LRP: {
         brw_reg fixed0 = fix_3src_operand(src0);
         brw_reg fixed1 = fix_3src_operand(src1);
         brw_reg fixed2 = fix_3src_operand(src2);
         return emit(brw_inst(opcode, dispatch_width(), dst,
                              fixed0, fixed1, fixed2));
      }
      default:
         return emit(brw_inst(opcode, dispatch_width(), dst,
                              src0, src1, src2));
      }
   }

   /**
    * Create and insert an instruction with a variable number of sources
    * into the program.
    */
   brw_inst *
   emit(enum opcode opcode, const brw_reg &dst, const brw_reg srcs[],
        unsigned n) const
   {
      /* Use the emit() methods for specific operand counts to ensure that
       * opcode-specific operand fixups occur.
       */
      if (n == 3) {
         return emit(opcode, dst, srcs[0], srcs[1], srcs[2]);
      } else {
         return emit(brw_inst(opcode, dispatch_width(), dst, srcs, n));
      }
   }

   /**
    * Insert a preallocated instruction into the program.
    */
   brw_inst *
   emit(brw_inst *inst) const
   {
      assert(inst->exec_size <= 32);
      assert(inst->exec_size == dispatch_width() ||
             force_writemask_all);

      inst->group = _group;
      inst->force_writemask_all = force_writemask_all;
#ifndef NDEBUG
      inst->annotation = annotation.str;
#endif

      if (block)
         block->insert_before(inst, cursor);
      else
         cursor->insert_before(inst);

      return inst;
   }
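The insertion logic above amounts to placing the new instruction immediately before the cursor. The following is a minimal, self-contained sketch of that idea and not the Mesa implementation: `ToyInst`, `ToyBuilder`, `at()`, `group()` and `emit_payload()` are hypothetical names, and a `std::list` stands in for the IR instruction list. It only illustrates how a small, copyable builder can carry the insertion point and execution width so that helper functions inherit both.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction.
struct ToyInst {
   std::string opcode;
   unsigned exec_size;
};

// Hypothetical builder: a cheap-to-copy value holding the code generation
// parameters (here only the insertion point and the execution width).
class ToyBuilder {
public:
   ToyBuilder(std::list<ToyInst> &program, unsigned width)
      : prog(&program), cursor(program.end()), width(width) {}

   // Return a copy that inserts before the given position.
   ToyBuilder at(std::list<ToyInst>::iterator pos) const
   {
      ToyBuilder bld = *this;
      bld.cursor = pos;
      return bld;
   }

   // Return a copy with a different execution width.
   ToyBuilder group(unsigned n) const
   {
      ToyBuilder bld = *this;
      bld.width = n;
      return bld;
   }

   // Insert a new instruction immediately before the cursor.
   std::list<ToyInst>::iterator emit(const std::string &opcode) const
   {
      return prog->insert(cursor, ToyInst{opcode, width});
   }

private:
   std::list<ToyInst> *prog;
   std::list<ToyInst>::iterator cursor;
   unsigned width;
};

// A helper only needs the builder; it inherits width and insertion point.
inline void emit_payload(const ToyBuilder &bld)
{
   bld.emit("MOV");
   bld.emit("ADD");
}

// Collect the opcodes of a program in order, for inspection.
inline std::vector<std::string> opcodes(const std::list<ToyInst> &prog)
{
   std::vector<std::string> out;
   for (const ToyInst &inst : prog)
      out.push_back(inst.opcode);
   return out;
}
```

A caller that wants the payload set up before an existing instruction at half the width could write `emit_payload(bld.at(pos).group(8))` in this toy model; the helper itself needs no knowledge of either parameter.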

   /**
    * Select \p src0 if the comparison of both sources with the given
    * conditional mod evaluates to true, otherwise select \p src1.
    *
    * Generally useful to get the minimum or maximum of two values.
    */
   brw_inst *
   emit_minmax(const brw_reg &dst, const brw_reg &src0,
               const brw_reg &src1, brw_conditional_mod mod) const
   {
      assert(mod == BRW_CONDITIONAL_GE || mod == BRW_CONDITIONAL_L);

      /* In some cases we can't have bytes as operand for src1, so use the
       * same type for both operands.
       */
      return set_condmod(mod, SEL(dst, fix_unsigned_negate(src0),
                                  fix_unsigned_negate(src1)));
   }
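As a rough illustration of the doc comment on emit_minmax(): a select with conditional modifier L yields the minimum of its two sources, and one with GE yields the maximum. The sketch below models that per-channel selection on plain integers. `ToyCondMod` and `toy_sel` are hypothetical names, not part of the Mesa API, and source modifiers and floating-point NaN behavior are ignored.

```cpp
#include <cassert>

// Hypothetical model of the two conditional mods accepted by emit_minmax().
enum class ToyCondMod { GE, L };

// Select src0 if "src0 <mod> src1" holds, otherwise select src1.
inline int toy_sel(ToyCondMod mod, int src0, int src1)
{
   const bool take_src0 = (mod == ToyCondMod::GE) ? (src0 >= src1)
                                                  : (src0 < src1);
   return take_src0 ? src0 : src1;
}
```

In this model `toy_sel(ToyCondMod::L, a, b)` behaves like `min(a, b)` and `toy_sel(ToyCondMod::GE, a, b)` like `max(a, b)`.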

   /**
    * Copy any live channel from \p src to the first channel of the result.
    */
   brw_reg
   emit_uniformize(const brw_reg &src) const
   {
      /* Trivial: skip unnecessary work and retain IMM */
      if (src.file == IMM)
         return src;

      /* FIXME: We use a vector chan_index and dst to allow constant and
       * copy propagation to move result all the way into the consuming
       * instruction (typically a surface index or sampler index for a
       * send). Once we teach const/copy propagation about scalars we
       * should go back to scalar destinations here.
       */
      const brw_builder xbld = scalar_group();
      const brw_reg chan_index = xbld.vgrf(BRW_TYPE_UD);

      /* FIND_LIVE_CHANNEL will only write a single component after
       * lowering. Munge size_written here to match the allocated size of
       * chan_index.
       */
      exec_all().emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index)
         ->size_written = chan_index.component_size(xbld.dispatch_width());

      return BROADCAST(src, component(chan_index, 0));
   }
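The two steps in emit_uniformize() (find the index of a live channel, then broadcast the value held by that channel) can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the hardware or Mesa behavior: the execution mask is modeled as a `uint32_t` bitfield, the lowest enabled channel is the one picked, and `toy_find_live_channel` and `toy_uniformize` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Index of the lowest enabled channel in the mask, or 0 if none is enabled.
inline unsigned toy_find_live_channel(uint32_t exec_mask)
{
   for (unsigned i = 0; i < 32; i++) {
      if (exec_mask & (1u << i))
         return i;
   }
   return 0;
}

// Return the value held by one live channel of src, i.e. the value that a
// broadcast from that channel would place in the result.
inline uint32_t toy_uniformize(const std::vector<uint32_t> &src,
                               uint32_t exec_mask)
{
   return src.at(toy_find_live_channel(exec_mask));
}
```

The result is only meaningful as a uniform value if all live channels of the source already hold the same value, or if any one of them is acceptable to the caller.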

   brw_reg
   move_to_vgrf(const brw_reg &src, unsigned num_components) const
   {
      brw_reg *const src_comps = new brw_reg[num_components];
brw: Basic infrastructure to store convergent values as scalars
In SIMD16 and SIMD32, storing convergent values in full 16- or
32-channel registers is wasteful. It wastes register space, and in most
cases on SIMD32, it wastes instructions. Our register allocator is not
clever enough to handle scalar allocations. Its fundamental unit of
allocation is SIMD8. Start treating convergent values as SIMD8.
Add a tracking bit in brw_reg to specify that a register represents a
convergent, scalar value. This has two implications:
1. All channels of the SIMD8 register must contain the same value. In
general, this means that writes to the register must be
force_writemask_all and exec_size = 8;
2. Reads of this register can (and should) use <0;1,0> stride. SIMD8
instructions that have restrictions on source stride can use <8;8,1>.
Values that are vectors (e.g., results of load_uniform or texture
operations) will be stored as multiple SIMD8 hardware registers.
v2: brw_fs_opt_copy_propagation_defs fix from Ken. Fix for Xe2.
v3: Eliminate offset_to_scalar(). Remove mention of vec4 backend in
brw_reg.h. Both suggested by Caio. The offset_to_scalar() change
necessitates some trickery in the fs_builder offset() function, but I
think this is an improvement overall. There is also some rework in
find_value_for_offset to account for the possibility that is_scalar
sources in LOAD_PAYLOAD might be <8;8,1> or <0;1,0>.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-02-09 17:12:11 -08:00

      for (unsigned i = 0; i < num_components; i++)
         src_comps[i] = offset(src, *this, i);

      const brw_reg dst = vgrf(src.type, num_components);
      LOAD_PAYLOAD(dst, src_comps, num_components, 0);

      delete[] src_comps;

      return brw_reg(dst);
   }

   brw_inst *
   emit_undef_for_dst(const brw_inst *old_inst) const
   {
      assert(old_inst->dst.file == VGRF);
      brw_inst *inst = emit(SHADER_OPCODE_UNDEF,
                            retype(old_inst->dst, BRW_TYPE_UD));
      inst->size_written = old_inst->size_written;

      return inst;
   }

   /**
    * Emit UNDEF for the given register if its data doesn't fully occupy
    * the space we allocated.
    */
   void
   emit_undef_for_partial_reg(const brw_reg &reg) const
   {
      if (brw_type_size_bytes(reg.type) * dispatch_width() < REG_SIZE)
         UNDEF(reg);
   }

   /**
    * Assorted arithmetic ops.
    * @{
    */
#define _ALU1(prefix, op) \
   brw_inst * \
   op(const brw_reg &dst, const brw_reg &src0) const \
   { \
      assert(_dispatch_width == 1 || \
             (dst.file >= VGRF && dst.stride != 0) || \
             (dst.file < VGRF && dst.hstride != 0)); \
      return emit(prefix##op, dst, src0); \
   } \
   brw_reg \
   op(const brw_reg &src0, brw_inst **out = NULL) const \
   { \
      brw_reg dst = vgrf(src0.type); \
      emit_undef_for_partial_reg(dst); \
      brw_inst *inst = op(dst, src0); \
      if (out) *out = inst; \
      return inst->dst; \
   }
#define ALU1(op) _ALU1(BRW_OPCODE_, op)
#define VIRT1(op) _ALU1(SHADER_OPCODE_, op)

   brw_inst *
   alu2(opcode op, const brw_reg &dst, const brw_reg &src0, const brw_reg &src1) const
   {
      return emit(op, dst, src0, src1);
   }
   brw_reg
   alu2(opcode op, const brw_reg &src0, const brw_reg &src1, brw_inst **out = NULL) const
   {
      enum brw_reg_type inferred_dst_type =
         brw_type_larger_of(src0.type, src1.type);
      brw_reg dst = vgrf(inferred_dst_type);
      emit_undef_for_partial_reg(dst);
      brw_inst *inst = alu2(op, dst, src0, src1);
      if (out) *out = inst;
      return inst->dst;
   }

#define _ALU2(prefix, op) \
   brw_inst * \
   op(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1) const \
   { \
      return alu2(prefix##op, dst, src0, src1); \
   } \
   brw_reg \
   op(const brw_reg &src0, const brw_reg &src1, brw_inst **out = NULL) const \
   { \
      return alu2(prefix##op, src0, src1, out); \
   }
#define ALU2(op) _ALU2(BRW_OPCODE_, op)
#define VIRT2(op) _ALU2(SHADER_OPCODE_, op)
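The macro pattern above generates two overloads per opcode: one that writes to a destination supplied by the caller, and one that allocates a destination itself and returns it. A reduced sketch of the same technique follows. The names `ToyEmitter` and `TOY_ALU2` are hypothetical, integers stand in for registers, and a string log stands in for the emitted instructions; none of this is the Mesa API.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyEmitter {
   // Record of emitted operations, e.g. "ADD r2, r0, r1".
   std::vector<std::string> log;
   int next_reg = 0;

   // Allocate a fresh register number.
   int alloc() { return next_reg++; }

   void emit(const char *op, int dst, int src0, int src1)
   {
      log.push_back(std::string(op) + " r" + std::to_string(dst) +
                    ", r" + std::to_string(src0) +
                    ", r" + std::to_string(src1));
   }

// Generate both overloads of a two-source operation named `op`: the first
// takes an explicit destination, the second allocates one and returns it.
#define TOY_ALU2(op) \
   void op(int dst, int src0, int src1) \
   { \
      emit(#op, dst, src0, src1); \
   } \
   int op(int src0, int src1) \
   { \
      const int dst = alloc(); \
      op(dst, src0, src1); \
      return dst; \
   }

   TOY_ALU2(ADD)
   TOY_ALU2(MUL)
#undef TOY_ALU2
};
```

The allocating overload simply forwards to the explicit-destination one, which keeps the per-opcode logic in a single place, much as both macro-generated overloads above forward to alu2().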

#define ALU2_ACC(op) \
   brw_inst * \
   op(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1) const \
   { \
      brw_inst *inst = emit(BRW_OPCODE_##op, dst, src0, src1); \
      inst->writes_accumulator = true; \
      return inst; \
   }
#define ALU3(op) \
   brw_inst * \
   op(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1, \
      const brw_reg &src2) const \
   { \
      return emit(BRW_OPCODE_##op, dst, src0, src1, src2); \
   } \
   brw_reg \
   op(const brw_reg &src0, const brw_reg &src1, const brw_reg &src2, \
      brw_inst **out = NULL) const \
   { \
      enum brw_reg_type inferred_dst_type = \
         brw_type_larger_of(brw_type_larger_of(src0.type, src1.type),\
                            src2.type); \
      brw_inst *inst = op(vgrf(inferred_dst_type), src0, src1, src2); \
      if (out) *out = inst; \
      return inst->dst; \
   }

ALU3(ADD3)
ALU2_ACC(ADDC)
ALU2(AND)
ALU2(ASR)
ALU2(AVG)
ALU3(BFE)
ALU2(BFI1)
ALU3(BFI2)
ALU1(BFREV)
ALU1(CBIT)
ALU2(DP2)
ALU2(DP3)
ALU2(DP4)
ALU2(DPH)
ALU1(FBH)
ALU1(FBL)
ALU1(FRC)
ALU3(DP4A)
ALU2(LINE)
ALU1(LZD)
ALU2(MAC)
ALU2_ACC(MACH)
ALU3(MAD)
ALU1(MOV)
ALU2(MUL)
ALU1(NOT)
ALU2(OR)
ALU2(PLN)
ALU1(RNDD)
ALU1(RNDE)
ALU1(RNDU)
ALU1(RNDZ)
ALU2(ROL)
ALU2(ROR)
ALU2(SEL)
ALU2(SHL)
ALU2(SHR)
ALU2_ACC(SUBB)
ALU2(XOR)

VIRT1(RCP)
VIRT1(RSQ)
VIRT1(SQRT)
VIRT1(EXP2)
VIRT1(LOG2)
VIRT2(POW)
VIRT2(INT_QUOTIENT)
VIRT2(INT_REMAINDER)
VIRT1(SIN)
VIRT1(COS)

#undef ALU3
#undef ALU2_ACC
#undef ALU2
#undef VIRT2
#undef _ALU2
#undef ALU1
#undef VIRT1
#undef _ALU1
/** @} */

brw_inst *
ADD(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1) const
{
   return alu2(BRW_OPCODE_ADD, dst, src0, src1);
}

brw_reg
ADD(const brw_reg &src0, const brw_reg &src1, brw_inst **out = NULL) const
{
   if (src1.file == IMM && src1.ud == 0 && !out)
      return src0;

   return alu2(BRW_OPCODE_ADD, src0, src1, out);
}
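
/* Usage sketch (illustrative only; `bld`, `a` and `b` are hypothetical
 * names that do not appear in this file). The value-returning ADD
 * overload hands back the result register instead of taking a
 * destination, so a caller could write something like:
 *
 *    const brw_reg sum  = bld.ADD(a, b);
 *    const brw_reg same = bld.ADD(a, brw_imm_ud(0));
 *
 * In the second call src1 is an immediate zero and no `out` pointer is
 * passed, so the early return above hands back `a` unchanged and no ADD
 * instruction is emitted.
 */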

/**
 * CMP: Sets the low bit of the destination channels with the result
 * of the comparison, while the upper bits are undefined, and updates
 * the flag register with the packed 16 bits of the result.
 */
brw_inst *
CMP(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1,
    brw_conditional_mod condition) const
{
   /* Take the instruction:
    *
    * CMP null<d> src0<f> src1<f>
    *
    * Original gfx4 does type conversion to the destination type
    * before comparison, producing garbage results for floating
    * point comparisons.
    */
   const enum brw_reg_type type =
      dst.is_null() ?
      src0.type :
      brw_type_with_size(src0.type, brw_type_size_bits(dst.type));

   return set_condmod(condition,
                      emit(BRW_OPCODE_CMP, retype(dst, type),
                           fix_unsigned_negate(src0),
                           fix_unsigned_negate(src1)));
}

/**
 * CMPN: Behaves like CMP, but produces true if src1 is NaN.
 */
brw_inst *
CMPN(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1,
     brw_conditional_mod condition) const
{
   /* Take the instruction:
    *
    * CMP null<d> src0<f> src1<f>
    *
    * Original gfx4 does type conversion to the destination type
    * before comparison, producing garbage results for floating
    * point comparisons.
    */
   const enum brw_reg_type type =
      dst.is_null() ?
      src0.type :
      brw_type_with_size(src0.type, brw_type_size_bits(dst.type));

   return set_condmod(condition,
                      emit(BRW_OPCODE_CMPN, retype(dst, type),
                           fix_unsigned_negate(src0),
                           fix_unsigned_negate(src1)));
}

/**
 * CSEL: dst = src2 <op> 0.0f ? src0 : src1
 */
brw_inst *
CSEL(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1,
     const brw_reg &src2, brw_conditional_mod condition) const
{
   return set_condmod(condition,
                      emit(BRW_OPCODE_CSEL,
                           retype(dst, src2.type),
                           retype(src0, src2.type),
                           retype(src1, src2.type),
                           src2));
}

/**
 * Emit a linear interpolation instruction.
 */
brw_inst *
LRP(const brw_reg &dst, const brw_reg &x, const brw_reg &y,
    const brw_reg &a) const
{
   if (shader->devinfo->ver <= 10) {
      /* The LRP instruction actually does op1 * op0 + op2 * (1 - op0), so
       * we need to reorder the operands.
       */
      return emit(BRW_OPCODE_LRP, dst, a, y, x);

   } else {
      /* We can't use the LRP instruction. Emit x*(1-a) + y*a. */
      const brw_reg y_times_a = vgrf(dst.type);
      const brw_reg one_minus_a = vgrf(dst.type);
      const brw_reg x_times_one_minus_a = vgrf(dst.type);

      MUL(y_times_a, y, a);
      ADD(one_minus_a, negate(a), brw_imm_f(1.0f));
      MUL(x_times_one_minus_a, x, brw_reg(one_minus_a));
      return ADD(dst, brw_reg(x_times_one_minus_a), brw_reg(y_times_a));
   }
}
|
2015-11-22 20:12:17 -08:00
|
|
|
|
2024-12-29 16:06:27 -08:00
|
|
|
   /**
    * Collect a number of registers in a contiguous range of registers.
    */
   brw_inst *
   LOAD_PAYLOAD(const brw_reg &dst, const brw_reg *src,
                unsigned sources, unsigned header_size) const
   {
      brw_inst *inst = emit(SHADER_OPCODE_LOAD_PAYLOAD, dst, src, sources);
      inst->header_size = header_size;
      inst->size_written = header_size * REG_SIZE;
      for (unsigned i = header_size; i < sources; i++) {
         inst->size_written += dispatch_width() * brw_type_size_bytes(src[i].type) *
                               dst.stride;
      }

      return inst;
   }
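
The `size_written` arithmetic in LOAD_PAYLOAD can be checked in isolation. The sketch below reproduces only that sum, assuming a register size of 32 bytes; `kRegSize` and `load_payload_size_written` are hypothetical names for this example, and source types are reduced to their byte sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed register size in bytes for this example (stand-in for REG_SIZE).
constexpr unsigned kRegSize = 32;

// Header sources occupy one full register each; every remaining source
// contributes dispatch_width * type size * destination stride bytes.
unsigned load_payload_size_written(unsigned dispatch_width,
                                   unsigned header_size,
                                   const std::vector<unsigned> &src_type_bytes,
                                   unsigned dst_stride)
{
   unsigned size = header_size * kRegSize;
   for (std::size_t i = header_size; i < src_type_bytes.size(); i++)
      size += dispatch_width * src_type_bytes[i] * dst_stride;
   return size;
}
```

For instance, a 16-wide payload with one header register and two 4-byte sources at stride 1 comes to 32 + 2 * 16 * 4 = 160 bytes under these assumptions.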

   brw_inst *
   VEC(const brw_reg &dst, const brw_reg *src, unsigned sources) const
   {
      return sources == 1 ? MOV(dst, src[0])
                          : LOAD_PAYLOAD(dst, src, sources, 0);
   }

   brw_inst *
   SYNC(enum tgl_sync_function sync) const
   {
      return emit(BRW_OPCODE_SYNC, null_reg_ud(), brw_imm_ud(sync));
   }

   brw_inst *
   UNDEF(const brw_reg &dst) const
   {
      assert(dst.file == VGRF);
      assert(dst.offset % REG_SIZE == 0);
      brw_inst *inst = emit(SHADER_OPCODE_UNDEF,
                            retype(dst, BRW_TYPE_UD));
      inst->size_written = shader->alloc.sizes[dst.nr] * REG_SIZE - dst.offset;

      return inst;
   }

   brw_inst *
   DPAS(const brw_reg &dst, const brw_reg &src0, const brw_reg &src1, const brw_reg &src2,
        unsigned sdepth, unsigned rcount) const
   {
      assert(_dispatch_width == 8 * reg_unit(shader->devinfo));
      assert(sdepth == 8);
      assert(rcount == 1 || rcount == 2 || rcount == 4 || rcount == 8);

      brw_inst *inst = emit(BRW_OPCODE_DPAS, dst, src0, src1, src2);
      inst->sdepth = sdepth;
      inst->rcount = rcount;

      unsigned type_size = brw_type_size_bytes(dst.type);
      assert(type_size == 4 || type_size == 2);
      inst->size_written = rcount * reg_unit(shader->devinfo) * 8 * type_size;

      return inst;
   }
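
The destination size computed by DPAS is a plain product, which the sketch below evaluates on its own. `dpas_size_written` is a hypothetical name, and the register unit is passed in as a parameter (`reg_unit`) because the value `reg_unit(devinfo)` takes on a given platform is not shown in this section; the values 1 and 2 used in the note are assumptions for illustration only.

```cpp
#include <cassert>

// Mirrors the expression rcount * reg_unit * 8 * type_size, with the same
// preconditions as the assertions in the function above.
unsigned dpas_size_written(unsigned rcount, unsigned reg_unit,
                           unsigned type_size)
{
   assert(rcount == 1 || rcount == 2 || rcount == 4 || rcount == 8);
   assert(type_size == 4 || type_size == 2);
   return rcount * reg_unit * 8 * type_size;
}
```

With a repeat count of 8, a register unit of 1 and a 4-byte destination type this gives 256 bytes; with a repeat count of 4, a register unit of 2 and a 2-byte type it gives 128 bytes.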

   void
   VARYING_PULL_CONSTANT_LOAD(const brw_reg &dst,
                              const brw_reg &surface,
                              const brw_reg &surface_handle,
                              const brw_reg &varying_offset,
                              uint32_t const_offset,
                              uint8_t alignment,
                              unsigned components) const
   {
      assert(components <= 4);

      /* We have our constant surface use a pitch of 4 bytes, so our index can
       * be any component of a vector, and then we load 4 contiguous
       * components starting from that. TODO: Support loading fewer than 4.
       */
      brw_reg total_offset = ADD(varying_offset, brw_imm_ud(const_offset));

      /* The pull load message will load a vec4 (16 bytes). If we are loading
       * a double this means we are only loading 2 elements worth of data.
       * We also want to use a 32-bit data type for the dst of the load operation
       * so other parts of the driver don't get confused about the size of the
       * result.
       */
      brw_reg vec4_result = vgrf(BRW_TYPE_F, 4);

      brw_reg srcs[PULL_VARYING_CONSTANT_SRCS];
      srcs[PULL_VARYING_CONSTANT_SRC_SURFACE] = surface;
      srcs[PULL_VARYING_CONSTANT_SRC_SURFACE_HANDLE] = surface_handle;
      srcs[PULL_VARYING_CONSTANT_SRC_OFFSET] = total_offset;
      srcs[PULL_VARYING_CONSTANT_SRC_ALIGNMENT] = brw_imm_ud(alignment);

      brw_inst *inst = emit(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_LOGICAL,
                            vec4_result, srcs, PULL_VARYING_CONSTANT_SRCS);
      inst->size_written = 4 * vec4_result.component_size(inst->exec_size);

      shuffle_from_32bit_read(dst, vec4_result, 0, components);
   }

   brw_reg
   LOAD_SUBGROUP_INVOCATION() const
   {
      brw_reg reg = vgrf(shader->dispatch_width < 16 ? BRW_TYPE_UD : BRW_TYPE_UW);
      exec_all().emit(SHADER_OPCODE_LOAD_SUBGROUP_INVOCATION, reg);
      return reg;
   }

   brw_reg
   BROADCAST(brw_reg value, brw_reg index) const
   {
      const brw_builder xbld = scalar_group();
      const brw_reg dst = xbld.vgrf(value.type);

      assert(is_uniform(index));
intel/brw: Use CSE for LOAD_SUBGROUP_INVOCATION
Instead of emitting a single one at the top, and making reference to it,
emit the virtual instruction as needed and let CSE do its job.
Since load_subgroup_invocation now can appear not at the start of the
shader, use UNDEF in all cases to ensure that the liveness of the
destination doesn't extend to the first partial write done here (it was
being used only for SIMD > 8 before).
Note this option was considered in the past (commit
6132992cdb858268af0e985727d80e4140be389c) but dismissed at the time. The
difference now is that the lowering of the virtual instruction happens
earlier than the scheduling.
The motivation for this change is to allow passes other than the NIR
conversion to use this value. The alternative of storing a `brw_reg` in
the shader (instead of NIR state) gets complicated by passes like
compact_vgrfs, that move VGRFs around (and update the instructions).
This and maybe other passes would have to care about the brw_reg.
Fossil-db numbers, TGL
```
*** Shaders only in 'after' results are ignored:
steam-native/shadow_of_the_tomb_raider/c683ea5067ee157d/fs.32/0, steam-native/shadow_of_the_tomb_raider/f4df450c3cef40b4/fs.32/0, steam-native/shadow_of_the_tomb_raider/94b708fb8e3d9597/fs.32/0, steam-native/shadow_of_the_tomb_raider/19d44c328edabd30/fs.32/0, steam-native/shadow_of_the_tomb_raider/8a7dcbd5a74a19bf/fs.32/0, and 366 more
from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider
*** Shaders only in 'before' results are ignored:
steam-dxvk/octopath_traveler/aaa3d10acb726906/fs.32/0, steam-dxvk/batman_arkham_origins/e6872ae23569c35f/fs.32/0, steam-dxvk/octopath_traveler/fd33a99fa5c271a8/fs.32/0, steam-dxvk/octopath_traveler/9a077cdc16f24520/fs.32/0, steam-dxvk/batman_arkham_city_goty/fac7b438ad52f622/fs.32/0, and 12 more
from 4 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-dxvk/octopath_traveler, steam-native/shadow_of_the_tomb_raider
Totals:
Instrs: 149752381 -> 149751337 (-0.00%); split: -0.00%, +0.00%
Cycle count: 11553609349 -> 11549970294 (-0.03%); split: -0.06%, +0.03%
Spill count: 42763 -> 42764 (+0.00%); split: -0.01%, +0.01%
Fill count: 75650 -> 75651 (+0.00%); split: -0.00%, +0.01%
Max live registers: 31725096 -> 31671792 (-0.17%)
Max dispatch width: 5546008 -> 5551672 (+0.10%); split: +0.11%, -0.00%
Totals from 52574 (8.34% of 630441) affected shaders:
Instrs: 9535159 -> 9534115 (-0.01%); split: -0.03%, +0.02%
Cycle count: 1006627109 -> 1002988054 (-0.36%); split: -0.65%, +0.29%
Spill count: 11588 -> 11589 (+0.01%); split: -0.03%, +0.03%
Fill count: 21057 -> 21058 (+0.00%); split: -0.01%, +0.02%
Max live registers: 1992493 -> 1939189 (-2.68%)
Max dispatch width: 559696 -> 565360 (+1.01%); split: +1.06%, -0.05%
```
and DG2
```
*** Shaders only in 'after' results are ignored:
steam-native/shadow_of_the_tomb_raider/1f95a9d3db21df85/fs.32/0, steam-native/shadow_of_the_tomb_raider/56b87c4a46613a2a/fs.32/0, steam-native/shadow_of_the_tomb_raider/a74b4137f85dbbd3/fs.32/0, steam-native/shadow_of_the_tomb_raider/e07e38d3f48e8402/fs.32/0, steam-native/shadow_of_the_tomb_raider/206336789c48996c/fs.32/0, and 268 more
from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider
*** Shaders only in 'before' results are ignored:
steam-native/shadow_of_the_tomb_raider/0420d7c3a2ea99ec/fs.32/0, steam-native/shadow_of_the_tomb_raider/2ff39f8bf7d24abb/fs.32/0, steam-native/shadow_of_the_tomb_raider/92d7be2824bd9659/fs.32/0, steam-native/shadow_of_the_tomb_raider/f09ca6d2ecf18015/fs.32/0, steam-native/shadow_of_the_tomb_raider/490f8ffd59e52949/fs.32/0, and 205 more
from 3 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider
Totals:
Instrs: 151597619 -> 151599914 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 7699776 -> 7699784 (+0.00%)
Cycle count: 12738501989 -> 12739841170 (+0.01%); split: -0.01%, +0.02%
Spill count: 61283 -> 61274 (-0.01%)
Fill count: 119886 -> 119849 (-0.03%)
Max live registers: 31810432 -> 31758920 (-0.16%)
Max dispatch width: 5540128 -> 5541136 (+0.02%); split: +0.08%, -0.06%
Totals from 49286 (7.81% of 631231) affected shaders:
Instrs: 8607753 -> 8610048 (+0.03%); split: -0.01%, +0.04%
Subgroup size: 857752 -> 857760 (+0.00%)
Cycle count: 305939495 -> 307278676 (+0.44%); split: -0.28%, +0.72%
Spill count: 6339 -> 6330 (-0.14%)
Fill count: 12571 -> 12534 (-0.29%)
Max live registers: 1788346 -> 1736834 (-2.88%)
Max dispatch width: 510920 -> 511928 (+0.20%); split: +0.85%, -0.66%
```
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30489>
2024-07-31 22:46:20 -07:00

      /* A broadcast will always be at the full dispatch width even if the
       * use of the broadcast result is smaller. If the source is_scalar,
       * it may be allocated at less than the full dispatch width (e.g.,
       * allocated at SIMD8 with SIMD32 dispatch). The input may or may
       * not be stride=0. If it is not, the generated broadcast
       *
       *    broadcast(32) dst, value<1>, index<0>
       *
       * is invalid because it may read out of bounds from value.
       *
       * To account for this, modify the stride of an is_scalar input to be
       * zero.
       */
      if (value.is_scalar)
         value = component(value, 0);

      /* Ensure that the source of a broadcast is always register aligned.
       * See brw_broadcast() non-scalar case for more details.
       */
      if (reg_offset(value) % (REG_SIZE * reg_unit(shader->devinfo)) != 0)
         value = MOV(value);

      /* BROADCAST will only write a single component after lowering. Munge
       * size_written here to match the allocated size of dst.
       */
      exec_all().emit(SHADER_OPCODE_BROADCAST, dst, value, index)
         ->size_written = dst.component_size(xbld.dispatch_width());
brw/build: Use SIMD8 temporaries in emit_uniformize
The fossil-db results are very different from v1. This is now mostly
helpful on older platforms.
v2: When optimizing BROADCAST or FIND_LIVE_CHANNEL to a simple MOV,
adjust the exec_size to match the size allocated for the destination
register. Fixes EU validation failures in some piglit OpenCL tests
(e.g., atomic_add-global-return.cl).
v3: Use component_size() in emit_uniformize and BROADCAST to properly
account for UQ vs UD destination. This doesn't matter for
emit_uniformize because the type is always UD, but it is technically
more correct.
v4: Update trace checksums. Now amly expects the same checksum as
several other platforms.
v5: Use xbld.dispatch_width() in the builder for when scalar_group()
eventually becomes SIMD1. Suggested by Lionel.
shader-db:
Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 18091701 -> 18091586 (<.01%)
instructions in affected programs: 29616 -> 29501 (-0.39%)
helped: 28 / HURT: 18
total cycles in shared programs: 919250494 -> 919123828 (-0.01%)
cycles in affected programs: 12201102 -> 12074436 (-1.04%)
helped: 124 / HURT: 108
LOST: 0
GAINED: 1
Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20480808 -> 20480624 (<.01%)
instructions in affected programs: 58465 -> 58281 (-0.31%)
helped: 61 / HURT: 20
total cycles in shared programs: 874860168 -> 874960312 (0.01%)
cycles in affected programs: 18240986 -> 18341130 (0.55%)
helped: 113 / HURT: 158
total spills in shared programs: 4557 -> 4555 (-0.04%)
spills in affected programs: 93 -> 91 (-2.15%)
helped: 1 / HURT: 0
total fills in shared programs: 5247 -> 5243 (-0.08%)
fills in affected programs: 224 -> 220 (-1.79%)
helped: 1 / HURT: 0
fossil-db:
Lunar Lake
Totals:
Instrs: 220486064 -> 220486959 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 14102592 -> 14102624 (+0.00%)
Cycle count: 31602733838 -> 31604733270 (+0.01%); split: -0.01%, +0.02%
Max live registers: 65371025 -> 65355084 (-0.02%)
Totals from 12130 (1.73% of 702392) affected shaders:
Instrs: 5162700 -> 5163595 (+0.02%); split: -0.06%, +0.08%
Subgroup size: 388128 -> 388160 (+0.01%)
Cycle count: 751721956 -> 753721388 (+0.27%); split: -0.54%, +0.81%
Max live registers: 1538550 -> 1522609 (-1.04%)
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 241601142 -> 241599114 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 9631168 -> 9631216 (+0.00%)
Cycle count: 25101781573 -> 25097909570 (-0.02%); split: -0.03%, +0.01%
Max live registers: 41540611 -> 41514296 (-0.06%)
Max dispatch width: 6993456 -> 7000928 (+0.11%); split: +0.15%, -0.05%
Totals from 16852 (2.11% of 796880) affected shaders:
Instrs: 6303937 -> 6301909 (-0.03%); split: -0.11%, +0.07%
Subgroup size: 323592 -> 323640 (+0.01%)
Cycle count: 625455880 -> 621583877 (-0.62%); split: -1.20%, +0.58%
Max live registers: 1072491 -> 1046176 (-2.45%)
Max dispatch width: 76672 -> 84144 (+9.75%); split: +14.04%, -4.30%
Tiger Lake
Totals:
Instrs: 235190395 -> 235193286 (+0.00%); split: -0.00%, +0.00%
Cycle count: 23130855720 -> 23128936334 (-0.01%); split: -0.02%, +0.01%
Max live registers: 41644106 -> 41620052 (-0.06%)
Max dispatch width: 6959160 -> 6981512 (+0.32%); split: +0.34%, -0.02%
Totals from 15102 (1.90% of 793371) affected shaders:
Instrs: 5771042 -> 5773933 (+0.05%); split: -0.06%, +0.11%
Cycle count: 371062226 -> 369142840 (-0.52%); split: -1.04%, +0.52%
Max live registers: 989858 -> 965804 (-2.43%)
Max dispatch width: 61344 -> 83696 (+36.44%); split: +38.42%, -1.98%
Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 236063150 -> 236063242 (+0.00%); split: -0.00%, +0.00%
Cycle count: 24516187174 -> 24516027518 (-0.00%); split: -0.00%, +0.00%
Spill count: 567071 -> 567049 (-0.00%)
Fill count: 701323 -> 701273 (-0.01%)
Max live registers: 41914047 -> 41913281 (-0.00%)
Max dispatch width: 7042608 -> 7042736 (+0.00%); split: +0.00%, -0.00%
Totals from 3904 (0.49% of 798473) affected shaders:
Instrs: 2809690 -> 2809782 (+0.00%); split: -0.02%, +0.03%
Cycle count: 182114259 -> 181954603 (-0.09%); split: -0.34%, +0.25%
Spill count: 1696 -> 1674 (-1.30%)
Fill count: 2523 -> 2473 (-1.98%)
Max live registers: 341695 -> 340929 (-0.22%)
Max dispatch width: 32752 -> 32880 (+0.39%); split: +0.44%, -0.05%
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>
2024-10-15 15:51:22 -07:00

      return component(dst, 0);
   }
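
The out-of-bounds concern described in the BROADCAST comment can be made concrete with a little arithmetic. The sketch below is a simplified model, not Mesa code: `region_bytes` and `read_fits` are hypothetical helpers that assume a source region is read as `exec_size` elements spaced `stride` elements apart.

```cpp
#include <cassert>

// Bytes spanned when reading exec_size elements of type_bytes each,
// spaced stride elements apart (simplified model of a register region).
constexpr unsigned region_bytes(unsigned exec_size, unsigned stride,
                                unsigned type_bytes)
{
   return exec_size == 0 ? 0 : ((exec_size - 1) * stride + 1) * type_bytes;
}

// True if such a read stays within an allocation of alloc_bytes.
constexpr bool read_fits(unsigned alloc_bytes, unsigned exec_size,
                         unsigned stride, unsigned type_bytes)
{
   return region_bytes(exec_size, stride, type_bytes) <= alloc_bytes;
}
```

Under this model, a 4-byte value allocated at SIMD8 occupies 32 bytes. A SIMD32 read with stride 1 would span 128 bytes and overrun it, whereas the same read with stride 0 touches only the first 4 bytes, which is why the scalar input is rewritten to component 0.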

   brw_reg
   LOAD_REG(const brw_reg &src0, brw_inst **out = NULL) const
   {
      /* LOAD_REG is a raw, bulk copy of one VGRF to another. The type is
       * irrelevant. The pass that inserts LOAD_REG to encourage results to be
       * defs will force all types to be integer types. Forcing the type to
       * always be integer here helps with uniformity, and it will also help
       * implement unit tests that want to compare two shaders for equality.
       */
      brw_reg_type t = brw_type_with_size(BRW_TYPE_UD,
                                          brw_type_size_bits(src0.type));
      brw_reg dst = retype(brw_allocate_vgrf_units(*shader,
                                                   shader->alloc.sizes[src0.nr]),
                           t);

      assert(src0.file == VGRF);
      assert(shader->alloc.sizes[dst.nr] == shader->alloc.sizes[src0.nr]);

      brw_inst *inst = emit(SHADER_OPCODE_LOAD_REG, dst, retype(src0, t));

      inst->size_written = REG_SIZE * shader->alloc.sizes[src0.nr];

      assert(shader->alloc.sizes[inst->dst.nr] * REG_SIZE == inst->size_written);
      assert(!inst->is_partial_write());

      if (out) *out = inst;
      return retype(inst->dst, src0.type);
   }

   brw_shader *shader;

   brw_inst *BREAK() const { return emit(BRW_OPCODE_BREAK); }
   brw_inst *ELSE() const { return emit(BRW_OPCODE_ELSE); }
   brw_inst *ENDIF() const { return emit(BRW_OPCODE_ENDIF); }
   brw_inst *NOP() const { return emit(BRW_OPCODE_NOP); }
   brw_inst *CONTINUE() const { return emit(BRW_OPCODE_CONTINUE); }

   brw_inst *
   IF(brw_predicate predicate = BRW_PREDICATE_NORMAL) const
   {
      return set_predicate(predicate, emit(BRW_OPCODE_IF));
   }

   brw_inst *
   WHILE(brw_predicate predicate = BRW_PREDICATE_NONE) const
   {
      return set_predicate(predicate, emit(BRW_OPCODE_WHILE));
   }

   void
   DO() const
   {
      emit(BRW_OPCODE_DO);
      /* Ensure that there'll always be a block after DO to add
       * instructions and serve as successor for predicated WHILE
       * and CONTINUE.
       *
       * See more details in brw_cfg::validate().
       */
      emit(SHADER_OPCODE_FLOW);
   }

   bool has_writemask_all() const {
      return force_writemask_all;
   }

private:
   /**
    * Workaround for negation of UD registers. See comment in
    * brw_generator::generate_code() for more details.
    */
   brw_reg
   fix_unsigned_negate(const brw_reg &src) const
   {
      if (src.type == BRW_TYPE_UD &&
          src.negate) {
         brw_reg temp = vgrf(BRW_TYPE_UD);
         MOV(temp, src);
         return brw_reg(temp);
      } else {
         return src;
2015-04-22 14:02:47 +03:00
      }
2024-12-29 16:06:27 -08:00
   }
2015-04-22 14:02:47 +03:00
2024-12-29 16:06:27 -08:00
   /**
    * Workaround for source register modes not supported by the ternary
    * instruction encoding.
    */
   brw_reg
   fix_3src_operand(const brw_reg &src) const
   {
      switch (src.file) {
      case FIXED_GRF:
         /* FINISHME: Could handle scalar region, other stride=1 regions */
         if (src.vstride != BRW_VERTICAL_STRIDE_8 ||
             src.width != BRW_WIDTH_8 ||
             src.hstride != BRW_HORIZONTAL_STRIDE_1)
2019-04-18 14:29:03 -07:00
            break;
2024-12-29 16:06:27 -08:00
         FALLTHROUGH;
      case ATTR:
      case VGRF:
      case UNIFORM:
      case IMM:
         return src;
      default:
         break;
      }

      brw_reg expanded = vgrf(src.type);
      MOV(expanded, src);
      return expanded;
   }
2015-04-22 14:02:47 +03:00
2024-12-06 22:01:18 -08:00
   void shuffle_from_32bit_read(const brw_reg &dst,
                                const brw_reg &src,
                                uint32_t first_component,
                                uint32_t components) const;
2024-12-29 16:06:27 -08:00
   bblock_t *block;
2025-07-28 16:07:44 -04:00
   brw_exec_node *cursor;
2015-04-22 14:02:47 +03:00
2024-12-29 16:06:27 -08:00
   unsigned _dispatch_width;
   unsigned _group;
   bool force_writemask_all;
2015-04-22 14:02:47 +03:00
2024-12-29 16:06:27 -08:00
   /** Debug annotation info. */
   struct {
      const char *str;
   } annotation;
};
2015-04-22 14:02:47 +03:00
brw: Basic infrastructure to store convergent values as scalars
In SIMD16 and SIMD32, storing convergent values in full 16- or
32-channel registers is wasteful. It wastes register space, and in most
cases on SIMD32, it wastes instructions. Our register allocator is not
clever enough to handle scalar allocations. Its fundamental unit of
allocation is SIMD8. Start treating convergent values as SIMD8.
Add a tracking bit in brw_reg to specify that a register represents a
convergent, scalar value. This has two implications:
1. All channels of the SIMD8 register must contain the same value. In
general, this means that writes to the register must be
force_writemask_all and exec_size = 8;
2. Reads of this register can (and should) use <0,1,0> stride. SIMD8
instructions that have restrictions on source stride can use <8,8,1>.
Values that are vectors (e.g., results of load_uniform or texture
operations) will be stored as multiple SIMD8 hardware registers.
v2: brw_fs_opt_copy_propagation_defs fix from Ken. Fix for Xe2.
v3: Eliminate offset_to_scalar(). Remove mention of vec4 backend in
brw_reg.h. Both suggested by Caio. The offset_to_scalar() change
necessitates some trickery in the fs_builder offset() function, but I
think this is an improvement overall. There is also some rework in
find_value_for_offset to account for the possibility that is_scalar
sources in LOAD_PAYLOAD might be <8;8,1> or <0;1,0>.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-02-09 17:12:11 -08:00
/**
 * Offset by a number of components into a VGRF
 *
 * It is assumed that the VGRF represents a vector (e.g., returned by
 * load_uniform or a texture operation). Convergent and divergent values are
 * stored differently, so care must be taken to offset properly.
 */
2024-06-18 23:42:59 -07:00
static inline brw_reg
2024-12-29 16:06:27 -08:00
offset(const brw_reg &reg, const brw_builder &bld, unsigned delta)
2023-11-21 07:49:02 -08:00
{
2024-02-09 17:12:11 -08:00
   /* If the value is convergent (stored as one or more SIMD8), offset using
    * SIMD8 and select component 0.
    */
   if (reg.is_scalar) {
      const unsigned allocation_width = 8 * reg_unit(bld.shader->devinfo);

      brw_reg offset_reg = offset(reg, allocation_width, delta);

      /* If the dispatch width is larger than the allocation width, that
       * implies that the register can only be used as a source. Otherwise the
       * instruction would write past the allocation size of the register.
       */
      if (bld.dispatch_width() > allocation_width)
         return component(offset_reg, 0);
      else
         return offset_reg;
   }

   /* Offset to the component assuming the value was allocated in
    * dispatch_width units.
    */
2023-11-21 07:49:02 -08:00
   return offset(reg, bld.dispatch_width(), delta);
}
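The width selection in offset() above can be illustrated with a small standalone model. This is only a sketch, not Mesa code: `ToyReg`, `component_width`, `component_byte_offset` and `must_be_source_only` are hypothetical names, the stride is assumed to be 1, and `reg_unit` is assumed to be a plain integer standing in for `reg_unit(devinfo)`.

```cpp
// Hypothetical stand-in for the two brw_reg properties that matter here.
struct ToyReg {
   bool is_scalar;      // convergent value, stored as SIMD8 per component
   unsigned type_size;  // bytes per channel
};

// Channels occupied by one component of the vector: convergent values use
// the allocation width (8 * reg_unit), divergent ones the dispatch width.
constexpr unsigned
component_width(const ToyReg &reg, unsigned dispatch_width, unsigned reg_unit)
{
   return reg.is_scalar ? 8 * reg_unit : dispatch_width;
}

// Byte distance from component 0 to component `delta` (stride 1 assumed).
constexpr unsigned
component_byte_offset(const ToyReg &reg, unsigned dispatch_width,
                      unsigned reg_unit, unsigned delta)
{
   return delta * component_width(reg, dispatch_width, reg_unit) *
          reg.type_size;
}

// Mirrors the dispatch_width() > allocation_width check: in that case the
// real helper returns component 0, so the result is usable only as a source.
constexpr bool
must_be_source_only(const ToyReg &reg, unsigned dispatch_width,
                    unsigned reg_unit)
{
   return reg.is_scalar && dispatch_width > 8 * reg_unit;
}

// SIMD16, 32-bit type, third component (delta = 2):
static_assert(component_byte_offset(ToyReg{false, 4}, 16, 1, 2) == 128,
              "divergent: 2 * 16 channels * 4 bytes");
static_assert(component_byte_offset(ToyReg{true, 4}, 16, 1, 2) == 64,
              "convergent: 2 * 8 channels * 4 bytes");
static_assert(must_be_source_only(ToyReg{true, 4}, 16, 1),
              "SIMD16 write would overrun a SIMD8 allocation");
```

Under these assumptions a convergent vector takes half the space of a divergent one at SIMD16, which is the saving the scalar representation is meant to provide.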
2024-12-06 21:46:48 -08:00
brw_reg brw_sample_mask_reg(const brw_builder &bld);
2024-12-07 00:23:07 -08:00
void brw_emit_predicate_on_sample_mask(const brw_builder &bld, brw_inst *inst);
2024-12-06 21:46:48 -08:00

brw_reg
brw_fetch_payload_reg(const brw_builder &bld, uint8_t regs[2],
                      brw_reg_type type = BRW_TYPE_F,
                      unsigned n = 1);

brw_reg
brw_fetch_barycentric_reg(const brw_builder &bld, uint8_t regs[2]);

void
brw_check_dynamic_msaa_flag(const brw_builder &bld,
                            const struct brw_wm_prog_data *wm_prog_data,
                            enum intel_msaa_flags flag);