For the SAMPLE_POS and SAMPLE_INFO opcodes, clarify resource vs. render
target queries, range of postion values, swizzling, etc. We basically
follow the DX10.1 conventions.
For the TXQS opcode and TGSI_SEMANTIC_SAMPLEID, clarify return value
and type.
For the TGSI_SEMANTIC_SAMPLEPOS system value, clarify the range of
positions returned.
v2: use 'undef' for unused vector components. Use (0.5, 0.5, undef, undef)
for sample pos when MSAA not applicable.
v3: Add note that OPCODE_SAMPLE_INFO, OPCODE_SAMPLE_POS are not used yet
and the information is subject to change.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
I've since discovered the fragment shader sample mask system value (which
corresponds to gl_SampleMaskIn).
v2: It's a system value, not a shader input.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Depending on pipe caps they can be writable in all vertex processing
stages, but only the output of the last stage counts.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
These can operate on MEMORY[], in addition to BUFFER[] and IMAGE[]
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2 (Nicolai):
- BALLOT isn't per-channel
- expand the documentation (also for VOTE_*)
v3:
- only BALLOT returns a 64-bit lanemask (Boyan)
- relax the requirement on READ_INVOC: the invocation number to read
from must be uniform within a sub-group. This matches the
GL_ARB_shader_ballot spect (and the v_readlane instruction of AMD
GCN)
v4:
- hopefully really fix the doc of VOTE_* returns (Ilia)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Currently the GLSL-to-TGSI translation pass assumes it can use
floating point source modifiers on the UCMP instruction. See the bug
report linked below for an example where an unrelated change in the
GLSL built-in lowering code for atan2 (e9ffd12827)
caused the generation of floating-point ir_unop_neg instructions
followed by ir_triop_csel, which is translated into UCMP with a negate
modifier on back-ends with native integer support.
Allowing floating-point source modifiers on an integer instruction
seems like rather dubious design for a transport IR, since the same
semantics could be represented as a sequence of MOV+UCMP instructions
instead, but supposedly this matches the expectations of TGSI
back-ends other than tgsi_exec, and the expectations of the DX10 API.
I take no responsibility for future headaches caused by this
inconsistency.
Fixes a regression of piglit glsl-fs-tan-1 on softpipe introduced by
the above-mentioned glsl front-end commit. Even though the commit
that triggered the regression doesn't seem to have made it to any
stable branches yet, this might be worth back-porting since I don't
see any reason why the bug couldn't have been reproduced before that
point.
Suggested-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99817
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
src/gallium/docs/source/tgsi.rst:3488: WARNING: Title underline too short.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Without these, mathjax considers these as the continuation of the
previous line.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Not used and not widely supported. Use MIN+MAX instead.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will be useful for proper D3D9 emulation, where this behavior is
expected by some shaders.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Double-precision division, to allow more precision than a DRCP + DMUL
sequence.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
As previously written, these opcodes use the SM5 semantics which is
incompatible with GLSL when bits == 0, offset == 32.
At some point we may want to add BFI_SM5 etc. opcodes, but all users
currently either want (and expect!) the GLSL semantics or don't care.
Bitfield inserts are generated by the GLSL lower_instructions and
lower_packing_builtins passes with constant bits and offset arguments,
so any workaround code that drivers may have to emit to follow GLSL
semantics should be optimized away easily for those uses.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This just adds the basic support for 64-bit opcodes,
and the new types.
v2: add conversion opcodes.
add documentation.
v3:
- make docs more consistent
- change TGSI_OPCODE_I2U64 to TGSI_OPCODE_U2I64
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Add a new WORK_DIM SV type, this is will return the grid dimensions
(1-4) for compute (opencl) kernels.
This is necessary to implement the opencl get_work_dim() function.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This isn't used anymore in the tree, culldist's
are part of the clipdist semantic, we could in theory
rename it, but I'm not sure there is much point, and
I'd have to be careful with virgl.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The llvm TGSI backend uses pointers in registers and does things
like:
LOAD TEMP[0].y, MEMORY[0], TEMP[0]
Expecting the data at address TEMP[0].x to get loaded to
TEMP[0].y. But this will cause the data at TEMP[0].x + 4 to be
loaded instead.
This commit adds support for a swizzle suffix for the 1st source
operand, which allows using:
LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0]
And actually getting the desired behavior
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The value 0 for unknown has been chosen to so that
drivers using tgsi_scan_shader do not need to detect
missing properties if they zero-initialize the struct.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Radeonsi needs to know which shader stage will execute after a shader
in order to make the best decision about which shader variant to compile
first.
This is only set for VS and TES, because we don't need it elsewhere.
VS has 3 variants:
- next shader is FS
- next shader is GS
- next shader is TCS
TES has 2 variants:
- next shader is FS
- next shader is GS
Currently, radeonsi always assumes the next shader is FS, which is suboptimal,
since st/mesa always knows which shader is next if the GLSL program is not
a "separate shader".
By default, ureg always sets "next shader is FS".
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
v1 -> v2: add defines for the various bits
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Specify that the operation only applies to the x component, not
per-component as previously specified. This is unnecessary for GL and
creates additional complications for images which need to support these
operations as well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The TGSI usage mask can't be used, because these are declared as an output
array of 2 elements.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>