The status quo is quite the mess:
1. tgsi_exec will do a per-channel computation, and store the dst[0]
result (significand) correctly for each channel. The dst[1] result
(exponent) will be written to the first bit set in the writemask.
So per-component calculation only works partially.
2. r600 will only do a single computation. It will replicate the
exponent but not the significand.
3. The docs pretend that there's per-component calculation, but even
get dst[0] and dst[1] confused.
4. Luckily, st_glsl_to_tgsi only ever emits single-component instructions,
and kind-of assumes that everything is replicated, generating this for
the dvec4 case:
DFRACEXP TEMP[0].xy, TEMP[1].x, CONST[0][0].xyxy
DFRACEXP TEMP[0].zw, TEMP[1].y, CONST[0][0].zwzw
DFRACEXP TEMP[2].xy, TEMP[1].z, CONST[0][1].xyxy
DFRACEXP TEMP[2].zw, TEMP[1].w, CONST[0][1].zwzw
Settle on the simplest behavior, which is single-component calculation
with replication, document it, and adjust tgsi_exec and r600.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Sourcing the exponent for the zw destination pair from Z is consistent
with both tgsi_exec and gallivm. In practice, st_glsl_to_tgsi always
generates per-channel instructions anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
It adds reference links for arguments usage and bind of resource_create().
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previous get_paramf links same as get_param. It changes the reference link to
PIPE_CAPF_*
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Denotes availability of 64bit int atomic instructions
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
To be able to properly distinguish between GL_ANY_SAMPLES_PASSED
and GL_ANY_SAMPLES_PASSED_CONSERVATIVE.
This patch goes through all drivers, having them treat the two
query types identically, except:
1. radeon incorrectly enabled conservative mode on
PIPE_QUERY_OCCLUSION_PREDICATE. We now do it correctly, only
on PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE.
2. st/mesa uses the new query type.
Fixes dEQP-GLES31.functional.fbo.no_attachments.*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Most older drivers seem to just ignore the Dimension setting, so virtually
no changes should be needed.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
A previous expression presents same as TGSI_SEMANTIC_SUBGROUP_GT_MASK.
It fixes a direction of an inequality for TGSI_SEMANTIC_SUBGROUP_LT_MASK.
before:
bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION
after:
bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Both the GLSL 4.00 specs and DX10.1 specs specify that if a fragment
shader uses the sample ID or sample position inputs, the shader is
automatically run at per sample frequency. Document that expectation
for gallium fragment shaders.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This can be used to guard support for EXT_memory_object and related
extensions.
v2: update gallium docs
v3 (Timothy Arceri):
- add cap to nv50
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
v2: rename cap to PIPE_CAP_QUERY_SO_OVERFLOW and be a bit more explicit
in the documentation
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Commit 8aba778fa2 "st/mesa: don't set
sampler states for TBOs" changed how texture buffer objects are handled.
Document the new convention.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
For the SAMPLE_POS and SAMPLE_INFO opcodes, clarify resource vs. render
target queries, range of postion values, swizzling, etc. We basically
follow the DX10.1 conventions.
For the TXQS opcode and TGSI_SEMANTIC_SAMPLEID, clarify return value
and type.
For the TGSI_SEMANTIC_SAMPLEPOS system value, clarify the range of
positions returned.
v2: use 'undef' for unused vector components. Use (0.5, 0.5, undef, undef)
for sample pos when MSAA not applicable.
v3: Add note that OPCODE_SAMPLE_INFO, OPCODE_SAMPLE_POS are not used yet
and the information is subject to change.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Whether bindless texture operations are supported by the
underlying driver.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I've since discovered the fragment shader sample mask system value (which
corresponds to gl_SampleMaskIn).
v2: It's a system value, not a shader input.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The next patch will use it. This is really for svga and GL2-level drivers.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
pipe_draw_info::indexed is replaced with index_size. index_size == 0 means
non-indexed.
Instead of pipe_index_buffer::offset, pipe_draw_info::start is used.
For indexed indirect draws, pipe_draw_info::start is added to the indirect
start. This is the only case when "start" affects indirect draws.
pipe_draw_info::index is a union. Use either index::resource or
index::user depending on the value of pipe_draw_info::has_user_indices.
v2: fixes for nine, svga
Depending on pipe caps they can be writable in all vertex processing
stages, but only the output of the last stage counts.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
These can operate on MEMORY[], in addition to BUFFER[] and IMAGE[]
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2 (Nicolai):
- BALLOT isn't per-channel
- expand the documentation (also for VOTE_*)
v3:
- only BALLOT returns a 64-bit lanemask (Boyan)
- relax the requirement on READ_INVOC: the invocation number to read
from must be uniform within a sub-group. This matches the
GL_ARB_shader_ballot spect (and the v_readlane instruction of AMD
GCN)
v4:
- hopefully really fix the doc of VOTE_* returns (Ilia)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)