For sw cursor we do not tell wine the cursor position (the app
tells us directly). We shouldn't use ID3DPresent_GetCursorPos.
device->cursor.pos already contains the coordinates the app
gave us.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
We have either hardware cursor or software cursor.
When we use software cursor, we should hide the hardware
cursor.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
D3DRS_DITHERENABLE was assigned to the rasterizer state
group, but it was used for the blend group.
Assign it to the blend group.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
When using D3DRS_POINTSIZE make sure the value is at least
D3DRS_POINTSIZE_MIN but not greater than D3DRS_POINTSIZE_MAX.
Fixes some Wine tests.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Align texture memory on 32 byte boundry to allow
SSE/AVX memcpy to work on locked rects.
This fixes some crashes with games using SSE.
Reviewed-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
We had red and green in the wrong channels
for the ATI2 format (RGTC2).
Found thanks to wine tests.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
Add support for multiple cards and fill in Win
like card name, driver name and version info.
Use fallback for unknown vendors and unknown card names.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
There is no MDOperand in llvm 3.5.
v2: Check if kernel metadata is present to avoid crash (EdB).
v3: Second attempt to avoid crash: switch off metadata query for llvm < 3.6.
Reviewed-by: Serge Martin (EdB) <edb+mesa@sigluy.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We're generating rcps as part of backend lowering of the packed coordinate
in the CS, and we don't want to lower them in NIR because of the extra
newton-raphson steps in the common case. However, GLB2.7 is moving a
vertex attribute with a 1.0 W component to the position, and that makes us
produce some silly RCPs.
total instructions in shared programs: 97590 -> 97580 (-0.01%)
instructions in affected programs: 74 -> 64 (-13.51%)
I had QPU emit code to do it, but forgot to flag the register class.
total instructions in shared programs: 97974 -> 97590 (-0.39%)
instructions in affected programs: 25291 -> 24907 (-1.52%)
Now that we do non-SSA QIR instructions, we can take a NIR SSA src that's
only used by the unorm packing and just stuff the pack bits into it.
total instructions in shared programs: 98136 -> 97974 (-0.17%)
instructions in affected programs: 4149 -> 3987 (-3.90%)
This helps ensure that the register allocator doesn't force the later pack
operations to insert extra MOVs.
total instructions in shared programs: 98170 -> 98159 (-0.01%)
instructions in affected programs: 2134 -> 2123 (-0.52%)
Now that we have NIR, most of the optimization we still need to do is
peepholes on instruction selection rather than general dataflow
operations. This means we want to be able to have QIR be a lot closer to
the actual QPU instructions, just with virtual registers. Allowing
multiple instructions writing the same register opens up a lot of
possibilities.
Due to a quirk in how the nv50 opt passes run, the algebraic
optimization that looks for these BFE's happens before the constant
folding pass. Rearranging these passes isn't a great idea, but this is
easy enough to fix. Allows a following cvt to eliminate the bfe in
certain situations.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
FLT_TO_INT goes in the vector pipes on evergreen/NI,
not the trans unit as on earlier chips.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This struct was getting a bit crowded, following the lead of
radeonsi, mirror the idea of having sub-structures for each
shader type. Turning 'r600_shader_key' into an union saves
some trivial memory and CPU cycles for the shader keys.
[airlied: drop as_ls, and reorder so larger fields at start.]
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This happens with unpackSnorm lowering. There's yet another
bitfield-extract behind it, but there's too much variation to be worth
cutting through.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
unpackUnorm* lowering doesn't AND the high byte/word as it's
unnecessary. Detect that situation as well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Some Unigine shaders have been observed to unpack bytes out of 32-bit
integers and convert them to floats. I2F/I2I can handle this sort of
thing directly. Detect the handleable situations.
This misses 16-bit word capabilities in nv50, but I haven't seen shaders
that would actually make use of that.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Some shaders appear to extract bits using shift/and combos. Detect
(some) of those and convert to EXTBF instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
If build with C++11 standard, use std::unordered_set.
Otherwise if build on old Android version with stlport,
use std::tr1::unordered_set with a wrapper class.
Otherwise use std::tr1::unordered_set.
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
If the 32-bit types overflowed, the driver could submit an IB that uses much
more memory than is available.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Since i965 is now using make_reg_conflicts_transitive and doesn't need
q-value computations, they are disabled on i965. They are enabled
everywhere else so that they get the old behavior. This reduces the time
spent in eglInitialize() on BDW by around 10-15%.
Reviewed-by: Eric Anholt <eric@anholt.net>
To properly support the case of waiting on a fence with a 0 timeout, we
still need to call down to the kernel. Which requires the use of the
new fd_pipe_wait_timeout() API.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Don't take current timestamp/fence from current ring, as we might have
already rolled over to new rb.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Since 4d7e0fa8c7 this file is generated by the configure script.
Reviewed-by: Tapani Palli <tapani.palli@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tessellation control shaders write to outputs as OUT[ADDR[0].x][0], make
sure to parse the indirect dimension on outputs.
Also tess control inputs/outputs and tess eval input declarations need
to receive the same treatment as geometry shader inputs.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This only appears in cubemaps which have have packed layers, so are very
sensitive to any layout disagreement between sw and hw.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The hardware checks for multisampling being enabled, but does not have
the rule about cbuf0 being an integer format. Only enable
alpha-to-one/alpha-to-coverage if cbuf0 is not an integer format.
Fixes piglits
ext_framebuffer_multisample-int-draw-buffers-alpha-to-one
ext_framebuffer_multisample-int-draw-buffers-alpha-to-coverage
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
GK110/GK208 have 256 registers, not 64. Find out the number of registers
from the target to avoid unnecessary iteration for pre-GK110.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
There are separate line widths for smooth and aliased lines. The smooth
one is selected when multisampling is enabled even if line smoothing
isn't explicitly turned on.
Fixes the ext_framebuffer_multisample-line-smooth piglits
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Apparently this is necessary in order for tess factors to work in a tess
eval program without a tess control program bound. Probably because it
uses the fake program's shader header to work out the number of patch
constants.
Fixes vs-tes-tessinner-tessouter-inputs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
There's a lot of functionality duplicated in the gm107 lowering pass
from the nvc0 pass. As that one gets updated, the gm107 one falls
behind. Avoid this by sharing the code.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes arb_get_texture_sub_image-get, and any situation where the 2d
engine was being used for multi-layer blits to a non-0 level.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>