nv50/ir: set number of threads/block for variable local size

When a variable local size is defined as specified by
ARB_compute_variable_group_size, the fixed local size is set to 0
and a SIGFPE (division by zero) occurs when we compute the maximum
number of registers.

This allows the use of 64 GPRs/thread.

v4: - use 512 threads on Fermi, 1024 on Kepler+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Samuel Pitoiset 2016-09-07 00:12:51 +02:00
parent 590734fa0d
commit 11e75fffeb

@@ -175,6 +175,8 @@ public:
    virtual void parseDriverInfo(const struct nv50_ir_prog_info *info) {
       threads = info->prop.cp.numThreads;
+      if (threads == 0)
+         threads = info->target >= NVISA_GK104_CHIPSET ? 1024 : 512;
    }
    virtual bool runLegalizePass(Program *, CGStage stage) const = 0;
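
The arithmetic behind the 64 GPRs/thread figure can be sketched as follows. This is a minimal standalone illustration, not the driver code: the helper names, the chipset constants and the register file sizes (32768 registers per SM on Fermi, 65536 on Kepler+) are assumptions introduced here, and the real register limit computation lives elsewhere in the nv50/ir target code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical chipset ids, used only to tell Fermi from Kepler+ here.
constexpr uint32_t kFermiChipset  = 0xc0;
constexpr uint32_t kKeplerChipset = 0xe0;

// Mirrors the fallback in the diff: a thread count of 0 means the local
// size is variable, so assume the largest block the hardware accepts.
constexpr uint32_t resolveThreads(uint32_t numThreads, uint32_t chipset)
{
   if (numThreads != 0)
      return numThreads;
   return chipset >= kKeplerChipset ? 1024 : 512;
}

// Assumed number of 32-bit registers per SM (not taken from the commit).
constexpr uint32_t regFileSize(uint32_t chipset)
{
   return chipset >= kKeplerChipset ? 65536 : 32768;
}

// Registers available to each thread if a full block must fit on one SM.
// Without the fallback, numThreads == 0 would divide by zero here.
inline uint32_t regsPerThread(uint32_t numThreads, uint32_t chipset)
{
   const uint32_t threads = resolveThreads(numThreads, chipset);
   assert(threads != 0);
   return regFileSize(chipset) / threads;
}
```

Under these assumptions, both `regsPerThread(0, kFermiChipset)` and `regsPerThread(0, kKeplerChipset)` evaluate to 64, while a fixed local size such as 256 threads is passed through unchanged.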