ormqr/ormlq/ormrq/ormql, unmqr/unmlq/unmrq/unmql: fix minimum workspace for tiny M,N,K #1292

Open
jschueller wants to merge 1 commit into Reference-LAPACK:master from jschueller:issue546

@jschueller (Contributor)

The workspace query formula LWKOPT = NW*NB + TSIZE used TSIZE = LDT*NBMAX = 65*64 = 4160, a hardcoded constant. The blocked algorithm only stores one T matrix block at a time (reused across loop iterations), so the per-iteration workspace is LDT*NB, not LDT*NBMAX. For tiny M,N,K where a single block suffices (NB >= N or NB >= K), LWKOPT was always >= 4160 regardless of problem size.

Fix: change LWKOPT = NW*NB + TSIZE -> LWKOPT = NW*NB + LDT*NB.
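
To make the difference concrete, here is a minimal Python sketch of the two workspace-query formulas. It is not the Fortran from the patch: the function names and the sample values of NW and NB are assumptions chosen for illustration, and only NBMAX = 64 and LDT = 65 come from the description above.

```python
# Illustrative sketch only; the real change lives in the Fortran routines.
NBMAX = 64             # maximum block size (from the description: 65*64)
LDT = NBMAX + 1        # leading dimension of the T block, 65
TSIZE = LDT * NBMAX    # hardcoded T storage, 4160


def lwkopt_old(nw: int, nb: int) -> int:
    """Old query result: always reserves the full LDT*NBMAX block for T."""
    return nw * nb + TSIZE


def lwkopt_new(nw: int, nb: int) -> int:
    """New query result: reserves only the LDT*NB block actually used."""
    return nw * nb + LDT * nb


if __name__ == "__main__":
    # Hypothetical inputs: NW = 1, NB = 32 (sample values, not from the PR).
    print(lwkopt_old(1, 32))  # 4192
    print(lwkopt_new(1, 32))  # 2112
```

Since NB never exceeds NBMAX, the new value is never larger than the old one, and the two coincide when NB equals NBMAX.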

The NB adjustment formula when LWORK is limited must consistently use the per-iteration T storage instead of TSIZE:

NB = (LWORK - TSIZE) / LDWORK -> NB = LWORK / (LDWORK + LDT)
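
The new formula follows from requiring that the work array and the T block both fit: NB*LDWORK + NB*LDT <= LWORK. The Python sketch below compares the two adjustments under that reading. The function names and sample inputs are hypothetical, and it assumes non-negative operands so that Python's floor division matches Fortran integer division.

```python
# Illustrative sketch only; names and sample values are assumptions.
NBMAX = 64
LDT = NBMAX + 1        # 65
TSIZE = LDT * NBMAX    # 4160


def nb_old(lwork: int, ldwork: int) -> int:
    """Old adjustment: subtract the full T block, then divide by LDWORK.

    Assumes lwork >= TSIZE so the numerator is non-negative.
    """
    return (lwork - TSIZE) // ldwork


def nb_new(lwork: int, ldwork: int) -> int:
    """New adjustment: each unit of NB costs LDWORK + LDT workspace entries."""
    return lwork // (ldwork + LDT)


if __name__ == "__main__":
    lwork, ldwork = 4500, 10       # hypothetical limited workspace
    print(nb_old(lwork, ldwork))   # 34
    nb = nb_new(lwork, ldwork)
    print(nb)                      # 60
    # The chosen NB fits: work array plus per-iteration T storage.
    assert nb * ldwork + nb * LDT <= lwork
```

With a workspace smaller than TSIZE, the old numerator becomes negative, while the new formula still yields a usable block size (for example, 1500 // 75 = 20 for the same LDWORK).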

Applied to all 16 routines (s,d,c,z x {orm,unm}{qr,rq,lq,ql}).
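
The count of 16 comes from pairing the real precisions (s, d) with the orm stem and the complex precisions (c, z) with the unm stem. A short Python sketch that enumerates the names, with a hypothetical helper name, could look like this:

```python
# Illustrative enumeration of the affected routine names.
FACTORIZATIONS = ("qr", "rq", "lq", "ql")

# Real precisions use the "orm" stem, complex precisions use "unm".
STEM_BY_PRECISION = {"s": "orm", "d": "orm", "c": "unm", "z": "unm"}


def affected_routines() -> list[str]:
    """Return the routine names as precision + stem + factorization."""
    return [
        f"{precision}{stem}{fact}"
        for precision, stem in STEM_BY_PRECISION.items()
        for fact in FACTORIZATIONS
    ]


if __name__ == "__main__":
    names = affected_routines()
    print(len(names))
    print(", ".join(names))
```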

Closes #546

Development

Successfully merging this pull request may close these issues.

x{OR,UN}M{QR,RQ,LQ,QL} optimal workspace size is always > 4096
