Fast orbital-dependent exchange
Simen Reine
Centre for Theoretical and Computational Chemistry (CTCC), Department of Chemistry, University of Oslo, Norway
CTCC Seminar, Tromsø
February 13, 2015
Simen Reine (CTCC, University of Oslo) February 13, 2015 1 / 29
Outline
(Figure: DFT – MP2 difference in electrostatic potential, panels B3LYP – MP2 and CAM-B3LYP – MP2; Kristensen et al., Phys. Chem. Chem. Phys. (2012))
Kohn-Sham DFT
Why exchange?
RI approximation
ADMM exchange
Results
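Kohn-Sham DFT
Density-functional theory (P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964))
wave function $\Psi(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N)$ replaced by electron density $\rho(\mathbf{r})$
$$H\Psi = E\Psi \;\rightarrow\; E[v] = \inf_{\rho}\left(F[\rho] + \int v(\mathbf{r})\,\rho(\mathbf{r})\,d\mathbf{r}\right)$$
the universal density functional $F[\rho]$ is unknown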
Kohn-Sham DFT (W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965))
density represented by a single Slater determinant $\Psi_{\mathrm{KS}} = |\phi_1, \ldots, \phi_N|$
$$F[\rho] = T_S[\rho] + J[\rho] + X[\rho] + C[\rho] + \Delta T[\rho] \approx T_S[\phi] + J[\rho] - \mu K[\phi] + XC[\rho]$$
orbitals are reintroduced to get a good estimate of the kinetic energy
for pure DFT there is no orbital-dependent exchange ($\mu = 0$)
$XC[\rho]$ is local in $\rho$ and $\nabla\rho$
for range-separated functionals, $K$ and $X$ are separated into short- and long-range contributions by the operator splitting
$$\frac{1}{r_{12}} = \frac{\mathrm{erf}(\omega r_{12})}{r_{12}} + \frac{\mathrm{erfc}(\omega r_{12})}{r_{12}}$$
orbital-dependent exchange is especially important at long range and for large systems
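A minimal Python sketch of the operator splitting above, assuming a range-separation parameter ω = 0.33 bohr⁻¹ (the value used in CAM-B3LYP, taken here only for illustration; any positive value works). It checks that the long-range erf part and the short-range erfc part always add back to 1/r12 and shows which part dominates at short and at long distance.

```python
import math


def split_coulomb(r12: float, omega: float) -> tuple:
    """Split 1/r12 into long-range erf(omega*r12)/r12 and short-range erfc(omega*r12)/r12."""
    long_range = math.erf(omega * r12) / r12
    short_range = math.erfc(omega * r12) / r12
    return long_range, short_range


omega = 0.33  # range-separation parameter in bohr^-1, illustrative value
for r12 in (0.01, 1.0, 5.0, 50.0):
    lr, sr = split_coulomb(r12, omega)
    # lr + sr reproduces the full Coulomb operator 1/r12 at every distance
    print(f"r12={r12:6.2f}  1/r12={1.0 / r12:9.4f}  long={lr:9.4f}  short={sr:9.4f}")
```

At short distance the erfc part carries almost all of 1/r12 while the erf part stays finite (it tends to 2ω/√π as r12 → 0); at large distance only the erf part survives.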
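Kohn-Sham DFT
LCAO, AO density matrix
$$\phi_i(\mathbf{r}) = \sum_a C_{ai}\, a(\mathbf{r}), \qquad D_{ab} = \sum_i^{\mathrm{occ}} C_{ai} C_{bi}$$
Iterative SCF optimization procedure
$$F^{(k)}(D^{(k)}) \rightarrow D^{(k+1)}, \qquad F_{ab}(D) = \frac{dE}{dD_{ab}}$$
Kohn-Sham matrix
$$F_{ab}(D) = h_{ab} + J_{ab}(\rho) - \mu K_{ab}(D) + XC_{ab}(D)$$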
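Four bottlenecks: Coulomb $J(D)$, exchange $K(D)$, exchange-correlation $XC(D)$ and wave-function optimization
$$J_{ab} = \sum_{cd} (ab|cd)\, D_{cd}, \qquad K_{ab} = \sum_{cd} (ac|bd)\, D_{cd}, \qquad XC_{ab} = \sum_g w_g\, v_{XC}(\mathbf{r}_g)\, a(\mathbf{r}_g)\, b(\mathbf{r}_g)$$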
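The $J$ and $K$ contractions above can be written directly with `numpy.einsum`. This is a minimal sketch, not a production integral code: the four-index integrals are a hypothetical random tensor built as $\sum_P L_{P,ab} L_{P,cd}$ so that it has the 8-fold permutational symmetry of real $(ab|cd)$, and the MO coefficients are random. It builds $D_{ab} = \sum_i^{\mathrm{occ}} C_{ai} C_{bi}$ (no closed-shell factor of 2, as on the slide) and then $J$ and $K$. Storing all $N^4$ integrals like this is exactly what the screening and RI techniques discussed later avoid.

```python
import numpy as np


def density_matrix(C: np.ndarray, nocc: int) -> np.ndarray:
    """D_ab = sum_i^occ C_ai C_bi."""
    C_occ = C[:, :nocc]
    return C_occ @ C_occ.T


def coulomb_and_exchange(eri: np.ndarray, D: np.ndarray) -> tuple:
    """J_ab = sum_cd (ab|cd) D_cd and K_ab = sum_cd (ac|bd) D_cd, chemists' notation."""
    J = np.einsum("abcd,cd->ab", eri, D)
    K = np.einsum("acbd,cd->ab", eri, D)
    return J, K


rng = np.random.default_rng(0)
n, nocc, nfac = 6, 2, 10
L = rng.standard_normal((nfac, n, n))
L = 0.5 * (L + L.transpose(0, 2, 1))      # each L[P] symmetric in (a, b)
eri = np.einsum("Pab,Pcd->abcd", L, L)    # toy (ab|cd) with 8-fold symmetry
C = rng.standard_normal((n, n))           # toy MO coefficients C_ai
D = density_matrix(C, nocc)
J, K = coulomb_and_exchange(eri, D)
print(np.allclose(J, J.T), np.allclose(K, K.T))
```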
$$(ab|cd) = \sum_t E^{ab}_t \sum_u E^{cd}_u\, (t|u)$$
J-engine:
$$J_{ab} = \sum_t E^{ab}_t \sum_u F_u\, (t|u), \qquad F_u = \sum_{cd} E^{cd}_u D_{cd}$$
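The J-engine gain comes from the order of the contractions. In this hedged numpy sketch, random arrays stand in for the McMurchie-Davidson Hermite coefficients $E^{ab}_t$ and the Hermite integrals $(t|u)$, with hypothetical sizes and a single flattened Hermite index instead of per-shell-pair expansions. Both routes give the same $J$, but the J-engine route contracts $D$ into $F_u$ first and never forms $(ab|cd)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, nherm = 5, 8  # AO count and number of Hermite functions (hypothetical sizes)

# E[a, b, t]: Hermite expansion coefficients E^ab_t of the overlap distribution ab
E = rng.standard_normal((n, n, nherm))
E = 0.5 * (E + E.transpose(1, 0, 2))      # ab and ba share one distribution
# (t|u): Coulomb integrals between Hermite functions, symmetric in t and u
tu = rng.standard_normal((nherm, nherm))
tu = 0.5 * (tu + tu.T)
D = rng.standard_normal((n, n))
D = 0.5 * (D + D.T)

# conventional route: build every (ab|cd) = sum_tu E^ab_t (t|u) E^cd_u, then contract with D
eri = np.einsum("abt,tu,cdu->abcd", E, tu, E)
J_conventional = np.einsum("abcd,cd->ab", eri, D)

# J-engine route: move the density into Hermite space first, never forming (ab|cd)
F = np.einsum("cdu,cd->u", E, D)                 # F_u = sum_cd E^cd_u D_cd
J_engine = np.einsum("abt,tu,u->ab", E, tu, F)   # J_ab = sum_t E^ab_t sum_u (t|u) F_u

print(np.allclose(J_conventional, J_engine))
```

In this dense toy the saving is simply the skipped four-index tensor and its contraction.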
L. E. McMurchie and E. R. Davidson, J. Comput. Phys. 26, 218 (1978)
S. Reine et al., Phys. Chem. Chem. Phys. 9, 4771 (2007)
S. Reine, T. Helgaker and R. Lindh, WIREs Comput. Mol. Sci. 2, 290 (2012)
G. R. Ahmadi and J. Almlöf, Chem. Phys. Lett. 246, 364 (1995)
C. A. White and M. Head-Gordon, J. Chem. Phys. 104, 2620 (1996)
Linear scaling Kohn-Sham DFT
Cauchy-Schwarz screening (M. Häser and R. Ahlrichs, J. Comput. Chem. 10, 104 (1989))
$$|(ab|cd)| \le \sqrt{(ab|ab)(cd|cd)}$$
reduces the scaling from $O(N^4)$ to $O(N^2)$
LinK/ONX for exchange (C. Ochsenfeld, C. A. White and M. Head-Gordon, J. Chem. Phys. 109, 1663 (1998); E. Schwegler, M. Challacombe and M. Head-Gordon, J. Chem. Phys. 106, 9708 (1997))
$$K_{ab} = \sum_{cd} (ac|bd)\, D_{cd}$$
FMM for Coulomb (C. A. White, B. G. Johnson, P. M. W. Gill and M. Head-Gordon, Chem. Phys. Lett. 230, 8 (1994))
$$(ab|cd) = \sum_{lm,l'm'} q^{ab}_{lm}(P)\, T_{lm,l'm'}(P,Q)\, q^{cd}_{l'm'}(Q)$$
XC by atomic grids, linear scaling in nature (A. D. Becke, J. Chem. Phys. 88, 2547 (1988); J. M. Pérez-Jordá and W. Yang, Chem. Phys. Lett. 241, 469 (1995); O. Treutler and R. Ahlrichs, J. Chem. Phys. 102, 346 (1995))
Wave-function optimization, linear scaling for sparse matrices (P. Sałek et al., J. Chem. Phys. 126, 114110 (2007))
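A hedged sketch of Schwarz screening for a model system: one primitive s Gaussian per atom (exponent 1, hypothetical) on a linear chain with a hypothetical spacing of 1.4 bohr. It evaluates the exact $(ab|ab)$ for unnormalized s functions, forms $Q_{ab} = \sqrt{(ab|ab)}$ and counts the pairs above a threshold. Because $|(ab|cd)| \le Q_{ab} Q_{cd}$, only quartets built from two surviving pairs need to be computed; the number of surviving pairs grows linearly with the chain length, so the number of quartets grows as $O(N^2)$ rather than $O(N^4)$.

```python
import math

import numpy as np


def boys0(t: float) -> float:
    """Boys function F0(t) = 0.5*sqrt(pi/t)*erf(sqrt(t)), with F0(0) = 1."""
    if t < 1e-12:
        return 1.0
    return 0.5 * math.sqrt(math.pi / t) * math.erf(math.sqrt(t))


def eri_ssss(alpha, A, beta, B, gamma, C, delta, D):
    """(ab|cd) over unnormalized s Gaussians exp(-alpha*|r-A|^2), centres on the x axis."""
    p, q = alpha + beta, gamma + delta
    K_ab = math.exp(-alpha * beta / p * (A - B) ** 2)
    K_cd = math.exp(-gamma * delta / q * (C - D) ** 2)
    P = (alpha * A + beta * B) / p
    Q = (gamma * C + delta * D) / q
    prefactor = 2.0 * math.pi ** 2.5 / (p * q * math.sqrt(p + q))
    return prefactor * K_ab * K_cd * boys0(p * q / (p + q) * (P - Q) ** 2)


def schwarz_factors(centres, exponent=1.0):
    """Q_ab = sqrt((ab|ab)) for one s function per centre."""
    n = len(centres)
    Q = np.empty((n, n))
    for a in range(n):
        for b in range(n):
            A, B = centres[a], centres[b]
            Q[a, b] = math.sqrt(eri_ssss(exponent, A, exponent, B, exponent, A, exponent, B))
    return Q


def significant_pairs(n_atoms, spacing=1.4, threshold=1e-10):
    """Number of pairs ab with Q_ab above the threshold, for a linear chain of atoms."""
    Q = schwarz_factors([spacing * i for i in range(n_atoms)])
    return int(np.count_nonzero(Q > threshold))


for n_atoms in (10, 20, 40, 80):
    pairs = significant_pairs(n_atoms)
    print(f"{n_atoms:3d} atoms: {pairs:5d} of {n_atoms ** 2:5d} pairs survive")
```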
Why exchange?
B3LYP – MP2 (20% long-range exchange) and CAM-B3LYP – MP2 (65% long-range exchange)
(Figure: DFT – MP2 difference in electrostatic potential. Blue/red regions correspond to increased/decreased electrostatic potential for DFT compared to MP2; B3LYP – MP2 has no long-range correction, CAM-B3LYP – MP2 includes the long-range correction.)
See F. Jensen, J. Chem. Theory Comput. 6, 2726 (2010) for a related discussion on electron affinity
S. Jakobsen, K. Kristensen and F. Jensen, J. Chem. Theory Comput. 9, 3978 (2013)
Why exchange?
Alanine residue peptides, 6-31G: Hessian eigenvalues and HOMO-LUMO gap
(Figure: SCF optimizations in small and large molecules. HF and B3LYP HOMO-LUMO gaps and lowest HF and B3LYP Hessian eigenvalues for alanine residue peptides, 6-31G; diagonalization can be avoided by solving Newton equations, and the standard SCF scheme was modified to make it more robust.)
SCF convergence is typically more difficult in larger systems: small (or negative) HOMO-LUMO gaps and small Hessian eigenvalues in DFT
(long-range) exchange becomes essential
Why exchange?
Alanine residue peptides, timings HF/6-31G
(Figure: CPU time of the 5th SCF iteration against the number of atoms for alanine residue peptides, HF/6-31G, split into exchange, Coulomb, DSM and RH contributions; full lines sparse algebra, dashed lines dense algebra. Features of the code: diagonalization-free trust-region Roothaan-Hall (TRRH) energy minimization, trust-region density-subspace minimization (TRDSM) for density averaging, boxed density fitting with FMM for Coulomb evaluation, LinK for exact exchange, linear-scaling exchange-correlation evaluation, and compressed sparse-row (CSR) representation of few-atom blocks. The timings are dominated by exchange; the RH step is the least expensive.)
exchange is the bottleneck
even more prominent with increasing basis-set size
RI approximation
"Standard" resolution-of-the-identity (RI) approximation
$$(ab|cd) \approx \sum_{\alpha,\beta \in M} (ab|\alpha)(\alpha|\beta)^{-1}(\beta|cd)$$
Coulomb (J. L. Whitten, J. Chem. Phys. 58, 4496 (1973); E. J. Baerends, D. E. Ellis and P. Ros, Chem. Phys. 2, 41 (1973); B. I. Dunlap, J. W. D. Connolly and J. R. Sabin, J. Chem. Phys. 71, 4993 (1979))
$$J_{ab} = (ab|\rho) \approx (ab|\tilde\rho) = \sum_\alpha (ab|\alpha)\, c_\alpha, \qquad c_\alpha = \sum_\beta (\alpha|\beta)^{-1}(\beta|\rho)$$
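A minimal numpy sketch of RI-J, assuming the three-index integrals $(ab|\alpha)$ and the metric $(\alpha|\beta)$ are already available; here they are random stand-ins and the names are hypothetical. The fit solves the linear system $(\alpha|\beta)\,c = (\beta|\rho)$ instead of forming the inverse, and $J$ is built without any four-index quantity. In this toy model the four-index integrals are defined by the RI expression itself, so RI-J must reproduce the exact $J$.

```python
import numpy as np


def ri_coulomb(ab_alpha: np.ndarray, metric: np.ndarray, D: np.ndarray) -> np.ndarray:
    """RI-J: J_ab ~ sum_alpha (ab|alpha) c_alpha, with sum_beta (alpha|beta) c_beta = (alpha|rho)."""
    alpha_rho = np.einsum("abP,ab->P", ab_alpha, D)  # (alpha|rho) = sum_cd (cd|alpha) D_cd
    c = np.linalg.solve(metric, alpha_rho)           # fit coefficients, no explicit inverse
    return np.einsum("abP,P->ab", ab_alpha, c)


rng = np.random.default_rng(0)
n, naux = 6, 15
ab_alpha = rng.standard_normal((n, n, naux))
ab_alpha = 0.5 * (ab_alpha + ab_alpha.transpose(1, 0, 2))  # (ab|alpha) = (ba|alpha)
M = rng.standard_normal((naux, naux))
metric = M @ M.T + naux * np.eye(naux)   # positive definite stand-in for (alpha|beta)
D = rng.standard_normal((n, n))
D = 0.5 * (D + D.T)

# four-index integrals that this auxiliary set reproduces exactly (toy model)
eri = np.einsum("abP,PQ,cdQ->abcd", ab_alpha, np.linalg.inv(metric), ab_alpha)
J_four_index = np.einsum("abcd,cd->ab", eri, D)
J_ri = ri_coulomb(ab_alpha, metric, D)
print(np.allclose(J_four_index, J_ri))
```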
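Exchange (F. Weigend, Phys. Chem. Chem. Phys. 4, 4285 (2002); R. Polly, H.-J. Werner, F. R. Manby and P. J. Knowles, Mol. Phys. 102, 2311 (2004))
$$K_{ab} = \sum_i^{\mathrm{occ}} (ai|bi) \approx \sum_i^{\mathrm{occ}} (ai|\widetilde{bi}) = \sum_i^{\mathrm{occ}} \sum_\alpha (ai|\alpha)\, c^{bi}_\alpha, \qquad c^{bi}_\alpha = \sum_\beta (\alpha|\beta)^{-1}(\beta|bi)$$
RI-K: scaling wall at about 1000 basis functions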
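For exchange the fit is applied to every orbital product $bi$ rather than to the single density $\rho$: there are $N_{\mathrm{bf}} \times N_{\mathrm{occ}}$ fits, and the final contraction costs $O(N_{\mathrm{bf}}^2 N_{\mathrm{occ}} N_{\mathrm{aux}})$, one reason RI-K is far more expensive than RI-J. A minimal numpy sketch with the same kind of random stand-ins and hypothetical names as used for RI-J; in this toy model the auxiliary set is exact, so RI-K must reproduce the four-index $K$.

```python
import numpy as np


def ri_exchange(ab_alpha: np.ndarray, metric: np.ndarray, C_occ: np.ndarray) -> np.ndarray:
    """RI-K: K_ab ~ sum_i sum_alpha (ai|alpha) c^{bi}_alpha."""
    n, nocc, naux = C_occ.shape[0], C_occ.shape[1], ab_alpha.shape[2]
    ai_alpha = np.einsum("acP,ci->aiP", ab_alpha, C_occ)          # (ai|alpha), one MO index
    rhs = ai_alpha.reshape(n * nocc, naux).T                      # columns are (beta|bi)
    coeff = np.linalg.solve(metric, rhs).T.reshape(n, nocc, naux)  # c^{bi}_alpha
    return np.einsum("aiP,biP->ab", ai_alpha, coeff)


rng = np.random.default_rng(1)
n, nocc, naux = 6, 2, 15
ab_alpha = rng.standard_normal((n, n, naux))
ab_alpha = 0.5 * (ab_alpha + ab_alpha.transpose(1, 0, 2))
M = rng.standard_normal((naux, naux))
metric = M @ M.T + naux * np.eye(naux)
C_occ = rng.standard_normal((n, nocc))
D = C_occ @ C_occ.T                      # D_ab = sum_i C_ai C_bi

eri = np.einsum("abP,PQ,cdQ->abcd", ab_alpha, np.linalg.inv(metric), ab_alpha)
K_four_index = np.einsum("acbd,cd->ab", eri, D)
K_ri = ri_exchange(ab_alpha, metric, C_occ)
print(np.allclose(K_four_index, K_ri))
```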
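Pair-atomic RI (PARI) (Merlot et al., J. Comput. Chem. 34, 1486 (2013); D. S. Hollman, H. F. Schaefer and E. F. Valeev, J. Chem. Phys. 140, 064109 (2014); S. F. Manzer, E. Epifanovsky and M. Head-Gordon, J. Chem. Theory Comput. (2014))
$$(ab|cd) \approx \sum_{\alpha \in A \cup B} c^{ab}_\alpha (\alpha|cd) + \sum_{\beta \in C \cup D} (ab|\beta)\, c^{cd}_\beta - \sum_{\alpha \in A \cup B} \sum_{\beta \in C \cup D} c^{ab}_\alpha (\alpha|\beta)\, c^{cd}_\beta$$
with $A$, $B$, $C$, $D$ the atoms on which $a$, $b$, $c$, $d$ are centred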
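The PARI expression has the form of Dunlap's robust fitting formula: for any fit coefficients its error equals $-(\Delta_{ab}|\Delta_{cd})$, where $\Delta_{ab} = \rho_{ab} - \sum_\alpha c^{ab}_\alpha\, \alpha$ is the fitting residual, so the error is quadratic in the residuals. A toy check, assuming a finite-dimensional function space in which a hypothetical positive definite matrix `G` plays the role of the Coulomb metric and random columns stand in for the local auxiliary functions on $A \cup B$ and $C \cup D$:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 40                                   # dimension of a toy function space
X = rng.standard_normal((m, m))
G = X @ X.T + m * np.eye(m)              # positive definite stand-in for the Coulomb metric


def coul(x, y):
    """Toy Coulomb inner product (x|y) = x^T G y."""
    return x @ G @ y


def local_fit(rho, aux):
    """Coulomb-metric least-squares fit of rho in a local auxiliary set (columns of aux)."""
    metric = aux.T @ G @ aux             # (alpha|beta) within the local set
    return np.linalg.solve(metric, aux.T @ G @ rho)


rho_ab, rho_cd = rng.standard_normal(m), rng.standard_normal(m)
aux_AB = rng.standard_normal((m, 8))     # stands in for auxiliary functions on atoms A and B
aux_CD = rng.standard_normal((m, 8))     # stands in for auxiliary functions on atoms C and D
c_ab, c_cd = local_fit(rho_ab, aux_AB), local_fit(rho_cd, aux_CD)

exact = coul(rho_ab, rho_cd)
robust = (c_ab @ (aux_AB.T @ G @ rho_cd)             # sum_alpha c^ab_alpha (alpha|cd)
          + (rho_ab @ G @ aux_CD) @ c_cd             # sum_beta (ab|beta) c^cd_beta
          - c_ab @ (aux_AB.T @ G @ aux_CD) @ c_cd)   # minus the doubly fitted term
delta_ab = rho_ab - aux_AB @ c_ab                    # fitting residuals
delta_cd = rho_cd - aux_CD @ c_cd
print(np.isclose(exact - robust, coul(delta_ab, delta_cd)))
```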
Performance of RI (and J-engine)
B3LYP, naphthalene
(Figure: timings (s) against cardinal number X (cc-pVXZ, X = 2–5), logarithmic scale; curves for Coulomb, J-engine, RI-J, XC, LinK and PARI-K.)
RI error
B3LYP, naphthalene, basis-set limit −385.822
(Figure: error (microhartree) against cardinal number X (cc-pVXZ, X = 2–4), logarithmic scale; curves for RI-J, PARI-K and the basis-set error.)
RI summary
RI error three orders of magnitude smaller than the regular basis-set error
RI-J: speed-up factor 19–174
J-engine: speed-up factor 1.3–4
Combined speed-up for Coulomb: factor 25–800
PARI-K: speed-up factor 1.4–9
Coulomb is an order of magnitude (or more) faster than exchange
Greater speed-ups for larger systems
ADMM approximation
The auxiliary density matrix method (ADMM) (Guidon et al., J. Chem. Theory Comput. 6, 2348 (2010)) is based on the following trivial rearrangement of the exchange energy
$$K(D) = k(d) + K(D) - k(d)$$
with capital letters referring to the regular basis and small letters to a smaller auxiliary basis
In the ADMM approximation the last two terms are replaced by a GGA-type exchange
$$K(D) \approx k(d) + X(D) - x(d)$$
In ADMM2 the auxiliary density is obtained by a least-squares fit of the projected occupied orbitals, which gives
$$d_2 = T D T^{\mathsf{T}}, \qquad T \equiv s^{-1} Q$$
with $s$ the overlap matrix in the small basis and $Q$ the mixed overlap between the small and the regular basis. This gives the ADMM2 exchange matrix
$$K_2 = X(D) + T^{\mathsf{T}}\left(k(d_2) - x(d_2)\right) T$$
In ADMM1 the projection is subject to the constraint that the projected MOs are orthonormal, giving a density $d_1$ that cannot be expressed directly in terms of the regular AO density matrix $D$
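A minimal sketch of the ADMM2 projection for a hypothetical toy system: unnormalized s-type Gaussians on two centres 1.4 bohr apart, three exponents per centre in the regular basis and a single exponent per centre in the small basis, with one occupied orbital with made-up coefficients. It forms $T = s^{-1}Q$ by solving a linear system and projects $D$ to $d_2 = T D T^{\mathsf{T}}$; the back-transformation $T^{\mathsf{T}}(k(d_2) - x(d_2))T$ of the exchange matrix is not computed here. The electron count of the projected density, $N_2 = \mathrm{Tr}(d_2 s)$, comes out smaller than $N = \mathrm{Tr}(DS)$, since a least-squares projection can only lose norm.

```python
import math

import numpy as np


def s_overlap(alpha, A, beta, B):
    """Overlap of unnormalized s Gaussians exp(-alpha|r-A|^2), exp(-beta|r-B|^2); centres on the x axis."""
    p = alpha + beta
    return (math.pi / p) ** 1.5 * math.exp(-alpha * beta / p * (A - B) ** 2)


def overlap(bra, ket):
    """Overlap matrix between two lists of (exponent, centre) s functions."""
    return np.array([[s_overlap(a, A, b, B) for (b, B) in ket] for (a, A) in bra])


def admm2_density(D, regular, small):
    """ADMM2 auxiliary density d2 = T D T^T with T = s^-1 Q; also returns s."""
    s = overlap(small, small)        # overlap in the small basis
    Q = overlap(small, regular)      # mixed overlap, small x regular
    T = np.linalg.solve(s, Q)        # T = s^-1 Q without forming the inverse
    return T @ D @ T.T, s


regular = [(e, x) for x in (0.0, 1.4) for e in (4.0, 1.0, 0.25)]
small = [(1.0, 0.0), (1.0, 1.4)]
S = overlap(regular, regular)
c = np.array([0.3, 0.5, 0.2, 0.3, 0.5, 0.2])   # one occupied orbital (hypothetical coefficients)
c = c / math.sqrt(c @ S @ c)                   # normalize: c^T S c = 1
D = np.outer(c, c)
d2, s = admm2_density(D, regular, small)
N = np.trace(D @ S)                            # electrons in the regular basis (here 1)
N2 = np.trace(d2 @ s)                          # electrons carried by the projected density
print(round(float(N), 6), round(float(N2), 6))
```

Projecting onto the regular basis itself gives $T = 1$ and $d_2 = D$, a quick sanity check of the implementation.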
Charge-constrained ADMM
We have tested the ADMM approximation (Merlot et al., J. Chem. Phys. 141, 094104 (2014)) for
all-electron calculations
various GGA corrections
four different basis-set combinations
with three new ADMM variants: ADMMQ, ADMMS and ADMMP
In the ADMMQ approximation the projection is made subject to the charge constraint
$$\int \tilde\rho(\mathbf{r})\,d\mathbf{r} = \int \rho(\mathbf{r})\,d\mathbf{r} \;\rightarrow\; d_Q = \xi d_2, \qquad \xi = \frac{N}{N_2}$$
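which for the energy gives
$$K_Q(D) = k(d_Q) + X(D) - x(d_Q) + 2\Lambda\left[\mathrm{Tr}(DS) - \mathrm{Tr}(d_Q s)\right], \qquad \Lambda = \frac{2}{N}\,\mathrm{Tr}\left(\left(k(d_Q) - x(d_Q)\right) d_Q\right)$$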
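A minimal numpy sketch of the ADMMQ charge constraint in a toy model where hypothetical basis functions are random vectors, so that every overlap is a dot product. It assumes $N = \mathrm{Tr}(DS)$ and $N_2 = \mathrm{Tr}(d_2 s)$, which the slide leaves implicit. Scaling the ADMM2 density by $\xi = N/N_2$ makes $\mathrm{Tr}(d_Q s)$ equal to $N$, so the constraint term $\mathrm{Tr}(DS) - \mathrm{Tr}(d_Q s)$ vanishes; since the projection can only lose charge, $\xi \ge 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_reg, n_small, nocc = 50, 10, 4, 3
# toy model: basis functions sampled as vectors, so overlaps are plain dot products
Phi = rng.standard_normal((m, n_reg))                               # regular basis
phi = Phi[:, :n_small] + 0.1 * rng.standard_normal((m, n_small))    # small basis
S, s, Q = Phi.T @ Phi, phi.T @ phi, phi.T @ Phi                     # S, s and mixed Q

# occupied orbitals orthonormal in the regular basis, C^T S C = 1
C0 = rng.standard_normal((n_reg, nocc))
L = np.linalg.cholesky(C0.T @ S @ C0)
C = C0 @ np.linalg.inv(L).T
D = C @ C.T

T = np.linalg.solve(s, Q)        # ADMM2 projection, T = s^-1 Q
d2 = T @ D @ T.T
N = np.trace(D @ S)              # N = Tr(DS)
N2 = np.trace(d2 @ s)            # N2 = Tr(d2 s), assumed definition
xi = N / N2                      # ADMMQ scaling factor
dQ = xi * d2
print(round(float(N), 6), round(float(N2), 6), round(float(xi), 6))
```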
This works well in many cases, but in some cases $\xi$ is artificially increased during the SCF iterations, in turn increasing the difference $k(d_Q) - x(d_Q)$
This is explained for LDA by the $\xi^2$ dependence of $k(d_Q)$ versus the $\xi^{4/3}$ dependence of $x(d_Q)$
In ADMMP and ADMMS we include the missing $\xi^{2/3}$ dependence directly in the energy expression to avoid the artificial increase in $\xi$
$$K_S(D) = k(d_Q) + X(D) - \xi^{2/3} x(d_Q) + 2\Lambda\left[\mathrm{Tr}(DS) - \mathrm{Tr}(d_Q s)\right]$$
$$K_P(D) = \xi^2 k(d_2) + X(D) - \xi^2 x(d_2) + 2\Lambda\left[\mathrm{Tr}(DS) - \mathrm{Tr}(d_Q s)\right]$$
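The $\xi$ argument can be checked numerically. Exact exchange is quadratic in the density matrix, so $k(\xi d) = \xi^2 k(d)$, while LDA exchange depends on $\int \rho^{4/3}$ and therefore scales as $\xi^{4/3}$; the extra factor $\xi^{2/3}$ in ADMMS restores the $\xi^2$ behaviour. A Python sketch, assuming a hydrogen 1s density on a simple radial midpoint grid (hypothetical grid parameters) and an illustrative $\xi = 1.05$:

```python
import math


def lda_exchange(rho, weights):
    """Slater (LDA) exchange E_x = -C_x * sum_g w_g rho_g^(4/3), C_x = (3/4)(3/pi)^(1/3)."""
    c_x = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)
    return -c_x * sum(w * r ** (4.0 / 3.0) for r, w in zip(rho, weights))


# radial midpoint grid with spherical-shell weights 4*pi*r^2*h, hydrogen 1s density
n_grid, r_max = 2000, 10.0
h = r_max / n_grid
radii = [(g + 0.5) * h for g in range(n_grid)]
weights = [4.0 * math.pi * r * r * h for r in radii]
rho = [math.exp(-2.0 * r) / math.pi for r in radii]

xi = 1.05  # charge-scaling factor, illustrative
ratio = lda_exchange([xi * r for r in rho], weights) / lda_exchange(rho, weights)
print(f"x(xi*d)/x(d) = {ratio:.6f}   xi^(4/3) = {xi ** (4 / 3):.6f}")
print(f"with xi^(2/3): {xi ** (2 / 3) * ratio:.6f}   xi^2 = {xi ** 2:.6f}")
```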
M19 ADMM benchmark
(Figure: distributions of errors (mEh, from −20 to 20) for the basis-set combinations TZVPP/SVP, cc-pVTZ/cc-pVDZ, cc-pVTZ/3-21G and 6-31G**/3-21G, comparing ADMM2 and ADMMS with the PBEX, KT3X and OPTX exchange corrections; the two slides highlight the 6-31G**/3-21G and cc-pVTZ/3-21G combinations.)
Performance of ADMM
B3LYP, naphthalene
(Figure: timings (s) against cardinal number X (cc-pVXZ, X = 2–5), logarithmic scale; PARI-K compared with ADMM.)
ADMM error
B3LYP, naphthalene, basis-set limit −385.822
(Figure: error (microhartree) against cardinal number X (cc-pVXZ, X = 2–4), logarithmic scale; basis-set error compared with the ADMM error.)
Example calculation: Titin
Model I27SS, 392 atoms, 8 MPI nodes, 16 cores/node, B3LYP/cc-pVTZ (df-def2/3-21G)
8700 regular, 18761 RI and 2196 ADMM basis functions; Intel Xeon Processor E5-2670, 2.60 GHz
Timings: KS matrix 86, ADMM-K 32, XC 24, RI-J 30, RH diagonalization 76; LinK 2499, J-engine 2121
Errors: RI-J 4 mH, ADMMS/KT3 −34 mH, ADMM2/PBE −114 mH
Number of SCF iterations: 15; total RI+ADMM 3256, total RI+LinK 40436
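The speed-ups implied by the Titin timings can be worked out directly. A small Python check taking the numbers as printed on the slide; the time unit is not stated there, and reading the component entries as per-iteration timings of the same kind is an assumption. Note that ADMM-K, XC and RI-J add up exactly to the KS-matrix entry.

```python
# Timings from the Titin slide (unit as on the slide; treating them as comparable
# per-iteration component timings is an assumption).
admm_k, xc, ri_j = 32, 24, 30
ks_matrix = 86
link, j_engine = 2499, 2121
total_ri_admm, total_ri_link = 3256, 40436

print(admm_k + xc + ri_j == ks_matrix)           # True: components add up to the KS-matrix entry
print(round(link / admm_k, 1))                   # 78.1: LinK versus ADMM-K
print(round(j_engine / ri_j, 1))                 # 70.7: J-engine versus RI-J
print(round(total_ri_link / total_ri_admm, 1))   # 12.4: total RI+LinK versus RI+ADMM
```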