

16. Tensor Contraction Engine Module: CI, MBPT, and CC

16.1 Overview

The Tensor Contraction Engine (TCE) module of NWChem implements a variety of approximations that converge toward the exact solution of the Schrödinger equation. They include configuration interaction theory through singles, doubles, triples, and quadruples substitutions, coupled-cluster theory through connected singles, doubles, triples, and quadruples substitutions, and many-body perturbation theory through fourth order in its tensor formulation. Not only are the optimized parallel programs for some of these high-end correlation theories new, but the way in which they were developed is also unique. The working equations of all of these methods were derived completely automatically by a symbolic manipulation program called the Tensor Contraction Engine (TCE), and the optimized parallel programs were also generated by the same program and then interfaced to NWChem. The development of the TCE program and this portion of the NWChem program has been financially supported by the United States Department of Energy, Office of Science, Office of Basic Energy Sciences, through the SciDAC program.

The capabilities of the module include:

Restricted Hartree-Fock, unrestricted Hartree-Fock, and restricted open-shell Hartree-Fock references,
Restricted KS DFT and unrestricted KS DFT references,
Unrestricted configuration interaction theory (CISD, CISDT, and CISDTQ),
Unrestricted coupled-cluster theory (LCCD, CCD, LCCSD, CCSD, QCISD, CCSDT, CCSDTQ),
Unrestricted iterative many-body perturbation theory [MBPT(2), MBPT(3), MBPT(4)] in its tensor formulation,

and the following optimizations have been used in the module:

Spin symmetry (spin integration is performed wherever possible within the unrestricted framework, making the present unrestricted program optimal for open-shell systems. Spin adaptation is not performed; however, in a restricted calculation for a closed-shell system, certain spin blocks of integrals and amplitudes are further omitted by symmetry, so the present unrestricted CCSD requires only twice as many operations as a spin-adapted restricted CCSD for a closed-shell system),
Point-group symmetry,
Index permutation symmetry,
Runtime orbital range tiling for memory management,
Dynamic load balancing (local index sort and matrix multiplications) parallelism,
Multiple parallel I/O schemes including fully incore algorithm using Global Arrays,
Frozen core and virtual approximation,
DIIS extrapolation and Jacobi update of excitation amplitudes.

This extensible module is designed such that an existing or new model of many-electron theory can be added and further optimization can be incorporated with ease by virtue of the TCE. This module is still being actively enhanced by the TCE and we hope to include more models and optimizations in future releases!

16.2 Performance of CI, MBPT, and CC methods

For reviews or tutorials of these highly-accurate correlation methods, the user is referred to:

A. Szabo and N. S. Ostlund, Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory,
R. J. Bartlett and J. F. Stanton, Applications of Post-Hartree-Fock Methods: A Tutorial, in Reviews in Computational Chemistry, Volume V,
R. J. Bartlett, Coupled-Cluster Theory: An Overview of Recent Developments, in Modern Electronic Structure Theory, Part II,
B. O. Roos (editor), Lecture Notes in Quantum Chemistry I and II.

16.3 Algorithms of CI, MBPT, and CC methods

16.3.1 Spin, spatial, and index permutation symmetry

The TCE thoroughly analyzes the working equations of many-electron theory models and automatically generates a program that takes full advantage of spin, spatial, and index permutation symmetries at the same time. To do so, the TCE first recognizes the index permutation symmetries in the working equations, and then performs strength reduction and factorization while carefully tracking the index permutation symmetries of the intermediate tensors. Accordingly, every input and output tensor (such as integrals, excitation amplitudes, and residuals) has just two independent but strictly ordered index strings, and each intermediate tensor has just four independent but strictly ordered index strings. The operation cost and storage size of each tensor contraction are minimized by using the index range restrictions arising from these index permutation symmetries together with spin and spatial symmetry.
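
As a rough illustration of what strictly ordered index strings save, consider a doubles amplitude $t_{ij}^{ab}$ in a spin-orbital basis with $O$ occupied and $V$ virtual spin orbitals. The count below assumes the usual antisymmetry of the amplitude and uses index permutation symmetry alone; spin and spatial symmetry are ignored, and the values $O = 10$, $V = 40$ are arbitrary illustrative choices.

$$
t_{ij}^{ab} = -t_{ji}^{ab} = -t_{ij}^{ba} = t_{ji}^{ba}
\quad\Longrightarrow\quad
\text{store only } i<j,\ a<b
$$

$$
N_{\text{stored}} = \binom{O}{2}\binom{V}{2} = \frac{O(O-1)}{2}\cdot\frac{V(V-1)}{2},
\qquad
N_{\text{full}} = O^{2}V^{2}
$$

For $O = 10$ and $V = 40$ this gives $N_{\text{stored}} = 45 \times 780 = 35100$ against $N_{\text{full}} = 160000$, that is, roughly 22% of the unrestricted index range.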

16.3.2 Runtime orbital range tiling

To keep the peak local memory usage at a manageable level, the orbitals are rearranged at the beginning of the calculation into tiles (blocks), each of which contains orbitals with the same spin and spatial symmetry. The tensor contractions in these methods are then carried out at the tile level, and the spin, spatial, and index permutation symmetries are likewise exploited at the tile level to reduce the operation and storage costs.

16.3.3 Dynamic load balancing parallelism

In a parallel execution, the tile-level local tensor index sorting and local tensor contraction (matrix multiplication) are dynamically load balanced.

16.3.4 Parallel I/O schemes

Each process is dynamically assigned a local tensor index sorting and tensor contraction task. It must first retrieve the tiles of the input tensors, perform these local operations, and then accumulate the output tensors to storage. We have developed a uniform interface for these I/O operations to (1) a global file on a global file system, (2) global memory on a global or distributed memory system, or (3) semi-replicated files on a distributed file system. Some of these operations depend on the ParSoft library.

16.4 Input syntax

The keyword to invoke the many-electron theories in the module is TCE. To perform a single-point energy calculation, include

      TASK TCE ENERGY

in the input file, which may be preceded by the TCE input block that details the calculation:

  TCE
    [(DFT||HF||SCF) default HF=SCF]
    [(LCCD||CCD||CCSD||LCCSD||CCSDT||CCSDTQ|| \
      QCISD||CISD||CISDT||CISDTQ|| \
      MBPT2||MBPT3||MBPT4||MP2||MP3||MP4) default CCSD]
    [THRESH <double thresh default 1e-6>]
    [MAXITER <integer maxiter default 100>]
    [IO (fortran||eaf||ga||sf||replicated) default ga]
    [DIIS <integer diis default 5>]
    [FREEZE [[core] (atomic || <integer nfzc default 0>)] \
             [virtual <integer nfzv default 0>]]
    [PRINT (none||low||medium||high||debug)
      <string list_of_names ...>]
  END

Also supported are energy gradient calculations, geometry optimizations, and vibrational frequency (or Hessian) calculations, all on the basis of numerical differentiation. To perform these calculations, use

      TASK TCE GRADIENT

      TASK TCE OPTIMIZE

      TASK TCE FREQUENCIES

Alternatively, more descriptive keywords for each individual method can be used. For instance, to perform a CCSDT energy, gradient, etc. calculation, use

      TASK UCCSDT ENERGY

      TASK UCCSDT GRADIENT

      TASK UCCSDT OPTIMIZE

      TASK UCCSDT FREQUENCIES

with an (optional) input block enclosed either by UCCSDT and END or by UCC and END. The keywords for the individual methods of the TCE module always start with the letter U, which stands for "unrestricted", to avoid confusion with other related methods (such as spin-restricted CCSD and the various canonical MP2 implementations) already in place in NWChem.

  (UCCSDT||UCC)
    [(DFT||HF||SCF) default HF=SCF]
    [THRESH <double thresh default 1e-6>]
    [MAXITER <integer maxiter default 100>]
    [IO (fortran||eaf||ga||sf||replicated) default ga]
    [DIIS <integer diis default 5>]
    [FREEZE [[core] (atomic || <integer nfzc default 0>)] \
             [virtual <integer nfzv default 0>]]
    [PRINT (none||low||medium||high||debug)
      <string list_of_names ...>]
  END

When a method (CCSDT in this example) is specified in the task directive, a duplicate method specification is not necessary (indeed, it is not allowed) in the corresponding input block (UCCSDT or UCC in this case). The task directive keywords for the other methods are:

      TASK (UCCD||ULCCD||UCCSD||ULCCSD||UQCISD||UCCSDT||UCCSDTQ) ENERGY

      TASK (UCISD||UCISDT||UCISDTQ) ENERGY

      TASK (UMP2||UMP3||UMP4||UMBPT2||UMBPT3||UMBPT4) ENERGY

etc. The input block can be specified by the same name (UCISDT and END block for TASK UCISDT ENERGY) or UCC for the CC family, UCI for the CI family, and UMP or UMBPT for the MP family of methods.
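
As a hedged illustration of the naming rule above, a CISDT calculation could be requested with a family-level UCI block. The THRESH value is an arbitrary illustrative choice, and it is assumed that the UCI block accepts the same keywords as the UCCSDT/UCC block shown earlier.

```
UCI
  THRESH 1e-8
END

TASK UCISDT ENERGY
```

Under the same rule, the block could equally be opened with UCISDT instead of UCI.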

The user may also specify the parameters of the reference wave function calculation in a separate block for either HF (SCF) or DFT, depending on the first keyword in the above syntax.

Since each keyword has a default value, a minimal input file will be

  GEOMETRY
  Be 0.0 0.0 0.0
  END

  BASIS
  Be library cc-pVDZ
  END

  TASK TCE ENERGY

which performs a CCSD/cc-pVDZ calculation of the Be atom in its singlet ground state with a spin-restricted HF reference.
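
For comparison, the sketch below spells out the defaults that the minimal input relies on, using the values listed in the syntax summary above. It is assumed to behave the same as the bare TASK TCE ENERGY line; the GEOMETRY and BASIS blocks are unchanged and omitted here.

```
TCE
  SCF
  CCSD
  THRESH 1e-6
  MAXITER 100
  IO ga
  DIIS 5
END

TASK TCE ENERGY
```

No FREEZE line is needed, since no orbitals are frozen by default.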

16.5 Keywords of `TCE` input block

16.5.1 `HF`, `SCF`, or `DFT` -- the reference wave function

This keyword tells the module whether the HF (SCF) or the DFT module is used to calculate the reference wave function. The keywords HF and SCF are one and the same internally, and are the default. When they are used, the details of the HF (SCF) calculation can be specified in the SCF input block, whereas if DFT is chosen, a DFT input block may be provided.

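For instance, an RHF-RCCSDT calculation (R standing for spin-restricted) can be performed with any one of the following three equivalent sets of input blocks, which use the TCE block, the UCCSDT block, and the UCC block, respectively: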

  SCF
  SINGLET
  RHF
  END

  TCE
  SCF
  CCSDT
  END

  TASK TCE ENERGY

  SCF
  SINGLET
  RHF
  END

  UCCSDT
  SCF
  END

  TASK UCCSDT ENERGY

  SCF
  SINGLET
  RHF
  END

  UCC
  SCF
  END

  TASK UCCSDT ENERGY

This calculation (and any correlation calculation in the TCE module using an RHF or RDFT reference for a closed-shell system) skips the storage and computation of all $\beta$ spin blocks of integrals and excitation amplitudes. An ROHF-UCCSDT calculation (U standing for spin-unrestricted) for an open-shell doublet system can be requested by

  SCF
  DOUBLET
  ROHF
  END

  TCE
  SCF
  CCSDT
  END

  TASK TCE ENERGY

and likewise, UHF-UCCSDT for an open-shell doublet system can be specified with

  SCF
  DOUBLET
  UHF
  END

  TCE
  SCF
  CCSDT
  END

  TASK TCE ENERGY

The operation and storage costs of the last two calculations are identical. To use a KS DFT reference wave function for a UCCSD calculation of an open-shell doublet system, use

  DFT
  ODFT
  MULT 2
  END

  TCE
  DFT
  CCSD
  END

  TASK TCE ENERGY

Note that the default model of the DFT module is LDA.

16.5.2 `CCSD`,`CCSDT`,`CCSDTQ`,`CISD`,`CISDT`,`CISDTQ`, `MBPT2`,`MBPT3`,`MBPT4`, etc. -- the correlation model

These keywords stand for the following models:

LCCD: linearized coupled-cluster doubles,
CCD: coupled-cluster doubles,
LCCSD: linearized coupled-cluster singles & doubles,
CCSD: coupled-cluster singles & doubles,
CCSDT: coupled-cluster singles, doubles, & triples,
CCSDTQ: coupled-cluster singles, doubles, triples, & quadruples,
QCISD: quadratic configuration interaction singles & doubles,
CISD: configuration interaction singles & doubles,
CISDT: configuration interaction singles, doubles, & triples,
CISDTQ: configuration interaction singles, doubles, triples, & quadruples,
MBPT2=MP2: iterative tensor second-order many-body or Møller-Plesset perturbation theory,
MBPT3=MP3: iterative tensor third-order many-body or Møller-Plesset perturbation theory,
MBPT4=MP4: iterative tensor fourth-order many-body or Møller-Plesset perturbation theory,

All of these models are based on spin-orbital expressions of the amplitude and energy equations, and are designed primarily for spin-unrestricted reference wave functions. For a restricted reference wave function of a closed-shell system, however, some further reduction of operation and storage cost is made. Within the unrestricted framework, all of these methods take full advantage of spin, spatial, and index permutation symmetries to save operation and storage costs at every stage of the calculation. Consequently, these computer-generated programs will perform significantly faster than, for instance, a hand-written spin-adapted CCSD program in NWChem, although the nominal operation cost of spin-adapted CCSD is just one half of that of spin-unrestricted CCSD. The factor of one half arises as follows: spin-unrestricted CCSD has three independent sets of excitation amplitudes, whereas spin-adapted CCSD has only one, so the nominal operation cost of the latter is one third of that of the former. For a restricted reference wave function of a closed-shell system, the all-$\beta$ spin block of the excitation amplitudes and integrals can be trivially mapped onto the all-$\alpha$ spin block, which brings the ratio to one half.

While the MBPT (MP) models implemented in the TCE module give the same correlation energies as conventional implementations for a canonical HF reference of a closed-shell system, they are intrinsically more general and theoretically robust for other, less standard reference wave functions and for open-shell systems. This is because the zeroth-order Hamiltonian is chosen to be the full Fock operator (not just its diagonal part), and no further approximation is invoked. Unlike a conventional implementation, in which the Fock matrix is assumed to be diagonal and the correlation energy is evaluated from a single analytical formula involving orbital energies (or diagonal Fock matrix elements), the present tensor MBPT requires the iterative solution of amplitude equations followed by an energy evaluation, and is therefore generally more expensive. For example, the operation cost of many conventional implementations of MBPT(2) scales as the fourth power of the system size, whereas the cost of the present tensor MBPT(2) scales as the fifth power, because the latter permits a non-canonical HF reference and the former does not (extending the former to non-canonical HF references makes it scale as the fifth power of the system size as well).
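
The difference between the two formulations can be sketched with the standard spin-orbital expressions for the second-order doubles contribution. These are textbook forms written here for illustration, not equations taken from the TCE-generated code. The indices $i,j,k$ label occupied and $a,b,c$ virtual spin orbitals, $f_{pq}$ is the Fock matrix, $\langle ab\|ij\rangle$ are antisymmetrized two-electron integrals (assumed real), and singles contributions, which arise when $f_{ia} \neq 0$, are omitted.

With a diagonal Fock matrix, $f_{pq} = \varepsilon_p \delta_{pq}$, the energy follows from a single formula:

$$
E^{(2)} = \frac{1}{4}\sum_{ijab}\frac{|\langle ab\|ij\rangle|^{2}}{\varepsilon_i+\varepsilon_j-\varepsilon_a-\varepsilon_b}
$$

With the full Fock matrix, the first-order amplitudes must instead be solved for before the energy is evaluated:

$$
0 = \langle ab\|ij\rangle
+ \sum_{c}\left(f_{bc}\,t_{ij}^{ac} - f_{ac}\,t_{ij}^{bc}\right)
- \sum_{k}\left(f_{kj}\,t_{ik}^{ab} - f_{ki}\,t_{jk}^{ab}\right),
\qquad
E^{(2)} = \frac{1}{4}\sum_{ijab}\langle ij\|ab\rangle\,t_{ij}^{ab}
$$

When $f$ is diagonal, the amplitude equation reduces to $t_{ij}^{ab} = \langle ab\|ij\rangle/(\varepsilon_i+\varepsilon_j-\varepsilon_a-\varepsilon_b)$ and the first expression is recovered. A contraction such as $\sum_c f_{bc}\,t_{ij}^{ac}$ costs on the order of $O^{2}V^{3}$ operations per iteration, which is consistent with the fifth-power scaling quoted above.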

16.5.3 `THRESH` -- the convergence threshold of iterative solutions of amplitude equations

This keyword specifies the convergence threshold of the iterative solutions of the amplitude equations, and applies to all of the CI, CC, and MBPT models. The threshold refers to the norm of the residual, namely, the deviation from the amplitude equations. The default value is 1e-6.

16.5.4 `MAXITER` -- the maximum number of iterations

This keyword sets the maximum number of iterations allowed for the iterative solution of the amplitude equations. The default value is 100.
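
For example, a tighter convergence criterion with a larger iteration budget might be requested as sketched below. The values 1e-8 and 200 are arbitrary illustrative choices, not recommendations from this manual.

```
TCE
  CCSD
  THRESH 1e-8
  MAXITER 200
END

TASK TCE ENERGY
```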

16.5.5 `IO` -- parallel I/O scheme

There are five parallel I/O schemes implemented for all the models, which need to be wisely chosen for a particular problem and computer architecture.

fortran : Fortran77 direct access,
eaf : Exclusive Access File library,
ga : Fully incore, Global Array virtual file,
sf : Shared File library,
replicated : Semi-replicated file on distributed file system with EAF library.

The GA algorithm, which is the default, stores all input (integrals and excitation amplitudes), output (residuals), and intermediate tensors in the shared memory area across all nodes by virtue of the GA library. This fully incore algorithm replaces disk I/O by inter-process communication, and is the recommended algorithm whenever feasible. Note that the memory management through runtime orbital range tiling described above applies to the local (unshared) memory of each node, which may be allocated separately from the shared memory space for GA. So when there is not enough shared memory space (either physically or due to software limitations, in particular the shmmax setting), the GA algorithm can crash with an out-of-memory error.

The replicated scheme is currently the only disk-based algorithm for a genuinely distributed file system. Each node keeps an identical copy of the input tensors and holds non-identical, overlapping segments of the intermediate and output tensors on its local disk. Whenever data coherency is required, a file reconciliation process takes place to make the intermediate and output data identical across the nodes. This algorithm, while requiring redundant data space on local disk, performs reasonably efficiently in parallel. For sequential execution, it reduces to the EAF scheme.

For a global file system, the SF scheme is recommended. This scheme, like the Fortran77 direct access scheme, does not usually scale unless the shared files on the global file system also share the same I/O buffer. For sequential executions, the SF, EAF, and replicated schemes are interchangeable, while the Fortran77 scheme is appreciably slower.
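
As a minimal sketch of switching schemes, the block below selects the semi-replicated disk-based scheme, for instance when the shared memory available to GA is insufficient on a machine with only node-local disks. Any of the five scheme names listed above can be substituted for replicated.

```
TCE
  CCSD
  IO replicated
END

TASK TCE ENERGY
```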

16.5.6 `DIIS` -- the convergence acceleration

This keyword sets the interval, in iterations, at which a DIIS extrapolation is performed to accelerate the convergence of the excitation amplitudes. The default value is 5, which means that one DIIS extrapolation is performed every five iterations (the remaining iterations use the Jacobi update). When zero or a negative value is specified, DIIS is turned off. Performing DIIS every iteration is not recommended, whereas setting a large value for this parameter requires a large amount of memory (or disk) space to keep the excitation amplitudes of previous iterations.
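
The fragment below sketches a non-default setting. The value 8 is an arbitrary illustrative choice, and the lines starting with # are ordinary NWChem input comments.

```
TCE
  CCSD
  # One DIIS extrapolation every 8 iterations (illustrative value);
  # per the description above, a value of 0 would turn DIIS off instead.
  DIIS 8
END

TASK TCE ENERGY
```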

16.5.7 `FREEZE` -- the frozen core/virtual approximation

Some of the lowest-lying core orbitals and/or some of the highest-lying virtual orbitals may be excluded in the calculations by this keyword (this does not affect the ground state HF or DFT calculation). No orbitals are frozen by default. To exclude the atom-like core regions altogether, one may request

  FREEZE atomic

To specify the number of lowest-lying occupied orbitals to be excluded, one may use

  FREEZE 10

which excludes the 10 lowest-lying occupied orbitals. This is equivalent to writing

  FREEZE core 10

To freeze the highest virtual orbitals, use the virtual keyword. For instance, to freeze the top 5 virtuals, use

  FREEZE virtual 5
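
The core and virtual options can presumably be combined on a single FREEZE line, as the syntax summary in Section 16.4 suggests. The sketch below assumes this reading; the count of 5 virtual orbitals is an arbitrary illustrative choice.

```
TCE
  CCSD
  FREEZE atomic virtual 5
END

TASK TCE ENERGY
```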

16.5.8 `PRINT` -- the verbosity

This keyword changes the level of output verbosity. One may also request that particular items listed in Table 16.1 be printed.

**Table 16.1:** Printable items in the TCE modules and their default print levels.

| Item | Print level | Description |
|---|---|---|
| "time" | varies | CPU and wall times |
| "tile" | varies | Orbital range tiling information |
| "t1" | debug | Excitation amplitude dumping |
| "t2" | debug | Excitation amplitude dumping |
| "t3" | debug | Excitation amplitude dumping |
| "t4" | debug | Excitation amplitude dumping |
| "general information" | default | General information |
| "correlation information" | default | TCE information |
| "mbpt2" | debug | Canonical HF MBPT2 test |
| "get_block" | debug | I/O information |
| "put_block" | debug | I/O information |
| "add_block" | debug | I/O information |
| "files" | debug | File information |
| "offset" | debug | File offset information |
| "ao1e" | debug | AO one-electron integral evaluation |
| "ao2e" | debug | AO two-electron integral evaluation |
| "mo1e" | debug | One-electron integral transformation |
| "mo2e" | debug | Two-electron integral transformation |
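
A hedged sketch of how a print level and individual items from Table 16.1 might be combined, following the PRINT syntax in Section 16.4. It assumes that item names are passed as double-quoted strings, which is also how a name containing spaces (such as "general information") would have to be written.

```
TCE
  CCSD
  PRINT low "time" "tile"
END

TASK TCE ENERGY
```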

16.6 Sample input

The following is a sample input for a ROHF-UCCSD energy calculation of a water radical cation.

START h2o

TITLE "ROHF-UCCSD/cc-pVTZ H2O"

CHARGE 1

GEOMETRY
O     0.00000000     0.00000000     0.12982363
H     0.75933475     0.00000000    -0.46621158
H    -0.75933475     0.00000000    -0.46621158
END

BASIS
* library cc-pVTZ
END

SCF
ROHF
DOUBLET
THRESH 1.0e-10
TOL2E  1.0e-10
END

TCE
CCSD
END

TASK TCE ENERGY

The same result can be obtained by the following input:

START h2o

TITLE "ROHF-UCCSD/cc-pVTZ H2O"

CHARGE 1

GEOMETRY
O     0.00000000     0.00000000     0.12982363
H     0.75933475     0.00000000    -0.46621158
H    -0.75933475     0.00000000    -0.46621158
END

BASIS
* library cc-pVTZ
END

SCF
ROHF
DOUBLET
THRESH 1.0e-10
TOL2E  1.0e-10
END

TASK UCCSD ENERGY


2003-10-08