Domain Decomposition Method

Domain Decomposition Method (DDM) is available for analysis and optimization. DDM is a parallelization option in OptiStruct that can help significantly reduce model runtime with improved scalability compared to legacy shared memory parallelization approaches, especially on machines with a high number of cores (for example, greater than 8).

DDM allows two main levels of parallelization depending on the attributes of the model.

DDM Level 1 – Task-based parallelization

The first level of parallelization is task-based parallelization. At this level, the following model attributes can be distributed as tasks to different MPI processes.


Figure 1. Example DDM Setup with Four MPI Processes (-np=4) Divided into 3 Groups. This example shows how grouping can be used to parallelize a model with multiple parallelizable tasks in DDM Level 1.
Table 1. Domain Decomposition Method (DDM) – Support for Task-based Distribution in DDM
Supported Solution: Linear Static Analysis
  Task-based Parallelizable Entities in DDM:
    Two or more static Boundary Conditions are parallelized (Matrix Factorization is the step that is parallelized since it is computationally intensive).
    Sensitivities are parallelized for Optimization.
  Non-Parallelizable Tasks:
    Iterative Solution is not parallelized (only Direct Solution is parallelized).

Supported Solution: Nonlinear Static Analysis (see Note 1)
  Task-based Parallelizable Entities in DDM:
    Two or more Nonlinear Static Subcases are parallelized.
  Non-Parallelizable Tasks:
    Optimization is not parallelized.

Supported Solution: Buckling Analysis
  Task-based Parallelizable Entities in DDM:
    Two or more Buckling Subcases are parallelized.
    Sensitivities are parallelized for Optimization.
  Non-Parallelizable Tasks:
    (none listed)

Supported Solution: Direct Frequency Response Analysis
  Task-based Parallelizable Entities in DDM:
    Loading Frequencies are parallelized.
  Non-Parallelizable Tasks:
    Optimization is not parallelized.

Supported Solution: Modal Frequency Response Analysis
  Task-based Parallelizable Entities in DDM:
    Different modal Spaces are parallelized.
  Non-Parallelizable Tasks:
    Optimization is not parallelized.

Supported Solution: DGLOBAL Global Search Optimization with multiple starting points
  Task-based Parallelizable Entities in DDM:
    Starting Points are parallelized.
  Non-Parallelizable Tasks:
    (none listed)

Supported Solution: Multi-Model Optimization
  Task-based Parallelizable Entities in DDM:
    Models listed in the master file are parallelized depending on the number of specified -np overall and/or the number of -np defined for each model in the master file.
  Non-Parallelizable Tasks:
    Task-based parallelization is not applicable within each MMO model. Only Geometric partitioning (DDM level 2) is conducted for each individual model in MMO.
Note:
  1. For Nonlinear Static Analysis, Level 1 DDM (Task-based Distribution) is:
    1. Supported for both Small Displacement and Large Displacement Nonlinear Static Analysis.
    2. Not active by default; it can be activated by using PARAM,DDMNGRPS or the -ddmngrps run option. For the other supported solutions above, multi-level DDM is active by default.
    3. If a single nonlinear pretensioning subcase exists, then all other nonlinear subcases should be part of the same pretensioning chain; otherwise, the run will switch to regular DDM.
    4. Nonlinear Transient Analysis is currently not supported.
    5. Example setup for Nonlinear Static for multi-level DDM:
      Every subcase in the examples below is a Nonlinear Static subcase (NLSTAT).

      Supported setup – Example 1:
      SUBCASE 1 
      PRETENSION=5
      SUBCASE 2
      PRETENSION=6
      STATSUB(PRETENS)=1
      CNTNLSUB=1
      SUBCASE 3
      STATSUB(PRETENS)=2
      CNTNLSUB=2
      SUBCASE 4
      STATSUB(PRETENS)=2
      CNTNLSUB=2

      If PARAM,DDMNGRPS,2 is specified, then Subcases 1 and 2 will be run first with regular DDM geometric partitioning only, and subsequently Subcases 3 and 4 will be run in parallel with multi-level task-based DDM.

      Supported setup - Example 2:
      SUBCASE 1 
      PRETENSION=5
      SUBCASE 2
      STATSUB(PRETENS)=1
      CNTNLSUB=1
      SUBCASE 3
      STATSUB(PRETENS)=1
      CNTNLSUB=1

      If PARAM,DDMNGRPS,2 is specified, then Subcase 1 will be run first with regular DDM geometric partitioning only, and subsequently Subcases 2 and 3 will be run in parallel with multi-level task-based DDM.

      Supported setup - Example 3:
      SUBCASE 1 
      NLPARM = 1
      SUBCASE 2
      NLPARM = 1

      If PARAM,DDMNGRPS,2 is specified, then Subcases 1 and 2 will be run in parallel with multi-level task-based DDM.

      Unsupported setup – Example 4:
      SUBCASE 1 
      PRETENSION=5
      SUBCASE 2
      PRETENSION=6
      STATSUB(PRETENS)=1
      CNTNLSUB=1
      SUBCASE 3
      STATSUB(PRETENS)=2
      CNTNLSUB=2

      This setup is not supported in multi-level DDM. Even if PARAM,DDMNGRPS,2 is specified, OptiStruct switches to regular DDM.

A task is the minimum distribution unit used in parallelization. Each buckling analysis subcase is one task. Each Left-Hand Side (LHS) of the static analysis subcases is one task; typically, the static analysis subcases that share the same SPC (Single Point Constraint) belong to one task. Not all tasks can be run in parallel at the same time. For example, a buckling subcase cannot start before its STATSUB subcase has been executed.
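The task definition above can be illustrated with a small sketch. This is not OptiStruct code; the dictionary keys (`id`, `kind`, `spc`, `statsub`) and the function name `build_tasks` are hypothetical, chosen only to show how static subcases sharing an SPC might collapse into one task while each buckling subcase becomes its own task that depends on its STATSUB subcase.

```python
from collections import defaultdict


def build_tasks(subcases):
    """Group subcases into Level 1 tasks (illustrative data model only).

    Each subcase is a dict with hypothetical keys:
      "id", "kind" ("static" or "buckling"),
      "spc" (static only), "statsub" (buckling only).
    """
    # Static subcases that share an SPC share one left-hand side: one task.
    static_by_spc = defaultdict(list)
    for sc in subcases:
        if sc["kind"] == "static":
            static_by_spc[sc["spc"]].append(sc["id"])

    tasks = []
    task_of_subcase = {}
    for spc, ids in sorted(static_by_spc.items()):
        name = f"static_spc_{spc}"
        tasks.append({"name": name, "subcases": ids, "after": None})
        for sid in ids:
            task_of_subcase[sid] = name

    # Each buckling subcase is its own task and must wait for the task
    # that contains its STATSUB subcase.
    for sc in subcases:
        if sc["kind"] == "buckling":
            tasks.append({
                "name": f"buckling_{sc['id']}",
                "subcases": [sc["id"]],
                "after": task_of_subcase[sc["statsub"]],
            })
    return tasks


example = [
    {"id": 1, "kind": "static", "spc": 10},
    {"id": 2, "kind": "static", "spc": 10},
    {"id": 3, "kind": "static", "spc": 20},
    {"id": 4, "kind": "buckling", "statsub": 1},
]
for task in build_tasks(example):
    print(task)
```

In this example the four subcases yield three tasks, and the `after` field records why the buckling task cannot start at the same time as the static task it depends on.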


Figure 2. (a) DDM Level 1: Parallelization of Tasks; (b) DDM Level 2: Geometric Partitioning of Each Task-based Model, Depending on the Number of MPI Processes and Number of MPI Groups


Figure 3. (a) DDM Level 1: Parallelization of MMO Models; (b) DDM Level 2: Geometric Partitioning of Some or All of the MMO Optimization Models, Depending on the Number of MPI Processes and Number of MPI Groups
DDM Level 1 is purely task-based parallelization and you can enforce only Level 1 DDM by setting PARAM,DDMNGRPS,MAX. This maximizes the number of groups that can be assigned for the specified model, thereby leading to a pure Task-based parallelization.
Note: DDM Level 1 (Task-based parallelization) is currently not supported for Nonlinear Transient Analysis. For Nonlinear Static Analysis, it is supported but not active by default (see Note 1 under Table 1). DDM Level 2 (geometric partitioning) is supported for Nonlinear Analysis.

DDM Level 2 – Parallelization of Geometric Partitions

The second level of parallelization occurs at the distributed task level. Each distributed task can be further parallelized via Geometric partitioning of the model. These geometric partitions are generated and assigned depending on the number of MPI groups and the number of MPI processes in each group.

The second-level DDM process uses graph partitioning algorithms to automatically partition the geometric structure into multiple domains (equal in number to the MPI processes in the corresponding group). During analysis/optimization, each MPI process performs only the calculations related to its own domain, such as element matrix assembly, linear solution, stress calculations, and sensitivity calculations.
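To make the idea of partitioning an element graph concrete, here is a deliberately naive sketch. It is not the algorithm OptiStruct uses (the documentation does not name one); it simply orders elements by breadth-first search and cuts the ordering into balanced chunks. The function names and the adjacency-dictionary input are assumptions made for illustration. Production partitioners additionally try to minimize the number of connections cut between domains.

```python
from collections import deque


def bfs_order(adjacency):
    """Return all nodes in breadth-first order, covering every component."""
    seen, order = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return order


def partition(adjacency, n_parts):
    """Assign each node to one of n_parts domains of near-equal size."""
    order = bfs_order(adjacency)
    base, extra = divmod(len(order), n_parts)
    part_of, pos = {}, 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)
        for node in order[pos:pos + size]:
            part_of[node] = part
        pos += size
    return part_of


# A chain of six elements: 0-1-2-3-4-5, split into three domains.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(partition(chain, 3))
```

Because neighbouring elements appear close together in breadth-first order, each chunk tends to form a connected region, which keeps the interface between domains small in simple cases like this chain.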

The platform-dependent Message Passing Interface (MPI) handles the communication between the MPI processes.


Figure 4. Example DDM Setup with 5 MPI Processes (-np=5). The model has 3 task attributes (for example, 3 buckling subcases or 3 DFREQ loading frequencies). PARAM,DDMNGRPS,2 is defined.
OptiStruct handles the necessary communication across domains, which is required to guarantee the accuracy of the final solution. When the solution is complete, result data is collected and output to a single copy of the .out file. In terms of results, you should see no difference between DDM and serial runs.


Figure 5. Example DDM Setup with 6 MPI Processes (-np=6). The MMO setup has 2 models. PARAM,DDMNGRPS should not be used in conjunction with MMO

The DDM functionality for level two parallelization is direct geometric partitioning of a model based on the number of MPI processes available in the MPI group that is assigned to this task.

DDM level 2 is purely geometric partitioning and you can enforce only level 2 DDM by setting PARAM,DDMNGRPS,MIN. This minimizes the number of groups that can be assigned for the specified model, thereby leading to a pure geometric partitioning parallelization.

Hybrid DDM parallelization using both Level 1 and Level 2 parallelization is possible by setting PARAM,DDMNGRPS,AUTO or PARAM,DDMNGRPS,#, where # is the number of MPI groups. In case of AUTO, which is also the default for any DDM run, OptiStruct heuristically assigns the number of MPI groups for a particular model, depending on model attributes and specified -np.

For a default DDM run (PARAM,DDMNGRPS,AUTO), or if PARAM,DDMNGRPS,# is specified to set the number of MPI groups, hybrid DDM parallelization is typically active: both level 1 and level 2 parallelization are performed, depending on the specified -np and the number of MPI groups. PARAM,DDMNGRPS,<ngrps> is an optional parameter, and AUTO is always the default for any DDM run. The model is first divided into parallelizable tasks/starting points (refer to Table 1 for the solution sequences supported at level 1). Each task/starting point is then solved sequentially by an MPI group, which consists of one or more MPI processes. If an MPI group consists of multiple MPI processes, the task/starting point is solved by geometric partitioning in that MPI group, and the number of geometric partitions is equal to the number of MPI processes in the group. Refer to Supported Solution Sequences for DDM Level 2 Parallelization (Geometric Partitioning) for solutions that support level 2 DDM.
Note: If Global Search Option is used, then PARAM,DDMNGRPS (or a default DDM run) first activates parallelization of the starting points; then, depending on the number of MPI processes in each MPI group, geometric partitioning is also performed.
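The grouping described above can be sketched as follows. This is only an illustration under stated assumptions, not OptiStruct's implementation: the function names are hypothetical, processes are assumed to be split as evenly as possible across groups, tasks are assumed to be handed to groups in round-robin order, MAX is assumed to mean one MPI process per group, and the AUTO heuristic is not modeled because the documentation does not describe it.

```python
def resolve_ngrps(setting, np_total):
    """Translate a PARAM,DDMNGRPS value into a number of MPI groups."""
    if setting == "MIN":
        return 1                      # one group: pure geometric partitioning
    if setting == "MAX":
        return np_total               # assumption: one process per group
    if setting == "AUTO":
        raise NotImplementedError("AUTO heuristic is not documented here")
    ngrps = int(setting)
    if not 1 <= ngrps <= np_total:
        raise ValueError("ngrps must be between 1 and np")
    return ngrps


def split_processes(np_total, ngrps):
    """Number of MPI processes (geometric partitions) in each group."""
    base, extra = divmod(np_total, ngrps)
    return [base + (1 if g < extra else 0) for g in range(ngrps)]


def assign_tasks(n_tasks, ngrps):
    """Round-robin task indices over groups; each group runs its list in order."""
    groups = [[] for _ in range(ngrps)]
    for task in range(n_tasks):
        groups[task % ngrps].append(task)
    return groups


# Three buckling subcases, -np 4, PARAM,DDMNGRPS,2
ngrps = resolve_ngrps(2, 4)
print(split_processes(4, ngrps))   # processes per group
print(assign_tasks(3, ngrps))      # task indices per group
```

With these assumptions, each of the two groups holds two MPI processes, so every task assigned to a group is solved on two geometric partitions, and a group with more than one task solves them one after another.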
  1. In the case of Global Search Option, the MPI process groups should not be confused with the number of groups (NGROUP) on the DGLOBAL entry. They are completely different entities and do not relate to one another.
  2. In the case of Global Search Option, each MPI process prints the detailed report of the starting points processed and the summary is output in the .out file of the master process. The naming scheme of the generated folders is similar to the serial GSO approach (each folder ends with _SP#_UD#, identifying the Starting Point and Unique Design numbers).
  3. PARAM,DDMNGRPS should not be used in conjunction with Multi-Model Optimization (MMO) runs. For MMO, DDM level 1 (task-based parallelization) is automatically activated when the specified -np is greater than the number of models plus 1 (or if the number of MPI processes for each model is explicitly defined in the master file). DDM Level 1 for MMO simply distributes the extra MPI processes evenly among the models. DDM Level 2 parallelization is geometric partitioning of each model, based on the number of MPI processes assigned to it. For more information, refer to Multi-Model Optimization.
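The MMO distribution rule in the last item can be sketched numerically. This is a hedged illustration, not OptiStruct's code: the function name is hypothetical, the baseline of one process per model plus one additional process is inferred from the "number of models plus 1" wording, and giving any remainder to the first models is an assumption about how an uneven split might be resolved.

```python
def distribute_mmo_processes(np_total, n_models):
    """Return the number of MPI processes assigned to each MMO model."""
    baseline = n_models + 1           # one per model plus one additional process
    if np_total < baseline:
        raise ValueError("-np must be at least the number of models plus 1")
    extra = np_total - baseline       # processes available for redistribution
    base, remainder = divmod(extra, n_models)
    # Assumption: leftover processes go to the first models in the list.
    return [1 + base + (1 if i < remainder else 0) for i in range(n_models)]


print(distribute_mmo_processes(6, 2))
print(distribute_mmo_processes(3, 2))
```

Under these assumptions, a model that receives more than one process is then geometrically partitioned into that many domains, which is the Level 2 step described above.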


Figure 6. Example of 3 Buckling Subcase Model (or 3 Starting Points in DGLOBAL). First Level of DDM Parallelization: -np 2 -ddm with PARAM,DDMNGRPS,MAX (assign geometric partitions/starting points to DDM MPI processes, sequentially)


Figure 7. Example of 3 Buckling Subcase Model (or 3 Starting Points in DGLOBAL). Hybrid DDM with 2 Levels of Parallelization: -np 4 -ddm with PARAM,DDMNGRPS,2 (assigns 3 Buckling subcases/starting points to DDM MPI groups, sequentially. Subsequently, geometric partitioning is done within each MPI group)
Table 2. Overview of Support for PARAM,DDMNGRPS
Solution: Parallel Global Search Option (DDM)
  Activation: Use -np # -ddm. Default is Hybrid DDM with PARAM,DDMNGRPS,AUTO. PARAM,DDMNGRPS,<ngrps> is optional.
  Level 1: Starting points are parallelized (if multiple starting points are assigned to one MPI process, each starting point is solved sequentially in that MPI process).
  Level 2: A starting point is geometrically partitioned and solved in parallel by an MPI group (if the MPI group contains multiple MPI processes).
  Grouping: Grouping via PARAM,DDMNGRPS,AUTO is the default. If you need to adjust the number of MPI groups, use PARAM,DDMNGRPS,<ngrps>. If np > ngrps, this activates second-level parallelization, wherein some or all starting point runs are geometrically partitioned because multiple MPI processes can operate on a single starting point.

Solution: General Analysis/Optimization runs (DDM)
  Activation: Use -np # -ddm. Default is Hybrid DDM with PARAM,DDMNGRPS,AUTO. PARAM,DDMNGRPS,<ngrps> is optional.
  Level 1: Loadcases and loading frequencies are parallelized (task-based parallelization; refer to Table 1 for supported solutions).
  Level 2: Each loadcase/loading frequency is geometrically partitioned and solved in parallel by an MPI group (if the MPI group contains multiple MPI processes).
  Grouping: Grouping via PARAM,DDMNGRPS,AUTO is the default. If you need to adjust the number of MPI groups, use PARAM,DDMNGRPS,<ngrps>. As in the scenario above, if np > ngrps, this activates second-level parallelization, wherein some or all tasks are geometrically partitioned.

Solution: Multi-Model Optimization (MMO) with DDM
  Activation: Use -np # -mmo. PARAM,DDMNGRPS is not supported with MMO.
  Level 1: Optimization models are parallelized, wherein extra MPI processes (beyond the number of models plus 1) are distributed evenly to the models in the master MMO file (or according to the user-defined -np for each model in the master file).
  Level 2: Any MMO optimization model is geometrically partitioned and solved in parallel (the number of partitions depends on the number of MPI processes assigned to it).
  Grouping: There is no grouping via PARAM,DDMNGRPS, as it is not supported for MMO. However, grouping is indirectly accomplished by automatic distribution of np to the MMO models or by explicit user input via the master file.

Supported Solution Sequences for DDM Level 2 Parallelization (Geometric Partitioning)

DDM Level: Domain Decomposition (Level 2)
Supported Solutions:
  Linear Static Analysis and Optimization
  Nonlinear Static Analysis
  Linear Buckling Analysis and Optimization (MUMPS is available for SMP)
  Fatigue Analysis (based on Linear Static, Modal Transient, Random Response)
  Normal Modes (with Lanczos)
  Modal Frequency Response (with Lanczos)
  Preloaded Modal Frequency Response (with AMLS/AMSES)
  Nonlinear Transient Analysis
  Direct Linear Transient Analysis
  Modal Linear Transient Analysis
  Structural Fluid Structure Interaction (Structural FSI)
  Multi-Model Optimization (MMO)
  Direct Frequency Response Analysis (Structural, Acoustic, and MFLUID) (MUMPS is available for SMP)
  Periodic Boundary Conditions (PERBC)

Iterative solvers are currently not supported in conjunction with DDM. Heat transfer subcases can be combined with structural subcases in DDM runs; however, the heat transfer subcases will not use DDM, while the structural subcases can take advantage of DDM functionality. If only heat transfer subcases exist in a DDM run, the run will error out.
Note:
  1. The -ddm run option can be used to activate DDM. Refer to How many MPI processes (-np) and threads (-nt) per MPI process should I use for DDM runs? for information on launching Domain Decomposition in OptiStruct.
  2. In DDM mode, there is no distinction between MPI process types (for example, manager, master, slave, and so on). All MPI processes (domains) are considered worker MPI processes. Additionally, hybrid DDM with multiple levels is the default for any DDM run. Therefore, if -np n is specified, OptiStruct first divides the available MPI processes into MPI groups heuristically (PARAM,DDMNGRPS,AUTO). Within each MPI group, geometric partitioning is then conducted, wherein the number of partitions in each group is equal to the number of MPI processes available in that group. These MPI processes are then run on the corresponding sockets/machines depending on availability.
  3. Hybrid computation is supported. -nt can be used to specify the number of threads (m) per MPI process in an SMP run. Hybrid performance is sometimes better than pure MPI or pure SMP mode, especially for blocky structures. It is also recommended that the total number of threads for all MPI processes (n x m) not exceed the number of physical cores of the machine.
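The n x m recommendation in the last note is simple arithmetic, and a small helper can make it easier to check before launching a run. The sketch below is not part of OptiStruct; the function names are hypothetical, and the physical core count is passed in by the user rather than detected, since logical core counts reported by the operating system may include hyperthreads.

```python
def fits_on_machine(n_processes, threads_per_process, physical_cores):
    """True if n x m does not exceed the number of physical cores."""
    return n_processes * threads_per_process <= physical_cores


def hybrid_layouts(physical_cores):
    """List (np, nt) pairs that use every physical core exactly once."""
    return [
        (n, physical_cores // n)
        for n in range(1, physical_cores + 1)
        if physical_cores % n == 0
    ]


print(fits_on_machine(4, 2, 8))    # -np 4 -nt 2 on an 8-core machine
print(fits_on_machine(4, 4, 8))    # -np 4 -nt 4 would oversubscribe it
print(hybrid_layouts(8))
```

Which of the listed layouts performs best depends on the model; the note above only suggests that hybrid settings are worth trying, especially for blocky structures.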