Recommendations for Using Multiple Cores

Modern CPUs typically have multiple cores, which can be used to reduce solution times.

Three parallel computing methods are available to solve a problem: Shared-Memory Parallelism (SMP), Single Program Multiple Data (SPMD), and Hybrid (a combination of SMP and SPMD).

The SMP method uses the specified cores (often called threads) to solve the whole model.

The SPMD method splits a model into separate domains that are solved separately. Communication between the domains is handled by Message Passing Interface (MPI) software.

In Hybrid mode, the model is split into separate domains and then multiple cores (threads) are used to solve each domain. Hybrid mode reduces the amount of communication because there are fewer domains than when using only the SPMD method. This is useful when using a very large number of cores on a computer cluster or when using two workstations connected via a slow network.

The number of cores to use for each method is specified using the HyperWorks Solver Run Manager input option:
  • -nt NumThreads for SMP
  • -np NumDomains for SPMD
  • -nt NumThreads -np NumDomains for Hybrid
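The mapping from the three methods to the -nt and -np options can be sketched as a small helper. This is only an illustration: the function name parallel_options is hypothetical, and the sketch assumes that omitting -np means a single domain (the text only states the default for -nt). The core count is taken as NumThreads times NumDomains, as in the Hybrid examples later in this section.

```python
def parallel_options(num_threads=1, num_domains=1):
    """Build the -nt/-np arguments and name the resulting mode.

    Hypothetical helper for illustration; not part of the solver scripts.
    """
    if num_threads < 1 or num_domains < 1:
        raise ValueError("thread and domain counts must be at least 1")

    args = []
    if num_threads > 1:
        args += ["-nt", str(num_threads)]   # SMP threads (per domain in Hybrid)
    if num_domains > 1:
        # Assumption: leaving out -np is equivalent to a single domain.
        args += ["-np", str(num_domains)]   # SPMD domains

    if num_threads > 1 and num_domains > 1:
        mode = "Hybrid"
    elif num_domains > 1:
        mode = "SPMD"
    elif num_threads > 1:
        mode = "SMP"
    else:
        mode = "serial"
    return mode, args


for nt, nd in [(8, 1), (1, 8), (2, 8)]:
    mode, args = parallel_options(nt, nd)
    print(mode, " ".join(args), "uses", nt * nd, "cores")
```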

Recommendations depend on the computer setup available.

Single Workstation

  • Use only SPMD by specifying the run option, -np NumDomains
  • NumDomains should be the number of cores available on the workstation's CPU
  • If hyperthreading is enabled for the CPU, the computer will appear to have twice as many cores as listed in the CPU specification. These extra cores are virtual and thus provide only a small amount of speedup. A 5% speedup can be obtained by using these extra cores, but extra licenses will be used because the number of licenses depends on the number of cores requested. If hyperthreading is used, hybrid mode with 2 SMP threads, -nt 2, gives the best speedup. For example, for an 8-core CPU: radioss -nt 2 -np 8 model_0000.rad
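As a rough sketch, the single-workstation recommendations above can be turned into a command line as follows. The function name workstation_command and its parameters are hypothetical. The option layout follows the example in the text, and the command is only assembled as a list of strings, not executed.

```python
def workstation_command(physical_cores, use_hyperthreading=False,
                        input_deck="model_0000.rad"):
    """Assemble a radioss command line for one workstation.

    physical_cores is the core count from the CPU specification,
    not the doubled count reported when hyperthreading is enabled.
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")

    cmd = ["radioss"]
    if use_hyperthreading:
        # Hybrid mode with 2 SMP threads per domain, as recommended above.
        cmd += ["-nt", "2"]
    # One SPMD domain per physical core.
    cmd += ["-np", str(physical_cores), input_deck]
    return cmd


print(" ".join(workstation_command(8)))
print(" ".join(workstation_command(8, use_hyperthreading=True)))
```

The second call reproduces the 8-core hyperthreading example given in the list above.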

Computer Cluster

  • When fewer than 256 cores are used, the fastest solution times are obtained by using only SPMD via -np NumDomains. The default value of -nt is 1, so it does not need to be included.
  • When more than 256 cores are used, hybrid mode should be used: -nt NumThreads -np NumDomains, with NumThreads=2 and NumDomains=(# cores to be used)/2.

    For example, if each cluster node contains 2 CPUs with 8 cores each, then to use 512 cores, set NumThreads=2 and NumDomains=512/2=256.

  • All the cores available on a compute node should be used and dedicated to the Radioss solution. For example, if a compute node has 16 cores, then the number of cores used should be a multiple of 16.
  • Hyperthreading cores should not be used in a solution. If possible, disable hyperthreading in the system BIOS.
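The cluster rules above can be sketched as a small function that returns NumThreads and NumDomains for a requested core count. The name cluster_layout is hypothetical. The text does not say what to do at exactly 256 cores, so this sketch arbitrarily treats that case as SPMD only; that choice is an assumption, not a documented rule.

```python
def cluster_layout(total_cores, cores_per_node, hybrid_threshold=256):
    """Return (num_threads, num_domains) for a cluster run."""
    if total_cores < 1 or cores_per_node < 1:
        raise ValueError("core counts must be positive")
    if total_cores % cores_per_node != 0:
        # Whole nodes should be dedicated to the solution.
        raise ValueError("total_cores must be a multiple of cores_per_node")

    if total_cores <= hybrid_threshold:
        # SPMD only; -nt keeps its default of 1.
        # Assumption: exactly 256 cores is handled here as well.
        return 1, total_cores

    if total_cores % 2 != 0:
        raise ValueError("hybrid mode with 2 threads needs an even core count")
    # Hybrid mode with 2 SMP threads per SPMD domain.
    return 2, total_cores // 2


nt, np_domains = cluster_layout(512, cores_per_node=16)
print(f"-nt {nt} -np {np_domains}")
```

For 512 cores on 16-core nodes this gives NumThreads=2 and NumDomains=256, matching the example above.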

It is recommended to use a job scheduler, such as Altair PBS, to manage the cluster.

Two or Three Workstations

It is possible to use two or three workstations connected with gigabit ethernet to solve one Radioss simulation.

  • For best results, all machines should have identical hardware, or at least the same CPU, and be connected with gigabit ethernet or a faster network.
  • Hybrid mode can be used to minimize network communication, -np NumDomains -nt NumThreads. Start by setting NumThreads=2 and NumDomains=(total # cores available on all machines)/NumThreads. Run a benchmark model and compare the time to running the model on just one workstation.
  • Next, increase NumThreads and rerun the benchmark to see if there is any additional speedup. NumThreads should be ≤ the number of cores on 1 CPU. NumDomains must be a multiple of the number of computers used.
    For example, when using 2 workstations, each with two CPUs of 8 cores each, 2*2*8=32 cores are available:
    • -nt 2 -np 16 = OK
    • -nt 4 -np 8 = OK
    • -nt 8 -np 4 = OK
  • If hyperthreading is enabled for the CPUs, do not use the extra hyperthreading cores.
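The constraints in the list above (NumThreads starting at 2 and not exceeding the cores on one CPU, NumDomains a multiple of the number of machines, and all physical cores used) can be enumerated with a short script to obtain the candidate settings to benchmark. The function name valid_hybrid_layouts is hypothetical, and the sketch only lists candidates; it does not predict which one is fastest.

```python
def valid_hybrid_layouts(machines, cpus_per_machine, cores_per_cpu):
    """List (num_threads, num_domains) pairs that satisfy the rules above."""
    total_cores = machines * cpus_per_machine * cores_per_cpu
    layouts = []
    # NumThreads starts at 2 and must not exceed the cores on one CPU.
    for num_threads in range(2, cores_per_cpu + 1):
        if total_cores % num_threads != 0:
            continue  # threads must divide the total core count evenly
        num_domains = total_cores // num_threads
        # NumDomains must be a multiple of the number of computers.
        if num_domains % machines == 0:
            layouts.append((num_threads, num_domains))
    return layouts


# 2 workstations, each with two 8-core CPUs (physical cores only).
for nt, nd in valid_hybrid_layouts(2, 2, 8):
    print(f"-nt {nt} -np {nd}")
```

For the 32-core example this yields the same three combinations listed as OK above.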

Additional setup is required as detailed in the HyperWorks Advanced Installation Guide.

Model Size

When using multiple cores to solve a simulation, there must be a reasonable number of elements in the simulation.

A good balance of speedup and throughput is obtained by making sure there are at least 10,000 elements in the model for every core used in the solution. So, for a model with 320,000 elements, 320,000/10,000=32 cores. Usually, additional speedup can be obtained down to 1,000 elements per core.
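The sizing rule above is simple integer arithmetic and can be sketched as follows. The function name core_range_for_model is hypothetical. The two thresholds are the 10,000 and 1,000 elements-per-core figures from the text, and the results are guidelines rather than hard limits.

```python
def core_range_for_model(num_elements, balanced_per_core=10_000,
                         minimum_per_core=1_000):
    """Return (balanced_cores, max_useful_cores) for a model size."""
    if num_elements < 1:
        raise ValueError("num_elements must be positive")
    # At least 10,000 elements per core: good speedup and throughput.
    balanced_cores = max(1, num_elements // balanced_per_core)
    # Down to 1,000 elements per core: usually still some extra speedup.
    max_useful_cores = max(1, num_elements // minimum_per_core)
    return balanced_cores, max_useful_cores


balanced, upper = core_range_for_model(320_000)
print(f"balanced: {balanced} cores, upper guideline: {upper} cores")
```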

Use a Different Number of Cores for the Starter and Engine

The Starter can use multiple cores via SMP parallelization.

For extremely large models that will run using a large number of SPMD domains, the Starter will run faster if multiple SMP cores are used for domain decomposition and to create the restart files. When using the HyperWorks Solver Run Manager or included script, the Starter and Engine must be run separately using the -onestep option. For example, consider a very large 3 million element model that will run on 120 cores on a compute server with 12 cores per CPU.

The Starter will use 12 cores to calculate the 120 SPMD domains and create the restart files.

radioss -nt 12 -np 120 -onestep model_0000.rad

The Engine will use 120 cores using SPMD parallelization.

radioss -nt 1 -np 120 -onestep model_0001.rad
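The two-step run above can be sketched as a helper that assembles both command lines. The function name starter_engine_commands is hypothetical, and the Starter and Engine file names are passed in explicitly rather than derived from each other. The commands are only built as lists of strings; launching them (for example with subprocess.run) is left to the user's environment.

```python
def starter_engine_commands(starter_threads, num_domains,
                            starter_deck, engine_deck):
    """Return (starter_cmd, engine_cmd) for separate -onestep runs."""
    if starter_threads < 1 or num_domains < 1:
        raise ValueError("thread and domain counts must be at least 1")

    # Starter: SMP threads speed up domain decomposition and the
    # creation of the restart files for num_domains SPMD domains.
    starter_cmd = ["radioss", "-nt", str(starter_threads),
                   "-np", str(num_domains), "-onestep", starter_deck]
    # Engine: pure SPMD, one thread per domain.
    engine_cmd = ["radioss", "-nt", "1",
                  "-np", str(num_domains), "-onestep", engine_deck]
    return starter_cmd, engine_cmd


starter, engine = starter_engine_commands(
    12, 120, "model_0000.rad", "model_0001.rad")
print(" ".join(starter))
print(" ".join(engine))
```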