[MARS-UD-100] OMP vs MPI, different results

Postby MMarco88 » Wed Jun 15, 2016 11:20 am

Hello,

I started using the MPI version of MARS to save time on my simulations, and I encountered some problems.

The manual says that, for MPI to work, the domain subdivision can't cross a master-slave connection. Since in my model (similar to a cylinder compression test) the top and bottom surfaces have a master-slave connection with something else, I also tried to use:

MpiDomainList {
    RecursiveBisection ttL-PRTC Z-direction
}

I compared the results of OMP, MPI, and MPI with Z-direction decomposition, and they gave me three different results.

I created a Word document where I explain everything in detail with some pictures.

At the end of the document I ask these two questions:

A- Is it possible to subdivide the domain into 4 parts, or into the number of nodes that you request?

B- Does only the master-slave interaction create problems with MPI, or does the rebar interaction as well?

Can you please help me define the domain subdivision in such a way that OMP and MPI (or MPI with Z-direction decomposition) give the same results?

Thank you, best regards
Attachments
QuestionMPI.docx
MMarco88

Re: OMP vs MPI, different results

Postby zhouxinwei » Tue Jun 21, 2016 6:48 pm

A. In theory, you can use any number of processes that is a power of 2, e.g. 2, 4, 8, 16, etc.; in practice, there is always an optimal number beyond which things slow down.
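For example, a minimal launch sketch (the solver binary "mars" and deck name "model.inp" are placeholders here, not MARS conventions; only the power-of-2 process count is the point):

# placeholder binary and deck names; pick a power-of-2 process count
mpirun -np 4 ./mars model.inp     # also try -np 2, 8, 16, ...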

B. MPI is incompatible with the TrngFaceNodeBondList, which is a master-slave formulation.

Other comments:

1. I tried your input file with MPI decomposition in the Z direction, and the domain is indeed divided exclusively along Z. I couldn't reproduce the issue you reported. Please try again with a different number of processes to confirm.
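As a rough sketch of that check (binary, deck, and log names are again placeholders):

# run the same Z-direction deck with several process counts and keep the
# logs separate so the results can be compared afterwards
for np in 2 4 8; do
    mpirun -np "$np" ./mars model.inp | tee "run_np${np}.log"
done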

2. Given the aspect ratio of the slab in your input file, which lies in the XY plane, the optimal bisection directions are X and Y. If we use MPI with preferential decomposition in Z, in my experience the optimal number of processes is small (e.g. 2 or 4). Beyond that optimum, the more processes we use, the slower the run becomes, because the MPI communication cost overtakes the gain from domain decomposition.
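To locate that optimum, something along these lines (placeholder names again) times the same deck with an increasing number of processes:

# /usr/bin/time reports on stderr, so the solver output can be discarded
for np in 2 4 8 16; do
    echo "--- $np processes ---"
    /usr/bin/time -p mpirun -np "$np" ./mars model.inp > /dev/null
done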

3. To get the "same" result as OpenMP, we suggest using preferential MPI decomposition with global updates. In cases like yours, where the constraints/contacts evolve during the simulation and may move from one MPI domain to another (e.g. beam-particle constraints in a large-deformation simulation such as a pullout test), it is almost impossible to get exactly the same numbers as OpenMP; however, doing global updates more frequently can improve the accuracy. More details can be found in the manual where the global update is discussed.

4. In your case, decomposition in X and Y is better for speed. The master-slave formulation can introduce small errors at the domain interfaces; it is at the user's discretion whether that error is acceptable. If it is not, you can still use this setup for development and to try things out, and eventually do an accurate run with OpenMP or with MPI using preferential decomposition.
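A quick way to judge that error is to compare an OpenMP-only run of the deck against an MPI run; the sketch below assumes the OpenMP build honors OMP_NUM_THREADS and uses placeholder file names:

# one OpenMP run and one MPI run of the same deck, then a crude comparison
OMP_NUM_THREADS=8 ./mars model.inp > omp.log
mpirun -np 4 ./mars model.inp > mpi.log
diff omp.log mpi.log | head -n 20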
zhouxinwei

