Hi,
I ran into a problem while running an LDPM simulation with around 900,000 particles. Given the size of the model, I ran it with MPI. However, every time the job was killed and I received this error message: "mpirun noticed that process rank 0 with PID 85335 on node n1 exited on signal 9 (Killed)". The error appeared during the ttL initializing phase (right after Generating Ldpm Cells). I suspect, and Prof. Cusatis also suggested, that this may be caused by Tetgen. Since Tetgen has not been parallelized, it runs on a single node only, so it may exceed that node's memory limit. In my case, each node has around 130 GB of memory, and I requested two nodes with 48 processes per node, using a hybrid MPI execution with 2 nodes and 48 threads.
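One way to test the memory hypothesis would be to measure the peak memory of a smaller run on a single node and see how it scales with the particle count. Below is a minimal sketch in Python, not something I have verified on the cluster. It assumes a Unix system and that all processes are started on the local node, since only local child processes are counted. The solver command in the usage comment is a hypothetical placeholder, not the real executable name.

```python
import resource
import subprocess
import sys


def run_and_report_peak_memory(command):
    """Run `command` and return (exit_code, peak_gb).

    peak_gb is the largest resident set size reached by any single
    waited-for child process, which is the relevant number if one rank
    does all the serial meshing work.
    """
    completed = subprocess.run(command)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
    peak_kb = usage.ru_maxrss
    peak_gb = peak_kb / (1024 ** 2)
    # A negative exit code -N means the direct child was ended by signal N.
    return completed.returncode, peak_gb


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage (placeholder solver name and input file):
    #   python peak_mem.py mpirun -np 4 ./ldpm_solver small_model.inp
    code, peak = run_and_report_peak_memory(sys.argv[1:])
    print(f"exit code: {code}, peak memory of largest process: {peak:.2f} GB")
```

If the peak of the largest process grows roughly in proportion to the number of particles, extrapolating to 900,000 particles and comparing against the 130 GB per node would show whether the serial step alone can exhaust a node.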
Does anyone have experience running large-scale LDPM simulations? It would be great to talk, share our experience, and work out the problem together.
Best Regards,
Weixin