Adam Ginsburg's blog

Running big MPI runs again, I'm encountering errors like these:

2021-12-02 07:02:26      WARN    MPICommandServer::command_request_handler_service::SynthesisImagerVi2::CubeMajorCycle::MPIServer-25 (file src/code/synthesis/ImagerObjects/CubeMajorCycleAlgorithm.cc, line 336)        Exception for chan range [1434, 1445] ---   FilebufIO::readBlock - incorrect number of bytes read for file /blue/adamginsburg/adamginsburg/almaimf/workdir/G327.29_B6_spw7_12M_spw7.sumwt/table.f0
##################################
#############################
Exception: FilebufIO::readBlock - incorrect number of bytes read for file /blue/adamginsburg/adamginsburg/almaimf/workdir/G327.29_B6_spw7_12M_spw7.sumwt/table.f0


2021-12-02 06:03:53        WARN    MPICommandServer::command_request_handler_service::SynthesisImagerVi2::CubeMajorCycle::MPIServer-13 (file src/code/synthesis/ImagerObjects/CubeMajorCycleAlgorithm.cc, line 336)        Exception for chan range [550, 560] ---   FilebufIO::readBlock - incorrect number of bytes read for file /blue/adamginsburg/adamginsburg/almaimf/workdir/G327.29_B6_spw4_12M_spw4.sumwt/table.f0
##################################
#############################
Exception: FilebufIO::readBlock - incorrect number of bytes read for file /blue/adamginsburg/adamginsburg/almaimf/workdir/G327.29_B6_spw4_12M_spw4.sumwt/table.f0

2021-12-02 04:57:21 WARN    MPICommandServer::command_request_handler_service::SynthesisImagerVi2::CubeMajorCycle::MPIServer-9 (file src/code/synthesis/ImagerObjects/CubeMajorCycleAlgorithm.cc, line 336) Exception for chan range [1563, 1574] ---   FilebufIO::readBlock - incorrect number of bytes read for file /blue/adamginsburg/adamginsburg/almaimf/workdir/G327.29_B6_spw0_12M_spw0.sumwt/table.f0
##################################
#############################
Exception: FilebufIO::readBlock - incorrect number of bytes read for file /blue/adamginsburg/adamginsburg/almaimf/workdir/G327.29_B6_spw0_12M_spw0.sumwt/table.f0

This is only quasi-repeatable. For spw4, it happened after the 8th Major Cycle the first time, then after the 6th Major Cycle the second time. By repeating the cleans over and over, I was eventually able to get the results to converge. It looks like these crashes are sporadic but don't affect the data.
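
The "repeat until it works" approach can be sketched as a small retry wrapper. This is only an illustration under assumptions: `run_clean` is a hypothetical zero-argument function wrapping your own clean call (it is not part of CASA), and the sketch assumes the failure reaches Python as an exception whose message contains the `FilebufIO::readBlock` text. If the error only shows up in the log, you would need to check the log instead.

```python
import time

# Text of the sporadic error, copied from the log messages above.
ERROR_MARKER = "FilebufIO::readBlock - incorrect number of bytes read"


def run_with_retries(run_clean, max_attempts=5, wait_seconds=0.0):
    """Call run_clean() until it finishes without the sporadic read error.

    run_clean is a hypothetical user-supplied callable (an assumption,
    not a CASA API). Returns (attempt_number, result_of_run_clean).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt, run_clean()
        except Exception as exc:
            # Retry only the known sporadic I/O error; re-raise anything
            # else, and give up once the attempts are used up.
            if ERROR_MARKER not in str(exc) or attempt == max_attempts:
                raise
            print(f"attempt {attempt} hit the readBlock error; retrying")
            time.sleep(wait_seconds)
```

Usage would look like `attempt, result = run_with_retries(my_clean_function)`, where `my_clean_function` is whatever restarts or continues your imaging. Capping the number of attempts keeps a genuinely broken run from looping forever.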
