CLEAN sometimes diverges. The point at which divergence begins is very difficult to identify, and unfortunately the images appear to keep getting better as you slowly approach the toxic threshold:
The exact tclean command used, with only the threshold changing between runs, is:
2016-03-10 07:49:28 INFO tclean:::: tclean(vis="w51_contvis_selfcal_4.ms",selectdata=True,field="",spw="",timerange="",
2016-03-10 07:49:28 INFO tclean::::+ uvrange="",antenna="",scan="",observation="",intent="",
2016-03-10 07:49:28 INFO tclean::::+ datacolumn="corrected",imagename="selfcal_allspw_selfcal_4ampphase_mfs_tclean_deeper_4mJy",imsize=[3072, 3072],cell="0.05arcsec",phasecenter="J2000 19:23:41.629000 +14.30.42.38000",
2016-03-10 07:49:28 INFO tclean::::+ stokes="I",projection="SIN",startmodel="",specmode="mfs",reffreq="",
2016-03-10 07:49:28 INFO tclean::::+ nchan=-1,start="",width="",outframe="LSRK",veltype="radio",
2016-03-10 07:49:28 INFO tclean::::+ restfreq=[],interpolation="linear",gridder="mosaic",facets=1,wprojplanes=1,
2016-03-10 07:49:28 INFO tclean::::+ aterm=True,psterm=False,wbawp=True,conjbeams=True,cfcache="",
2016-03-10 07:49:28 INFO tclean::::+ computepastep=360.0,rotatepastep=360.0,pblimit=0.4,normtype="flatnoise",deconvolver="clark",
2016-03-10 07:49:28 INFO tclean::::+ scales=[],nterms=2,restoringbeam=[],outlierfile="",weighting="briggs",
2016-03-10 07:49:28 INFO tclean::::+ robust=-2.0,npixels=0,uvtaper=[],niter=100000,gain=0.1,
2016-03-10 07:49:28 INFO tclean::::+ threshold="4mJy",cycleniter=-1,cyclefactor=1.0,minpsffraction=0.05,maxpsffraction=0.8,
2016-03-10 07:49:28 INFO tclean::::+ interactive=False,mask="",overwrite=True,savemodel="modelcolumn",calcres=True,
2016-03-10 07:49:28 INFO tclean::::+ calcpsf=True,parallel=False)
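For anyone trying to reproduce this, here is a rough sketch of how the threshold sweep could be scripted. It only builds the keyword dictionaries; actually running them needs a CASA session where `tclean` is available. The helper name `threshold_sweep`, the `BASE_TCLEAN_KWARGS` and `IMAGE_PREFIX` constants, and the assumption that the image name simply ends in the threshold label are mine, inferred from the logged `imagename`. Only a subset of the logged parameters is listed, so the rest would need to be added for an exact match.

```python
# Parameters copied from the logged tclean call above (subset only;
# anything omitted here would fall back to tclean's defaults).
BASE_TCLEAN_KWARGS = dict(
    vis="w51_contvis_selfcal_4.ms",
    datacolumn="corrected",
    imsize=[3072, 3072],
    cell="0.05arcsec",
    phasecenter="J2000 19:23:41.629000 +14.30.42.38000",
    specmode="mfs",
    gridder="mosaic",
    pblimit=0.4,
    deconvolver="clark",
    nterms=2,
    weighting="briggs",
    robust=-2.0,
    niter=100000,
    gain=0.1,
    interactive=False,
    savemodel="modelcolumn",
)

# Assumed naming scheme: logged image name minus its trailing "4mJy".
IMAGE_PREFIX = "selfcal_allspw_selfcal_4ampphase_mfs_tclean_deeper_"


def threshold_sweep(thresholds_mjy):
    """Return one tclean keyword dict per threshold (in mJy)."""
    runs = []
    for thr in thresholds_mjy:
        label = f"{thr:g}mJy"
        runs.append(dict(BASE_TCLEAN_KWARGS,
                         imagename=IMAGE_PREFIX + label,
                         threshold=label))
    return runs


# Only the 4 mJy run is shown in the log; other values are up to the reader.
runs = threshold_sweep([4])
# Inside CASA one would then do:
#     for kwargs in runs:
#         tclean(**kwargs)
```

Keeping everything but `threshold` and `imagename` fixed makes it easier to see which threshold first tips the clean into divergence.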
It looks like CLEAN needs a different kind of stopping criterion, perhaps one requiring each subsequent component to be smaller than the previous one. If each component is forced to be smaller than the last, the clean can never diverge. The cost is that clean would stop prematurely whenever subtracting a component increased the amplitude of some other part of the map (for example, if the next-brightest source is sitting in a negative bowl). Still, this might be a rarer occurrence than the divergence that I see all the time, which may be related to gridding (given the sharp grid pattern seen in panel 4 of the linked image).
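To make the proposed stopping rule concrete, here is a minimal sketch of it on a toy Högbom-style CLEAN written in NumPy. This is not how `tclean` or the Clark deconvolver is implemented; the function name `clean_monotonic`, its arguments, and the Gaussian test PSF are all illustrative assumptions. The only point is the extra check that stops the loop as soon as a residual peak is not smaller than the previous one.

```python
import numpy as np


def clean_monotonic(dirty, psf, gain=0.1, niter=1000, threshold=0.0):
    """Toy Hogbom CLEAN with an extra 'peaks must shrink' stopping rule.

    Assumes the PSF peak is 1 and sits at the central pixel of `psf`.
    Returns (model, residual, reason), where reason is one of
    "threshold", "monotonic" or "niter".
    """
    residual = np.array(dirty, dtype=float)
    model = np.zeros_like(residual)
    ny, nx = residual.shape
    py, px = psf.shape
    cy, cx = py // 2, px // 2
    prev_peak = np.inf
    reason = "niter"

    for _ in range(niter):
        # Locate the brightest residual pixel in absolute value.
        iy, ix = np.unravel_index(np.argmax(np.abs(residual)), residual.shape)
        peak = residual[iy, ix]

        if abs(peak) < threshold:
            reason = "threshold"
            break
        # Proposed rule: refuse any component that is not smaller
        # than the previous one, so the loop cannot run away.
        if abs(peak) >= prev_peak:
            reason = "monotonic"
            break
        prev_peak = abs(peak)

        model[iy, ix] += gain * peak

        # Subtract the PSF centred on (iy, ix), clipped to the image edges.
        y0, y1 = max(0, iy - cy), min(ny, iy - cy + py)
        x0, x1 = max(0, ix - cx), min(nx, ix - cx + px)
        residual[y0:y1, x0:x1] -= gain * peak * psf[
            y0 - (iy - cy):y1 - (iy - cy),
            x0 - (ix - cx):x1 - (ix - cx),
        ]

    return model, residual, reason


# Toy usage: one point source of flux 2 seen through a Gaussian PSF.
yy, xx = np.mgrid[-32:33, -32:33]
toy_psf = np.exp(-(xx**2 + yy**2) / (2 * 2.0**2))
toy_dirty = 2.0 * toy_psf[22:54, 20:52]  # source lands at pixel (10, 12)
toy_model, toy_residual, toy_reason = clean_monotonic(
    toy_dirty, toy_psf, gain=0.1, threshold=0.01
)
```

In this well-behaved toy case the peaks shrink on every iteration, so the loop ends on the ordinary flux threshold. The "monotonic" exit only fires when a subtraction makes some pixel brighter than the component just removed, which is both the divergence case and the negative-bowl false alarm described above. A real implementation would probably want some tolerance on the comparison to cut down on the false alarms.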