Three weeks ago I started to write a blog post about Crashplan. This is not how I expected it to turn out.
This is likely to be quite long, so I'll put the conclusions at the front, followed by the information I've used to draw them.
If you're a Crashplan user (quite possibly because I've recommended it to you in the past) you need to be aware that:
- Previous versions of Crashplan have silently corrupted data that has been backed up.
- The team at Crashplan are aware of this. More recent versions of the software do not have this problem.
- However, more recent versions of the software do not fix, acknowledge, or in any way indicate that some of the files in the backup are corrupt.
- Crashplan support appear to be wholly unconcerned with this, to the extent that I no longer have faith in the product or their support. I leave you to determine the course of action that's right for you.
I've been an enthusiastic user of the Crashplan backup software for something like two and a half years. I forget how I found it -- probably some blog post or mailing list -- but it seemed to me to be a great example of software that just works. It was flexible enough to handle my backup needs, and easy enough to use that I recommended it to family, friends, and work colleagues. I'm a paying customer, and have purchased Crashplan licenses to give to other people as gifts to encourage them to back up their important data safely.
So for more than two years my main computer at home has been backed up using Crashplan, initially to a locally attached USB drive, and latterly also to a colleague who I convinced to run Crashplan for his backup needs.
One of Crashplan's more useful features is that the software will auto-update, prompting you when a new version is released. So during this period I've very closely tracked whatever the most recent version of Crashplan is.
A couple of weeks ago I purchased a new PC, and the plan was once I'd gone through the somewhat tedious business of reinstalling my software, restoring all my data, and so forth I was going to decommission the old one. To that end, once the new PC was up and running one of the first things I did was install Crashplan on the new PC, make sure the old PC was 100% backed up to the USB drive, and then plug the USB drive in to the new PC.
When you do this, Crashplan can "attach" to the backup. Even though the files in the backup weren't from the new PC I just had to enter the password for the backup so it could decrypt them and restore them to the new PC. I thought this would be the simplest (and probably fastest) way of migrating my data from the old to the new PC.
I let Crashplan chug along doing the restore, which took several hours because of the volume of data. And then, at the end of the process, I saw a warning that 140 files had failed the "integrity check" during the restore, and couldn't be properly restored. All of them were digital photos.
Now this is a bit odd. One of the things that the Crashplan team champion on the website is the following claim:
Once your files are backed up, CrashPlan continuously checks that your files are 100% healthy and ready to restore when you need them. If it finds any problems, CrashPlan fixes them.
Source: http://www.crashplan.com/consumer/features.html
For me, this is a big benefit. One of the things you should do when backing up data is periodically try and restore it, to ensure that the backup is actually working. The fact that Crashplan tries to do this in the background was an important part of choosing the software.
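Periodically testing a restore yourself can be sketched in a few lines. This is only a minimal illustration, assuming the original files are still available to compare against: it hashes every file in the original tree and the restored tree and reports anything missing or different. The function names and the choice of SHA-256 are mine, and say nothing about how Crashplan's own integrity check works.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_restores(original_root: Path, restored_root: Path) -> list[str]:
    """Return relative paths that are missing or differ in the restored tree.

    Hypothetical helper: both arguments are directories laid out the same way.
    """
    problems = []
    for original in sorted(original_root.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(original_root)
        restored = restored_root / relative
        if not restored.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(original) != sha256_of(restored):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

Calling `find_bad_restores(Path("originals"), Path("restored"))` (both paths are placeholders) returns an empty list when every file came back byte-for-byte identical. A check like this is independent of the backup software, which is the point: it does not rely on the tool's own view of whether the archive is healthy.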
Now I knew the backup was complete -- I'd verified it before I unplugged it from the old PC, so this is one of those things that should just never happen.
I sent an e-mail to the Crashplan support address. This generated ticket #20145 in their queue, and my message went like this:
Hi,
I'm migrating to a replacement PC. I decided to migrate my data across by plugging the external hard drive that the original PC backs up to using Crashplan+, and then restoring from that archive on the new PC, running version 3.8.2010.
29,742 files restored correctly. 140 failed, listing in the History tab as - Integrity check failed for
First, it would be very useful if I could cut/paste the contents of the History tab. It would make it much easier to figure out which files I'll need to copy over by hand.
Second, and much more importantly, I'm very concerned by this. From http://b1.crashplan.com/consumer/features.html:
Once your files are backed up, CrashPlan continuously checks that your files are 100% healthy and ready to restore when you need them. If it finds any problems, CrashPlan fixes them.
This does not appear to have happened. How do I find out what went wrong in this instance, and how do I fix it?
About 5h30m later (which is, by the way, fine, we're in very different time zones, so that sort of response time is not only perfectly acceptable it's probably above and beyond what I would normally expect) I get a reply from Renee at Crashplan, asking if I can send logs from the destination computer, and instructions on how to do that. I do so, and over the course of a few days (a short vacation intervened) I send logs from the source computer (i.e., the one that's been doing all the backups over the last few years) as well.
A day and a half after I send the necessary logs I get a reply from Bret at Crashplan. He says:
Unfortunately these logs don't point to a clear source of this error. A copy of the restored file was preserved with a modified name; it may be useful for you to review this modified file and let us know if the file that was restored appears to be correct or is non-functional. For example, the following file:
C:/Documents and Settings/Nik Clayton/My Documents/My Pictures/2006/2006 07 14 All Things Gothic/IMG_2301.JPG
was restored to the following location:
C:\Users\nik\Documents\My Documents\My Pictures\2006\2006 07 14 All Things Gothic\restore.failed-checksum.IMG_2301.JPG
Can you attempt to open this file and verify that it is a well-formed JPEG file?
I do some digging, and reply about five hours later, with:
It's not a valid file. Windows Photo Viewer refuses to open it.
The restore.failed-checksum.* files have suspicious file sizes:
09/09/2008 19:42 786,432 restore.failed-checksum.IMG_2296.JPG
09/09/2008 19:42 786,432 restore.failed-checksum.IMG_2297.JPG
09/09/2008 19:42 786,432 restore.failed-checksum.IMG_2298.JPG
09/09/2008 19:42 786,432 restore.failed-checksum.IMG_2299.JPG
09/09/2008 19:42 786,432 restore.failed-checksum.IMG_2301.JPG
09/09/2008 19:42 786,432 restore.failed-checksum.IMG_2302.JPG
09/09/2008 19:42 786,432 restore.failed-checksum.IMG_2303.JPG
09/09/2008 19:42 917,504 restore.failed-checksum.IMG_2305.JPG
09/09/2008 19:42 917,504 restore.failed-checksum.IMG_2306.JPG
09/09/2008 19:42 786,432 restore.failed-checksum.IMG_2307.JPG
09/09/2008 19:42 917,504 restore.failed-checksum.IMG_2308.JPG
09/09/2008 19:42 786,432 restore.failed-checksum.IMG_2309.JPG
09/09/2008 19:42 786,432 restore.failed-checksum.IMG_2310.JPG
09/09/2008 19:42 655,360 restore.failed-checksum.IMG_2311.JPG
They're all exact multiples of 1,024, and far too small. Compare and contrast with the same files that I restored by direct sync from the source PC to the target PC:
09/09/2008 19:42 2,999,745 IMG_2296.JPG
09/09/2008 19:42 3,029,664 IMG_2297.JPG
09/09/2008 19:42 3,102,390 IMG_2298.JPG
09/09/2008 19:42 2,923,048 IMG_2299.JPG
09/09/2008 19:42 2,939,522 IMG_2301.JPG
09/09/2008 19:42 3,077,000 IMG_2302.JPG
09/09/2008 19:42 2,707,091 IMG_2303.JPG
09/09/2008 19:42 3,478,028 IMG_2305.JPG
09/09/2008 19:42 3,509,851 IMG_2306.JPG
09/09/2008 19:42 2,627,625 IMG_2307.JPG
09/09/2008 19:42 3,169,280 IMG_2308.JPG
09/09/2008 19:42 2,859,546 IMG_2309.JPG
09/09/2008 19:42 2,924,675 IMG_2310.JPG
09/09/2008 19:42 2,518,022 IMG_2311.JPG
It goes quiet for two days, and then Matt Genelin takes over the ticket, saying:
Let me step in here. Thank you for the log files. After checking with several engineers on our staff, our best causation of the 140 files missing / corrupt is as follows:
The 140 files stored on your external hard drive are inaccessible because they were stored with an older version of the CrashPlan Application that has a known issue with incorrectly checksum-ing stored files in a backup archive. We have corrected this issue in the last 12 months, and the current version of the CrashPlan Client Application backs up files with the correct checksum information.
Moving forward here, the best recommendation we can make is:
1. Restore your complete archive from your other backup destination (I believe this is [redacted]).
(verify that your restore is successful.) Then proceed to step 2:
2. Shutdown the CrashPlan Backup engine on [redacted] like this:
http://support.crashplan.com/doku.php/recipe/stop_and_start_engine
3. Erase, delete or replace the backup archive that is stored on your external drive named "Folder: External 320G". Simply perform a file copy from [redacted] to your external drive.
Please note that since your external drive was created on 12/12/2008 and your archive on [redacted] was created on 2/7/2009, you will loose any file version information that was made between December 2008 and Feb. 2009.
4. Restart (start) the backup on [redacted], again:
http://support.crashplan.com/doku.php/recipe/stop_and_start_engine
If this seems like an unreasonable fix to this issue, please let me know.
"[redacted]" was the name of the remote destination I also back up to -- since it's a colleague's name I've removed it from the above.
I should mention at this point that none of my data has been irretrievably lost. My original PC is still here, and with some faffing around I can retrieve the missing files from it (or download them from the [redacted] offsite backup). But that is purely by luck. If all my backups had the same problem, which is not an unreasonable assumption, this data (140 digital photos) would have been lost forever.
I wasn't sure that I'd quite understood Matt correctly. In particular, with the reference to an older version of Crashplan I thought that perhaps he'd misunderstood, and assumed that the backup I was restoring from was only created with an older version of Crashplan. So we had the following exchange. First, me:
While it is the case that I first started backing up to "External 320G" using an older version of Crashplan, the Crashplan version has been regularly (auto)updated since then. The specific set of steps I carried out to do the restore was:
1. Power up PC #1 (runs XP SP3, Crashplan+, and is the machine that "External 320G" has been plugged in to for the last few years).
2. Verify (through the Crashplan UI) that Crashplan thinks that the backup of PC #1 to "External 320G" is complete. This is using the latest version of Crashplan (3.8.2010) because it auto updated earlier in the month.
3. Power down PC #1, power off the external drive, power up PC #2 (Windows 7), plug the external drive in to PC #2 and power it up. Install the latest version of Crashplan from crashplan.com, import the backup from the external drive, and attempt the restore.
That then generated the checksum errors for 140 files upon restore.
Are you saying that backups that were started with the older version of Crashplan may have this problem, and that simply using the newer version is not sufficient to correct the issue -- the corrupt backups need to be wiped, and the backup started afresh?
I've just reviewed the release notes going back to 12.10.2008, and don't see this mentioned.
Matt's reply:
Correct. The Backups being the backup archive on your External Drive. I am recommending:
1. Verifying the [redacted] Backup.
2. Wiping the External Drive.
3. Coping the [redacted] backup archive over to the External Drive.
Seem reasonable?
At this point I'm still not quite convinced that I have this right. In particular, he's not correcting my assertion that this is a problem they've known about, and fixed with no notice in the release notes, and no mechanism to fix existing-but-broken backups. After all, this is a company that sells backup software (and sells an optional service whereby they'll host external backups for you). They wouldn't be that cavalier about the integrity of their customers' data, would they?
So I replied:
Well, I don't need to do that, because I've moved the data from the old machine by other means -- restoring from the backup was (supposed) to be the simplest way to do this.
However, I want to make sure that I understand you correctly. Are you saying that the following sequence of events:
1. Install Crashplan in 2008.
2. Tell Crashplan to backup to an external drive.
3. Let Crashplan autoupdate throughout 2008, 2009, and 2010, and continue to backup to the external drive throughout this period.
is sufficient to cause this corruption? This was not an external backup that I created once using an old version of Crashplan, and then put away -- the external drive has been attached to this PC almost continuously, and Crashplan (from the earlier 2008 version to the most recent March 2010 version) has been backing up to it on pretty much a daily basis.
I must ask why Crashplan doesn't warn about this -- big red flashing letters saying "Warning: You created this backup with a version of Crashplan that had checksum errors. You must delete this backup and start afresh".
Better still, why don't newer versions of Crashplan detect this and correct it automatically? http://b4.crashplan.com/consumer/features.html is quite explicit:
Once your files are backed up, CrashPlan continuously checks that your files are 100% healthy and ready to restore when you need them. If it finds any problems, CrashPlan fixes them.
This does not appear to have happened here.
I'm very concerned that based on what I've been told so far it seems as though an older version of Crashplan corrupted my backup, you released a fixed version without noting the fix in the release notes, but the fixed version does not correct prior instances of the problem.
Right now I do not have a warm fuzzy feeling about continuing to trust Crashplan with my data.
Matt's reply:
Yes, that is correct. This is what I am stating here.
I am also explaining that an older version of CrashPlan has a known issue -- that has been corrected in our newer versions of the CrashPlan Client. This known issue appears to have passed our nightly archive maint. check:
And only appears when you attempt to restore files.
Normally our website is correct; once you back a file up, there is no need to worry about your files. In your case, it appears from your archive that some of your files were backed up with a version of the CrashPlan Client with a known issue, and the newer versions CrashPlan Client's nightly archive maint. did not detect the problem in your archive. The problem here surfaced when you went to restore your external drive's archive, that is 99.996% fine, but 0.004% corrupted.
I am suggesting a course of action that brings you back to 100% fine, and throws away the archive that is 0.004% corrupted.
I can understand. Your feelings on CrashPlan are a conclusion you will need to come to on your own.
Let's keep in mind the facts here:
* Only one of our multiple-destination archives is having issues here.
* The one archive that has issues restored 29,742 correctly and failed to restore 140 files. That's a failure rate of 0.004%.
I agree -- this is not perfect. Perfection would be 100% data recovery. This is why CrashPlan Allows you to backup to multiple destinations. You should be able to achieve perfection of recovery by using your second archive; on your [redacted] computer.
A couple of points here. Matt's skipped over my "Why doesn't Crashplan warn about this, and/or fix the problem automatically?" question. He also seems to think that you can quantify the effectiveness of a backup solution by dividing the number of files that failed to restore by the total number of files, as if that were some sort of useful metric. That takes no account of the relative importance of the files (these were photos, and irreplaceable), nor of the absolute volume of data lost.
He also assumes that I can restore the files from the [redacted] site. While that may be possible (and I haven't tried, I haven't needed to) that backup was created by taking a copy of my local backup archive and giving it to my colleague, so it's entirely possible that that archive has the same problem as my local one.
And finally, the Crashplan site is quite explicit, "100% healthy and ready to restore". There's no equivocating around some-number-of-9s availability. They claim 100%.
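As an aside, the arithmetic behind the quoted failure rate is easy to check. The sketch below uses only the two counts from my first support message (29,742 files restored, 140 failed); the function name is just for illustration. By my working it comes out at roughly 0.47%, which suggests the "0.004%" in Matt's message is a fraction written as a percentage.

```python
RESTORED_OK = 29_742  # files restored correctly, from my first support message
FAILED = 140          # files that failed the integrity check


def failure_rate_percent(failed_count: int, ok_count: int) -> float:
    """Failed files as a percentage of all files in the restore."""
    return 100.0 * failed_count / (failed_count + ok_count)


rate = failure_rate_percent(FAILED, RESTORED_OK)
print(f"{rate:.2f}% of files failed to restore")  # prints "0.47% of files failed to restore"
```

None of which changes the real objection: a percentage of file counts says nothing about which files were lost.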
My final message to Matt asked:
1. Will the next release of Crashplan detect this problem and fix it.
If not, when will it be fixed?
2. Why wasn't this problem called out in any of the release notes for versions released after the problem was detected?
3. Will you inform existing customers of this problem, and the need to wipe and restart existing backups if they're older than date?
All, I think, reasonable questions, which are ducked in Matt's final reply.
It has been a pleasure working with you. It's clear that the technical recommendation I have made for you will correct the issue at hand here, and that the quoting of text back and fourth is leading our conversation in a circle. I want to bring you to a place that moves you forward, and the best way to do this is to end our conversation now.
I believe I have answered your questions repeatedly, and your questions are deviating away from solving your technical problem. By closing this conversation, I am hoping that you will take my recommendation in good faith, and apply it to your unique situation to move your backups with CrashPlan forward.
Looking back through this discussion those three questions are not answered.
- There's no commitment that future versions of Crashplan will detect and fix this problem.
- There's no answer as to why Crashplan weren't honest about this problem in the release notes of the software once they detected and fixed it.
- And there's nothing to suggest that they'll inform existing customers of the problem.
You might also want to start thinking about trusting your data to a different organisation, and in particular one that values honesty when it notices and fixes a mistake that leads to data loss.
Has anyone got any recommendations?