Crashplan: Part Two

Since my last post about Crashplan there have been some developments.

In a nutshell (and, like before, I'll start with the conclusion for those that want the Cliff Notes version, and then provide the details after that) Code 42 (the developers who produce Crashplan):
  • Continued to ignore my concerns and questions until after I'd written the blog post, and started linking to it from the comment sections of various online reviews of the software.
  • Repeatedly claimed that I've got facts wrong, when I'm just reiterating what their support engineer told me.
  • Changed their story about what the root cause of the problem was.
  • Denigrated their competitors in their replies dealing with this topic.
  • Silently deleted comments from their support forum.
  • Continued to refuse to answer questions posed by customers (myself, and others).
To understand what happened you have to follow three distinct lines of communication that opened after I'd written the original post.

In the first, Matthew Dornquast, who identifies himself as "one of the developers/founding partners of Code 42 Software", opened a new Crashplan support ticket with me, and started a discussion there.

In the second, "Brian" started a thread on Crashplan's support forums, "Silently corrupted data", which links to my blog post.  This attracted numerous comments.

And in the third, several people left comments under my original blog post.  I noticed these somewhat after the fact, as I didn't have the "Send me e-mail when people comment" option turned on in Blogger.  That's now remedied.

What follows is the unedited exchange between myself and Matthew in the new support ticket, along with a description and links to relevant entries in the Crashplan support forum thread.

One thing I want to call out up-front.  You'll note that midway through the exchange between myself and Matthew he asks that I keep his comments private, and not repost them.  I have chosen not to respect this request for several reasons.

First, in the support thread there was concern expressed several times that I may have been paraphrasing what Code 42 staff had said.  I want there to be no doubt that this is a full, unedited copy of the conversation.

Second, I don't believe it's appropriate for someone to say something and then note, after the fact, "Oh, by the way, that was off the record".  If you want to have those sorts of conversations then clearly establish that up front, not half way through.

Third, Matthew's comments are clearly intended to represent Code 42 -- he's writing in his professional capacity, not a private one.

Fourth, Matthew's comments in private contradict what another Code 42 employee wrote, which I had already quoted.

So, with that said, Matthew's first message, sent on March 30th, is as follows (misspellings and other grammar oddities retained):
Hi Nik,

I'm one of the developers/founding partners of Code 42 Software, we make CrashPlan. I read your blog post and saw several inaccuracies based on your interaction with one of our support agents. I wanted to track down the source, so I re-read the correspondence between our support staff and yourself. As I suspected, it's a case of miscommunication, we're to blame.

Rather than follow up on your blog post directly, I'd rather write you in hopes you can revise/clean up your initial blog post. I'm not trying to do damage control, you can write what you want. What I'm trying to do is share facts with you in hopes your personal trust in CrashPlan is restored.

If you have any questions about what I've written below, please give me a call! I'd love to chat with you. It's much faster that way- [number redacted -- Nik].
If you can agree with my points below, I'd appreciate you updating your blog entry to properly reflect the situation.. what you saw is real, but the cause is misdirected and frankly, you're kicking dirt on the very feature that makes us better than everyone else! :-)

Here are my points with your blog post.

re:"Previous versions of Crashplan have silently corrupted data that has been backed up."

This is absolutely not true. CrashPlan does not have a bug that silently corrupts data. Previous version of CrashPlan failed to detect all types of corruption that occur "in the wild" – bad disks, corrupt volume info, rebooting a machine instead of cleanly shutting down, etc. Again - CrashPlan did NOT corrupt your data, it just failed to detect the TYPE of corruption that had occurred. If we had corruption in our product, all your archives would have been corrupted. (And frankly, yours wouldn't be the ONLY negative post EVER on the internet.) This misunderstanding isn't your fault - it's ours. (Read my comments on support ticket below) The support engineer on our side was not communicating well.

Not only are we the only company that's so paranoid we're verifying your data at destinations (and trying to heal!) but we're also one of the few that verify every aspect of the restore as well! That part worked - it logged each and every file that didn't work. It's another failsafe we have. Other backup products don't verify the integrity of the restores! We actually detected and communicated to you the issues. Most products just write them out. Imagine if we'd just written out all your data - you would have trusted it worked. Months, maybe never, you would have attributed those few files failure to something else. Verifying everything at the mathematical level we do is expensive and time consuming. It took a lot of engineering to do.

re:"The team at Crashplan are aware of this. More recent versions of the software do not have this problem."

This statement is false because the first one is. The software does not have a backup issue, and does not corrupt data. The team at CrashPlan is aware of the fact we've improved our healing technology to include scenarios we otherwise had not considered.

re:"However, more recent versions of the software do not fix, acknowledge, or in any way indicate that some of the files in the backup are corrupt."

That's actually not true either. Every version of our software fixes, acknowledges, and repairs files that are corrupt. Do they detect, repair and fix every conceivable corruption possible? We can't say that for certain. What we can say for certain is every day we heal hundreds of thousands of issues discovered do to bad disks, reboots, corrupted file systems, etc. No other backup product works as hard as CrashPlan to make the inherently unreliable, reliable.

re:"Crashplan support appear to wholly unconcerned with this in a manner that means I no longer have faith in the product or their support. I leave you to determine the course of action that's right for you."

I agree with this if you limit it to the one agent. It's unfair to say our entire support group isn't concerned. (Hey, I'm writing you!) The reality is, there are a lot of agents here. Some new, some experienced. You got a new agent that did not properly recognize the seriousness of your event. He should have escalated this internally. This misunderstanding could have been avoided. I apologize for that. I would like to point out that we hire really great people, and we haven't outsourced our support overseas like our two main competitors have. Typically, we receive accolades for our support. Your experience (which is absolutely valid) is the exception, not the rule.

Here are my points on your support ticket with us. (Sorry for verbosity, I'm just talking informally.)

1. Mis communication by support engineer.

There was a misunderstanding as to what the source/cause was. CrashPlan is not the cause of the corruption, however it failed to heal around it. Healing around every possible situation is difficult to imagine, every release we have improves this healing. The corruption that occurred was not due to crashplan, it's just because it was a really old version, it had not been healed. The proof in what I'm saying is look at your other archive, it was fine! Your drive, your computer, the cable, something caused data corruption. It was NOT crashplan. Again, we just failed to discover and heal around that particular form of corruption that affected a very small % of the archive. I'm not making light of it, we want to be bulletproof on our healing technology. It's been improved many times since the issue you faced occurred. I strongly suspect you have a corrupt VTOC on that drive, have you run a disk repair utility on it?

2. You would not be able to reproduce the situation unless you reproduced the failure at the exact same moment. (i.e. disconnected drive, corrupted a block of data, etc.) Since the corruption was very small, my guess is it was a perfectly timed reboot as it was writing to drive. There is a good chance if you check integrity of the filesystem on that disk, there are issues.

3. I don't feel our support engineer properly conveyed a sense of urgency around this issue. My guess is he felt it wasn't as big of a deal as you had another destination and you had the data, but that doesn't make it any less serious that if it were your only source. Our agent should have treated this with a greater sense of urgency and spent more time explaining the details of this to you. Your faith in CrashPlan was unnecessarily shaken.

CrashPlan is the only product that automatically verifies destinations and attempts to heal around issues discovered through bad hardware (i.e. disks, disconnected cables, etc.) We've learned a lot over the last 3 years, we're continuously improving the feature. Please don't confuse the failure of a defensive feature with the core backup/restore engine. Our engine is 100% solid. Unfortunately, you had some corruption from an older backup that we did not heal around. You can't reproduce this, as we improved the way we store & verify data several times since then.

In summary - I'm sorry you had to post a blog entry to get the attention this deserves. I always tell support - "support is our marketing. Each person has the power to undo years of hard work." Already, your blog entry was linked to a recommendation, where now the guy might use something else other than us.. which I believe is the wrong call. This person wont get a more reliable product than CrashPlan. Who else supports multiple destinations, multiple levels of integrity checks, and attempts to heal around any and all corruption automatically?
I sent the following reply on April 13th:
Matthew,

Sorry it's taken some time to reply to you -- a combination of training courses and vacation have left me away from the computer for an extended period of time.

On 30 March 2010 17:01, CrashPlan Support wrote:

Here are my points with your blog post.


re:"Previous versions of Crashplan have silently corrupted data that has been backed up."


This is absolutely not true. CrashPlan does not have a bug that silently corrupts data. Previous version of CrashPlan failed to detect all types of corruption that occur "in the wild" – bad disks, corrupt volume info, rebooting a machine instead of cleanly shutting down, etc. Again - CrashPlan did NOT corrupt your data, it just failed to detect the TYPE of corruption that had occurred.
This is semantics. It doesn't matter if it's the raw data that's corrupt or the checksum that is corrupt (and/or computed incorrectly).

It's also at complete odds with what your support engineer wrote.
re:"However, more recent versions of the software do not fix, acknowledge, or in any way indicate that some of the files in the backup are corrupt."


That's actually not true either. Every version of our software fixes, acknowledges, and repairs files that are corrupt. Do they detect, repair and fix every conceivable corruption possible? We can't say that for certain. What we can say for certain is every day we heal hundreds of thousands of issues discovered do to bad disks, reboots, corrupted file systems, etc.
You go out of your way to say this for certain on the Crashplan website. To quote this text again:

Once your files are backed up, CrashPlan continuously checks that your files are 100% healthy and ready to restore when you need them. If it finds any problems, CrashPlan fixes them. 
Not "99.9x% healthy".

Not "If it finds some problems".

If you can't stand by these statements in private then don't make them in public. And I don't expect to find out that files are unrestorable at the point when I do the restore -- note that the affected files date from 2008 and have not been modified since then, so there's been plenty of time for Crashplan's "continuous" file check to determine that they're not restorable and alert me.

To be clear -- software has bugs, hardware is not error free, I know this. However, if your promotional material tells me that using Crashplan means I don't have to worry about performing test restores then I'm going to be upset as a customer if it turns out that I have to.
Here are my points on your support ticket with us. (Sorry for verbosity, I'm just talking informally.)

1. Mis communication by support engineer.



There was a misunderstanding as to what the source/cause was. CrashPlan is not the cause of the corruption, however it failed to heal around it. Healing around every possible situation is difficult to imagine, every release we have improves this healing. The corruption that occurred was not due to crashplan, it's just because it was a really old version, it had not been healed.


The proof in what I'm saying is look at your other archive, it was fine!
No -- the other archive was never tested, I just copied the original files from the original PC.

Since the other (remote) archive was created by unplugging the USB drive that contained the corrupt archive and physically handing it to the person who hosts the remote archive, whereupon they imported it in to their Crashplan instance I don't see that there's any evidence for making any claims, positive or negative, about the health of the remote archive.
Your drive, your computer, the cable, something caused data corruption. It was NOT crashplan. Again, we just failed to discover and heal around that particular form of corruption that affected a very small % of the archive. I'm not making light of it, we want to be bulletproof on our healing technology. It's been improved many times since the issue you faced occurred. I strongly suspect you have a corrupt VTOC on that drive, have you run a disk repair utility on it?
Yes. No issues found.
2. You would not be able to reproduce the situation unless you reproduced the failure at the exact same moment. (i.e. disconnected drive, corrupted a block of data, etc.) Since the corruption was very small, my guess is it was a perfectly timed reboot as it was writing to drive. There is a good chance if you check integrity of the filesystem on that disk, there are issues.
Again, to be completely clear -- are you saying that the support engineer's statement that:
they were stored with an older version of the CrashPlan Application that has a known issue with incorrectly checksum-ing stored files in a backup archive.
is false, and that older versions of Crashplan did not have a known issue when checksumming stored files?
Matthew replied the same day:
no worries - i figured as much. I'm disappointed in your response - I was hoping you'd see the severity of your accusation and while a lot of your presumptions were based on a single miswritten statement from a support engineer, realize you jumped the gun on a few conclusions.

You're publicly saying "Crashplan corrupts data" and "they know about it" and "aren't communicating it."

That's ridiculous. That's false. You should have confirmed that understanding before writing them publicly as fact. Bloggers should fact check just like journalists if they're going to publish. (This is my opinion, hey, don't agree.. but as soon as you started promoting your blog by hijacking other threads, you crossed that line. IMHO.)

Why did you not try your other backup? It could have confirmed the checksum issue wasn't present in the product and was due to something else. Maybe we'll make progress on the source of the issue rather than assuming the worst?

Taking a junior support guys communication mistakes (he's a new hire, hasn't been here all that long) and blowing it up into "crashplan corrupts data" then hijacking our public positive threads about us is a bit over the top. If you had simply said, "Hey, can I talk to your supervisor? This doesn't make sense. It seems inconsistent with everything I'm reading/have heard including your website" it would have saved us a lot of pain and time.

I wouldn't hang google out to dry on what a single support person said in an email.. at least, not without agreement from their higher ups!

Finally, let's not loose perspective on how great CrashPlan is. Do you know of another backup product (free or otherwise) that backs up to multiple destinations, has multiple levels of checksums and protections, encrypts before transmission, and then ultimately tries to identify and heal around corruption at destinations asynchronously? One that proactively sends backup status reports to prevent silent failure? Do you seriously think spreading FUD about CrashPlan helps the consumer? What else will they use? Mozy? Carbonite? Give me a break.. While yours is the only negative thread I know of like this, literally thousands have failed to restore data with those products. They're not 1,000 times bigger.. or 50.. or even 10.

We're a market leader for a reason. We try really hard, and we care a ton. It's beyond frustrating that you take a junior guys mis-step in communication and blow it out like this. We're engineers that care, we're engineers that work really hard to make a great product, that which (I personally) think is far more reliable and secure than anything else out there. Where we fail, we have a culture of fixing things, of not accepting anything else less than perfect.

I'll post on our forums as much as I can about all the safeties we employ for your data.. hopefully you'll agree it's ridiculous how far we go to do a great job.. certainly farther than anyone else out there.. and ultimately.. what a disservice you're doing with your post.. again.. IMHO.

Sorry for the fragmented thread - jamming fast between meetings.

Also.. if I come across as harsh.. sorry.. I'm mostly frustrated at how this situation even developed.. it could have been avoided with better communication at the start. Had the support guy said, "we can't heal around all types of loss, you hit one, we have a theory on why and think we've improved it".. it would have went a better way.. you might have asked what do we do.. and then been satisfied with it.

And while you're free to post whatever you wont.. please don't? I'm writing you personally.. please respect my privacy.

As always, you're free to call me.. it can save a lot of time.. and confusion around typing.
And my reply -- which is, at the time of writing, the final one on this ticket:
On 13 April 2010 22:29, CrashPlan Support wrote:

no worries - i figured as much. I'm disappointed in your response - I was hoping you'd see the severity of your accusation and while a lot of your presumptions were based on a single miswritten statement from a support engineer, realize you jumped the gun on a few conclusions.
You're publicly saying "Crashplan corrupts data" and "they know about it" and "aren't communicating it."


That's ridiculous. That's false. You should have confirmed that understanding before writing them publicly as fact.
I did confirm it -- with the support engineer. See my message of Mar 24 4:32,
Are you saying that backups that were started with the older version of Crashplan may have this problem, and that simply using the newer version is not sufficient to correct the issue -- the corrupt backups need to be wiped, and the backup started afresh?
Correct. [...]
See also my message of Mar 25 4:01, where I write "Are you saying that the following sequence of events [...] is sufficient to cause this corruption?"

To which your engineer replies "Yes, that is correct".

In two different messages there I said that Crashplan was causing corruption, and your engineer confirmed what I was saying, and did not quibble with my use of "corrupt" and "corruption".

While you may feel that this is one of your junior engineers speaking out of turn (I note that he was the third engineer that replied to the ticket, so from my perspective it's been escalated to more senior engineers twice) I think it is unreasonable of you to expect me to know the ins and outs of your staffing organisation.
Why did you not try your other backup?
Because it's at the other end of a very slow Internet connection, and the original files were sitting on a PC a few feet away from the one I was trying to restore the files to.
Taking a junior support guys communication mistakes (he's a new hire, hasn't been here all that long) and blowing it up into "crashplan corrupts data" then hijacking our public positive threads about us is a bit over the top. If you had simply said, "Hey, can I talk to your supervisor? This doesn't make sense. It seems inconsistent with everything I'm reading/have heard including your website" it would have saved us a lot of pain and time.
As I say:

1. Support agreed with the use of the term "corrupt".

2. This was the third person who'd chimed in on the support ticket -- from my perspective I was already talking to someone senior.

3. The final response to the ticket left several questions that I'd asked open, with a complete refusal to answer them. As far as I was concerned Crashplan was done talking to me.
Finally, let's not loose perspective on how great CrashPlan is. Do you know of another backup product (free or otherwise) that backs up to multiple destinations, has multiple levels of checksums and protections, encrypts before transmission, and then ultimately tries to identify and heal around corruption at destinations asynchronously? One that proactively sends backup status reports to prevent silent failure? Do you seriously think spreading FUD about CrashPlan helps the consumer? What else will they use? Mozy? Carbonite? Give me a break.. While yours is the only negative thread I know of like this, literally thousands have failed to restore data with those products. They're not 1,000 times bigger.. or 50.. or even 10.
With respect, this is irrelevant to the issue at hand. I also find it crass that a significant part of your response to my concerns involves denigrating your competitors.

And you still haven't answered the questions that I've asked. Again, to be completely clear -- are you saying that the support engineer's statement that:

they were stored with an older version of the CrashPlan Application that has a known issue with incorrectly checksum-ing stored files in a backup archive.
is false, and that older versions of Crashplan did not have a known issue when checksumming stored files?
And while you're free to post whatever you wont.. please don't? I'm writing you personally.. please respect my privacy.
I reserve the right to

a) Post copies of this to the existing support thread (https://crashplan.zendesk.com/entries/140286-silently-corrupted-data) and

b) Excerpt some or all of the text for what I'm sure will be a followup post to the blog. One, I hope, that I will be writing after this is resolved satisfactorily.
Having read that, it may be instructive to read the Crashplan support forum thread sparked by the initial blog post.  I didn't contribute to the thread until page two, because until that point I was unaware of its existence.  I posted some entries to the thread clarifying that the quote acknowledging that this was caused by a Crashplan bug is a direct quote from a Crashplan employee, and two further messages that are copies of the ongoing correspondence with Crashplan (the messages quoted above).

In response to this, Code 42 silently deleted those two messages (by which I mean -- they are removed from the forum, and there is no indication that they were ever there, no placeholder that says something like "This message removed by a moderator" or similar).  Handily, their forum software (optionally) e-mails participants in a thread a copy of new posts, so at least some other contributors there saw them (and quoted them in replies).

Matthew then posted a message headed "SITUATION SUMMARY" in the thread (I can't link to it directly, as their forum software does not allow you to link to individual posts; it's about halfway down page 2).  I'm not going to reproduce it here, partly because it'll make a lengthy post even lengthier, and partly because you can see the responses and questions that other customers asked in that forum (assuming they haven't been deleted).

You'll also note that those questions haven't been answered.

So, I'm still looking for alternative backup software.  Some cursory searching has turned up Wuala, SpiderOak, and Jungle Disk, as well as those two competitors that Matthew was so quick to trash, Mozy and Carbonite.

Does anyone have any positive experiences with them to relate?

22 comments:

  1. iDrive is fast and resonably priced for an online backup solution. This is my 2nd year and I have restored data as needed immediately and FAST..

  2. The moment I read you equating the "active corruption of your data" with "not detecting that you corrupted your own data", I realized you're just that irrational screaming customer at the help-desk. You don't actually seem to want any help, but looking for someone to vent your rage on.

    So here it is, so you can hear it, get over your indignant rage, and grow up:
    You're right, and they're wrong. Your great ordeal was entirely their fault, and they are pathetic simpletons whose malicious ignorance caused you great inconvenience and emotional pain. They would admit that they are sorry if they didn't think your furious righteousness at hearing such an apology would crush them into a fine atomic mist and scatter them across the universe.

    Chill out, and stop being that guy.

  3. Your entire argument is based on what the original engineer told you. Matthew has tried to tell you over and over that what the original engineer said was wrong. You refuse to believe Matthew, and rather keep going back to the original comments. This dispute will thus never be resolved.

    Oftentimes it is very difficult to communicate difficult issues via email. I'm surprised to learn that he offered to resolve the issue with you on the phone. Rather than take advantage of his offer, you chose to maintain the pointless exchange and then post it on your blog. I agree with a previous poster, that you come across as an "irrational screaming customer at the help-desk," we've all seen those before.

    You put Matthew and CrashPlan in a very difficult situation. You are a lone voice that cannot be appeased or persuaded, that I am sure they would like to ignore. At the same time the internet gives you the power to be an influential voice. What should they do? The only resolution you would accept would be for them to admit they have duped us all into thinking they could backup our data, and close up shop. That is your preferred resolution, even if it is all based on the misstatements of one employee that have since been addressed.

  4. I agree with the previous two posts... their engineer was wrong, you know it, they admit it, everyone reading knows it. He said some things that, if true, would be incredibly damning for CrashPlan.

    However, CP has bent over backwards explaining over and over again why what he said was not true, and apologizing for the misinformation, and the continued novellas you write about it do not change that underlying fact. I agree with the previous two posts, it's time to move on.

  5. To my three previous commentators. You're rather missing the point.

    To boil it down -- Crashplan are making claims about the way their software behaves in their marketing materials. Based on my experience these claims are false.

    I'd be "appeased" if Crashplan stopped making these claims, and instead provided more detail on failure modes that Crashplan doesn't work around, so that customers can make informed decisions about other measures to take as part of their backup regime.

    I note that from the support thread (https://crashplan.zendesk.com/entries/140286-silently-corrupted-data) there are examples of corruption that will occur that won't be reported on by Crashplan until you come to restore the files. By which point it's really a bit late.

    Or, y'know, they could make sure the software does actually meet the claims they make for it. The affected files were backed up in 2008, not modified on the source computer after that, and restored in 2010 -- that's plenty of time for the software to check that they're still restorable. I'd be overjoyed if (for example) the software provided a dashboard that shows how recently each file in the archive was checked for restorability (and if the marketing material dropped the word "continuously" and used something more accurate).

    Even better would be diagnostics that reported on the frequency of test restore failures. Something along the lines of "Hey, we've done 10 test restores in the last 3 days, and 7 of them have failed -- it looks like this backup location is flaky, you might want to buy a new disk" (and note that Matthew's assertion that this must be down to hardware issues at my end is not borne out by any tests on the drive that I've done).

  6. Nik, I had some sympathy for you at the start of this saga. But now I have none. You have taken this way too far.

    I particularly don’t think it’s fair that you posted links to your slander on other forums. And your argument that publishing a private conversation is acceptable, even after you’ve been asked not to, is very weak.

    Two positives came out of this for me. Firstly, no other CrashPlan customers have came forward with similar stories (which should have been a warning to you that this was a fault caused by your hardware, not the product)

    The only other positive is that your disrespectful, slanderous post above is there for the world to see. And let me tell you, it says a lot more about you than it does about CrashPlan.

  7. I really don't like the way that the CrashPlan folks treated your problem, especially since you made so many efforts to get clear about what they were claiming.

    I have to say that I'm tempted to conclude that the original engineer wasn't wrong, he was just sharing information that CP would rather not have shared.

  8. I happened on this discussion when I was completing—for myself & for colleagues—an in-depth review of a great many backup products & strategies. I proceeded to install CrashPlan on numerous PCs, bought a CrashPlan+ license, & opened a CrashPlan Central account. I’ve since recommended that my colleagues do the same. While there are numerous features I’d like to see added or changed, I’m obviously satisfied that CP is the best offering on the market today. My point is to emphasize that while this forum topic generated some concern, it did not put me off the product.

    That said, I feel that CP—though well-intentioned—is too convinced of its own infallibility.

    A healthy organization listens to its customers, & improves & matures by doing so. That seems to be happening selectively here.

    It’s always important to “consider the source.” As this debate evolved, I checked Nik’s blog & noted that he’s a Site Reliability Engineer for Google. That doesn’t make him omniscient, but it does make him worth listening to…particularly when he opines on technical issues that are in his field of expertise.

    I’m not qualified to judge the more detailed & arcane aspects of the technical discussion between CP & Nik. I’m perfectly qualified to judge whether the forum exchange was handled in civil & fair fashion. As stated before, all posters expressed legitimate concerns & opinions, & all did so respectfully. There was no rudeness, or foul language, or flaming caps, or anything most folks would consider offensive.

    Yet—user posts were quietly removed without explanation. A company spokesman later said he did so because the removed posts “really didn't add anything to the discussion.” Further—he has now made clear that he’ll remove posts again if, in his judgment, such action is appropriate.

    Subjective editing of user forums is the best way I know for a company to erode faith in its offering.

    Personally, I’ll always maintain at least some backups in native file format because I don’t want to trust my data exclusively to ANY proprietary encoding system, whether CrashPlan’s or competitors’. However, I find CP extremely useful, am wholly satisfied with its products & largely satisfied with its service. That said, I hope the company learns from this experience to be more transparent & less defensive, & to embrace constructive criticism from customers to improve & enhance its products.

    By the way, there was nothing slanderous in Nik’s post, nor on his blog. Defamatory statements in print are considered libel, not slander…& both terms are used without justification far too often. I do not agree with everything Nik said, or with the way it was presented, but I see nothing malicious & nothing that is inarguably false, both criteria being essential components of libel.

  9. Nik,
    Stick to your guns. The product (Crashplan) has many nice points and is quite a polished looking app, but I think the reporting on the back end is not telling the whole story (and thus causing me to not trust it either). I have tried to find issues that might be logged to indicate failures on the "healing" of backup files, but have not been able to find any as of yet. My backup motto has been, and still is, "a backup that is not restored and verified is not a backup".

  10. Nik, the data corruption point is unknown at this time and will probably never be known. Neither you nor CrashPlan has enough information to correctly identify the true cause.

    You are speculating it was caused by CrashPlan and they are speculating it was caused by other hardware/software issues. Not enough information was gathered at the error detection point to be 100% sure.

    Only you would have been able to provide the additional needed data and it was unfortunately removed by you. As a reliability engineer I would expect you to fully realize this and be a little more understanding of the situation.

    As a CrashPlan user, I am happy it was able to detect this situation, even if it was later in the process than I would have liked.

  11. I don't think Nik is trying to be malicious or slanderous here, I think he's just very irritated by a product that he trusted, based on its documentation and sales blurb - and it let him down big-time, and then the support let him down again, and then he escalated that and got let down again.

    You've got to admit, that's got to be pretty annoying.

  12. Irritated? Or just disappointed…

    "(and note that Matthew's assertion that this must be down to hardware issues at my end is not borne out by any tests on the drive that I've done)."

    @Nik, which tests have you done on the drive?

    CrashPlan's manual healing should be executed while pressing the "Compact" button according to .

    @Nik,
    - have you ever executed this "Compaction"?
    - were there changes in the restorability of these corrupted files?
    - was it healed or not?

    It would be interesting to see whether this healing on Compact has improved since the 1/6/09 version.

  13. Thanks for the heads up Nik - it seems the corruption issues are ongoing with reports posted as recently as yesterday regarding the latest version:

    "My sister-in-law purchased the mp3 files after the 3.8.2010 version of CrashPlan came out, so that is the only version of CrashPlan that has backed up the files. When I restored the files to her new laptop and to my computer, both times, 7 of the mp3 files from that album are corrupt. I just checked my computer and 5 of the files are still corrupt when I restore them."

    I also note that CrashPlan continue to make questionable claims about what they call Guaranteed Restore™:

    "Each night CrashPlan verifies that all your backup files are valid and can be restored. If CrashPlan discovers an error, it recovers automatically or notifies you if it cannot."

    I'd personally like to see a *lot* more transparency from CrashPlan on the issue and find attacking customers to be appalling behaviour. If they got it wrong, fine - it happens to the best of us - but until they admit it and warn customers who may be affected I'll be holding off on migrating from Mozy and recommending others do the same.

  14. @ The anonymous commentator above Sam. To answer your questions:

    1. Windows (XP and version 7) drive checks show no issues with the filesystem on the USB drive. I also plugged it in to a FreeBSD box I have handy, made a complete image of the disk which succeeded with no errors, and was able to mount that disk image back on to the Windows 7 system.

    2. No, I did not try the compaction option. At the time I wasn't aware of it, and you should recall that Crashplan Support's solution to the problem (once they'd told me that it was due to a bug in the software) was to delete the backup entirely and start from fresh. Once I'd followed this advice the local backup was gone, as were the remote ones.

  15. @Nik

    1. Finding no problems with the XP, 7 and FreeBSD disk check tools is still no guarantee that every bit of the archive can be read by the OS. Did you run tests that verified each (used) block/sector on the disk (these are the ones that run for multiple hours), or were the tests more FAT/NTFS structure tests (usual runtime < 15 minutes when no problems are found, or < 90 minutes when problems are found; note these numbers are my experience on 200-300GB drives)?

    2. That support could have done a better job in this case is something both parties can agree on, isn't it?

    3. Are you still using CrashPlan?

    @ Sam, the "guaranteed restore" claim is maybe not so questionable; it only requires careful reading. And I must admit that I myself was also in the category of bad readers.

    Quote: "Unlike hard drives, DVDs, CDs, and tapes, with CrashPlan you know your data can be restored." Because the sentence starts with "hard drives", these "hard drives" are (like the other mentioned media) excluded from the "guaranteed restore" claim, which makes the claim applicable only to the 'online destinations'. From the perspective of marketing this is a truly ++ solution of making people think something that (on the contrary) is not being said (and done).

    A clearer rephrasing might be (but I don't know how far from reality that is): "Unlike hard drives, DVDs, CDs, and tapes, with CrashPlan Central you know your data can be restored. How? Each night CrashPlan Central servers verify that all your backup files are valid and can be restored." The claim does not apply to local destinations, nor to friends, as each friend can specify their own maintenance interval (the default was 7 days, if I remember correctly). And I still don't know if such routine maintenance does run a "full scan" (like pressing the [Compact] button) or some partial/quick hack type of scan.

  16. This whole thing and Crashplan's multiple answers actually made me feel safer about CP !

    You reported an actual issue, and got a not-so-good answer at first - then, thanks to your insistence, we all benefited from answers and information from CP, which is the part that makes me feel safer about it.

    Sadly you then went into nitpicking mode and stopped adding anything new (not to mention the thread hijacking part). Gotta know when to let it go !

    About other backup software, now that you know more, I hope when you try them you'll try to corrupt some backups manually, see what happens.

  17. Wow, what a saga. I've read everything now, here and at the CrashPlan forums. It is not really that complicated, and never was.

    1. The method of corruption and many other facts that were debated are irrelevant. There was only ever one interesting issue to discuss: Nik had an example of a failure of the archive maintenance feature. Period. The archive maintenance feature is a failsafe device. If corruption of any kind occurs for any reason, it is supposed to detect it (and presumably take advantage of any opportunity to fix it - but even that is secondary). The maintenance routine ran frequently enough that it should have had ample opportunity to detect the corruption, but did not. With checksums there are no ifs or buts. There are no corner cases to consider. Either the maintenance routine checked the file and erroneously thought it was fine, in spite of what ought to have been a checksum mis-match, or else it did not check the file, which would also be erroneous. The one possibility that would absolve CrashPlan is if the corruption occurred immediately prior to the restoration attempt, in a small enough window that the maintenance routine was (rightly) never run. With so many files affected that does not seem likely, but while various logs might have revealed more about that likelihood I did not see the question pursued by either party.

    2. Matt from CrashPlan is an average communicator. That is not an insult or a complaint. But it does mean that his communications are not going to hold up to intense scrutiny. Most people's do not. That doesn't mean he is covering something up or is ignorant of things. It means you can't know either way.

    3. Nik was not always charitable and constructive in his communications. Since both parties allowed themselves to be sidetracked by the other's mistakes (instead of trying their best to fix them or set them aside) only small parts of the saga were constructive.

    4. I remain confused as to the exact method of verification used in the archive maintenance routine. I know that checksums are involved. Is there a shortcut being used that allowed this particular circumstance to occur? Was there a bug related to checksum generation, comparison or storage? That possibility - first given as a reason to Nik by The Unfortunate Junior and then disclaimed by Matt - is actually a very plausible explanation. Perhaps that is why Nik didn't want to let go of it so easily.

    5. CrashPlan shouldn't be worrying about proprietary techniques. I could sit down and map out ten different ways to do what they are doing, and so could all of their competitors. Ideas are a dime a dozen. It is implementation that is hard. It is building a company and getting customers that is hard. It is maintaining a strong operating record and the faith of those customers that is hard. I'm sure they have a few truly remarkable ideas baked into their code, but those ideas are not what the strength of the company (or the product) depends on. Speaking of which, this is also why they should maintain an open-source archive reader like Arq does. In fact they ought to package it with each archive, right next to the passphrase-protected key!

  18. Nik, ignore those people who say you were rude, or nitpicky. You were (and are) 100% totally right to pursue this. Crashplan in its current incarnation cannot be trusted. And that is an unforgivable sin for a data backup program. Contrary to Crashplan's marketing claims, archives are NOT checked for data corruption (after the fact they tell you you have to manually click on the "compact button"????). The program has some really great features so I really want to use it, but if I can't trust the data, then the program is totally useless to me. I will be watching for the next release and hopefully some improvements will be made (Crashplan, are you listening???) to the data integrity check features that will relieve our concerns. Until then, to reiterate...Crashplan cannot be trusted. To confirm your issues with Crashplan, Nik, here is another data integrity issue I have personally had with Crashplan when I tested it (as of Sept 2010)...

    Basically I noticed that on one test client machine 2 files were not being backed up because of an "access denied" problem...I found this by accident because there was no error logged on the client or alert triggered on the crashplan server. I went manually into the history logs on the client and noticed the problem. This error condition went on for 26 days! and no alert was triggered by Crashplan. Backup reports for the client indicated 100% successful backups. I contacted Crashplan support about this.

    Crashplan responded exactly as they did to your issue...not answering the question, denying there is a problem...trying to make it sound like a "feature request" when it is a major bug, promising to fix it in the next release etc...then trying to take the discussion offline so there is no record of the discussion...

    Eventually I asked what the server parameter "alert if no backup for x days" was supposed to do. I had assumed that it meant that an alert would be triggered if a client had not had a successful (meaning ALL files backed up) backup in x days. Crashplan support responded that it meant the alert would only be triggered if no data AT ALL had been transmitted to ANY destination in x days...WTF kind of an alert is that? Totally useless...So, not only can you not trust the integrity of the data in your archive, you cannot even trust that your clients have actually backed up your selected files at all. (Remember, Crashplan indicated 100% successful backup.) To reiterate...Crashplan cannot be trusted!

    Crashplan, get your act together, and fix the data integrity problems. Without that, your product is doomed.

  19. I tried to post a comment over the weekend about Crashplan not being able to see previous versions after one of my hard drives crashed. Finally, after 3 days of synchronization... all the previous versions are back. I am quite impressed by how Crashplan works and how the software repaired the integrity of the data by itself. A keeper !

  20. I think a previous poster said it best:
    "...I feel that CP—though well-intentioned—is too convinced of its own infallibility."

    For what it's worth I had a similar experience with Matthew D. - very frustrating to get him to see the facts and admit any fallibility when it is something he doesn't want to hear or believe.

    I just love the argument "no one else has reported this issue so it must not exist" - this is a classic fallacy resorted to by companies who would rather be right than get to the truth.

    To this day, Crashplan pro will report backups 100% even when files are skipped for various reasons. Believe its reporting at your own risk. Besides this and their attitude, it is otherwise a pretty good product overall.

  21. It's always best to keep a physical backup handy just in case something happens to the online backup. I think in total I have about 3 backups of all my data.

  22. Did you check the IP addresses of the first three comments? I'd say they were written by CrashPlan staff. I just purchased another subscription to CrashPlan, but reading about the way you have been treated, regardless of what caused the problem, is making me look at alternatives. I am considering migrating to Amazon or Google. It's appalling and embarrassing for them to use that tone and language with you. Matthew's second email almost resembles a threat. What a pathetic and childish way to handle the issue. So bizarre.

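
Comment 17's point about checksums (a verification pass either checks a file and sees a mismatch, or it did not check the file at all) can be made concrete with a small sketch. This is not CrashPlan's maintenance routine, whose internals are not described anywhere in this thread. It is a hypothetical Python illustration, and the names `sha256_of`, `record_checksums` and `verify` are invented for the example. It assumes a manifest of SHA-256 digests is recorded when files are stored, and that a later pass re-reads every file in full and compares digests.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(root: Path) -> dict[str, str]:
    """Build a manifest mapping relative file paths to their digests.

    Hypothetical stand-in for whatever a backup tool stores at backup time.
    """
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the files that are missing or whose digest no longer matches."""
    bad = []
    for rel, expected in manifest.items():
        candidate = root / rel
        # A file fails if it has vanished or if its current content
        # hashes to something other than the recorded digest.
        if not candidate.is_file() or sha256_of(candidate) != expected:
            bad.append(rel)
    return bad
```

Calling `verify(root, record_checksums(root))` immediately after recording should return an empty list; changing a single byte in any file afterwards should make that file appear in the result. A pass like this can only report corruption that exists at the moment it runs, which is why the timing of the maintenance runs relative to the corruption matters in the discussion above.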