NetBSD - Live Network Backup

Posted by Zonk on Fri Apr 29, 2005 09:55 AM
from the de-backup-mouse dept.
dvl writes "It is possible but inconvenient to clone a hard disk drive remotely by hand, using dd and netcat. der Mouse, a Montreal-based NetBSD developer, has developed tools that automate remote partition-level cloning on an opportunistic basis. A high-level description of the system has been posted at KernelTrap. This facility can be used to maintain complete duplicates of remote client laptop drives on a server system. This network mirroring facility will be presented at BSDCan 2005 in Ottawa, ON, on May 13-15."
  • use rsync (Score:2, Informative)

    It's much less network- and hardware-intensive, and with the right parameters it will keep past revisions of every changed file. Your hard disks will live longer.
    • I was about to say the same thing...
      Are there really places where rsync is not enough and this bit-by-bit backup would be needed?
    • Re:use rsync (Score:5, Informative)

      by FreeLinux (555387) on Friday April 29 2005, @10:06AM (#12383656)
      This is a block-level operation, whereas rsync is file-level. With this system you can restore the disk image, including partitions. Restoring from rsync would require you to create the partition, format the partition and then restore the files. Also, if you need the MBR...

      As the article says, this is drive imaging whereas rsync is file copying.
      • In most cases, file backups are better. Imaging a drive that is currently mounted writable and actively updated can produce a corrupt image on the backup. This is worse than what can happen when a machine is powered off and restarted: because the sectors are read from the partition over a span of time, the image can be extremely inconsistent. Drive imaging is only safe when the partition being copied is unmounted.

        The way I make backups is to run duplicate servers. Then I let rsync keep the data files i

        • Re:use rsync (Score:3, Interesting)

          From the article, it sounds like they are using a custom kernel module to intercept all output to the drive. This would keep things from getting corrupted, yes?
      • Restoring from rsync would require you to create the partition, format the partition and then restore the files.

        Sure, but that's not difficult. Systemimager [systemimager.org] for Linux keeps images of disks of remote systems via rsync, and has scripts that take care of partition tables and such.

        Yes, it's written for Linux, but it wouldn't be difficult to update it to work with NetBSD or any other OS. The reason it's Linux specific is that it makes some efforts to customize the image to match the destination machin

    • Re:use rsync (Score:2, Insightful)

      What's the fastest way to get a server running again after a disk crash? With rsync, if I backup /home and /etc, I still have to install and configure the OS and other software. That could take a significant amount of time (possibly days). Not to mention the time spent answering the phone (is the server down? when will it be back up?)

      But if I have a drive image, I could just put it on a spare server and be back up and running almost immediately. That would require an identical spare server though.

      What
      • Just make sure the backup server is properly configured (or very nearly so), I guess.

        Our nightly rsync backups have saved us many times from user mistakes (oops, I deleted this 3 months ago and I need it now), but we haven't had a chance to test our backup server in the event of losing one of our main servers. We figure we could have it up and running in a couple of hours or less, since it's configured very closely to our other servers, but we won't know until we need it.

        • I recall that at the last place I was a developer, we tested our IT department like that a few times, haha... We'd "simulate" a hardware failure, usually by pulling the power, but sometimes we'd get a little more scientific with it. Or we'd simulate a database crash and ask our IT department for a backup.

          We were developers plagued with an IT department that wanted to take control of the application and add red tape to our deployment cycle. While we understood there was a place for it, we worked for a
          • It was IIS instead of Apache, and IIS is a piece of shit, so no one wanted to bother learning it because it was a piece of shit.

            Your IT dept. probably has more of a clue than you do.
            • I never made the choice for IIS, but I didn't use that as an excuse not to know how to administer it, and neither should they.

              Besides, there are many companies larger than those guys that use IIS...
    • rsync doesn't scale to huge numbers of files. It also doesn't work so well when all of those are changing at once. Finally, the protocol and algorithms may work for imaging an entire disk as if it was a file, but the program doesn't -- it can ONLY copy device nodes as device nodes, and will NEVER read a block device as a normal file. There have been patches to fix this, which have been rejected.

      We use a scheme which actually seems better for systems which are always on: DRBD for Linux [drbd.org]. Basically, ever
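      rsync won't read a block device as a regular file, but the "only send what changed" idea can still be applied at the block level by hashing fixed-size blocks and comparing them with the hashes kept from the previous run. A minimal Python sketch, assuming a 64 KiB block size and SHA-256 (both arbitrary choices); unlike rsync's rolling checksum it only compares blocks at fixed offsets, and the function names are hypothetical:

```python
import hashlib
import io
from typing import BinaryIO, List

BLOCK_SIZE = 64 * 1024  # assumption: arbitrary block size


def block_hashes(f: BinaryIO, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Read an image or device sequentially and return one SHA-256 digest per block."""
    hashes = []
    while True:
        block = f.read(block_size)
        if not block:
            break
        hashes.append(hashlib.sha256(block).digest())
    return hashes


def changed_blocks(old: List[bytes], new: List[bytes]) -> List[int]:
    """Indices of blocks that differ from the previous run, or did not exist before."""
    return [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]


# Tiny demo with 8-byte blocks: block 1 was modified and block 2 was appended.
before = block_hashes(io.BytesIO(b"a" * 8 + b"b" * 8), block_size=8)
after = block_hashes(io.BytesIO(b"a" * 8 + b"c" * 8 + b"d" * 4), block_size=8)
print(changed_blocks(before, after))  # [1, 2]
```

      Only the listed blocks would need to cross the network; the catch is that the whole device still has to be read on every run to compute the hashes.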
  • Pros and Cons (Score:5, Insightful)

    by teiresias (101481) on Friday April 29 2005, @10:01AM (#12383593)
    This would be an extremely sensitive server system. With everyone's hard drive image just waiting to be blasted onto a blank hard drive, the potential for misdeeds is staggering. Even in an official capacity, I would feel uneasy if my boss were able to take a copy of my hard drive image and see what I've been working on. Admittedly, yes, it should all be work, but here we are allowed a certain amount of freedom with our laptops and I wouldn't want that data at my boss's fingertips.

    On the flipside, this would be a boon to company network admins especially with employees at remote sites who have a hard crash.

    Another reason to build a high-speed backbone. Getting my 80GB hard drive image from Seattle while I'm in Norfolk would mean a lot of downtime.
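    To put a rough number on that downtime: transfer time is just image size divided by effective throughput. A back-of-the-envelope Python helper, assuming decimal gigabytes and ignoring protocol overhead and compression (the function name and the link speeds below are illustrative, not from the post):

```python
def transfer_hours(size_gb: float, link_mbit_s: float, efficiency: float = 1.0) -> float:
    """Hours to move size_gb (decimal GB) over a link of link_mbit_s megabits per second.

    efficiency < 1.0 can model protocol overhead (assumption; 1.0 means an ideal link).
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbit_s * 1e6 * efficiency)
    return seconds / 3600


for label, mbit in [("T1 (1.544 Mbit/s)", 1.544), ("10 Mbit/s", 10), ("100 Mbit/s", 100)]:
    print(f"80 GB over {label}: {transfer_hours(80, mbit):.1f} h")
```

    Even over an ideal 100 Mbit/s path, 80 GB is roughly 1.8 hours; over a T1 it is closer to five days.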
  • by LegendOfLink (574790) on Friday April 29 2005, @10:02AM (#12383599) Homepage
    ...when you get that idiot (and EVERY company has at least one of these guys) who calls you up asking if it's OK to defrag their hard drive after downloading a virus or installing spyware. Then, when you tell them "NO", they tell you they did it anyway.

    Now we can just hit a button and restore everything, a few thousand miles away.

    The only thing left is to write code to block stupid people from reproducing.
    • by SecurityGuy (217807) on Friday April 29 2005, @11:06AM (#12384358)
      The only thing left is to write code to block stupid people from reproducing.


      Unfortunately the user interface for the relevant hardware has a very intuitive point and shoot interface.

      • The biggest problem usually is that the virus and/or spyware will corrupt files. Inept Windows users for some reason think defragging a hard drive is the answer to every computer problem in the universe. They defrag, and next thing you know, you can't boot the machine.

        Theoretically, a defrag should have no effect on how an operating system runs; it only rearranges where files sit on the physical drive to make file access faster. But for some reason, it messes things up.
          • not even the best recovery tools can get it
            There are forms of forensic data recovery which can sometimes work out the bit that was written before the current bit on a certain disk location. I've forgotten the details, but, it involves dismantling the drive and working on the platters with very expensive equipment.
  • by Bret Tobey (844402) on Friday April 29 2005, @10:02AM (#12383603) Homepage
    Assuming you can get around bandwidth monitoring, how long before this gets incorporated into hacking tools? Add this to a little spyware and a zombie network, and things get very interesting for poorly secured networks and computers.
  • by OutOfMemory (879817) on Friday April 29 2005, @10:04AM (#12383625)
    I've been using der Mouse to copy files for years. First I use der Mouse to click on the file, then I use der Mouse to drag it to a new location!
  • by hal2814 (725639) on Friday April 29 2005, @10:04AM (#12383631)
    Maybe setup is inconvenient. Remote backups using dd and ssh (our method) were a bit of a bear to set up initially, but thanks to shell scripting, cron and key agents, they haven't given us any problems. I've seen a few guides with pretty straightforward and mostly universal instructions for this type of thing. That being said, I do hope this software will at least get people to start looking seriously at this type of backup, since it lets you store a copy off-site.
  • If one tries to clone an FS that is active, can this cloning tool handle open or changing files (often the most important and most recently used files on the system)? I remember an odd bug in a Mac OS X cloning tool that would create massive, ever-expanding copies of large files that were mid-download during a clone.
  • by RealProgrammer (723725) on Friday April 29 2005, @10:26AM (#12383846) Homepage Journal
    While this is cool, as I thought when I saw it on KernelTrap, disk mirroring is useful in situations where the hardware is less reliable than the transaction. If you have, e.g., an application-level way to back out of a write (an "undo" feature), then disk mirroring is your huckleberry.

    Most (all) of my quick restore needs result from users deleting or overwriting files - the hardware is more reliable than the transaction. I do have on-disk backups of the most important stuff, but sometimes they surprise me.

    I'd like a system library that would modify the rename(2), truncate(2), unlink(2), and write(2) calls to move the deleted stuff to some private directory (/.Trash, /.Recycler, whatever). Obviously the underlying routine would have to do its own garbage collection, deleting trash files by some FIFO or largest-and-oldest-first algorithm.

    Just a thought.
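    Intercepting rename(2), truncate(2), unlink(2) and write(2) themselves would presumably need a preloaded shim library. As a much smaller illustration of just the trash-plus-garbage-collection half, here is a user-level Python sketch; the function names, the single trash directory and the FIFO size cap are all assumptions:

```python
import shutil
import time
from pathlib import Path
from typing import List


def trash_unlink(path: str, trash_dir: str) -> Path:
    """Move `path` into `trash_dir` instead of deleting it; return the new location."""
    trash = Path(trash_dir)
    trash.mkdir(parents=True, exist_ok=True)
    src = Path(path)
    # Prefix a nanosecond timestamp so repeated deletes of the same name don't collide
    # and so garbage collection can tell which deletions are oldest.
    dest = trash / f"{time.time_ns()}-{src.name}"
    shutil.move(str(src), str(dest))
    return dest


def collect_garbage(trash_dir: str, max_bytes: int) -> List[str]:
    """FIFO cleanup: permanently delete the oldest trashed files until the total fits.

    Assumes every entry in trash_dir was created by trash_unlink (timestamp prefix).
    """
    files = [p for p in Path(trash_dir).iterdir() if p.is_file()]
    files.sort(key=lambda p: int(p.name.split("-", 1)[0]))
    total = sum(p.stat().st_size for p in files)
    removed = []
    for p in files:
        if total <= max_bytes:
            break
        total -= p.stat().st_size
        p.unlink()
        removed.append(p.name)
    return removed
```

    Swapping FIFO for "largest first" would only change the sort key in collect_garbage.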
  • Novell ZENworks has had this capability for some time in production environments. It also integrates with their management tools, so it is easy to use across an entire network. To say this technology is newly discovered is far from the truth. They also use Linux on the back end of the client to move the data to the server.

    It is nice, though, to have something like this in the open source world. Competition is good.
  • Wacky idea (Score:3, Insightful)

    by JediTrainer (314273) on Friday April 29 2005, @10:44AM (#12384080)
    Maybe I should patent this. Ah well, I figure if I mention it now it should prevent someone else from doing so...

    I was thinking - I know how Ghost supports multicasting and such. I was thinking about how to take that to the next level. Something like Ghost meets BitTorrent.

    Wouldn't it be great to be able to image a drive, use multicast to get the data to as many machines as possible, but then use BitTorrent to get pieces to any machines that weren't able to listen to the multicast (i.e., it's on another subnet or something), to pick up any pieces that were missed in the broadcast, or to get the rest of the disk image if that particular machine joined the session a little late and missed the first part?

    I think that would really rock if someone wanted to image hundreds of machines quickly and reliably.

    I'm thinking it'd be pretty cool to have that server set up, and find a way to cram the client onto a floppy or some sort of custom Knoppix. Find server, choose image, and now you're part of both the multicast AND the torrent. That should take care of error checking too, I guess.

    Anybody care to take this further and/or shoot down the idea? :)
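    The "error checking" part would presumably come from per-piece hashes, the way BitTorrent verifies each piece with SHA-1. A rough Python sketch of the client-side bookkeeping only; the class and function names are hypothetical, and the transport (multicast or torrent) is left out entirely:

```python
import hashlib
from typing import Dict, List


def piece_hashes(image: bytes, piece_size: int) -> List[bytes]:
    """SHA-1 digest of each fixed-size piece of the image (piece_size is a free parameter)."""
    return [hashlib.sha1(image[i:i + piece_size]).digest()
            for i in range(0, len(image), piece_size)]


class PieceTracker:
    """Records verified pieces and reports which ones still have to come from peers."""

    def __init__(self, expected: List[bytes]):
        self.expected = expected
        self.have: Dict[int, bytes] = {}

    def receive(self, index: int, data: bytes) -> bool:
        # Accept a piece only if its hash matches; corrupt or bogus pieces are dropped.
        if 0 <= index < len(self.expected) and hashlib.sha1(data).digest() == self.expected[index]:
            self.have[index] = data
            return True
        return False

    def missing(self) -> List[int]:
        """Pieces not yet received intact, e.g. lost from the multicast or joined late."""
        return [i for i in range(len(self.expected)) if i not in self.have]

    def assemble(self) -> bytes:
        if self.missing():
            raise ValueError("image incomplete")
        return b"".join(self.have[i] for i in range(len(self.expected)))
```

    A late joiner would simply start with everything in missing() and fill the gaps from whichever source has them.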
    • Wouldn't it be great to be able to image a drive, use multicast to get the data to as many machines as possible, but then use BitTorrent to get pieces to any machines that weren't able to listen to the multicast (ie it's on another subnet or something) and to pick up any pieces that were missed in the broadcast, or get the rest of the disk image if that particular machine joined in the session a little late and missed the first part?

      Multicast will work across subnets (you just need to set the TTL > 1).
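      For reference, raising the TTL is a single socket option. A minimal Python sketch (the function name is hypothetical); packets only actually cross subnets if the routers in between are configured to forward multicast:

```python
import socket
import struct


def make_multicast_sender(ttl: int) -> socket.socket:
    """UDP socket whose multicast datagrams may cross up to `ttl` router hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # The default multicast TTL is 1, which keeps packets on the local subnet.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", ttl))
    return sock
```

      Sending is then an ordinary sendto() to a group address, e.g. sock.sendto(data, ("239.1.2.3", 5000)), where the group and port are made up for illustration.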
        • Multicasted BitTorrent is a complete waste. The idea with multicast is that there is no real "load" on the sender -- you can run an open-loop multicaster with your image, and people can join the group to download it. Alternately, you can use a protocol like MTFTP to make it a bit more "on-demand".

          Either way, bittorrent is completely useless in an environment where multicast is available.
    • I must shoot down your idea. I have lots of experience with this sort of thing.

      then use BitTorrent to get pieces to any machines that weren't able to listen to the multicast (ie it's on another subnet or something) and to pick up any pieces that were missed in the broadcast, or get the rest of the disk image if that particular machine joined in the session a little late and missed the first part?

      BitTorrent poses NO advantage for this sort of thing. Why not just a regular network service, unicasting t

  • ghost 4 unix (Score:3, Interesting)

    by che.kai-jei (686930) on Friday April 29 2005, @11:14AM (#12384449)
    not the same?

    http://www.feyrer.de/g4u/ [feyrer.de]
  • I just took one of our mailservers offline a minute ago to do a block-level copy, so this would be fantastic. I develop images for our machines (mailserver, etc.) and then dd them onto other drives. When I update one machine, I then go around and update the others with the new image. This saves me tons of time, and we do a similar thing with desktops and Norton Ghost (although, if I'm not mistaken, that is actually a file-level copy).

    And since we're running OpenBSD on those machines, porting this sho

  • How about disk cloning across servers, for on-demand scalability? As a single server reaches some operating limit, like monthly bandwidth quota, disk capacity, CPU load, etc, a watchdog process clones its disks to a fresh new server. The accumulating data partition may be omitted. A final script downs the old server's TCP/IP interface, and ups the new one with the old IP# (/etc/hostname has already been cloned over). It's like forking the whole server. A little more hacking could clone servers to handle loa
  • WTF (Score:5, Informative)

    by multipart/mixed (163409) on Friday April 29 2005, @11:52AM (#12384963)
    Why on earth are people always so insistent on doing raw-level dupes of disks?

    First of all, it means backing up a 40 GB disk with 2 GB of data on it may actually take 40 GB of bandwidth.

    Second of all, it means the disk geometries have to be compatible.

    Then, I have to wonder if there will be any wackiness with things like journals if you're only restoring a data drive and the kernel versions are different...

    I have been using ufsdump / ufsrestore on UNIX for ...decades! It works great, and it's trivial to pump over ssh:

    # ssh user@machine ufsdump 0f - /dev/rdsk/c0t0d0s0 | (cd /newdisk && ufsrestore rf -)

    or

    # ufsdump 0f - /dev/rdsk/c0t0d0s0 | ssh user@machine 'cd /newdisk && ufsrestore rf -'

    It even supports incremental dumps (see: "dump level"), which is the main reason to use it over tar (tar can do incrementals with find . -newer X | tar --no-recursion -cf filename -T -, but it won't handle deletes).

    So -- WHY are you people so keen on bit-level dumps? Forensics? That doesn't seem to be what the folks above are commenting on.

    Is it just that open source UNIX derivatives and clones don't have dump/restore utilities?
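    The "won't handle deletes" point comes down to whether the backup remembers what existed last time. A minimal Python sketch of that idea, not how ufsdump works internally: a hypothetical manifest keyed on mtime and size (so it would miss a change that preserves both), diffed against the previous run's manifest so deletions show up explicitly:

```python
import os
from typing import Dict, List, Tuple

Manifest = Dict[str, Tuple[int, int]]  # relative path -> (mtime_ns, size)


def scan(root: str) -> Manifest:
    """Record the mtime and size of every regular file under root."""
    manifest: Manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            manifest[os.path.relpath(full, root)] = (st.st_mtime_ns, st.st_size)
    return manifest


def diff(previous: Manifest, current: Manifest) -> Tuple[List[str], List[str]]:
    """Return (added_or_changed, deleted) relative paths since the previous run."""
    changed = sorted(p for p, meta in current.items() if previous.get(p) != meta)
    deleted = sorted(p for p in previous if p not in current)
    return changed, deleted
```

    A find -newer pipeline only ever sees the first list; the second one is what lets a restore remove files that no longer exist.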
    • Re:WTF (Score:2, Interesting)

      I hear ya. We've been cloning our labs with dump/restore over the net for years. Works on everything: Solaris, *BSD, Linux. Wrapper scripts make it a one line command.

      I know some Linux distros don't come with dump/restore. Maybe that's why more people don't use it.
    • Re:WTF (Score:3, Interesting)

      Why on earth are people always so insistent on doing raw-level dupes of disks?

      I can think of a few reasons. It makes time-consuming partitioning/formatting unnecessary. It does not require as much work to restore the bootable partition (i.e., no need to bootstrap to run "lilo", "installboot" or whatnot). But mainly, because there are just no good backup tools...

      I have been using ufsdump / ufsrestore on UNIX for ...decades! It works great, and it's trivial to pump over ssh:

      Full dumps work fine, despite

    • Re:WTF (Score:3, Interesting)

      You missed the point. Here you only need to copy the image once and then all subsequent writes are done on both images at once (the on-disk and the network one). That means that everything after the initial copy (assuming you begin doing this on an existing fs) is as efficient and real-time as possible, requiring no polling for changes or any scheduling. It is essentially RAID1 over a network. Although it doesn't do much against system crashes (since neither side will have the final syncs and umount writes)
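      The "RAID1 over a network" description boils down to a write-through mirror. A toy Python sketch, with in-memory buffers standing in for the local disk and the network copy (the class name is hypothetical; a real implementation would have to intercept writes below the filesystem, and this toy has the same crash caveat the comment mentions):

```python
import io
from typing import BinaryIO


class MirroredDevice:
    """Write-through mirror: every write lands on both the local and the remote copy."""

    def __init__(self, local: BinaryIO, remote: BinaryIO):
        self.local = local
        self.remote = remote

    def write(self, offset: int, data: bytes) -> None:
        # Apply the same write at the same offset on both sides.
        for target in (self.local, self.remote):
            target.seek(offset)
            target.write(data)

    def read(self, offset: int, length: int) -> bytes:
        # Reads are served from the local copy only.
        self.local.seek(offset)
        return self.local.read(length)


local, remote = io.BytesIO(bytes(16)), io.BytesIO(bytes(16))
dev = MirroredDevice(local, remote)
dev.write(4, b"data")
```

      After the initial full copy, the two sides stay identical without any polling for changes, which is the efficiency argument being made.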
  • by RonBurk (543988) on Friday April 29 2005, @11:59AM (#12385069) Homepage Journal
    Image backups have great attraction. Restoring is done in one big whack, without having to deal with individual applications. Absolutely everything is backed up, so no worries about missing an individual file. etc. So why haven't image backups replaced all other forms of backup? The reason is the long list of drawbacks.

    • All your eggs are in one basket. If a single bit of your backup is wrong, then the restore could be screwed -- perhaps in subtle ways that you won't notice until it's too late to undo the damage.
    • Absolutely everything is backed up. If you've been root kitted, then that's backed up too. If you just destroyed a crucial file prior to the image backup, then that will be missing in the restore.
    • You really need the partition to be "dead" (unmounted) while it's being backed up. Beware solutions that claim to do "hot" image backups! It is not possible, in the general case, for a backup utility to handle the problem of data consistency. E.g., your application stores some configuration information on disk that happens to require two disk writes. The "hot" image backup software happens to back up the state of the disk after the first write, but before the second. If you then do a restore, the disk is corrupted as far as that application is concerned. How many of your applications are paranoid enough to survive arbitrary disk corruption gracefully?
    • Size versus speed. Look at the curve of how fast disks are getting bigger. Then look at the curve of how fast disk transfer speeds are getting faster. As Jim Gray [microsoft.com] says, disks are starting to behave more like serial devices. If you've got a 200GB disk to image and you want to keep your backup window down to an hour, you're out of luck.
    • Lack of versioning. Most disk image backups don't offer versioning, certainly not at the file level. Yet that is perhaps the most common need for a backup -- I just messed up this file and would like to get yesterday's version back, preferably in a few seconds by just pointing and clicking.
    • Decreased testing. If you're using a versioned form of file backup, you probably get to test it on a fairly regular basis, as people restore accidental file deletions and the like. How often will you get to test your image backup this month? Then how much confidence can you have that the restore process will work when you really need it?
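    The size-versus-speed point is easy to check with the numbers in that list. Assuming decimal units (1 GB = 10^9 bytes) and ignoring compression, imaging 200 GB inside a one-hour window needs the first rate below for the entire hour; the second figure is the same image over a 100 Mbit/s link, which moves at most 12.5 MB/s:

```latex
\[
\frac{200 \times 10^{9}\ \text{B}}{3600\ \text{s}} \approx 55.6\ \text{MB/s},
\qquad
\frac{200 \times 10^{9}\ \text{B}}{12.5 \times 10^{6}\ \text{B/s}} = 16000\ \text{s} \approx 4.4\ \text{h}.
\]
```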

    Image backups certainly have their place for people who can understand their limitations. However, a good, automatic, versioning file backup is almost certainly a higher priority for most computer users. And under some circumstances, they might also want to go with RAID for home computers [backupcritic.com].

    • Image backups certainly have their place for people who can understand their limitations. However, a good, automatic, versioning file backup is almost certainly a higher priority for most computer users.

      Great. Now, could you please enlighten us as to what a good, automatic, versioning file-based backup system might consist of?

      AFAICT, this doesn't seem to exist. It doesn't matter how much sense it makes, or how perfect the idea is. It is simply unavailable.

      In fact, the glaring lack of such a capable s
      • Ummm. Well, there's DAR [linux.free.fr] and there's kdar [sourceforge.net]. I think there's even a win32 version for the clueless.

        It doesn't get much easier than this. You can have a sane, incremental backup setup in a single line cronjob or even point and click one up.

        If that's not simple enough for you, then you have no business storing or working with sensitive data.
    • It's not that complicated. Disk image backups and file-level backups are not intended to serve the same purpose.

      Disk image backups are pure disaster recovery or deployment. Something is down and needs to be back up ASAP, where even the few minutes of recreating partitions and MBRs is unwanted. Or it's about deploying dozens or hundreds of client systems as quickly as possible with as few staff as possible.

      File level backups are insurance for users. Someone deletes/edits/breaks something important and
    • Re:Mac OS X (Score:3, Informative)

      by Anonymous Coward
      If you want something for OS X,
      I'd suggest one of
      CCC (Carbon Copy Cloner)
      ASR (Apple Software Restore)
      Rsync
      Radmind

      Have fun on VersionTracker...
    • Use rsync and hardlinked snapshots. There are lots of examples out there. I rolled my own a while back, but if you want something relatively nicely polished and based on that idea, check out dirvish [dirvish.org] (I didn't find that until after I already had my system set up).

      I really like having several months worth of nightly snapshots, all conveniently accessible just like any other filesystem, and just taking up slightly more than the space of the changed files.
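      The core of the hardlinked-snapshot trick is small: each new snapshot copies files that changed and hard-links the ones that didn't, so every snapshot looks like a full tree but only changed files take new space. rsync's --link-dest option does this for real; the Python below is only a rough illustration, assuming "same size and mtime" means unchanged (the function name is hypothetical):

```python
import os
import shutil
from typing import Optional


def snapshot(source: str, dest: str, previous: Optional[str] = None) -> None:
    """Copy source into dest, hard-linking files unchanged since the `previous` snapshot."""
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(dest, rel), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            out = os.path.join(dest, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old):
                s, o = os.stat(src), os.stat(old)
                # Same size and mtime: assume unchanged and share the old inode.
                if s.st_size == o.st_size and s.st_mtime_ns == o.st_mtime_ns:
                    os.link(old, out)
                    continue
            # copy2 preserves mtime, so the next run can compare against this copy.
            shutil.copy2(src, out)
```

      Deleting an old snapshot directory is safe, since a file's data stays around as long as any newer snapshot still links to it.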
    • rsync is not scalable to large numbers of files. We set up a BackupPC machine a while ago and tried to rsync the entire backup set over to another machine... It was a miserable failure. Even if we hadn't checked for hardlinks (which we have to; BackupPC uses tons of hardlinks), the rsync process completely saturated a gig of RAM before it even started syncing.

      Now, rsync would have been fine if we'd unmounted the filesystem and done it on the raw partition. But there's a couple of problems with that:

      It's no
    • RTFA: it responds to heavy load by making a log (journal?) of the blocks that need backing up, and then handles them when the load is lighter. If you do it on swap, then you're insane and deserve whatever you get :)
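      One way to picture that behaviour, as a guess rather than der Mouse's actual design: under load, only the numbers of dirty blocks are logged, and when load drops the latest contents of each logged block are shipped, so a block rewritten several times is sent once. A Python sketch where the class name, the numeric load value and the send callback are all hypothetical:

```python
from typing import Callable, Dict, List, Set


class DeferredMirror:
    """Logs dirty block numbers under load and ships them to the mirror when load drops."""

    def __init__(self, send: Callable[[int, bytes], None], load_threshold: float):
        self.send = send
        self.load_threshold = load_threshold
        self.blocks: Dict[int, bytes] = {}  # stand-in for the local disk
        self.dirty: List[int] = []          # ordered log of blocks not yet mirrored
        self._pending: Set[int] = set()     # fast membership check for the log

    def write(self, block_no: int, data: bytes, current_load: float) -> None:
        self.blocks[block_no] = data
        if current_load < self.load_threshold and not self.dirty:
            self.send(block_no, data)       # idle with no backlog: mirror immediately
        elif block_no not in self._pending:
            self._pending.add(block_no)     # busy: just note the block number once
            self.dirty.append(block_no)

    def catch_up(self, current_load: float) -> int:
        """If load is low, send the latest contents of every logged block; return count."""
        if current_load >= self.load_threshold:
            return 0
        sent = 0
        while self.dirty:
            n = self.dirty.pop(0)
            self._pending.discard(n)
            self.send(n, self.blocks[n])
            sent += 1
        return sent
```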

      This is a good idea, even if its niche is small, but I'm interested in how it handles encryption. If it doesn't allow on-the-fly key regeneration, HMACs, certificates (or at least PSKs) and the other things we expect from modern systems (SSH, IPsec/IKE, etc.), then it's not going to be very useful.
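      To make the HMAC item concrete: per-block integrity can be as simple as tagging each block number plus contents with HMAC-SHA256 under a shared key. A minimal Python sketch, assuming a pre-shared key; it shows nothing about encryption, key exchange or replay protection, and nothing here reflects how der Mouse's tool actually works (function names are hypothetical):

```python
import hashlib
import hmac


def tag_block(key: bytes, block_no: int, data: bytes) -> bytes:
    """HMAC-SHA256 over the block number and contents, so blocks can't be altered or swapped."""
    msg = block_no.to_bytes(8, "big") + data
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify_block(key: bytes, block_no: int, data: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking how many bytes matched through timing.
    return hmac.compare_digest(tag_block(key, block_no, data), tag)
```

      Binding the block number into the tag matters: without it, a valid block could be replayed at a different offset and still verify.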