Thursday, February 01, 2007

New nbackup (or online dump) myth

Today I've made some simple tests of Firebird's nbackup and InterBase 2007 online dump features.
They differ in details, but share one main principle. Let's describe what nbackup or online dump is:

You know that if you try to copy a database while connections are active, you'll get a broken copy, because copying works sequentially while the database is a random-access file. The idea of nbackup and online dump is to have the server itself copy pages from the database, using timestamp marks. While such copying is in progress, the server does not write changes to the database file, but to a special temporary file. When the copy is done, the server writes those modified pages back to the database (and possibly to the online dump). So the original database can be read and written during this process, and the resulting copy is consistent.

This is fine, and according to the nbackup documentation you can organize incremental backups using "backup levels". When you make an nbackup of level 0, a full copy of the database is made. When you make an nbackup of level 1, only the changes since the previous full nbackup are written to the backup file. And so on.
The example of a "practical nbackup application" says that you can make a full backup monthly, a level 1 backup daily, and a level 2 backup hourly. This sounds fine, but such "hourly backups" can become impossible.
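As a sketch, that schedule maps onto Firebird's nbackup command line roughly as follows. The database and backup-file paths here are made up for illustration, and the exact option syntax may vary between Firebird versions, so check `nbackup -?` on your own installation:

```shell
# Monthly: level 0 (full) backup -- writes a complete copy of every page
nbackup -B 0 /data/mydb.fdb /backup/mydb-L0.nbk

# Daily: level 1 backup -- stores only pages changed since the last level 0
nbackup -B 1 /data/mydb.fdb /backup/mydb-L1-monday.nbk

# Hourly: level 2 backup -- stores only pages changed since the last level 1
nbackup -B 2 /data/mydb.fdb /backup/mydb-L2-0900.nbk

# Restore: the whole chain of levels must be supplied, in order
nbackup -R /data/restored.fdb /backup/mydb-L0.nbk \
    /backup/mydb-L1-monday.nbk /backup/mydb-L2-0900.nbk
```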
Why do these "hourly backups" look suspicious to me?

Because each time you run nbackup (at any level) in Firebird, or online dump in InterBase 2007, the server reads every page of the original database.
I took a 13 GB database from a TPC-C test to check how long it takes to read the whole database.
First I tried to copy the database from one SATA drive to another. This took 6 minutes.
Then I created a level 0 backup with FB and an online dump with IB 2007. Both read the database and write a full copy of it:
  • Firebird - 6 minutes
  • InterBase 2007 - 8 minutes
Not a big difference, but let's run the test again (nbackup level 1, and online dump):
  • Firebird - 6 minutes
  • InterBase 2007 - 6 minutes
This time both servers only read the original database and wrote nothing, because the database had not changed.

So, it makes no difference what backup level you use: the database will be read 100% each time. And, as you see, reading a 13 GB database took 6 minutes with no users connected. I think that with some activity the "backup" time will definitely be longer. Since reading every page of the database is not an easy operation for an HDD, this will of course lower server performance.
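To put those numbers in perspective, here is a rough back-of-the-envelope estimate (my own arithmetic, not from the measurements above): 13 GB read in 6 minutes is about 37 MB/s of sequential throughput, so a 50 GB database would need roughly 23 minutes of full-speed reading even on an idle server.

```python
# Back-of-the-envelope estimate based on the measured figure above:
# 13 GB read in 6 minutes on an idle server.
measured_gb = 13
measured_minutes = 6

# Effective sequential read throughput in MB/s (1 GB = 1024 MB here).
throughput_mb_s = measured_gb * 1024 / (measured_minutes * 60)
print(f"throughput: {throughput_mb_s:.0f} MB/s")        # ~37 MB/s

# Time to read a 50 GB database at the same rate, in minutes.
minutes_50gb = 50 * 1024 / throughput_mb_s / 60
print(f"50 GB read time: {minutes_50gb:.0f} minutes")   # ~23 minutes
```

Under load this only gets worse, since the backup competes with user queries for the same disk heads.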

So I think that running nbackup or online dump every hour on a 10 GB database is not a good idea. And I'm sure that if you have a ~50 GB database, you will not be able to do incremental backups every hour while users are working with the database.
If you don't believe this, make this simple test and send me the results.
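The test above can be sketched like this on Linux, assuming a hypothetical database file at /data/test.fdb (the paths and backup file names here are made up; `time` reports how long each step takes):

```shell
# Baseline: time a plain file copy between two physical drives.
time cp /data/test.fdb /backup/test-copy.fdb

# Time a full (level 0) nbackup of the same database.
time nbackup -B 0 /data/test.fdb /backup/test-L0.nbk

# Time a level 1 nbackup immediately afterwards: even though nothing
# has changed, the whole database is read again, so the time is similar.
time nbackup -B 1 /data/test.fdb /backup/test-L1.nbk
```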

BTW: at the very least you must be sure that you have a fast, well-configured RAID controller, and that the disks used for the database and its "copy" have the same performance.

p.s. Don't place incremental backups on the same hard drive as the database, even on a different logical disk: if the drive fails, you will lose both the database and the backup. Also, the nbackup/online dump run will be 2-3 times slower.


Dmitri Kuzmenko said...

I heard today about one system that is able to run nbackup on a 25 GB database in 3 minutes. Anyway, this process takes a lot of server resources.

Sean Leyne said...

1) You have misinterpreted the runtime numbers in your tests. In fact, they confirm the complete opposite of your "...but 'hourly backups' can become impossible..." statement.

First, the length of time that it takes to run a backup has very little to do with the need to run the backup.

There are plenty of cases of systems so large that running GBak takes over 24 hours. So, are you saying that those installs shouldn't be running backups? (I know that isn't your point.)

As to your runtime numbers, they prove that:

It would take 6 minutes to perform an incremental backup of the database even if EVERY page in the database had been changed since the last backup

If it takes 6 minutes to back up/copy the full database, it will take 6 minutes to create the worst-case incremental backup possible!

The fact that the incremental backup read every database page is just the reality of the implementation... It doesn't make the implementation bad.

Sean Leyne said...

Your comment "...this process takes lot of server resources." is very misleading.

It suggests that somehow the Nbackup process is faulty/badly designed. Nothing could be further from the truth.

First, all backup processes are server intensive. Ever tried to use an MS Exchange system while a backup is being performed?

It is only the fact that most backups today are done to tape which makes the impact of backups tolerable -- tape drives are possibly the slowest devices we connect to servers today, so the tape backup software has to wait for the drive to accept the next block of data.

Because NBackup writes backups to disk, it is able to use the available disk I/O subsystem to its advantage. (An incremental backup in 3 minutes is damn good!)

So to blame NBackup for using a lot of server resources is unfair -- it is using the resources that the OS is providing/managing.

A fairer comparison for NBackup is a disk-to-disk backup. Try that and compare how well it does in using server resources! I'm sure that it will be no better than NBackup.

Finally, remember that the slowest operation for disk subsystems is writing data. So, if you are backing up to another drive, the master drive will still have bandwidth to handle database engine requests.

BTW: it could be possible to add a new option/parameter to the NBackup process to slow it down and reduce the impact on the server, though the backup would then run longer. The problem is determining the best way to manage the slowdown, since the issue is that the disk subsystem is being maxed out; when the disk subsystem is not busy, you don't need to slow down as much.

Dmitri Kuzmenko said...

Sean, thanks for your comments.

1. Yes, I know of systems where even GFIX (not gbak) runs for more than 24 hours. This can happen on good hardware with a 100 GB database.

The 6 minutes for nbackup is for "single-user mode", when not a single connection touches the server, and for a 13 GB database.
So if we take a 50 GB database, measure nbackup time, and also add about 20-50 active connections, the nbackup session will have reasons not to finish within 1 hour.

2. I'm not blaming nbackup. I want to draw attention to the fact that the HDDs must be good, and that nbk files must be written to a different physical disk. Also, nbackup will "fill" the Windows file cache just as copying the database file does. All this can be hard on the system (slower nbackup) and on the users who work during the nbackup process.

So, really, I have nothing to disagree with you about. :-)