Monday, September 01, 2008

What page size I should use in my database?

It's a common question with interesting background from developers . As I can suppose from IBSurgeon statistics, 99% of Firebird and InterBase developers use default page size: since the old ages it is 1024 bytes, and only for Firebird 2.0 it is changed to 4096 by default.
There are 4 main things related with page size: indices depth, database cache size, records per page quantity and disk cluster size.

The shortest answer to this question is to use 4k, 8k or 16k page size. That's it.
If you want to go a bit deeper, read the following:

Index depth
Indices depth is quite obvious thing - since the basic internal structure of the indices is a B-tree, index depth depends on page size, and too small page size (1024-2048) will produce 4-levels trees even at moderate amounts of data (several hundred thousands of integer-based
keys). It's recommended that index depth should be no more than 3. (you can check indices depth using IBAnalyst... and some other things too).
For tables with hundred millions records page size should be the largest. For example, if there is a 250Gb database contained billing information, so it must have page size = 16384.

Database cache
You probably know that Firebird and InterBase has database cache, it is counted as buffers * page size (in megabytes).
Rookie developers think that the more database cache is the better it work (another idea is "I want to use all my RAM").

The problem that mostly popular Windows versions of Firebird (and all InterBase) are 32-bit, so in theory there is only 4Gb of addressed memory available to use. Actually Windows restricts this size maximum to 2Gb per process by default (it can be changed to 3Gb in config.ini).
So, if you set 100000 buffers (using SuperServer, not Classic, of course) in the firebird.conf
DefaultDbCachePages = 100000
or in ibconfig
and each page is 16k, the cache will consume 1.6Gb (at
first connection). And it will prevent to open the second database due to arithmetic fact 3Gb-1.6Gb = 1.4.
Of course, this applies mostly for SuperServer, due to the fact that Classic (Firebird only) allocates buffers for each connection (1.6Gb per connection is too much, really), not per database (regardless of connections).
And the most interesting thing that allocation of such a big cache does not bring a boost in performance - in some test it even decreases performance. Actually engine effectively "collaborates" with operation system's cache and allows OS to cache pieces of data being often read and write, and the main (but not only) purpose of cache is to store metadata records in RAM.
From this point of view large page size in combination with moderate cache size (<100000) decreases metadata page read-write operations.

Records per page
Specifying "records per page" as a separate sub-topic seems to be repeating of cache explanation, but this is not true. During recovering of more than 2 thousands databases we have noticed that databases with larger pages size (4k, 8k, ...) had less corruption.

Cluster size
NTFS, for example, use 512 bytes clusters, when you format a logical disk. You can choose another size for clusters, but how it can be related to the page size? Let's compute with an example.

Page size = 4k
cluster size = 2k

Here operating system will read 2 clusters when server tells it to read one 4k page, and writes also 2 clusters when page is being written. There can be an unwanted overhead for the file system, that must store one page using the cluster chain. And, if 2 clusters can be placed by file system at different locations of the disk, there will be of course read slowdown.

Page size = 4k
Cluster size = 8k
Here 2 pages can be in the same cluster. For read operations it may be good, like "pre-loading" of pages, but you can't be sure that server will need second page from the cluster, when it reads first. Write operations will be doubled, because same cluster must be written when each page on it is changed and need to be written to disk.

So, it is better to have page size and cluster size equal.


WarmBooter said...

You need to fix the article... it is broken (cut).

Dmitri Kuzmenko said...

sorry, fixed. just a few words was missing. :-)