Tuesday, September 16, 2008

FBCon 08 - 25-26-27 September - Bergamo - Italy

You CAN'T miss FBCon 08, the 6th Firebird International Conference, which will be held in Bergamo from 25 to 27 September. The conference runs from Thursday, September 25, to Saturday, and every day has a special theme, presented by the best Italian and international Firebird experts!
Here are some of our sessions; for details, go to www.firebirdconference.net:

Italian sessions:
  • CTEs and recursive queries
  • Database for sysdba - advanced
  • Database for sysdba - basics
  • Database for sysdba - web
  • Delphi, DBExpress and Firebird
  • delphi4php and Firebird
  • GO: Gestionale Open
  • ISAPI
  • PHP and Firebird
  • SQL and Firebird
  • SQL enhancements in FB 2.5
  • Sw-ing
  • vTiger

English sessions:
  • 10 ways to backup FB database
  • Common Table Expressions : tutorial
  • Delphi and Firebird
  • Developing Firebird apps for pocketpc/smartphone clients
  • Execute block and triggers (database)
  • Fb{Ref?}Doc - the Firebird Documentation project
  • Fire in The Lamp - FireTube, an example of scaling Firebird on Web 2.0
  • Firebird 2.5 Architecture
  • Firebird Development in 2008/2009
  • Firebird in real time applications
  • Firebird on Linux - advanced topics
  • Firebird on Linux – basics
  • Firebird Stored Procedures in Depth
  • Firebird, tools and Linux distribution integration
  • Installing Firebird with your application under Windows
  • Introduction to development with Firebird and Python
  • Know what is happening in your database and server
  • Managing recursive data structures (trees) with Firebird
  • Migrating from MSSQL and MySQL to Firebird
  • Practical Guide to the Garbage Collection
  • Query optimization - how to speed up COUNT(*) function
  • Replication for a high availability infrastructure
  • Security in Firebird: 2.1, 2.5, 3.0
  • Separating web development from database logic
  • Service Oriented Architecture with Firebird
  • SQL: Improving and optimizing performance
  • Stored procedures
  • The Firebird System Tables
  • Towards a REST architecture with Delphi and Firebird
  • Transactions In Your Applications
  • Ultimate Vmware Appliance - Ubuntu and Firebird
  • Updating Database structures
  • Using embedded Firebird with .NET
  • Working with Firebird in Python

To REGISTER go to www.firebirdconference.net

Also note that on September 24th, at the same venue, before the conference, there will be a FREE IBSurgeon seminar:
  • Introduction
  • Firebird database recovery and protection against corruptions
  • How to find and solve performance problems in Firebird with IBSurgeon products
  • Replication with IBReplicator
  • Using dbFile
P.S. I will be there. My wife told me that I must taste native pizza, pasta, cappuccino, etc.

Sunday, September 14, 2008

Multi-file Database?

Some of the latest database repair cases we handled involved multi-file databases. Interestingly, recent InterBase and Firebird versions were used, and the file system used for storage was NTFS, not FAT32 or FAT16. Still, the databases were created and maintained as sets of 1-gigabyte files.

If you still use InterBase 5 or earlier, stop reading this, please :-)

This is strange, because all the InterBase and Firebird errors related to working with databases larger than 4 GB are gone, and a lot of people have forgotten about FAT32 anyway.
Some say that FAT32 is faster than NTFS, so they still use it. Maybe this is true, but personally I don't see any reason to use the old file system.
Moreover, I once had an interesting case with FAT32: there was a 4 GB logical drive with a 3 GB file on it. Maybe it was Windows 2000, but anyway, I couldn't copy this file to another logical drive with NTFS - the operating system gave me an error when it tried to copy more than 2 gigabytes of the file.

Well, maybe XP and Vista can copy 3 GB files from FAT32 to NTFS, but multi-file databases have several disadvantages that do not depend on the file system:
  • Some people think that a multi-file database will have better performance than a single-file database on multi-processor/multi-core computers with InterBase SMP or Firebird Classic. This is wrong. The server does not do any "parallel" access to the secondary files; it treats a multi-file database as one sequential file broken into parts. So, if a table's pages are spread across several database files, a table scan will read these files one by one.
  • Multi-file databases store hard-coded links to the secondary files inside them, so it is impossible to simply move such a database from one logical disk to another (for example, from d: to e:). If you want to do this, you need to do a backup and restore, or take a hex editor and hack the hard-coded drive letter and path in the header pages of the multi-file database. Single-file databases do not have this problem at all.
  • You need to watch over a multi-file database's size and, from time to time as the database grows, create an additional secondary file of appropriate size. We've seen a lot of databases where the first files all had roughly the same size and the last file was larger than the others - the DBA forgot to create an additional secondary file at the right time.
  • Secondary file size can be specified in two ways: with STARTING AT, where you specify the page number at which the server continues the database in the next file, and with LENGTH, where you specify the secondary file size in pages. The best way is to use the LENGTH parameter for all files of the multi-file database, including the primary file and the secondary files (see the sketch after this list). But if you start to mix STARTING AT and LENGTH, you can lose track of what size the secondary files will have and end up with a multi-file database whose files all have different sizes.
  • Automated backup and restore scripts must be updated from time to time to include each additional secondary file, because there are no command-line parameters for gbak that can automatically restore a database with an unknown number of secondary files. Each secondary file and its size must be specified on the command line. This is really a headache for the DBA.
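To illustrate the LENGTH recommendation above, here is a minimal sketch of a multi-file database defined entirely with LENGTH; the file names, page size and sizes below are made-up examples, not a recipe:

  /* Hypothetical multi-file database: every file size is given with LENGTH. */
  CREATE DATABASE 'D:\data\billing.fdb'
    PAGE_SIZE 8192
    LENGTH 131072 PAGES                              /* primary file: 131072 x 8K = 1 GB */
    FILE 'D:\data\billing.f01' LENGTH 131072 PAGES   /* secondary file #1 */
    FILE 'D:\data\billing.f02' LENGTH 131072 PAGES;  /* secondary file #2 */

  /* As the database grows, the DBA must remember to add the next file manually: */
  ALTER DATABASE ADD FILE 'D:\data\billing.f03' LENGTH 131072 PAGES;

With LENGTH used consistently, every file except the last one has a predictable size; a single-file database needs none of this bookkeeping.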
I think I have given you enough reasons to stop using multi-file databases, if you still do. If you use InterBase 7, 2007 or 2009, or Firebird 1.0, 1.5 or 2.x, make your database a single file at the next restore and forget about multi-file databases completely.

upd: I need to say that a well-known (at least to us) database corruption scenario for IB 5.x is when the DBA forgets to add a new secondary file and the last secondary file reaches 4 GB in size.

Monday, September 01, 2008

What page size should I use in my database?

It's a common question from developers, and it has an interesting background. Judging by IBSurgeon statistics, 99% of Firebird and InterBase developers use the default page size: since the old days it has been 1024 bytes, and only in Firebird 2.0 was it changed to 4096 by default.
There are four main things related to page size: index depth, database cache size, the number of records per page, and disk cluster size.

The shortest answer to this question is: use a 4K, 8K or 16K page size. That's it.
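If you create databases from a script, the page size is just part of CREATE DATABASE; here is a minimal sketch with a made-up file name:

  /* The page size is fixed when the database is created (or re-created by a restore). */
  CREATE DATABASE 'D:\data\mydb.fdb' PAGE_SIZE 8192;

For an existing database the page size can only be changed by a backup and restore cycle.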
If you want to go a bit deeper, read the following:

Index depth
Index depth is quite an obvious thing: since the basic internal structure of an index is a B-tree, the index depth depends on the page size, and too small a page size (1024-2048) will produce 4-level trees even for moderate amounts of data (several hundred thousand integer-based keys). It's recommended that index depth be no more than 3 (you can check index depth with IBAnalyst... and some other things too).
For tables with hundreds of millions of records the page size should be the largest available. For example, a 250 GB database containing billing information should have a page size of 16384.
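A rough back-of-the-envelope estimate shows why (the bytes-per-entry figure below is an assumption; the real value depends on the key type, prefix compression and page fill):

  depth ≈ ceiling( log(N) / log(page_size / bytes_per_entry) )

With roughly 15 bytes per index entry and 500,000 keys, a 1K page gives log(500000)/log(68) ≈ 3.1, i.e. a 4-level tree, while an 8K page gives log(500000)/log(546) ≈ 2.1, i.e. 3 levels. The exact figures will differ, but the direction is always the same: bigger pages mean shallower indices.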

Database cache
You probably know that Firebird and InterBase have a database cache; its size is simply the number of buffers (pages) multiplied by the page size.
Rookie developers think that the bigger the database cache, the better it works (another popular idea is "I want to use all my RAM").

The problem is that the most popular Windows builds of Firebird (and all InterBase builds) are 32-bit, so in theory there is only 4 GB of addressable memory available. In practice Windows limits this to a maximum of 2 GB per process by default (it can be raised to 3 GB with the /3GB switch in boot.ini).
So, if you set 100000 buffers (using SuperServer, not Classic, of course) in firebird.conf
DefaultDbCachePages = 100000
or in ibconfig
DATABASE_CACHE_PAGES 100000
and each page is 16K, the cache will consume about 1.6 GB at the first connection. And it will prevent a second database from being opened with the same cache size, due to simple arithmetic: 3 GB - 1.6 GB = 1.4 GB, which is less than another 1.6 GB.
Of course, this applies mostly to SuperServer, because Classic (Firebird only) allocates buffers per connection (and 1.6 GB per connection is really too much), not per database regardless of the number of connections.
And the most interesting thing is that allocating such a big cache does not bring a performance boost - in some tests it even decreases performance. Actually, the engine effectively "collaborates" with the operating system's cache and lets the OS cache the pieces of data that are read and written most often, while the main (but not the only) purpose of the engine's cache is to keep metadata records in RAM.
From this point of view, a large page size in combination with a moderate cache size (fewer than 100000 buffers) reduces metadata page read/write operations.
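If you are on Firebird 2.1 or later, the monitoring tables give a quick way to see what the current settings add up to (a sketch only; older versions and InterBase would use gstat -h or IBAnalyst instead):

  /* Cache memory is roughly page buffers multiplied by page size (Firebird 2.1+). */
  SELECT MON$PAGE_SIZE,
         MON$PAGE_BUFFERS,
         CAST(MON$PAGE_SIZE AS BIGINT) * MON$PAGE_BUFFERS AS CACHE_BYTES
    FROM MON$DATABASE;

For the 100000 buffers x 16K example above this returns roughly 1.6 billion bytes - exactly the kind of number you do not want a 32-bit SuperServer process to allocate for a single database.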

Records per page
Specifying "records per page" as a separate sub-topic seems to be repeating of cache explanation, but this is not true. During recovering of more than 2 thousands databases we have noticed that databases with larger pages size (4k, 8k, ...) had less corruption.

Cluster size
NTFS, for example, assigns a default cluster size when you format a logical disk (512 bytes on very small volumes, 4 KB on most modern ones). You can choose another cluster size, but how is it related to the page size? Let's work through an example.

Page size = 4k
Cluster size = 2k

Here the operating system will read 2 clusters when the server asks it to read one 4K page, and will also write 2 clusters when the page is written. There can be unwanted overhead for the file system, which must store one page as a chain of clusters. And if the file system places these 2 clusters at different locations on the disk, reads will of course be slower.

Page size = 4k
Cluster size = 8k
Here 2 pages can reside in the same cluster. For read operations this may be good, like "pre-loading" of pages, but you can't be sure that the server will need the second page from the cluster when it reads the first one. Write operations are effectively doubled, because the same cluster must be written whenever either page in it is changed and needs to go to disk.

So, it is better to have page size and cluster size equal.