Saturday, January 27, 2007

Background garbage collection doesn't work?

InterBase 6.0 introduced background garbage collection, but in active multi-user environments this actually increased the accumulation of garbage in the database.
In InterBase 7.1 Borland added two configuration parameters that control the garbage collection thread, which at least lets you experiment with this issue. These parameters are:
  • SWEEP_QUANTUM - how much server time the garbage collector thread may take or, as the documentation puts it, how many records the garbage collector will check in one pass
  • SWEEP_YIELD_TIME - the pause between garbage collection attempts
Playing with these parameters can help, but not much.
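Both parameters live in the server's ibconfig file. A minimal sketch with illustrative values only (not a recommendation - tune them against your own workload):

```
# ibconfig (InterBase 7.1 and later) - values are only an example
SWEEP_QUANTUM     10
SWEEP_YIELD_TIME  1000
```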

We made a special test to understand how the server collects garbage when there are lots of updates of the same data (not reads). The test was simple: a table with 10k rows, and an application that updates and commits (at the press of a button) all 10k rows with a single UPDATE statement. For each server version we performed 15 updates/commits in a row, then checked the statistics with IBAnalyst 2.0 (it has a "refresh statistics" button and shows data and record versions in the database).
This test also lets us check how the new garbage collection in Firebird 2.0 works.
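The test can be sketched in a few SQL statements (the table and column names here are hypothetical - the original test used a small application, but the statements are equivalent):

```sql
/* Hypothetical schema: a test table with 10k rows */
CREATE TABLE TEST10K (
  ID   INTEGER NOT NULL PRIMARY KEY,
  DATA INTEGER
);

/* One test iteration: update all 10k rows with a single
   statement and commit, producing 10k new record versions */
UPDATE TEST10K SET DATA = DATA + 1;
COMMIT;

/* Repeat the update/commit pair 15 times, then inspect
   data vs. version counts with IBAnalyst 2.0 */
```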

The results are:
  • All InterBase and Firebird versions with the SuperServer architecture and background garbage collection (GCPolicy = background for FB 2.0) left garbage in the database, even after a long wait. So we can say that background garbage collection really does not work with the default configuration parameters. Of course, not all garbage record versions were left in the database, but 70% of the garbage remaining is a huge number.
  • Setting SWEEP_YIELD_TIME to 1000 gave the best result for InterBase 7.5 and 2007 - right after all 15 updates only 10k versions were left.
  • Setting SWEEP_YIELD_TIME to 10000 causes IB 7.x/2007 to collect all the garbage after ~1 minute, but only if you do not touch the server.
  • GCPolicy = combined in Firebird 2.0 works like SWEEP_YIELD_TIME = 1000. But it is not quite the same, because IB 7.x/2007 has no non-background garbage collection at all.
  • GCPolicy = cooperative in Firebird 2.0 does not leave any garbage, but makes updates run slower.
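For Firebird 2.0 the policy is set in firebird.conf. A minimal sketch (the comments summarize the behavior observed in the test above):

```
# firebird.conf (Firebird 2.0)
# background  - GC thread only; left most of the garbage in our test
# cooperative - garbage collected by the statements that encounter it;
#               no garbage left, but updates run slower
# combined    - both mechanisms together
GCPolicy = combined
```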
We asked Vlad Horsun, one of the core Firebird developers, to comment on this. He found an interesting thing in the Firebird sources (and our test results suggest the same is true for all InterBase versions from 6.0 to 2007): when the garbage collector thread checks some data for garbage, it gives up if updates are being made at the same time, and leaves that data not garbage collected.
Vlad also made an experimental Firebird 2.0 build with a special fix, and our tests went successfully - with any GCPolicy setting, all of the garbage was collected within 1-2 seconds of the last update/commit.

So the conclusion is: if you see with IBAnalyst that the garbage in your database grows constantly, try playing with the configuration parameters - GCPolicy in Firebird 2 and SWEEP_YIELD_TIME in InterBase 7.5/2007. With Firebird 1/1.5 and InterBase 6.x/7.1 you have no chance to fix this, and the only solution is to run gfix -sweep on your database from time to time.
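Such a periodic sweep is easy to schedule. A sketch assuming a Unix server with cron (the paths, database file, and credentials are placeholders - substitute your own):

```
# crontab entry: sweep the database every night at 03:00
0 3 * * * /opt/firebird/bin/gfix -sweep -user SYSDBA -password masterkey /data/mydb.fdb
```

On Windows, the same gfix command can be scheduled with the AT command or Task Scheduler.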

p.s. Soon there will be a full article describing sweep and garbage collection, with practical examples.

Friday, January 26, 2007

Do you like null?

"Firebird Null Guide" - nearly complete but boring descrption about nulls. Some things are missing:
  • client-side specific of null processing
  • bugs description does not refer to Firebird bug tracker
  • no examples or links to "by descriptor" UDFs
Anyway, I have not seen anything close to this document.

Wednesday, January 10, 2007

Statistics autoupdate

Several days ago I saw a poll at firebirdnews.org about an auto-updating index statistics feature for Firebird. Here is the results link.

Well, 28% of voters think this would be a good thing. Interestingly, I cannot say in what percentage of systems this feature would make performance worse, and I also cannot say what percentage of those 28% it would affect.

Right now it is not hard to auto-update index statistics yourself. You can use the gidx tool from
gtools, AT or cron + isql running your handmade script, and so on.
But the main question is when and how often this auto-update should run.

IBAnalyst 2.0 has some thoughts about this in its help file, in the Additional Q&A section (the 10th Q/A). Here is the summary:
Depending on your application's activity, the record count in some tables may vary by more than 50%. So statistics should be updated at the moment when the data in such tables is close to its usual working volume. Example: if you reload some table with data several times per day, you should refresh index statistics for that table only when the table is populated with data, not when it is empty - I'm sure you want good performance when you run queries against the non-empty table.
But only you know when your application deletes data from a table and refills it with new data. That is the best moment to apply the SET STATISTICS command.
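For example, right after the reload completes (the index name here is hypothetical):

```sql
/* Refresh selectivity for an index of the just-repopulated table */
SET STATISTICS INDEX IDX_MYTABLE_CODE;
COMMIT;
```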

p.s. Note that index statistics are built from the index itself. Index keys do not contain transaction numbers, so SET STATISTICS cannot tell whether a key belongs to deleted data or not.
p.p.s. Running SET STATISTICS on all indices in a 2 GB database takes about 35-45 seconds on an AMD 64 3500 with a SATA HDD.