Saturday, January 27, 2007

Backgound garbage collection don't work?

InterBase 6.0 introduced background garbage collection, but in active multi-user environments this caused to increase accumulation of garbage in database.
In InterBase 7.1 Borland added 2 parameters, enabling to do something with garbage collection thread, allowing at least to experiment some with this issue. These parameters are
  • SWEEP_QUANTUM - how much server time garbage collector thread will take. Or, as written in documentation, how much records garbage collector will check in one pass
  • SWEEP_YELD_TIME - pause between each garbage collection attempts
Playing with this can help, but not much.

We made special test to understand how server will collect garbage if there are lot of updates of the same data (not the reads). The test was simple - table with 10k rows, and application that update and commit (pressing one button) these 10k rows by one update statement. For each server version we made 15 updates/commits in one row, and then checked statistics with IBAnalyst 2.0 (it has "refresh statistics button" and allow to see data and versions in database).
Also this test allows to check how new garbage collection in Firebird 2.0 works.

The results are:
  • All InterBase and Firebird versions with Superserver architecture and background garbage collection (GCPolicy=background for FB 2.0) left garbage in database, even after long time of waiting. So, we can say that background garbage collection really don't work with default configuration parameters. Of course, not all garbage record versions was left in database, but 70% of garbage left is a huge number.
  • Setting SWEEP_YELD_TIME to 1000 did the best for InterBase 7.5 and 2007 - right after all 15 updates only 10k versions was left.
  • Setting SWEEP_YELD_TIME to 10000 causes IB 7.x/2007 to collect all the garbage after ~1 minute, but only if you do not touch server
  • GCPolicy = combined in Firebird 2.0 works like SWEEP_YIELD_TIME=1000. But this is not the same because IB 7.x/2007 does not have non-background garbage collection at all.
  • GCPolicy = cooperative in Firebird 2.0 doesn't leave any garbage, but makes updates to run slower.
We asked Vlad Horsun, one of the main developers of Firebird, to comment this. And he found interesting thing in Firebird sources (also this test allows us to think that the same thing is true for all InterBase versions from 6.0 to 2007) - when garbage collector thread checks some data for garbage, it fails if there are updates is made, and leaves data not garbage collected.
Vlad also made experimental Firebird 2.0 build with special fix, and our tests went successfully - even with GCPolicy option all of the garbage was collected right after 1-2 seconds of last update/commit.

So, the conclusion is: if you see with IBAnalyst that garbage in your database grows constantly, try to play with configuration parameters - GCPolicy in Firebird 2 and SWEEP_YIELD_TIME in InterBase 7.5/2007. At the same time with Firebird 1/1.5 and InterBase 6.x/7.1 you do not have chances to fix that, and the only solution is to run gfix -sweep on your database from time to time.

p.s. Soon there will be full article with description of sweep, garbage collection, and practical examples.

No comments: