This document covers the hash subroutines in the bthash library. The btree subroutines are described in a separate document.
The hdbck subroutine checks every pointer in the database to make sure that no other pointer refers to the same location. In a good database, each record in the overflow area has exactly one pointer to it, and each free block in the overflow area likewise has exactly one pointer to it.
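The uniqueness rule can be illustrated with a small in-memory model. This is only a sketch of the idea, not the hdbck implementation: the function name `find_duplicate_targets` and the representation of the database's pointers as a plain list of integers are assumptions made for illustration.

```python
from collections import Counter


def find_duplicate_targets(next_pointers):
    """Return the overflow locations that more than one pointer refers to.

    next_pointers is a hypothetical flat list of the integer pointers
    collected from base records, overflow records, and free blocks.
    Zero means "no next record" and is ignored.
    """
    counts = Counter(p for p in next_pointers if p != 0)
    return sorted(loc for loc, n in counts.items() if n > 1)


# Location 7 is the target of two pointers, so this model database is bad.
print(find_duplicate_targets([5, 7, 0, 9, 7, 0]))
```

An empty result means every location in the model is the target of at most one pointer.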
If hash collisions occur frequently at the same address, you may want to reorganize the database to minimize the number of reads along the synonym chain. To do this, analyze how frequently each record on the synonym chain is accessed. The records with the highest access frequency should be placed at the beginning of the chain.
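As a rough sketch of that analysis, the following orders the keys of one synonym chain by access count. The function name `order_by_access` and the access counts are hypothetical; bthash itself does not record access statistics as far as this document states, so the counts would have to come from your own application.

```python
def order_by_access(chain_keys, access_counts):
    """Return chain_keys with the most frequently accessed key first.

    access_counts maps key -> number of reads (assumed to be gathered by
    the application). Keys with no recorded reads count as zero.
    """
    return sorted(chain_keys, key=lambda k: access_counts.get(k, 0), reverse=True)


chain = [b"carol", b"bob", b"alice"]               # current chain order
counts = {b"alice": 120, b"bob": 3, b"carol": 40}  # illustrative numbers
print(order_by_access(chain, counts))
```

The result is the order in which the records would ideally appear on the chain, starting from the head.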
The sequence of records on the synonym chain is determined by the order in which the records were added to the database. The earliest records are at the end of each chain, and the most recent overflow record is at the head of the chain.
The next pointer in the base record starts out as zero. When there is overflow, the next pointer in the base record points to the most recent overflow record. The earliest overflow record, at the end of the chain, has a next pointer of zero.
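The head-insertion behavior described above can be modeled in a few lines. This is an in-memory sketch under stated assumptions, not bthash code: the class `HashSlot`, its fields, and the use of small integers as overflow locations are all invented for illustration.

```python
class HashSlot:
    """Model of one hash address: a base record plus its overflow chain."""

    def __init__(self, key):
        self.base_key = key
        self.base_next = 0    # zero: no overflow yet
        self.overflow = {}    # location -> (key, next pointer)
        self._next_free = 1   # location 0 is never used; 0 means "none"

    def add_overflow(self, key):
        """Insert a new overflow record at the head of the chain."""
        loc = self._next_free
        self._next_free += 1
        # The new record takes over the old head; the first one gets zero.
        self.overflow[loc] = (key, self.base_next)
        self.base_next = loc
        return loc

    def chain(self):
        """Return overflow keys from the head to the end of the chain."""
        keys, loc = [], self.base_next
        while loc != 0:
            key, loc = self.overflow[loc]
            keys.append(key)
        return keys


slot = HashSlot("a")
for k in ("b", "c", "d"):
    slot.add_overflow(k)
print(slot.chain())  # most recent first: ['d', 'c', 'b']
```

In this model the earliest overflow record ("b") ends up last and keeps a next pointer of zero.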
To see which records collide, run the lsthsh program in the src directory and sort its output by hash location (column two). Records that hash to the same address then appear together, and you can decide which of them to place at the head of each group.
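The grouping step can also be done programmatically. The sketch below assumes only what is stated above, that the hash location is the second whitespace-separated column of each line; the function name `group_by_hash_location` and the sample lines are invented, and the real lsthsh output may contain other columns.

```python
from collections import defaultdict


def group_by_hash_location(lines):
    """Group listing lines by their second column (the hash location).

    Returns only the locations shared by two or more records.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        groups[fields[1]].append(line.rstrip("\n"))
    return {loc: recs for loc, recs in groups.items() if len(recs) > 1}


# Made-up sample lines in an assumed "key location" layout.
sample = ["alice 17\n", "bob 42\n", "carol 17\n"]
print(group_by_hash_location(sample))
```

Each entry in the result is one group of synonyms whose order on the chain you may want to review.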
The reorg program in the src directory changes the size of the hashing space and rehashes all the records into that space. It does not prioritize the overflow records.
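The effect of changing the hashing space can be seen in a toy model. This is not the reorg program or the bthash hash function: CRC-32 modulo the space size is used purely as a stand-in, and the names `hash_address`, `rehash`, and `overflow_count` are hypothetical.

```python
import zlib


def hash_address(key: bytes, space_size: int) -> int:
    """Stand-in hash (an assumption): CRC-32 of the key modulo the space size."""
    return zlib.crc32(key) % space_size


def rehash(keys, space_size):
    """Place every key into a fresh hashing space of the given size."""
    table = {}
    for key in keys:
        table.setdefault(hash_address(key, space_size), []).append(key)
    return table


def overflow_count(table):
    """Count records that would land in overflow (all but one per address)."""
    return sum(len(bucket) - 1 for bucket in table.values())


keys = [f"key{i}".encode() for i in range(100)]
for size in (50, 200, 800):
    print(size, overflow_count(rehash(keys, size)))
```

As in reorg, the order of records within each group is simply the order in which they were rehashed; nothing here prioritizes one overflow record over another.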
The chains program in the src directory allows you to see how long the overflow chains are in the database.
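Measuring chain length amounts to following next pointers until a zero is reached. The sketch below does this on an in-memory model; the function name `chain_lengths` and the two dictionaries standing in for the base area and the overflow area are assumptions, not the data structures used by the chains program.

```python
def chain_lengths(base_next, overflow_next):
    """Return {hash address: number of overflow records on its chain}.

    base_next maps hash address -> next pointer of the base record.
    overflow_next maps overflow location -> next pointer of that record.
    A next pointer of zero ends the chain.
    """
    lengths = {}
    for address, loc in base_next.items():
        count, seen = 0, set()
        while loc != 0:
            if loc in seen:
                raise ValueError(f"cycle in chain at address {address}")
            seen.add(loc)
            count += 1
            loc = overflow_next[loc]
        lengths[address] = count
    return lengths


base = {0: 0, 1: 3, 2: 1}   # address 0 has no overflow
over = {1: 0, 2: 0, 3: 2}   # location 3 -> 2 -> end
print(chain_lengths(base, over))  # {0: 0, 1: 2, 2: 1}
```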
Reserved keys consist entirely of 0xff bytes. They mark deleted records in the overflow space. These record slots are reassigned as new records are added to the database.
The hsweep routine tests each key to see if it is reserved and should be bypassed. The hsize routine also bypasses the reserved keys.
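The reserved-key test itself is simple. The following is a minimal sketch of the check described above, with hypothetical function names (`is_reserved`, `live_keys`); it does not reproduce the hsweep or hsize code.

```python
def is_reserved(key: bytes) -> bool:
    """True if every byte of the key is 0xff (a deleted overflow record)."""
    return len(key) > 0 and all(b == 0xFF for b in key)


def live_keys(keys):
    """Yield only the keys a sweep should visit, bypassing reserved ones."""
    for key in keys:
        if not is_reserved(key):
            yield key


keys = [b"alice", b"\xff\xff\xff\xff\xff", b"bob", b"\xff\xfe\xff"]
print(list(live_keys(keys)))
```

Note that a key containing some 0xff bytes, such as the last one in the example, is not reserved; only a key made up entirely of 0xff bytes is.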
Empty records are recognized by their next pointers, which are initialized to all 0xff bytes. Once a record is filled with data, its next pointer changes to zero to indicate no overflow. When the first overflow record is added at a given hash address, the base record's next pointer is set to the location of that overflow record.
The same process applies to the next pointer for the last record of each overflow chain. The last record in each overflow chain always has a next pointer of zero.
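The three states of a next pointer can be summarized in code. This sketch assumes a 4-byte pointer, which the text above does not specify; `POINTER_SIZE` and `classify_next_pointer` are hypothetical names used only for illustration.

```python
POINTER_SIZE = 4                      # assumed width; adjust to the real format
EMPTY = b"\xff" * POINTER_SIZE        # all 0xff: record has never been filled
END_OF_CHAIN = bytes(POINTER_SIZE)    # all zero: no (further) overflow


def classify_next_pointer(raw: bytes) -> str:
    """Describe what a raw next pointer says about its record."""
    if len(raw) != POINTER_SIZE:
        raise ValueError("unexpected pointer width")
    if raw == EMPTY:
        return "empty record"
    if raw == END_OF_CHAIN:
        return "filled, no further overflow"
    return "filled, points to an overflow record"


print(classify_next_pointer(b"\xff\xff\xff\xff"))
print(classify_next_pointer(b"\x00\x00\x00\x00"))
print(classify_next_pointer(b"\x00\x00\x01\x2c"))
```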
Because bthash treats all data as binary, you can store Unicode text in a bthash database. The same is true of text in the Guobiao (GB) and Big5 encodings, and of encrypted data.
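In practice this means the application encodes text to bytes before storing it and decodes it after reading it back. The sketch below shows only that round trip using Python's standard codecs; it makes no bthash calls, and the helper names `to_record_bytes` and `from_record_bytes` are hypothetical.

```python
def to_record_bytes(text: str, encoding: str) -> bytes:
    """Encode text into the raw bytes that would be stored in a record."""
    return text.encode(encoding)


def from_record_bytes(raw: bytes, encoding: str) -> str:
    """Decode raw record bytes back into text."""
    return raw.decode(encoding)


samples = [
    ("汉字", "utf-8"),    # Unicode text as UTF-8
    ("汉字", "gb18030"),  # a Guobiao encoding
    ("漢字", "big5"),     # Big5
]
for text, encoding in samples:
    raw = to_record_bytes(text, encoding)
    assert from_record_bytes(raw, encoding) == text
    print(encoding, len(raw), "bytes")
```

The database never needs to know which encoding was used; keeping track of it is the application's responsibility.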
The following documents provide information about configuring and using the bthash library.