The PGM-index can auto-tune itself to match any given application constraint, either in query time or in space occupancy.
Once you have built the project, you can run the command-line tuner that finds the optimal value for the epsilon parameter when creating the PGM-index.
For example, to get the fastest PGM-index configuration occupying 3 megabytes of space, you would run:
./tuner -s3000000 -U file.bin
Instead, if you need a PGM-index that answers queries in 700±1% nanoseconds and that occupies the least amount of space, you would run:
./tuner -t700 -o0.01 -U file.bin
The input data must be provided as a binary file containing a sorted sequence of signed or unsigned 32- or 64-bit integers. The integer type must be specified with the options -U
, -I
, -u
, -i
, that corresponds to the types uint64
, int64
, uint32
, int32
, respectively.
The input file can be prepared for instance in Python by calling np.sort(a).astype('uint64').tofile('file.bin')
on a NumPy array a
containing your data. The following Python code for example converts a text file containing newline-separated positive integers to the binary format required by the tuner:
import numpy as np
a = np.loadtxt('my_data.txt', dtype=np.uint64)
np.sort(a).tofile('file.bin')
If you need to use the tuner from your code, have a look at the functions minimize_space_given_time
and minimize_time_given_space
in tuner.hpp.
tuner file {OPTIONS}
Space-time trade-off tuner for the PGM-index.
This program lets you specify a maximum space and get the PGM-index minimising
the query time within that space. Or, it lets you specify a maximum query time
and get the PGM-index minimising the space.
OPTIONS:
-h, --help Display this help menu
-o[float], --tol=[float]
Tolerance between 0 and 1 on the constraint (default
0.01)
-r[ratio], --ratio=[ratio]
Ratio of lookups in the query workload (default 0.33)
-v, --verbose Show additional logging info
OPERATION MODES:
-t[ns], --time=[ns] Specify a time to minimise the space
-s[bytes], --space=[bytes]
Specify a space to minimise the time
INPUT DATA OPTIONS:
-U, --u64 Input file contains unsigned 64-bit ints
-I, --i64 Input file contains signed 64-bit ints
-u, --u32 Input file contains unsigned 32-bit ints
-i, --i32 Input file contains signed 32-bit ints
file The file containing the input data
"--" can be used to terminate flag options and force all following arguments
to be treated as positional options