The Piecewise Geometric Model index (PGM-index) is a novel indexing data structure that is the result of the research pursued by Paolo Ferragina and Giorgio Vinciguerra at the University of Pisa.

Unlike traditional tree-based indexes that are blind to the regularities in the data distribution, the PGM-index exploits a learned mapping between the data and its location in memory. The succinctness of this mapping, coupled with a peculiar recursive construction algorithm, makes the PGM-index a space- and time-efficient data structure that dominates traditional indexes by orders of magnitude, as we show in the paper.

Given the increasing demand for data systems that accommodate a wide range of devices and applications, we believe that data structures should be able to self-adapt to the constraints imposed by their context of use. We call this line of research multicriteria learned data structures. We made one step towards this direction by proposing a version of the PGM-index that can efficiently tune itself to any given constraint, either in query time or in space usage.

With the release of our implementation of the PGM-index, we wish to foster the research and the adoption of novel data structures based on the above design principles (see also the survey on learned indexes). In the following, we provide a guide on how to build and how to use the code.

Building the code

Instructions on how to build the source code of the PGM-index.

Using the tuner

User manual for the multicriteria tuner.

Using the benchmark

User manual for the benchmark program.

C++ Reference

C++ API reference for the PGM-index.

Python Reference

Python API reference for the PyGM package.

Computational complexity

Discussion on the space and time complexity of the PGM-index.