This is the documentation for the PGM‑index C++ library.
Members | Descriptions |
---|---|
class pgm::PGMIndex |
A space-efficient index that enables fast search operations on a sorted sequence of n numbers. |
class pgm::DynamicPGMIndex |
A sorted associative container that contains key-value pairs with unique keys. |
class pgm::CompressedPGMIndex |
A variant of PGMIndex that uses compression on the segments to reduce the space of the index. |
class pgm::OneLevelPGMIndex |
A variant of PGMIndex that does not build a recursive structure but uses a binary search on the segments. |
class pgm::BucketingPGMIndex |
A simple variant of OneLevelPGMIndex that builds a top-level lookup table to speed up the search on the segments. |
class pgm::EliasFanoPGMIndex |
A variant of OneLevelPGMIndex that builds a top-level succinct structure to speed up the search on the segments. |
class pgm::MappedPGMIndex |
A disk-backed container storing a sorted sequence of numbers and a PGMIndex for fast search operations. |
class pgm::MultidimensionalPGMIndex |
A multidimensional container that uses a PGMIndex for fast orthogonal range queries. |
struct pgm::ApproxPos |
A struct that stores the result of a query to a PGMIndex, that is, a range [lo, hi) centered around an approximate position pos of the sought key. |
pgm::PGMIndex
A space-efficient index that enables fast search operations on a sorted sequence of n
numbers.
A search returns a struct ApproxPos containing an approximate position of the sought key in the sequence and the bounds of a range of size 2*Epsilon+1 where the sought key is guaranteed to be found if present. If the key is not present, the range is guaranteed to contain a key that is not less than (i.e. greater or equal to) the sought key, or n
if no such key is found. In the case of repeated keys, the index finds the position of the first occurrence of a key.
The Epsilon
template parameter should be set according to the desired space-time trade-off. A smaller value makes the estimation more precise and the range smaller but at the cost of increased space usage.
Internally the index uses a succinct piecewise linear mapping from keys to their position in the sorted order. This mapping is represented as a sequence of linear models (segments) which, if EpsilonRecursive
is not zero, are themselves recursively indexed by other piecewise linear mappings.
Parameters:
K
the type of the indexed keys
Epsilon
controls the size of the returned search range
EpsilonRecursive
controls the size of the search range in the internal structure
Floating
the floating-point type to use for slopes
Members | Descriptions |
---|---|
public PGMIndex () = default |
Constructs an empty index. |
public explicit PGMIndex (const std::vector<K> & data) |
Constructs the index on the given sorted vector. |
public template<> inline PGMIndex (RandomIt first,RandomIt last) |
Constructs the index on the sorted keys in the range [first, last). |
public ApproxPos search (const K & key) const |
Returns the approximate position and the range where key can be found. |
public size_t segments_count () const |
Returns the number of segments in the last level of the index. |
public size_t height () const |
Returns the number of levels of the index. |
public size_t size_in_bytes () const |
Returns the size of the index in bytes. |
public
PGMIndex
() = default
Constructs an empty index.
public explicit
PGMIndex
(const std::vector<K> & data)
Constructs the index on the given sorted vector.
Parameters:
data
the vector of keys to be indexed, must be sortedpublic template<>
inline
PGMIndex
(RandomIt first,RandomIt last)
Constructs the index on the sorted keys in the range [first, last).
Parameters:
first
last
the range containing the sorted keys to be indexedpublic
ApproxPos
search
(const K & key) const
Returns the approximate position and the range where key
can be found.
Parameters:
key
the value of the element to search forReturns: a struct with the approximate position and bounds of the range
public size_t
segments_count
() const
Returns the number of segments in the last level of the index.
Returns: the number of segments
public size_t
height
() const
Returns the number of levels of the index.
Returns: the number of levels of the index
public size_t
size_in_bytes
() const
Returns the size of the index in bytes.
Returns: the size of the index in bytes
pgm::DynamicPGMIndex
A sorted associative container that contains key-value pairs with unique keys.
Parameters:
K
the type of a key
V
the type of a value
PGMType
the type of PGMIndex to use in the container
MinIndexedLevel
the minimum level (of size 2^MinIndexedLevel) on which a PGMType
index is constructed
Members | Descriptions |
---|---|
public DynamicPGMIndex () |
Constructs an empty container. |
public template<> inline DynamicPGMIndex (Iterator first,Iterator last) |
Constructs the container on the sorted data in the range [first, last). |
public void insert_or_assign (const K & key,const V & value) |
Inserts an element into the container if key does not exists in the container. If key already exists, the corresponding value is updated with value . |
public void erase (const K & key) |
Removes the specified element from the container. |
public iterator find (const K & key) const |
Finds an element with key equivalent to key . |
public iterator lower_bound (const K & key) const |
Returns an iterator pointing to the first element that is not less than (i.e. greater or equal to) key . |
public bool empty () const |
Checks if the container has no elements, i.e. whether begin() == end(). |
public iterator begin () const |
Returns an iterator to the beginning. |
public iterator end () const |
Returns an iterator to the end. |
public size_t count (const K & key) const |
Returns the number of elements with key that compares equal to the specified argument key, which is either 1 or 0 since this container does not allow duplicates. |
public size_t size () const |
Returns the number of elements in the container. |
public size_t size_in_bytes () const |
Returns the size of the container in bytes. |
public size_t index_size_in_bytes () const |
Returns the size of the index used in this container in bytes. |
typedef key_type |
|
typedef mapped_type |
|
typedef value_type |
|
typedef size_type |
|
typedef iterator |
public
DynamicPGMIndex
()
Constructs an empty container.
public template<>
inline
DynamicPGMIndex
(Iterator first,Iterator last)
Constructs the container on the sorted data in the range [first, last).
Parameters:
Iterator
Parameters:
first
last
the range containing the sorted elements to be indexedpublic void
insert_or_assign
(const K & key,const V & value)
Inserts an element into the container if key
does not exists in the container. If key
already exists, the corresponding value is updated with value
.
Parameters:
key
element key to insert or update
value
element value to insert
public void
erase
(const K & key)
Removes the specified element from the container.
Parameters:
key
key value of the element to removepublic
iterator
find
(const K & key) const
Finds an element with key equivalent to key
.
Parameters:
key
key value of the element to search forReturns:
an iterator to an element with key equivalent to key
. If no such element is found, end() is returned
public
iterator
lower_bound
(const K & key) const
Returns an iterator pointing to the first element that is not less than (i.e. greater or equal to) key
.
Parameters:
key
key value to compare the elements toReturns:
an iterator to an element with key not less than key
. If no such element is found, end() is returned
public bool
empty
() const
Checks if the container has no elements, i.e. whether begin() == end().
Returns: true if the container is empty, false otherwise
public
iterator
begin
() const
Returns an iterator to the beginning.
Returns: an iterator to the beginning
public
iterator
end
() const
Returns an iterator to the end.
Returns: an iterator to the end
public size_t
count
(const K & key) const
Returns the number of elements with key that compares equal to the specified argument key, which is either 1 or 0 since this container does not allow duplicates.
Parameters:
key
key value of the elements to countReturns: number of elements with the given key, which is either 1 or 0.
public size_t
size
() const
Returns the number of elements in the container.
Returns: the number of elements in the container
public size_t
size_in_bytes
() const
Returns the size of the container in bytes.
Returns: the size of the container in bytes
public size_t
index_size_in_bytes
() const
Returns the size of the index used in this container in bytes.
Returns: the size of the index used in this container in bytes
typedef
key_type
typedef
mapped_type
typedef
value_type
typedef
size_type
typedef
iterator
pgm::CompressedPGMIndex
A variant of PGMIndex that uses compression on the segments to reduce the space of the index.
Parameters:
K
the type of the indexed keys
Epsilon
controls the size of the returned search range
EpsilonRecursive
controls the size of the search range in the internal structure
Floating
the floating-point type to use for slopes
Members | Descriptions |
---|---|
public CompressedPGMIndex () = default |
Constructs an empty index. |
public explicit CompressedPGMIndex (const std::vector<K> & data) |
Constructs the compressed index on the given sorted vector. |
public template<> inline CompressedPGMIndex (Iterator first,Iterator last) |
Constructs the compressed index on the sorted elements in the range [first, last). |
public size_t size_in_bytes () const |
Returns the size of the index in bytes. |
public ApproxPos search (const K & key) const |
Returns the approximate position and the range where key can be found. |
public size_t segments_count () const |
Returns the number of segments in the last level of the index. |
public size_t height () const |
Returns the number of levels of the index. |
public
CompressedPGMIndex
() = default
Constructs an empty index.
public explicit
CompressedPGMIndex
(const std::vector<K> & data)
Constructs the compressed index on the given sorted vector.
Parameters:
data
the vector of elements to be indexed, must be sortedpublic template<>
inline
CompressedPGMIndex
(Iterator first,Iterator last)
Constructs the compressed index on the sorted elements in the range [first, last).
Parameters:
first
last
the range containing the sorted elements to be indexedpublic size_t
size_in_bytes
() const
Returns the size of the index in bytes.
Returns: the size of the index in bytes
public
ApproxPos
search
(const K & key) const
Returns the approximate position and the range where key
can be found.
Parameters:
key
the value of the element to search forReturns: a struct with the approximate position and bounds of the range
public size_t
segments_count
() const
Returns the number of segments in the last level of the index.
Returns: the number of segments
public size_t
height
() const
Returns the number of levels of the index.
Returns: the number of levels of the index
pgm::OneLevelPGMIndex
A variant of PGMIndex that does not build a recursive structure but uses a binary search on the segments. It is defined as an alias for PGMIndex where the EpsilonRecursive
template argument is set to zero.
pgm::BucketingPGMIndex
A simple variant of OneLevelPGMIndex that builds a top-level lookup table to speed up the search on the segments.
The size of the top-level table is specified as a constructor argument. Additionally, the TopLevelBitSize
template argument allows to specify the bit-size of the memory cells in the top-level table. It can be set either to a power of two or to 0. If set to 0, the bit-size of the cells will be determined dynamically so that the table is bit-compressed.
Parameters:
K
the type of the indexed keys
Epsilon
controls the size of the returned search range
TopLevelBitSize
the bit-size of the cells in the top-level table, must be either 0 or a power of two
Floating
the floating-point type to use for slopes
Members | Descriptions |
---|---|
public BucketingPGMIndex () = default |
Constructs an empty index. |
public BucketingPGMIndex (const std::vector<K> & data,size_t top_level_size) |
Constructs the index on the given sorted vector, with the specified top level size. |
public template<> inline BucketingPGMIndex (RandomIt first,RandomIt last,size_t top_level_size) |
Constructs the index on the sorted keys in the range [first, last), with the specified top level size. |
public BucketingPGMIndex (const std::vector<K> & data) |
Constructs the index on the given sorted vector, with a dynamically-set top level size. |
public template<> inline BucketingPGMIndex (RandomIt first,RandomIt last) |
Constructs the index on the sorted keys in the range [first, last), with a dynamically-set top level size. |
public ApproxPos search (const K & key) const |
Returns the approximate position and the range where key can be found. |
public size_t segments_count () const |
Returns the number of segments in the last level of the index. |
public size_t height () const |
Returns the number of levels of the index. |
public size_t size_in_bytes () const |
Returns the size of the index in bytes. |
public
BucketingPGMIndex
() = default
Constructs an empty index.
public
BucketingPGMIndex
(const std::vector<K> & data,size_t top_level_size)
Constructs the index on the given sorted vector, with the specified top level size.
Parameters:
data
the vector of keys, must be sorted
top_level_size
the number of cells allocated for the top-level table
public template<>
inline
BucketingPGMIndex
(RandomIt first,RandomIt last,size_t top_level_size)
Constructs the index on the sorted keys in the range [first, last), with the specified top level size.
Parameters:
first
last
the range containing the sorted keys to be indexed
top_level_size
the number of cells allocated for the top-level table
public
BucketingPGMIndex
(const std::vector<K> & data)
Constructs the index on the given sorted vector, with a dynamically-set top level size.
Parameters:
data
the vector of keys, must be sortedpublic template<>
inline
BucketingPGMIndex
(RandomIt first,RandomIt last)
Constructs the index on the sorted keys in the range [first, last), with a dynamically-set top level size.
Parameters:
first
last
the range containing the sorted keys to be indexedpublic
ApproxPos
search
(const K & key) const
Returns the approximate position and the range where key
can be found.
Parameters:
key
the value of the element to search forReturns: a struct with the approximate position and bounds of the range
public size_t
segments_count
() const
Returns the number of segments in the last level of the index.
Returns: the number of segments
public size_t
height
() const
Returns the number of levels of the index.
Returns: the number of levels of the index
public size_t
size_in_bytes
() const
Returns the size of the index in bytes.
Returns: the size of the index in bytes
pgm::EliasFanoPGMIndex
A variant of OneLevelPGMIndex that builds a top-level succinct structure to speed up the search on the segments.
Parameters:
K
the type of the indexed keys
Epsilon
controls the size of the returned search range
Floating
the floating-point type to use for slopes
Members | Descriptions |
---|---|
public EliasFanoPGMIndex () = default |
Constructs an empty index. |
public explicit EliasFanoPGMIndex (const std::vector<K> & data) |
Constructs the index on the given sorted vector. |
public template<> inline EliasFanoPGMIndex (RandomIt first,RandomIt last) |
Constructs the index on the sorted keys in the range [first, last). |
public ApproxPos search (const K & key) const |
Returns the approximate position and the range where key can be found. |
public size_t segments_count () const |
Returns the number of segments in the last level of the index. |
public size_t height () const |
Returns the number of levels of the index. |
public size_t size_in_bytes () const |
Returns the size of the index in bytes. |
public
EliasFanoPGMIndex
() = default
Constructs an empty index.
public explicit
EliasFanoPGMIndex
(const std::vector<K> & data)
Constructs the index on the given sorted vector.
Parameters:
data
the vector of keys, must be sortedpublic template<>
inline
EliasFanoPGMIndex
(RandomIt first,RandomIt last)
Constructs the index on the sorted keys in the range [first, last).
Parameters:
first
last
the range containing the sorted keys to be indexedpublic
ApproxPos
search
(const K & key) const
Returns the approximate position and the range where key
can be found.
Parameters:
key
the value of the element to search forReturns: a struct with the approximate position and bounds of the range
public size_t
segments_count
() const
Returns the number of segments in the last level of the index.
Returns: the number of segments
public size_t
height
() const
Returns the number of levels of the index.
Returns: the number of levels of the index
public size_t
size_in_bytes
() const
Returns the size of the index in bytes.
Returns: the size of the index in bytes
pgm::MappedPGMIndex
class pgm::MappedPGMIndex
: public pgm::PGMIndex< K, Epsilon, 4, float >
A disk-backed container storing a sorted sequence of numbers and a PGMIndex for fast search operations.
Parameters:
K
the type of the indexed keys
Epsilon
controls the size of the returned search range
EpsilonRecursive
controls the size of the search range in the internal structure
Floating
the floating-point type to use for slopes
Members | Descriptions |
---|---|
public template<> inline MappedPGMIndex (RandomIt first,RandomIt last,const std::string & out_filename) |
Constructs a new disk-backed container on the sorted keys in the range [first, last). |
public MappedPGMIndex (const std::string & in_filename,const std::string & out_filename) |
Constructs a new disk-backed container on the sorted keys from the given input file. |
public explicit MappedPGMIndex (const std::string & in_filename) |
Loads a disk-backed container from the given file. |
public ~MappedPGMIndex () |
Destructs the object and closes the file backing the container. |
public bool contains (const K & key) const |
Checks if there is an element with key equivalent to key in the container. |
public auto lower_bound (const K & key) const |
Returns an iterator pointing to the first element that is not less than (i.e. greater or equal to) key . |
public auto upper_bound (const K & key) const |
Returns an iterator pointing to the first element that is greater than key . |
public size_t count (const K & key) const |
Returns the number of elements with key equal to the specified argument. |
public size_t size () const |
Returns the number of elements in the container. |
public size_t file_size_in_bytes () const |
Returns the size in bytes of the file backing the container. |
public auto begin () const |
Returns an iterator to the first element of the container. |
public auto end () const |
Returns an iterator to the element following the last element of the container. |
public template<>
inline
MappedPGMIndex
(RandomIt first,RandomIt last,const std::string & out_filename)
Constructs a new disk-backed container on the sorted keys in the range [first, last).
Parameters:
first
last
the range containing the sorted keys to copy
out_filename
the name of the output file
public
MappedPGMIndex
(const std::string & in_filename,const std::string & out_filename)
Constructs a new disk-backed container on the sorted keys from the given input file.
The in_filename
must be a binary file containing a sorted sequence numbers of type K
with the same endianness of the CPU.
Parameters:
in_filename
the name of the input file
out_filename
the name of the output file
public explicit
MappedPGMIndex
(const std::string & in_filename)
Loads a disk-backed container from the given file.
Parameters:
in_filename
the name of the input filepublic
~MappedPGMIndex
()
Destructs the object and closes the file backing the container.
public bool
contains
(const K & key) const
Checks if there is an element with key equivalent to key
in the container.
Parameters:
key
the value of the element to search forReturns:
true
if there is such an element, otherwise false
public auto
lower_bound
(const K & key) const
Returns an iterator pointing to the first element that is not less than (i.e. greater or equal to) key
.
Parameters:
key
value to compare the elements toReturns:
iterator to the first element that is not less than key
, or end() if no such element is found
public auto
upper_bound
(const K & key) const
Returns an iterator pointing to the first element that is greater than key
.
Parameters:
key
value to compare the elements toReturns:
iterator to the first element that is greater than key
, or end() if no such element is found
public size_t
count
(const K & key) const
Returns the number of elements with key equal to the specified argument.
Parameters:
key
value of the elements to countReturns:
the number of elements with key equal to key
public size_t
size
() const
Returns the number of elements in the container.
Returns: the number of elements in the container
public size_t
file_size_in_bytes
() const
Returns the size in bytes of the file backing the container.
Returns: the size in bytes of the file backing the container
public auto
begin
() const
Returns an iterator to the first element of the container.
Returns: an iterator to the first element of the container
public auto
end
() const
Returns an iterator to the element following the last element of the container.
Returns: an iterator to the element following the last element of the container
pgm::MultidimensionalPGMIndex
A multidimensional container that uses a PGMIndex for fast orthogonal range queries.
Parameters:
Dimensions
the number of fields/dimensions
T
the type of the stored elements
Epsilon
the Epsilon parameter for the internal PGMIndex
EpsilonRecursive
the EpsilonRecursive parameter for the internal PGMIndex
Floating
the Floating parameter for the internal PGMIndex
Members | Descriptions |
---|---|
public MultidimensionalPGMIndex () = default |
Constructs an empty multidimensional container. |
public template<> inline MultidimensionalPGMIndex (RandomIt first,RandomIt last) |
Constructs the multidimensional container with the elements in the range [first, last). |
public size_t size_in_bytes () const |
Returns the size of the index in bytes. |
public bool contains (const value_type & p) |
Checks if there is an element equal to p in the container. |
public iterator begin () const |
Returns an iterator to the first element of the container. |
public iterator end () const |
Returns an iterator to the element following the last element of the container. |
public iterator range (const value_type & min,const value_type & max) |
Returns an iterator pointing to an element satisfying an orthogonal range query. |
typedef iterator |
|
typedef size_type |
|
typedef value_type |
public
MultidimensionalPGMIndex
() = default
Constructs an empty multidimensional container.
public template<>
inline
MultidimensionalPGMIndex
(RandomIt first,RandomIt last)
Constructs the multidimensional container with the elements in the range [first, last).
Parameters:
first
last
the range to copy the elements frompublic size_t
size_in_bytes
() const
Returns the size of the index in bytes.
Returns: the size of the index in bytes
public bool
contains
(const
value_type
& p)
Checks if there is an element equal to p
in the container.
Parameters:
p
the element to search forReturns:
true
if there is such an element, false
otherwise
public
iterator
begin
() const
Returns an iterator to the first element of the container.
Returns: an iterator to the first element of the container
public
iterator
end
() const
Returns an iterator to the element following the last element of the container.
Returns: an iterator to the element following the last element of the container
public
iterator
range
(const
value_type
& min,const
value_type
& max)
Returns an iterator pointing to an element satisfying an orthogonal range query.
Equivalently, returns an iterator to an element lying inside the hyperrectangle defined by the extreme points min
and max
.
Successive elements can be retrieved by calling operator++()
on the returned iterator it
until it
!= end().
Parameters:
min
the lower extreme of the query hyperrectangle
max
the upper extreme of the query hyperrectangle, must be greater than or equal to min.
Returns: an iterator pointing to an element inside the query hyperrectangle
typedef
iterator
typedef
size_type
typedef
value_type
pgm::ApproxPos
A struct that stores the result of a query to a PGMIndex, that is, a range [lo, hi) centered around an approximate position pos of the sought key.
Members | Descriptions |
---|---|
public size_t pos |
The approximate position of the key. |
public size_t lo |
The lower bound of the range. |
public size_t hi |
The upper bound of the range. |
public size_t
pos
The approximate position of the key.
public size_t
lo
The lower bound of the range.
public size_t
hi
The upper bound of the range.
pgm::MultidimensionalPGMIndex::RangeIterator
Members | Descriptions |
---|---|
public RangeIterator (internal_iterator it) |
|
public RangeIterator (internal_iterator it,const value_type & p) |
|
public RangeIterator (const decltype(super) super,const value_type & min,const value_type & max) |
|
public RangeIterator & operator++ () |
|
public RangeIterator operator++ (int) |
|
public reference operator* () const |
|
public pointer operator-> () const |
|
public bool operator== (const iterator & rhs) const |
|
public bool operator!= (const iterator & rhs) const |
|
typedef value_type |
|
typedef difference_type |
|
typedef pointer |
|
typedef reference |
|
typedef iterator_category |
public
RangeIterator
(internal_iterator it)
public
RangeIterator
(internal_iterator it,const
value_type
& p)
public
RangeIterator
(const decltype(super) super,const
value_type
& min,const
value_type
& max)
public
RangeIterator
&
operator++
()
public
RangeIterator
operator++
(int)
public reference
operator*
() const
public pointer
operator->
() const
public bool
operator==
(const
iterator
& rhs) const
public bool
operator!=
(const
iterator
& rhs) const
typedef
value_type
typedef
difference_type
typedef
pointer
typedef
reference
typedef
iterator_category