kripodb.hdf5
Similarity matrix using hdf5 as storage backend.
-
class kripodb.hdf5.AbstractSimpleTable(table, append_chunk_size=100000000)
Abstract wrapper around an HDF5 table.
Parameters:
- table (tables.Table) – HDF5 table
- append_chunk_size (int) – Size of chunk to append in one go. Defaults to 1e8, which, when the table description is 10 bytes, will require 2 GB during append.
Attributes:
- table (tables.Table) – HDF5 table
- append_chunk_size (int) – Number of rows to read from other table during append.
-
class kripodb.hdf5.LabelsLookup(h5file, expectedrows=0)
Table to look up label of fragment by id or id of fragment by label.
When table does not exist in h5file it is created.
Parameters:
- h5file (tables.File) – Object representing an open hdf5 file
- expectedrows (int) – Expected number of pairs to be added. Required when similarity matrix is opened in write mode, helps optimize storage
-
by_id(frag_id)
Look up label of fragment by id.
Parameters: frag_id (int) – Fragment identifier
Raises: IndexError – When id of fragment is not found
Returns: Label of fragment
Return type: str
-
by_label(label)
Look up id of fragment by label.
Parameters: label (str) – Fragment label
Raises: IndexError – When label of fragment is not found
Returns: Fragment identifier
Return type: int
-
by_labels(labels)
Look up ids of fragments by label.
Parameters: labels (set[str]) – Set of fragment labels
Raises: IndexError – When label of fragment is not found
Returns: Set of fragment identifiers
Return type: set[int]
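The lookup behaviour described for by_id, by_label and by_labels can be pictured with a small in-memory stand-in. This is only a sketch: the class name `InMemoryLabels` and the sample labels are hypothetical, and the real class reads from an HDF5 table rather than from dictionaries.

```python
class InMemoryLabels:
    """Hypothetical dict-backed stand-in for a label/id lookup table."""

    def __init__(self, label2id):
        self._label2id = dict(label2id)
        self._id2label = {v: k for k, v in self._label2id.items()}

    def by_id(self, frag_id):
        try:
            return self._id2label[frag_id]
        except KeyError:
            # Mirror the documented behaviour: unknown ids raise IndexError
            raise IndexError("id of fragment not found: %r" % (frag_id,))

    def by_label(self, label):
        try:
            return self._label2id[label]
        except KeyError:
            # Mirror the documented behaviour: unknown labels raise IndexError
            raise IndexError("label of fragment not found: %r" % (label,))

    def by_labels(self, labels):
        # Any unknown label raises IndexError via by_label
        return {self.by_label(label) for label in labels}


lookup = InMemoryLabels({"frag_a": 1, "frag_b": 2})
print(lookup.by_id(1))
print(lookup.by_label("frag_b"))
print(lookup.by_labels({"frag_a", "frag_b"}))
```

Raising IndexError instead of KeyError keeps the sketch in line with the exception type documented above.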
-
keep(other, keep)
Copy content of self to other and only keep given fragment identifiers.
Parameters:
- other (LabelsLookup) – Labels table to fill
- keep (set[int]) – Fragment identifiers to keep
-
label2ids()
Return whole table as a dictionary.
Returns: Dictionary with label as key and frag_id as value.
Return type: dict
-
merge(label2id)
Merge label2id dict into self.
When a label does not exist, an id is generated and the label/id is added. When a label does exist, the id of the label in self is kept.
Parameters: label2id (dict) – Dictionary with fragment label as key and fragment identifier as value.
Returns: Dictionary of label/id which were in label2id, but missing in self
Return type: dict
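The merge rule can be illustrated on plain dictionaries. This is a sketch under stated assumptions, not the library's implementation: the function name `merge_labels` is hypothetical, new ids are assumed to count up from the largest id already present (the text only says an id "is generated"), and the returned dictionary is assumed to carry the generated ids.

```python
def merge_labels(existing, label2id):
    """Merge label2id into existing (both map label -> frag_id).

    Assumption: generated ids count up from the largest existing id.
    """
    next_id = max(existing.values(), default=0) + 1
    missing = {}
    # sorted() only makes the generated ids deterministic in this sketch
    for label in sorted(label2id):
        if label not in existing:
            missing[label] = next_id
            next_id += 1
    # Labels already present keep the id they have in existing
    existing.update(missing)
    return missing


existing = {"frag_a": 1, "frag_b": 2}
added = merge_labels(existing, {"frag_b": 7, "frag_c": 8})
print(existing)
print(added)
```

In this example `frag_b` keeps id 2 from `existing`, and only `frag_c` is reported as added.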
-
skip(other, skip)
Copy content of self to other and skip given fragment identifiers.
Parameters:
- other (LabelsLookup) – Labels table to fill
- skip (set[int]) – Fragment identifiers to skip
-
class kripodb.hdf5.PairsTable(h5file, expectedrows=0)
Table to store similarity score of a pair of fragment fingerprints.
When table does not exist in h5file it is created.
Parameters:
- h5file (tables.File) – Object representing an open hdf5 file
- expectedrows (int) – Expected number of pairs to be added. Required when similarity matrix is opened in write mode, helps optimize storage
-
score_precision
int – Similarity score is a fraction; the score is converted to an int by multiplying it with the precision.
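A minimal sketch of the fraction-to-int conversion described for score_precision follows. The helper names, the precision value of 1000 and the use of rounding are assumptions made for illustration; the text only states that the score is multiplied by the precision.

```python
def to_raw_score(score, precision):
    """Convert a fractional score to an int by multiplying with the precision."""
    # Rounding to the nearest int is an assumption of this sketch
    return int(round(score * precision))


def from_raw_score(raw, precision):
    """Convert a stored int score back to a fraction."""
    return raw / precision


precision = 1000  # hypothetical value, not taken from the library
raw = to_raw_score(0.75, precision)
print(raw)
print(from_raw_score(raw, precision))
```

Storing the score as an int keeps each row small, at the cost of limiting the resolution to 1/precision.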
-
full_matrix
bool – Matrix is filled above and below diagonal.
-
append(other)
Append rows of other table to self.
Parameters: other – Table of same type as self
-
count(frame_size, raw_score=False)
Count occurrences of each score.
Parameters:
Returns: Score and number of occurrences
Return type:
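Counting occurrences of each score can be sketched on a plain list of stored int scores. The parameter descriptions are missing from this page, so the sketch makes its own assumptions: `frame_size` is treated as the number of rows processed per step, `raw_score=True` returns the stored ints and `raw_score=False` returns fractions, and the function name `count_scores` and the `precision` argument are hypothetical.

```python
from collections import Counter
from itertools import islice


def count_scores(raw_scores, frame_size, precision, raw_score=False):
    """Count how often each score occurs, reading frame_size scores at a time."""
    counts = Counter()
    it = iter(raw_scores)
    while True:
        frame = list(islice(it, frame_size))
        if not frame:
            break
        counts.update(frame)
    if raw_score:
        return sorted(counts.items())
    # Convert stored ints back to fractions (assumed meaning of raw_score=False)
    return sorted((raw / precision, n) for raw, n in counts.items())


scores = [750, 900, 750, 400, 900, 750]
print(count_scores(scores, frame_size=4, precision=1000))
print(count_scores(scores, frame_size=4, precision=1000, raw_score=True))
```

Processing in frames bounds the memory used per step, which matters when the table holds far more rows than fit in memory.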
-
find(frag_id, cutoff, limit)
Find fragment hits which have a similarity score with frag_id above cutoff.
Parameters:
Returns: Where first tuple value is hit fragment identifier and second value is similarity score
Return type: List[Tuple]
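The search described for find can be sketched over an in-memory list of pairs. The function name `find_hits`, the `(id1, id2, score)` tuple layout, the check of both columns, the descending sort and the strict comparison against the cutoff are all assumptions of this sketch; the page does not specify them.

```python
def find_hits(pairs, frag_id, cutoff, limit=None):
    """Return (hit_id, score) tuples for pairs that involve frag_id."""
    hits = []
    for id1, id2, score in pairs:
        if score <= cutoff:
            continue  # "above cutoff" is read as strictly greater here
        # A pair may list frag_id in either column when the matrix is not full
        if id1 == frag_id:
            hits.append((id2, score))
        elif id2 == frag_id:
            hits.append((id1, score))
    # Best hits first (assumed ordering)
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits if limit is None else hits[:limit]


pairs = [(1, 2, 0.9), (1, 3, 0.4), (2, 3, 0.7), (4, 1, 0.8)]
print(find_hits(pairs, 1, 0.5))
print(find_hits(pairs, 1, 0.5, limit=1))
```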
-
keep(other, keep)
Copy pairs from self to other and keep given fragment identifiers and the identifiers they pair with.
Parameters:
- other (PairsTable) – Pairs table to fill
- keep (set[int]) – Fragment identifiers to keep
Returns: Fragment identifiers that have been copied to other
Return type:
-
skip(other, skip)
Copy content from self to other and skip given fragment identifiers.
Parameters:
- other (PairsTable) – Pairs table to fill
- skip (set[int]) – Fragment identifiers to skip
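The keep and skip filters on pairs can be sketched as two small functions over a list of `(id1, id2, score)` tuples. The function names and tuple layout are hypothetical, and the sketch assumes that a pair is kept when at least one of its identifiers is in the keep set, and skipped when at least one is in the skip set.

```python
def keep_pairs(pairs, keep):
    """Keep pairs in which at least one fragment id is in keep."""
    kept = [(a, b, s) for a, b, s in pairs if a in keep or b in keep]
    # Identifiers that ended up in the copy, including the pairing partners
    copied_ids = {frag for a, b, _ in kept for frag in (a, b)}
    return kept, copied_ids


def skip_pairs(pairs, skip):
    """Drop pairs in which either fragment id is in skip."""
    return [(a, b, s) for a, b, s in pairs if a not in skip and b not in skip]


pairs = [(1, 2, 0.9), (2, 3, 0.7), (3, 4, 0.5)]
kept, copied_ids = keep_pairs(pairs, {1})
print(kept, copied_ids)
print(skip_pairs(pairs, {3}))
```

Returning the copied identifiers from the keep step makes it possible to filter a labels table down to the same set afterwards.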
-
class kripodb.hdf5.SimilarityMatrix(filename, mode='r', expectedpairrows=None, expectedlabelrows=None, cache_labels=False, **kwargs)
Similarity matrix.
Parameters:
- filename (str) – File name of hdf5 file to write or read similarity matrix from
- mode (str) – Can be ‘r’ for reading or ‘w’ for writing
- expectedpairrows (int) – Expected number of pairs to be added. Required when similarity matrix is opened in write mode, helps optimize storage
- expectedlabelrows (int) – Expected number of labels to be added. Required when similarity matrix is opened in write mode, helps optimize storage
- cache_labels (bool) – Cache labels, speed up label lookups
-
h5file
tables.File – Object representing an open hdf5 file
-
pairs
PairsTable – HDF5 Table that contains pairs
-
labels
LabelsLookup – Table to look up label of fragment by id or id of fragment by label
-
append(other)
Append data from other similarity matrix to self.
Parameters: other (SimilarityMatrix) – Other similarity matrix
-
count(frame_size, raw_score=False, lower_triangle=False)
Count occurrences of each score.
Parameters:
Returns: Score and number of occurrences
Return type:
-
find(query, cutoff, limit=None)
Find similar fragments to query.
Parameters:
Yields: (str, float) – Hit fragment identifier and similarity score
-
keep(other, keep)
Copy content of self to other and only keep given fragment labels and the labels they pair with.
Parameters:
- other (SimilarityMatrix) – Writable matrix to fill
- keep (set[str]) – Fragment labels to keep
-
skip(other, skip)
Copy content of self to other and skip all given fragment labels.
Parameters:
- other (SimilarityMatrix) – Writable matrix to fill
- skip (set[str]) – Fragment labels to skip
-
update(similarities_iter, label2id)
Store pairs of fragment identifiers with their similarity score, together with the label-to-id lookup.
Parameters:
- similarities_iter (iterator) – Iterator which yields (label1, label2, similarity_score)
- label2id (dict) – Dictionary with fragment label as key and fragment identifier as value.
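The translation that update has to perform, from labelled similarities to identifier pairs, can be sketched as a generator. The function name `rows_from_similarities`, the `precision` argument and the rounding are assumptions of this sketch; it only illustrates how the two documented inputs fit together and does not write to an HDF5 file.

```python
def rows_from_similarities(similarities_iter, label2id, precision):
    """Translate (label1, label2, score) tuples into (id1, id2, raw_score) rows."""
    for label1, label2, score in similarities_iter:
        # Scores are stored as ints; rounding is an assumption of this sketch
        yield label2id[label1], label2id[label2], int(round(score * precision))


label2id = {"frag_a": 1, "frag_b": 2, "frag_c": 3}
similarities = [("frag_a", "frag_b", 0.75), ("frag_b", "frag_c", 0.5)]
rows = list(rows_from_similarities(iter(similarities), label2id, precision=1000))
print(rows)
```

Because the input is consumed lazily, the same pattern works when the similarities come from a large file rather than a list.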