kripodb.pairs
Module handling generation and retrieval of similarity of fingerprint pairs
kripodb.pairs.dump_pairs(bitsets1, bitsets2, out_format, out_file, out, number_of_bits, mean_onbit_density, cutoff, label2id, nomemory, ignore_upper_triangle=False)
Dump pairs of bitset collection.
Each pair is a row holding the identifiers of both bitsets and their similarity score.
Parameters:
- bitsets1 (Dict{str, pyroaring.BitMap}) – First dict of fingerprints with fingerprint label as key and pyroaring.BitMap as value
- bitsets2 (Dict{str, pyroaring.BitMap}) – Second dict of fingerprints with fingerprint label as key and pyroaring.BitMap as value
- out_format – ‘tsv’ or ‘hdf5’
- out_file – Filename of output file where ‘hdf5’ format is written to.
- out (File) – File object where ‘tsv’ format is written to.
- number_of_bits (int) – Number of bits for all bitsets
- mean_onbit_density (float) – Mean on bit density
- cutoff (float) – Cutoff, similarity scores below cutoff are discarded.
- label2id – dict to translate label to id (string to int)
- nomemory – If true, bitsets2 is not loaded into memory
- ignore_upper_triangle – When true, returns only similarities where label1 > label2; when false, returns all similarities
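The effect of cutoff and ignore_upper_triangle can be illustrated with a minimal sketch. It does not call kripodb: fingerprints are plain Python sets rather than pyroaring.BitMap objects, the score is assumed to be the plain Tanimoto coefficient (the module's own score also takes mean_onbit_density and number_of_bits, which are ignored here), and the names tanimoto, iter_pairs and the fragment labels are hypothetical.

```python
from itertools import product


def tanimoto(bits1, bits2):
    """Plain Tanimoto (Jaccard) coefficient of two sets of on-bit positions."""
    union = len(bits1 | bits2)
    return len(bits1 & bits2) / union if union else 0.0


def iter_pairs(bitsets1, bitsets2, cutoff, ignore_upper_triangle=False):
    """Yield (label1, label2, score) for every pair scoring at or above cutoff."""
    for (label1, bits1), (label2, bits2) in product(bitsets1.items(), bitsets2.items()):
        # When the upper triangle is ignored, keep only pairs with label1 > label2.
        if ignore_upper_triangle and not label1 > label2:
            continue
        score = tanimoto(bits1, bits2)
        if score >= cutoff:
            yield label1, label2, score


# Hypothetical fingerprints: label -> set of on-bit positions.
fingerprints = {"frag_a": {1, 2, 3}, "frag_b": {2, 3, 4}, "frag_c": {9}}
all_pairs = list(iter_pairs(fingerprints, fingerprints, cutoff=0.4))
lower = list(iter_pairs(fingerprints, fingerprints, cutoff=0.4, ignore_upper_triangle=True))
```

When a collection is compared against itself, ignoring the upper triangle drops the self-pairs and the mirrored duplicates, so each pair is reported once.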
kripodb.pairs.dump_pairs_hdf5(similarities_iter, label2id, expectedrows, out_file)
Dump pairs to an hdf5 file.
Pro:
* very small, 10 bytes for each pair + compression
Con:
* requires hdf5 library to access
Parameters:
- similarities_iter (Iterator) – Iterator of tuples of (fingerprint 1 label, fingerprint 2 label, similarity)
- label2id (dict) – dict to translate label to id (string to int)
- expectedrows –
- out_file –
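One way to arrive at 10 bytes per pair is to store two 32-bit integer ids, looked up through label2id, plus a 16-bit scaled score. The sketch below shows that arithmetic with the standard struct module; the layout, the scale factor and the helper names are assumptions made for illustration, not the file format the function actually writes.

```python
import struct

# Assumed record layout: two unsigned 32-bit ids plus one unsigned 16-bit score.
PAIR_FORMAT = "<IIH"
SCORE_SCALE = 65535  # assumption: a score in [0, 1] is mapped onto 0..65535


def pack_pair(label1, label2, score, label2id):
    """Translate labels to integer ids and pack the pair into a fixed-size record."""
    return struct.pack(PAIR_FORMAT, label2id[label1], label2id[label2], round(score * SCORE_SCALE))


def unpack_pair(record, id2label):
    """Reverse of pack_pair; the score is recovered with 16-bit precision."""
    id1, id2, raw_score = struct.unpack(PAIR_FORMAT, record)
    return id2label[id1], id2label[id2], raw_score / SCORE_SCALE


# Hypothetical label-to-id mapping.
label2id = {"frag_a": 1, "frag_b": 2}
id2label = {number: label for label, number in label2id.items()}
record = pack_pair("frag_a", "frag_b", 0.5, label2id)
restored = unpack_pair(record, id2label)
```

Replacing string labels with integer ids is what keeps each record small and fixed-size; the cost is that the label mapping has to be kept alongside the pairs.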
kripodb.pairs.dump_pairs_tsv(similarities_iter, out)
Dump pairs to a tab-delimited file.
Pro:
* when stored in sqlite can be used outside of Python
Con:
* big, unless output is compressed
Parameters:
- similarities_iter (Iterator) – Iterator of tuples of (fingerprint 1 label, fingerprint 2 label, similarity)
- out (File) – Writeable file
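A tab-delimited dump of such an iterator can be sketched in a few lines. The function name write_pairs_tsv and the score formatting are assumptions; the module's own output may format scores differently.

```python
import io


def write_pairs_tsv(similarities_iter, out):
    """Write one 'label1<TAB>label2<TAB>score' line per pair to a writable file object."""
    for label1, label2, score in similarities_iter:
        out.write(f"{label1}\t{label2}\t{score}\n")


# Hypothetical pairs, written to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_pairs_tsv([("frag_a", "frag_b", 0.5), ("frag_a", "frag_c", 0.25)], buffer)
```

Because labels are written out in full on every row, the file grows quickly, which is why compressing the output is worthwhile.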
kripodb.pairs.merge(ins, out)
Concatenate similarity matrix files into a single one.
Parameters:
Raises: AssertionError – When the number of labels of the input files is not the same
kripodb.pairs.open_similarity_matrix(fn)
Open read-only similarity matrix file.
Parameters: fn (str) – Filename of similarity matrix
Returns: A read-only similarity matrix object
Return type: SimilarityMatrix | FrozenSimilarityMatrix
kripodb.pairs.similar(query, similarity_matrix, cutoff, limit=None)
Find similar fragments to query based on similarity matrix.
Parameters:
Yields: Tuple[(str, str, float)] – List of (query fragment identifier, hit fragment identifier, similarity score) sorted on similarity score
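The lookup described here (filter on cutoff, sort on score, optionally cap the number of hits) can be sketched without kripodb. The matrix below is a plain dict of dicts standing in for a similarity matrix object, find_similar is a hypothetical name, and descending sort order is an assumption, since the text only says the hits are sorted on similarity score.

```python
from itertools import islice


def find_similar(query, matrix, cutoff, limit=None):
    """Yield (query, hit, score) tuples at or above cutoff, best score first."""
    hits = [
        (query, hit, score)
        for hit, score in matrix.get(query, {}).items()
        if score >= cutoff
    ]
    hits.sort(key=lambda row: row[2], reverse=True)
    # islice with a stop of None yields every hit, so limit=None means "no limit".
    yield from islice(hits, limit)


# Hypothetical similarity matrix: query label -> {hit label: score}.
matrix = {"frag_a": {"frag_b": 0.8, "frag_c": 0.45, "frag_d": 0.6}}
top_hit = list(find_similar("frag_a", matrix, cutoff=0.5, limit=1))
all_hits = list(find_similar("frag_a", matrix, cutoff=0.5))
```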
kripodb.pairs.similar_run(query, pairsdbfn, cutoff, out)
Find similar fragments to query based on similarity matrix and write to tab delimited file.
Parameters:
kripodb.pairs.similarity2query(bitsets2, query, out, mean_onbit_density, cutoff, memory)
Calculate similarity of query against all fingerprints in bitsets2 and write to tab delimited file.
Parameters:
- bitsets2 (kripodb.db.IntbitsetDict) –
- query (str) – Query identifier or beginning of it
- out (File) – File object to write output to
- mean_onbit_density (float) – Mean on bit density
- cutoff (float) – Cutoff, similarity scores below cutoff are discarded.
- memory (Optional[bool]) – When true, loads bitsets2 into memory; when false, it does not
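A query given as the beginning of an identifier has to be resolved against the labels in bitsets2 before scoring. The sketch below shows one possible reading: every label starting with the query is treated as a query fingerprint and compared with all other fingerprints. It uses plain sets and the plain Tanimoto coefficient (mean_onbit_density is ignored), skips the self-pair, and all names and identifiers are hypothetical; how the module itself handles a prefix that matches several identifiers is not stated.

```python
import io


def tanimoto(bits1, bits2):
    """Plain Tanimoto (Jaccard) coefficient of two sets of on-bit positions."""
    union = len(bits1 | bits2)
    return len(bits1 & bits2) / union if union else 0.0


def query_to_tsv(bitsets2, query, out, cutoff):
    """Score every label starting with `query` against all other fingerprints."""
    matches = sorted(label for label in bitsets2 if label.startswith(query))
    for query_label in matches:
        for label, bits in bitsets2.items():
            if label == query_label:
                continue  # assumption: a fingerprint is not reported against itself
            score = tanimoto(bitsets2[query_label], bits)
            if score >= cutoff:
                out.write(f"{query_label}\t{label}\t{score}\n")


# Hypothetical fragment identifiers and on-bit positions.
fingerprints = {
    "3kxm_K74_frag1": {1, 2, 3},
    "3kxm_K74_frag2": {2, 3, 4},
    "2n2k_MTN_frag1": {2, 3, 4, 5},
}
buffer = io.StringIO()
query_to_tsv(fingerprints, "2n2k", buffer, cutoff=0.5)
```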