kripodb.frozen
Similarity matrix stored in a PyTables CArray
class kripodb.frozen.FrozenSimilarityMatrix(filename, mode='r', **kwargs)
Frozen similarities matrix
Can retrieve all columns of a specific row fairly quickly. The matrix is stored as a compressed dense matrix; because of the compression, the zeros take up little space.
Warning! The matrix cannot be enlarged.
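To illustrate why a mostly-zero dense matrix takes little space once compressed, here is a minimal sketch using zlib from the Python standard library. It does not use kripodb or PyTables; the matrix size and the score values are made up for illustration.

```python
import zlib

# Hypothetical 1000 x 1000 dense matrix with one byte per score, all zero at first.
size = 1000
dense = bytearray(size * size)

# Set a handful of non-zero scores (made-up values, not real Kripo data).
for i in range(0, size, 50):
    dense[i * size + (i + 1) % size] = 200

compressed = zlib.compress(bytes(dense))

raw_bytes = len(dense)
compressed_bytes = len(compressed)
print(raw_bytes, compressed_bytes)
```

The long runs of zeros collapse to a small fraction of the raw size, while the dense layout keeps lookups by row index simple.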
Comparison of find performance between FrozenSimilarityMatrix and SimilarityMatrix:
>>> from kripodb.db import FragmentsDb
>>> db = FragmentsDb('data/feb2016/Kripo20151223.sqlite')
>>> ids = [v[0] for v in db.cursor.execute('SELECT frag_id FROM fragments ORDER BY RANDOM() LIMIT 20')]
>>> from kripodb.frozen import FrozenSimilarityMatrix
>>> fdm = FrozenSimilarityMatrix('01-01_to_13-13.out.frozen.blosczlib.h5')
>>> from kripodb.hdf5 import SimilarityMatrix
>>> dm = SimilarityMatrix('data/feb2016/01-01_to_13-13.out.h5', cache_labels=True)
>>> %timeit list(dm.find(ids[0], 0.45, None))
... 1 loop, best of 3: 1.96 s per loop
>>> %timeit list(fdm.find(ids[0], 0.45, None))
... The slowest run took 6.21 times longer than the fastest. This could mean that an intermediate result is being cached.
... 10 loops, best of 3: 19.3 ms per loop
>>> ids = [v[0] for v in db.cursor.execute('SELECT frag_id FROM fragments ORDER BY RANDOM() LIMIT 20')]
>>> %timeit -n1 [list(fdm.find(v, 0.45, None)) for v in ids]
... 1 loop, best of 3: 677 ms per loop
>>> %timeit -n1 [list(dm.find(v, 0.45, None)) for v in ids]
... 1 loop, best of 3: 29.7 s per loop
Parameters:
- h5file (tables.File) – Object representing an open hdf5 file
- scores (tables.CArray) – HDF5 Table that contains the matrix
- labels (tables.CArray) – Table to look up the label of a fragment by id, or the id of a fragment by label
count(frame_size=None, raw_score=False, lower_triangle=False)
Count occurrences of each score
Only scores in the upper triangle or the lower triangle are counted. Zero scores are skipped.
Parameters: Returns: Score and number of occurrences
Return type:
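The counting rule for count can be sketched in plain Python as follows. This is an illustration only, not the kripodb implementation: the helper name count_scores and the example matrix are hypothetical, and excluding the diagonal is an assumption.

```python
from collections import Counter


def count_scores(matrix, lower_triangle=False):
    """Count non-zero scores in one triangle of a square matrix.

    Assumption: the diagonal is not part of either triangle.
    """
    counts = Counter()
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            in_triangle = j < i if lower_triangle else j > i
            score = matrix[i][j]
            if in_triangle and score != 0:
                counts[score] += 1
    # Return (score, number of occurrences) tuples ordered by score.
    return sorted(counts.items())


# Hypothetical symmetric matrix of integer scores.
example = [
    [0, 5, 0, 7],
    [5, 0, 5, 0],
    [0, 5, 0, 9],
    [7, 0, 9, 0],
]
upper_counts = count_scores(example)
lower_counts = count_scores(example, lower_triangle=True)
print(upper_counts)
```

Because the example matrix is symmetric, both triangles give the same counts; counting only one of them avoids counting every pair twice.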
find(query, cutoff, limit=None)
Find similar fragments to query.
Parameters: Returns: Hit fragment identifier and similarity score
Return type:
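As a rough sketch of what a find on a dense matrix involves, the plain-Python function below reads one row and filters it by cutoff. It is not the kripodb implementation. The name find_hits and the fragment labels are hypothetical, and the inclusive cutoff, the descending sort order, and skipping the query itself are all assumptions.

```python
def find_hits(scores, labels, query, cutoff, limit=None):
    """Return (label, score) hits for query from a dense square matrix."""
    # A dense matrix only needs one row to answer a query.
    row = scores[labels.index(query)]
    hits = [
        (labels[j], score)
        for j, score in enumerate(row)
        if score >= cutoff and labels[j] != query  # assumption: inclusive cutoff
    ]
    # Assumption: best hits first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits if limit is None else hits[:limit]


# Hypothetical labels and scores, not real Kripo fragments.
labels = ['frag_a', 'frag_b', 'frag_c', 'frag_d']
scores = [
    [0.0, 0.8, 0.3, 0.5],
    [0.8, 0.0, 0.6, 0.0],
    [0.3, 0.6, 0.0, 0.9],
    [0.5, 0.0, 0.9, 0.0],
]
all_hits = find_hits(scores, labels, 'frag_a', 0.45)
top_hit = find_hits(scores, labels, 'frag_a', 0.45, limit=1)
print(all_hits)
```

Reading a single row is the reason the frozen layout can answer a query without scanning every stored pair.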
from_array(data, labels)
Fill matrix from a 2-dimensional array
Parameters:
- data (np.array) – 2-dimensional square array with scores
- labels (list) – List of labels for each column and row index
from_pairs(similarity_matrix, frame_size, limit=None, single_sided=False)
Fills self from a matrix that is stored as pairs.
The pair storage is also known as COOrdinate format, or the 'ijv' or 'triplet' format.
Parameters:
- similarity_matrix (kripodb.hdf5.SimilarityMatrix) –
- frame_size (int) – Number of pairs to append in a single go
- limit (int|None) – Number of pairs to add, None for no limit. Default is None.
- single_sided (bool) – If False, add both the stored direction and the reverse direction. Default is False.
time kripodb similarities freeze --limit 200000 -f 100000 data/feb2016/01-01_to_13-13.out.h5 percell.h5
47.2s
time kripodb similarities freeze --limit 200000 -f 100000 data/feb2016/01-01_to_13-13.out.h5 coo.h5
0.2m - 2m6s
.4m - 2m19s
.8m - 2m33s
1.6m - 2m48s
3.2m - 3m4s
6.4m - 3m50s
12.8m - 4m59s
25.6m - 7m27s
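The effect of single_sided when filling a dense matrix from pairs can be sketched in plain Python as below. This is an illustration, not the kripodb implementation: pairs_to_dense and the fragment labels are hypothetical, and the chunked reading controlled by frame_size is left out.

```python
def pairs_to_dense(pairs, labels, single_sided=False):
    """Fill a dense square matrix from (label_a, label_b, score) triplets."""
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    dense = [[0.0] * n for _ in range(n)]
    for label_a, label_b, score in pairs:
        i, j = index[label_a], index[label_b]
        dense[i][j] = score  # stored direction
        if not single_sided:
            dense[j][i] = score  # reverse direction
    return dense


# Hypothetical labels and pairs, not real Kripo data.
labels = ['frag_a', 'frag_b', 'frag_c']
pairs = [('frag_a', 'frag_b', 0.8), ('frag_b', 'frag_c', 0.6)]

both_sides = pairs_to_dense(pairs, labels)
one_side = pairs_to_dense(pairs, labels, single_sided=True)
print(both_sides)
```

Writing both directions makes every row complete, so a later lookup for either fragment of a pair only has to read its own row.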
to_pairs(pairs)
Copies labels and scores from self to pairs matrix.
Parameters: pairs (SimilarityMatrix) –
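The reverse conversion, from a dense matrix back to pairs, might look like the plain-Python sketch below. It is not the kripodb implementation: dense_to_pairs and the labels are hypothetical, and copying only the non-zero scores of the upper triangle is an assumption made so that each pair is emitted once.

```python
def dense_to_pairs(dense, labels):
    """Return (label_a, label_b, score) triplets from a dense square matrix."""
    pairs = []
    n = len(labels)
    for i in range(n):
        # Assumption: the upper triangle is enough for a symmetric matrix.
        for j in range(i + 1, n):
            score = dense[i][j]
            if score != 0:
                pairs.append((labels[i], labels[j], score))
    return pairs


# Hypothetical labels and scores, not real Kripo data.
labels = ['frag_a', 'frag_b', 'frag_c']
dense = [
    [0.0, 0.8, 0.0],
    [0.8, 0.0, 0.6],
    [0.0, 0.6, 0.0],
]
pairs = dense_to_pairs(dense, labels)
print(pairs)
```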
-