kripodb.modifiedtanimoto
Module to calculate the modified Tanimoto similarity.
-
kripodb.modifiedtanimoto.calc_mean_onbit_density(bitsets, number_of_bits)
Calculate the mean density of on bits across a collection of fingerprints.
Parameters: - bitsets (list[pyroaring.BitMap]) – List of fingerprints
- number_of_bits (int) – Number of bits for all fingerprints
Returns: Mean on-bit density
Return type: float
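A minimal sketch of the calculation, assuming the density is the mean number of on bits per fingerprint divided by number_of_bits. Built-in Python sets stand in for pyroaring.BitMap here because both support len(). The empty-input check is an addition of this sketch, not documented behaviour.

```python
from math import fsum
from typing import Iterable, Sized


def calc_mean_onbit_density(bitsets: Iterable[Sized], number_of_bits: int) -> float:
    """Mean fraction of bits that are on, averaged over all fingerprints."""
    # Number of on bits in each fingerprint (len() works for sets and pyroaring.BitMap).
    onbit_counts = [len(bitset) for bitset in bitsets]
    if not onbit_counts:
        # Guard added in this sketch: avoid dividing by zero for an empty collection.
        raise ValueError("bitsets must contain at least one fingerprint")
    mean_onbits = fsum(onbit_counts) / len(onbit_counts)
    return mean_onbits / number_of_bits


# Two 8-bit fingerprints with 2 and 4 bits set: mean of 3 on bits, density 3 / 8.
print(calc_mean_onbit_density([{0, 1}, {0, 1, 2, 3}], 8))  # 0.375
```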
-
kripodb.modifiedtanimoto.corrections(mean_onbit_density)
Calculate the ST and ST0 corrections for a given mean on-bit density.
See similarity() for an explanation of the corrections.
Parameters: mean_onbit_density (float) – Mean on-bit density
Returns: ST correction, ST0 correction
Return type: tuple[float, float]
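A minimal sketch, assuming the two corrections are the factors in front of \(S_T\) and \(S_{T0}\) in the modified Tanimoto formula given under similarity(), that is \((2 - \rho_0)/3\) and \((1 + \rho_0)/3\). The variable name rho_0 is illustrative.

```python
from typing import Tuple


def corrections(mean_onbit_density: float) -> Tuple[float, float]:
    """Return (corr_st, corr_sto) for a data set with mean on-bit density rho_0."""
    rho_0 = float(mean_onbit_density)
    corr_st = (2 - rho_0) / 3   # factor in front of the standard Tanimoto S_T
    corr_sto = (1 + rho_0) / 3  # factor in front of the inverted Tanimoto S_T0
    return corr_st, corr_sto


corr_st, corr_sto = corrections(0.5)
print(corr_st, corr_sto)  # 0.5 0.5
```

Because \((2 - \rho_0)/3 + (1 + \rho_0)/3 = 1\) for any \(\rho_0\), the two corrections always sum to 1, so the modified Tanimoto similarity is a weighted average of \(S_T\) and \(S_{T0}\); sparser data sets (smaller \(\rho_0\)) put more weight on \(S_T\).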
-
kripodb.modifiedtanimoto.similarities(bitsets1, bitsets2, number_of_bits, corr_st, corr_sto, cutoff, ignore_upper_triangle=False)
Calculate the modified Tanimoto similarity between two collections of fingerprints.
Excludes the similarity of a fingerprint with itself.
Parameters: - bitsets1 (Dict[str, pyroaring.BitMap]) – First dict of fingerprints, with the fingerprint label as key and a pyroaring.BitMap as value
- bitsets2 (Dict[str, pyroaring.BitMap]) – Second dict of fingerprints, with the fingerprint label as key and a pyroaring.BitMap as value
- number_of_bits (int) – Number of bits for all fingerprints
- corr_st (float) – ST correction
- corr_sto (float) – ST0 correction
- cutoff (float) – Similarity scores below the cutoff are discarded.
- ignore_upper_triangle (Optional[bool]) – When true, only returns similarities where label1 > label2; when false, returns all similarities.
Yields: (fingerprint label 1, fingerprint label 2, similarity score)
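A minimal sketch of the pairwise loop described above, not the library's implementation. It assumes that "the same fingerprint" means two entries with the same label, and uses built-in sets in place of pyroaring.BitMap. The helper _similarity and the labels "a" and "b" are hypothetical, introduced only for illustration; the helper follows the formula documented under similarity(), with zero denominators treated as 0.0 by assumption.

```python
from typing import Dict, Iterator, Set, Tuple


def _similarity(bitset1: Set[int], bitset2: Set[int], number_of_bits: int,
                corr_st: float, corr_sto: float) -> float:
    """Hypothetical helper: modified Tanimoto similarity of two fingerprints."""
    a, b = len(bitset1), len(bitset2)
    c = len(bitset1 & bitset2)
    # Assumption of this sketch: an empty denominator contributes 0.0.
    s_t = c / (a + b - c) if a + b - c else 0.0
    off_union = number_of_bits - c
    s_t0 = (number_of_bits - a - b + c) / off_union if off_union else 0.0
    return corr_st * s_t + corr_sto * s_t0


def similarities(bitsets1: Dict[str, Set[int]], bitsets2: Dict[str, Set[int]],
                 number_of_bits: int, corr_st: float, corr_sto: float,
                 cutoff: float, ignore_upper_triangle: bool = False
                 ) -> Iterator[Tuple[str, str, float]]:
    """Yield (label1, label2, score) for every pair scoring at least cutoff."""
    for label1, bitset1 in bitsets1.items():
        for label2, bitset2 in bitsets2.items():
            if label1 == label2:
                continue  # assumption: "same fingerprint" means same label
            if ignore_upper_triangle and not label1 > label2:
                continue  # keep only pairs where label1 > label2
            score = _similarity(bitset1, bitset2, number_of_bits, corr_st, corr_sto)
            if score >= cutoff:  # scores below the cutoff are discarded
                yield label1, label2, score


fingerprints = {"a": {0, 1, 2, 3}, "b": {2, 3, 4}}
for hit in similarities(fingerprints, fingerprints, 8, 0.5, 0.5, 0.4,
                        ignore_upper_triangle=True):
    print(hit)
```

Comparing a collection with itself and setting ignore_upper_triangle=True keeps only the ("b", "a") pair, so each unordered pair is scored once.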
-
kripodb.modifiedtanimoto.similarity(bitset1, bitset2, number_of_bits, corr_st, corr_sto)
Calculate the modified Tanimoto similarity between two fingerprints.
Given two fingerprints of length \(n\) with \(a\) and \(b\) bits set, respectively, and \(c\) bits set in both fingerprints, selected from a data set of fingerprints with a mean bit density of \(\rho_0\), the modified Tanimoto similarity \(S_{MT}\) is calculated as
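\[S_{MT} = \left(\frac{2 - \rho_0}{3}\right) S_T + \left(\frac{1 + \rho_0}{3}\right) S_{T0}\]
where \(S_T\) is the standard Tanimoto coefficient
\[S_T = \frac{c}{a + b - c}\]
and \(S_{T0}\) is the inverted Tanimoto coefficient
\[S_{T0} = \frac{n - a - b + c}{n - c}\]
The two bracketed factors are the ST and ST0 corrections returned by corrections() and are passed in as corr_st and corr_sto.
Parameters: - bitset1 (pyroaring.BitMap) – First fingerprint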
- bitset2 (pyroaring.BitMap) – Second fingerprint
- number_of_bits (int) – Number of bits for all fingerprints
- corr_st (float) – ST correction
- corr_sto (float) – ST0 correction
Returns: Modified Tanimoto similarity
Return type: float
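A minimal sketch of the formula above, using built-in sets in place of pyroaring.BitMap (both support len() and the & intersection operator). The handling of zero denominators is an assumption of this sketch, not documented behaviour. As a worked example, take \(n = 8\), one fingerprint with bits {0, 1, 2, 3} (\(a = 4\)) and one with bits {2, 3, 4} (\(b = 3\)), so \(c = 2\): \(S_T = 2/5 = 0.4\) and \(S_{T0} = (8 - 4 - 3 + 2)/(8 - 2) = 0.5\). With \(\rho_0 = 0.5\) both corrections equal 0.5, so \(S_{MT} = 0.5 \cdot 0.4 + 0.5 \cdot 0.5 = 0.45\).

```python
from typing import AbstractSet


def similarity(bitset1: AbstractSet[int], bitset2: AbstractSet[int],
               number_of_bits: int, corr_st: float, corr_sto: float) -> float:
    """Modified Tanimoto similarity S_MT = corr_st * S_T + corr_sto * S_T0."""
    n = number_of_bits
    a = len(bitset1)               # bits set in the first fingerprint
    b = len(bitset2)               # bits set in the second fingerprint
    c = len(bitset1 & bitset2)     # bits set in both fingerprints
    on_union = a + b - c           # bits set in at least one fingerprint
    off_union = n - c              # bits unset in at least one fingerprint
    off_common = n - a - b + c     # bits unset in both fingerprints
    # Assumption of this sketch: a zero denominator makes that term 0.0.
    s_t = c / on_union if on_union else 0.0
    s_t0 = off_common / off_union if off_union else 0.0
    return corr_st * s_t + corr_sto * s_t0


print(similarity({0, 1, 2, 3}, {2, 3, 4}, 8, 0.5, 0.5))  # 0.45
```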