kripodb.modifiedtanimoto

Module to calculate the modified Tanimoto similarity

kripodb.modifiedtanimoto.calc_mean_onbit_density(bitsets, number_of_bits)[source]

Calculate the mean density of on bits across a collection of bitsets.

Parameters:
  • bitsets (list[pyroaring.BitMap]) – List of fingerprints
  • number_of_bits (int) – Number of bits for all fingerprints
Returns:

Mean on bit density

Return type:

float
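
As a rough illustration of what this density measures, the sketch below uses plain Python sets of on-bit positions as stand-ins for pyroaring.BitMap objects. The helper name mean_onbit_density_sketch is hypothetical and this is not kripodb's own implementation; it assumes the mean density is the average number of on bits per fingerprint divided by number_of_bits.

```python
def mean_onbit_density_sketch(bitsets, number_of_bits):
    """Average fraction of bits that are on across a collection of bitsets.

    Hypothetical helper: each bitset is any sized collection of on-bit
    positions (here a Python set standing in for pyroaring.BitMap).
    """
    bitsets = list(bitsets)
    if not bitsets or number_of_bits <= 0:
        raise ValueError("need at least one bitset and a positive number_of_bits")
    # Total number of on bits over all fingerprints
    total_on_bits = sum(len(bitset) for bitset in bitsets)
    # Divide by the total number of bit positions available
    return total_on_bits / (len(bitsets) * number_of_bits)


# Three toy fingerprints of length 8 with 3, 2 and 1 bits set
fingerprints = [{0, 1, 2}, {1, 5}, {7}]
density = mean_onbit_density_sketch(fingerprints, 8)
```

Here 6 of the 24 available bit positions are on, so the density is 6 / 24.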

kripodb.modifiedtanimoto.corrections(mean_onbit_density)[source]

Calculate the ST and ST0 correction factors from the mean on bit density.

See similarity() for an explanation of the corrections.

Parameters:
  • mean_onbit_density (float) – Mean on bit density
Returns:

ST correction, ST0 correction

Return type:

(float, float)

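
The two corrections can be read off the formula given under similarity(): the weight on the standard Tanimoto coefficient is (2 - ρ0) / 3 and the weight on the inverted coefficient is (1 + ρ0) / 3. The sketch below assumes that is what this function returns; corrections_sketch is a hypothetical name, not the library function.

```python
def corrections_sketch(mean_onbit_density):
    """Return (ST correction, ST0 correction) for a mean on bit density.

    Assumption: the weights are taken directly from the modified
    Tanimoto formula documented for similarity().
    """
    corr_st = (2 - mean_onbit_density) / 3   # weight on S_T
    corr_st0 = (1 + mean_onbit_density) / 3  # weight on S_T0
    return corr_st, corr_st0


corr_st, corr_st0 = corrections_sketch(0.25)
```

Under this reading the two weights always sum to 1, and a sparser data set (smaller ρ0) shifts weight towards the standard Tanimoto term.
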
kripodb.modifiedtanimoto.similarities(bitsets1, bitsets2, number_of_bits, corr_st, corr_sto, cutoff, ignore_upper_triangle=False)[source]

Calculate the modified Tanimoto similarity between two collections of fingerprints.

The similarity of a fingerprint with itself is excluded.

Parameters:
  • bitsets1 (Dict{str, pyroaring.BitMap}) – First dict of fingerprints with fingerprint label as key and pyroaring.BitMap as value
  • bitsets2 (Dict{str, pyroaring.BitMap}) – Second dict of fingerprints with fingerprint label as key and pyroaring.BitMap as value
  • number_of_bits (int) – Number of bits for all fingerprints
  • corr_st (float) – ST correction
  • corr_sto (float) – ST0 correction
  • cutoff (float) – Cutoff; similarity scores below the cutoff are discarded
  • ignore_upper_triangle (Optional[bool]) – When true, yields only similarities where label1 > label2; when false, yields all similarities
Yields:

(fingerprint label 1, fingerprint label 2, similarity score)
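
The pairwise loop described above might look roughly like the following. This is a sketch built only from the parameter descriptions, not kripodb's code: plain Python sets stand in for pyroaring.BitMap, the names similarity_sketch and similarities_sketch are hypothetical, and it assumes "the same fingerprint" means an identical label.

```python
def similarity_sketch(bitset1, bitset2, number_of_bits, corr_st, corr_sto):
    """Modified Tanimoto score for two sets of on-bit positions."""
    a, b = len(bitset1), len(bitset2)
    c = len(bitset1 & bitset2)
    n = number_of_bits
    return corr_st * c / (a + b - c) + corr_sto * (n - a - b + c) / (n - c)


def similarities_sketch(bitsets1, bitsets2, number_of_bits, corr_st, corr_sto,
                        cutoff, ignore_upper_triangle=False):
    """Yield (label1, label2, score) for all pairs scoring at least cutoff."""
    for label1, bitset1 in bitsets1.items():
        for label2, bitset2 in bitsets2.items():
            if label1 == label2:
                continue  # assumed: same label means same fingerprint
            if ignore_upper_triangle and not label1 > label2:
                continue  # keep only label1 > label2, as documented above
            score = similarity_sketch(bitset1, bitset2, number_of_bits,
                                      corr_st, corr_sto)
            if score >= cutoff:
                yield label1, label2, score


# Toy fingerprints of length 8, compared against themselves
fingerprints = {"frag_a": {0, 1, 2, 3}, "frag_b": {2, 3, 4}, "frag_c": {6, 7}}
all_pairs = list(similarities_sketch(fingerprints, fingerprints, 8, 0.5, 0.5, 0.4))
lower = list(similarities_sketch(fingerprints, fingerprints, 8, 0.5, 0.5, 0.4,
                                 ignore_upper_triangle=True))
```

When a collection is compared against itself, every pair is otherwise produced twice, once in each order; setting ignore_upper_triangle keeps a single copy of each.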

kripodb.modifiedtanimoto.similarity(bitset1, bitset2, number_of_bits, corr_st, corr_sto)[source]

Calculate modified Tanimoto similarity between two fingerprints

Given two fingerprints of length \(n\), with \(a\) and \(b\) bits set in each fingerprint, respectively, and \(c\) bits set in both fingerprints, selected from a data set of fingerprints with a mean bit density of \(ρ_0\), the modified Tanimoto similarity \(S_{MT}\) is calculated as

\[S_{MT} = (\frac{2 - ρ_0}{3}) S_T + (\frac{1 + ρ_0}{3}) S_{T0}\]

where \(S_T\) is the standard Tanimoto coefficient

\[S_T = \frac{c}{a + b - c}\]

and \(S_{T0}\) is the inverted Tanimoto coefficient

\[S_{T0} = \frac{n - a - b + c}{n - c}\]

Parameters:
  • bitset1 (pyroaring.BitMap) – First fingerprint
  • bitset2 (pyroaring.BitMap) – Second fingerprint
  • number_of_bits (int) – Number of bits for all fingerprints
  • corr_st (float) – ST correction
  • corr_sto (float) – ST0 correction
Returns:

modified Tanimoto similarity

Return type:

float
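
To make the formula concrete, here is a minimal sketch that evaluates it on two toy fingerprints. Python sets of on-bit positions stand in for pyroaring.BitMap, modified_tanimoto_sketch is a hypothetical name, and the correction values of 0.5 correspond to an assumed mean bit density of 0.5; this is not the library's implementation.

```python
def modified_tanimoto_sketch(bitset1, bitset2, number_of_bits, corr_st, corr_sto):
    """Evaluate S_MT = corr_st * S_T + corr_sto * S_T0 for two bitsets.

    Degenerate inputs (both fingerprints empty, or all n bits shared)
    would divide by zero and are not handled in this sketch.
    """
    a = len(bitset1)            # bits set in the first fingerprint
    b = len(bitset2)            # bits set in the second fingerprint
    c = len(bitset1 & bitset2)  # bits set in both
    n = number_of_bits
    s_t = c / (a + b - c)              # standard Tanimoto coefficient
    s_t0 = (n - a - b + c) / (n - c)   # inverted Tanimoto coefficient
    return corr_st * s_t + corr_sto * s_t0


# n = 8, a = 4, b = 3, c = 2, so S_T = 2/5 and S_T0 = 3/6
score = modified_tanimoto_sketch({0, 1, 2, 3}, {2, 3, 4}, 8, 0.5, 0.5)
```

With these inputs the score works out to 0.5 × 0.4 + 0.5 × 0.5 = 0.45.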