Contact Maps without atom slice¶

The contact map classes use an equivalent to mdtraj.atom_slice() to cut down the trajectory or frame to only contain atoms that are part of either the query or haystack. It does not use this optimization if there are no atoms to be slices. This is more performant for most real world cases, as we tested it for a protein system without water (only hydrogens are sliced) and a system of only tip4p water (hydrogens and dummy atoms are sliced away). It does involve some overhead mainly consisting of the mapping of sliced indices back to the real indices after the contactmap or frequency has been calculated. This overhead could be significant in the case where the resulting contactmap/frequency is dense (resulting in a lot of keys to be mapped) and very little is sliced away (triggering the optimization).

This notebook will show such a worst case scenario and how to disable this optimization.

The worst case example¶

This is only adding a noticeable overhead for ContactMap in this case, but this could be worse for other systems

[1]:

%matplotlib inline
import matplotlib.pyplot as plt
import mdtraj as md
topology = traj.topology

[2]:

from contact_map import ContactMap, ContactFrequency, ContactDifference

[3]:

# Select all atoms
atoms = range(topology.n_atoms)

# Slice away a single atom so the optimization is triggered
used_atoms = atoms [1:]

[4]:

%%time
frame_contacts = ContactMap(traj[0], query=used_atoms, haystack=used_atoms)

CPU times: user 824 ms, sys: 48 ms, total: 872 ms
Wall time: 291 ms

[5]:

%%time
trajectory_contacts = ContactFrequency(traj, query=used_atoms,
haystack=used_atoms)

CPU times: user 1min 12s, sys: 3.28 s, total: 1min 16s
Wall time: 20.6 s

[6]:

# Check if the optimization was triggered
trajectory_contacts.use_atom_slice

[6]:

True


Disabling use_atom_slice¶

The default for use_atom_slice is take from the class variables, to disable use_atom_slice set class._class_use_atom_slice = False

[7]:

ContactMap._class_use_atom_slice = False
ContactFrequency._class_use_atom_slice = False
ContactDifference._class_use_atom_slice = False

[8]:

%%time
frame_contacts = ContactMap(traj[0], query=used_atoms, haystack=used_atoms)

CPU times: user 668 ms, sys: 20 ms, total: 688 ms
Wall time: 204 ms

[9]:

%%time
trajectory_contacts = ContactFrequency(traj, query=used_atoms,
haystack=used_atoms)

CPU times: user 1min 12s, sys: 3.42 s, total: 1min 15s
Wall time: 21.1 s

[10]:

trajectory_contacts.use_atom_slice

[10]:

False


Speedup on a real use case¶

So the example where both the query and haystack are defined, would be an example of a real world case. Lets see what the slowdown would be if we did not use atoms_slice

[25]:

switch1 = topology.select("resSeq 32 to 38 and symbol != 'H'")
cations = topology.select("resname NA or resname MG")


[26]:

# Set class_atom_slice to True
ContactMap._class_use_atom_slice = True
ContactFrequency._class_use_atom_slice = True
ContactDifference._class_use_atom_slice = True

[27]:

%%time
cations_switch1 = ContactFrequency(trajectory=traj, query=cations, haystack=switch1)

CPU times: user 536 ms, sys: 32 ms, total: 568 ms
Wall time: 167 ms

[28]:

# Set class_atom_slice to False
ContactMap._class_use_atom_slice = False
ContactFrequency._class_use_atom_slice = False
ContactDifference._class_use_atom_slice = False

[29]:

%%time
cations_switch1 = ContactFrequency(trajectory=traj, query=cations, haystack=switch1)

CPU times: user 7.35 s, sys: 316 ms, total: 7.66 s
Wall time: 1.99 s