This page was generated from examples/contact_map_without_atom_slice.ipynb. Interactive online version: Binder badge

Contact Maps without atom slice

The contact map classes use an equivalent to mdtraj.atom_slice() to cut down the trajectory or frame to only contain atoms that are part of either the query or haystack. It does not use this optimization if there are no atoms to be slices. This is more performant for most real world cases, as we tested it for a protein system without water (only hydrogens are sliced) and a system of only tip4p water (hydrogens and dummy atoms are sliced away). It does involve some overhead mainly consisting of the mapping of sliced indices back to the real indices after the contactmap or frequency has been calculated. This overhead could be significant in the case where the resulting contactmap/frequency is dense (resulting in a lot of keys to be mapped) and very little is sliced away (triggering the optimization).

This notebook will show such a worst case scenario and how to disable this optimization.

The worst case example

This is only adding a noticeable overhead for a single frame trajectory in this case, but this could be worse for other systems

[1]:
%matplotlib inline
import matplotlib.pyplot as plt
import mdtraj as md
traj = md.load("5550217/kras.xtc", top="5550217/kras.pdb")
topology = traj.topology
[2]:
from contact_map import ContactFrequency, ContactDifference
[3]:
# Select all atoms
atoms = range(topology.n_atoms)

# Slice away a single atom so the optimization is triggered
used_atoms = atoms [1:]
[4]:
%%time
frame_contacts = ContactFrequency(traj[0], query=used_atoms, haystack=used_atoms)
CPU times: user 251 ms, sys: 14.7 ms, total: 266 ms
Wall time: 293 ms
[5]:
%%time
trajectory_contacts = ContactFrequency(traj, query=used_atoms,
                                       haystack=used_atoms)
CPU times: user 12.9 s, sys: 76.4 ms, total: 13 s
Wall time: 13.3 s
[6]:
# Check if the optimization was triggered
trajectory_contacts.use_atom_slice
[6]:
True

Disabling use_atom_slice

The default for use_atom_slice is take from the class variables, to disable use_atom_slice set class._class_use_atom_slice = False

[7]:
ContactFrequency._class_use_atom_slice = False
ContactDifference._class_use_atom_slice = False
[8]:
%%time
frame_contacts = ContactFrequency(traj[0], query=used_atoms, haystack=used_atoms)
CPU times: user 150 ms, sys: 4.99 ms, total: 155 ms
Wall time: 171 ms
[9]:
%%time
trajectory_contacts = ContactFrequency(traj, query=used_atoms,
                                       haystack=used_atoms)
CPU times: user 13.2 s, sys: 76 ms, total: 13.3 s
Wall time: 13.4 s
[10]:
trajectory_contacts.use_atom_slice
[10]:
False

Speedup on a real use case

So the example where both the query and haystack are defined, would be an example of a real world case. Lets see what the slowdown would be if we did not use atoms_slice

[11]:
switch1 = topology.select("resSeq 32 to 38 and symbol != 'H'")
cations = topology.select("resname NA or resname MG")

[12]:
# Set class_atom_slice to True
ContactFrequency._class_use_atom_slice = True
ContactDifference._class_use_atom_slice = True
[13]:
%%time
cations_switch1 = ContactFrequency(trajectory=traj, query=cations, haystack=switch1)
CPU times: user 112 ms, sys: 3.98 ms, total: 116 ms
Wall time: 124 ms
[14]:
# Set class_atom_slice to False
ContactFrequency._class_use_atom_slice = False
ContactDifference._class_use_atom_slice = False
[15]:
%%time
cations_switch1 = ContactFrequency(trajectory=traj, query=cations, haystack=switch1)
CPU times: user 1.89 s, sys: 15.1 ms, total: 1.9 s
Wall time: 1.93 s