finding symmetric cycles in a graph
분류없음 2009/06/16 00:57So, here is a ligand for FKBP named, 1,3-DIPHENYL-1-PROPYL-1-(3,3-DIMETHYL-1,2- DIOXYPENTYL)-2-PIPERIDINE CARBOXYLATE. The thing I wanted to do was to find all symmetric cycles, so that I can restrict the rotation during my calculation. In this example, I'd like to find out two phenyl rings that have C22 and C16. The ring system that have N in it is not symmetrical, so my program should be able to discriminate this.
At first, I thought it could be done easily if I use some sort of graph library. After a quick research, I decided to try out networkx and python-graph, and finally decided to stick with networkx because it depends less amount of other library and seems more mature than ; however python-graph also looks decent and it's source code actually helped me later to figure out finding cycles in my graph.
I started read tutorial and making some testcases, and, then, realized it does not have a nice method to detecting cycles in a graph. The same was true for determining whether the cycle is symmetric or not. So, I have to come up with my own, although I've helped out by numerous discussions on the web.
Here's my version.
#!/usr/bin/env python
import networkx as nx
import time
import re
def ligand_topology(filename='testcases/sb3/sb3/sb3.rtf'):
G = nx.LabeledGraph()
f = open(filename, 'r')
atm = {}
for line in f.readlines():
if line.startswith("ATOM"):
entry = re.split('\s+', line)
atm[entry[1]] = {'type': entry[2], 'charge': entry[3]}
if line.startswith('BOND'):
entry = re.split('\s+', line)
G.add_node(entry[1], atm[entry[1]])
G.add_node(entry[2], atm[entry[2]])
G.add_edge(entry[1], entry[2])
return G
def is_cycle(graph, source):
def dfs(node):
visited.append(node)
for each in graph[node]:
if (cycle):
return
if graph.degree(each) < 2:
continue
if each not in visited:
spanning_tree[node] = each
dfs(each)
nodes.append(each)
if each != spanning_tree[source] and source in graph[each]:
cycle.append(each)
if not cycle and node == source:
while nodes: nodes.pop()
pass
#while cycle: cycle.pop()
#while visited: visited.pop()
visited = []
cycle = []
nodes = []
spanning_tree = {}
dfs(source)
return nodes
def is_symmetric(graph, subgraph, nbunch):
# this is not true, but i'll just skip when a ring has more than one entry point
count = 0
entry = None
for n in subgraph:
if graph.degree(n) is not subgraph.degree(n):
count +=1
entry = n
if count > 1: return False
# build distance list from the root element
# TODO: currently, it only compares the atom types to determine whether it is symmetric.
# this is potential pot hole because if some kind of isomers are there, it will not able
# to detect it.
root = subgraph[entry].keys()[0]
dist = {}
for n in nbunch:
if n == root: continue
distance = nx.shortest_path_length(subgraph,root, n)
if dist.has_key(distance): dist[distance].append(n)
else: dist[distance] = [n]
for d,nodes in dist.items():
sub = None
if len(nodes) < 2: continue
for n in nodes:
if sub == None: sub = set([x[0] for x in subgraph.neighbors(n)])
else:
if sub != set([x[0] for x in subgraph.neighbors(n)]): return False
return True
def find_all_cycles(graph, build=False):
# remove all nodes that have only one connection
degrees = {}
for node in graph.nodes():
degree = graph.degree(node)
if degrees.has_key(degree): degrees[degree].append(node)
else: degrees[degree] = [node]
# find all cycles
max_degree = max(degrees.keys())
cycles = []
for i in range(max_degree, 2, -1):
for node in degrees[i]:
cycle = set(is_cycle(graph, node))
# uniqify cycles
if cycle:
uc = [node]
for c in cycle:
u = set(is_cycle(graph, c))
if node not in u: continue
else: uc.append(c)
h = G.subgraph(uc)
s = set(nx.node_connected_component(h, node))
if s not in cycles:
cycles.append(s)
if build:
tmp_cycles = []
for cycle in cycles:
h = nx.Graph()
h.add_edges_from(graph.edges(cycle))
tmp_cycles.append(h)
cycles = tmp_cycles
return cycles
def find_all_sym_cycles(graph):
cycles = find_all_cycles(graph)
sym_cycles = []
for cycle in cycles:
h = nx.Graph()
h.add_edges_from(graph.edges(cycle))
is_sym = is_symmetric(graph, h, cycle)
if is_sym:
sym_cycles.append(h)
return sym_cycles
if __name__ == '__main__':
G = ligand_topology()
# ubigraph server should already be running
#G = nx.UbiGraph()
cycles = find_all_sym_cycles(G)
for cycle in cycles:
print cycle.nodes()
#['C22', 'C23', 'C26', 'C27', 'C24', 'C25', 'H32', 'H33', 'H30', 'H31', 'H34', 'C13']
#['C20', 'C21', 'H29', 'H28', 'H25', 'H27', 'H26', 'C19', 'C18', 'C17', 'C16', 'C15']
So, basically two things:
1. find all cycles
2. determine whether the cycle is symmetric
The first one can be done by building depth-first-search (dfs) tree. If the dfs tree ended up having the root node that the tree is started from then the root node is one of element of the cycle. In my case, I've exhaustively tested each node whether the node is a component of a cycle. And I tried reduce cycles by finding intersecting nodes only. This was necessary because, for some nodes, dfs tree expanded to have non-cycle component nodes as well.
After every possible "unique" cycles are found, I needed to determine whether the cycle is symmetric. Uh, symmetric with respect to what? Right, now I need to know which edge can be used as a rotation axis. Thanks to the graph library, it has a subgraph method which returns a "connected" subgraph of given nodes. By just plugging in my cycle nodes, I got back a connected graph that have all the cyclic components plus one extra node that is not part of cycle; in fact this node is the node that connects the cycle and the rest of the graph. So the degree of the node in our overall graph and subgraph must be different, and the edge between the node and the root node can be used as a rotational axis.
Now, actual symmetry detection. I could have used a built-in method, which compares whether two graphs are isomorphic after defining graph on the left and graph on the right of the root. However, the problem is, my actual use-case is about atom, which makes two nodes identical if they are in same type (e.g. H11 node and H12 nodes are identical in that sense). So, I ended up just comparing atom types only after building a distance map starting from the root node.
Please bear with me that this is very rough work and I didn't know nothing about graph theory before. Any comments to improve this are welcome.
