|
|
분류없음 2009/07/02 00:46
현업에 종사하고 있지 않다 보니, 이런저런 아이디어들은 있는데, 쉽게 적용해 볼 수 있는 기회가 적다. 몇달이 넘도록 그냥 '해보고 싶은것' 목록에 남아 있는 한가지가 Google App Engine 사용해 보기 였다. 마침, 학과에서 자체 심포지움을 하는데, 나보고 발표자료의 초록 (abstract) 을 모아보도록 하였고, 이 기회에 초록 데이터베이스를, Google App Engine 을 이용해보면 좋겠다는 생각이 들었다.
Google App Engine 이 매력적인 이유는,
1. 초기 호스팅 비용이 들지 않는다. 어느정도 quota 안에서는 무료
2. appspot.com 무료 2차 도메인
3. 내가 좋아하는 Python 으로 웹프로그래밍을 할 수 있음
4. Python 웹 프레임웍인 django 와 호환이 됨
Google App Engine Launcher 라는 프로그램을 통해 기본 파일 구조를 생성하여, 로컬에서 구축/테스트 한 후, 서버로 업로드 하는 방식으로 개발이 진행된다. 기본적으로는 Google 자체 프레임웍을 사용하며, 원할 경우 수동으로 django 프레임웍을 사용할 수도 있다.
간단한 app 을 만들고 deploy 해 본 후 느낀점은,
1. 로컬 테스트 / 서버 deploy 하는 프로세스가 맘에 듦.
2. App Engine Launcher 의 데이터 베이스 도우미 및 deploy 도우미가 잘 작동함
3. 프레임웍은 아직 좀 원시적인듯
4. django 템플릿 엔진을 거의 그대로 사용하는듯
이상. 혹시 소스가 필요하신 분은 Github로..
분류없음 2009/06/16 00:57
So, here is a ligand for FKBP named, 1,3-DIPHENYL-1-PROPYL-1-(3,3-DIMETHYL-1,2- DIOXYPENTYL)-2-PIPERIDINE CARBOXYLATE. The thing I wanted to do was to find all symmetric cycles, so that I can restrict the rotation during my calculation. In this example, I'd like to find out two phenyl rings that have C22 and C16. The ring system that have N in it is not symmetrical, so my program should be able to discriminate this.
At first, I thought it could be done easily if I use some sort of graph library. After a quick research, I decided to try out networkx and python-graph, and finally decided to stick with networkx because it depends less amount of other library and seems more mature than ; however python-graph also looks decent and it's source code actually helped me later to figure out finding cycles in my graph.
I started read tutorial and making some testcases, and, then, realized it does not have a nice method to detecting cycles in a graph. The same was true for determining whether the cycle is symmetric or not. So, I have to come up with my own, although I've helped out by numerous discussions on the web.
Here's my version.
#!/usr/bin/env python
import networkx as nx
import time
import re
def ligand_topology(filename='testcases/sb3/sb3/sb3.rtf'):
G = nx.LabeledGraph()
f = open(filename, 'r')
atm = {}
for line in f.readlines():
if line.startswith("ATOM"):
entry = re.split('\s+', line)
atm[entry[1]] = {'type': entry[2], 'charge': entry[3]}
if line.startswith('BOND'):
entry = re.split('\s+', line)
G.add_node(entry[1], atm[entry[1]])
G.add_node(entry[2], atm[entry[2]])
G.add_edge(entry[1], entry[2])
return G
def is_cycle(graph, source):
def dfs(node):
visited.append(node)
for each in graph[node]:
if (cycle):
return
if graph.degree(each) < 2:
continue
if each not in visited:
spanning_tree[node] = each
dfs(each)
nodes.append(each)
if each != spanning_tree[source] and source in graph[each]:
cycle.append(each)
if not cycle and node == source:
while nodes: nodes.pop()
pass
#while cycle: cycle.pop()
#while visited: visited.pop()
visited = []
cycle = []
nodes = []
spanning_tree = {}
dfs(source)
return nodes
def is_symmetric(graph, subgraph, nbunch):
# this is not true, but i'll just skip when a ring has more than one entry point
count = 0
entry = None
for n in subgraph:
if graph.degree(n) is not subgraph.degree(n):
count +=1
entry = n
if count > 1: return False
# build distance list from the root element
# TODO: currently, it only compares the atom types to determine whether it is symmetric.
# this is potential pot hole because if some kind of isomers are there, it will not able
# to detect it.
root = subgraph[entry].keys()[0]
dist = {}
for n in nbunch:
if n == root: continue
distance = nx.shortest_path_length(subgraph,root, n)
if dist.has_key(distance): dist[distance].append(n)
else: dist[distance] = [n]
for d,nodes in dist.items():
sub = None
if len(nodes) < 2: continue
for n in nodes:
if sub == None: sub = set([x[0] for x in subgraph.neighbors(n)])
else:
if sub != set([x[0] for x in subgraph.neighbors(n)]): return False
return True
def find_all_cycles(graph, build=False):
# remove all nodes that have only one connection
degrees = {}
for node in graph.nodes():
degree = graph.degree(node)
if degrees.has_key(degree): degrees[degree].append(node)
else: degrees[degree] = [node]
# find all cycles
max_degree = max(degrees.keys())
cycles = []
for i in range(max_degree, 2, -1):
for node in degrees[i]:
cycle = set(is_cycle(graph, node))
# uniqify cycles
if cycle:
uc = [node]
for c in cycle:
u = set(is_cycle(graph, c))
if node not in u: continue
else: uc.append(c)
h = G.subgraph(uc)
s = set(nx.node_connected_component(h, node))
if s not in cycles:
cycles.append(s)
if build:
tmp_cycles = []
for cycle in cycles:
h = nx.Graph()
h.add_edges_from(graph.edges(cycle))
tmp_cycles.append(h)
cycles = tmp_cycles
return cycles
def find_all_sym_cycles(graph):
cycles = find_all_cycles(graph)
sym_cycles = []
for cycle in cycles:
h = nx.Graph()
h.add_edges_from(graph.edges(cycle))
is_sym = is_symmetric(graph, h, cycle)
if is_sym:
sym_cycles.append(h)
return sym_cycles
if __name__ == '__main__':
G = ligand_topology()
# ubigraph server should already be running
#G = nx.UbiGraph()
cycles = find_all_sym_cycles(G)
for cycle in cycles:
print cycle.nodes()
#['C22', 'C23', 'C26', 'C27', 'C24', 'C25', 'H32', 'H33', 'H30', 'H31', 'H34', 'C13']
#['C20', 'C21', 'H29', 'H28', 'H25', 'H27', 'H26', 'C19', 'C18', 'C17', 'C16', 'C15']
So, basically two things:
1. find all cycles
2. determine whether the cycle is symmetric
The first one can be done by building depth-first-search (dfs) tree. If the dfs tree ended up having the root node that the tree is started from then the root node is one of element of the cycle. In my case, I've exhaustively tested each node whether the node is a component of a cycle. And I tried reduce cycles by finding intersecting nodes only. This was necessary because, for some nodes, dfs tree expanded to have non-cycle component nodes as well.
After every possible "unique" cycles are found, I needed to determine whether the cycle is symmetric. Uh, symmetric with respect to what? Right, now I need to know which edge can be used as a rotation axis. Thanks to the graph library, it has a subgraph method which returns a "connected" subgraph of given nodes. By just plugging in my cycle nodes, I got back a connected graph that have all the cyclic components plus one extra node that is not part of cycle; in fact this node is the node that connects the cycle and the rest of the graph. So the degree of the node in our overall graph and subgraph must be different, and the edge between the node and the root node can be used as a rotational axis.
Now, actual symmetry detection. I could have used a built-in method, which compares whether two graphs are isomorphic after defining graph on the left and graph on the right of the root. However, the problem is, my actual use-case is about atom, which makes two nodes identical if they are in same type (e.g. H11 node and H12 nodes are identical in that sense). So, I ended up just comparing atom types only after building a distance map starting from the root node.
Please bear with me that this is very rough work and I didn't know nothing about graph theory before. Any comments to improve this are welcome.
분류없음 2009/03/30 15:25
phi = open(sys.argv[1], 'rb').read()
spec = {'header': [4, 48], 'parameter': [56, 104], 'phi': [108, -4]}
header = phi[spec['header'][0]:spec['header'][1]]
parameter = phi[spec['parameter'][0]:spec['parameter'][1]]
nphi = struct.unpack('i', phi[spec['phi'][0]:spec['phi'][0]+4])[0]/4
nclx, ncly, nclz, dcel, xbcen, ybcen, zbcen = struct.unpack('3i d 3d', header[0:44])
So, I had above code working in my laptop. And try to migrate the code to my server, and it suddenly stopped working. at the very last line. So does it work in Mac OS and not linux? But it worked in other linux machine when I tested. So, is it the my python that sucks? No it was not.
After struggling about two hours, I found following workaround:
nclx, ncly, nclz = struct.unpack('3i', header[0:12])
dcel, xbcen, ybcen, zbcen = struct.unpack('4d', header[12:44])
# what??
#nclx, ncly, nclz, dcel, xbcen, ybcen, zbcen = struct.unpack('3i d 3d', header[0:44])
which does not make sense..... odd.
Programming 2006/12/27 20:32
1. BeautifulSoup - Parsing an HTML is incredibly easy. from BeautifulSoup import BeautifulSoup import urllib2 doc = urllib2.urlopen('http://www.google.com').read() soup = BeautifulSoup(doc) - Navigating through parsed tree is simple. soup.contents[0].name # u'html' soup.head.next.name # u'title' soup.head.next.nextSibling.name # u'body'
- Finding a tag / tags is powerful. soup.findAll('div', id='sectionName') soup.findAll('div', {'class': 'singleprotein'}) soup.find('p', align='center') soup.find('p', align=re.compile('^b.*')
- You don't need to worry about modifying the parsed tree. soup = BeautifulSoup('<a1></a1><a><b>Amazing content<c><d></a><a2></a2>') soup.a1.nextSibling # <a><b>Amazing content<c><d></d></c></b></a> soup.a2.previousSibling # <a><b>Amazing content<c><d></d></c></b></a>
subtree = soup.a subtree.extract()
print soup # <a1></a1><a2></a2> soup.a1.nextSibling # <a2></a2> soup.a2.previousSibling # <a1></a1> 2. BioPython - BioPython is a group of python libraries for biology. - Searching biology db and parsing it is a piece of cake. from Bio import db from Bio.PDB import PDBParser db # db, exporting 'embl', 'embl-dbfetch-cgi', 'embl-ebi-cgi', 'embl-fast', 'embl-xembl-cgi', 'embl-xml', 'fasta', 'fasta-sequence-eutils', 'genbank-nucleotide', 'genbank-protein', 'interpro', 'interpro-ebi-cgi', 'medline', 'medline-eutils', 'nucleotide-genbank-eutils', 'pdb', 'pdb-ebi-cgi', 'pdb-rcsb-cgi', 'prodoc', 'prodoc-expasy-cgi', 'prosite', 'prosite-expasy-cgi', 'protein-genbank-eutils', 'swissprot', 'swissprot-expasy-cgi', 'swissprot-usmirror-cgi'
f = db['pdb']['1kdx'] p = PDBParser() s = p.get_structure('1kdx', f) s[0] # <model id=0> s[0].child_list # [<Chain id=A>, <Chain id=B>] len(s[0].child_list[0].child_list) # 81 (81 residues) 3. Building Protein List - The objective is building a table of membrane-proteins so we can get some insight of what to choose. - Protein list from http://blanco.biomol.uci.edu/Membrane_Proteins_xtal.html is used. - It didn't provide size and the image information, I downloaded pdb to calculate how many residues are there and downloaded images from opm website. 4. Protein List - At first, I went through the original protein list and fetch pdb files and image files, then build the whole list. - protein_list.py, protein_list.html, protein_list.pdf - 5. Inspected List - Then, I extract rows, which don't meet my needs. - inspect.py, inspected_list.html, inspected_list.pdf -
|