Thesis (Index)   <-  Sean Forman   <-  You Are Here





Why We Care

Proteins transport oxygen to cells, ward off harmful infections, convert chemical energy into mechanical energy and perform many other important and beneficial biological processes. Proteins are also notable for their more deleterious effects; for instance, viruses are surrounded by protein shells that allow them to gain access to host cells. For the most part, the three-dimensional structure (i.e., the conformation or fold) of a protein determines its proper function or lack thereof. Knowledge of the protein's structure brings many benefits: the ability to synthesize drugs which interact with particular proteins, greater understanding of genetic defects, and improved therapies for diseases such as AIDS and malaria.


Post-genome Life Sciences

Like beads on a necklace, a linear sequence of amino acids joined together as an open chain forms a protein. There are only twenty naturally occurring amino acids, and every protein is some combination of these twenty [84]. Proteins vary in length from fewer than ten to thousands of amino acids. Given an alphabet of 20 amino acids, the space of potential proteins is vast, which gives some idea of how so much genetic diversity is possible.
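To make "vast" concrete, the following minimal sketch counts the distinct linear sequences of a given length over a 20-letter alphabet. The function name and the sample lengths are illustrative choices, not taken from the thesis, and the count ignores whether a given sequence would ever occur in nature.

```python
def sequence_space_size(length: int, alphabet_size: int = 20) -> int:
    """Count the distinct linear sequences of `length` residues.

    Each position can hold any of `alphabet_size` amino acids
    independently, so the count is alphabet_size ** length.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


if __name__ == "__main__":
    # Sample lengths chosen only for illustration.
    for n in (10, 100, 300):
        size = sequence_space_size(n)
        # Report the order of magnitude rather than the full integer.
        print(f"length {n}: about 10^{len(str(size)) - 1} possible sequences")
```

Even at modest lengths the count dwarfs anything that could be enumerated, which is why the number of amino acids alone already allows enormous diversity.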

Amino acid sequences in nature are relatively easy to determine. They are encoded in the genome as sequences of DNA base pairs called genes. With the recently announced completion (or anticipated completion) of the sequencing of the human genome [76], we will soon have the data to identify all the genes in the genome. This will put us in a position to know the amino acid sequences of all the proteins in the human body, and similar results will be possible for other genomes that have been sequenced.

Traditionally, two laboratory techniques have been used to determine a protein's three-dimensional structure: x-ray crystallography and nuclear magnetic resonance (NMR) [84]. Protein structures, however, are often very difficult to find using these methods. In x-ray crystallography, crystals of the protein are grown and the x-ray diffraction pattern of the crystal is recorded. But some proteins are difficult to crystallize, and even when a crystal is grown, it is uncertain whether the in vivo and crystalline structures are the same. In NMR, the proteins remain in solution and the nuclear spins of the protein's hydrogen atoms are studied. This information is then translated into the positions of the protein's other atoms. Unlike x-ray crystallography, NMR can also give some insight into the dynamics involved in protein folding, but at the moment it is limited to small proteins. Both techniques are slow and expensive relative to the large number of structures we would like to determine.

This expense and time constraint have created a gap between the number of known protein sequences and the number of known protein structures. The gap keeps widening as more genomes are sequenced and fold determination fails to keep up.

Clearly, there is room for new solutions to the protein structure prediction problem. Many of the proposed solutions take the form of computational protein structure prediction methods. Central to these methods is the idea that the structure of a protein is determined by its amino acid sequence [2]. This insight suggests that knowing the sequence of amino acids gives us all the information we need to determine the protein's structure.


Outline of Our Approach

Previous computational approaches are discussed later (Section 2.2), but nearly all of them are combinatorial in nature: the search space of potential solutions grows exponentially with the length of the amino acid sequence. In fact, some formulations of the protein folding problem have been proven to be NP-complete [8].

This thesis describes work I have undertaken on a larger multi-investigator project called HOPS. HOPS, an acronym for Hybrid Optimizer of Protein Structure, is an ab initio protein folding package. We approach structure prediction as a discrete optimization problem: we select a discrete search space, search it in parallel, and then make small continuous adjustments to our solutions (hence the "Hybrid" in the name). Following some background on the biology involved (Section 2.1) and previously attempted techniques (Section 2.2), a general discussion of the processes involved in HOPS (Chapter 3) is followed by a detailed discussion of my primary contributions to the project: selection of the discrete search space (clustering, Chapter 4) and continuous adjustment of the discrete solutions (tweaking, Chapter 5). A discussion of numerical results and future directions completes this thesis.
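The "discrete search, then continuous adjustment" idea can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and none of it is the HOPS code: the energy function, the hidden target angles, the candidate angle set, and the pattern-search refinement are all hypothetical, and the discrete stage is searched serially and exhaustively rather than in parallel.

```python
import itertools
import math

# Hypothetical target angles (degrees); a stand-in for a real energy landscape.
TARGET = (-63.0, 118.0, -57.0)


def toy_energy(angles):
    """Toy objective, NOT a physical energy: zero when angles match TARGET."""
    return sum(1.0 - math.cos(math.radians(a - t)) for a, t in zip(angles, TARGET))


def discrete_search(candidates, n):
    """Score all len(candidates)**n combinations and keep the best.

    The number of combinations grows exponentially with n, so this
    exhaustive scan is only feasible for very small n.
    """
    return min(itertools.product(candidates, repeat=n), key=toy_energy)


def tweak(angles, step=8.0, min_step=0.01):
    """Continuous adjustment: coordinate-wise pattern search.

    Tries +/- step on each angle, keeps any improvement, and halves
    the step whenever a full pass yields no improvement.
    """
    best = list(angles)
    best_energy = toy_energy(best)
    while step > min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = best.copy()
                trial[i] += delta
                energy = toy_energy(trial)
                if energy < best_energy:
                    best, best_energy, improved = trial, energy, True
        if not improved:
            step /= 2.0
    return best, best_energy


# Hypothetical discrete candidate set of torsion angles (degrees).
candidates = (-120.0, -60.0, 60.0, 120.0, 180.0)
start = discrete_search(candidates, len(TARGET))
refined, refined_energy = tweak(start)
```

The discrete stage can only return angles from the candidate set, so it lands near, but not on, the minimum; the continuous stage then closes the remaining gap. Splitting the work this way keeps the expensive combinatorial search coarse while leaving fine resolution to a cheap local refinement.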


sforman@sju.edu