Spatial profiling of proteins using hydrophobic moments | Patent Publication Number 20080086291
US 20080086291 A1
Generally, the present invention provides a number of procedures to spatially profile proteins by using hydrophobic moments. In all procedures, a hydrophobicity distribution of a protein is shifted and normalized. In one procedure, a shape or profile of a curve of a second-order moment of hydrophobicity is determined. A second procedure involves determining one or more ratios, such as the ratio of a distance at which the second order moment of hydrophobicity vanishes to the distance at which a zero-order moment of hydrophobicity vanishes. The distance at which a peak occurs in a profile of the zero- or second-order moment of hydrophobicity can also be used for comparison. For many of these procedures, a surface or profiling contour can be chosen and used to accumulate hydrophobicities and to determine the moments. These procedures can be combined to provide a good mathematical determination of whether a protein belongs to a particular class of proteins.
- 1. A method for spatially profiling a protein to determine if the protein is a globular protein, the method comprising the steps of: determining a shifted and normalized hydrophobicity distribution for a protein; determining a centroid of the protein; determining, by using the shifted and normalized hydrophobicity distribution, an adjusted second-order moment of hydrophobicity; determining a profile of the adjusted second-order moment of hydrophobicity, wherein determining the profile comprises determining a first distance from the centroid at which the adjusted second-order moment of hydrophobicity is zero; and comparing the profile to a globular protein profile to determine if the protein is a globular protein.
- 9. A system for spatially profiling a protein to determine if the protein is a globular protein, comprising: a memory that stores computer-readable code; and a processor operatively coupled to the memory, the processor configured to implement the computer-readable code, the computer-readable code configured to: determine a shifted and normalized hydrophobicity distribution for a protein; determine a centroid of the protein; determine, by using the shifted and normalized hydrophobicity distribution, an adjusted second-order moment of hydrophobicity; determine a profile of the adjusted second-order moment of hydrophobicity, wherein determining the profile comprises determining a first distance from the centroid at which the adjusted second-order moment of hydrophobicity is zero; and compare the profile to a globular protein profile to determine if the protein is a globular protein.
- 17. An article of manufacture for spatially profiling a protein to determine if the protein is a globular protein, comprising: a computer-readable medium having computer-readable code embodied thereon, the computer-readable code comprising: a step to determine a shifted and normalized hydrophobicity distribution for a protein; a step to determine a centroid of the protein; a step to determine, by using the shifted and normalized hydrophobicity distribution, an adjusted second-order moment of hydrophobicity; a step to determine a profile of the adjusted second-order moment of hydrophobicity, wherein determining the profile comprises determining a first distance from the centroid at which the adjusted second-order moment of hydrophobicity is zero; and a step to compare the profile to a globular protein profile to determine if the protein is a globular protein.
This application is a divisional application of U.S. patent application Ser. No. 09/818,461, filed Mar. 27, 2001, which claims the benefit of U.S. Provisional Application No. 60/245,396, filed Nov. 2, 2000, incorporated by reference herein.
The present invention relates to the mathematical analysis of proteins and, more particularly, relates to the spatial profiling of proteins using hydrophobic moments.
Proteins may be thought of as a string with beads on it. Each bead has a particular color. For many proteins, there are 20 colors, or 20 different beads. The string folds up in a certain way, which means that it ends up with a certain series of folds. When profiling a protein, researchers attempt to determine the order of the colors of the beads and where the beads are in three-dimensional space. These locations are important because all of the bodily functions depend on this three-dimensional structure. An important problem is determining how hundreds of thousands of proteins fold.
Many proteins are globular and form in an intracellular environment or plasma, which are both aqueous environments. For these proteins, it can be assumed that there are only two colors, blue and red. Blue beads (called “hydrophobic”) do not like water and red beads (called “hydrophilic”) are attracted to water. When these types of globular proteins fold up, all of the blue beads get in the center and the red beads are on the outside of the protein. Consequently, the residues that like water are on the outside and the residues that do not like water are on the inside. A protein formed in this manner will have a hydrophobic core and a hydrophilic exterior.
The structure of globular proteins can actually be quite complex, and contain substructures such as beta sheets, beta strands, alpha-helices, and other helices. Because the structure of the protein affects the way that the protein interacts with its environment (and vice versa), protein structures have been studied in detail. A computational technique for studying proteins includes mathematically modeling protein structure to determine primary, secondary, tertiary, and even quaternary protein structures.
Many of these techniques examine details associated with proteins, such as determining exactly where residues are or the exact order of residues. Few of these techniques are suitable for analyzing an entire protein. Even fewer of these techniques can accurately determine whether a man-made protein structure is or could be a real protein.
Thus, what is needed is a better way of quantifying and analyzing protein structure and a better way to determine if an example protein structure is or could be a real protein.
Generally, the present invention provides a number of procedures to spatially profile proteins by using hydrophobic moments. In all procedures, a hydrophobicity distribution of a protein is shifted and normalized. This allows better quantitative comparisons of proteins. In one procedure, a shape or profile of a curve of a second-order moment of hydrophobicity is determined. This shape can then be used to determine if an example protein belongs to a particular class of proteins, such as globular proteins. A second procedure involves determining one or more ratios, such as the ratio of a distance at which the second order moment of hydrophobicity vanishes to the distance at which a zero-order moment of hydrophobicity vanishes. The distance at which a peak occurs in a profile of the zero- or second-order moment of hydrophobicity can also be used for comparison. These techniques also help to determine if a protein belongs to a globular or other class of proteins. For many of these techniques, a surface or profiling contour can be chosen and used to accumulate hydrophobicities and to determine the moments. These procedures can be combined to provide a good mathematical determination of whether a protein belongs to a particular class of proteins. For globular proteins in particular, the present invention reveals that many globular proteins exhibit similar structural characteristics. This result may be used to easily determine if a decoy protein (a man-made exemplary protein) is a globular protein or a poor structural imitation.
A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.
The present invention provides a tool for probing protein structure. This tool may be used in such situations as protein folding, dynamic protein modeling or analysis of protein structure. The present invention may be used to analyze any protein but is particularly useful for analyzing proteins that form in an aqueous environment, such as globular proteins. It turns out, as will be discussed in more detail below, that globular proteins exhibit certain characteristics that can be determined by the present invention. These characteristics can be used to analyze a protein or decoy (a man-made protein) to see if it is a globular protein. Transmembrane proteins will have a different profile signature, but may also be analyzed by the present invention.
Because globular proteins form in an aqueous environment, they have a hydrophobic core and a hydrophilic exterior. A hydrophobicity scale can be used to determine the hydrophobicity distribution of a protein. A hydrophobicity value is a value that indicates the degree to which a residue is attracted to or repelled by water. The resultant hydrophobicity distribution can be shifted and normalized, which places the proteins on a common mathematical basis for comparison. Without shifting the hydrophobicity distribution, the ability to compare different proteins is significantly degraded. If the hydrophobicity distribution is shifted but not normalized, the ratios disclosed herein can still be compared. However, values of the moments cannot be compared.
After shifting and/or normalizing the hydrophobicity distribution, the adjusted zero- and second-order moments of the hydrophobicity distribution can be determined. The zero- and second-order moments are “adjusted” because they use a hydrophobicity distribution that is shifted or shifted and scaled. The shape or profile of the adjusted second-order moment can be used to determine if a protein is globular. All globular proteins studied to date exhibit a characteristic profile such that the adjusted second-order moment rises from zero to a high positive value, then passes through zero and becomes strongly negative. There is generally only one zero crossing after the high positive value, and the profile becomes strongly negative after the zero crossing. Any protein that does not exhibit this profile most likely is not a globular protein.
Another technique that can be used to distinguish globular proteins from other proteins or decoys is the determination of a ratio of the distance at which the adjusted second-order moment of hydrophobicity vanishes and the distance at which the adjusted zero-order moment of the hydrophobicity vanishes (or vice versa). Another ratio that can be determined is a ratio of a distance at which a peak occurs in a profile of the zero-order moment of hydrophobicity and a distance at which the zero-order moment of hydrophobicity vanishes. Yet another ratio is a ratio between a distance at which a peak occurs in a profile of the second-order moment of hydrophobicity and the distance at which the second-order moment of hydrophobicity vanishes. For all globular proteins, both peaks of the zero- and second-order moments occur at the same distance from the centroid of the protein. Globular proteins tend to exhibit a certain range of these distance ratios. If a protein or decoy has a hydrophobicity ratio that is not within the range, then the protein or decoy is likely not a globular protein.
The “distance” discussed in the last paragraph is determined with reference to the centroid of the protein, which is the center of mass of the protein when each residue is assigned unit mass. Additionally, a surface is necessary to determine the cumulative moments. A good choice of a surface for globular proteins is an ellipsoidal surface. The ellipsoidal surface is used to determine the cumulative moment at a particular distance from the centroid. The surface defines a volume that contains the hydrophobicity distribution of amino acid residues.
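The profile test described above can be sketched in Python. This is a minimal illustration, not the disclosed implementation: the function name `has_globular_like_profile` and the sample values are hypothetical, and the strict rule of exactly one sign change is an assumption. A practical check might tolerate marginal re-crossings near zero.

```python
def has_globular_like_profile(h2_values):
    """Return True if the sampled second-order moment profile is positive
    first, negative last, and changes sign exactly once in between.

    h2_values: adjusted second-order moments sampled at increasing
    distances from the centroid (hypothetical input format).
    """
    # Ignore exact zeros, such as the starting value at the centroid.
    signs = [1 if v > 0 else -1 for v in h2_values if v != 0]
    if not signs or signs[0] != 1 or signs[-1] != -1:
        return False
    sign_changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return sign_changes == 1


# Made-up sample profiles for illustration only.
rising_then_negative = [0.0, 0.5, 1.2, 0.4, -0.8, -2.0]
never_negative = [0.1, 0.5, 0.9]
print(has_globular_like_profile(rising_then_negative))  # True
print(has_globular_like_profile(never_negative))        # False
```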
Although the primary emphasis herein is placed on globular proteins, the present invention may be used to analyze other proteins, such as extracellular or transmembrane proteins, as well. For these proteins, suitable surfaces, such as spheres or cylinders, may be utilized.
Referring now to
The centroid of the protein (step 115) is determined as the centroid of residue centroids.
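A minimal Python sketch of this centroid calculation follows. It assumes residue centroids are already available as (x, y, z) tuples; the function name `protein_centroid` and the coordinates are illustrative and do not come from the disclosure.

```python
def protein_centroid(residue_centroids):
    """Mean position of the residue centroids, each residue given unit mass."""
    n = len(residue_centroids)
    if n == 0:
        raise ValueError("at least one residue is required")
    return tuple(sum(r[k] for r in residue_centroids) / n for k in range(3))


# Four made-up residue centroids.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0), (2.0, 4.0, 6.0)]
print(protein_centroid(coords))  # (1.0, 2.0, 1.5)
```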
In step 120, the hydrophobicity distribution is determined. Each residue is assigned a hydrophobicity consensus value $h_i$. In this disclosure, a residue and an amino acid will be treated as being fungible. A representative table of hydrophobicity values is shown in
It should be noted that this is also the net hydrophobicity of the protein (step 120 of
The first-order moment of the hydrophobicity distribution is:
$\vec{H}_1 = \sum_{i=1}^{n} h_i \vec{r}_i$
where $\vec{r}_i$ is a vector to the centroid of the ith amino acid residue with hydrophobicity consensus value $h_i$. The sum is over all n amino acid residues. Since the zero-order moment, $H_0$, or net hydrophobicity of the protein, is generally non-vanishing, the first-order moment will depend upon the origin of the calculation. In connection with the calculated moments of α-helices, Eisenberg (see Eisenberg et al., Faraday Symp. Chem. Soc., 17, pp. 109-120, 1982; and Eisenberg et al., Nature, 299, pp. 371-374, 1982, the disclosures of which are incorporated herein by reference) had pointed out that the first-order moment would be invariant if hydrophobicity differences about the mean, $\bar{h}$, were used:
$\vec{H}_1 = \sum_{i=1}^{n} (h_i - \bar{h}) \vec{r}_i$
with
$\bar{h} = \frac{1}{n} \sum_{i=1}^{n} h_i$
The first-order moment calculated about the centroid of the protein is, therefore, a measure of first-order hydrophobic imbalance about the mean. With the inclusion of values of the solvent accessible surface area, $s_i$, for each of the residues, the surface exposed first-order hydrophobic moment imbalance about the entire protein can then be written:
This could provide useful information with respect to the three-dimensional spatial affinity of the tertiary protein structure and external structures with which it might interact. Thus, these equations provide insight into protein structures. However, this would not profile the hydrophobicity distribution within the protein interior.
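The origin dependence of the first-order moment, and its removal when hydrophobicity differences about the mean are used, can be checked numerically. The Python sketch below is illustrative only: the function names, coordinates, and hydrophobicity values are made up, and the moment is taken as the plain sum of each hydrophobicity value times the residue position relative to a chosen origin.

```python
def first_order_moment(h, coords, origin=(0.0, 0.0, 0.0)):
    """Sum of h_i * (r_i - origin) over all residues, as an (x, y, z) tuple."""
    return tuple(
        sum(hi * (r[k] - origin[k]) for hi, r in zip(h, coords))
        for k in range(3)
    )


def shift_about_mean(h):
    """Subtract the mean so that the net (zero-order) hydrophobicity is zero."""
    mean = sum(h) / len(h)
    return [hi - mean for hi in h]


# Made-up residue centroids and hydrophobicity values.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0), (2.0, 4.0, 6.0)]
h = [1.0, -0.5, 0.25, -0.25]
other_origin = (5.0, -3.0, 2.0)

# With raw values the result changes with the origin ...
print(first_order_moment(h, coords), first_order_moment(h, coords, other_origin))

# ... while with mean-shifted values it does not.
shifted = shift_about_mean(h)
print(first_order_moment(shifted, coords),
      first_order_moment(shifted, coords, other_origin))
```

The invariance follows because moving the origin by a vector c changes the sum by c times the net hydrophobicity, which is zero after the shift.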
Second-order moments provide the capability of spatially profiling the hydrophobicity distribution of amino acid residues. Profiling the distribution of hydrophobicity requires the choice of a profiling shape. Proteins come with all sorts of overall shapes. To profile, one must choose a particular reference point (the centroid), an appropriate coordinate system (the principal axes of geometry) and a shape representative of the protein (such as an ellipsoidal shape for a globular protein). A representation that is the simplest generalization of the shape of a globular protein is an ellipsoidal representation. This representation can be generated from the molecular moments-of-geometry, i.e., moments-of-inertia for which all amino acid residue centroids are weighted by unity instead of by residue mass. The moments of geometry are obtained as eigenvalues of the following moment-of-geometry matrix written in dyadic notation:
$\tilde{G} = \sum_{i=1}^{n} \left( |\vec{r}_i|^2 \, \tilde{1} - \vec{r}_i \vec{r}_i \right)$
where $\tilde{1}$ is the unit dyadic. The calculation is performed with the centroid (determined by using the amino acid centroids) of the protein as origin. The moments-of-geometry are designated $g_1$, $g_2$, and $g_3$, with $g_1 < g_2 < g_3$. The ellipsoidal representation generated by these moments is written as:
$x^2 + g_{21} y^2 + g_{31} z^2 = d^2$  (Eq 8)
with $g_{21} = g_2/g_1$ and $g_{31} = g_3/g_1$. The coordinates, x, y, z, are written in the frame of the principal geometric axes. Equation 8 determines a surface (step 135) that can be used to profile the moments of the hydrophobicity distribution.
The ellipsoidal surface obtained by the choice of a particular value of d enables the collection of the values of hydrophobicity for all amino acid residues, $n_d$ in number, lying within this surface. The consensus hydrophobicity scale of
The hydrophobicity distribution arises from the spatial distribution of residues and their assigned values of hydrophobicity. The distribution of amino acid hydrophobicity is, however, shifted (step 140) such that the net hydrophobicity of each protein vanishes. This is done by subtracting the average hydrophobicity from each value in the hydrophobicity distribution. Thus, when the surface described by d encompasses all of the residues, the shifted hydrophobicity distribution will yield a net hydrophobicity value of zero.
It should be noted that it is not necessary to zero the net hydrophobicity when the last residue is collected. Optionally, one could profile the protein by zeroing out the zero-order moment (which is an indication of the net hydrophobicity up until a certain distance) at a location in the protein interior.
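The optional variant described above, in which the net hydrophobicity is zeroed at an interior distance rather than when the last residue is collected, might be sketched in Python as below. The function name `shift_to_zero_within`, the per-residue distances, and the cutoff `d_bulk` are hypothetical names and values chosen for illustration.

```python
def shift_to_zero_within(h, residue_distances, d_bulk):
    """Shift all hydrophobicity values so that the residues lying within
    distance d_bulk of the centroid have zero net hydrophobicity."""
    inside = [hi for hi, di in zip(h, residue_distances) if di <= d_bulk]
    if not inside:
        raise ValueError("no residues lie within d_bulk")
    mean_inside = sum(inside) / len(inside)
    return [hi - mean_inside for hi in h]


# Made-up values: the last residue sits far from the bulk of the structure.
h = [2.0, 1.0, 0.0, -5.0]
distances = [1.0, 2.0, 3.0, 10.0]
shifted = shift_to_zero_within(h, distances, d_bulk=3.0)
print(shifted)  # [1.0, 0.0, -1.0, -6.0]
```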
Such shifting of the values of amino acid hydrophobicity eliminates the zero-order moment from the distribution and, consequently, the dependence of the second-order moment upon differences in net protein hydrophobicity. This provides a basis for comparison of the hydrophobic moment profiles of the different proteins and, consequently, a basis for comparison of their hydrophobic ratios.
The distribution is then optionally, but preferably, normalized (step 145) to yield a standard deviation of one. This step enables comparison of the moment magnitudes of different proteins.
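Steps 140 and 145 can be illustrated with a short Python sketch. The function name `shift_and_normalize` is hypothetical, the input values are made up, and the use of the population standard deviation (dividing by n rather than n - 1) is an assumption, since the text does not specify which form is intended.

```python
import math


def shift_and_normalize(h, normalize=True):
    """Subtract the mean hydrophobicity (step 140) and, optionally, scale
    to a standard deviation of one (step 145)."""
    n = len(h)
    mean = sum(h) / n
    shifted = [x - mean for x in h]
    if not normalize:
        return shifted
    # Assumption: population standard deviation (divide by n).
    sd = math.sqrt(sum(x * x for x in shifted) / n)
    if sd == 0.0:
        raise ValueError("all hydrophobicity values are identical")
    return [x / sd for x in shifted]


values = [1.0, 2.0, 3.0, 4.0]  # made-up hydrophobicity values
print(shift_and_normalize(values, normalize=False))  # [-1.5, -0.5, 0.5, 1.5]
print(shift_and_normalize(values))
```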
The average hydrophobicity per residue collected within the ellipsoidal surface specified by d is then written (step 150):
$H_0^d = \frac{1}{n_d} \sum_{i=1}^{n_d} h'_i$  (Eq 9)
Equation 9 is one way to create an adjusted zero-order hydrophobic moment. The superscript, d, indicates that the moment has been divided by the number of residues, $n_d$. Dividing by the number of residues is not necessary, but can be used to aid comparisons. The prime designates the value of hydrophobicity of each residue after shifting and normalizing the distribution. The term (hi-
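The value of the second-order ellipsoidal moment per residue (step 160), from residues lying within the ellipsoidal surface specified by d, is written:
$H_2^d = \frac{1}{n_d} \sum_{i=1}^{n_d} h'_i \left( x_i^2 + g_{21} y_i^2 + g_{31} z_i^2 \right)$  (Eq 10)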
Equation 10 is one way to create an adjusted second-order hydrophobic moment. When all residues fall within the ellipsoidal surface and are collected, the following results:
where
The values of $H_0^d$ and $H_2^d$ are calculated for each protein for increasing values of d, which defines the surface.
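A minimal Python sketch of this accumulation is given below. It assumes the residue coordinates are already expressed in the principal-axis frame with the centroid as origin, that the hydrophobicity values have already been shifted and normalized, and that each residue contributes its value weighted by the left-hand side of Equation 8 evaluated at its position. The function name and the toy data are illustrative only.

```python
def ellipsoidal_profiles(coords, h_prime, g21, g31, d_values):
    """Return a list of (d, H0d, H2d) tuples, one per value in d_values.

    coords: residue centroids in the principal-axis frame, centroid at origin.
    h_prime: shifted (and optionally normalized) hydrophobicity values.
    """
    # Squared ellipsoidal distance of each residue (left side of Eq 8).
    d_sq = [x * x + g21 * y * y + g31 * z * z for x, y, z in coords]
    profile = []
    for d in d_values:
        inside = [i for i, v in enumerate(d_sq) if v <= d * d]
        n_d = len(inside)
        if n_d == 0:
            profile.append((d, 0.0, 0.0))
            continue
        h0 = sum(h_prime[i] for i in inside) / n_d
        h2 = sum(h_prime[i] * d_sq[i] for i in inside) / n_d
        profile.append((d, h0, h2))
    return profile


# Toy structure: two hydrophobic residues near the center, two hydrophilic
# residues farther out, all on the x axis (made-up values).
coords = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
h_prime = [1.0, 1.0, -1.0, -1.0]
for row in ellipsoidal_profiles(coords, h_prime, 1.0, 1.0, [1.0, 2.0, 3.0, 4.0]):
    print(row)
```

In this toy case the second-order moment is positive while only the inner residues are collected and becomes negative as the outer residues are included, and the zero-order moment reaches zero once all residues are collected.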
Once the zero- and second-order hydrophobic moments have been determined, the distances at which peaks occur for the profiles of these moments may be determined (step 165). The distances of the peaks are preferably determined as being distances from the centroid of the protein. Some exemplary peaks and distances are described below.
In step 170, the distance is determined at which the second-order hydrophobic moment becomes zero. The distance $d_2$ is the value of d for which $H_2^d$ has changed sign, becoming negative, and $d_0$ is the value for which $H_0^d$ vanishes. The protocol that, for $d_2$ to be chosen, all values of $H_2^d$ at larger values of d must be negative provides a quick estimate of where the second-order hydrophobic moment vanishes. A more accurate estimate would choose the value of d for which the second-order moment was the smallest.
In step 175, various hydrophobic ratios are determined. One possible ratio is the ratio between $d_2$ and $d_0$ (i.e., R equal to $d_2/d_0$). Another ratio is the ratio between a distance at which a peak of the zero-order moment of hydrophobicity occurs ($d_{0p}$) and a distance at which the zero-order moment of hydrophobicity vanishes (i.e., R equal to $d_{0p}/d_0$). A third ratio is the ratio of a distance at which a peak of the second-order moment of hydrophobicity occurs ($d_{2p}$) and the distance at which the zero-order moment of hydrophobicity vanishes (i.e., R equal to $d_{2p}/d_0$). The latter two ratios, as seen and discussed below, are equal.
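The quick protocol described above, which picks the smallest distance beyond which the second-order moment stays negative, can be sketched in Python. The helper name `start_of_tail`, the tolerance used to decide when the zero-order moment has vanished, and the sample profile are all assumptions made for illustration.

```python
def start_of_tail(d_values, values, condition):
    """Smallest d such that condition(value) holds there and at every
    larger d; returns None if the condition fails at the largest d."""
    result = None
    for d, v in zip(reversed(d_values), reversed(values)):
        if not condition(v):
            break
        result = d
    return result


# Made-up profile sampled at d = 18 ... 30. The second-order moment turns
# negative, recovers marginally at d = 23, then stays negative.
d_values = list(range(18, 31))
h2 = [0.8, 0.3, -0.2, -0.4, -0.1, 0.05, -0.3, -0.9, -1.5, -2.2, -3.0, -3.5, -3.5]
h0 = [0.30, 0.25, 0.20, 0.16, 0.12, 0.10, 0.08, 0.06, 0.04, 0.02, 0.01, 0.0, 0.0]

# Distance at which the second-order moment vanishes (quick protocol).
d_second = start_of_tail(d_values, h2, lambda v: v < 0)
# Distance at which the zero-order moment vanishes, within a tolerance.
d_zero = start_of_tail(d_values, h0, lambda v: abs(v) <= 1e-9)
print(d_second, d_zero)  # 24 29
```

Scanning from the largest distance inward means a marginal positive excursion, such as the one at d = 23 here, pushes the chosen distance outward to the last sign change.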
For globular proteins, these ratios should be comparable and act as discriminative devices, which can include or exclude proteins from a set of representative globular proteins.
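As a small illustration of step 175, the Python sketch below locates the peak of a sampled profile and forms the three ratios described above. The function names, the dictionary keys, and all numeric values are hypothetical.

```python
def peak_distance(d_values, values):
    """Distance at which the sampled profile reaches its maximum
    (the first such distance if there are ties)."""
    i = max(range(len(values)), key=lambda k: values[k])
    return d_values[i]


def hydrophobic_ratios(d_second, d_zero, d_zero_peak, d_second_peak):
    """Ratios of characteristic distances, each relative to the distance at
    which the zero-order moment vanishes."""
    return {
        "second_zero_over_zero_order_zero": d_second / d_zero,
        "zero_order_peak_over_zero_order_zero": d_zero_peak / d_zero,
        "second_order_peak_over_zero_order_zero": d_second_peak / d_zero,
    }


# Made-up sampled profiles.
d_values = [1.0, 2.0, 3.0, 4.0, 5.0]
h0 = [0.6, 0.9, 0.5, 0.2, 0.0]
h2 = [0.5, 1.5, 0.7, -1.0, -3.0]

ratios = hydrophobic_ratios(
    d_second=4.0,
    d_zero=5.0,
    d_zero_peak=peak_distance(d_values, h0),
    d_second_peak=peak_distance(d_values, h2),
)
print(ratios)
```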
In step 180, results from examining the current protein can be compared with results determined previously. This step allows a set of proteins to be determined and a general profile that matches each of the profiles for the zero- and/or second-order hydrophobic moments to be determined. Ranges of ratios for the set of proteins can also be determined. If the protein being examined has profiles that are of a shape similar to the general profile, then the current protein is assumed to belong to the class of proteins defined by the set of proteins. Similarly, if the ratios for the current protein are within a predetermined amount from the range of ratios obtained for the set of proteins, then the current protein is assumed to belong to the class of proteins defined by the set of proteins.
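The range comparison in step 180 might look like the following Python sketch. The function name `within_reference_range`, the `margin` parameter standing in for the "predetermined amount", and the reference ratios are all hypothetical; no values here are taken from the experimental results.

```python
def within_reference_range(ratio, reference_ratios, margin=0.0):
    """True if ratio lies within the range spanned by reference_ratios,
    widened on both sides by margin."""
    low, high = min(reference_ratios), max(reference_ratios)
    return low - margin <= ratio <= high + margin


# Made-up ratios standing in for a previously examined set of proteins.
reference = [0.70, 0.75, 0.80]
print(within_reference_range(0.81, reference, margin=0.02))  # True
print(within_reference_range(0.95, reference, margin=0.02))  # False
```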
In this manner, either single proteins or a set of proteins may be examined and profiled or compared with the profiles or ratios determined from a training set of proteins.
Referring now to
Turning now to
It should also be noted that computer system 310 could be an application-specific integrated circuit that performs some or all of the steps and functions discussed herein.
As is known in the art, the methods and apparatus discussed herein may be distributed as an article of manufacture (such as compact disk 305) that itself comprises a computer readable medium having computer readable program code embodied thereon. The computer readable program code is operable, in conjunction with a computer system, to carry out all or some of the steps to perform the methods or create the apparatuses discussed herein. The computer readable medium may be a recordable medium (e.g., floppy disks, hard drives, compact disks, or memory cards) or may be a transmission medium (e.g., a network comprising fiber-optics, the world-wide web, cables, or a wireless channel using time-division multiple access, code-division multiple access, or other radio-frequency channel). Any medium known or developed that can store information suitable for use with a computer system may be used. The computer-readable program code is any mechanism for allowing a computer to read instructions and data, such as magnetic variations on a magnetic medium or height variations on the surface of compact disk 305.
What has been shown so far is a tool for probing proteins and revealing structures of proteins that have not been determined before. This tool also provides better comparisons between proteins than what has come before. Because the benefits of the present invention are hard to envision when equations are solely used, the following Examples section provides a more visual and succinct description of results obtained by using the present invention.
Now that the methods of the present invention have been presented, experimental results will be presented. For the experimental results, protein structures were selected by keyword searches of the Protein Data Bank (PDB) and by examination of entries in different SCOP classes. For more discussion on the latter, see Murzin et al., Journal of Molecular Biology 247, 536-540, 1995, the disclosure of which is incorporated herein by reference. The objective was to choose a selection representative of different sizes and different classes. Thirty protein structures were chosen in this manner. For an internal check, two of the proteins chosen were 1CTQ and 121P, the same protein with independently determined structures. Three additional proteins were also chosen from the recently determined structure of the 30S ribosomal subunit. For more information about the structure of the 30S ribosomal subunit, see Wimberly et al., Nature 407, 327-339, 2000, the disclosure of which is incorporated herein by reference. The PDB identifications (IDs) and number of amino acid residues for each are listed in
Detailed results of profiling one of the structures, 1AKZ, are shown in
All thirty protein structures that were tested exhibit similar spatial behavior for either the accumulated second-order hydrophobic moment, $H_2(d)$, or $H_2^d(d)$, the moment per residue. The accumulated profiles are, however, somewhat smoother and accentuate the plunge to negative values as the surface of the protein is approached.
A few of the proteins require special attention. Three of the structures, 1PDO, 1LDM and 1FSZ, have extended arms that are away from the main body of the protein. Collecting all residues to determine the value of d0 yields a value that is not representative of the protein bulk. Shifting the scale of residue hydrophobicity such that the net hydrophobicity of the protein is zero when all residues of the bulk are collected, yields the values given in
Structure 1LBU exhibits slightly deviant behavior of $H_2$. There is a rapid crossover to a negative value of the second-order moment at a value of d equal to 20. This value remains negative until, at d equal to 23, it becomes marginally positive before becoming negative again at d equal to 24 and thereafter. The two zero crossovers at d equal to 20 and d equal to 24 yield a hydrophobic-ratio average of 0.76.
Two of the ribosomal proteins, B-1FJF (chain B; protein S2) and D-1FJF (chain D; protein S4), are the largest deviants with respect to the values of Rt for the non-ribosomal proteins. On the other hand, C-1FJF (chain C; protein S3) yields a value of Rt that is within the range of the other thirty values. C-1FJF makes no contact with RNA at all and exhibits an α/β-domain frequently found in different proteins with α-helices packed against a β-sheet.
Finally, ellipsoidal moment profiling has been performed on a simple decoy set. Fourteen decoys and native structures of this set, with a number of residues greater than one hundred, were obtained from Stanford University. Twenty-eight moment calculations were, therefore, performed. A typical result is shown in
The comparison between the second-order moment profiles of the native structures and those of the decoy structures is revealing. The second-order moment amplifies differences about the mean protein hydrophobicity. Profiles of the native structures reflect the significant separation between the hydrophobic residues comprising the core and the hydrophilic residues of the protein exterior. The decoy residue distribution fails to mirror this separation. This suggests that moment profiling should play an important role in recognizing the difference between native folds and decoy folds. It should also play a role in validating predicted protein structures.
With respect to molecular dynamics and protein folding pathways, profiling could be done at various points in the folding trajectory. One would then look for trajectories that begin to exhibit a relatively smooth monotonic increase of the second-order moment in the structural interior with the onset of a transition to negative values near the exterior. It would then be of interest to see how close a structure identified in this way is to the final native structure achieved. After identification or selection of such a trajectory, fine-tuning could then be observed or directed by examination of the hydrophobic ratio. Considering the native structure as the endpoint in the folding trajectory, perhaps the moment regularities will provide not only constraints with respect to the pathways selected but also a clue to the underlying processes responsible for such selection.
The procedures described in this disclosure need not be restricted to examination of globular proteins, but can be used in connection with the profiling of proteins of diverse overall structure with the choice of an appropriate overall profiling geometry.
Thus, what has been shown are techniques for determining profiles and ratios for protein probing and analysis. In the case of globular proteins, heretofore unseen characteristics and similarities between relatively diverse proteins have been shown. Moreover, the present invention allows decoy and unrelated proteins to easily be excluded from a group of already examined and similar proteins.
It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention. For instance, surfaces other than an ellipsoid, such as a conical surface or cylindrical surface, could be used. Additionally, shifting could be used without normalization.