Measure Proteins in Place
Better tools to study proteins in their natural context could bolster predictive algorithms.
What drew me (and, I suspect, many others) to biology is its chaos and its beauty. You’ve heard this before: cells are crowded, imperfect, and warm. Thermal energy fluctuations are on the same scale as many biological interaction energies. It is nothing short of a miracle, then, that biological systems work as well as they do. Living systems are far more robust than anything humans have engineered because they know, by heart, the secrets of the molecular world.
Among this mess are proteins, which make up more than half of the non-water (dry) weight of cells. For proteins, context is crucial. Their very job is to interact with other biomolecules in a crowded cellular environment, and these interactions regulate just about everything.
Indeed, proteins are defined just as much by their context and function as by their structure or sequence. Purifying proteins and studying them in isolation, though often necessary, strips away that context, yet it has been the predominant way of studying proteins for decades. We need better tools to study proteins and their interactions in a native context. Doing so might help us design and build functional proteins from first principles, or at least improve algorithmic predictions of protein structure. But this is a long-standing and difficult problem.
In living cells, many proteins have functions that would be difficult to tease apart in isolation. Some proteins, like ubiquitin, are small enough to wiggle into tight spaces and then, like a warning sign, mark other proteins for degradation. Others are complex molecular motors, such as ATP synthase, which rotates as it turns a chemical gradient into ATP. Still other proteins moonlight, having picked up multiple jobs through the chance of evolution. One small signaling protein, XCL1, adopts two folds: one that forms dimers and another that does not.
For purified proteins in solution, missing contextual factors can drastically alter experimental results. Protein structures obtained from X-ray crystallography are affected by crystallization conditions, such as humidity and solutes. Structures are important on their own, but the secrets of protein folding cannot be understood without accounting for the conditions, environment, and molecular chaperones within a cell. Models that predict protein structures regularly fail because of limitations in the data available to computer scientists. Even after AlphaFold released over 200 million predicted protein structures, the algorithm fails to predict the effects of structure-disrupting mutations, because it isn’t trained on that kind of data.
Building datasets of proteins and interactions in their native context could inform our understanding of proteins from first principles, and such data could be incorporated into models to yield better predictions.
Some proteins are notoriously difficult to study in vitro. Membrane proteins are picky about their chemical environment, and so-called ‘intrinsically disordered’ proteins are doomed to carry this misnomer only because we lack a sufficient understanding of their binding interactions, however weak and transient. Hence, we often study membrane proteins in reconstituted membranes and use cross-linkers to capture transient interactions between proteins and linear motifs. These are brilliant compromises (model systems can often recapitulate the relevant details without reproducing the whole cell), but having all the context would be even better.
To visualize proteins in cells, we often incorporate fluorescent labels carefully so as not to perturb a protein’s structure and function. Label-free methods based on photoacoustics have recently been developed for live-cell imaging, but their resolution is limited. More broadly, many existing methods cannot fully capture molecular interactions in solution, or can do so only at very low throughput (for instance, by studying a few proteins at a time).
There is much to be learned at both the cellular and protein scale by exploring protein-peptide interactions without knowing a priori what will interact. Hence, recent methods have been developed for probing many-to-many interactions. Two examples are PCP-SILAM, which uses mass spectrometry, and SMART display, which uses DNA sequencing. However, these datasets take a long time to generate, and testing whether a perturbation (such as expressing a recombinant protein or introducing a peptide or drug) affects an interaction network is still an arduous process; most labs don’t have the expertise or resources required.
So how can one study proteins in their natural context, at high throughput?
In some ways, we are already on our way. Cryo-electron tomography, a method in which cells are flash-frozen, sliced to a thickness of less than 150 nanometers (several hundred times thinner than a human hair), and then imaged with beams of electrons, could soon be used to image the entire proteome of individual cells. Deep learning algorithms can help to resolve ‘noisy’ images in these datasets, and electron microscopy hardware for imaging cells is constantly improving.
Still, cryo-electron tomography can miss transient protein interactions because cells are frozen in time. Several labs are working on time-resolved methods (freezing cells at various time points) to resolve these issues, and these lines of inquiry seem promising.
New Science is providing funds to develop a platform technology that measures transient protein-peptide interactions, using a barcoded peptide library with DNA sequencing as the readout. We intend to present preliminary work toward this aim soon and to solicit feedback from the scientific community. We’re excited about this platform because it will provide datasets that could help refine predictive algorithms for designing proteins.
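To make the sequencing readout more concrete, here is a minimal Python sketch of one common way such data could be scored. It assumes, hypothetically, that each peptide in the library carries a unique DNA barcode and that reads are collected both from the input library and from the fraction that bound a protein of interest; barcodes over-represented in the bound fraction point to candidate binders. The function name `barcode_enrichment` and the pseudocount scheme are illustrative choices, not a description of the platform’s actual analysis.

```python
import math
from collections import Counter


def barcode_enrichment(input_reads, bound_reads, pseudocount=1.0):
    """Log2 enrichment of each peptide barcode in the bound fraction vs. the input.

    input_reads, bound_reads: iterables of barcode strings, one per sequencing read.
    (Hypothetical format: real reads would first need demultiplexing and error correction.)
    """
    input_counts = Counter(input_reads)
    bound_counts = Counter(bound_reads)
    barcodes = set(input_counts) | set(bound_counts)
    n = len(barcodes)

    # Convert counts to frequencies, adding a pseudocount so that a barcode
    # missing from one fraction does not produce log(0) or a division by zero.
    input_total = sum(input_counts.values()) + pseudocount * n
    bound_total = sum(bound_counts.values()) + pseudocount * n

    scores = {}
    for bc in barcodes:
        f_input = (input_counts[bc] + pseudocount) / input_total
        f_bound = (bound_counts[bc] + pseudocount) / bound_total
        # Positive score: enriched in the bound fraction (candidate binder).
        scores[bc] = math.log2(f_bound / f_input)
    return scores


# Toy example: two barcoded peptides, equally abundant in the input library,
# but "AAA" dominates the reads recovered from the bound fraction.
scores = barcode_enrichment(["AAA"] * 50 + ["CCC"] * 50, ["AAA"] * 90 + ["CCC"] * 10)
print(sorted(scores, key=scores.get, reverse=True))  # ['AAA', 'CCC']
```

A ratio of frequencies, rather than raw counts, keeps scores comparable between sequencing runs of different depth; a real pipeline would also need replicates and statistics to separate weak binders from noise.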
Predictive algorithms hinge on the ability to represent information about proteins in a way that machines can understand and that is consistent across a large sample of proteins. Algorithms that predict a protein’s structure, like AlphaFold, can be trained on amino acid sequences, on a protein’s binding partners, or on a complex set of features that may not be intuitive or interpretable to humans at all. AlphaFold is trained on amino acid sequences (together with alignments of related sequences) and a few hundred thousand solved protein structures; it cannot predict unknown protein structures from first principles.
A dataset of protein binding affinities to a reference panel of peptides could better inform predictions of protein surface features, such as binding hotspots and how peptides bind to known domains, or even of protein structure. After all, it is suspected that half of transient interactions between peptides and proteins are uncharacterized, and there may be still more biologically relevant interactions that are simply unknown.
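One simple way such a dataset could be made machine-readable is to describe each protein by a vector of binding scores, one entry per peptide, always in the same panel order, so that every protein is represented on a consistent footing. The Python sketch below assumes that representation (the protein names and scores are made up for illustration) and compares proteins by the Pearson correlation of their profiles. It is a toy illustration of the idea, not a proposal for how a real predictive model would be built.

```python
import math


def pearson(x, y):
    """Pearson correlation between two binding profiles of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a profile with zero variance has no defined correlation")
    return cov / (sx * sy)


def most_similar(query, profiles):
    """Name of the protein whose binding profile correlates best with `query`."""
    return max(profiles, key=lambda name: pearson(query, profiles[name]))


# Hypothetical binding scores against a fixed four-peptide reference panel.
# Every profile lists the same peptides in the same order.
profiles = {
    "protein_A": [2.0, 0.1, 1.5, 0.0],
    "protein_B": [0.0, 1.8, 0.2, 2.2],
}
query = [1.9, 0.3, 1.2, 0.1]  # an uncharacterized protein measured on the same panel
print(most_similar(query, profiles))  # protein_A
```

Fixing the panel order is the important design choice here: it turns each protein into a vector of the same shape, which is the kind of consistent representation that machine learning models need.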
A protein is its sequence of amino acids, yes; but each one also carries a storied mutational history and has an intricate molecular structure. A protein can be represented by its binding surfaces, its place and role in a complex signaling pathway, or by what drugs bind to it. Existing methods reveal facets of protein function, interactions, and structure, and together they are quite useful for studying protein-peptide interactions. But these are partial descriptions of a grander picture: proteins are part of an unfathomably complex biological environment. They are jostled by neighbors and molded by their chemical conditions and partners. Let’s study them in context, and in newfound ways, to reveal their intricate and dynamic whole.
Edited by Niko McCarty
Thanks to Sasha Targ for feedback on this essay.
Cite this essay:
“Measure Proteins in Place.” newscience.org. 2022 September. https://doi.org/10.56416/182qnf