An Insider’s Guide to Asking the Right Questions During Your PhD

By Rishi Kulkarni.

Rishi received his PhD in Chemistry from UC Berkeley in 2018 and is currently a postdoctoral scholar at Stanford University.

Published 2021-10-28.

Introduction

A common refrain from freshly minted PhDs is something like “my first two years were a waste of time” or “looking back, I could have done my first project in six weeks.” These jokes discount the learning process - you have to acquire a number of skills to really become a productive researcher. However, there is a kernel of truth here: although PhDs are expected to start doing research almost immediately, typical PhD coursework continues in the vein of an undergraduate education - taking a few more theory classes while working in the lab.

Practical coursework is rare - few programs even mandate that their graduate students take a statistics course, despite the fact that every one of them will be building statistical models when analyzing their data. This leads to fresh PhD students stumbling around needlessly when performing their first real experiments and data analyses. Given that most people choose to do a PhD in the life sciences because they want to become professional scientists, why is there so little formal training in project planning and experimental science in graduate school? Why is there no curriculum on how to become a scientist?

The unfortunate reality is that successful PhD students tend to be those who quickly intuit the key aspects of project planning, management, and experimental design. I wrote this article in an effort to make the transition from undergraduate researcher (or just student) to professional scientist more comfortable for new graduate students - especially those who join very large labs in which direct mentorship from the advisor is rare. As you plan your first research project in your PhD lab, asking yourself some of the following questions may help you avoid common pitfalls of graduate school.

Is your proposed research question answerable?

Meaningful research projects begin with a well-crafted scientific question. Many graduate students start their PhDs with a general topic of study in mind, e.g. an interest in treating glioma. However, narrowing a general topic into a proper research question is a necessary next step that few people receive training on before they begin working in the lab.

Ideally, a trainee would be taught how to craft a project by their advisor, but the degree to which an advisor provides hands-on mentorship varies greatly from lab to lab. Even if your advisor is an active participant in deciding the directions of your projects, you should think about the following aspects of your project for your own development as a scientist. Careful construction of a research project will involve consideration of 1) project goals, 2) the knowledge gaps that the project will fill, and 3) the impact that the project will have on both the scientific community and society at large. These aspects of a project might initially seem unrelated to experimental design and data analysis, but will be very handy when it comes time to write grants, fellowship applications, and manuscripts.

First off, is your proposed research question even answerable? Can you imagine a series of experiments that will result in a satisfactory answer to your question? Is the technology needed to satisfactorily answer your question within your reach? As your project progresses, your question will evolve and the experiments and technologies you need will change. Either way, it is better to start with some plan than no plan. Second, has anyone else already answered your question? The scientific literature is vast and, honestly, at the start of your training, your first idea for a new scientific project probably isn’t new territory.

Postdocs and even professors rarely think of an original research project without careful reading of the surrounding literature. Replicating prior work is important, but your goal should be extending prior results and generating new knowledge. Ideally, your project replicates prior results that are foundational to your research question, then goes on to answer deeper questions.

As you engage with the literature, you will come up with a number of questions that, as-of-right-now, have no answer. Let’s consider two example projects:

Microglia Project

Observations:

  1. Proximity to microglia is correlated with coherence in neuronal electrophysiology
  2. Secretions from reactive microglia have sialidase activity
  3. Exogenous sialidase treatment modifies neuronal electrophysiology
  4. Neurodegenerative disorders have a neuroinflammatory component

Questions:

  1. Do reactive microglia play a causal role in modulating neuronal electrophysiology?
  2. If so, does the mechanism of action involve sialidase secretion from reactive microglia?
  3. If so, which sialidase is it?

Your list of questions will grow as you perform experiments that refine your research hypothesis. Eventually, the experiments that answer your questions will aggregate into the figures of your manuscript.

Sometimes, however, a project idea might arise from a simple physical hypothesis:

Chemical Design Project

Observations:

  1. The sensitivity of our dye falls short of theoretical predictions based on electronics alone
  2. The theoretical predictions assume that the dye is perpendicular to the electric field it is sensing.
  3. The predicted “anchor” moiety in the dye is not perpendicular to the electric field the dye is sensing.

Questions:

  1. Is the dye tilted in relation to the electric field of interest?
  2. Is there a simple chemical modification that can be made to rectify this problem?

While crafting your research question, read everything you can find that seems even tangentially related to your idea and note how the papers relate to each other - you might find inconsistencies or gaps that help narrow your project to a key question.

As always, make sure to maintain a healthy amount of skepticism toward the conclusions of the papers you are reading. Read the abstracts to find relevant literature, but focus on the figures and methods. Ask yourself - are the methods they are using appropriate? Based on the data, do you come to the same conclusions that the authors did? Are the arguments they are making well-supported by their data and by appropriate statistical methods?

How does your proposed project fit into and expand current knowledge?

Engaging with the literature will lead you to the next point of consideration - does anyone care about the answer to your research question? This is often an exercise in framing your project in an intriguing light, but given that you hope to eventually publish your project and perhaps even apply for funding on the basis of it, it is important to give this some thought during the project planning stage.

Your career success depends on your ability to appeal to other scientists, so give the allure of your project some consideration. After your review of the relevant literature, you should have an idea of what knowledge gaps your research question might address.

A second, broader review of the literature might be a good idea. Are there other subfields that your project relates to? Or, better yet - does your project tie different subfields together in a new way? These questions will guide the introductory section to a research proposal or manuscript.

This raises another round of questions - what are the broader impacts of your project? Does anyone outside your field care about the answer to this question? Why? Is it testing a new therapy? Or is it mapping out a pathway that may someday prove to contain new therapeutic targets? These examples are focused on the life sciences, but the same basic idea applies to any field of research.

Answering these questions will also guide how you communicate your work. Your research should be guided by your own interests (after all, you are going to be working on your project for a long time), but at the end of the day, your salary and the materials you need for the project are paid for by other people, be they taxpayers or donors. Being able to explain why your project matters to other scientists and to laypeople is an important skill. This skill is also useful for acquiring your own funding, since most funding agencies require you to explicitly state the broader impacts of your proposed research in grant applications.

Do you or your lab have unique skills that can be leveraged on this project?

Over the course of your career, but especially during your training, you should always take on scientific projects that are at the edge of your abilities. Note that this is not the same thing as taking on projects that are far outside your ability. Even though you should be learning new skills during your training, leaning on some skills you already have will make starting a project much smoother.

Ideally, some of your background will give you a unique angle on your research question that is missing in the literature - perhaps you have a skill or technique that no one else in the subfield has. More commonly, however, your already-acquired skills will make initial project setup as smooth as possible - setting up model systems, basic characterization, etc.

This is especially likely to be true at the start of your training. Ideally, you can bring your lab’s technical expertise to a new subfield while acquiring that technical expertise yourself. Using a novel technique should not be done for its own sake - identify the benefits and limitations of your approach in reference to those traditionally done in the field. If possible, validate your approach against the “gold standard” models or techniques used in your subfield of interest. These validations are useful controls that also build your audience’s confidence in your approach.

Collaborating with other research groups can also be a useful way to acquire and/or leverage new skills. During project development, you will likely plan one or more experiments that involve a technique that you do not have. Perhaps you would like to test a hypothesis in a model system you are unfamiliar with or perform some kind of genetic manipulation via a vector you have never worked with. Sometimes, your proposed experiments will involve techniques outside of your research group’s scope of interest, but still bolster the narrative of your research project enough that they are worth doing.

In these cases, you should ask yourself - what skills are you going to have to acquire from outside your research group to perform these experiments? Who is going to teach them to you? Perhaps most importantly - how much time will it take to learn these skills? In the case of a one-off experiment, is it going to save significant time to entirely outsource the experiment to a collaborator? Students are often reluctant to add co-authors to their papers, but the truth is that collaborations are the best way to meet interesting people who will both teach you and help you make your project as impactful as possible, both in terms of performing key experiments and contributing valuable ideas. As long as you remain the first author, there’s no way having another name on the paper will hurt your career.

Let’s consider our examples again:

Microglia Project

Initial question: Do reactive microglia play a causal role in modulating neuronal electrophysiology?

To answer this question, we would need a model system that allows us to activate the microglia with some stimulant, then make measurements of the effects on nearby neurons. There are a number of systems we could use - coculture, conditioned media, or even an ex vivo system. The important point, aside from choosing a model that captures the essence of the biological mechanism in question, is choosing a model that we have the technical expertise to use correctly. If we cannot find such a model, can we find a collaborator who can help us set one up?

Chemical Design Project

Initial question: Is the dye perpendicular to the short axis of the plasma membrane?

Again, there are a number of ways to answer this question. Perhaps the most tractable is in silico - molecular dynamics simulations directly interrogate the physical mechanism in question. If we are going to go this route, however, it is important to identify a mentor with experience in computational chemistry.


How many experiments will it take to hit the “go/no-go” point of the project?

Every successful project has an experiment that evolves the research question from a “yes/no” question to a “to what extent” question. To use the example above, once an experiment has been performed confirming that microglia can control neuronal electrophysiology, the question becomes, “To what extent do microglia control neuronal electrophysiology?” The answers to this new question make up the results section of a manuscript, i.e. in this disease state, via this pathway, in response to this stimulus, etc. Unfortunately, the amount of time and resources you have to invest to reach this “go/no-go” experiment varies wildly from project to project and field to field.

Sometimes, especially if you have a simple model system that you are quite familiar with, the rate-limiting step is waiting for the proper reagents to perform the experiment. Other times, you might have to train an animal to perform a complex task before you can even perform the experiment, which could make the runway to the “go/no-go” experiment several weeks or even several months.

Regardless of the length of the runway, however, the “go/no-go” experiment should be the first experiment you perform. There are many slowdowns you cannot avoid, so eliminate the ones you can by designing your first experiment to definitively answer both your initial research question and the question “will this project ever be publishable?”

Once you have identified the “go/no-go” experiment, what additional experiments do you need to do to make the project publishable? The answer to this question depends entirely on your field and on the narrative you are trying to build. The key point here is to identify when the project is done. You can always perform another experiment and there is always more to learn, but the mark of a successful project is the publication in which you communicate your results to the community at large.

Early in your training, it is good to have a few irons in the fire - you want to hedge against bad luck. After all, it is not possible for every one of your ideas to pan out, especially your first few ideas. Furthermore, you can hedge against a project that has a longer runway to the “go/no-go” experiment by having a side project or two that can “pass go” more quickly. Particularly in the life sciences, you will have a lot of downtime as you wait for cells or animals to grow, learn, etc., so having multiple projects moving forward in parallel is a safer strategy than putting all your eggs in one basket.

At the end of your PhD, you want your resume to show that you learned useful skills and produced useful knowledge, both of which are demonstrated by completing successful projects (that is, publishing papers). Maintaining a healthy portfolio of projects is the best way to stay busy in the lab and to ensure that you leave graduate school with an impressive set of skills and papers.

Planning the first experiment

Once you have identified the key experiment that will make or break the project, it is time to actually plan out the experiment. Ideally, you have already selected a tractable model system that still captures the essentials of your research question. The next step is to make explicit the hypothesis that this experiment is testing, both conceptually and mathematically. These exercises are important both for clarity of thinking and to ensure that the statistical tests you use to support your decision-making are actually testing the intended hypothesis. Let’s consider our two examples:

Microglia Project

Hypothesis: Media conditioned by LPS-activated microglia cleaves sugars on neuronal surfaces.

The first step is turning our scientific hypothesis into a statistical model. This might sound counterintuitive to you if you are a biologist and don’t normally translate your hypotheses into mathematical language. At the end of the day, however, scientific results are communicated to other scientists in the form of statistical models. For example, confidence intervals and p-values describing a treatment effect are just properties of a coefficient in a regression model. Because every statistical method implicitly tests or describes a statistical model, we want to be sure that the model is appropriate for our data. The best way to do this is to make the model you want to test explicit during the experimental design stage.
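To make the regression framing concrete, here is a stdlib-only sketch (with made-up numbers) showing that the coefficient on a 0/1 treatment indicator in a least-squares fit is exactly the difference in group means - the quantity a two-sample comparison reports:

```python
# The slope of an OLS fit on a binary treatment indicator equals the
# difference in group means. All numbers below are invented for illustration.

control = [4.0, 5.0, 6.0]  # hypothetical measurements, treatment = 0
treated = [7.0, 8.0, 9.0]  # hypothetical measurements, treatment = 1

x = [0.0] * len(control) + [1.0] * len(treated)
y = control + treated

# Ordinary least squares slope: beta = cov(x, y) / var(x)
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
       sum((xi - x_bar) ** 2 for xi in x)

mean_diff = sum(treated) / len(treated) - sum(control) / len(control)
print(beta, mean_diff)  # both print 3.0: the coefficient IS the mean difference
```

The same equivalence is why a p-value or confidence interval on a treatment effect can always be read as a statement about a regression coefficient.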

In this case, we might find a fluorescent label for the cell-surface sugars that we are interested in, so we might choose to do an imaging experiment to visualize the fluorescence on the cell surface. The treatment, then, might be exposure to media conditioned by LPS-activated microglia. With this dependent variable (cell-surface fluorescence) and independent variable (presence or absence of LPS in the microglia media) in mind, we can build the following statistical model:

Statistical hypothesis: Cell-surface fluorescence intensity ~ β * LPS Treatment + error

However, it is rare in biology that we can make the Stable Unit Treatment Value Assumption (SUTVA) on a cell-to-cell level. In essence, this is the assumption that our decision to treat or not treat one of our experimental units does not affect the same decision for another experimental unit. In this case, it is impossible to treat a single neuron without treating the neurons around it - when we add the conditioned media to a well, all the cells in the well experience the same treatment. This leads us to an important point - the sample size, or n, for each treatment group in an experiment is the number of times the treatment was administered - which is not necessarily the number of measured values nor the number of subjects. We can reflect that with a hierarchical model:

Neuronal fluorescence intensity ~ Intercept_well + β * LPS Treatment + error

A full discussion of hierarchical models is far beyond the scope of this piece, but it’s important to note that the coefficients in a hierarchical model do not necessarily have the same number of degrees of freedom. In this case, there are many more values for the error term, which is per neuron, than for the well intercept. Because the treatment is applied to wells, not neurons, it too has fewer degrees of freedom than the error term. For a more extensive discussion of hierarchical data analysis, consider reading “Data Analysis Using Regression and Multilevel/Hierarchical Models” by Gelman and Hill. Another resource that helped refine my own experimental designs was “Statistics for Experimenters: Design, Innovation, and Discovery” by Box, Hunter, and Hunter.
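As a concrete illustration of why the well, not the neuron, is the experimental unit, here is a minimal sketch (with invented well data) that averages neuron-level readings within each well before comparing groups - one simple, defensible alternative to fitting a full hierarchical model:

```python
# Pseudoreplication sketch: the treatment is applied per well, so the
# effective n for the treatment effect is the number of wells, not the
# number of neurons. Averaging within wells and comparing well means is
# a simple analysis that respects this. All values below are invented.

wells = {
    # well id: (treatment, per-neuron fluorescence readings)
    "A1": ("LPS",     [2.1, 2.4, 2.0, 2.2]),
    "A2": ("LPS",     [2.6, 2.3, 2.5]),
    "B1": ("control", [3.1, 3.0, 3.3, 2.9]),
    "B2": ("control", [3.4, 3.2, 3.5]),
}

well_means = {wid: sum(vals) / len(vals) for wid, (_, vals) in wells.items()}
lps = [well_means[w] for w, (t, _) in wells.items() if t == "LPS"]
ctl = [well_means[w] for w, (t, _) in wells.items() if t == "control"]

n_neurons = sum(len(vals) for _, vals in wells.values())
n_wells_per_group = len(lps)
effect = sum(lps) / len(lps) - sum(ctl) / len(ctl)

print(n_neurons)          # 14 measured neurons...
print(n_wells_per_group)  # ...but only 2 independent units per group
print(effect)             # estimated treatment effect on well means
```

Any test statistic computed on the well means uses the correct (small) n; a test on the 14 neuron-level values would overstate the evidence.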

Chemical Design Project

Hypothesis: The tilt angle of the dye in the plasma membrane affects its sensitivity. Because this is a physical hypothesis, the mathematical model can be more precise:

Statistical hypothesis: Sensitivity ~ β * cos(Tilt Angle) + error

Coming up with an explicit mathematical hypothesis may be difficult, especially early on in a project. If you cannot, take that as a sign that you are not ready to move on to performing experiments and analyzing the resulting data. If the model you are testing is not clear to you, you have no assurance that you are choosing the right methods to analyze your data, and therefore no rigorous reason to believe that you are making the right decision.
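For the chemical design model above, the no-intercept least-squares estimate of β even has a closed form. A minimal sketch, with invented tilt angles and sensitivities standing in for real measurements:

```python
import math

# Fitting the no-intercept model
#     sensitivity = beta * cos(tilt_angle) + error
# by least squares. The angles and sensitivities below are invented
# purely to illustrate the calculation.

tilt_deg = [10, 25, 40, 55]          # hypothetical tilt angles (degrees)
sens = [0.98, 0.92, 0.75, 0.60]      # hypothetical measured sensitivities

x = [math.cos(math.radians(a)) for a in tilt_deg]

# No-intercept OLS: beta = sum(x * y) / sum(x * x)
beta = sum(xi * yi for xi, yi in zip(x, sens)) / sum(xi * xi for xi in x)
print(round(beta, 3))  # estimated beta, close to 1 for this toy data
```

Writing the estimator down like this is exactly the exercise the text recommends: if you cannot express your hypothesis this explicitly, you are not yet ready to collect data against it.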

What decision is your experiment informing?

This brings us to the next point - once you have a rudimentary model of the hypothesis you are trying to test with your experiment, it is important to clarify to yourself what decision your experiment is helping you make. This is useful to consider before every experiment, but especially before the “make-or-break” experiment of a project.

What experimental outcome would you consider “making” the project vs. “breaking” it? It might be tempting to define this in terms of the outcome of a statistical hypothesis test, e.g. if you reject the null hypothesis, you will continue with the project. However, rather than making a decision based on a default null hypothesis, you should select your null hypothesis to fit the actual decision you are trying to make. Perhaps you really are interested in a two-tailed hypothesis against a null of no effect, but perhaps a one-tailed test against a null hypothesis of a minimum “uninteresting” effect size might do. Early on in a project, your experiments will typically be tools to help you decide “should I perform the next experiment or should I move on to another project?”

Let’s consider our example projects:

Microglia Project

Hypothesis: Media conditioned by LPS-activated microglia cleaves sugars on neuronal surfaces.

Given that we proposed testing this hypothesis using a fluorescent marker of cell-surface sugars, measuring a decrease in neuronal fluorescence would be consistent with our model. Thus, if our decision is “perform the next experiment or abandon the project,” we would continue to the next experiment if we observed a decrease in fluorescence, and abandon the project if we saw an increase or no change in fluorescence (or troubleshoot the experiment if we had good reason to believe it failed for technical reasons). Now, we can map our decision to a choice of null and alternative hypotheses:

Neuronal fluorescence intensity ~ Intercept_well + β * LPS Treatment + error

H0: β ≥ 0

Ha: β < 0

Chemical Design Project

Hypothesis: The tilt angle of the dye in the plasma membrane affects its sensitivity.

We can produce a similar mapping for the chemistry project. Again, we have a directional hypothesis - after all, we would be uninterested in pursuing a project that explores modifications to our dye that make it perform worse. As such, we would only continue the project if we can modify the tilt angle to make the sensitivity higher.

Statistical hypothesis: Sensitivity ~ β * cos(Tilt Angle) + error

H0: β ≤ 0

Ha: β > 0

After data collection, we can test these one-tailed hypotheses with the appropriate statistical hypothesis tests. This exercise highlights why hypothesis tests are often appropriate in experimental science - a scientific manuscript reports the results of a series of experiments that the authors performed to verify a working model of their research topic. However, with each experiment, the authors were confronted with a fork in the path - if the data came out one way, they performed follow-up experiment A, while if it had come out the other way, they might have performed follow-up experiment B. Null hypothesis testing is a useful tool for the authors to justify to themselves and their readership the chain of decisions that they made.
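As a sketch of how such a one-tailed test feeds the go/no-go decision for the microglia project, suppose we already have an estimate of β and its standard error (both invented below; in practice they come from fitting the hierarchical model) and apply a large-sample z test:

```python
from statistics import NormalDist

# Decision rule for the microglia model: H0: beta >= 0 vs Ha: beta < 0.
# With an estimate of beta and its standard error, a large-sample z test
# puts all of alpha in the lower tail. Both numbers below are invented.

beta_hat = -0.9  # hypothetical estimated treatment effect
se = 0.3         # hypothetical standard error of the estimate
alpha = 0.05

z = beta_hat / se                   # -3.0
p_one_tailed = NormalDist().cdf(z)  # P(Z <= z): lower tail only

go = p_one_tailed < alpha  # "go": run the next experiment
print(round(p_one_tailed, 4), go)
```

With few wells per group, a t distribution with the well-level degrees of freedom would be the more careful choice; the normal approximation here just keeps the sketch to the standard library.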

Now, this is not to say null hypothesis testing is always correctly executed - in fact, it often is not. Nonetheless, null hypothesis testing does have a purpose in the scientific research pipeline - when an appropriate null hypothesis is chosen, researchers can control Type I error rate and, as a result, control how often they pursue unfruitful research projects.


What is the endpoint of the project?

Unfortunately, you can never prove your theories - you can only collect data consistent or inconsistent with them. However, at some point, you need to finish performing experiments and write up a paper. After all, you have only contributed to human knowledge once other people can read your results. At this point, you have a well-defined research question and an initial experiment that informs the question.

Once the first experiment is done and the data is analyzed, it is good to once again consider - what experiment (or set of experiments) makes up the final figure of the paper? The answer to this question is field-specific and should be informed by the other papers you have read, conversations with your advisor, and the resources that are available to you. Perhaps the final figure will involve the elucidation of a biochemical mechanism or testing your hypothesis in a complex system (such as in vivo) - whatever it may be, it is good to set your target early.

At some point, you have to stop performing new experiments, write up a manuscript, and submit it for peer review. This point will change as a project evolves, but it is important to set a finish line at the beginning - otherwise, it is easy to indefinitely put off wrapping things up. In my own experience, this is one of the most common problems that trainees run into, especially before their first publication. It is much easier to continue doing experiments and puttering around in the lab than to start making figures and writing up a paper - at that point in your training, you will feel much more competent at performing lab work than at writing a manuscript, after all. Avoid this common pitfall by setting a target during the planning phase of a project. As you perform experiments and analyze your data, begin making a backbone of your paper. In particular, write down what pieces of your narrative are confirmed by each experiment - this will make it easier to plan your next experiments and, when the time comes to write, will provide a solid outline for your manuscript.

As a quick aside, be sure to talk to your advisor about submitting your finished manuscript to a preprint server (e.g. bioRxiv) while you wait for the peer review process. Peer review takes several weeks to months on average, and can even take years. In the meantime, as you graduate and apply for your next job, you need something to point to that represents your PhD work. Preprints can even start accumulating citations for your paper before it is published, which is a bonus.

Consider other models that your first dataset is consistent with and compare them to your working model. What experiments might help you choose one model over the other? Engaging with the literature is a good way of finding such experiments - perhaps someone else has reported a pharmacological intervention that inhibits the pathway you are interested in, or perhaps there is a genetically-modified cell line that might allow you to approach your research hypothesis from another angle. Whatever it is, these experiments are necessary to bolster your proposed model against competing models.

Peer reviewers will often argue that your results have alternative explanations and will propose different models of what happened in your experiments after you submit your manuscript, so it is worth being practiced in developing experiments that compare two models. By planning experiments from this perspective, you can preempt reviewer objections and avoid performing laborious experiments proposed in peer review. Admittedly, you will likely still have to perform a few of these.

As you perform experiments and as your project progresses, your working model should update. Plan experiments that are at the edge of your current model’s predictions so that each dataset could falsify your current model for a better one. At the end, your project should produce evidence for a model that not only you can believe in, but the community at large can, too.

Finally, the questions left unanswered by your manuscript can (and ought to) be the seeds of future projects. Walk through this list of questions again and ask one more - what new skills do I want to acquire by doing this project? As a trainee, your best work will be done once you develop a good breadth of knowledge in your subfield and the skills necessary to execute the variety of experiments necessary to answer your research questions. To do that, always push yourself - you will learn the most by working at the limit of your abilities.