Economics of doing a PhD: An Informal Introduction (Part 1)

I often ask, “What’s expected of a PhD?” The literature suggests there are both private and social returns to pursuing one. But we must tread carefully: prestige, for instance, shouldn’t be conflated with a social return—it’s as private as the higher earnings a PhD might bring.

Digging deeper, it’s clear that PhDs are inherently self-paced and heterogeneous. This is one level of education where each observation (in an imaginary dataset) is so unique that it could almost justify individual dummy variables in a regression, though that’s plainly impractical.

Still, if we try to model the PhD journey, we might focus on two broad dimensions: the rhetorical returns (e.g., skills, signaling) and the costs (time, opportunity costs, mental toll). But here’s the catch: superimposing assumptions from lower levels of education often distorts findings, especially when analysing student costs.

In this blog, I’ll offer an informal take on the Economics of a PhD, with a focus on India’s unique academic landscape.

Potential Private Returns to Ph.D.:

The private returns to a PhD constitute a multiverse of complexity. When we narrow our focus to India, this complexity doesn't just persist - it intensifies. Consider a simple regression approach:

$\ln Y_i = \alpha + \beta \cdot \text{PhD}_i + \epsilon_i \quad \text{(1)}$

This raises an important question: Why isn't this specification a standard Mincer equation? [1] Technically, it could be. However, PhD education introduces unique econometric challenges that violate the Mincer framework's assumptions. To see why, let's expand Equation (1) into a full Mincerian specification:

$\ln Y_i = \alpha + \beta \cdot \text{PhD}_i + \gamma \cdot \text{Exp}_i + \delta \cdot \text{Exp}_i^2 + \mathbf{\theta} \cdot \mathbf{X}_i + \epsilon_i \quad \text{(2)}$

Here $\mathbf{X}_i$ is the vector of all other covariates that are controlled for, and $Y_i$ denotes nominal earnings, so that $\beta$ is read as the nominal return to a PhD. The issues mentioned above are as follows, though we continue to use the Mincerian setup for the rest of the blog:

(i) Theoretically, it can be considered both a proposition and a fact that a PhD is training rather than just a degree. In the literature on the economics of education, framing a level of education as training shifts its theoretical and econometric treatment. Moreover, institutional frameworks—particularly in the corporate sector—sometimes consider a PhD as experience (i.e., training), yet paradoxically exclude Teaching Assistantships (TA, henceforth) from that classification.

Further, the well-celebrated (and equally criticised) Mincerian equation for private returns to higher education includes necessary linear and quadratic experience terms. Given the preceding discussion, disentangling "experience" from PhD training without violating collinearity assumptions becomes theoretically untenable.

(ii) Two major drawbacks of the traditional Mincerian equation persist here:

Endogeneity: Enrollment in higher education (and thus PhD programs) and future wage streams suffer from endogenous selection. Indian datasets compound this issue by lacking crucial ability variables—there is no nationally representative test score to serve as an appropriate proxy. As Arrow (1973) [2] argues, available ability indicators (e.g., test scores) measure intellectual ability, not productive ability, and the two are not interchangeable.

In practice, accounting for all endogeneity sources in estimating private returns to education is impossible—a problem exacerbated for PhDs.

Selection bias: This is self-evident in this context and requires no elaboration.

Thus, beyond standard econometric challenges, the structural uniqueness and heterogeneity of PhD programs render the Mincerian equation unusable without strict sample restrictions.
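To make the endogeneity point concrete, here is a minimal simulation sketch, not an estimate from any real dataset. Every number in it (the true return of 0.20, the ability effect, the selection rule, the experience profile) is an assumption chosen for illustration. It generates wages from Equation (2) with an extra unobserved ability term, lets ability drive PhD enrollment, and compares the PhD coefficient with and without ability in the regression.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients of y on X (X must already contain a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def simulate_ability_bias(n=50_000, true_beta=0.20, seed=0):
    """Return the PhD coefficient when ability is omitted and when it is included."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)  # unobserved in real Indian datasets (assumption)
    # Selection on ability: higher-ability individuals are more likely to hold a PhD.
    phd = (0.8 * ability + rng.normal(size=n) > 1.0).astype(float)
    exp = rng.uniform(0, 30, size=n)  # years of experience
    log_wage = (1.0 + true_beta * phd + 0.04 * exp - 0.0007 * exp**2
                + 0.30 * ability + rng.normal(scale=0.5, size=n))
    const = np.ones(n)
    X_short = np.column_stack([const, phd, exp, exp**2])           # ability omitted
    X_long = np.column_stack([const, phd, exp, exp**2, ability])   # ability observed
    return ols(log_wage, X_short)[1], ols(log_wage, X_long)[1]

beta_short, beta_long = simulate_ability_bias()
print(f"PhD coefficient, ability omitted:  {beta_short:.3f}")
print(f"PhD coefficient, ability included: {beta_long:.3f}")
```

Under these assumed parameters, the regression that omits ability overstates the return to a PhD, while the one that includes it lands close to the assumed true value. In practice the second regression is not available to us, which is exactly the problem.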

Individual-Level Heterogeneity:

At first glance, this heterogeneity could be modelled through completion timelines (i.e., the longer the educational journey, the greater the individual variation). Yet, this isn’t the kind of heterogeneity that truly matters. The more meaningful variation lies in how PhD candidates invoke heterogeneity—as a defence mechanism—when faced with the dreaded question: “When will you submit?”

Is that a valid excuse? No.
Is it commonly used? Absolutely.

Unlike undergraduate studies, where interruptions such as marriage, financial constraints, or health issues predictably delay graduation, PhD timelines diverge wildly, even among candidates who appear to face no observable hurdles. This calls for a dedicated regression framework, with completion time $T_i$ as the dependent variable:

$T_i = \alpha_i + \beta_i \cdot I^c_i + \gamma_i \cdot I^n_i + \delta_i \cdot \mathbf{P}_i + \epsilon_i \quad \text{(3)}$

Let’s break this down.

The intercept $\alpha_i$ represents the existing research aptitude of student $i$. This can be shaped by several factors: institutional background, curriculum exposure, income levels, personal motivation, and the point in time when the individual discovered a genuine interest in research. Implicit here is an assumption of monotonicity over time: all else equal, more time equals more progress. But in the Indian context, the slope of this equation can vary dramatically.

Even within a single discipline like Economics, there are multiple layers of inequality, curriculum-level inequality being a major one. Consider the gap between (a) how much a Master’s curriculum potentially equips a student for research, and (b) how much preparation is actually needed to engage in doctoral-level work. In India, this gap varies widely across institutions and individuals, forming what we might describe as an uneven Euclidean space. That gap, the “distance”, influences the intercept $\alpha_i$ in our equation.

Next, $I^c_i$ and $I^n_i$ represent institutional and individual characteristics, respectively. The vector $\mathbf{P}_i$ is particularly crucial, as it captures the unique elements of PhD-level research practices.
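A small simulation can show what the individual subscripts on the coefficients in Equation (3) imply. This is only a sketch: the regressors are hypothetical scalar stand-ins, and the population means (5.0, -0.4, -0.3, 0.6) and spreads are assumed values, not estimates. With one observation per student, the individual coefficients cannot be estimated separately, so the sketch fits a single pooled regression and checks what it recovers.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical scalar stand-ins for the regressors in Equation (3).
inst = rng.normal(size=n)      # I^c_i: institutional characteristics
indiv = rng.normal(size=n)     # I^n_i: individual characteristics
practice = rng.normal(size=n)  # P_i: research practice, reduced to one dimension

# Individual-specific intercepts and slopes around assumed population means.
alpha_i = 5.0 + rng.normal(scale=0.5, size=n)
beta_i = -0.4 + rng.normal(scale=0.2, size=n)
gamma_i = -0.3 + rng.normal(scale=0.2, size=n)
delta_i = 0.6 + rng.normal(scale=0.3, size=n)
eps = rng.normal(scale=0.5, size=n)

# Completion time generated exactly as in Equation (3).
T = alpha_i + beta_i * inst + gamma_i * indiv + delta_i * practice + eps

# Pooled OLS: one common coefficient vector for all students.
X = np.column_stack([np.ones(n), inst, indiv, practice])
pooled, *_ = np.linalg.lstsq(X, T, rcond=None)
print("pooled estimates:", np.round(pooled, 3))
```

Here the pooled regression recovers roughly the average coefficients, but only because the sketch draws each student's coefficients independently of the regressors. If slopes were related to institutional or individual characteristics, which seems plausible for Indian PhD programs, the pooled estimates would no longer have this interpretation.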

Now, research practices $\mathbf{P}$ are highly heterogeneous, not only across disciplines but within them. In Economics, for instance, research may involve primary or secondary data work, and for others, it may lean heavily on theory, mathematics, or qualitative interpretation. The boundaries are often grey, and hybrid approaches are common. The rise of AI-assisted and technology-driven research has only made these lines blurrier.

This complexity deepens when $\mathbf{P}$ interacts with individual characteristics $I^n$. To understand this interaction, let’s borrow from Von Neumann & Morgenstern’s (1947) [3] expected utility theory. Assuming perfect knowledge of $\mathbf{P}$, we can classify students by their risk preferences. For simplicity, we’ll ignore the risk-neutral category, as it implies no strong preference.

  1. Risk-Averse: A risk-averse student tends to avoid stepping outside their comfort zone. Suppose they are given a choice between a challenging research direction that could yield high future returns ($Y_i$) but involves significant methodological uncertainty, and a familiar path with more modest returns. The risk-averse student will likely choose the latter.
  2. Risk-Taking: A risk-taking student, by contrast, is more likely to deviate from their prior training. For example, a student whose Master’s education was modestly applied in nature might actively pursue a PhD in microeconomic theory or complex applied methods. Rather than deter her, the steep learning curve excites her.

Risk preferences shape research choices, success rates, and, ultimately, the outcome $Y_i$. And of course, the error term $\epsilon_i$ soaks up everything else: psychological states, academic culture, mentorship quality, personal circumstances, and many other unobservables.
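The two student types can be illustrated with a toy expected-utility calculation. This is a sketch under assumed numbers: the payoffs (25, 175, and 90), the equal probabilities, and the two utility functions are all hypothetical choices, picked only so that the risky research direction has a higher expected return than the familiar path.

```python
import math

def expected_utility(outcomes, probs, utility):
    """Probability-weighted average of utility over possible outcomes."""
    return sum(p * utility(y) for y, p in zip(outcomes, probs))

# Hypothetical payoffs: the risky direction pays 25 or 175 with equal chance
# (expected value 100); the familiar path pays 90 for sure.
risky_outcomes, risky_probs = [25.0, 175.0], [0.5, 0.5]
safe_outcome = 90.0

# Assumed utility shapes: concave for risk aversion, convex for risk taking.
utilities = {
    "risk-averse": math.sqrt,
    "risk-taking": lambda y: y ** 2,
}

choices = {}
for label, u in utilities.items():
    eu_risky = expected_utility(risky_outcomes, risky_probs, u)
    eu_safe = u(safe_outcome)
    choices[label] = "risky" if eu_risky > eu_safe else "safe"
    print(f"{label}: EU(risky)={eu_risky:.2f}, U(safe)={eu_safe:.2f} -> {choices[label]}")
```

With these numbers the risk-averse student picks the familiar path even though the challenging direction pays more on average, while the risk-taking student picks the challenging direction.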

Role of Institutions:

Since Acemoglu, Gallego, & Robinson (2014) [4], the role of institutions in the returns to education has gained ample attention. The interaction between individual characteristics and institutions is critical, and its importance increases with the level of education. Here, however, the definition of institutions is not confined to Higher Education Institutions (HEIs). For instance, in Indian higher education, the number of private HEIs is increasing. This could be both a cause and a consequence of rising enrollment in higher education (bi-directional causality). Most likely, though, the Gross Enrolment Ratio (GER) in Ph.D. programmes is going up while the number of public HEIs is stagnant. Moreover, given the nature of inflation, decreasing government interest in sponsoring evidence-based research has resulted in a flood of private Ph.D. enrollments. Though these numbers are still lower than those of government institutions, the gap is closing. It is therefore all the more important to discuss the role of institutions in Ph.D. enrollment and output. Let's modify the Mincerian equation in (2) and adapt $I^c_i$ from (3) in the modified equation below:

$\ln Y_i = \alpha + \beta \cdot \text{PhD}_i + \gamma \cdot \text{Exp}_i + \delta \cdot \text{Exp}_i^2 + \mathbf{\theta} \cdot \mathbf{X}_i + \phi \cdot I^c_i + \epsilon_i \quad \text{(4)}$

Introducing $I^c_i$ into the model can improve the overall fit and reduce the risk of misspecification. In particular, it helps mitigate endogeneity concerns stemming from omitted variable bias, especially given the lack of a robust indicator for an individual’s productivity potential.

However, in empirical studies, it is notoriously difficult to find a single variable that adequately represents institutional factors such as bottlenecks, facilitators, or systemic hurdles. This difficulty is especially pronounced in the Indian context, given the country’s vast scale and the intricate interplay of religion, caste, and government systems.

The political landscape of Indian institutions, particularly in the sphere of higher education, has been notably adaptive. On one hand, it has welcomed the entry of private Higher Education Institutions (HEIs); on the other, it has largely left the sector to market forces. Beyond this adaptiveness, there is also a visible reluctance on the part of the government to formulate targeted policies. This is evident from the fact that no substantial policy interventions have been introduced in recent years, making the National Education Policy, 2020 (NEP 2020), the default anchor for most discussions on the subject.

While macro-level indices—such as the Rule of Law Index used in [4]—are sometimes employed to proxy institutional context, their application alongside micro-level data comes with inherent limitations. The mismatch in granularity between macro-level indicators and individual-level outcomes poses a significant challenge in empirical modelling.

Last Words:

The discussion doesn't end here. 

References:

1. Mincer, J. (1958). Investment in human capital and personal income distribution. Journal of Political Economy, 66(4), 281-302.

2. Arrow, K. J. (1973). Higher education as a filter. Journal of Public Economics, 2(3), 193-216.

3. Von Neumann, J., & Morgenstern, O. (1947). Theory of games and economic behavior (2nd rev. ed.). Princeton University Press.

4. Acemoglu, D., Gallego, F. A., & Robinson, J. A. (2014). Institutions, human capital, and development. Annu. Rev. Econ., 6(1), 875-912.
