Economics of doing a PhD: An Informal Introduction (Part 1)

I often ask, “What’s expected of a PhD?” The literature suggests there are both private and social returns to pursuing one. But we must tread carefully: prestige, for instance, shouldn’t be conflated with a social return—it’s as private as the higher earnings a PhD might bring.

Digging deeper, it’s clear that PhDs are inherently self-paced and heterogeneous. This is one level of education where each observation (in an imaginary dataset) is so unique that it could almost justify individual dummy variables in a regression, though that’s plainly impractical.

Still, if we try to model the PhD journey, we might focus on two broad dimensions: the rhetorical returns (e.g., skills, signaling) and the costs (time, opportunity costs, mental toll). But here’s the catch: superimposing assumptions from lower levels of education often distorts findings, especially when analysing student costs.

In this blog, I’ll offer an informal take on the Economics of a PhD, with a focus on India’s unique academic landscape.

Potential Private Returns to Ph.D.:

The private returns to a PhD constitute a multiverse of complexity. When we narrow our focus to India, this complexity doesn't just persist - it intensifies. Consider a simple regression approach:

$\ln Y_i = \alpha + \beta \cdot \text{PhD}_i + \epsilon_i \quad \text{(1)}$

This raises an important question: Why isn't this specification a standard Mincer equation? [1] Technically, it could be. However, PhD education introduces unique econometric challenges that violate the Mincer framework's assumptions. To see why, let's expand Equation (1) into a full Mincerian specification:

$\ln Y_i = \alpha + \beta \cdot \text{PhD}_i + \gamma \cdot \text{Exp}_i + \delta \cdot \text{Exp}_i^2 + \boldsymbol{\theta}^\top \mathbf{X}_i + \epsilon_i \quad \text{(2)}$

Here $\mathbf{X}_i$ is the vector of all other covariates that are controlled for, and $Y_i$ is nominal earnings, so that $\beta$ captures the nominal private return to a PhD. Although we continue to use the Mincerian setup for the rest of the blog, the issues mentioned above are as follows:

(i) Theoretically, it can be considered both a proposition and a fact that a PhD is training rather than just a degree. In the literature on the economics of education, framing a level of education as training shifts its theoretical and econometric treatment. Moreover, institutional frameworks—particularly in the corporate sector—sometimes consider a PhD as experience (i.e., training), yet paradoxically exclude Teaching Assistantships (TA, henceforth) from that classification.

Further, the well-celebrated (and equally criticised) Mincerian equation for private returns to higher education includes necessary linear and quadratic experience terms. Given the preceding discussion, disentangling "experience" from PhD training without violating collinearity assumptions becomes theoretically untenable.

(ii) Two major drawbacks of the traditional Mincerian equation persist here:

Endogeneity: Enrollment in higher education (and thus PhD programs) and future wage streams suffer from endogenous selection. Indian datasets compound this issue by lacking crucial ability variables—there is no nationally representative test score to serve as an appropriate proxy. As Arrow (1973) [2] argues, available ability indicators (e.g., test scores) measure intellectual ability, not productive ability, and the two are not interchangeable.

In practice, accounting for all endogeneity sources in estimating private returns to education is impossible—a problem exacerbated for PhDs.

Selection bias: This is self-evident in this context and requires no elaboration.

Thus, beyond standard econometric challenges, the structural uniqueness and heterogeneity of PhD programs render the Mincerian equation unusable without strict sample restrictions.
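To make the endogeneity point concrete, here is a minimal simulation sketch, not an estimate from any real dataset. It assumes a hypothetical data-generating process in which unobserved ability raises both the probability of holding a PhD and log earnings. The parameter values (`beta=0.20`, `ability_effect=0.30`) and all function names are illustrative assumptions. The sketch compares the naive estimate from Equation (1) with the estimate obtained when ability is observed and controlled for, which is exactly what Indian datasets do not allow.

```python
import random
import statistics as st


def slope(x, y):
    """OLS slope of y on x (with an intercept): cov(x, y) / var(x)."""
    mx, my = st.fmean(x), st.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, x):
    """Residuals from regressing y on x with an intercept."""
    b = slope(x, y)
    a = st.fmean(y) - b * st.fmean(x)
    return [yi - a - b * xi for xi, yi in zip(x, y)]


def simulate(n=20000, beta=0.20, ability_effect=0.30, seed=0):
    """Hypothetical data: ability drives both PhD status and log earnings."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    # Selection: higher-ability individuals are more likely to hold a PhD.
    phd = [1.0 if a + rng.gauss(0, 1) > 1.0 else 0.0 for a in ability]
    log_y = [1.0 + beta * d + ability_effect * a + rng.gauss(0, 0.5)
             for d, a in zip(phd, ability)]
    return ability, phd, log_y


ability, phd, log_y = simulate()

# Equation (1): ability is omitted, so beta absorbs part of its effect.
beta_naive = slope(phd, log_y)

# Frisch-Waugh-Lovell: partial ability out of both sides, then regress.
beta_ctrl = slope(residuals(phd, ability), residuals(log_y, ability))

print(f"naive: {beta_naive:.3f}, with ability control: {beta_ctrl:.3f}")
```

Under these assumptions the naive coefficient overstates the true return, because it also picks up the earnings effect of ability. With real data the ability column is missing, so only the first number is available.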

Individual-Level Heterogeneity:

At first glance, this heterogeneity could be modelled through completion timelines (i.e., the longer the educational journey, the greater the individual variation). Yet, this isn’t the kind of heterogeneity that truly matters. The more meaningful variation lies in how PhD candidates invoke heterogeneity—as a defence mechanism—when faced with the dreaded question: “When will you submit?”

Is that a valid excuse? No.

Is it commonly used? Absolutely.

Unlike undergraduate studies, where interruptions such as marriage, financial constraints, or health issues predictably delay graduation, PhD timelines diverge wildly, even among candidates who appear to face no observable hurdles. This calls for a dedicated regression framework, with completion time $T_i$ as the dependent variable:

$T_i = \alpha_i + \beta_i \cdot I^c_i + \gamma_i \cdot I^n_i + \delta_i \cdot \mathbf{P}_i + \epsilon_i \quad \text{(3)}$

Let’s break this down.

The intercept $\alpha_i$ represents the existing research aptitude of student $i$. This can be shaped by several factors: institutional background, curriculum exposure, income levels, personal motivation, and the point in time when the individual discovered a genuine interest in research. Implicit here is an assumption of monotonicity over time: all else equal, more time equals more progress. But in the Indian context, the slope of this equation can vary dramatically.

Even within a single discipline like Economics, there are multiple layers of inequality, curriculum-level inequality being a major one. Consider the gap between (a) how much a Master’s curriculum potentially equips a student for research, and (b) how much preparation is actually needed to engage in doctoral-level work. In India, this gap varies widely across institutions and individuals, forming what we might describe as an uneven Euclidean space. That gap, the “distance”, influences the intercept $\alpha_i$ in our equation.

Next, $I^c_i$ and $I^n_i$ represent institutional and individual characteristics, respectively. The vector $\mathbf{P}_i$ is particularly crucial, as it captures the unique elements of PhD-level research practices.
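One practical caveat, offered as a sketch rather than as part of the original specification: with a single observation per student, the individual-specific coefficients in Equation (3) cannot be estimated separately. This is the same problem as the individual dummy variables mentioned earlier. An estimable version would assume common coefficients $\alpha$, $\beta$, $\gamma$, $\delta$ (taken here, as an assumption for illustration, to be population averages of the individual coefficients) and push the individual deviations into a composite error $u_i$:

$T_i = \alpha + \beta \cdot I^c_i + \gamma \cdot I^n_i + \delta \cdot \mathbf{P}_i + u_i$

$u_i = (\alpha_i - \alpha) + (\beta_i - \beta) \cdot I^c_i + (\gamma_i - \gamma) \cdot I^n_i + (\delta_i - \delta) \cdot \mathbf{P}_i + \epsilon_i$

Because $u_i$ depends on the regressors themselves, individual heterogeneity shows up as heteroskedasticity at best, and as bias whenever the individual coefficients are correlated with the regressors.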

Now, research practices $\mathbf{P}$ are highly heterogeneous, not only across disciplines but within them. In Economics, for instance, research may involve primary or secondary data work for some, while for others it may lean heavily on theory, mathematics, or qualitative interpretation. The boundaries are often grey, and hybrid approaches are common. The rise of AI-assisted and technology-driven research has only made these lines blurrier.

This complexity deepens when $\mathbf{P}$ interacts with individual characteristics $I^n$. To understand this interaction, let’s borrow from von Neumann & Morgenstern’s (1947) [3] expected utility theory. Assuming perfect knowledge of $\mathbf{P}$, we can classify students by their risk preferences. For simplicity, we’ll ignore the risk-neutral category, as it implies no strong preference.

Risk-Averse: A risk-averse student tends to avoid stepping outside their comfort zone. Suppose they are given a choice between a challenging research direction that could yield high future returns $(Y_i)$ but involves significant methodological uncertainty, and a familiar path with more modest returns. The risk-averse student will likely choose the latter.
Risk-Taking: A risk-taking student, by contrast, is more likely to deviate from their prior training. For example, a student whose Master’s education was modestly applied in nature might actively pursue a PhD in microeconomic theory or complex applied methods. Rather than deterring her, the steep learning curve excites her.

Risk preferences shape research choices, success rates, and, ultimately, $Y_i$, the outcome. And of course, the error term $\epsilon_i$ soaks up everything else: psychological states, academic culture, mentorship quality, personal circumstances, and many other unobservables.
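The two risk types can be sketched with a toy expected-utility calculation. Everything numeric here is a hypothetical assumption: the payoffs stand in for future returns in arbitrary units, and the square-root and quadratic utility functions are just convenient examples of concave (risk-averse) and convex (risk-taking) preferences. The two paths are given the same expected return, so the choice is driven purely by the attitude to risk.

```python
import math


def expected_utility(lottery, utility):
    """Expected utility of a lottery given as (probability, payoff) pairs."""
    return sum(p * utility(x) for p, x in lottery)


# Hypothetical payoffs (future returns in arbitrary units).
safe_path = [(1.0, 55.0)]                  # familiar topic, certain modest return
risky_path = [(0.5, 100.0), (0.5, 10.0)]   # new methods: high return or very little


def risk_averse(y):
    """Concave utility: each extra unit of return is valued less."""
    return math.sqrt(y)


def risk_taking(y):
    """Convex utility: large returns are valued disproportionately."""
    return y ** 2


def choose(utility):
    """Return the path with the higher expected utility."""
    eu_safe = expected_utility(safe_path, utility)
    eu_risky = expected_utility(risky_path, utility)
    return "risky" if eu_risky > eu_safe else "safe"


print("risk-averse student picks:", choose(risk_averse))
print("risk-taking student picks:", choose(risk_taking))
```

Both paths have an expected return of 55, yet the two students choose differently. This is the sense in which risk preferences, rather than the research options themselves, drive the divergence.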

Role of Institutions:

Since Acemoglu, Gallego, & Robinson (2014) [4], the role of institutions in returns to education has gained ample attention. The interaction between individual characteristics and institutions is critical, and its importance increases with the level of education. Here, however, the definition of institutions is not confined to Higher Education Institutions (HEIs). For instance, the number of private HEIs in Indian higher education is increasing. This growth could be both a cause and a consequence of rising enrollment in higher education (bi-directional causality). Most likely, though, the Gross Enrolment Ratio (GER) in Ph.D. is going up while the number of public HEIs is stagnant. Moreover, given the nature of inflation, decreasing government interest in sponsoring evidence-based research has resulted in a flood of private Ph.D. enrollments. Although these numbers are still lower than those in government institutions, the gap is closing. It is thus all the more important to discuss the role of institutions in Ph.D. enrollment and output. Let's modify the Mincerian equation in (2) and adapt $I^c_i$ from (3) in the modified equation below:

$\ln Y_i = \alpha + \beta \cdot \text{PhD}_i + \gamma \cdot \text{Exp}_i + \delta \cdot \text{Exp}_i^2 + \boldsymbol{\theta}^\top \mathbf{X}_i + \phi \cdot I^c_i + \epsilon_i \quad \text{(4)}$

It is evident that introducing $I^c_i$ into the model improves the overall fit and reduces the risk of misspecification. In particular, it helps mitigate endogeneity concerns stemming from omitted variable bias, especially due to the lack of a robust indicator for an individual’s productivity potential.
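The omitted-variable logic can be written out using the standard omitted-variable-bias result. Suppose, as an assumption for illustration, that Equation (4) is the true model with $\epsilon_i$ uncorrelated with its regressors, but Equation (2) is estimated instead. Let $\pi$ denote the coefficient on $\text{PhD}_i$ in an auxiliary regression of $I^c_i$ on all the regressors of Equation (2); $\pi$ is notation introduced here and is not part of the original setup. Then:

$\text{plim} \, \hat{\beta}_{(2)} = \beta + \phi \cdot \pi$

The bias therefore vanishes only if institutional characteristics have no effect on earnings ($\phi = 0$) or are uncorrelated with PhD status once the other regressors are accounted for ($\pi = 0$). Including $I^c_i$ removes this particular term, but it does nothing about other omitted factors such as ability.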

However, in empirical studies, it is notoriously difficult to find a single variable that adequately represents institutional factors such as bottlenecks, facilitators, or systemic hurdles. This difficulty is especially pronounced in the Indian context, given the country’s vast scale and the intricate interplay of religion, caste, and government systems.

The political landscape of Indian institutions, particularly in the sphere of higher education, has been notably adaptive. On one hand, it has welcomed the entry of private Higher Education Institutions (HEIs); on the other, it has largely left the sector to market forces. Beyond this adaptiveness, there is also a visible reluctance on the part of the government to formulate targeted policies. This is evident from the fact that no substantial policy interventions have been introduced in recent years, making the National Education Policy, 2020 (NEP 2020), the default anchor for most discussions on the subject.

While macro-level indices—such as the Rule of Law Index used in [4]—are sometimes employed to proxy institutional context, their application alongside micro-level data comes with inherent limitations. The mismatch in granularity between macro-level indicators and individual-level outcomes poses a significant challenge in empirical modelling.
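The granularity problem can be illustrated with a small sketch. The state labels, index values, and sample size below are hypothetical placeholders, not data from [4] or from any Indian survey. The sketch merges a state-level index onto individual records and shows two things: the index takes only as many distinct values as there are states, and it has no variation left once state-level means are removed.

```python
import random
import statistics as st
from collections import defaultdict

rng = random.Random(1)

# Hypothetical macro-level index: one value per state (placeholder names and values).
state_index = {"A": 0.42, "B": 0.55, "C": 0.61}

# Hypothetical micro data: 300 individuals, each assigned to a state.
states = [rng.choice(sorted(state_index)) for _ in range(300)]
index_i = [state_index[s] for s in states]  # macro index merged onto individuals

# Within-state demeaning, which is what state fixed effects do to a regressor.
by_state = defaultdict(list)
for s, v in zip(states, index_i):
    by_state[s].append(v)
state_mean = {s: st.fmean(vals) for s, vals in by_state.items()}
within = [v - state_mean[s] for s, v in zip(states, index_i)]

n_individuals = len(index_i)
n_distinct = len(set(index_i))
max_within_dev = max(abs(w) for w in within)

print("individuals:", n_individuals)
print("distinct index values:", n_distinct)
print("largest within-state deviation:", max_within_dev)
```

However many individuals are sampled, the coefficient on such an index is identified only from differences across the few macro units, and it cannot be separated from anything else that varies at that same level.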

Last Words:

The discussion doesn't end here.

References:

1. Mincer, J. (1958). Investment in human capital and personal income distribution. Journal of Political Economy, 66(4), 281-302.

2. Arrow, K. J. (1973). Higher education as a filter. Journal of Public Economics, 2(3), 193-216.

3. Von Neumann, J., & Morgenstern, O. (1947). Theory of Games and Economic Behavior (2nd rev. ed.). Princeton University Press.

4. Acemoglu, D., Gallego, F. A., & Robinson, J. A. (2014). Institutions, human capital, and development. Annual Review of Economics, 6(1), 875-912.
