Factor investing and asset pricing anomalies

Asset pricing anomalies are the foundations of factor investing. In this chapter our aim is twofold:

  • first, to present simple ideas and concepts: basic factor models and common empirical facts (time-varying nature of returns and risk premia);

  • second, to provide the reader with lists of articles that go much deeper, to stimulate and satisfy curiosity. Indeed, the literature on anomalies encompasses thousands of research articles and it is impossible to do it justice in a single chapter.

The purpose of this chapter is not to provide a full treatment of the many topics related to factor investing. Rather, it is intended to give a broad overview and cover the essential themes so that the reader is guided towards the relevant references. As such, it can serve as a short, non-exhaustive, review of the literature. The subject of factor modelling in finance is incredibly vast and the number of papers dedicated to it is substantial and still rapidly increasing, even as of 2026.

The universe of peer-reviewed financial journals can be split in two families: academically-oriented outlets on the one hand, and practitioner-oriented ones on the other.

This chapter reviews and mentions articles published mostly in the first family of journals.

Beyond academic articles, several monographs are already dedicated to the topic of style allocation (a synonym of factor investing, used for instance in theoretical articles such as Barberis & Shleifer (2003) or practitioner papers such as Asness et al. (2015)). To cite but a few, we mention:

  • Ilmanen (2011): an exhaustive excursion into risk premia, across many asset classes, with a large spectrum of descriptive statistics (across factors and periods),

  • Ang (2014): covers factor investing with a strong focus on the money management industry,

  • Bali et al. (2016): very complete book on the cross-section of signals with statistical analyses (univariate metrics, correlations, persistence, etc.),

  • Jurczenko (2017): a tour on various topics given by field experts (factor purity, predictability, selection versus weighting, factor timing, etc.).

  • Zhang et al. (2024): a more recent treatment of various topics in the fields, including, e.g., quantamental views and behavioral finance.

Finally, we can mention a few wide-scope and survey papers on this topic: Goyal (2012), Cazalet & Roncalli (2014), Baz et al. (2015), Giglio et al. (2022) and Shi (2026).

3.1 Introduction: why factors?

The topic of factor investing, though a decades-old academic theme, has gained traction concurrently with the rise of exchange-traded funds (ETFs) as investment vehicles (see Figure 3.1 below). Both gathered momentum in the 2010s. Not so surprisingly, the feedback loop between practical financial engineering and academic research has stimulated both sides in a mutually beneficial manner. Practitioners rely on key scholarly findings (e.g., on asset pricing anomalies) while researchers dig deeper into pragmatic topics (e.g., factor exposure or transaction costs). Recently, researchers have also tried to quantify and qualify the impact of factor indices on financial markets. For instance, Krkoska & Schenk-Hoppé (2019) analyze herding behaviors while Cong & Xu (2021) show that the introduction of composite securities increases volatility and cross-asset correlations. DeMiguel et al. (2025) find that increased competition in the factor space may increase profits thanks to the netting of trades across factors, though this is far from guaranteed.


Figure 3.1: Evolution of the ETF market (assets under management).

The core aim of factor models is to understand the drivers of asset prices and their changes. Broadly speaking, the rationale behind factor investing is that the financial performance of firms depends on factors, whether they be latent and unobservable, or related to intrinsic characteristics (like accounting ratios for instance). Indeed, as Cochrane (2011) frames it, the first essential question is which characteristics really provide independent information about average returns? Answering this question helps understand the cross-section of returns and may open the door to their prediction.

Theoretically, linear factor models can be viewed as special cases of the arbitrage pricing theory (APT) of Ross (1976), which assumes that the return of an asset $n$ can be modelled as a linear combination of underlying factors $f_k$:

$$r_{t,n}= \alpha_n+\sum_{k=1}^K\beta_{n,k}f_{t,k}+\epsilon_{t,n},$$

where the usual econometric constraints on linear models hold: $\mathbb{E}[\epsilon_{t,n}]=0$, $\text{cov}(\epsilon_{t,n},\epsilon_{t,m})=0$ for $n\neq m$, and $\text{cov}(\textbf{f}_n,\boldsymbol{\epsilon}_n)=0$. If such factors do exist, then they are in contradiction with the cornerstone model in asset pricing: the capital asset pricing model (CAPM) of Sharpe (1964), Lintner (1965) and Mossin (1966). Indeed, according to the CAPM, the only driver of returns should be the market portfolio. This explains why factors are also called ‘anomalies’.

One robust theoretical foundation for anomalies stems from the recent demand-based pricing literature initiated by Koijen & Yogo (2019). Therein, the authors show that if moments of returns are linked to firm characteristics, then optimal portfolios, too, are naturally tied to these characteristics. Hence, upon market clearing, returns should also be driven by firm attributes. In short, the model assumes:

$$\text{demand}(\text{price},\text{other characteristics}) = \text{supply}(\text{orthogonal motives}),$$

and hence price changes are linked to the characteristics. Now, as Coqueret (2022) shows, if preferences are logarithmic, as in Koijen & Yogo (2019), then it is possible to derive that returns depend on these characteristics, as well as on relative changes in the characteristics (this is sometimes referred to as characteristic momentum). If demand is linear in firm attributes, then returns, too, become linear in these attributes (and their changes).

In the end, we obtain a (predictive) panel model of the form:

$$r_{t+1,n}=g(\textbf{c}_{t,n}, \textbf{c}_{t-1,n}) + e_{t+1, n}.$$

An interesting question pertains to the relative importance of the error term, $e_{t+1,n}$. This is sometimes referred to as the latent demand, i.e., the portion of demand (or, in the above equation, of returns) that cannot be explained by the characteristics. Koijen & Yogo (2019) find that it is in fact sizeable, accounting for at least 80% of the variation in demand. A growing body of work has since refined and challenged this framework; we dedicate a full section below (Section 3.5) to recent theoretical foundations, identification issues, and extensions to richer preference structures.

Empirical evidence of asset pricing anomalies has accumulated since the dual publication of Fama & French (1992) and Fama & French (1993). This seminal work has paved the way for a blossoming stream of literature that has its meta-studies (e.g., Green et al. (2013), Harvey et al. (2016) and McLean & Pontiff (2016)). The regression in Equation 3.1 can be evaluated once (unconditionally) or sequentially over different time frames. In the latter case, the parameters (coefficient estimates) change and the models are thus called conditional (we refer to Ang & Kristensen (2012) and to Cooper & Maio (2019) for recent results on this topic as well as for a detailed review on the related research). Conditional models are more flexible because they acknowledge that the drivers of asset prices may not be constant, which seems like a reasonable postulate.

3.2 The factor zoo

3.2.1 The canonical factors

The construction of so-called factors follows the same lines as above. Portfolios are based on one characteristic and the factor is a long-short ensemble of one extreme portfolio minus the opposite extreme (small minus large for the size factor or high book-to-market ratio minus low book-to-market ratio for the value factor). Sometimes, subtleties include forming bivariate sorts and aggregating several portfolios together, as in the original contribution of Fama & French (1993). The most common factors are listed below, along with a few references. We refer to the books listed at the beginning of the chapter for a more exhaustive treatment of factor idiosyncrasies. For most anomalies, theoretical justifications have been brought forward, whether risk-based or behavioral. We list the most frequently cited factors below:

  • Size (SMB = small firms minus large firms): Banz (1981), Fama & French (1992), Fama & French (1993), Van Dijk (2011), Asness et al. (2018) and Astakhov et al. (2019).

  • Value (HML = high minus low: undervalued minus growth firms): Fama & French (1992), Fama & French (1993), Asness & Frazzini (2013).

  • Momentum (WML = winners minus losers): Jegadeesh & Titman (1993), Carhart (1997) and Asness & Frazzini (2013). The winners are the assets that have experienced the highest returns over the last year (sometimes the computation of the return is truncated to omit the last month). Cross-sectional momentum is linked, but not equivalent, to time series momentum (trend following), see e.g., Moskowitz et al. (2012) and Lempérière et al. (2014). Momentum is also related to contrarian movements that occur both at higher and lower frequencies (short-term and long-term reversals), see Luo et al. (2021).

  • Profitability (RMW = robust minus weak profits): Fama & French (2015), Bouchaud et al. (2019). In the former reference, profitability is measured as (revenues - (cost and expenses))/equity.

  • Investment (CMA = conservative minus aggressive): Fama & French (2015), Hou et al. (2015). Investment is measured via the growth of total assets (divided by total assets). Aggressive firms are those that experience the largest growth in assets.

  • Low risk (sometimes, BAB = betting against beta): Ang et al. (2006), Baker et al. (2011), Frazzini & Pedersen (2014), Boloorforoosh et al. (2020), Baker et al. (2020) and Asness et al. (2020). In this case, the computation of risk changes from one article to the other (simple volatility, market beta, idiosyncratic volatility, etc.).

  • Green (GMB = green minus brown): Pástor et al. (2021), Lioui & Tarelli (2022), see also Coqueret (2022) for an early review on sustainable equity investing. This subject, the greenium, has become so vast that it is impossible to review thoroughly.

With the notable exception of the low risk premium and the greenium, the most mainstream anomalies are kept and updated in the data library of Kenneth French. Of course, the computation of the factors follows a particular set of rules, but they are generally accepted in the academic sphere. Another source of data is the AQR repository. Finally, one recent and very popular repository is that of the Open Source Asset Pricing project that regularly updates the work of Chen & Zimmermann (2022).
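The generic sort-based construction described at the beginning of this subsection can be sketched in a few lines on simulated data. The example below builds a size-style long-short (small-minus-big) series; all magnitudes are made up, and real factor construction involves refinements (e.g., breakpoints and bivariate sorts) that are omitted here:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 100, 60
cap = rng.lognormal(mean=10, sigma=1, size=N)    # market capitalizations (static here)
ret = rng.normal(0.01, 0.08, size=(T, N))        # monthly stock returns
small = cap <= np.median(cap)                    # bottom half of the cross-section
# long the small firms, short the large ones, both legs equally weighted
smb = ret[:, small].mean(axis=1) - ret[:, ~small].mean(axis=1)
print(smb.mean(), smb.std())                     # one long-short return per month
```

Since no size premium is built into the simulated returns, the average of this toy factor should hover around zero.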

In the dataset we use for the book, we proxy the value anomaly not with the book-to-market ratio but with the price-to-book ratio (the book value is located in the denominator). As is shown in Asness & Frazzini (2013), the choice (and timing) of the variable for value can have sizable effects.

We first import the libraries used throughout the chapter.

# Required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import statsmodels.api as sm
from myst_nb import glue
from statsmodels.graphics.tsaplots import plot_acf

# Display settings
pd.set_option('display.max_columns', 10)
plt.style.use('seaborn-v0_8-whitegrid')

# Building the data
from data_build import generate_data
from data_build import get_significance
from data_build import compute_size_portfolios
data_ml, features, features_short, returns, stock_ids, stock_ids_short = generate_data()

Below, we import data from Ken French’s data library. We will use it later on in the chapter.

min_date = pd.to_datetime('1963-07-01')
max_date = pd.to_datetime('2024-03-01')
ff_url = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp"
ff_url += "/F-F_Research_Data_5_Factors_2x3_CSV.zip"
# Create the download url
df_ff = pd.read_csv(ff_url, sep=',', skiprows=3, quotechar='"')
df_ff.rename(columns = {'Unnamed: 0':'date'},
             inplace = True) # renaming for clarity
df_ff.rename(columns = {'Mkt-RF':'MKT_RF'},
             inplace = True) # renaming for clarity
df_ff = df_ff.iloc[0:743, :]  # keep the monthly block only (annual data is appended below it)
df_ff['date'] = pd.to_datetime(df_ff['date'], format='%Y%m')  
df_ff["date"] = df_ff["date"] + pd.DateOffset(months=1) - pd.Timedelta(days=1)       
df_ff[['MKT_RF','SMB','HML','RMW','CMA','RF']] = df_ff[['MKT_RF','SMB','HML','RMW','CMA','RF']].apply(pd.to_numeric)
df_ff[['MKT_RF','SMB','HML','RMW','CMA','RF']]=df_ff[
    ['MKT_RF','SMB','HML','RMW','CMA','RF']].values/100.0 # Scale returns
idx_ff=df_ff.index[(df_ff['date']>=min_date)&(
    df_ff['date']<=max_date)].tolist()
FF_factors=df_ff.iloc[idx_ff].copy()
FF_factors.loc[:,'year']=FF_factors.date.astype(str).str[:4]
FF_factors.iloc[1:6,0:7].head()

Figure 3.2: Snapshot of Fama-French factor returns. Source: Ken French library.


3.2.2 Theoretical foundations

After the discovery of these stylized facts, some contributions have aimed at building theoretical models that capture these properties. Theorizing a posteriori is sometimes called HARKing, i.e., Hypothesizing After the Results are Known (see Kerr (1998) and Hollenbeck & Wright (2017)). We cite a handful below:

  • size and value: Berk et al. (1999), Daniel et al. (2001), Barberis & Shleifer (2003), Gomes et al. (2003), Carlson et al. (2004), Arnott et al. (2014);

  • momentum: Johnson (2002), Grinblatt & Han (2005), Vayanos & Woolley (2013), Choi & Kim (2014).

In addition, bridges have also been built between risk-based factor representations and behavioural theories. We refer essentially to Barberis et al. (2016) and Daniel et al. (2020) and the references therein.

The individual attributes of investors who allocate towards particular factors are a blossoming topic. We list a few references below, even though they lie somewhat outside the scope of this book. Betermier et al. (2017) show that value investors are older, wealthier and face lower income risk than growth investors, who are those in the best position to take financial risks. Cronqvist et al. (2015) reach different conclusions: they find that the propensity to invest in value versus growth assets has roots in genetics and in life events (the latter effect being confirmed in Cocco et al. (2021), and the former being further detailed in a more general context in Cronqvist et al. (2015)). Psychological traits can also explain some factors: when agents extrapolate, they are likely to fuel momentum (this topic is thoroughly reviewed in Barberis (2018)). Micro- and macro-economic consequences of these preferences are detailed in Bhamra & Uppal (2019). To conclude this paragraph, we mention that theoretical models have also been proposed that link agents’ preferences and beliefs (via prospect theory) to market anomalies (see for instance Du et al. (2020)).

3.2.3 Time-varying premia and anomaly dynamics

While these factors (i.e., long-short portfolios) exhibit time-varying risk premia and are magnified by corporate news and announcements (Engelberg et al. (2018)), it is well-documented (and accepted) that they deliver positive returns over long horizons. We refer to Gagliardini et al. (2016) and to the survey Gagliardini et al. (2019), as well as to the related bibliography for technical details on estimation procedures of risk premia and the corresponding empirical results. A large sample study that documents regime changes in factor premia was also carried out by Ilmanen et al. (2021). Moreover, the predictability of returns is also time-varying (as documented in Farmer et al. (2023), Tsiakas et al. (2020) and Liu et al. (2020)), and estimation methods can be improved (Johnson (2019)).

In Figure 3.3, we plot the average monthly return aggregated over each calendar year for five common factors and the risk-free rate. The risk-free rate (which is not a factor per se) is logically the most stable series, while the market factor (aggregate market returns minus the risk-free rate) is the most volatile. This makes sense because it is the only long-only equity factor among the five, hence there is no short leg to hedge the long one.

# Plot average annual returns for Fama-French factors
FF_annual = FF_factors.copy()
FF_annual['year'] = FF_annual['date'].dt.year

# Melt to long format
factors_to_plot = ['MKT_RF', 'SMB', 'HML', 'RMW', 'CMA', 'RF']
FF_long = FF_annual.melt(
    id_vars=['year'], 
    value_vars=factors_to_plot,
    var_name='factor', 
    value_name='return'
)

# Compute annual average
FF_avg = FF_long.groupby(['year', 'factor'])['return'].mean().reset_index()

# Plot
fig, ax = plt.subplots(figsize=(12, 5))
for factor in factors_to_plot:
    subset = FF_avg[FF_avg['factor'] == factor]
    ax.plot(subset['year'], subset['return'], label=factor)

ax.set_xlabel('Year')
ax.set_ylabel('Average Monthly Return')
ax.set_title('Average Returns of Common Anomalies')
ax.legend()
plt.tight_layout()
plt.show()

Figure 3.3: Average returns of common anomalies plus the risk-free rate (1963-2024). Source: Ken French library.


Anomalies also differ in their dynamic nature. Binsbergen et al. (2023) classify anomalies into two types: build-up anomalies (e.g., momentum, profitability) that exacerbate mispricing over time, and resolution anomalies (e.g., value, size, investment) that correct it. This taxonomy has practical implications: build-up anomalies are driven by overreaction and may contribute to real capital misallocation, while resolution anomalies are self-correcting.

3.2.4 Fading anomalies and replicability

Some researchers document fading anomalies caused by publication: once an anomaly becomes public, agents invest in it, which pushes prices up, and the anomaly disappears. McLean & Pontiff (2016) document this effect in the US, but Jacobs & Müller (2020) find that all other countries experience sustained post-publication factor returns. With a different methodology, Chen & Zimmermann (2020) introduce a publication bias adjustment for returns and note that this (negative) adjustment is in fact rather small. Penasse (2022) advocates the notion of alpha decay to study the persistence or attenuation of anomalies.

The destruction of factor premia may be due to herding (Krkoska & Schenk-Hoppé (2019), Volpati et al. (2020)) and could be accelerated by the democratization of so-called smart-beta products (ETFs notably) that allow investors to invest directly in particular styles (value, low volatility, etc.). For a theoretical perspective on the attractiveness of factor investing, we refer to Jin (2022). On the other hand, DeMiguel et al. (2025) argue that the price impact of crowding in the smart-beta universe is mitigated by trading diversification stemming from external institutions that trade according to strategies outside this space (e.g., high frequency traders betting via order-book algorithms).

Finally, we highlight the need for replicability of factor premia and echo the recent editorial by Harvey et al. (2020). As is shown by Linnainmaa & Roberts (2018) and Hou et al. (2020), many proclaimed factors are in fact very much data-dependent and often fail to deliver sustained profitability when the investment universe is altered or when the definition of the sorting variable changes (Asness & Frazzini (2013)). Nevertheless, Chen & Zimmermann (2020) and Chen & Zimmermann (2022) challenge the idea of low replicability, even if out-of-sample evaluation and transaction costs strongly curtail performance (Chen & Velikov (2023)). All in all, as Chen et al. (2025) argue, whether published anomalies fare better than pure data mining remains an open question.

A methodological contribution that sheds light on which characteristics truly matter is provided by Seyfi (2025), who flips the standard approach: instead of sorting stocks by characteristics and measuring returns, the author sorts by realized returns and then identifies which characteristics best replicate those return-sorted portfolios out-of-sample. This “reverse engineering” approach theoretically comes closest to the unknown optimal sorting on SDF exposures and empirically finds that price-based characteristics and their interactions are the strongest predictors — outperforming prominent ML methods.

Campbell Harvey and his co-authors, in a series of papers, have tried to synthesize the research on factors in Harvey et al. (2016), Harvey & Liu (2017) and Harvey & Liu (2017). This work underlines the need to set high bars for an anomaly to be called a ‘true’ factor. Increasing thresholds for $p$-values is only a partial answer (critiqued in Chen (2025)), as it is always possible to resort to data snooping in order to find an optimized strategy that will fail out-of-sample but that will deliver a $t$-statistic larger than three (or even four). Harvey (2017) recommends resorting to a Bayesian approach which blends data-based significance with a prior into a so-called Bayesianized $p$-value (see the subsection below).

Following this work, researchers have continued to explore the richness of this zoo. Bryzgalova (2019) proposes a tractable Bayesian estimation of large-dimensional factor models and evaluates all possible combinations of more than 50 factors, yielding an incredibly large number of coefficients. Combined with a Fama & MacBeth (1973) procedure, this makes it possible to distinguish between pervasive and superfluous factors. Chordia et al. (2020) use simulations of 2 million trading strategies to estimate the rate of false discoveries, that is, the frequency at which a spurious factor is detected (type I error). They also advise using thresholds for $t$-statistics well above three. In a similar vein, Harvey & Liu (2020) underline that true anomalies may sometimes be missed because of a one-time $t$-statistic that is too low (type II error).

The propensity of journals to publish positive results has led researchers to estimate the difference between reported returns and true returns. Chen (2021) calls this difference the publication bias and estimates it at roughly 12%. That is, if a published average return is 8%, the actual value may in fact be closer to (1-12%)×8% ≈ 7%. Qualitatively, this 12% estimate is smaller than the out-of-sample reduction in returns found in McLean & Pontiff (2016).

3.3 Detecting and testing anomalies

3.3.1 Challenges

Obviously, a crucial step is to be able to identify an anomaly, and the complexity of this task should not be underestimated. Given the publication bias towards positive results (see, e.g., Harvey (2017) in financial economics), researchers are often tempted to report partial results that are sometimes invalidated by further studies. The need for replication is therefore high, and many findings do not survive subsequent scrutiny (Linnainmaa & Roberts (2018), Johannesson et al. (2023)), especially if transaction costs are taken into account (Patton & Weller (2020), Chen & Velikov (2023)). Nevertheless, as is demonstrated by Chen (2021), $p$-hacking alone cannot account for all the anomalies documented in the literature. One way to reduce the risk of spurious detection is to raise the hurdles (often, the $t$-statistics), though the debate is still ongoing (Harvey et al. (2016), Chen (2025)); another is to resort to multiple testing (Harvey et al. (2020), Vincent et al. (2020)).

The remainder of this subsection is inspired by Baker et al. (2017) and Harvey & Liu (2017).

3.3.2 Simple portfolio sorts

This is the most common procedure and the one used in Fama & French (1992). The idea is simple. On one date,

  1. rank firms according to a particular criterion (e.g., size, book-to-market ratio);

  2. form $J\ge 2$ portfolios (i.e., homogeneous groups) consisting of the same number of stocks according to the ranking (usually, $J=2$, $J=3$, $J=5$ or $J=10$ portfolios are built, based on the median, terciles, quintiles or deciles of the criterion);

  3. the weight of stocks inside the portfolio is either uniform (equal weights), or proportional to market capitalization;

  4. at a future date (usually one month), report the returns of the portfolios. Then, iterate the procedure until the chronological end of the sample is reached.

The outcome is a time series of portfolio returns $r_t^j$ for each grouping $j$. An anomaly is identified if the $t$-test between the first ($j=1$) and the last group ($j=J$) unveils a significant difference in average returns. More robust tests are described in Cattaneo et al. (2020). A strong limitation of this approach is that the sorting criterion could have a non-monotonic impact on returns, which a test based on the two extreme portfolios would not detect. Several articles address this concern: Patton & Timmermann (2010) and Romano & Wolf (2013) for instance. Another concern is that these sorted portfolios may capture not only the priced risk associated with the characteristic but also some unpriced risk. Daniel et al. (2020) show that it is possible to disentangle the two and make the most of altered sorted portfolios. We recall a lesser known fact: average returns of sorted long-short portfolios can be viewed as estimates from linear regressions without intercepts (see page 326 of Fama (1976) and the Appendix of Freyberger et al. (2020)).
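The sorting steps and the final $t$-test between extreme portfolios can be sketched as follows on simulated data with a built-in premium (all parameter values are made up; the characteristic is kept static for simplicity):

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(7)
N, T, J = 200, 240, 5
signal = rng.normal(size=N)                  # the sorting characteristic (static here)
# build in a premium: low-signal stocks earn more on average
ret = rng.normal(0.01, 0.06, size=(T, N)) - 0.002 * signal

bucket = pd.qcut(signal, J, labels=False)    # steps 1-2: rank and form J groups
# steps 3-4: equal weights inside each group, one return per month and group
port = np.vstack([ret[:, bucket == j].mean(axis=1) for j in range(J)]).T
# t-test on the long-short series (first group minus last group)
tstat, pval = stats.ttest_1samp(port[:, 0] - port[:, J - 1], 0.0)
print(round(tstat, 2), pval)
```

Here the premium is simulated, so the test flags the "anomaly"; on real data, the same machinery is applied to rolling re-sorted portfolios.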

Instead of focusing on only one criterion, it is possible to group assets according to more characteristics. The original paper Fama & French (1992) also combines market capitalization with book-to-market ratios. Each characteristic is divided into 10 buckets, which makes 100 portfolios in total. Beyond data availability, there is no upper bound on the number of features that can be included in the sorting process. In fact, some authors investigate more complex sorting algorithms that can manage a potentially large number of characteristics (see e.g., Feng et al. (2020) and Bryzgalova et al. (2023)).

Finally, we refer to Ledoit et al. (2020) for refinements that take into account the covariance structure of asset returns and to Cattaneo et al. (2020) for a theoretical study on the statistical properties of the sorting procedure (including theoretical links with regression-based approaches). Notably, the latter paper discusses the optimal number of portfolios and suggests that it is probably larger than the usual 10 often used in the literature.

More recently, Hoechle et al. (2025) propose a Generalized Portfolio Sorts (GPS) framework that nests all conventional sort variants but can additionally separate a characteristic’s genuine predictive power from persistent firm-level heterogeneity (e.g., permanent differences across industries). Applying GPS to a large set of predictors, nearly half of the anomalies documented in the literature lose their statistical significance once this heterogeneity is accounted for — a sobering finding for factor-based strategies.

In a different direction, Liu & others (2026) argue that the standard practice of sorting stocks by point predictions discards valuable information about prediction uncertainty. They propose uncertainty-adjusted sorting, in which long positions are ranked by optimistic (upper-bound) predictions and short positions by pessimistic (lower-bound) ones. This approach systematically improves Sharpe ratios relative to standard sorts, with the gains concentrated in stocks for which prediction uncertainty is highest.

In the code and Figure 3.4 below, we compute size portfolios (equally weighted: above versus below the median capitalization). According to the size anomaly, the firms with below median market cap should earn higher returns on average. This is verified whenever the orange bar in the plot is above the blue one (it happens most of the time).

Next, we compute and plot the annual returns of the size anomaly (small-minus-big).

size_returns = compute_size_portfolios(data_ml)

# Plot
fig, ax = plt.subplots(figsize=(10, 5))
size_pivot = size_returns.pivot(index='year', columns='large', values='R1M')
size_pivot.plot(kind='bar', ax=ax, color=['#F87E1F', '#0570EA'], width=0.8)
ax.set_xlabel('Year')
ax.set_ylabel('Average Return')
ax.legend(loc='upper right')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Figure 3.4: The size factor: average returns of smaller versus larger firms.


3.3.3 Predictive regressions, sorts, and p-value issues

For simplicity, we assume a simple form for stock returns (written in compact form):

$$\textbf{r} = a+b\textbf{x}+\textbf{e},$$

where the vector $\textbf{r}$ stacks all returns of all stocks and $\textbf{x}$ is a lagged variable (e.g., a factor) so that the regression is indeed predictive. If the estimated $\hat{b}$ is significant given a specified confidence threshold, then it can be tempting to conclude that $\textbf{x}$ does a good job at predicting returns. Hence, long-short portfolios related to extreme values of $\textbf{x}$ (mind the sign of $\hat{b}$) are expected to generate profits. This is unfortunately often false because $\hat{b}$ only gives information on the past ability of $\textbf{x}$ to forecast returns. What happens in the future may be another story. In fact, as we discuss later, it is mostly correlation that can be captured from financial data, whereas one would ideally prefer to infer causality (see, e.g., Prado & Zoonekynd (2026)).

Statistical tests are also used for portfolio sorts. Assume two extreme portfolios are expected to yield very different average returns (like very small cap versus very large cap, or strong winners versus bad losers). The portfolio returns are written $r_t^+$ and $r_t^-$. The simplest test for the mean is $t=\sqrt{T}\frac{m_{r_+}-m_{r_-}}{\sigma_{r_+-r_-}}$, where $T$ is the number of points, $m_{r_\pm}$ denotes the mean return of each series, and $\sigma_{r_+-r_-}$ is the standard deviation of the difference between the two series, i.e., the volatility of the long-short portfolio. In short, the statistic can be viewed as a scaled Sharpe ratio (though usually these ratios are computed for long-only portfolios) and can in turn be used to compute $p$-values to assess the robustness of an anomaly. As is shown in Linnainmaa & Roberts (2018) and Hou et al. (2020), many factors discovered by researchers fail to survive out-of-sample tests.

One reason why people are overly optimistic about the anomalies they detect is the widespread reverse interpretation of the $p$-value. Often, it is thought of as the probability of a hypothesis (e.g., my anomaly exists) given the data. In fact, it is the opposite: it is the likelihood of the data sample, given that the hypothesis holds.

$$\begin{align*} p\text{-value} &= P[D|H], \\ \text{target prob.} &= P[H|D]=\frac{P[D|H]}{P[D]}\times P[H], \end{align*}$$

where $H$ stands for hypothesis and $D$ for data. The equality in the second row is a plain application of Bayes’ identity: the interesting probability is in fact a transform of the $p$-value.

Two articles (at least) discuss this idea. Harvey (2017) introduces Bayesianized $p$-values:

$$\text{Bayesianized } p\text{-value}=\text{Bpv}= e^{-t^2/2}\times\frac{\text{prior}}{1+e^{-t^2/2}\times \text{prior}},$$

where $t$ is the $t$-statistic obtained from the regression (i.e., the one that defines the $p$-value) and prior is the analyst’s prior odds that the null hypothesis (no anomaly) is true. The prior is coded as follows: suppose there is a $p$% chance that the null holds (i.e., $(1-p)$% for the anomaly); the odds are then coded as $p/(1-p)$. Thus, if the $t$-statistic is equal to 2 (corresponding to a $p$-value of roughly 5%) and the prior odds are equal to 6, then the Bpv is equal to $e^{-2}\times 6 \times(1+e^{-2}\times 6)^{-1}\approx 0.448$ and there is a 44.8% chance that the null is true. This interpretation stands in sharp contrast with the original $p$-value, which cannot be viewed as a probability that the null holds. Of course, one drawback is that the level of the prior is crucial and solely user-specified.
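The numerical example above can be checked in a few lines (the function name is ours):

```python
import numpy as np

def bayesianized_pvalue(t_stat, prior_odds):
    """Bayesianized p-value in the sense of Harvey (2017), given prior odds of the null."""
    mbf = np.exp(-t_stat**2 / 2)               # exponential term of the formula
    return mbf * prior_odds / (1 + mbf * prior_odds)

bpv = bayesianized_pvalue(2.0, 6.0)
print(round(bpv, 3))  # 0.448
```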

The work of Chinco et al. (2021) is very different but shares some key concepts, like the introduction of Bayesian priors in regression outputs. They show that coercing the predictive regression with an $L^2$ constraint (see the ridge regression in Chapter 5) amounts to introducing views on what the true distribution of $b$ is. The stronger the constraint, the more the estimate $\hat{b}$ will be shrunk towards zero. One key idea in their work is the assumption of a distribution for the true $b$ across many anomalies. It is assumed to be Gaussian and centered. The interesting parameter is the standard deviation: the larger it is, the more frequently significant anomalies are discovered. Notably, the authors show that this parameter changes through time, and we refer to the original paper for more details on this subject.

3.3.4Fama-MacBeth regressions

Another approach (one of the first in fact) was proposed by Fama & MacBeth (1973) through a two-stage regression analysis of risk premia. Take the average of Equation 3.1:

\mathbb{E}[\textbf{r}] = \boldsymbol{\alpha}+\boldsymbol{\beta}\,\mathbb{E}[\textbf{f}],

where the $\boldsymbol{\beta}$ are the loadings and $\boldsymbol{\lambda} = \mathbb{E}[\textbf{f}]$ are the prices of risk, i.e., the risk premia. The first stage of the procedure is a simple estimation of the initial relationship: the regressions 3.1 are run on a stock-by-stock basis over the corresponding time series. The resulting estimates $\hat{\beta}_{i,k}$ are then plugged into a second series of regressions:

\begin{align*} r_{t,n}= \gamma_{t,0} + \sum_{k=1}^K\gamma_{t,k}\hat{\beta}_{n,k} + \varepsilon_{t,n}, \end{align*}

which are run date-by-date on the cross-section of assets. Theoretically, the betas would be known and the regression would be run on the $\beta_{n,k}$ instead of their estimated values. The $\hat{\gamma}_{t,k}$ estimate the premia of factor $k$ at time $t$. Under suitable distributional assumptions on the $\varepsilon_{t,n}$, statistical tests can be performed to determine whether these premia are significant or not. Typically, the statistic on the time-aggregated (average) premia $\hat{\gamma}_k=\frac{1}{T}\sum_{t=1}^T\hat{\gamma}_{t,k}$:

\begin{align*} t_k=\frac{\hat{\gamma}_k}{\hat{\sigma}_k/\sqrt{T}} \end{align*}

is often used (under Gaussian assumptions) to assess whether or not the factor is significant ($\hat{\sigma}_k$ is the standard deviation of the $\hat{\gamma}_{t,k}$). In short:

  • Step 1 (first pass) estimates what risks assets are exposed to;

  • Step 2 (second pass) estimates how the market prices those risks, period by period, and then averages.
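To make the aggregation step concrete, here is a minimal sketch of the time-averaging of premia and the associated $t$-statistics, run on synthetic values (stand-ins for the date-by-date $\hat{\gamma}_{t,k}$; the real estimates are computed later in this section):

```python
import numpy as np
import pandas as pd

# Synthetic date-by-date premia estimates (stand-ins for the gamma_{t,k})
rng = np.random.default_rng(7)
gammas = pd.DataFrame(0.003 + 0.02 * rng.standard_normal((226, 3)),
                      columns=["MKT_RF", "SMB", "HML"])

# Time-aggregated premia and their t-statistics, as in the formula above
T = len(gammas)
avg_premia = gammas.mean()
t_stats = avg_premia / (gammas.std(ddof=1) / np.sqrt(T))
print(t_stats.round(2))
```

The denominator uses the time-series standard deviation of the estimated premia, so noisy (volatile) premia are penalized even when their average is sizable.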

We refer to Jagannathan & Wang (1998) and Petersen (2009) for technical discussions on the biases and losses in accuracy that can be induced by standard ordinary least squares (OLS) estimations. Moreover, as the $\hat\beta_{i,k}$ in the second-pass regression are estimates, a second level of errors can arise (the so-called errors-in-variables problem). The interested reader will find some extensions and solutions in Shanken (1992), Ang et al. (2019) and Jegadeesh et al. (2019). There even exists a three-pass method (see Giglio & Xiu (2021)).

A subtler issue concerns the strength of the factors used in the first pass. Pesaran & Smith (2022) show that the standard two-pass estimator’s properties depend critically on the degree of factor strength — a continuous quantity, not the binary strong/weak dichotomy usually assumed. Empirically, among the five Fama-French factors, only the market factor can be classified as genuinely strong; the others are semi-strong at best, which affects the reliability of the estimated risk premia.

Below, we perform Fama & MacBeth (1973) regressions on our sample. We first build a dedicated dataset.

# Select a subset of stocks for the Fama-MacBeth analysis
stock_ids_short = data_ml['fsym_id'].unique()[:30]

# Prepare data: merge returns with factors
returns_subset = data_ml[data_ml['fsym_id'].isin(stock_ids_short)][['date', 'fsym_id', 'R1M']]
returns_pivot = returns_subset.pivot(index='date', columns='fsym_id', values='R1M')

# Merge with factors
data_FM = returns_pivot.join(FF_factors.set_index('date')[['MKT_RF', 'SMB', 'HML', 'RMW', 'CMA']])

data_FM = data_FM.dropna()

print(f"Data shape for FM regressions: {data_FM.shape}")
Data shape for FM regressions: (226, 35)

Then we move onto the first pass: individual estimation of betas. To do so, we loop across stocks and provide a snapshot of estimates.

# First pass: estimate betas for each stock
factor_cols = ['MKT_RF', 'SMB', 'HML', 'RMW', 'CMA']
betas = {}
for fsym_id in stock_ids_short:
    if fsym_id in data_FM.columns:
        y = data_FM[fsym_id].shift(-1).dropna()  # Next-month returns (shift(-1) is a lead, not a lag)
        X = data_FM[factor_cols].loc[y.index]
        X = sm.add_constant(X)
        # Drop any remaining NaN
        mask = ~(y.isna() | X.isna().any(axis=1))
        y_clean = y[mask]
        X_clean = X[mask]
        if len(y_clean) > 10:  # Minimum number of observations
            model = sm.OLS(y_clean, X_clean).fit()
            betas[fsym_id] = model.params
# Convert to DataFrame & show
betas_df = pd.DataFrame(betas).T
betas_df.columns = ['Constant'] + factor_cols
betas_df.head(6).round(3)

Figure 3.5: Sample of beta values.

In the table, MKT_RF is the market return minus the risk-free rate. The corresponding coefficient is often referred to as the beta, especially in univariate regressions. We then reformat these betas to prepare the second pass. Each line corresponds to one asset: the first five columns are the estimated factor loadings and the remaining ones are the asset returns (date by date).

# Second pass: cross-sectional regressions
loadings = betas_df[factor_cols]
returns_for_FM = returns_pivot[loadings.index].T  # Transpose: stocks as rows, dates as columns

# Run cross-sectional regressions for each date
gammas = {}
for date in returns_for_FM.columns:
    y = returns_for_FM[date].dropna()
    X = loadings.loc[y.index]
    X = sm.add_constant(X)
    if len(y) > 5:  # Minimum observations
        model = sm.OLS(y, X).fit()
        gammas[date] = model.params

# Convert to DataFrame
gammas_df = pd.DataFrame(gammas).T
gammas_df.columns = ['Constant'] + factor_cols
gammas_df.head(6).round(3)

Figure 3.6: Sample of gamma (premia) values.

Visually, the estimated premia are also very volatile. We plot in Figure 3.7 their estimated values for the market, SMB and HML factors.

# Plot time series of gammas (premia)
fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=True)

factors_plot = ['MKT_RF', 'SMB', 'HML']
colors = ['#F87E1F', '#0570EA', '#F81F40']

for ax, factor, color in zip(axes, factors_plot, colors):
    ax.plot(gammas_df.index, gammas_df[factor], color=color, linewidth=0.8)
    ax.set_ylabel(factor)
    ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)

axes[-1].set_xlabel('Date')
fig.suptitle('Time Series of Gammas (Premia) in Fama-MacBeth Regressions', y=1.02)
plt.tight_layout()
plt.show()

Figure 3.7: Time series of gamma values (premia).

3.3.5Factor competition and redundancy

The core purpose of factors is to explain the cross-section of stock returns. For theoretical and practical reasons, it is preferable to avoid redundancies within factors. Indeed, redundancies imply collinearity which is known to perturb estimates (Belsley et al. (2005)). In addition, when asset managers decompose the performance of their returns into factors, overlaps (high absolute correlations) between factors yield exposures that are less interpretable; positive and negative exposures compensate each other spuriously.

A simple protocol to sort out redundant factors is to run regressions of each factor against all others:

f_{t,k} = a_k +\sum_{j\neq k} \delta_{k,j} f_{t,j} + \epsilon_{t,k}.

The interesting metric is then the test statistic associated with the estimation of $a_k$. If $a_k$ is significantly different from zero (i.e., the statistic has magnitude larger than 2 or 3), then the cross-section of (other) factors fails to explain exhaustively the average return of factor $k$. Otherwise, the return of the factor can be captured by exposures to the other factors and is thus redundant.

One mainstream application of this technique was performed in Fama & French (2015), in which the authors show that the HML factor is redundant when taking into account four other factors (Market, SMB, RMW and CMA). Below, we reproduce their analysis on an updated sample. We start our analysis directly with the database maintained by Kenneth French.

We can run the regressions that determine the redundancy of factors via the procedure defined in Equation 3.10.

# Factor competition analysis
factors = ['MKT_RF', 'SMB', 'HML', 'RMW', 'CMA']

# Helper mapping a p-value to significance stars (defined here so the snippet is self-contained)
def get_significance(pval):
    return '***' if pval < 0.001 else '**' if pval < 0.01 else '*' if pval < 0.05 else ''

results = []
for dep_var in factors:
    other_factors = [f for f in factors if f != dep_var]
    y = FF_factors[dep_var]
    X = FF_factors[other_factors]
    X = sm.add_constant(X)
    model = sm.OLS(y, X).fit()
    row = {'Dep. Variable': dep_var}
    coef = model.params['const']
    pval = model.pvalues['const']
    row['Intercept'] = f"{coef:.3f}" + get_significance(pval)
    for factor in factors:
        if factor == dep_var:
            row[factor] = '-'
        else:
            coef = model.params[factor]
            pval = model.pvalues[factor]
            row[factor] = f"{coef:.3f}" + get_significance(pval)
    results.append(row)

We obtain the vector of $a_k$ values from Equation 3.10. Below, we format these figures along with $p$-value thresholds and export them in a summary table (Figure 3.8). The significance levels of the coefficients are coded as follows: $0<(***)<0.001<(**)<0.01<(*)<0.05$.

factor_competition_df = pd.DataFrame(results)
print("Significance: *** p<0.001, ** p<0.01, * p<0.05")
factor_competition_df

Figure 3.8: Factor Competition among the five Fama-French Factors.

Significance: *** p<0.001, ** p<0.01, * p<0.05

We confirm that the HML factor remains redundant when the four others are present in the asset pricing model (its alpha - in the Intercept column - is not significantly different from zero). The figures we obtain are reasonably close to the ones in the original paper (Fama & French (2015)), which makes sense, since we “only” add 15 years to their initial sample.

At a more macro level, researchers also try to figure out which models (i.e., combinations of factors) are the most likely, given the data empirically observed (and possibly given priors formulated by the econometrician). For instance, this stream of literature seeks to quantify to what extent the three-factor model of Fama & French (1993) outperforms the five-factor model of Fama & French (2015). In this direction, De Moor et al. (2015) introduce a novel computation for $p$-values that compares the relative likelihood that two models pass a zero-alpha test. More generally, the Bayesian method of Barillas & Shanken (2018) was subsequently improved by Chib et al. (2020).

Lastly, even the optimal number of factors is a subject of disagreement in recent work. While the traditional literature focuses on a limited number (three to five) of factors, more recent research by DeMiguel et al. (2020), He et al. (2021), Kozak et al. (2020) and Freyberger et al. (2020) advocates the need to use 15 or more. Kozak et al. (2020) provide perhaps the strongest counter-evidence, arguing that “the quest for a sparse characteristics-based factor model is ultimately futile” since such models fail out-of-sample. A middle ground emerges from Kelly et al. (2019), who find that only 10 characteristics (or latent factors) are significant in their IPCA framework, and Swade et al. (2024), who conclude that approximately 15 factors span the zoo: more than traditional models use, but far fewer than hundreds. The synthesis suggests that while extreme factor proliferation reflects data mining (see Chen et al. (2025)), traditional 3-6 factor models likely underfit the true complexity; dimension reduction to 10-15 factors or latent components offers the best balance between overfitting and explanatory power. Green et al. (2017) even find that the number of characteristics that help explain the cross-section of returns varies in time.

Complementing these cross-sectional approaches, O'Doherty et al. (2025) take a different angle by estimating risk premia for 190 candidate macroeconomic factors. They find that more than 40 carry significant premia, and that parsimonious two-factor models pairing the market with a single macro factor (especially those tied to national income accounts or housing) frequently outperform leading multifactor models in explaining CAPM anomalies. This suggests that the factor zoo may partly reflect a disconnect between financial and macroeconomic research traditions.

3.3.6Advanced selection techniques

The ever-increasing number of factors, combined with their importance in asset management, has led researchers to craft more subtle methods in order to “organize” the so-called factor zoo and, more importantly, to detect spurious anomalies and compare different asset pricing model specifications. We list a few of them below.

  • Feng et al. (2020) combine LASSO selection with Fama-MacBeth regressions to test if new factor models are worth it. They quantify the gain of adding one new factor to a set of predefined factors and show that many factors reported in papers published in the 2010 decade do not add much incremental value;

  • Harvey & Liu (2017), in a similar vein, use bootstrap on orthogonalized factors. They make the case that correlation among predictors is a major issue and their method aims at solving this problem. Their lengthy procedure seeks to test whether the maximal additional contribution of a candidate variable is significant;

  • Fama & French (2018) compare asset pricing models through squared maximum Sharpe ratios;

  • Giglio & Xiu (2021) estimate factor risk premia using a three-pass method based on principal component analysis;

  • Pukthuanthong et al. (2018) disentangle priced and non-priced factors via a combination of principal component analysis and Fama & MacBeth (1973) regressions;

  • Gospodinov et al. (2019) warn against factor misspecification (when spurious factors are included in the list of regressors). Traded factors (resp., macro-economic factors) seem more likely (resp., less likely) to yield robust identifications (see also Bryzgalova (2019));

  • Harvey et al. (2026) revisit the multiple hypothesis testing problem in factor model evaluation and reconcile contradictory findings about the replication crisis. Proper inference requires accounting for dependence across tests, correctly specifying the null distribution, and mitigating sample-selection bias; the authors advocate a tt-statistic cutoff of at least 3.0 and the use of local False Discovery Rates;

  • Da et al. (2024) investigate the statistical limits of arbitrage in high dimensions. When alphas are weak and rare, estimation errors prevent even optimally-equipped ML arbitrageurs from fully exploiting pricing errors, creating a significant gap between the feasible and theoretical maximum Sharpe ratios. This result provides a theoretical ceiling on what ML-based strategies can achieve.

There is obviously no infallible method, but the number of contributions in the field highlights the need for robustness. This is evidently a major concern when crafting investment decisions based on factor intuitions. One major hurdle for short-term strategies is the likely time-varying feature of factors. We refer for instance to Ang & Kristensen (2012) and Cooper & Maio (2019) for practical results and to Gagliardini et al. (2016) and S. Ma et al. (2020) for more theoretical treatments (with additional empirical results).

3.4Factors or characteristics?

The decomposition of returns into linear factor models is convenient because of its simple interpretation. There is nonetheless a debate in the academic literature about whether firm returns are indeed explained by exposure to macro-economic factors or simply by the characteristics of firms. In their early study, Lakonishok et al. (1994) argue that one explanation of the value premium comes from incorrect extrapolation of past earnings growth rates: investors are overly optimistic about firms that have recently been profitable. Consequently, future returns are (also) driven by the core (accounting) features of the firms. The question is then to disentangle which effect is the most pronounced when explaining returns: characteristics versus exposures to macro-economic factors.

In their seminal contribution on this topic, Daniel & Titman (1997) provide evidence in favour of the former (two follow-up papers are K. Daniel et al. (2001) and Daniel & Titman (2012)). They show that firms with high book-to-market ratios or small capitalizations display higher average returns, even if they are negatively loaded on the HML or SMB factors. Therefore, it seems that it is indeed the intrinsic characteristics that matter, and not the factor exposure. For further material on characteristics’ role in return explanation or prediction, we refer to the following contributions:

  • Section 2.5.2. in Goyal (2012) surveys pre-2010 results on this topic;

  • Chordia et al. (2017) find that characteristics explain a larger proportion of variation in estimated expected returns than factor loadings;

  • Kozak et al. (2018) reconcile factor-based explanations of premia to a theoretical model in which some agents’ demands are sentiment driven;

  • Han et al. (2022) show with penalized regressions that 20 to 30 characteristics (out of 94) are useful for the prediction of monthly returns of US stocks. Their methodology is interesting: they regress returns against characteristics to build forecasts and then regress the returns on the forecast to assess if they are reliable. The latter regression uses a LASSO-type penalization (see Chapter 5) so that useless characteristics are excluded from the model. The penalization is extended to elasticnet in Rapach & Zhou (2021).

  • Kelly et al. (2019) and Kim et al. (2021) both estimate models in which factors are latent but loadings (betas) and possibly alphas depend on characteristics. Kirby (2021) generalizes the first approach by introducing regime-switching. In contrast, Lettau & Pelger (2020) and Lettau & Pelger (2020) estimate latent factors without any link to particular characteristics (and provide large sample asymptotic properties of their methods).

  • In the same vein as Hoechle et al. (2020), Gospodinov et al. (2019) and Bryzgalova (2019) discuss potential errors that arise when working with portfolio sorts that yield long-short returns. The authors show that in some cases, tests based on this procedure may be deceitful. This happens when the characteristic chosen to perform the sort is correlated with an external (unobservable) factor. They propose a novel regression-based approach aimed at bypassing this problem.

  • Even Fama & French (2020) conclude: “time-series models that use only cross-section factors provide better descriptions of average returns than time-series models that use time-series factors”.

More recently and in a separate stream of literature, Koijen et al. (2019) have introduced a demand model in which investors form their portfolios according to their preferences towards particular firm characteristics. They show that this allows them to mimic the portfolios of large institutional investors. In their model, aggregate demands (and hence, prices) are directly linked to characteristics, not to factors. In a follow-up paper, Koijen et al. (2019) show that a few sets of characteristics suffice to predict future returns. They also show that, based on institutional holdings from the UK and the US, the largest investors are those who are the most influential in the formation of prices. In a similar vein, Betermier et al. (2020) derive an elegant (theoretical) general equilibrium model that generates some well-documented anomalies (size, book-to-market). The models of Arnott et al. (2014) and Alti & Titman (2019) are also able to theoretically generate known anomalies. Finally, in Martin & Fu (2019), characteristics influence returns via the role they play in the predictability of dividend growth. This paper discusses the asymptotic case when the number of assets and the number of characteristics are proportional and both increase to infinity.

Several recent contributions push the characteristics-based view further. Ge et al. (2023) propose a semi-parametric model with B-spline sieves and use hierarchical clustering to identify “peer groups” of characteristics that lead to similar arbitrage returns, revealing structure in the mispricing component. In a different vein, Cartea et al. (2025) introduce a self-supervised tabular transformer that compresses over 400 firm characteristics into low-dimensional embeddings capturing both cross-sectional and temporal patterns. These embeddings yield similarity-based strategies that outperform standard benchmarks, suggesting that the relevant information in characteristics may be better captured by learned representations than by hand-picked subsets.

3.5Demand-based asset pricing

As introduced at the beginning of this chapter, the demand-system approach of Koijen & Yogo (2019) provides a theoretical foundation linking firm characteristics to equilibrium prices through portfolio demand. This framework has generated substantial follow-up work that we review here.

Theoretical foundations. Fuchs et al. (2025) show that the original framework does not adequately account for cross-asset complementarities in portfolio choice, leading to significantly biased demand elasticity estimates. Measured elasticities can appear close to one even when the true values are near-infinite, which helps reconcile the striking gap between low demand-system estimates and higher theoretical benchmarks from other approaches.

Identification challenges. Binsbergen et al. (2025) establish strict necessary conditions for valid identification of demand elasticities in dynamic markets. Instruments borrowed from static industrial organization models introduce large biases when applied to financial assets. Valid instruments must induce price variation lasting at most one trading period and be unanticipated ex ante — a demanding requirement that calls into question many existing empirical estimates.

Extensions. Gehrig et al. (2024) generalize the framework beyond log-utility to constant absolute and constant relative risk aversion preferences. They also introduce a shrinkage device that pulls portfolio weights toward a predetermined policy portfolio, reducing parameter uncertainty and outperforming both the parametric approach and naive 1/N1/N strategies empirically.

These developments show that while the demand-system approach offers a compelling structural link between characteristics and prices, its empirical implementation remains delicate. The interplay between investor heterogeneity, preference specification, and identification strategy is an active and rapidly evolving research frontier.

3.6Factor dynamics: momentum, timing, and ESG

3.6.1Factor momentum

A recent body of literature unveils a time series momentum property of factor returns. For instance, Gupta & Kelly (2019) report that autocorrelation patterns within these returns are statistically significant. Autocorrelation in aggregate/portfolio returns is a widely documented effect since the seminal paper of Lo & MacKinlay (1990) (see also Moskowitz et al. (2012)). Similar results are obtained in Falck et al. (2022). In the same vein, Arnott et al. (2023) make the case that the industry momentum found in Moskowitz & Grinblatt (1999) can in fact be explained by this factor momentum. Going even further, Ehsani & Linnainmaa (2022) conclude that the original momentum factor is in fact the aggregation of the autocorrelation that can be found in all other factors. Recently, the strength of factor momentum has been scrutinized by Fan et al. (2022). The authors find that it is only robust for a small number of factors. Lastly, Garcia et al. (2021) document factor momentum at the daily frequency.

Acknowledging the profitability of factor momentum, Yang (2020) seeks to understand its source and decomposes stock factor momentum portfolios into two components: factor timing portfolio and a static portfolio. The former seeks to profit from the serial correlations of factor returns while the latter tries to harness factor premia. The author shows that it is the static portfolio that explains the larger portion of factor momentum returns. In Yang (2020), the same author presents a new estimator to gauge factor momentum predictability. Words of caution are provided in Leippold & Yang (2022).

Given the data obtained on Ken French’s website, we compute the autocorrelation function (ACF) of factors.

\text{ACF}_k(\textbf{x}_t)=\frac{\mathbb{E}[(\textbf{x}_t-\bar{\textbf{x}})(\textbf{x}_{t+k}-\bar{\textbf{x}})]}{\mathbb{E}[(\textbf{x}_t-\bar{\textbf{x}})^2]}.
# ACF plots for factor returns
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

factors_acf = ['SMB', 'HML', 'RMW', 'CMA']

for ax, factor in zip(axes.flatten(), factors_acf):
    plot_acf(FF_factors[factor].dropna(), ax=ax, zero=False, lags=10, title=f'ACF - {factor}')
    ax.set_ylim(-0.08, 0.2)

plt.tight_layout()
plt.show()
Figure 3.4: Autocorrelograms of common factor portfolios.

Of the four chosen series, only the size factor is not significantly autocorrelated at the first order.
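The coefficients displayed in these autocorrelograms can be computed directly from the definition (the lag-$k$ autocovariance scaled by the variance, which is what `plot_acf` displays); a small sketch on a synthetic persistent series (not the factor data):

```python
import numpy as np

# Synthetic AR(1) series with mild persistence, mimicking monthly factor returns
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.2 * x[t - 1] + rng.standard_normal()

def acf(series, k):
    # Lag-k autocorrelation: autocovariance divided by the variance
    c = series - series.mean()
    return np.sum(c[:-k] * c[k:]) / np.sum(c**2)

print([round(acf(x, k), 3) for k in (1, 2, 3)])
```

For an AR(1) process, the lag-1 coefficient should be close to the autoregressive parameter (0.2 here) and subsequent lags should decay geometrically, which is the pattern factor momentum studies look for in factor returns.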

3.6.2Factor timing

Given the abundance of evidence of the time-varying nature of factor premia, it is legitimate to wonder if it is possible to predict when factors will perform well or badly. The evidence on the effectiveness of timing is diverse:

  • positive for Greenwood & Hanson (2012), Hodges et al. (2017), Hasler et al. (2019), Haddad et al. (2020), Lioui & Tarelli (2023), Neuhierl et al. (2023) and Lehnherr et al. (2025);

  • negative for Asness et al. (2017); and

  • mixed for Dichtl et al. (2019).

We nevertheless underline that the dominant share of favorable findings may be due to the publication bias towards positive results. There is no consensus on which predictors to use (general macroeconomic indicators in Hodges et al. (2017), stock issuances versus repurchases in Greenwood & Hanson (2012), and aggregate fundamental data in Dichtl et al. (2019)). In fact, it might be best to consider many variables (Kagkadis et al. (2024)).

A method for building reasonable timing strategies for long-only portfolios with sustainable transaction costs is laid out in Leippold & Rüegg (2022). In ML-based factor investing, it is possible to resort to more granularity by combining firm-specific attributes with macro-economic data, as we explain in Section 4.8.2.

On the question of whether any strategy — traditional or ML-based — can reliably time factors, Allen et al. (2026) provide a comprehensive evaluation over 1970–2024. They find that no strategy delivers robust unconditional alpha relative to buy-and-hold, consistent with market efficiency. However, conditioning on an “ecology vector” capturing crowding, funding, liquidity, and volatility reveals systematic state-dependent performance variation, supporting the Adaptive Markets Hypothesis of Lo (2004). This suggests that factor timing may work, but only intermittently and for agents who can identify the right market regime.

3.6.3The green factors

The demand for ethical financial products has sharply risen during the 2010 decade, leading to the creation of numerous funds dedicated to socially responsible investing (SRI - see Camilleri (2020)). Though this phenomenon is not really new (Schueth (2003), Hill et al. (2007)), its acceleration has prompted research about whether or not characteristics related to ESG criteria (environment, social, governance) are priced.

Dozens and even possibly hundreds of papers have been devoted to this question, but no consensus has been reached. More and more, researchers study the financial impact of climate change (see Bernstein et al. (2019), Hong et al. (2019) and Hong et al. (2020)) and the societal push for responsible corporate behavior (Fabozzi (2020), Kurtz (2020)). We gather below a very short list of early papers that suggests conflicting results (see Coqueret (2022) for a more detailed account):

  • favorable: ESG investing works (Kempf & Osthoff (2007), Cheema-Fox et al. (2020)), can work (Nagy et al. (2016), Alessandrini & Jondeau (2020)), or can at least be rendered efficient (Branch & Cai (2012)). A large meta-study reports overwhelming favorable results (Friede et al. (2015)), but of course, they could well stem from the publication bias towards positive results.

  • unfavorable: Ethical investing is not profitable according to Adler & Kritzman (2008) and Blitz & Swinkels (2020). An ESG factor should be long unethical firms and short ethical ones (Lioui (2022)).

  • mixed: ESG investing may be beneficial globally but not locally (Chakrabarti & Sen (2020)). Portfolios relying on ESG screening do not significantly outperform those with no screening but are subject to lower levels of volatility (Gibson et al. (2023), Gougler & Utz (2020)). As is often the case, the devil is in the details, and results depend on whether to use E, S or G (Bruder et al. (2019)).

On top of these contradicting results, several articles point towards complexities in the measurement of ESG. Depending on the chosen criteria and on the data provider, results can change drastically, a phenomenon known as ESG confusion or ESG divergence (see Galema et al. (2008), Atta-Darkua et al. (2020), Dimson et al. (2020), Avramov et al. (2022) and Berg et al. (2022)). Two large-scale studies (Beyer & Bauckloh (2024) and Alves et al. (2025)) show that the performance of ESG-related portfolios also varies across time, geographies, sectors, etc. But overall, it is hard to conclude that sustainability is priced. We end this short section by noting that, of course, ESG criteria can directly be integrated into ML models, as is for instance done in Franco et al. (2020), and they can also be complemented by traditional firm characteristics (Coqueret et al. (2022)).

3.7Machine learning meets asset pricing

3.7.1The canonical predictive model

Given the exponential increase in data availability, the obvious temptation of any asset manager is to try to infer future returns from the abundance of attributes available at the firm level. We allude to classical data like accounting ratios, price-based measures, risk proxies, sustainability metrics, analyst ratings, but also to alternative data, such as sentiment stemming from social media coverage. This task is precisely the aim of this book. Given a large set of predictor variables ($\mathbf{X}$), the goal is to predict a proxy for future performance $\mathbf{y}$ through a model of the form (2.1).

Some attempts in this direction have already been made through portfolio optimization (e.g., Brandt et al. (2009), Hjalmarsson & Manchev (2012), Ammann et al. (2016), DeMiguel et al. (2020)), but originally without any ML intent or focus. In retrospect, these approaches do share some links with ML tools. The general formulation is the following. At time $T$, the agent or investor seeks to solve the following program:

\begin{align*}
\underset{\boldsymbol{\theta}_T}{\max} \ \mathbb{E}_T\left[ u(r_{p,T+1})\right] = \underset{\boldsymbol{\theta}_T}{\max} \ \mathbb{E}_T\left[ u\left(\left(\bar{\textbf{w}}_T+\textbf{x}_T\boldsymbol{\theta}_T\right)'\textbf{r}_{T+1}\right)\right],
\end{align*}

where $u$ is some utility function and $r_{p,T+1}=\left(\bar{\textbf{w}}_T+\textbf{x}_T\boldsymbol{\theta}_T\right)'\textbf{r}_{T+1}$ is the return of the portfolio, which is defined as a benchmark $\bar{\textbf{w}}_T$ plus some deviations from this benchmark that are a linear function of the features, $\textbf{x}_T\boldsymbol{\theta}_T$. The above program may be subject to some external constraints (e.g., to limit leverage).

In practice, the vector $\boldsymbol{\theta}_T$ must be estimated using past data (from $T-\tau$ to $T-1$): the agent seeks the solution of

$$
\underset{\boldsymbol{\theta}_T}{\text{max}} \ \frac{1}{\tau} \sum_{t=T-\tau}^{T-1} u \left( \sum_{i=1}^{N_T}\left(\bar{w}_{i,t}+ \boldsymbol{\theta}'_T \textbf{x}_{i,t} \right)r_{i,t+1} \right),
$$

on a sample of size $\tau$, where $N_T$ is the number of assets in the universe. The above formulation can be viewed as a learning task in which the parameters are chosen such that the reward (expected utility of the portfolio return) is maximized.
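The estimation above can be sketched numerically. Below is a minimal illustration in Python on simulated data, assuming an equal-weight benchmark, CRRA utility, and tilts scaled by $1/N$ (all illustrative choices, not those of the cited papers): the tilt vector $\boldsymbol{\theta}_T$ is obtained by maximizing the average realized utility over the estimation sample.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
T, N, K = 120, 50, 3                      # periods, assets, characteristics
gamma = 5.0                               # CRRA risk aversion

x = rng.standard_normal((T, N, K))        # lagged, standardized characteristics
# next-period returns with a weak linear link to the first characteristic
r = 0.002 * x[:, :, 0] + 0.04 * rng.standard_normal((T, N))

w_bar = np.full(N, 1.0 / N)               # equal-weight benchmark

def neg_avg_utility(theta):
    # deviations from the benchmark are linear in characteristics (scaled by 1/N)
    w = w_bar + (x @ theta) / N           # (T, N) portfolio weights
    rp = (w * r).sum(axis=1)              # realized portfolio returns
    u = (1.0 + rp) ** (1.0 - gamma) / (1.0 - gamma)
    return -u.mean()

res = minimize(neg_avg_utility, x0=np.zeros(K), method="Nelder-Mead")
theta_hat = res.x
print("estimated tilts:", np.round(theta_hat, 2))
```

Since the simulated returns load positively on the first characteristic, the estimated tilt on that characteristic comes out positive.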

3.7.2Further references

Independently of characteristics-based approaches, ML applications in finance have blossomed, initially working with price data only and later integrating firm characteristics as predictors. We cite a few references below, grouped by methodological approach:

  • penalized quadratic programming: Goto & Xu (2015), Ban et al. (2016) and Perrin & Roncalli (2020),

  • regularized predictive regressions: Rapach et al. (2013) and Alexander Chinco et al. (2019),

  • support vector machines: Cao & Tay (2003) (and the references therein),

  • model comparison and/or aggregation: Kim (2003), Huang et al. (2005), Matías & Reboredo (2012), Reboredo et al. (2012), Dunis et al. (2013), Gu et al. (2020) and Guida & Coqueret (2018). The latter two, more recent, articles work with a large cross-section of characteristics.

We provide more detailed lists for tree-based methods, neural networks and reinforcement learning techniques in Chapters 6, 7 and 19, respectively. Moreover, we refer to Ballings et al. (2015) for a comparison of classifiers and to Henrique et al. (2019) and Bustos & Pomares-Quimbaya (2020) for surveys on ML-based forecasting techniques.

3.7.3Explicit connections with asset pricing models

The first and obvious link between factor investing and asset pricing is (average) return prediction. The main canonical academic reference is Gu et al. (2020). Let us first write the general equation (non-linear panel) and then comment on it:

$$
r_{t+1,n}=g(\textbf{x}_{t,n}) + \epsilon_{t+1,n}.
$$

The interesting discussion lies in the differences between the above model and that of Equation (3.1). The first obvious difference is the introduction of the nonlinear function $g$: indeed, there is no reason (beyond simplicity and interpretability) why we should restrict the model to linear relationships. One early reference for nonlinearities in asset pricing kernels is Bansal & Viswanathan (1993).

More importantly, the second difference between (3.6) and (3.1) is the shift in the time index. Indeed, from an investor’s perspective, the interest is to be able to predict some information about the structure of the cross-section of assets. Explaining asset returns with synchronous factors is not useful because the realization of factor values is not known in advance. Hence, if one seeks to extract value from the model, there needs to be a time interval between the observation of the state space (which we call $\textbf{x}_{t,n}$) and the occurrence of the returns. Once the model $\hat{g}$ is estimated, the time-$t$ (measurable) value $\hat{g}(\textbf{x}_{t,n})$ gives a forecast of the (average) future returns. These predictions can then serve as signals in the crafting of portfolio weights (see Chapter 12 for more on that topic).

While most studies do work with returns on the l.h.s. of (3.6), there is no reason why other indicators could not be used. Returns are straightforward to compute, but they could very well be replaced by more sophisticated metrics, like the Sharpe ratio for instance. The firms’ features would then be used to predict a risk-adjusted performance rather than raw returns.
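To make the time shift concrete, here is a minimal sketch on simulated data: characteristics observed at time $t$ are aligned with returns realized at $t+1$, a ridge regression (a simple, linear stand-in for $g$, not the method of any cited paper) is fit on the early sample, and forecasts are evaluated out-of-sample. All data and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, K = 200, 100, 5                      # periods, firms, characteristics

x = rng.standard_normal((T, N, K))         # x_{t,n}, observable at time t
eps = 0.05 * rng.standard_normal((T, N))
r = np.empty((T, N))                       # r_{t+1,n} stored at index t+1
r[0] = eps[0]
r[1:] = 0.03 * np.tanh(x[:-1, :, 0]) + eps[1:]   # only the first feature matters

# align predictors (time t) with targets (time t+1), then split by time
X, y = x[:-1].reshape(-1, K), r[1:].reshape(-1)
split = 150 * N                            # first 150 periods for training
X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]

lam = 10.0                                 # ridge penalty
beta = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(K), X_tr.T @ y_tr)
pred = X_te @ beta

ic = np.corrcoef(pred, y_te)[0, 1]         # out-of-sample information coefficient
print(f"OOS correlation between forecasts and realized returns: {ic:.3f}")
```

Even though the true relation is nonlinear, the linear proxy captures part of the predictive signal, which shows up as a positive out-of-sample correlation.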

Beyond the explicit form of Equation (3.6), several other ML-related tools can be used to estimate asset pricing models. This can be achieved in a number of ways, some of which we list below.

3.7.3.1Stochastic discount factor estimation

First, one mainstream problem in asset pricing is to characterize the stochastic discount factor (SDF) $M_t$, which satisfies $\mathbb{E}_t[M_{t+1}(r_{t+1,n}-r_{t+1,f})]=0$ for any asset $n$ (see Cochrane (2009)). This equation is a natural playing field for the generalized method of moments (Hansen (1982)): $M_t$ must be such that

$$
\mathbb{E}[M_{t+1}R_{t+1,n}g(V_t)]=0,
$$

where the instrumental variables $V_t$ are $\mathcal{F}_t$-measurable (i.e., are known at time $t$) and the capital $R_{t+1,n}$ denotes the excess return of asset $n$. In order to reduce and simplify the estimation problem, it is customary to define the SDF as a portfolio of assets (see chapter 3 in Back (2010)). In Luyang Chen et al. (2020), the authors use a generative adversarial network (GAN, see Section 7.7.1) to estimate the weights of the portfolios that come closest to satisfying (3.7) under a strongly penalizing form.
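Under strong simplifying assumptions (unconditional moments, i.e., $g(V_t)=1$, simulated i.i.d. excess returns, and a linear SDF $M_{t+1}=1-\textbf{w}'\textbf{R}_{t+1}$ built as a portfolio of the test assets), the moment conditions can be solved directly; the sketch below illustrates the mechanics and is not the procedure of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 600, 4
mu = np.array([0.004, 0.006, 0.002, 0.005])   # hypothetical mean excess returns
R = mu + 0.05 * rng.standard_normal((T, N))   # excess returns R_{t+1,n}

# SDF modelled as a portfolio of the test assets: M_{t+1} = 1 - w' R_{t+1}
def moment_errors(w):
    M = 1.0 - R @ w
    return (M[:, None] * R).mean(axis=0)      # empirical E[M_{t+1} R_{t+1,n}]

# with g(V_t) = 1 (unconditional moments) the system is exactly identified
# and linear in w:  E[R] - E[R R'] w = 0
w_hat = np.linalg.solve(R.T @ R / T, R.mean(axis=0))
print("pricing errors:", np.round(moment_errors(w_hat), 8))
```

In richer settings (conditional instruments, nonlinear SDFs), the moment system is no longer exactly identified and one minimizes a weighted quadratic form of the errors instead, which is where ML parameterizations enter.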

3.7.3.2Time-varying factor loadings

A second approach is to try to model asset returns as linear combinations of factors, just as in (3.1). We write in compact notation

\begin{align*}
r_{t,n}=\alpha_n+\boldsymbol{\beta}_{t,n}'\textbf{f}_t+\epsilon_{t,n},
\end{align*}

and we allow the loadings $\boldsymbol{\beta}_{t,n}$ to be time-dependent. The trick is then to introduce the firm characteristics into the above equation. Traditionally, the characteristics are present in the definition of factors (as in the seminal definition of Fama & French (1993)). The decomposition of the return is made according to the exposure of the firm’s return to these factors, which are constructed according to market size, accounting ratios, past performance, etc. Given the exposures, the performance of the stock is attributed to particular style profiles (e.g., small stock, value stock, etc.).

Usually, the factors are heuristic portfolios constructed from simple rules like thresholding. For instance, firms below the 1/3 quantile in book-to-market are the growth firms and those above the 2/3 quantile are the value firms. A value factor can then be defined as the long-short portfolio of these two sets, with uniform weights. Note that Fama & French (1993) use a more complex approach which also takes market capitalization into account, both in the weighting scheme and in the composition of the portfolios.
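A minimal sketch of such a tercile-based value factor, on simulated book-to-market data with a small built-in value premium (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 60, 300
bm = rng.lognormal(sigma=0.5, size=(T, N))      # book-to-market, observed at t
# hypothetical next-period returns with a small built-in value premium
r = 0.01 + 0.005 * np.log(bm) + 0.06 * rng.standard_normal((T, N))

factor = np.empty(T)
for t in range(T):
    lo, hi = np.quantile(bm[t], [1 / 3, 2 / 3])  # tercile breakpoints
    value = bm[t] >= hi                          # high book-to-market: value firms
    growth = bm[t] <= lo                         # low book-to-market: growth firms
    # equal-weight long-short portfolio (long value, short growth)
    factor[t] = r[t, value].mean() - r[t, growth].mean()

print(f"average periodic value premium: {factor.mean():.4f}")
```

Capitalization-weighting the two legs and double-sorting on size, as in Fama & French (1993), would only change the weighting lines inside the loop.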

One of the advances enabled by machine learning is to automate the construction of the factors. It is for instance the approach of Feng et al. (2023). Instead of building the factors heuristically, the authors optimize the construction to maximize the fit in the cross-section of returns. The optimization is performed via a relatively deep feed-forward neural network and the feature space is lagged so that the relationship is indeed predictive, as in Equation (3.6). Theoretically, the resulting factors help explain a substantially larger proportion of the in-sample variance in the returns. The prediction ability of the model depends on how well it generalizes out-of-sample.

3.7.3.3Characteristics-dependent betas

A third approach is that of Kelly et al. (2019) (though the statistical treatment is not machine learning per se). Their idea is the opposite: factors are latent (unobserved) and it is the betas (loadings) that depend on the characteristics. This allows many degrees of freedom because in $r_{t,n}=\alpha_n+(\boldsymbol{\beta}_{t,n}(\textbf{x}_{t-1,n}))'\textbf{f}_t+\epsilon_{t,n}$, only the characteristics $\textbf{x}_{t-1,n}$ are known and both the factors $\textbf{f}_t$ and the functional forms $\boldsymbol{\beta}_{t,n}(\cdot)$ must be estimated. In their article, Kelly et al. (2019) work with a linear form, which is naturally more tractable.
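On simulated data, the linear version of this idea can be estimated with a simple alternating least squares scheme. This is a rough sketch, not the actual procedure of Kelly et al. (2019): with one latent factor and loadings $\beta_{t,n}=\textbf{x}_{t-1,n}'\boldsymbol{\gamma}$, we alternate between solving for the factor given $\boldsymbol{\gamma}$ and for $\boldsymbol{\gamma}$ given the factor.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N, K = 120, 200, 4
gamma_true = np.array([1.0, -0.5, 0.0, 0.0])    # only two characteristics matter
x = rng.standard_normal((T, N, K))              # lagged characteristics x_{t-1,n}
f_true = 0.02 * rng.standard_normal(T)          # latent factor
r = (x @ gamma_true) * f_true[:, None] + 0.01 * rng.standard_normal((T, N))

# alternating least squares for a one-factor model: beta_{t,n} = x_{t-1,n}' gamma
gamma = np.ones(K)
for _ in range(50):
    beta = x @ gamma                                     # (T, N) loadings
    f = (beta * r).sum(axis=1) / (beta ** 2).sum(axis=1) # factor given gamma
    A = np.einsum("t,tnk,tnl->kl", f ** 2, x, x)         # sum_t f_t^2 X_t' X_t
    b = np.einsum("t,tnk,tn->k", f, x, r)                # sum_t f_t X_t' r_t
    gamma = np.linalg.solve(A, b)
    gamma /= np.linalg.norm(gamma)                       # fix scale indeterminacy

print("estimated gamma direction:", np.round(gamma, 3))
```

The estimated direction matches the true loading map up to sign, which is the usual rotational indeterminacy of latent factor models.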

3.7.3.4Autoencoder-based factor models

Lastly, a fourth approach (introduced in Gu et al. (2020)) goes even further and combines two neural network architectures. The first network takes the characteristics $\textbf{x}_{t-1}$ as inputs and generates factor loadings $\boldsymbol{\beta}_{t-1}(\textbf{x}_{t-1})$. The second network transforms returns $\textbf{r}_t$ into factor values $\textbf{f}_t(\textbf{r}_t)$ (as in Feng et al. (2023)). The aggregate model can then be written:

$$
\textbf{r}_t=\boldsymbol{\beta}_{t-1}(\textbf{x}_{t-1})'\textbf{f}_t(\textbf{r}_t)+\boldsymbol{\epsilon}_t.
$$

The above specification is quite special because the output (on the l.h.s.) is also present as an input (on the r.h.s.). In machine learning, autoencoders (see Section 7.6.2) share the same property. Their aim, just like in principal component analysis, is to find a parsimonious nonlinear representation of a dataset (in this case, returns). In Equation (3.8), the input is $\textbf{r}_t$ and the output function is $\boldsymbol{\beta}_{t-1}(\textbf{x}_{t-1})'\textbf{f}_t(\textbf{r}_t)$. The aim is to minimize the difference between the two, just as in any regression-like model.

Autoencoders are neural networks whose outputs are as close as possible to their inputs, with an objective of dimension reduction. The innovation in Gu et al. (2020) is that the pure autoencoder part is merged with a vanilla perceptron used to model the loadings. The structure of the neural network is summarized below.

$$
\left. \begin{array}{rl} \text{returns } (\textbf{r}_t) & \overset{NN_1}{\longrightarrow} \quad \text{ factors } (\textbf{f}_t=NN_1(\textbf{r}_t)) \\ \text{characteristics } (\textbf{x}_{t-1}) & \overset{NN_2}{\longrightarrow} \quad \text{ loadings } (\boldsymbol{\beta}_{t-1}=NN_2(\textbf{x}_{t-1})) \end{array} \right\} \longrightarrow \text{ returns } (\textbf{r}_t)
$$

A simple autoencoder would consist of only the first line of the model. This specification is discussed in more detail in Section 7.6.2.
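The two-network structure can be illustrated with a single forward pass on random, untrained weights. The sketch below only shows how the dimensions fit together (layer sizes, the tanh activation, and the linear factor network are illustrative choices, not the exact architecture of Gu et al. (2020)); in an actual estimation, the weights would be trained to minimize the reconstruction error between $\textbf{r}_t$ and $\hat{\textbf{r}}_t$.

```python
import numpy as np

rng = np.random.default_rng(5)
N, K, L, H = 100, 10, 3, 8         # assets, characteristics, factors, hidden units
x_prev = rng.standard_normal((N, K))    # x_{t-1}: lagged characteristics
r_t = 0.02 * rng.standard_normal(N)     # r_t: current returns

# NN2: characteristics -> loadings (one hidden tanh layer, random weights here)
W1 = rng.standard_normal((K, H)) / np.sqrt(K)
W2 = rng.standard_normal((H, L)) / np.sqrt(H)
beta = np.tanh(x_prev @ W1) @ W2        # (N, L) loadings beta_{t-1}(x_{t-1})

# NN1: returns -> factors; a linear layer makes factors portfolios of returns
W3 = rng.standard_normal((N, L)) / N
f = r_t @ W3                            # (L,) factor realizations f_t(r_t)

r_hat = beta @ f                        # reconstructed returns, to be fit to r_t
print(beta.shape, f.shape, r_hat.shape)
```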

3.7.4Recent advances and critical perspectives

The models described above assume, implicitly or explicitly, that patterns in historical data will persist. Several recent contributions extend or challenge this view.

Epstein et al. (2025) propose the Attention Factor Model, which jointly learns conditional latent factors from firm characteristic embeddings and an arbitrage trading policy in a single end-to-end framework. By avoiding the two-step separation of factor estimation and portfolio construction, the model achieves out-of-sample Sharpe ratios above 4 (2.3 net of trading costs) on U.S. equities, substantially outperforming sequential approaches.

On the other hand, Allen et al. (2025) argue that ML approaches in finance face a fundamental misalignment: ML models assume stationarity, while financial markets are evolutionary systems where profitable opportunities are transient ecological niches. The authors review recent architectures (patch-based transformers, non-stationary transformers, neural hidden Markov models) and find that none fully navigates market evolution due to reflexivity — the very act of trading on a discovered pattern changes the pattern.

A related theoretical result is provided by Da et al. (2024) (see also the advanced techniques subsection above): even with optimal ML techniques, estimation errors in high-dimensional settings prevent full exploitation of weak and rare alphas, creating an irreducible gap between achievable and theoretical Sharpe ratios.

Finally, Coulombe et al. (2023) develop a Shapley-based methodology to decompose ML portfolio performance into the contributions of individual predictors or predictor groups. This “opening the black box” approach reveals which characteristics actually drive the economic value of ML predictions — an essential diagnostic for practitioners seeking to understand whether their models capture genuine economic signals or statistical artifacts.

As a conclusion to this chapter, it appears undeniable that the intersection of the two fields of asset pricing and machine learning offers a rich variety of applications. The literature is already vast and it is often hard to disentangle the noise from the great ideas in the continuous flow of publications on these topics. Practice and implementation are the only way to extricate value from hype. This is especially true because agents often tend to overestimate the role of factors in the allocation decision process of real-world investors (see Alex Chinco et al. (2019) and Castaneda & Sabat (2019)). For methodological extensions on uncertainty quantification for ML predictions and for a deeper treatment of return predictability, we refer to the dedicated chapters later in the book.

3.8Exercises

  1. Compute the annual returns of the growth versus value portfolios, that is, the average returns of firms with above versus below median price-to-book ratio (the variable is called `Pb` in the dataset).

  2. Same exercise, but compute the monthly returns and plot the value (through time) of the corresponding portfolios.

  3. Instead of a unique threshold, compute simply sorted portfolios based on quartiles of market capitalization. Compute their annual returns and plot them.

  4. Download the Fama-French 5-factor data and replicate the factor competition analysis of Section 3.2. Does HML remain redundant if you restrict the sample to post-2000 data?

  5. Using the GPS framework of Hoechle et al. (2025), discuss conceptually how firm-level heterogeneity could affect the size anomaly computed earlier in this chapter.

References
  1. Barberis, N., & Shleifer, A. (2003). Style investing. Journal of Financial Economics, 68(2), 161–199.
  2. Asness, C., Ilmanen, A., Israel, R., & Moskowitz, T. (2015). Investing with style. Journal of Investment Management, 13(1), 27–63.
  3. Ilmanen, A. (2011). Expected returns: An investor’s guide to harvesting market rewards. John Wiley & Sons.
  4. Ang, A. (2014). Asset management: A systematic approach to factor investing. Oxford University Press.
  5. Bali, T. G., Engle, R. F., & Murray, S. (2016). Empirical asset pricing: The cross section of stock returns. John Wiley & Sons.
  6. Jurczenko, E. (2017). Factor Investing: From Traditional to Alternative Risk Premia. Elsevier.
  7. Zhang, M., Lu, T., & Shi, C. (2024). Navigating the Factor Zoo: The Science of Quantitative Investing. Routledge.
  8. Goyal, A. (2012). Empirical cross-sectional asset pricing: A survey. Financial Markets and Portfolio Management, 26(1), 3–38.
  9. Cazalet, Z., & Roncalli, T. (2014). Facts and fantasies about factor investing. SSRN Working Paper, 2524547.
  10. Baz, J., Granger, N., Harvey, C. R., Le Roux, N., & Rattray, S. (2015). Dissecting investment strategies in the cross section and time series. SSRN Working Paper, 2695101.
  11. Giglio, S., Kelly, B., & Xiu, D. (2022). Factor models, machine learning, and asset pricing. Annual Review of Financial Economics, 14(1), 337–368.
  12. Shi, C. (2026). From econometrics to machine learning: Transforming empirical asset pricing. Journal of Economic Surveys, 40(1), 528–548.
  13. Krkoska, E., & Schenk-Hoppé, K. R. (2019). Herding in smart-beta investment products. Journal of Risk and Financial Management, 12(1), 47.
  14. Cong, L. W., & Xu, D. (2021). Rise of factor investing: asset prices, informational efficiency, and security design. Review of Financial Studies, 34(7), 3046–3112.
  15. DeMiguel, V., Martin-Utrera, A., & Uppal, R. (2025). Can competition increase profits in factor investing? Management Science, 71(7), 5552–5571.