Pair Trading Across 10 Industries

A statistical pair-trading strategy applied to ten US equity industries. We screen 386 candidate pairs by correlation, Engle–Granger cointegration, and ADF on the spread, then backtest a Z-score-driven entry/exit rule with stop-loss and timeout over three years of Interactive Brokers data.

Backtest window

2023–2026

Stocks fetched

92 / 96

Pairs screened

386

Pairs selected

3

Trades

68

Sharpe

1.45

Total return

+3.92%

Max drawdown

−0.85%

01. Strategy Overview

This project implements a statistical pair-trading strategy across 10 GICS-style US equity industries. The strategy looks for two stocks in the same industry whose prices have moved together historically, identifies them with formal statistical tests, and then trades the spread between them whenever it deviates far from its historical mean.

The trading hypothesis is simple: when two economically similar stocks share a stable long-run relationship (cointegration), short-term dislocations in their relative price tend to revert. We enter when the spread is unusually wide, exit when it reverts toward the mean, and stop out if the relationship appears to break.

Pipeline

The end-to-end workflow proceeds in seven stages:

Define ten industries and assemble a candidate stock pool of roughly ten stocks each, for a total of 96 tickers.
Pull three years of historical daily bars for each ticker from Interactive Brokers.
For every pair within an industry, compute four statistics: daily-return correlation, Engle-Granger cointegration p-value on log prices, OLS hedge ratio, and an ADF p-value on the constructed spread.
Rank the pairs and select the best one per industry, subject to the statistical thresholds.
Run the Z-score backtest (entry at ±2, success exit at |Z| < 0.5, stop-loss at |Z| > 3, timeout at 20 trading days).
Produce the blotter (trades.csv) and the daily ledger (ledger.csv).
Compute the performance metrics that drive the dashboards in the later sections.

Backtest Window

Start: 2023-05-02
End: 2026-04-30
Bars per stock: 752 daily bars
Data source: Interactive Brokers (TWS API via shinybroker)

High-Level Result

Out of 386 candidate pairs across 10 industries, 3 pairs met the strict selection thresholds (correlation ≥ 0.50, cointegration p ≤ 0.10, ADF p ≤ 0.10):

Industry	Selected Pair
Banks	PNC / TFC
Energy	COP / OXY
Semiconductors	MU / LRCX

The remaining 7 industries did not contain any pair that satisfied all three statistical thresholds during this backtest window.

Aggregate backtest performance (3 selected pairs, equal-notional):

68 round-trip trades
Sharpe ratio: 1.45
Total return: 3.92%
Max drawdown: -0.85%
Success rate: 45.6%, stop-loss rate: 38.2%, timeout rate: 16.2%

Aggregate equity curve and drawdown for the three selected pairs over the 2023–2026 backtest window

02. Market Phenomenon

What we are trying to capture

This strategy attempts to profit from temporary relative mispricing between two economically similar stocks in the same industry. If two stocks have historically moved together and the spread between their prices is statistically stationary, then large deviations from the historical average represent temporary dislocations rather than a permanent change in fundamentals. The strategy enters when the spread becomes unusually wide and exits when the spread reverts toward its mean.

Why this happens

Stocks within the same industry are exposed to the same macro factors: sector-wide demand shocks, regulation, commodity inputs, the broader market. When investors react to firm-specific news (an earnings surprise, an analyst upgrade, an executive change), the relative prices of two otherwise similar companies can drift apart even though their long-run cash flows remain coupled. That short-term divergence is what creates the trading opportunity.

What the strategy does NOT assume

It does not assume the two stocks have the same price level.
It does not assume the two stocks have the same volatility.
It does not assume markets are inefficient on average.
It only assumes that, conditional on having passed the cointegration test, the spread has a tendency to revert during the backtest horizon.

Risk: when the assumption fails

A pair can stop being cointegrated. M&A activity, a major change in business mix, a regulatory shock that hits one firm but not the other, or simply a structural break in the relationship can make the spread non-stationary. When that happens, the strategy will keep entering trades that no longer revert, which is what the stop-loss, the live monitoring rules in section 12, and the stop-trading rules in section 13 are designed to detect.

03. Industry Universe & Pair Selection

3.1 Why these 10 industries

The 10 industries are chosen to cover a broad cross-section of the US equity market based on the GICS sector framework (S&P/MSCI). Within each industry the candidate pool is filled with large-cap, liquid names that share business model and demand drivers, which is the precondition that makes a stable long-run spread plausible.

3.2 Candidate stock universe (96 tickers)

Industry	Candidate stocks
Beverages	KO, PEP, MNST, KDP, STZ, TAP, CELH, FIZZ
Payments	V, MA, AXP, PYPL, FIS, FI, GPN, DFS
Energy	XOM, CVX, COP, EOG, SLB, OXY, PSX, MPC, VLO, HAL
Semiconductors	NVDA, AMD, INTC, AVGO, QCOM, TXN, MU, AMAT, LRCX, KLAC
Airlines	DAL, UAL, AAL, LUV, ALK, JBLU, CPA, RYAAY, SAVE, HA
Banks	JPM, BAC, C, WFC, GS, MS, USB, PNC, TFC, BK
Retail	WMT, TGT, COST, DG, DLTR, KR, HD, LOW, TJX, ROST
Autos	TSLA, F, GM, RIVN, LCID, TM, HMC, STLA, NIO, XPEV
Health Care	JNJ, PFE, MRK, ABBV, LLY, BMY, GILD, AMGN, REGN, BIIB
Communication	GOOGL, META, NFLX, DIS, CMCSA, T, VZ, TMUS, SNAP, PINS

Of these 96 candidates, 92 returned valid 3-year daily bars from IBKR. 4 tickers were excluded because the contract did not resolve on TWS during the fetch window (FI, DFS, SAVE, HA).
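The 386 screened pairs follow from the per-industry counts of fetched tickers. The check below is a small sketch, assuming every distinct within-industry pair among the 92 fetched tickers is screened; the dictionary name is illustrative.

```python
from math import comb

# Tickers per industry that returned data (candidates minus FI, DFS, SAVE, HA).
fetched_per_industry = {
    "Beverages": 8,
    "Payments": 8 - 2,       # FI and DFS did not resolve
    "Energy": 10,
    "Semiconductors": 10,
    "Airlines": 10 - 2,      # SAVE and HA did not resolve
    "Banks": 10,
    "Retail": 10,
    "Autos": 10,
    "Health Care": 10,
    "Communication": 10,
}

# Distinct unordered pairs within an industry of n tickers: n choose 2.
pairs_per_industry = {name: comb(n, 2) for name, n in fetched_per_industry.items()}
total_tickers = sum(fetched_per_industry.values())
total_pairs = sum(pairs_per_industry.values())

print(total_tickers, total_pairs)
```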

3.3 Selection process

For every distinct pair within each industry, we compute four statistics:

Daily return correlation
Engle-Granger cointegration p-value (on log prices)
OLS hedge ratio (slope of log(A) ~ log(B))
Augmented Dickey-Fuller p-value on the spread

We then rank candidates within each industry using a composite score and pick the top pair. A pair is selected for trading only if it passes all three statistical thresholds simultaneously: correlation of at least 0.50, cointegration p-value of at most 0.10, and ADF p-value of at most 0.10 on the constructed spread.

If a top-ranked pair fails the threshold, it is rejected and that industry contributes no pair to the live strategy. This is intentional: we would rather trade fewer pairs with strong statistical support than force a pair from every industry.
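The threshold check can be sketched as a single predicate. The names `PairStats` and `is_selected` are illustrative, not taken from the project code; the input rows are the top-ranked pairs reported in section 11.1.

```python
from typing import NamedTuple

CORR_MIN = 0.50      # minimum daily-return correlation
COINT_P_MAX = 0.10   # maximum Engle-Granger p-value
ADF_P_MAX = 0.10     # maximum ADF p-value on the spread


class PairStats(NamedTuple):
    pair: str
    correlation: float
    coint_p: float
    adf_p: float


def is_selected(s: PairStats) -> bool:
    """True only if all three thresholds hold at the same time."""
    return (s.correlation >= CORR_MIN
            and s.coint_p <= COINT_P_MAX
            and s.adf_p <= ADF_P_MAX)


# Top-ranked pair per industry, values as reported in section 11.1.
best_per_industry = [
    PairStats("AAL/JBLU", 0.560, 0.136, 0.046),
    PairStats("F/GM", 0.671, 0.211, 0.080),
    PairStats("PNC/TFC", 0.821, 0.0001, 0.00001),
    PairStats("PEP/STZ", 0.428, 0.163, 0.057),
    PairStats("META/NFLX", 0.304, 0.063, 0.018),
    PairStats("COP/OXY", 0.798, 0.018, 0.004),
    PairStats("PFE/BIIB", 0.442, 0.065, 0.018),
    PairStats("V/MA", 0.851, 0.238, 0.094),
    PairStats("DG/DLTR", 0.494, 0.001, 0.0002),
    PairStats("MU/LRCX", 0.742, 0.012, 0.002),
]

selected = [s.pair for s in best_per_industry if is_selected(s)]
print(selected)
```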

04. Statistical Tests

Three statistical tests are run on every candidate pair. The tests are deliberately layered so that a passing pair shows evidence of (a) a contemporaneous relationship, (b) a stable long-run relationship, and (c) mean-reverting short-term deviations.

4.1 Daily return correlation (quick screen)

Pairs trading requires both stocks to respond to similar shocks; otherwise the spread is just two unrelated random walks. A simple Pearson correlation on daily returns is fast and gives us a first cut. We require a correlation of at least 0.50.

Correlation alone is not sufficient: two stocks can be highly correlated in returns while having a non-stationary spread (a permanently widening or narrowing relationship). That is why the next two tests matter more.

4.2 Engle-Granger cointegration test

We run statsmodels.tsa.stattools.coint on the log prices of each pair. The null hypothesis is "no cointegration." A small p-value indicates that the two log-price series share a stable long-run relationship - the residual from their linear combination is stationary.

Selection rule: cointegration p-value of at most 0.10.

Ideally we would require p < 0.05. We loosen the threshold to 0.10 because 0.05 leaves too few qualifying pairs in this 3-year window; 0.10 gives a slightly noisier but still defensible signal.

4.3 ADF test on the spread

After computing the spread (section 5), we run an Augmented Dickey-Fuller test on the spread itself. The null is "spread has a unit root" (non-stationary, random walk). A small p-value means the spread is stationary - exactly the condition the strategy needs.

Selection rule: ADF p-value of at most 0.10.

In practice, ADF and cointegration tend to agree. The combined test acts as a cross-check.

4.4 Combined ranking score

For every candidate pair, we compute a composite score, score = z(correlation) − z(cointegration_p) − z(ADF_p), where z(·) standardizes each statistic across all candidate pairs. Higher correlation pushes the score up, and lower p-values push the score up. The top-scoring pair within each industry is the one we report, and it is then checked against the three thresholds (section 3.3).
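A minimal sketch of the composite score follows. The text does not say whether z(·) uses the population or the sample standard deviation, so the population version used here is an assumption, and the three candidate rows are hypothetical rather than taken from the screening output.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of numbers to mean 0 and unit (population) std."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite_scores(rows):
    """rows: list of (pair, correlation, coint_p, adf_p). Returns {pair: score}."""
    z_corr = zscores([r[1] for r in rows])
    z_coint = zscores([r[2] for r in rows])
    z_adf = zscores([r[3] for r in rows])
    # High correlation raises the score; high p-values lower it.
    return {r[0]: zc - zp - za
            for r, zc, zp, za in zip(rows, z_corr, z_coint, z_adf)}


# Hypothetical candidates, for illustration only.
candidates = [
    ("AAA/BBB", 0.80, 0.01, 0.005),
    ("CCC/DDD", 0.60, 0.20, 0.150),
    ("EEE/FFF", 0.40, 0.50, 0.400),
]
scores = composite_scores(candidates)
best = max(scores, key=scores.get)
print(best)
```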

4.5 Why we do not use only correlation

A pair like V / MA has a daily-return correlation of 0.85 - extremely high - but its cointegration p-value is 0.24, so it does not pass the cointegration test. The two stocks move together day-to-day, but their long-run spread is not stationary in the test window. Trading the spread of such a pair would be exposed to permanent drift, which is exactly what the cointegration test is designed to filter out.

Scatter of 386 candidate pairs by daily-return correlation vs. Engle–Granger cointegration p-value (log scale), with the three selected pairs marked in green and V/MA highlighted as the cautionary high-correlation/failed-cointegration example

05. Hedge Ratio & Spread

5.1 Hedge ratio via OLS on log prices

For each pair (A, B), the hedge ratio is the slope of an ordinary least squares regression of log(price_A) on log(price_B), i.e. log(price_A) = α + β · log(price_B) + ε.

β is the hedge ratio: how many units of stock B are needed to hedge one unit of stock A in the constructed spread.

We use log prices, not raw prices, for two reasons:

Differences in log prices are percentage moves, which are comparable across stocks, so the regression is less sensitive to absolute price levels (KO at $80 vs. PEP at $145 in this sample).
Cointegration in log space corresponds to a stable price ratio, which is the economically meaningful relationship for two similar firms.

5.2 Spread definition

The spread on day t is defined as spread_t = log(price_A_t) − hedge_ratio · log(price_B_t).

If A and B are cointegrated, this spread is approximately stationary. The backtest then standardizes it via a rolling Z-score (section 6).
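The regression in 5.1 and the spread in 5.2 can be sketched in plain Python. The function names are illustrative, and the price series are synthetic: they are built so that log(A) = 0.3 + 1.5 · log(B) holds exactly, which makes the recovered slope and the constant spread easy to verify.

```python
import math


def ols_fit(x, y):
    """Least-squares fit y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta


def hedge_ratio_and_spread(prices_a, prices_b):
    """Regress log(A) on log(B), then build spread_t = log(A_t) - beta * log(B_t)."""
    log_a = [math.log(p) for p in prices_a]
    log_b = [math.log(p) for p in prices_b]
    alpha, beta = ols_fit(log_b, log_a)
    spread = [la - beta * lb for la, lb in zip(log_a, log_b)]
    return alpha, beta, spread


# Synthetic prices with an exact log-linear relationship (not market data).
prices_b = [40.0, 41.0, 39.5, 42.0, 43.5]
prices_a = [math.exp(0.3) * p ** 1.5 for p in prices_b]
alpha, beta, spread = hedge_ratio_and_spread(prices_a, prices_b)
print(round(beta, 6), round(alpha, 6))
```

Because the spread omits the intercept α, its level equals α when the fit is exact; the rolling Z-score removes that level anyway.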

5.3 Hedge ratios estimated for the 3 selected pairs

Industry	Pair	Hedge Ratio (β)	Interpretation
Banks	PNC / TFC	1.121	1 unit of log(PNC) hedged by 1.12 units of log(TFC)
Energy	COP / OXY	0.634	1 unit of log(COP) hedged by 0.63 units of log(OXY)
Semiconductors	MU / LRCX	1.343	1 unit of log(MU) hedged by 1.34 units of log(LRCX)

5.4 Hedge ratio in the trading book

When sizing positions, the hedge ratio determines the relative dollar exposure on the two legs. We commit $10,000 of notional to stock A, which means shares_A = 10000 / price_A. The opposite leg uses $10,000 scaled by |beta|, so shares_B = (10000 * |beta|) / price_B.

This keeps the spread position approximately neutral with respect to the common factor that drives both stocks. Both legs are rounded down to whole shares (the floor in section 6.5); any small residual exposure is recorded in the ledger and absorbed into the cash account.

Three-panel chart for the PNC/TFC pair: log prices of both legs, the constructed spread with its 60-day rolling mean, and the rolling Z-score with horizontal threshold lines at ±0.5, ±2, and ±3

06. Entry Rules

6.1 Z-score signal

Every trading day, we compute a rolling Z-score of the spread. Let μ_t be the rolling mean of the spread over the last 60 trading days, and σ_t be the rolling standard deviation over the same window. The Z-score is then Z_t = (spread_t − μ_t) / σ_t.
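A sketch of the rolling Z-score, under two assumptions the text does not settle: the 60-day window includes the current day, and σ_t is the sample standard deviation. The function name is illustrative.

```python
from statistics import mean, stdev


def rolling_zscore(spread, window=60):
    """Return Z_t for each day; None until `window` observations are available."""
    z = []
    for t in range(len(spread)):
        if t + 1 < window:
            z.append(None)
            continue
        recent = spread[t + 1 - window: t + 1]    # last `window` days, including day t
        mu, sigma = mean(recent), stdev(recent)   # sample std (assumption)
        z.append((spread[t] - mu) / sigma if sigma > 0 else None)
    return z


# Tiny example with a 3-day window so the numbers can be checked by hand.
example = rolling_zscore([1.0, 2.0, 3.0, 4.0, 10.0], window=3)
print(example)
```

On day 3 of the example the window is [1, 2, 3], with mean 2 and sample standard deviation 1, so Z = (3 − 2) / 1 = 1.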

6.2 Why a 60-day rolling window

A 60-day window is approximately three months of trading data. It is long enough to give a stable estimate of the spread's local mean and standard deviation, but short enough to adapt when market conditions change. A shorter window (e.g. 20 days) was rejected because it produces a noisy Z-score that generates whipsaw trades; a longer window (e.g. 252 days) was rejected because it reacts too slowly to regime shifts.

6.3 Entry rule

When the Z-score exceeds +2.0, the spread is unusually wide and we enter a short spread position: short stock A and go long stock B with a size determined by the hedge ratio. When the Z-score falls below −2.0, the spread is unusually narrow and we enter a long spread position: long stock A and short stock B. Between −2.0 and +2.0 we take no new position.

Only one position per pair is open at a time. If the strategy already has an open position on a pair, no new entry is taken for that pair until the existing trade is closed.

6.4 Why ±2 standard deviations

The ±2 threshold filters out normal daily noise: under a roughly normal distribution, only about 5% of daily Z-scores would be expected to fall outside ±2, so this is a meaningful deviation rather than an ordinary fluctuation. A ±1 threshold would generate too many marginal trades; a ±3 threshold would generate almost none, and it coincides with the stop-loss level. The 2-sigma threshold is a common choice in academic and practitioner references.
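The tail frequencies behind this choice can be checked directly for a standard normal variable. This is only a reference calculation: a rolling Z-score on real spreads is not exactly normal, so realized frequencies will differ.

```python
import math


def two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2))


for k in (1.0, 2.0, 3.0):
    print(f"P(|Z| > {k:.0f}) = {two_sided_tail(k):.4f}")
```

Under normality roughly 32% of days fall outside ±1, about 4.6% outside ±2, and about 0.3% outside ±3.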

6.5 Sizing

Entries are sized to $10,000 of notional on each leg, scaled by the hedge ratio. In other words, shares_A = floor(10000 / price_A) and shares_B = floor(10000 * |hedge_ratio| / price_B).

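This produces a hedge-ratio-weighted spread position: leg B carries |β| times the dollar notional of leg A, so the position is dollar-neutral only when β is close to 1. With $100,000 of starting cash per pair and at most one open position per pair, gross exposure per pair is bounded at roughly $10,000 × (1 + |β|).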
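The sizing rule in 6.5 can be sketched as follows. The function name and the sign convention (positive for long, negative for short, matching the blotter columns) are illustrative, and the prices in the example are hypothetical.

```python
import math


def size_position(price_a, price_b, hedge_ratio, direction, notional=10_000):
    """Return signed (shares_A, shares_B) for one entry."""
    shares_a = math.floor(notional / price_a)
    shares_b = math.floor(notional * abs(hedge_ratio) / price_b)
    if direction == "long_spread":     # long A, short B
        return shares_a, -shares_b
    if direction == "short_spread":    # short A, long B
        return -shares_a, shares_b
    raise ValueError(f"unknown direction: {direction}")


# Hypothetical prices: A at $150, B at $40, hedge ratio 1.25.
shares_a, shares_b = size_position(150.0, 40.0, 1.25, "short_spread")
print(shares_a, shares_b)
```

Here 10,000 / 150 = 66.67 floors to 66 shares of A, and 12,500 / 40 = 312.5 floors to 312 shares of B.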

07. Exit Rules

A trade can exit through one of four channels. Each is checked once per day in this order: success, stop-loss, timeout, end-of-data.

7.1 Success exit (mean reversion)

When the absolute Z-score drops below 0.5, the position is closed and tagged with the fate success.

We exit when the spread has mostly reverted toward its mean. We use 0.5 instead of exactly 0 because waiting for the Z-score to return all the way to zero often means giving back profit the trade has already earned on paper. A threshold of 0.5 captures most of the reversion while leaving a small buffer.

7.2 Stop-loss exit

When the absolute Z-score exceeds 3.0, the position is closed and tagged with the fate stop_loss.

A stop-loss is triggered when the spread has continued moving against the trade beyond 3 standard deviations. At this point the historical spread distribution no longer supports the trade, and the more likely explanation is that the underlying relationship has shifted.

7.3 Timeout exit

If a position has been open for 20 trading days without hitting either of the two exits above, it is closed and tagged with the fate timeout.

Pair trading is a short- to medium-term mean-reversion strategy. If the spread has not reverted within roughly one trading month, it is more likely to be trending than reverting, and the original statistical edge has expired. Closing the position frees capital for a fresher signal.

7.4 End-of-data close

If a trade is still open on the last day of the backtest, it is closed at that day's price and tagged with the fate end_close. This is a bookkeeping rule for the backtest, not a live trading rule — in production the position would simply remain open into the next session.
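The four exit channels and their checking order can be sketched as one function. The name `check_exit` and the definition of `days_held` (trading days since the entry day) are assumptions made for illustration.

```python
def check_exit(z, days_held, is_last_day,
               success_z=0.5, stop_z=3.0, max_days=20):
    """Return the fate for an open trade today, or None to keep holding.

    Checks run in the documented order: success, stop-loss, timeout, end-of-data.
    """
    if abs(z) < success_z:
        return "success"
    if abs(z) > stop_z:
        return "stop_loss"
    if days_held >= max_days:
        return "timeout"
    if is_last_day:
        return "end_close"
    return None


print(check_exit(0.3, 5, False))    # spread has reverted
print(check_exit(-3.4, 2, False))   # spread ran further against the trade
print(check_exit(1.5, 20, False))   # no resolution within 20 trading days
```

Because success is checked first, a trade that reverts on its twentieth day is tagged success rather than timeout.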

7.5 Realized fate distribution (3 selected pairs, full backtest)

Fate	Trades	Share
success	31	45.6%
stop_loss	26	38.2%
timeout	11	16.2%
end_close	0	0.0%
Total	68	100%

Just under half of trades (45.6%) exit cleanly via mean reversion; about 38% are stopped out. Stop-loss frequency is something we monitor closely - a sustained rise in that rate is one of the early warning signals described in section 12.

Annotated PNC/TFC Z-score over a representative 90-day window, with entry circles and exit markers shaped and colored by fate (success, stop-loss, timeout)

08. Trade Fates

Every closed trade in trades.csv is tagged with exactly one of four fates. This makes downstream attribution and live monitoring straightforward.

Fate	Trigger	Interpretation
`success`	abs(Z) drops below 0.5	Spread reverted as predicted - the strategy worked.
`stop_loss`	abs(Z) exceeds 3.0	Spread continued in the wrong direction - relationship may have broken.
`timeout`	Position held 20 trading days without success or stop-loss	Spread drifted sideways - signal expired without resolving.
`end_close`	Backtest ended while position was still open	Bookkeeping. Not a real fate in live trading.

Realized counts (3 selected pairs, 2023-05 to 2026-04)

Industry	Pair	n_trades	success	stop_loss	timeout
Banks	PNC / TFC	20	10	6	4
Energy	COP / OXY	22	11	6	5
Semiconductors	MU / LRCX	26	10	14	2
Total		68	31	26	11

MU / LRCX shows the highest stop-loss share (54%) - the spread had a few strong directional moves during the semiconductor cycle that were not absorbed by the rolling Z-score window. Its overall contribution to portfolio P&L is still positive thanks to the favorable success-trade payoff distribution, but this pair is the one most likely to be re-screened first under the live monitoring rules in section 12.

See outputs/trades.csv for the full per-trade record.

Why we report fate distribution explicitly

Two strategies with the same Sharpe ratio can have very different fate distributions, and they age very differently in production:

A strategy with high success rate and low stop-loss rate is "comfortable" and tends to be stable.
A strategy with comparable returns achieved through frequent stop-losses is taking on path risk - the realized P&L distribution has fatter tails and the strategy is more sensitive to slippage and transaction costs.
A strategy dominated by timeouts is generating signals the market is not resolving - this typically means the spread has lost its mean-reverting character and the strategy may need to be re-screened.

We therefore monitor success / stop-loss / timeout shares as primary live diagnostics, alongside Sharpe and drawdown.

Stacked horizontal bar chart of trade fate distribution per pair, showing success / stop-loss / timeout shares

09. Backtest Design

9.1 Architecture

The backtest is event-driven and runs day by day. For each pair and each day, the engine performs three steps in order:

Manage any open trade. Read the current Z-score and close the position if it has reverted (success), gone too far against us (stop-loss), or aged out (timeout). On the final day of the backtest, any still-open trade is closed at that day's price.
Check for an entry. If no trade is open and the Z-score is beyond ±2.0, open a long-spread or short-spread position accordingly.
Record a ledger row capturing the date, prices, positions, cash, portfolio value, current Z-score, and any open trade identifier.

Exits are evaluated before entries on the same day, so a trade can only be opened on a day when no other position is active for that pair.
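The three daily steps can be sketched as a loop over a precomputed Z-score series. This is a simplified illustration, not the project's engine: it omits prices, sizing, and cash, and `run_pair` is a hypothetical name. Two choices are assumptions because the text does not settle them: by default no new trade is opened on the day another was closed (`reenter_same_day=False`), and no trade is opened on the final day.

```python
def run_pair(zscores, max_days=20, reenter_same_day=False):
    """Walk one pair day by day; returns (trades, ledger) as lists of dicts."""
    trades, ledger = [], []
    open_trade, next_id = None, 1
    last_day = len(zscores) - 1
    for day, z in enumerate(zscores):
        closed_today = False
        if z is not None:  # Z-score is undefined until the rolling window fills
            # Step 1: manage any open trade (exits come before entries).
            if open_trade is not None:
                held = day - open_trade["entry_day"]
                if abs(z) < 0.5:
                    fate = "success"
                elif abs(z) > 3.0:
                    fate = "stop_loss"
                elif held >= max_days:
                    fate = "timeout"
                elif day == last_day:
                    fate = "end_close"
                else:
                    fate = None
                if fate is not None:
                    trades.append({**open_trade, "exit_day": day, "fate": fate})
                    open_trade, closed_today = None, True
            # Step 2: check for an entry only when flat.
            can_enter = open_trade is None and day != last_day
            if can_enter and (reenter_same_day or not closed_today) and abs(z) > 2.0:
                direction = "short_spread" if z > 0 else "long_spread"
                open_trade = {"trade_id": next_id, "entry_day": day,
                              "direction": direction}
                next_id += 1
        # Step 3: record a ledger row (prices and cash omitted in this sketch).
        ledger.append({"day": day, "zscore": z,
                       "open_trade_id": open_trade["trade_id"] if open_trade else None})
    return trades, ledger


trades, ledger = run_pair([None, 0.0, 2.5, 1.0, 0.3, -2.2, -3.5, 0.0])
for t in trades:
    print(t)
```

Setting `reenter_same_day=True` follows the literal step order instead, in which a stop-loss at |Z| > 3 is immediately followed by a fresh entry because the Z-score is still beyond ±2.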

9.2 Inputs

Daily close prices for the 92 successfully-fetched tickers.
Date-aligned price panel via inner-join on dates.
Hedge ratio per pair from OLS on log prices.
The 3 selected pairs from the screening step.

9.3 Capital and sizing

Starting cash per pair: $100,000 (each pair has an independent ledger account in this implementation).
Notional per leg per trade: $10,000, scaled by hedge ratio on stock B.
Cash account: tracks proceeds and outlays, and pays the residual P&L on close.
No margin, commissions, or slippage are modelled. These simplifications flatter the results; in production they would be added at the trade level.

9.4 Outputs

The backtest produces three artifacts in outputs/:

trades.csv - one row per closed trade (the blotter).
ledger.csv - one row per trading day per pair (the daily ledger).
summary.json - aggregate trade and portfolio metrics.

The screening step also produces:

screening_all_pairs.csv - all pairwise statistics, all industries.
screening_ranked.csv - same data, ranked by composite score.
best_per_industry.csv - the top-scored pair per industry with the selected/rejected decision (10 rows).
selected_pairs.csv - the subset of best_per_industry that actually entered the backtest.

9.5 What is intentionally NOT modelled

Intraday execution (we use daily closes only).
Borrow cost on the short leg.
Dividends and other corporate actions (the IBKR TRADES series used here is split-adjusted but not dividend-adjusted; the impact on a 3-year window of listed large caps is small but non-zero).
Slippage and bid-ask spread.
Capital constraints across pairs (each pair is treated as an independent $100k book).

These omissions make the reported Sharpe a slight overstatement relative to what a live deployment would realize. The relative ranking of pairs and the qualitative shape of the equity curve are unaffected.

10. Blotter & Ledger

The backtest writes two CSVs that together fully reconstruct the strategy's behavior over the test window.

10.1 `trades.csv` — the blotter (one row per closed trade)

Column	Type	Meaning
`trade_id`	int	Sequential trade identifier across the run
`pair`	string	e.g. `PNC/TFC`
`industry`	string	Industry tag
`entry_date`	date	Day the position was opened
`exit_date`	date	Day the position was closed
`direction`	string	`long_spread` or `short_spread`
`entry_z`	float	Z-score at entry
`exit_z`	float	Z-score at exit
`hedge_ratio`	float	Hedge ratio used at entry
`entry_price_A`	float	Stock A close on entry day
`entry_price_B`	float	Stock B close on entry day
`exit_price_A`	float	Stock A close on exit day
`exit_price_B`	float	Stock B close on exit day
`shares_A`	int	Signed share count for stock A (>0 long, <0 short)
`shares_B`	int	Signed share count for stock B
`pnl`	float	Realized dollar P&L on the trade
`return_pct`	float	P&L divided by gross entry cost
`holding_days`	int	Days the trade was open
`fate`	string	`success`, `stop_loss`, `timeout`, or `end_close`

10.2 `ledger.csv` — the daily ledger (one row per pair per day)

Column	Type	Meaning
`date`	date	Trading day
`pair`	string	Pair this row belongs to
`industry`	string	Industry tag
`position_A`	int	Current signed shares of A (0 if flat)
`position_B`	int	Current signed shares of B (0 if flat)
`price_A`	float	Daily close of A
`price_B`	float	Daily close of B
`cash`	float	Pair's cash balance after entries/exits
`portfolio_value`	float	`cash + position_A * price_A + position_B * price_B`
`daily_pnl`	float	Day-over-day change in `portfolio_value`
`daily_return`	float	`daily_pnl / prev portfolio_value`
`zscore`	float	Spread Z-score on this day (NaN until 60-day window fills)
`open_trade_id`	int	`trade_id` if a position is open, else NaN
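The derived ledger columns can be reproduced with a few lines of arithmetic. The trade below is hypothetical (short 100 shares of A at $100, long 240 shares of B at $50, starting from $100,000 of cash) and only illustrates the column definitions.

```python
def portfolio_value(cash, position_a, position_b, price_a, price_b):
    """cash + position_A * price_A + position_B * price_B, with signed positions."""
    return cash + position_a * price_a + position_b * price_b


cash = 100_000.0
cash += 100 * 100.0   # proceeds from shorting 100 shares of A
cash -= 240 * 50.0    # outlay for buying 240 shares of B

# Entry day: the position is marked at its entry prices, so value is unchanged.
pv_entry = portfolio_value(cash, -100, 240, 100.0, 50.0)

# Next day: A falls to $99 and B rises to $50.50, both in the trade's favor.
pv_next = portfolio_value(cash, -100, 240, 99.0, 50.5)
daily_pnl = pv_next - pv_entry
daily_return = daily_pnl / pv_entry

print(cash, pv_entry, pv_next, daily_pnl, daily_return)
```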

10.3 Reconciliation

A trade row in trades.csv and the daily ledger rows in ledger.csv are linked by trade_id / open_trade_id. Specifically:

The set of ledger rows where open_trade_id == k covers exactly the days from entry_date through exit_date - 1 for trade k (the exit-day P&L is realized into cash on the exit day itself).
Summing daily_pnl over those rows and the exit-day row reproduces the trade's pnl up to rounding.

10.4 Sample record (illustrative)

A successful entry-and-exit on PNC / TFC might appear in the blotter as:

Field	Value	Note
`trade_id`	12
`pair`	PNC/TFC
`industry`	Banks
`entry_date`	2024-08-19
`exit_date`	2024-08-26
`direction`	short_spread
`entry_z`	+2.41
`exit_z`	+0.31
`hedge_ratio`	1.121
`entry_price_A`	162.34
`entry_price_B`	39.20
`exit_price_A`	159.12
`exit_price_B`	39.81
`shares_A`	−61	short ~$10k of PNC
`shares_B`	+286	long ~$10k × 1.121 of TFC
`pnl`	+371.21
`return_pct`	+0.0186
`holding_days`	5
`fate`	success

Numbers are illustrative — see outputs/trades.csv for the actual blotter.

11. Performance Metrics

All numbers below are computed by src/metrics.py from trades.csv and ledger.csv over the full backtest window (2023-05-02 to 2026-04-30).

11.1 Selection results (10 rows, one per industry)

Industry	Pair	Correlation	Coint p-value	ADF p-value	Hedge Ratio	Decision
Airlines	AAL / JBLU	0.560	0.136	0.046	0.529	rejected
Autos	F / GM	0.671	0.211	0.080	0.091	rejected
Banks	PNC / TFC	0.821	0.0001	0.00001	1.121	selected
Beverages	PEP / STZ	0.428	0.163	0.057	0.321	rejected
Communication	META / NFLX	0.304	0.063	0.018	0.800	rejected
Energy	COP / OXY	0.798	0.018	0.004	0.634	selected
Health Care	PFE / BIIB	0.442	0.065	0.018	0.436	rejected
Payments	V / MA	0.851	0.238	0.094	1.027	rejected
Retail	DG / DLTR	0.494	0.001	0.0002	0.979	rejected
Semiconductors	MU / LRCX	0.742	0.012	0.002	1.343	selected

Three industries pass all three filters. Notable rejections:

V / MA has the highest correlation in the table (0.85) but fails the cointegration test (p = 0.24). High correlation alone is not enough.
DG / DLTR has very strong cointegration evidence but borderline correlation (0.49 < 0.50). We keep the threshold strict to avoid trading pairs that are statistically related but practically unstable.

11.2 Aggregate trade statistics (3 selected pairs)

Metric	Value
Number of trades	68
Average return per trade	0.85%
Average holding period	9.07 days
Success rate	45.6%
Stop-loss rate	38.2%
Timeout rate	16.2%
End-of-backtest close rate	0.0%

11.3 Aggregate portfolio statistics (3 selected pairs)

Metric	Value
Sharpe ratio (ann.)	1.45
Annualized return	1.30%
Annualized volatility	0.89%
Total return	3.92%
Max drawdown	-0.85%
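The portfolio statistics can be sketched from a series of daily returns. The exact conventions in src/metrics.py are not shown in this document, so the 252-day annualization, the zero risk-free rate, and the geometric annualization below are assumptions, and the function names are illustrative.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualization factor


def total_return(daily_returns):
    """Compound the daily returns into a single total return."""
    equity = 1.0
    for r in daily_returns:
        equity *= 1.0 + r
    return equity - 1.0


def annualized_return(total_ret, n_days):
    """Geometric annualization of a total return earned over n_days bars."""
    return (1.0 + total_ret) ** (TRADING_DAYS / n_days) - 1.0


def sharpe_ratio(daily_returns):
    """Annualized Sharpe ratio with a zero risk-free rate (assumption)."""
    return mean(daily_returns) / stdev(daily_returns) * math.sqrt(TRADING_DAYS)


def max_drawdown(daily_returns):
    """Most negative peak-to-trough decline of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


# Consistency check against the table: 3.92% over 752 daily bars.
ann = annualized_return(0.0392, 752)
print(f"{ann:.4%}")
```

Under these assumptions a 3.92% total return over 752 bars annualizes to about 1.30%, in line with the table.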

11.4 Per-pair contribution

Industry	Pair	Trades	Success	Stop-loss	Timeout
Banks	PNC / TFC	20	50.0%	30.0%	20.0%
Energy	COP / OXY	22	50.0%	27.3%	22.7%
Semiconductors	MU / LRCX	26	38.5%	53.8%	7.7%

Banks and Energy contribute clean, balanced profiles. Semiconductors is the highest-frequency, lowest-quality pair in this run - 53.8% of MU / LRCX trades hit the stop-loss, and this is the pair to watch most closely going forward.

11.5 Interpreting the numbers

Sharpe of 1.45 is solid for a market-neutral pair-trading strategy. The realized volatility (0.89% ann.) is low because each trade is a hedged long/short position, only a small fraction of each $100k book is deployed at any time, and the three pairs are largely uncorrelated.
Total return of 3.92% over 3 years is modest in absolute terms, but the book is lightly deployed: each pair commits at most $10k to leg A (plus $10k × |β| to leg B) out of $100k of cash, so the return per dollar deployed is meaningfully higher.
Max drawdown of -0.85% is small. With three uncorrelated pairs, even simultaneous adverse moves rarely accumulate.
The ratio of stop-loss to success (0.84:1) is on the higher side and is the main quality concern. The live monitor described in section 12 is set up specifically to catch a deterioration in this ratio.

Histogram of per-trade returns stacked by fate, showing the asymmetry between winning and losing trade magnitudes

Underwater drawdown curve over the full 2023–2026 backtest, with maximum drawdown labeled

12. Strategy Monitoring

The backtest gives us a distribution of expected outcomes. In live trading, we compare realized behavior against that distribution and ask: is the strategy still doing what the backtest said it would do?

12.1 What we monitor (and how often)

Monthly cadence, computed from the production blotter:

Metric	Backtest value	Watch band
Sharpe (ann., trailing 6 months)	1.45	flag if < 0.5 for 2 consecutive months
Average return per trade	0.85%	flag if rolling 20-trade mean < 0
Success rate	45.6%	flag if rolling 20-trade rate < 30%
Stop-loss rate	38.2%	flag if rolling 20-trade rate > 55%
Timeout rate	16.2%	flag if rolling 20-trade rate > 35%
Average holding days	9.07	flag if rolling 20-trade mean > 18 days
Max drawdown	-0.85%	flag if live drawdown < -1.3% (1.5x)

Daily cadence, computed from the production ledger:

Live drawdown vs. running peak.
Open-position count vs. expected (one per pair).
Spread Z-score per pair (so we can see entries forming).
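The rolling 20-trade watch bands from the monthly table can be sketched as a function over the production blotter. The function and flag names are illustrative, the inputs are assumed to be the `fate` and `return_pct` columns in chronological order, and the sample data is made up.

```python
def rolling_rate(fates, fate, window=20):
    """Share of the last `window` trades that ended with the given fate."""
    recent = fates[-window:]
    return sum(f == fate for f in recent) / len(recent)


def trade_flags(fates, returns, window=20):
    """Names of the rolling-trade watch bands breached by the latest trades."""
    if len(fates) < window:
        return []  # not enough trades yet for a rolling reading
    flags = []
    if sum(returns[-window:]) / window < 0:
        flags.append("avg_return")
    if rolling_rate(fates, "success", window) < 0.30:
        flags.append("success_rate")
    if rolling_rate(fates, "stop_loss", window) > 0.55:
        flags.append("stop_loss_rate")
    if rolling_rate(fates, "timeout", window) > 0.35:
        flags.append("timeout_rate")
    return flags


# Made-up deteriorating sample: 12 stop-losses, 5 successes, 3 timeouts.
bad_fates = ["stop_loss"] * 12 + ["success"] * 5 + ["timeout"] * 3
bad_returns = [-0.01] * 12 + [0.02] * 5 + [0.0] * 3
print(trade_flags(bad_fates, bad_returns))
```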

12.2 Per-pair sanity checks

Every weekend (or on the last day of each month), we re-run the screening calculations on the trailing 12 months of data for each pair currently in production:

The rolling daily-return correlation, with a warning if it falls below 0.30.
The rolling Engle-Granger cointegration p-value, with a warning if it stays above 0.10 for two consecutive monthly checks.
The ADF p-value on the spread, with a warning if it stays above 0.10 for two consecutive monthly checks.

These are the same tests used at selection time, applied to the most recent window. If a pair's underlying statistical evidence has deteriorated, the live trade behavior may not have caught up yet — but the next few trades almost certainly will, so we want the early signal.

12.3 Reporting

A weekly status note records:

Current open positions (pair, direction, days held, current Z-score, MTM P&L).
Trades closed since the previous report and their fates.
Trailing-month Sharpe, success rate, stop-loss rate, timeout rate.
Drawdown vs. peak.
Per-pair trailing 12-month correlation, cointegration p-value, ADF p-value.
Any flags raised by the watch bands above.

This is the input that feeds the section 13 stop-trading decision.

Rolling 20-trade diagnostics over the backtest: trailing Sharpe, success rate, and stop-loss rate, each annotated with the watch-band threshold from the monitoring framework

13. When the Strategy Stops Working

A pair-trading strategy stops working when the statistical relationship between the two stocks breaks down. This can happen for many reasons: M&A activity, a major change in business mix, divergent management decisions, a regulatory shock that hits one firm but not the other, or simply a structural break in market regime. We do not need to identify the cause to react — we need a clear rule for stepping back.

13.1 Per-pair stop-trading conditions

We suspend new entries on a pair if any one of the following is true:

Trailing 12-month return correlation falls below 0.30 for two consecutive monthly checks.
Trailing 12-month Engle-Granger cointegration p-value stays above 0.10 for two consecutive monthly checks.
ADF p-value on the spread stays above 0.10 for two consecutive monthly checks.
Rolling 20-trade success rate drops below 23% (50% of the backtest 45.6%).
Rolling 20-trade stop-loss rate exceeds 55%.
Three or more stop-losses occur within any 10 consecutive trading days.

When suspended, we hold any existing open position to its natural exit (or to its stop-loss), then close the pair until the screening evidence recovers.
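The six per-pair conditions can be combined into one check. This is a sketch with hypothetical function names; it assumes the monthly statistics arrive as lists ordered oldest to newest and that stop-loss dates are given as sorted trading-day indices.

```python
def consecutive_breaches(values, breached, n=2):
    """True if the last n monthly values all satisfy breached(value)."""
    return len(values) >= n and all(breached(v) for v in values[-n:])


def too_many_stop_losses(stop_days, window=10, limit=3):
    """True if `limit` stop-losses fall within any `window` consecutive trading days."""
    for i in range(len(stop_days) - limit + 1):
        if stop_days[i + limit - 1] - stop_days[i] < window:
            return True
    return False


def suspend_pair(corr_hist, coint_hist, adf_hist,
                 success_rate, stop_rate, stop_days):
    """Suspend new entries if any one of the per-pair conditions holds."""
    return (consecutive_breaches(corr_hist, lambda c: c < 0.30)
            or consecutive_breaches(coint_hist, lambda p: p > 0.10)
            or consecutive_breaches(adf_hist, lambda p: p > 0.10)
            or success_rate < 0.23
            or stop_rate > 0.55
            or too_many_stop_losses(stop_days))


# One month above the cointegration threshold is not enough to suspend.
print(suspend_pair([0.8, 0.7], [0.02, 0.15], [0.01, 0.02], 0.45, 0.38, []))
# Two consecutive months above it is.
print(suspend_pair([0.8, 0.7], [0.12, 0.15], [0.01, 0.02], 0.45, 0.38, []))
```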

13.2 Portfolio-level stop conditions

We pause all new entries across all pairs if:

Live drawdown exceeds -1.3% (1.5x the backtest max drawdown of -0.85%).
Two consecutive monthly Sharpe ratios fall below 0.5.
Aggregate stop-loss rate over the trailing 60 trades exceeds 55%.

This is a circuit breaker, not a kill switch. Open positions still exit according to the standing rules; we simply do not put on new exposure until the underlying signals look healthy again.

13.3 Resumption rule

A suspended pair is eligible for re-screening in the next monthly cycle. To resume trading the pair, the same statistical thresholds must be met on the trailing 12-month window: correlation of at least 0.50, cointegration p-value of at most 0.10, and ADF p-value of at most 0.10 on the spread.

If a pair fails to re-qualify for two consecutive months, it is removed from the trading roster and the corresponding industry returns to the candidate pool for fresh selection.

13.4 What is NOT a stop signal

To avoid over-reacting, we deliberately do not treat any of the following as automatic stop signals:

A single losing trade, even a large one.
A single month of negative P&L.
An unusually wide Z-score on entry (this is the signal, not the failure).
A change in the absolute price level of either leg.

Pair trading is a tail-heavy strategy: some of the eventual winners recover only after the spread has become very wide. We need enough patience to let those resolve, while still cutting genuinely broken pairs quickly. The two-consecutive-month rule on statistical thresholds plus the trade-fate circuit breakers are calibrated to that tradeoff.