Pair Trading Across 10 Industries
A statistical pair-trading strategy applied to ten US equity industries. We screen 386 candidate pairs by correlation, Engle–Granger cointegration, and ADF on the spread, then backtest a Z-score-driven entry/exit rule with stop-loss and timeout over three years of Interactive Brokers data.
01. Strategy Overview
This project implements a statistical pair-trading strategy across 10 GICS-style US equity industries. The strategy looks for two stocks in the same industry whose prices have moved together historically, identifies them with formal statistical tests, and then trades the spread between them whenever it deviates far from its historical mean.
The trading hypothesis is simple: when two economically similar stocks share a stable long-run relationship (cointegration), short-term dislocations in their relative price tend to revert. We enter when the spread is unusually wide, exit when it reverts toward the mean, and stop out if the relationship appears to break.
Pipeline
The end-to-end workflow proceeds in seven stages:
- Define ten industries and assemble a candidate stock pool of roughly ten stocks each, for a total of 96 tickers.
- Pull three years of historical daily bars for each ticker from Interactive Brokers.
- For every pair within an industry, compute four statistics: daily-return correlation, Engle-Granger cointegration p-value on log prices, OLS hedge ratio, and an ADF p-value on the constructed spread.
- Rank the pairs and select the best one per industry, subject to the statistical thresholds.
- Run the Z-score backtest (entry at ±2, success exit at |Z| < 0.5, stop-loss at |Z| > 3, timeout at 20 trading days).
- Produce the blotter (trades.csv) and the daily ledger (ledger.csv).
- Compute the performance metrics that drive the dashboards in the later sections.
Backtest Window
- Start: 2023-05-02
- End: 2026-04-30
- Bars per stock: 752 daily bars
- Data source: Interactive Brokers (TWS API via shinybroker)
High-Level Result
Out of 386 candidate pairs across 10 industries, 3 pairs met the strict selection thresholds (correlation ≥ 0.50, cointegration p ≤ 0.10, ADF p ≤ 0.10):
| Industry | Selected Pair |
|---|---|
| Banks | PNC / TFC |
| Energy | COP / OXY |
| Semiconductors | MU / LRCX |
The remaining 7 industries did not contain any pair that satisfied all three statistical thresholds during this backtest window.
Aggregate backtest performance (3 selected pairs, equal-notional):
- 68 round-trip trades
- Sharpe ratio: 1.45
- Total return: 3.92%
- Max drawdown: -0.85%
- Success rate: 45.6%, stop-loss rate: 38.2%, timeout rate: 16.2%
02. Market Phenomenon
What we are trying to capture
This strategy attempts to profit from temporary relative mispricing between two economically similar stocks in the same industry. If two stocks have historically moved together and the spread between their prices is statistically stationary, then large deviations from the historical average represent temporary dislocations rather than a permanent change in fundamentals. The strategy enters when the spread becomes unusually wide and exits when the spread reverts toward its mean.
Why this happens
Stocks within the same industry are exposed to the same macro factors: sector-wide demand shocks, regulation, commodity inputs, the broader market. When investors react to firm-specific news (an earnings surprise, an analyst upgrade, an executive change), the relative prices of two otherwise similar companies can drift apart even though their long-run cash flows remain coupled. That short-term divergence is what creates the trading opportunity.
What the strategy does NOT assume
- It does not assume the two stocks have the same price level.
- It does not assume the two stocks have the same volatility.
- It does not assume markets are inefficient on average.
- It only assumes that, conditional on having passed the cointegration test, the spread has a tendency to revert during the backtest horizon.
Risk: when the assumption fails
A pair can stop being cointegrated. M&A activity, a major change in business mix, a regulatory shock that hits one firm but not the other, or simply a structural break in the relationship can make the spread non-stationary. When that happens, the strategy will keep entering trades that no longer revert. The stop-loss, the live monitoring rules in section 12, and the stop-trading rules in section 13 are designed to detect this.
03. Industry Universe & Pair Selection
3.1 Why these 10 industries
The 10 industries are chosen to cover a broad cross-section of the US equity market based on the GICS sector framework (S&P/MSCI). Within each industry the candidate pool is filled with large-cap, liquid names that share business model and demand drivers, which is the precondition that makes a stable long-run spread plausible.
3.2 Candidate stock universe (96 tickers)
| Industry | Candidate stocks |
|---|---|
| Beverages | KO, PEP, MNST, KDP, STZ, TAP, CELH, FIZZ |
| Payments | V, MA, AXP, PYPL, FIS, FI, GPN, DFS |
| Energy | XOM, CVX, COP, EOG, SLB, OXY, PSX, MPC, VLO, HAL |
| Semiconductors | NVDA, AMD, INTC, AVGO, QCOM, TXN, MU, AMAT, LRCX, KLAC |
| Airlines | DAL, UAL, AAL, LUV, ALK, JBLU, CPA, RYAAY, SAVE, HA |
| Banks | JPM, BAC, C, WFC, GS, MS, USB, PNC, TFC, BK |
| Retail | WMT, TGT, COST, DG, DLTR, KR, HD, LOW, TJX, ROST |
| Autos | TSLA, F, GM, RIVN, LCID, TM, HMC, STLA, NIO, XPEV |
| Health Care | JNJ, PFE, MRK, ABBV, LLY, BMY, GILD, AMGN, REGN, BIIB |
| Communication | GOOGL, META, NFLX, DIS, CMCSA, T, VZ, TMUS, SNAP, PINS |
Of these 96 candidates, 92 returned valid 3-year daily bars from IBKR.
4 tickers were excluded because the contract did not resolve on TWS during the
fetch window (FI, DFS, SAVE, HA).
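The candidate-pair count quoted elsewhere in this report follows directly from the table above. As a quick check, the sketch below counts unordered pairs within each industry after removing the four excluded tickers; the dictionary name is illustrative and not taken from the project code.

```python
from math import comb

# Usable tickers per industry after dropping FI, DFS (Payments)
# and SAVE, HA (Airlines), which did not resolve on TWS.
usable_tickers = {
    "Beverages": 8,
    "Payments": 8 - 2,
    "Energy": 10,
    "Semiconductors": 10,
    "Airlines": 10 - 2,
    "Banks": 10,
    "Retail": 10,
    "Autos": 10,
    "Health Care": 10,
    "Communication": 10,
}

# Every unordered pair within an industry is a candidate: n choose 2.
pairs_per_industry = {name: comb(n, 2) for name, n in usable_tickers.items()}
total_pairs = sum(pairs_per_industry.values())

print(sum(usable_tickers.values()))  # 92
print(total_pairs)                   # 386
```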
3.3 Selection process
For every distinct pair within each industry, we compute four statistics:
- Daily return correlation
- Engle-Granger cointegration p-value (on log prices)
- OLS hedge ratio (slope of log(A) ~ log(B))
- Augmented Dickey-Fuller p-value on the spread
We then rank candidates within each industry using a composite score and pick the top pair. A pair is selected for trading only if it passes all three statistical thresholds simultaneously: correlation of at least 0.50, cointegration p-value of at most 0.10, and ADF p-value of at most 0.10 on the constructed spread.
If a top-ranked pair fails the threshold, it is rejected and that industry contributes no pair to the live strategy. This is intentional: we would rather trade fewer pairs with strong statistical support than force a pair from every industry.
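The threshold rule can be written as a single predicate. The sketch below is illustrative rather than the project's actual code: the function name is hypothetical, and the three example rows are copied from the selection table in section 11.1.

```python
def passes_thresholds(corr, coint_p, adf_p,
                      min_corr=0.50, max_coint_p=0.10, max_adf_p=0.10):
    """Return True only if all three selection thresholds hold at once."""
    return corr >= min_corr and coint_p <= max_coint_p and adf_p <= max_adf_p

# (correlation, cointegration p-value, ADF p-value) from section 11.1
top_pairs = {
    "PNC/TFC": (0.821, 0.0001, 0.00001),
    "V/MA": (0.851, 0.238, 0.094),
    "DG/DLTR": (0.494, 0.001, 0.0002),
}

decisions = {
    pair: "selected" if passes_thresholds(*stats) else "rejected"
    for pair, stats in top_pairs.items()
}
print(decisions)
# {'PNC/TFC': 'selected', 'V/MA': 'rejected', 'DG/DLTR': 'rejected'}
```

V / MA fails on cointegration alone and DG / DLTR fails on correlation alone, which is why the rule requires all three conditions together.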
04. Statistical Tests
Three statistical tests are run on every candidate pair. The tests are deliberately layered so that a passing pair shows evidence of (a) a contemporaneous relationship, (b) a stable long-run relationship, and (c) mean-reverting short-term deviations.
4.1 Daily return correlation (quick screen)
Pairs trading requires both stocks to respond to similar shocks; otherwise the spread is just two unrelated random walks. A simple Pearson correlation on daily returns is fast and gives us a first cut. We require a correlation of at least 0.50.
Correlation alone is not sufficient: two stocks can be highly correlated in returns while having a non-stationary spread (a permanently widening or narrowing relationship). That is why the next two tests matter more.
4.2 Engle-Granger cointegration test
We run statsmodels.tsa.stattools.coint on the log prices of each pair.
The null hypothesis is "no cointegration." A small p-value indicates that the
two log-price series share a stable long-run relationship - the residual from
their linear combination is stationary.
Selection rule: cointegration p-value of at most 0.10.
Ideally the cutoff would be p < 0.05. We relax it to 0.10 because 0.05 leaves too few qualifying pairs in this 3-year window; 0.10 gives a slightly noisier but still defensible signal.
4.3 ADF test on the spread
After computing the spread (section 5), we run an Augmented Dickey-Fuller test on the spread itself. The null is "spread has a unit root" (non-stationary, random walk). A small p-value means the spread is stationary - exactly the condition the strategy needs.
Selection rule: ADF p-value of at most 0.10.
In practice, ADF and cointegration tend to agree. The combined test acts as a cross-check.
4.4 Combined ranking score
For every candidate pair, we compute a composite score
score = z(correlation) − z(cointegration_p) − z(ADF_p), where z(·)
standardizes each statistic across all candidate pairs. Higher
correlation pushes the score up, and lower p-values push the score up. The
top-scoring pair within each industry is the one we report, and it is then checked against the three thresholds (section 3.3).
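To make the scoring formula concrete, here is a toy calculation on three rows taken from the section 11.1 table. It is only a sketch: the real standardization runs across all candidate pairs, whereas this example uses three, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize a list to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Three rows from section 11.1, used purely as a toy cross-section.
pairs = ["PNC/TFC", "COP/OXY", "V/MA"]
corr = [0.821, 0.798, 0.851]
coint_p = [0.0001, 0.018, 0.238]
adf_p = [0.00001, 0.004, 0.094]

# score = z(correlation) - z(cointegration_p) - z(ADF_p)
scores = [
    zc - zp - za
    for zc, zp, za in zip(zscores(corr), zscores(coint_p), zscores(adf_p))
]
ranked = sorted(zip(pairs, scores), key=lambda item: item[1], reverse=True)
for pair, score in ranked:
    print(f"{pair}: {score:+.2f}")
```

In this toy set V / MA ranks last despite having the highest correlation, because its two p-values are far above the others.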
4.5 Why we do not use only correlation
A pair like V / MA has a daily-return correlation of 0.85 - extremely high -
but its cointegration p-value is 0.24, so it does not pass the cointegration
test. The two stocks move together day-to-day, but their long-run spread is
not stationary in the test window. Trading the spread of such a pair would be
exposed to permanent drift, which is exactly what the cointegration test is
designed to filter out.
05. Hedge Ratio & Spread
5.1 Hedge ratio via OLS on log prices
For each pair (A, B), the hedge ratio is the slope of an ordinary least squares
regression of log(price_A) on log(price_B), i.e.
log(price_A) = α + β · log(price_B) + ε.
β is the hedge ratio: how many units of stock B are needed to hedge
one unit of stock A in the constructed spread.
We use log prices, not raw prices, for two reasons:
- In log space, a given percentage move has the same size at any price level, so the regression is less sensitive to absolute price levels (KO at $80 vs. PEP at $145 in this sample).
- Cointegration in log space corresponds to a stable price ratio, which is the economically meaningful relationship for two similar firms.
5.2 Spread definition
The spread on day t is defined as
spread_t = log(price_A_t) − hedge_ratio · log(price_B_t).
If A and B are cointegrated, this spread is approximately stationary. The backtest then standardizes it via a rolling Z-score (section 6).
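The hedge-ratio regression and the spread definition fit in a few lines of NumPy. This is a sketch under stated assumptions: the helper name and the synthetic price series are invented for illustration, and the series are constructed with a known slope of 1.1 so the estimate can be checked.

```python
import numpy as np

def hedge_ratio_and_spread(price_a, price_b):
    """OLS of log(A) on log(B); returns (beta, spread).

    spread_t = log(price_A_t) - beta * log(price_B_t)
    """
    log_a, log_b = np.log(price_a), np.log(price_b)
    beta, _alpha = np.polyfit(log_b, log_a, 1)  # slope first, then intercept
    return beta, log_a - beta * log_b

rng = np.random.default_rng(1)
price_b = 40.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 752)))
# Construct A so that log(A) = 0.5 + 1.1 * log(B) + small stationary noise.
price_a = np.exp(0.5 + 1.1 * np.log(price_b) + rng.normal(0.0, 0.01, 752))

beta, spread = hedge_ratio_and_spread(price_a, price_b)
print(f"estimated beta = {beta:.3f}")  # should be close to the true 1.1
```

Note that the spread as defined here does not subtract the intercept, so it fluctuates around a non-zero level; the rolling Z-score in section 6 removes that level.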
5.3 Hedge ratios estimated for the 3 selected pairs
| Industry | Pair | Hedge Ratio (β) | Interpretation |
|---|---|---|---|
| Banks | PNC / TFC | 1.121 | 1 unit of log(PNC) hedged by 1.12 units of log(TFC) |
| Energy | COP / OXY | 0.634 | 1 unit of log(COP) hedged by 0.63 units of log(OXY) |
| Semiconductors | MU / LRCX | 1.343 | 1 unit of log(MU) hedged by 1.34 units of log(LRCX) |
5.4 Hedge ratio in the trading book
When sizing positions, the hedge ratio determines the relative dollar exposure
on the two legs. We commit $10,000 of notional to stock A, which means
shares_A = 10000 / price_A. The opposite leg uses $10,000 scaled by
|beta|, so shares_B = (10000 * |beta|) / price_B.
This keeps the spread position approximately neutral with respect to the common factor that drives both stocks. Both legs round to the nearest whole share; any small remaining residual exposure is recorded in the ledger and absorbed into the cash account.
06. Entry Rules
6.1 Z-score signal
Every trading day, we compute a rolling Z-score of the spread. Let μ_t be
the rolling mean of the spread over the last 60 trading days, and σ_t be
the rolling standard deviation over the same window. The Z-score is then
Z_t = (spread_t − μ_t) / σ_t.
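A rolling Z-score like this can be sketched with pandas. Two details are assumptions, since the text does not pin them down: the window includes the current day, and the standard deviation is the sample version (pandas' default, ddof=1).

```python
import numpy as np
import pandas as pd

def rolling_zscore(spread: pd.Series, window: int = 60) -> pd.Series:
    """Z-score of the spread against its own trailing mean and standard deviation."""
    mu = spread.rolling(window).mean()
    sigma = spread.rolling(window).std()  # sample standard deviation (ddof=1)
    return (spread - mu) / sigma

rng = np.random.default_rng(2)
spread = pd.Series(rng.normal(0.0, 0.02, 200))  # synthetic spread for illustration
z = rolling_zscore(spread)

print(int(z.isna().sum()))  # 59: undefined until the 60-day window fills
```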
6.2 Why a 60-day rolling window
A 60-day window is approximately three months of trading data. It is long enough to give a stable estimate of the spread's local mean and standard deviation, but short enough to adapt when market conditions change. A shorter window (e.g. 20 days) was rejected because it produces a noisy Z-score that generates whipsaw trades; a longer window (e.g. 252 days) was rejected because it reacts too slowly to regime shifts.
6.3 Entry rule
When the Z-score exceeds +2.0, the spread is unusually wide and we enter a short spread position: short stock A and go long stock B with a size determined by the hedge ratio. When the Z-score falls below −2.0, the spread is unusually narrow and we enter a long spread position: long stock A and short stock B. Between −2.0 and +2.0 we take no new position.
Only one position per pair is open at a time. If the strategy already has an open position on a pair, no new entry is taken for that pair until the existing trade is closed.
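The entry rule maps a Z-score to a direction and a pair of legs. The function below is a hypothetical sketch of that mapping, not the project's engine; it also treats a NaN Z-score (window not yet filled) as "no signal", which is an assumption.

```python
def entry_signal(z, position_open, entry_z=2.0):
    """Map today's Z-score to an entry decision for one pair, or None."""
    if position_open or z != z:  # z != z is True only for NaN
        return None
    if z > entry_z:
        # Spread unusually wide: short A, long B.
        return {"direction": "short_spread", "leg_A": "short", "leg_B": "long"}
    if z < -entry_z:
        # Spread unusually narrow: long A, short B.
        return {"direction": "long_spread", "leg_A": "long", "leg_B": "short"}
    return None

print(entry_signal(2.41, position_open=False)["direction"])   # short_spread
print(entry_signal(-2.30, position_open=False)["direction"])  # long_spread
print(entry_signal(2.41, position_open=True))                 # None
```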
6.4 Why ±2 standard deviations
The ±2 threshold filters out normal daily noise: if Z-scores were roughly normally distributed, only about 5% of daily values would fall outside ±2, so a breach is a meaningful deviation rather than an ordinary fluctuation. A ±1 threshold would generate too many marginal trades, and a ±3 threshold would generate almost none. Two standard deviations is a common choice in academic and practitioner references.
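The tail probabilities behind this choice can be computed with the standard library. This assumes an exactly standard normal Z-score, which a rolling Z-score on real data only approximates.

```python
import math

def two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2.0))

for k in (1.0, 2.0, 3.0):
    print(f"P(|Z| > {k:.0f}) = {two_sided_tail(k):.4f}")
# P(|Z| > 1) = 0.3173
# P(|Z| > 2) = 0.0455
# P(|Z| > 3) = 0.0027
```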
6.5 Sizing
Entries are sized to $10,000 of notional on each leg, scaled by the hedge
ratio. In other words, shares_A = floor(10000 / price_A) and
shares_B = floor(10000 * |hedge_ratio| / price_B).
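This keeps the two legs hedged in proportion to the hedge ratio; the position is exactly dollar-neutral only when |hedge_ratio| = 1. Because at most one position per pair is open at a time, gross exposure per pair is capped at roughly $10,000 × (1 + |hedge_ratio|), which is small relative to the $100,000 of starting cash allocated to each pair (section 9.3).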
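The sizing formulas translate directly into code. In the sketch below the function name and the example prices are hypothetical; only the $10,000 notional, the floor rounding, and the PNC / TFC hedge ratio come from the text.

```python
import math

def size_legs(price_a, price_b, hedge_ratio, direction, notional=10_000):
    """Signed whole-share counts for both legs (>0 long, <0 short)."""
    shares_a = math.floor(notional / price_a)
    shares_b = math.floor(notional * abs(hedge_ratio) / price_b)
    if direction == "short_spread":  # short A, long B
        return -shares_a, shares_b
    if direction == "long_spread":   # long A, short B
        return shares_a, -shares_b
    raise ValueError(f"unknown direction: {direction}")

# Hypothetical prices; 1.121 is the PNC / TFC hedge ratio from section 5.3.
print(size_legs(160.00, 40.00, 1.121, "short_spread"))  # (-62, 280)
```

With these inputs the short leg is worth 62 × $160 = $9,920 and the long leg 280 × $40 = $11,200, so the book is hedged by the hedge ratio rather than by equal dollars.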
07. Exit Rules
A trade can exit through one of four channels. Each is checked once per day in this order: success, stop-loss, timeout, end-of-data.
7.1 Success exit (mean reversion)
When the absolute Z-score drops below 0.5, the position is closed and
tagged with the fate success.
We exit once the spread has mostly reverted toward its mean. We use 0.5 rather than 0 because waiting for the Z-score to return all the way to zero often means giving back unrealized gains. A threshold of 0.5 captures most of the reversion while leaving a small buffer.
7.2 Stop-loss exit
When the absolute Z-score exceeds 3.0, the position is closed and tagged
with the fate stop_loss.
A stop-loss is triggered when the spread has continued moving against the trade beyond 3 standard deviations. At this point the historical spread distribution no longer supports the trade, and the more likely explanation is that the underlying relationship has shifted.
7.3 Timeout exit
If a position has been open for 20 trading days without hitting either of
the two exits above, it is closed and tagged with the fate timeout.
Pair trading is a short- to medium-term mean-reversion strategy. If the spread has not reverted within roughly one trading month, it is more likely to be trending than reverting, and the original statistical edge has expired. Closing the position frees capital for a fresher signal.
7.4 End-of-data close
If a trade is still open on the last day of the backtest, it is closed at that
day's price and tagged with the fate end_close. This is a bookkeeping rule
for the backtest, not a live trading rule — in production the position would
simply remain open into the next session.
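The four exit channels and their checking order can be expressed as one function. This is a sketch with a hypothetical function name; treating "open for 20 trading days" as days_held >= 20 is an assumption about how the holding period is counted.

```python
def exit_fate(z, days_held, is_last_day,
              exit_z=0.5, stop_z=3.0, max_days=20):
    """Return the fate that closes an open trade today, or None to keep holding.

    Checks run in the documented order: success, stop-loss, timeout, end-of-data.
    """
    if abs(z) < exit_z:
        return "success"
    if abs(z) > stop_z:
        return "stop_loss"
    if days_held >= max_days:
        return "timeout"
    if is_last_day:
        return "end_close"
    return None

print(exit_fate(0.31, days_held=5, is_last_day=False))   # success
print(exit_fate(3.20, days_held=2, is_last_day=False))   # stop_loss
print(exit_fate(1.40, days_held=20, is_last_day=False))  # timeout
print(exit_fate(1.40, days_held=7, is_last_day=True))    # end_close
```

Because success is checked first, a trade that reverts on its 20th day is tagged success rather than timeout.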
7.5 Realized fate distribution (3 selected pairs, full backtest)
| Fate | Trades | Share |
|---|---|---|
| success | 31 | 45.6% |
| stop_loss | 26 | 38.2% |
| timeout | 11 | 16.2% |
| end_close | 0 | 0.0% |
| Total | 68 | 100% |
Just under half of trades (45.6%) exit cleanly via mean reversion; about 38% are stopped out. We monitor stop-loss frequency closely: a sustained rise in that rate is one of the early warning signals described in section 12.
08. Trade Fates
Every closed trade in trades.csv is tagged with exactly one of four fates.
This makes downstream attribution and live monitoring straightforward.
| Fate | Trigger | Interpretation |
|---|---|---|
| success | abs(Z) drops below 0.5 | Spread reverted as predicted - the strategy worked. |
| stop_loss | abs(Z) exceeds 3.0 | Spread continued in the wrong direction - relationship may have broken. |
| timeout | Position held 20 trading days without success or stop-loss | Spread drifted sideways - signal expired without resolving. |
| end_close | Backtest ended while position was still open | Bookkeeping. Not a real fate in live trading. |
Realized counts (3 selected pairs, 2023-05 to 2026-04)
| Industry | Pair | n_trades | success | stop_loss | timeout | end_close |
|---|---|---|---|---|---|---|
| Banks | PNC / TFC | 20 | 10 | 6 | 4 | 0 |
| Energy | COP / OXY | 22 | 11 | 6 | 5 | 0 |
| Semiconductors | MU / LRCX | 26 | 10 | 14 | 2 | 0 |
| Total | | 68 | 31 | 26 | 11 | 0 |
MU / LRCX shows the highest stop-loss share (54%) - the spread had a few
strong directional moves during the semiconductor cycle that were not absorbed
by the rolling Z-score window. Its overall contribution to portfolio P&L is
still positive thanks to the favorable success-trade payoff distribution, but
this pair is the one most likely to be re-screened first under the live
monitoring rules in section 12.
See outputs/trades.csv for the full per-trade record.
Why we report fate distribution explicitly
Two strategies with the same Sharpe ratio can have very different fate distributions, and they age very differently in production:
- A strategy with high success rate and low stop-loss rate is "comfortable" and tends to be stable.
- A strategy with comparable returns achieved through frequent stop-losses is taking on path risk - the realized P&L distribution has fatter tails and the strategy is more sensitive to slippage and transaction costs.
- A strategy dominated by timeouts is generating signals the market is not resolving - this typically means the spread has lost its mean-reverting character and the strategy may need to be re-screened.
We therefore monitor success / stop-loss / timeout shares as primary live diagnostics, alongside Sharpe and drawdown.
09. Backtest Design
9.1 Architecture
The backtest is event-driven and runs day by day. For each pair and each day, the engine performs three steps in order:
- Manage any open trade. Read the current Z-score and close the position if it has reverted (success), gone too far against us (stop-loss), or aged out (timeout). On the final day of the backtest, any still-open trade is closed at that day's price.
- Check for an entry. If no trade is open and the Z-score is beyond ±2.0, open a long-spread or short-spread position accordingly.
- Record a ledger row capturing the date, prices, positions, cash, portfolio value, current Z-score, and any open trade identifier.
Exits are evaluated before entries on the same day, so a trade can only be opened on a day when no other position is active for that pair.
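The three-step daily loop can be sketched for a single pair using Z-scores alone, leaving out prices, cash, and sizing. Everything here is illustrative: the function name, the dictionary layout, and the choice not to open a new trade on the final day are assumptions rather than details of the actual engine.

```python
def run_pair(zscores, entry_z=2.0, exit_z=0.5, stop_z=3.0, max_days=20):
    """Walk one pair's daily Z-scores; return (trades, ledger) as lists of dicts."""
    trades, ledger, open_trade = [], [], None
    for day, z in enumerate(zscores):
        last_day = day == len(zscores) - 1
        # Step 1: manage any open trade (exits are evaluated before entries).
        if open_trade is not None:
            held = day - open_trade["entry_day"]
            fate = ("success" if abs(z) < exit_z else
                    "stop_loss" if abs(z) > stop_z else
                    "timeout" if held >= max_days else
                    "end_close" if last_day else None)
            if fate:
                trades.append({**open_trade, "exit_day": day, "fate": fate})
                open_trade = None
        # Step 2: check for an entry only when the pair is flat.
        # Assumption: no new trade is opened on the final day.
        if open_trade is None and not last_day and abs(z) > entry_z:
            direction = "short_spread" if z > 0 else "long_spread"
            open_trade = {"trade_id": len(trades) + 1, "entry_day": day,
                          "direction": direction}
        # Step 3: record the ledger row for the day.
        ledger.append({"day": day, "zscore": z,
                       "open_trade_id": open_trade["trade_id"] if open_trade else None})
    return trades, ledger

trades, ledger = run_pair([0.0, 2.4, 1.8, 0.3, -2.2, -3.1, -1.0])
print([t["fate"] for t in trades])  # ['success', 'stop_loss', 'end_close']
```

One consequence of this ordering shows up in the example: a stop-loss at |Z| > 3 also satisfies the entry condition |Z| > 2, so this sketch re-enters on the same day it stops out. Whether the real engine behaves this way is not stated in the text.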
9.2 Inputs
- Daily close prices for the 92 successfully-fetched tickers.
- Date-aligned price panel via inner-join on dates.
- Hedge ratio per pair from OLS on log prices.
- The 3 selected pairs from the screening step.
9.3 Capital and sizing
- Starting cash per pair: $100,000 (each pair has an independent ledger account in this implementation).
- Notional per leg per trade: $10,000, scaled by hedge ratio on stock B.
- Cash account: tracks proceeds and outlays, and pays the residual P&L on close.
- No margin, no commissions, and no slippage are modelled. These simplifications flatter the results rather than understate them; in production they would be added at the trade level.
9.4 Outputs
The backtest produces three artifacts in outputs/:
- trades.csv - one row per closed trade (the blotter).
- ledger.csv - one row per trading day per pair (the daily ledger).
- summary.json - aggregate trade and portfolio metrics.
The screening step also produces:
- screening_all_pairs.csv - all pairwise statistics, all industries.
- screening_ranked.csv - same data, ranked by composite score.
- best_per_industry.csv - the top-scored pair per industry with the selected/rejected decision (10 rows).
- selected_pairs.csv - the subset of best_per_industry that actually entered the backtest.
9.5 What is intentionally NOT modelled
- Intraday execution (we use daily closes only).
- Borrow cost on the short leg.
- Dividends, splits, and other corporate actions (the IBKR TRADES series used here is unadjusted; the impact on a 3-year window with all listed large caps is small but non-zero).
- Slippage and bid-ask spread.
- Capital constraints across pairs (each pair is treated as an independent $100k book).
These omissions make the reported Sharpe a slight overstatement relative to what a live deployment would realize. The relative ranking of pairs and the qualitative shape of the equity curve should be largely unaffected.
10. Blotter & Ledger
The backtest writes two CSVs that together fully reconstruct the strategy's behavior over the test window.
10.1 trades.csv — the blotter (one row per closed trade)
| Column | Type | Meaning |
|---|---|---|
| trade_id | int | Sequential trade identifier across the run |
| pair | string | e.g. PNC/TFC |
| industry | string | Industry tag |
| entry_date | date | Day the position was opened |
| exit_date | date | Day the position was closed |
| direction | string | long_spread or short_spread |
| entry_z | float | Z-score at entry |
| exit_z | float | Z-score at exit |
| hedge_ratio | float | Hedge ratio used at entry |
| entry_price_A | float | Stock A close on entry day |
| entry_price_B | float | Stock B close on entry day |
| exit_price_A | float | Stock A close on exit day |
| exit_price_B | float | Stock B close on exit day |
| shares_A | int | Signed share count for stock A (>0 long, <0 short) |
| shares_B | int | Signed share count for stock B |
| pnl | float | Realized dollar P&L on the trade |
| return_pct | float | P&L divided by gross entry cost |
| holding_days | int | Days the trade was open |
| fate | string | success, stop_loss, timeout, or end_close |
10.2 ledger.csv — the daily ledger (one row per pair per day)
| Column | Type | Meaning |
|---|---|---|
| date | date | Trading day |
| pair | string | Pair this row belongs to |
| industry | string | Industry tag |
| position_A | int | Current signed shares of A (0 if flat) |
| position_B | int | Current signed shares of B (0 if flat) |
| price_A | float | Daily close of A |
| price_B | float | Daily close of B |
| cash | float | Pair's cash balance after entries/exits |
| portfolio_value | float | cash + position_A * price_A + position_B * price_B |
| daily_pnl | float | Day-over-day change in portfolio_value |
| daily_return | float | daily_pnl / prev portfolio_value |
| zscore | float | Spread Z-score on this day (NaN until 60-day window fills) |
| open_trade_id | int | trade_id if a position is open, else NaN |
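The derived ledger columns follow mechanically from cash, positions, and prices. The sketch below marks one pair's book to market for a single day; the function name and all numbers are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def ledger_row(cash, position_a, position_b, price_a, price_b, prev_value):
    """Compute the derived ledger columns for one pair on one day."""
    portfolio_value = cash + position_a * price_a + position_b * price_b
    daily_pnl = portfolio_value - prev_value
    daily_return = daily_pnl / prev_value
    return {"portfolio_value": portfolio_value,
            "daily_pnl": daily_pnl,
            "daily_return": daily_return}

# Hypothetical book: start with $100,000, short 62 A at $160 (+$9,920 cash)
# and buy 280 B at $40 (-$11,200 cash), leaving $98,720 in cash.
# Next day A closes at $158.00 and B at $40.50.
row = ledger_row(cash=98_720.0, position_a=-62, position_b=280,
                 price_a=158.0, price_b=40.5, prev_value=100_000.0)
print(row["portfolio_value"])  # 100264.0
print(row["daily_pnl"])        # 264.0
```

The short leg gains 62 × $2 = $124 and the long leg gains 280 × $0.50 = $140, which together give the $264 daily P&L.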
10.3 Reconciliation
A trade row in trades.csv and the daily ledger rows in ledger.csv are
linked by trade_id / open_trade_id. Specifically:
- The set of ledger rows where open_trade_id == k covers exactly the days from entry_date through exit_date - 1 for trade k (the exit-day P&L is realized into cash on the exit day itself).
- Summing daily_pnl over those rows reproduces the trade's pnl up to rounding.
10.4 Sample record (illustrative)
A successful entry-and-exit on PNC / TFC might appear in the blotter as:
| Field | Value | Note |
|---|---|---|
| trade_id | 12 | |
| pair | PNC/TFC | |
| industry | Banks | |
| entry_date | 2024-08-19 | |
| exit_date | 2024-08-26 | |
| direction | short_spread | |
| entry_z | +2.41 | |
| exit_z | +0.31 | |
| hedge_ratio | 1.121 | |
| entry_price_A | 162.34 | |
| entry_price_B | 39.20 | |
| exit_price_A | 159.12 | |
| exit_price_B | 39.81 | |
| shares_A | −61 | short ~$10k of PNC |
| shares_B | +286 | long ~$10k × 1.121 of TFC |
| pnl | +371.21 | |
| return_pct | +0.0186 | |
| holding_days | 5 | |
| fate | success | |
Numbers are illustrative — see outputs/trades.csv for the actual blotter.
11. Performance Metrics
All numbers below are computed by src/metrics.py from trades.csv and
ledger.csv over the full backtest window (2023-05-02 to 2026-04-30).
11.1 Selection results (10 rows, one per industry)
| Industry | Pair | Correlation | Coint p-value | ADF p-value | Hedge Ratio | Decision |
|---|---|---|---|---|---|---|
| Airlines | AAL / JBLU | 0.560 | 0.136 | 0.046 | 0.529 | rejected |
| Autos | F / GM | 0.671 | 0.211 | 0.080 | 0.091 | rejected |
| Banks | PNC / TFC | 0.821 | 0.0001 | 0.00001 | 1.121 | selected |
| Beverages | PEP / STZ | 0.428 | 0.163 | 0.057 | 0.321 | rejected |
| Communication | META / NFLX | 0.304 | 0.063 | 0.018 | 0.800 | rejected |
| Energy | COP / OXY | 0.798 | 0.018 | 0.004 | 0.634 | selected |
| HealthCare | PFE / BIIB | 0.442 | 0.065 | 0.018 | 0.436 | rejected |
| Payments | V / MA | 0.851 | 0.238 | 0.094 | 1.027 | rejected |
| Retail | DG / DLTR | 0.494 | 0.001 | 0.0002 | 0.979 | rejected |
| Semiconductors | MU / LRCX | 0.742 | 0.012 | 0.002 | 1.343 | selected |
Three industries pass all three filters. Notable rejections:
- V / MA has the highest correlation in the table (0.85) but fails the cointegration test (p = 0.24). High correlation alone is not enough.
- DG / DLTR has very strong cointegration evidence but borderline correlation (0.49 < 0.50). We keep the threshold strict to avoid trading pairs that are statistically related but practically unstable.
11.2 Aggregate trade statistics (3 selected pairs)
| Metric | Value |
|---|---|
| Number of trades | 68 |
| Average return per trade | 0.85% |
| Average holding period | 9.07 days |
| Success rate | 45.6% |
| Stop-loss rate | 38.2% |
| Timeout rate | 16.2% |
| End-of-backtest close rate | 0.0% |
11.3 Aggregate portfolio statistics (3 selected pairs)
| Metric | Value |
|---|---|
| Sharpe ratio (ann.) | 1.45 |
| Annualized return | 1.30% |
| Annualized volatility | 0.89% |
| Total return | 3.92% |
| Max drawdown | -0.85% |
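The exact formulas in src/metrics.py are not shown in this report, so the following is only a sketch of how such metrics are commonly computed from the ledger's daily_return column. It assumes 252 trading days per year, a zero risk-free rate, and arithmetic annualization of the mean daily return.

```python
import math

def portfolio_metrics(daily_returns, periods_per_year=252):
    """Total return, annualized return and volatility, Sharpe, and max drawdown."""
    n = len(daily_returns)
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        max_dd = min(max_dd, equity / peak - 1.0)  # most negative peak-to-trough move
    mean = sum(daily_returns) / n
    var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)  # sample variance
    ann_vol = math.sqrt(var) * math.sqrt(periods_per_year)
    ann_ret = mean * periods_per_year  # arithmetic annualization (an assumption)
    return {"total_return": equity - 1.0, "ann_return": ann_ret,
            "ann_vol": ann_vol, "sharpe": ann_ret / ann_vol,
            "max_drawdown": max_dd}

# Six made-up daily returns, just to exercise the function.
m = portfolio_metrics([0.001, -0.002, 0.0015, 0.0005, -0.001, 0.002])
print(f"max drawdown = {m['max_drawdown']:.4f}")  # max drawdown = -0.0020
```

As a consistency check on the table above, 1.30% / 0.89% is about 1.46, which agrees with the reported Sharpe of 1.45 up to rounding of the inputs.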
11.4 Per-pair contribution
| Industry | Pair | Trades | Success | Stop-loss | Timeout |
|---|---|---|---|---|---|
| Banks | PNC / TFC | 20 | 50.0% | 30.0% | 20.0% |
| Energy | COP / OXY | 22 | 50.0% | 27.3% | 22.7% |
| Semiconductors | MU / LRCX | 26 | 38.5% | 53.8% | 7.7% |
Banks and Energy contribute clean, balanced profiles. Semiconductors is the highest-frequency, lowest-quality pair in this run - 53.8% of MU / LRCX trades hit the stop-loss, and this is the pair to watch most closely going forward.
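The percentages in this table and in section 11.2 can be reproduced from the raw fate counts in section 8. The sketch below does that arithmetic; the variable and function names are illustrative.

```python
# Fate counts per pair, copied from the realized-counts table in section 8.
fate_counts = {
    "PNC/TFC": {"success": 10, "stop_loss": 6, "timeout": 4, "end_close": 0},
    "COP/OXY": {"success": 11, "stop_loss": 6, "timeout": 5, "end_close": 0},
    "MU/LRCX": {"success": 10, "stop_loss": 14, "timeout": 2, "end_close": 0},
}

def fate_shares(counts):
    """Convert fate counts to percentage shares, rounded to one decimal."""
    total = sum(counts.values())
    return {fate: round(100.0 * n / total, 1) for fate, n in counts.items()}

per_pair = {pair: fate_shares(c) for pair, c in fate_counts.items()}

# Aggregate by summing counts across pairs first, then converting to shares.
fates = ["success", "stop_loss", "timeout", "end_close"]
totals = {f: sum(c[f] for c in fate_counts.values()) for f in fates}
aggregate = fate_shares(totals)

print(per_pair["MU/LRCX"])
# {'success': 38.5, 'stop_loss': 53.8, 'timeout': 7.7, 'end_close': 0.0}
print(aggregate)
# {'success': 45.6, 'stop_loss': 38.2, 'timeout': 16.2, 'end_close': 0.0}
```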
11.5 Interpreting the numbers
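- Sharpe of 1.45 is solid for a market-neutral pair-trading strategy. The realized volatility (0.89% ann.) is low because each position is hedged (long one leg, short the other), only a small fraction of the capital is deployed at any time, and the three pairs come from different industries, so their P&L is expected to be largely uncorrelated.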
- Total return of 3.92% over 3 years is modest in absolute terms, but the capital efficiency (low gross book size: $30k notional out of $300k cash) means the per-dollar-deployed return is meaningfully higher.
- Max drawdown of -0.85% is small. With three uncorrelated pairs, even simultaneous adverse moves rarely accumulate.
- The ratio of stop-loss to success (0.84:1) is on the higher side and is the main quality concern. The live monitor described in section 12 is set up specifically to catch a deterioration in this ratio.
12. Strategy Monitoring
The backtest gives us a distribution of expected outcomes. In live trading, we compare realized behavior against that distribution and ask: is the strategy still doing what the backtest said it would do?
12.1 What we monitor (and how often)
Monthly cadence, computed from the production blotter:
| Metric | Backtest value | Watch band |
|---|---|---|
| Sharpe (ann., trailing 6 months) | 1.45 | flag if < 0.5 for 2 consecutive months |
| Average return per trade | 0.85% | flag if rolling 20-trade mean < 0 |
| Success rate | 45.6% | flag if rolling 20-trade rate < 30% |
| Stop-loss rate | 38.2% | flag if rolling 20-trade rate > 55% |
| Timeout rate | 16.2% | flag if rolling 20-trade rate > 35% |
| Average holding days | 9.07 | flag if rolling 20-trade mean > 18 days |
| Max drawdown | -0.85% | flag if live drawdown < -1.3% (1.5x) |
Daily cadence, computed from the production ledger:
- Live drawdown vs. running peak.
- Open-position count vs. expected (one per pair).
- Spread Z-score per pair (so we can see entries forming).
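The rolling 20-trade watch bands from the monthly table can be checked with a few lines of code. This sketch covers only the three fate-rate bands; the function name and flag strings are hypothetical, and it returns no flags until 20 closed trades are available, which is an assumption.

```python
def rolling_fate_flags(fates, window=20):
    """Check the most recent `window` closed trades against the fate-rate bands."""
    recent = fates[-window:]
    if len(recent) < window:
        return []  # not enough closed trades yet to evaluate the bands
    rate = {f: recent.count(f) / window
            for f in ("success", "stop_loss", "timeout")}
    flags = []
    if rate["success"] < 0.30:
        flags.append("success rate below 30%")
    if rate["stop_loss"] > 0.55:
        flags.append("stop-loss rate above 55%")
    if rate["timeout"] > 0.35:
        flags.append("timeout rate above 35%")
    return flags

healthy = ["success"] * 10 + ["stop_loss"] * 6 + ["timeout"] * 4
degraded = ["stop_loss"] * 12 + ["success"] * 5 + ["timeout"] * 3
print(rolling_fate_flags(healthy))   # []
print(rolling_fate_flags(degraded))
# ['success rate below 30%', 'stop-loss rate above 55%']
```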
12.2 Per-pair sanity checks
Every weekend (or on the last day of each month), we re-run the screening calculations on the trailing 12 months of data for each pair currently in production:
- The rolling daily-return correlation, with a warning if it falls below 0.30.
- The rolling Engle-Granger cointegration p-value, with a warning if it stays above 0.10 for two consecutive monthly checks.
- The ADF p-value on the spread, with a warning if it stays above 0.10 for two consecutive monthly checks.
These are the same tests used at selection time, applied to the most recent window. If a pair's underlying statistical evidence has deteriorated, the live trade behavior may not have caught up yet — but the next few trades almost certainly will, so we want the early signal.
12.3 Reporting
A weekly status note records:
- Current open positions (pair, direction, days held, current Z-score, MTM P&L).
- Trades closed since the previous report and their fates.
- Trailing-month Sharpe, success rate, stop-loss rate, timeout rate.
- Drawdown vs. peak.
- Per-pair trailing 12-month correlation, cointegration p-value, ADF p-value.
- Any flags raised by the watch bands above.
This is the input that feeds the section 13 stop-trading decision.
13. When the Strategy Stops Working
A pair-trading strategy stops working when the statistical relationship between the two stocks breaks down. This can happen for many reasons: M&A activity, a major change in business mix, divergent management decisions, a regulatory shock that hits one firm but not the other, or simply a structural break in market regime. We do not need to identify the cause to react — we need a clear rule for stepping back.
13.1 Per-pair stop-trading conditions
We suspend new entries on a pair if any one of the following is true:
- Trailing 12-month return correlation falls below 0.30 for two consecutive monthly checks.
- Trailing 12-month Engle-Granger cointegration p-value stays above 0.10 for two consecutive monthly checks.
- ADF p-value on the spread stays above 0.10 for two consecutive monthly checks.
- Rolling 20-trade success rate drops below 23% (50% of the backtest 45.6%).
- Rolling 20-trade stop-loss rate exceeds 55%.
- Three or more stop-losses occur within any 10 consecutive trading days.
When a pair is suspended, we hold any existing open position to its normal exit (success, stop-loss, or timeout) and take no new entries on the pair until the screening evidence recovers.
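The six suspension conditions can be combined into one check that returns the reasons that fired. This is a sketch under assumptions: the function names and reason labels are hypothetical, the histories are lists of monthly values with the most recent last, and the rates are passed in as fractions.

```python
def breached_twice(history, threshold, direction):
    """True if the two most recent monthly checks both breach the threshold.

    direction is "below" (breach when value < threshold) or "above".
    """
    if len(history) < 2:
        return False
    if direction == "below":
        return all(v < threshold for v in history[-2:])
    return all(v > threshold for v in history[-2:])

def suspension_reasons(corr, coint_p, adf_p,
                       success_rate_20, stop_rate_20, stops_last_10_days):
    """Evaluate the six per-pair conditions; any non-empty result means suspend."""
    reasons = []
    if breached_twice(corr, 0.30, "below"):
        reasons.append("correlation")
    if breached_twice(coint_p, 0.10, "above"):
        reasons.append("cointegration")
    if breached_twice(adf_p, 0.10, "above"):
        reasons.append("adf")
    if success_rate_20 < 0.23:
        reasons.append("success_rate")
    if stop_rate_20 > 0.55:
        reasons.append("stop_loss_rate")
    if stops_last_10_days >= 3:
        reasons.append("stop_loss_cluster")
    return reasons

# One bad cointegration month is not enough to suspend.
print(suspension_reasons([0.60, 0.50], [0.05, 0.12], [0.02, 0.03], 0.45, 0.40, 1))
# []
```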
13.2 Portfolio-level stop conditions
We pause all new entries across all pairs if:
- Live drawdown is worse than -1.3% (1.5x the backtest max drawdown of -0.85%).
- Two consecutive monthly Sharpe ratios fall below 0.5.
- Aggregate stop-loss rate over the trailing 60 trades exceeds 55%.
This is a circuit breaker, not a kill switch. Open positions still exit according to the standing rules; we simply do not put on new exposure until the underlying signals look healthy again.
13.3 Resumption rule
A suspended pair is eligible for re-screening in the next monthly cycle. To resume trading the pair, the same statistical thresholds must be met on the trailing 12-month window: correlation of at least 0.50, cointegration p-value of at most 0.10, and ADF p-value of at most 0.10 on the spread.
If a pair fails to re-qualify for two consecutive months, it is removed from the trading roster and the corresponding industry returns to the candidate pool for fresh selection.
13.4 What is NOT a stop signal
To avoid over-reacting, we deliberately do not treat any of the following as automatic stop signals:
- A single losing trade, even a large one.
- A single month of negative P&L.
- An unusually wide Z-score on entry (this is the signal, not the failure).
- A change in the absolute price level of either leg.
Pair trading is a tail-heavy strategy: some winning trades recover only after the spread has become very wide. We need enough patience to let those resolve, while still cutting genuinely broken pairs quickly. The two-consecutive-month rule on statistical thresholds plus the trade-fate circuit breakers are calibrated to that tradeoff.