Why Your Trade-Tape OFI Caps at 35% R-Squared: The Cancel Stream Your Signal Pipeline Is Ignoring

Ariel Silahian

HFT Systems Architect & Consultant | 20+ years architecting high-frequency trading systems. Author of "Trading Systems Performance Unleashed" (Packt, 2024). Creator of VisualHFT.

I help financial institutions architect high-frequency trading systems that are fast, stable, and profitable.

>> Learn more about what I do:
https://hftAdvisory.com

>> Your execution logs contain $200K+ in recoverable edge.
>> Microstructure Diagnostics — one-time audit, 3-5 day turnaround
https://hftadvisory.com/microstructure-diagnostics

Table of Contents

The 30-Point Gap That Lives in Your Cancel Stream
Why 97% of Your Order Book Never Prints a Trade
The Venue Problem: Where Clean Cancel Data Exists, Where It Does Not
Signal Horizon, Endogeneity, and When the Model Breaks
The Architectural Implication and the Test Worth Running
A 5-Question Diagnostic Before Your Next Quarter Review

Introduction

If your OFI signal is built on the trade tape, an R-squared of 0.35 is probably the ceiling you keep hitting on 10-second predictive windows. That is not a modeling problem. It is a data selection problem, and it has a specific cause: you are discarding the dominant event type in the order book before your model ever runs.

The research case for this has been accumulating for over a decade. Cont, Kukanov, and Stoikov (2014) showed that a unified order flow imbalance measure built from all events at the best bid and ask (limit orders, cancellations, and trades), not just trades, achieves R-squared near 0.65 on 10-second price-change windows across 50 NYSE stocks. A signed trade volume signal on the same data sits around 0.32 to 0.35. The 30-point gap is large enough to materially change a strategy’s alpha viability, its position sizing, and the range of instruments it can trade profitably.


More recent work has decomposed OFI further, separating add events, cancel events, and trade events as distinct regressors. The findings from Sitaru, Calinescu, and Cucuringu (2023) at ACM ICAIF flip the intuition that most practitioners carry: in forward-looking scenarios, add-OFI ranks first in predictive importance, cancel-OFI ranks second, and trade-OFI ranks last. The trade tape you are consuming is the weakest predictor of the three.

I have run this diagnostic across several venue configurations over more than 20 years in production HFT infrastructure. The gap is real, the research is replicable, and the architectural fix is straightforward once you understand where the data problem actually lives. This article expands on a LinkedIn post I published on April 27, 2026. The long form gives me room to be precise where the short form had to compress.

The 30-Point Gap That Lives in Your Cancel Stream

The Cont, Kukanov, and Stoikov (2014) paper in the Journal of Financial Econometrics is the foundational reference here, but it requires careful reading to avoid a common misattribution. The paper defines a unified OFI measure, not three separate components. The comparison in the paper is between this unified OFI measure (which incorporates limit orders, cancellations, and trades at the best bid and ask) and signed trade volume, a simple trade-tape proxy. The unified measure achieves R-squared near 0.65 on 10-second windows. The trade-only proxy sits around 0.32 to 0.35. Across the 50 NYSE stocks in the study, R-squared of 0.50 or above was reached for 44 of 50.

That gap, roughly 30 percentage points, is what practitioners are leaving behind when they build OFI from the trade tape alone.
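To make the unified measure concrete: the 2014 definition works on consecutive best bid/ask states, so every limit order, cancellation, and trade at the best quotes moves the count, not just prints. Below is a minimal sketch of that best-level definition; the `TopOfBook` type and the function names are illustrative, not taken from the paper or any library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TopOfBook:
    """Best bid/ask prices and the resting quantity at each (illustrative type)."""
    bid_px: float
    bid_qty: float
    ask_px: float
    ask_qty: float


def ofi_contribution(prev: TopOfBook, cur: TopOfBook) -> float:
    """Order flow contribution e_n between two consecutive best-quote states."""
    e = 0.0
    # Bid side: an improved or unchanged bid contributes its current depth...
    if cur.bid_px >= prev.bid_px:
        e += cur.bid_qty
    # ...and an unchanged or worsened bid removes the previous depth.
    if cur.bid_px <= prev.bid_px:
        e -= prev.bid_qty
    # Ask side mirrors the bid with the opposite sign: ask depletion is buy pressure.
    if cur.ask_px <= prev.ask_px:
        e -= cur.ask_qty
    if cur.ask_px >= prev.ask_px:
        e += prev.ask_qty
    return e


def ofi(states: Sequence[TopOfBook]) -> float:
    """Unified OFI over one window: the sum of e_n across consecutive states."""
    return sum(ofi_contribution(a, b) for a, b in zip(states, states[1:]))
```

With unchanged prices the contribution reduces to the change in bid depth minus the change in ask depth, which is why a cancellation at the best ask registers as buy pressure even though nothing printed.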

The three-component decomposition, separating add-OFI, cancel-OFI, and trade-OFI as distinct regressors, comes from a different paper: Sitaru, Calinescu, and Cucuringu (2023), published at ACM ICAIF. Their dataset covers 100 stocks over three years of L3 data. In forward-looking predictive scenarios, the ranking of the three components by predictive importance is: add-OFI first, cancel-OFI second, trade-OFI last. The contemporaneous gain from decomposition over unified OFI is modest, but the forward-looking improvement is statistically significant, which is the number that matters for strategy viability.

The research frontier has continued moving. Cont, Cucuringu, and Zhang (2023) in Quantitative Finance show that a multi-level integrated OFI measure, incorporating book depth across multiple price levels, pushes R-squared to 80 to 87%. Lu, Reinert, and Cucuringu (2024) introduce conditional OFI, segmenting flow by trade co-occurrence patterns, and report R-squared in the 84 to 86% range with Sharpe ratios of 1.79 on strategies built from the conditional decomposition, versus negative Sharpe on undifferentiated OFI. The practical message: each layer of decomposition adds measurable predictive power, and the baseline unified-OFI work from 2014 is the floor, not the ceiling.

Why 97% of Your Order Book Never Prints a Trade

The reason cancel-OFI carries predictive power is not arbitrary. It follows directly from the structure of modern electronic order books.

Khomyn and Putniņš (2021), writing in the Journal of Banking and Finance, document that 97% of limit orders cancel before execution in US equities. The cancel-to-trade ratio (CTR) expresses the same fact as a ratio: a 97% cancellation rate implies roughly 32 cancellations for every execution. In any venue with a CTR at that level, cancellations vastly outnumber trades, and apart from order submissions themselves, cancellation is the dominant event type in the book. A signal pipeline that consumes the trade tape is building on the 3% of limit orders that complete as fills, and discarding the 97% that do not.

The Khomyn and Putniņš framing is also important for avoiding a misread: high CTR is primarily driven by market-making activity, not manipulation. Market makers continuously update their quotes in response to changing conditions, which generates high cancellation rates as a structural feature of their role. When the authors frame high CTR as “algos gone wild” in their title, they are addressing the question of whether this rate signals problematic behavior. Their finding is that it largely reflects normal market-making dynamics. That context matters because it clarifies what cancel-OFI is actually measuring: the real-time positioning adjustments of the dominant liquidity providers in the book.

The theoretical mechanism for why this carries predictive power is formalized by Bacry, Mastromatteo, and Muzy (2015) in their Hawkes process framework for finance. In an 8-dimensional Hawkes model of order book events, cancellations are self-exciting and cross-exciting. A wave of cancellations on one side of the book excites further cancellations and suppresses trade arrivals. This cross-excitation structure is what gives cancel-OFI its leading indicator character. The cancels arrive before the price move, not after.
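A minimal sketch of what self- and cross-excitation means computationally, using an exponential kernel and a toy two-type model (cancels and trades) rather than the 8-dimensional specification. The parameter values in any usage are made up for illustration. Clipping the intensity at zero is one common way to let a cross-effect be inhibitory (a negative cancel-to-trade entry); a purely linear Hawkes model requires non-negative kernels.

```python
from __future__ import annotations

import math
from typing import Sequence


def hawkes_intensity(
    t: float,
    mu: Sequence[float],
    alpha: Sequence[Sequence[float]],
    beta: float,
    events: Sequence[tuple[float, int]],
) -> list[float]:
    """Conditional intensity of each event type at time t.

    lambda_i(t) = max(0, mu_i + sum_j sum_{t_k^j < t} alpha[i][j] * exp(-beta * (t - t_k^j)))

    mu[i]        baseline rate of event type i
    alpha[i][j]  jump in the rate of type i right after an event of type j
                 (negative entries model inhibition; max(0, .) keeps rates valid)
    beta         common exponential decay rate
    events       past events as (time, type) pairs
    """
    lam = list(mu)
    for s, j in events:
        if s >= t:
            continue  # only strictly earlier events affect the intensity at t
        decay = math.exp(-beta * (t - s))
        for i in range(len(lam)):
            lam[i] += alpha[i][j] * decay
    return [max(0.0, x) for x in lam]
```

With type 0 as cancels and type 1 as trades, a positive `alpha[0][0]` and a negative `alpha[1][0]` reproduce the qualitative pattern described above: a burst of cancels raises the cancel rate and pushes the trade rate down until the kernel decays.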

Dahlström, Hagströmer, and Nordén (2024), writing in the Financial Review, add a further mechanism: cancel decisions are primarily driven by depth changes on both sides of the limit order book, more so than by price changes or inventory considerations. When the book thins on either side, market makers cancel to manage adverse selection exposure. This depth-reactive behavior means cancel-OFI is not a record of past executions; it measures real-time risk management by the participants most sensitive to incoming informed flow. That is a predictive signal, not noise.

The Venue Problem: Where Clean Cancel Data Exists, Where It Does Not

The research establishes the case for including cancel events in OFI models. The practical constraint is whether your venue stack actually delivers clean cancel data. L3 market data, where every order add, modify, and cancel arrives as a discrete timestamped message, is not universally available. The difference between L3 and L2 is not a formatting detail; it is the difference between observing cancellations directly and inferring them.

Here is the current state by venue, as of the date of this article, with accuracy caveats noted where they apply.

Coinbase (spot and derivatives). The Full channel WebSocket delivers native L3 data: every received, open, done, change, and match event arrives as a discrete message. The channel requires a Coinbase Exchange API key. There is no per-message fee, and both the API key and the Coinbase Exchange account it requires are free. Coinbase mandated authentication for the Full channel in August 2023, so “free and open” needs a clarification: no cost, but authentication is required.
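A minimal sketch of turning Full channel messages into signed add, cancel, and trade events. The field names (`type`, `order_id`, `side`, `remaining_size`, `reason`, `size`) follow the Coinbase Exchange full-channel schema as documented at the time of writing; verify them against the current docs. The sign convention (positive = buy pressure) is an assumption of this sketch, and `change` messages are ignored for brevity. Counting cancels only for orders previously seen as `open` is a conservative choice that keeps orders which never rested on the book out of cancel-OFI; in production you would seed the resting set from an order-book snapshot at startup.

```python
from __future__ import annotations

from typing import Optional


class CoinbaseFullClassifier:
    """Map full-channel messages to (component, signed_size) events.

    Sign convention (assumed): positive = buy pressure.
      add on bid +, add on ask -
      cancel on bid -, cancel on ask +
      trade lifting the ask +, trade hitting the bid -
    """

    def __init__(self) -> None:
        self.resting: set[str] = set()  # order ids seen in 'open', i.e. on the book

    def classify(self, msg: dict) -> Optional[tuple[str, float]]:
        kind = msg.get("type")
        if kind == "open":
            self.resting.add(msg["order_id"])
            size = float(msg["remaining_size"])
            return ("add", size if msg["side"] == "buy" else -size)
        if kind == "done":
            was_resting = msg["order_id"] in self.resting
            self.resting.discard(msg["order_id"])
            if msg.get("reason") == "canceled" and was_resting:
                size = float(msg.get("remaining_size", 0.0))
                return ("cancel", -size if msg["side"] == "buy" else size)
            return None  # filled, or never rested on the book
        if kind == "match":
            size = float(msg["size"])
            # 'side' is the maker's side: a resting sell was lifted, so buy pressure.
            return ("trade", size if msg["side"] == "sell" else -size)
        return None  # 'received', 'change', etc. are ignored in this sketch
```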

Hyperliquid. Native order-level granularity is available via the l4Book WebSocket subscription. Hyperliquid calls this “L4” rather than “L3,” because the feed includes on-chain user address data attached to order events, providing a layer of attribution not available on traditional L3 feeds. This is worth documenting because the terminology diverges from conventional L3 usage and can create confusion in architecture docs. The feed is public with rate limits (100 connections, 1,000 subscriptions per connection). Historical replay is available through 0xArchive; a free tier exists, with paid tiers for heavier usage.

Nasdaq-listed equities via Databento (TotalView-ITCH). L3 confirmed. Databento provides TotalView-ITCH data, which includes every order add, cancel, and execution as discrete messages. Pricing information cited in the LinkedIn post ($0.45 per GB historical, $375 per month professional non-display) reflects pre-January 2025 rates. Databento restructured pricing in January 2025, introducing a $199 per month Standard subscription tier plus per-GB charges. Current rates should be verified at databento.com/pricing before any cost modeling.

NYSE Integrated Feed (Pillar platform). L3 available via paid subscription. The NYSE OpenBook Ultra product, despite its name, is an L2 feed. The distinction matters; confirm which feed your data contract covers.

IEX (DEEP and TOPS). IEX is commonly cited as an L3 venue because of its transparent order book philosophy. This is a misconception worth correcting directly: DEEP and TOPS are L2 and L1 feeds. IEX does not publish a publicly accessible L3 feed with per-order cancel events. If your architecture doc lists IEX as an L3 cancel-OFI source, that is a data-sourcing error.

Binance. L2 only. The standard depth stream throttles at 100ms. Multiple order events between snapshots compress into a single delta. A cancellation is not observable as a cancellation; you see a quantity reduction at a price level and infer. Binance does not offer an L3 feed for external consumption.

OKX. L2 only via the standard books channel, which updates at 100ms. OKX does offer a tick-by-tick channel (books-l2-tbt) with 10ms update frequency, but access to this channel is VIP4 tier and above, requiring minimum 30-day trading volumes. Even at 10ms, you are receiving aggregated depth updates, not discrete cancel messages.

Bybit. L2 only. Bybit’s depth stream is event-driven rather than fixed-interval: a snapshot is sent if three seconds elapse without a change, with deltas delivered on each change event. This is not a 100ms throttle in the same sense as Binance; it is a delta-on-change feed with a 3-second idle timeout. Cancel events are still not observable as discrete messages; they arrive as quantity changes at price levels.

For Binance, OKX, and Bybit, L2 inference is the available option. The approach: track the quantity at each price level, record an inferred cancel when the quantity decreases without a matching trade execution in the same snapshot window. Inference accuracy degrades in busy books. The 30 to 50% degradation figure I cited in the LinkedIn post is a practitioner estimate based on diagnostic runs, not a peer-reviewed measurement. It should be treated as a reference range, not a precise number. The accuracy floor is what matters when your signal has a short half-life.
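The inference rule can be made explicit per price level. Over one snapshot window, resting quantity evolves as cur = prev + adds - cancels - traded, so only the net (adds - cancels) = cur - prev + traded is observable. The sketch below records a negative net as an inferred cancel and a positive net as an inferred add, for one side of the book. It assumes trade prints have already been attributed to the side they consumed (via the aggressor side); the function name is hypothetical.

```python
from __future__ import annotations

from typing import Mapping


def infer_adds_cancels(
    prev: Mapping[float, float],
    cur: Mapping[float, float],
    traded: Mapping[float, float],
) -> dict[str, float]:
    """Infer add and cancel volume on ONE side of the book between two L2 states.

    prev, cur: price -> resting quantity (an absent price means 0).
    traded:    price -> volume executed against this side at that price
               during the window.
    """
    adds = cancels = 0.0
    for px in set(prev) | set(cur) | set(traded):
        # Only the net of adds minus cancels is visible at L2.
        net = cur.get(px, 0.0) - prev.get(px, 0.0) + traded.get(px, 0.0)
        if net > 0:
            adds += net
        elif net < 0:
            cancels -= net
    return {"add": adds, "cancel": cancels}
```

The identity also shows where the accuracy loss comes from. An order added and cancelled at the same level inside one window leaves a net of zero and is invisible, and a cancel that coincides with a larger add shows up only as a smaller inferred add. Both effects get worse as the window lengthens and the book gets busier.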

Signal Horizon, Endogeneity, and When the Model Breaks

Before running cancel-OFI in production, three constraints deserve explicit treatment: signal horizon, endogeneity, and the regime conditions under which the model fails.

Signal horizon. Kolm, Turiel, and Westray (2023), writing in Mathematical Finance, study deep order flow imbalance across 115 Nasdaq stocks (not NYSE, an important distinction from the Cont 2014 dataset). Their finding on effective horizon is that the OFI signal is informative over approximately two average price changes. Translating that to clock time: in liquid equities, two average price changes typically occur within 10 to 30 seconds. That translation is a practitioner inference from the paper, not a direct quote of a clock-time figure from the authors. The number is regime-dependent; the average price change interval compresses during high-volatility periods and extends during quiet ones. What the paper establishes cleanly is the relative-event framing: a signal that decays in two price changes is a short-horizon signal, and the architecture around it must respect that.
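The clock-time translation can be computed directly from your own data rather than assumed. A minimal sketch, assuming you have timestamped mid-prices for one instrument and session: it measures the average interval between mid-price changes and multiplies by two. Because the interval is regime-dependent, it is worth recomputing over rolling windows rather than once per instrument. The function names are illustrative.

```python
from __future__ import annotations

from typing import Sequence


def mean_price_change_interval(times: Sequence[float], mids: Sequence[float]) -> float:
    """Average seconds between successive mid-price changes in a sample."""
    change_times = [t for t, before, after in zip(times[1:], mids, mids[1:]) if after != before]
    if len(change_times) < 2:
        raise ValueError("need at least two mid-price changes")
    return (change_times[-1] - change_times[0]) / (len(change_times) - 1)


def horizon_seconds(times: Sequence[float], mids: Sequence[float], n_changes: float = 2.0) -> float:
    """Clock-time estimate of an 'n average price changes' horizon (default: two)."""
    return n_changes * mean_price_change_interval(times, mids)
```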

The practical implication is direct: a 10-second half-life and a 12-second data pipeline are not compatible. If you fetch snapshots at 250ms intervals, aggregate them, ship them to a signal process, and end up with end-to-end latency in the 12 to 20 second range, you are running behind the half-life of the signal you are trying to exploit.

Endogeneity. Cont, Kukanov, and Stoikov (2014) themselves flag this in the paper. Large orders that move prices feed back into OFI measures because price-changing events affect subsequent order book behavior. Part of the R-squared of 0.65 reflects this mechanical co-movement. The forward-looking predictive power, which is the number that matters for trading, is lower than the contemporaneous R-squared. The contemporaneous figure is not useless; it establishes that the measure has a tight linear relationship with price changes. But it inflates the apparent predictive alpha. Forward-looking R-squared from the Sitaru et al. decomposition shows statistically significant improvement from including cancel and add components, but the headline numbers from Cont 2014 should be understood in that contemporaneous framing.
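The distinction is easiest to see as two regressions over consecutive intervals $k$ of equal length. The 2014 headline numbers come from the contemporaneous form, where OFI and the mid-price change $\Delta P$ are measured over the same interval. A tradeable signal needs the forward-looking form, where OFI from interval $k$ explains the price change over interval $k+1$:

```latex
\begin{aligned}
\text{contemporaneous:}\quad & \Delta P_k = \alpha + \beta\,\mathrm{OFI}_k + \varepsilon_k \\
\text{forward-looking:}\quad & \Delta P_{k+1} = \alpha' + \beta'\,\mathrm{OFI}_k + \varepsilon'_{k+1}
\end{aligned}
```

Only the R-squared of the second regression speaks directly to predictive alpha; the mechanical co-movement described above inflates the first.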

Regime failures. Chi et al. (2021), writing in Hindawi Scientific Programming, attempted to replicate the Cont 2014 OFI framework on Chinese mainland equity markets. The replication failed. The mechanism: Chinese equity markets operate under regulatory cancellation throttles that suppress the cancel-to-trade ratio well below the 97% rate observed in US markets. When CTR is structurally low, the cancel stream carries less information, and the OFI-price relationship breaks down.

This falsification case actually strengthens the article’s thesis rather than undermining it. The OFI decomposition framework works because modern electronic order books have extremely high cancellation rates. The cancel component carries predictive power precisely because 97% of orders cancel. Pull out the cancel stream (as the Chinese regulatory framework effectively does) and the signal collapses. The dependency on high CTR is not a weakness of the framework; it is the mechanism explaining why the cancel stream is informative in the first place.

The Briola, Bartolucci, and Aste (2024/2025) paper in Quantitative Finance adds a production-level consideration: cancel-rate composition determines whether ML-based order flow signals translate to P&L in live trading. A model trained on cancel-rich data from a period of normal market-making activity may degrade if the cancellation regime shifts during stress. This is not a reason to avoid cancel-OFI; it is a reason to monitor CTR as a model validity signal alongside the signal itself.
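A minimal sketch of that monitor: a trailing-window CTR computed from the same cancel and trade events the signal consumes, with a floor below which the model is treated as running outside the regime it was trained in. The class name and the `min_ctr` threshold are hypothetical. The floor should come from the CTR distribution in the training period for that instrument, not from the 97% US-equity figure.

```python
from __future__ import annotations

from collections import deque


class RollingCTR:
    """Cancel-to-trade ratio over a trailing time window, used as a regime check."""

    def __init__(self, window_s: float, min_ctr: float) -> None:
        self.window_s = window_s
        self.min_ctr = min_ctr  # hypothetical threshold, calibrated per instrument
        self._events: deque[tuple[float, str]] = deque()
        self._counts = {"cancel": 0, "trade": 0}

    def update(self, ts: float, kind: str) -> None:
        """Record an event at time ts (seconds); only 'cancel' and 'trade' are counted."""
        if kind in self._counts:
            self._events.append((ts, kind))
            self._counts[kind] += 1
        # Evict events that have fallen out of the trailing window.
        while self._events and self._events[0][0] <= ts - self.window_s:
            _, old = self._events.popleft()
            self._counts[old] -= 1

    def ratio(self) -> float:
        """Cancels per trade in the window; infinite when the window holds no trades."""
        trades = self._counts["trade"]
        return self._counts["cancel"] / trades if trades else float("inf")

    def regime_ok(self) -> bool:
        """False when CTR has dropped below the level the model was trained under."""
        return self.ratio() >= self.min_ctr
```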

The Architectural Implication and the Test Worth Running

The Federal Reserve published a FEDS Note in November 2025 applying the OFI framework to US Treasury markets during the April 2025 tariff-related volatility. The central bank’s finding: different OFI patterns across Treasury instruments explained divergent volatility outcomes during that period despite comparable book depth. That the Fed is using decomposed order flow imbalance as an analytical lens for policy-relevant market structure research is a useful benchmark for anyone still treating cancel-OFI as an esoteric research concept. The framework has reached policy altitude.

The architecture decision that follows from the research is not complicated, though implementing it cleanly requires attention to data sourcing and pipeline latency. Three components:

First, decompose OFI by event type. A unified OFI measure is better than a trade-only measure. A decomposed measure (add-OFI, cancel-OFI, trade-OFI as separate inputs) is better than unified when you have L3 data, particularly for forward-looking prediction. The additional complexity is engineering overhead, not conceptual complexity.
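Once each event carries a component label and a signed size (positive = buy pressure is the convention assumed here), the decomposition is a windowed group-by. A minimal sketch with fixed clock windows; the unified OFI for a window is simply the sum of its three components.

```python
from __future__ import annotations

import math
from collections import defaultdict
from typing import Iterable

COMPONENTS = ("add", "cancel", "trade")


def decomposed_ofi(
    events: Iterable[tuple[float, str, float]], window_s: float
) -> dict[int, dict[str, float]]:
    """Sum signed event sizes into add-, cancel-, and trade-OFI per fixed window.

    events: (timestamp_seconds, component, signed_size), component in COMPONENTS.
    Returns {window_index: {"add": x, "cancel": y, "trade": z}}.
    """
    out: dict[int, dict[str, float]] = defaultdict(lambda: dict.fromkeys(COMPONENTS, 0.0))
    for ts, comp, size in events:
        if comp not in COMPONENTS:
            raise ValueError(f"unknown component {comp!r}")
        out[math.floor(ts / window_s)][comp] += size
    return dict(out)
```

Keeping the three sums separate costs nothing extra at aggregation time, and it lets the model, rather than the pipeline, decide how much weight each event type carries.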

Second, prioritize venues with native L3 where signal quality is decision-relevant. For instruments where the strategy’s alpha viability depends on the predictive power of the OFI signal, the venue data stack should provide true L3. Where you have the option to source Coinbase Full channel or Hyperliquid l4Book data, using L2 inference instead is a choice with a measurable accuracy cost.

Third, for L2-only venues, implement cancel inference with a stated accuracy floor. The inference approach (track quantity deltas, record an inferred cancel when quantity drops without a matching trade) is legitimate when L3 is unavailable. The accuracy floor, whatever your diagnostic shows it to be on your actual instruments, should be a documented parameter in your signal configuration, not an implicit assumption.
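A minimal sketch of making the floor a measured, documented parameter. It assumes you can obtain ground truth for a comparable instrument: for example, rebuild the L2 book from an L3 feed, sample it at the L2 venue's snapshot interval, run your inference on the samples, and compare against the true cancel volume per window. The accuracy metric (one minus volume-weighted absolute error) is one reasonable choice among several, and all names here are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


def cancel_inference_accuracy(inferred: Sequence[float], true: Sequence[float]) -> float:
    """1 - (sum of per-window absolute errors / total true cancel volume), floored at 0."""
    if len(inferred) != len(true):
        raise ValueError("series must be aligned window by window")
    total_true = sum(true)
    if total_true == 0:
        raise ValueError("no true cancel volume in the sample")
    err = sum(abs(i - t) for i, t in zip(inferred, true))
    return max(0.0, 1.0 - err / total_true)


@dataclass(frozen=True)
class CancelInferenceConfig:
    """Signal-config entry recording how cancel-OFI is sourced on an L2 venue."""
    venue: str
    instrument: str
    snapshot_interval_ms: int
    accuracy_floor: float  # measured with the function above, not assumed
    measured_on: str       # sample period used for the measurement

    def __post_init__(self) -> None:
        if not 0.0 <= self.accuracy_floor <= 1.0:
            raise ValueError("accuracy_floor must be in [0, 1]")
```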

Cartea, Donnelly, and Jaimungal (2018) in Applied Mathematical Finance show that adverse selection cost rises when order book signal quality degrades. The cost of running an underpowered OFI signal on a strategy with adverse selection exposure is not just an alpha drag; it affects fill quality distribution.

The test. Run three models on the same instrument, over the same period, long enough for statistical separation (the specific length depends on your instrument’s average-price-change interval and your target confidence level):

Trade-OFI alone.
Cancel-OFI alone.
Combined model.

If the combined model does not measurably dominate either component in forward-looking predictive power, two hypotheses remain open. First: your snapshot rate is aliasing cancel events, compressing multiple cancel messages into a single delta and making your cancel-OFI measure unreliable. Second: your pipeline latency is exceeding the signal half-life, so by the time the combined signal reaches your execution logic, the price move it was predicting has already occurred.
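A minimal sketch of the three-model comparison, assuming per-window trade-OFI, cancel-OFI, and mid-price changes already aligned on one window clock. It regresses the next window's price change on the current window's OFI, fits OLS on the earlier part of the sample, and scores out-of-sample R-squared on the later part. The chronological split avoids look-ahead. Out-of-sample R-squared alone is not a significance test: establishing statistical separation would still need something like a block bootstrap over the test period, which is not shown here.

```python
from __future__ import annotations

import numpy as np


def forward_r2(X: np.ndarray, y: np.ndarray, train_frac: float = 0.7) -> float:
    """Out-of-sample R^2 of OLS regressing next-window price change on this window's OFI."""
    X_now, y_next = X[:-1], y[1:]            # OFI in window k explains dP in window k+1
    n_train = int(len(y_next) * train_frac)  # chronological split: no shuffling
    A = np.column_stack([np.ones(len(X_now)), X_now])
    coef, *_ = np.linalg.lstsq(A[:n_train], y_next[:n_train], rcond=None)
    y_test = y_next[n_train:]
    resid = y_test - A[n_train:] @ coef
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    if ss_tot == 0:
        raise ValueError("test-period price changes have zero variance")
    return float(1.0 - np.sum(resid ** 2) / ss_tot)


def three_model_test(trade_ofi, cancel_ofi, dprice, train_frac: float = 0.7) -> dict[str, float]:
    """Forward-looking R^2 for trade-OFI alone, cancel-OFI alone, and both combined."""
    t = np.asarray(trade_ofi, dtype=float).reshape(-1, 1)
    c = np.asarray(cancel_ofi, dtype=float).reshape(-1, 1)
    y = np.asarray(dprice, dtype=float)
    if not (len(t) == len(c) == len(y)):
        raise ValueError("series must share one window clock")
    return {
        "trade_only": forward_r2(t, y, train_frac),
        "cancel_only": forward_r2(c, y, train_frac),
        "combined": forward_r2(np.hstack([t, c]), y, train_frac),
    }
```

If you only have L2 inference on the venue, run the same comparison at more than one snapshot interval. Cancel-OFI improving as the interval shrinks points toward the aliasing hypothesis rather than the latency one.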

A 5-Question Diagnostic Before Your Next Quarter Review

This section is designed to be saved. Each question maps to a specific decision in your signal and data architecture.

1. Do we decompose OFI by event type, or do we consume a unified order flow measure?

If unified: you are leaving forward-looking predictive power on the table. The improvement from decomposition is statistically significant in forward-looking scenarios (Sitaru et al. 2023). The minimum viable change is adding cancel events as a separate regressor alongside trade events.

2. Which venues in our stack deliver native L3 cancel data, and which require inference?

Map each instrument to its venue and classify: native L3 (discrete cancel messages), L2 inference (quantity-delta method), or no cancel visibility. IEX DEEP is L2, not L3. Binance, OKX, and Bybit are L2 inference at best. Coinbase Full channel and Hyperliquid l4Book are native L3 with no per-message fee. Databento provides Nasdaq TotalView-ITCH L3 at a subscription cost; verify current pricing at databento.com/pricing.
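One way to keep this mapping auditable is to encode it, so that an instrument on an unclassified feed fails loudly instead of silently defaulting. The feed keys below are hypothetical labels. The classes follow the venue survey above and should be re-verified against each venue's current documentation before use.

```python
from __future__ import annotations

from enum import Enum


class CancelVisibility(Enum):
    NATIVE_L3 = "native L3 (discrete cancel messages)"
    L2_INFERENCE = "L2 inference (quantity-delta method)"
    NONE = "no cancel visibility"


# Hypothetical feed labels; classification as surveyed in this article.
FEED_CLASS = {
    "coinbase_full": CancelVisibility.NATIVE_L3,
    "hyperliquid_l4book": CancelVisibility.NATIVE_L3,
    "nasdaq_totalview_itch": CancelVisibility.NATIVE_L3,
    "nyse_integrated": CancelVisibility.NATIVE_L3,
    "nyse_openbook_ultra": CancelVisibility.L2_INFERENCE,
    "iex_deep": CancelVisibility.L2_INFERENCE,
    "binance_depth": CancelVisibility.L2_INFERENCE,
    "okx_books": CancelVisibility.L2_INFERENCE,
    "bybit_orderbook": CancelVisibility.L2_INFERENCE,
}


def classify(instrument_feeds: dict[str, str]) -> dict[str, CancelVisibility]:
    """Map instrument -> cancel visibility; unknown feeds raise instead of defaulting."""
    unknown = sorted(set(instrument_feeds.values()) - set(FEED_CLASS))
    if unknown:
        raise KeyError(f"unclassified feeds: {unknown}")
    return {inst: FEED_CLASS[feed] for inst, feed in instrument_feeds.items()}
```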

3. For L2-only venues, have we measured and documented our cancel inference accuracy?

The 30 to 50% degradation range from my diagnostic work is a reference point, not a universal figure. Your instruments, your snapshot intervals, and your inference logic will produce a different number. That number should be a documented floor in your signal config, not an assumption. If you have not run the measurement, the floor is unknown and your model validity claims are incomplete.

4. What is our data pipeline latency relative to two average price changes on each instrument?

Calculate the average price change interval for each instrument at your typical trading hours. Multiply by two to get the approximate half-life horizon of the OFI signal. Compare that to your end-to-end latency from market data receipt to signal output. If the pipeline latency approaches or exceeds the half-life horizon, the signal is degraded before it reaches execution logic.
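A minimal sketch of the comparison, assuming `horizon_s` is the two-price-change horizon for the instrument, computed as described in this question, and that you have a sample of end-to-end latencies from market data receipt to signal output. Using p99 rather than the mean keeps tail latency visible. The "tight" band (half the horizon by default) is an assumed reading of "approaches"; pick the threshold that matches your execution logic.

```python
from __future__ import annotations

import math
from typing import Sequence


def nearest_rank(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    s = sorted(values)
    k = max(0, math.ceil(q * len(s) / 100) - 1)
    return s[k]


def latency_budget_status(
    horizon_s: float, latencies_s: Sequence[float], tight_fraction: float = 0.5
) -> str:
    """Classify pipeline latency against the OFI horizon.

    'ok'      p99 latency below tight_fraction * horizon
    'tight'   p99 latency approaches the horizon
    'behind'  p99 latency at or beyond the horizon: the signal is stale on arrival
    """
    if not latencies_s:
        raise ValueError("no latency samples")
    p99 = nearest_rank(latencies_s, 99)
    if p99 >= horizon_s:
        return "behind"
    if p99 >= tight_fraction * horizon_s:
        return "tight"
    return "ok"
```

Because the horizon moves with the regime, the check is most useful when it is re-run with horizons estimated from quiet and from volatile sessions.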

5. Have we run the cancel-OFI vs trade-OFI vs combined directional test, and do we have a documented result?

If the answer is no, the question of whether cancel-OFI improves your specific signal on your specific instruments is open. The test is the diagnostic. Run it, document the result, and let the result drive the architecture decision rather than defaulting to a prior assumption that the trade tape is sufficient.

Conclusion

The order book events that trade-tape OFI discards, with cancellations dominant among them, account for roughly 30 percentage points of R-squared. The research case for including them is well established across multiple groups, multiple datasets, and now central bank research on Treasury markets. The architectural path to capturing them is a matter of data sourcing and pipeline latency management.

Three things remain open, and I will state them directly.

The 30 to 50% L2 inference degradation figure is practitioner reasoning, not a peer-reviewed measurement. Anyone running parallel cancel-OFI inference against true L3 on the same instrument and time window has the comparison that matters. That empirical benchmark is the one worth publishing.

The half-life translation from “two average price changes” to clock seconds is regime-dependent. In a high-volatility session, two price changes happen in 2 to 5 seconds. In a quiet book, the same count takes 30 to 60 seconds. The pipeline latency constraint is not a fixed number; it moves with market conditions, and the architecture should account for that.

The OKX tick-by-tick channel (books-l2-tbt) claims 10ms update frequency, but access is VIP4-gated. Whether that channel actually delivers cancel event observability meaningfully beyond the standard 100ms channel is an empirical question that depends on order flow patterns during busy periods. The published spec says faster snapshots; whether the event-level information is preserved at that cadence in practice requires testing on production data.

The falsifiable test is the three-model directional comparison: trade-OFI alone, cancel-OFI alone, and combined, run to statistical separation. If the combined model does not dominate, two hypotheses remain open: snapshot aliasing or the latency budget. The failure mode, and the venue where you found it, are the data points worth bringing.

References

Bacry, E., Mastromatteo, I., and Muzy, J-F. (2015). Hawkes processes in finance. arXiv:1502.04592.
Briola, A., Bartolucci, S., and Aste, T. (2024/2025). Deep limit order book forecasting: A microstructural guide. Quantitative Finance.
Cartea, A., Donnelly, R., and Jaimungal, S. (2018). Enhancing trading strategies with order book signals. Applied Mathematical Finance, 25(1):1-35.
Chi et al. (2021). The price impact of order book events from a dimension of time. Scientific Programming.
Cont, R., Cucuringu, M., and Zhang, C. (2023). Cross-impact of order flow imbalance in equity markets. Quantitative Finance, 23(10).
Cont, R., Kukanov, A., and Stoikov, S. (2014). The price impact of order book events. Journal of Financial Econometrics, 12(1):47-88.
Dahlström, P., Hagströmer, B., and Nordén, L. (2024). The determinants of limit order cancellations. Financial Review, 59(1):181-201.
Federal Reserve Board. (November 2025). Order flow imbalances and amplification of price movements: Evidence from U.S. Treasury markets. FEDS Notes.
Khomyn, M., and Putniņš, T.J. (2021). Algos gone wild: What drives the extreme order cancellation rates in modern markets? Journal of Banking and Finance, 129.
Kolm, P.N., Turiel, J., and Westray, N. (2023). Deep order flow imbalance: Extracting alpha at multiple horizons from the limit order book. Mathematical Finance, 33(4):1044-1081.
Lu, Y., Reinert, G., and Cucuringu, M. (2024). Trade co-occurrence, trade flow decomposition, and conditional order imbalance in equity markets. Quantitative Finance, 24(6).
Sitaru, B., Calinescu, A., and Cucuringu, M. (2023). Order flow decomposition for price impact analysis in equity limit order books. Proceedings of the 4th ACM International Conference on AI in Finance (ICAIF ’23).

Originally shared as a LinkedIn post on 2026-04-27.

Ariel Silahian has 20+ years in production HFT infrastructure. He works with specialized teams to assess signal architecture, data sourcing decisions, and execution quality diagnostics for quant funds and electronic trading desks. Architecture assessments can be arranged via the contact page.
