Illiquid Market Making: When the Feature Pipeline, Not the Model, Determines Whether You Survive

Ariel Silahian

Ariel Silahian is a senior technology executive in institutional electronic trading, with 30+ years across the buy and sell side (New York, Miami, London, Hong Kong). He is the author of "C++ High Performance for Financial Systems" (Packt) and the creator of VisualHFT, the open-source microstructure analytics stack. He writes on exchange architecture, market microstructure, and execution quality, and advises a select number of trading firms on infrastructure decisions that move P&L. Talk architecture: https://hftadvisory.com

Every market-making model you can name was designed for a book that is reliably there. Avellaneda-Stoikov (2008) included. Place it on an instrument that trades fifty times an hour and it does not underperform in some manageable way. It abandons you — and the abandonment is quiet enough that you may not notice until inventory has moved against you and there is no other side to quote.

This article is about what breaks, why it breaks specifically on thin books, and what the infrastructure between the book and the quote has to do to keep a sensible model alive. The model is rarely the problem. The feature layer feeding it almost always is.

The Model Assumption That Fails on a Thin Book
Why You Cannot Classify a Fill on a Thin Book
The Real-Time Feature Pipeline That Makes the Model Survivable
Illiquid MM Infrastructure Readiness Checklist
The Residual Risk You Cannot Engineer Away

The Model Assumption That Fails on a Thin Book

Avellaneda and Stoikov’s 2008 paper, “High-frequency trading in a limit order book,” Quantitative Finance 8(3): 217–224, frames market making as a stochastic control problem. The key execution intensity function is λ(δ) = A·e^(−κδ), where δ is the distance of the quote from the mid-price, A is the baseline arrival intensity, and κ is the decay coefficient controlling how quickly fill rates fall as quotes move away from mid. The model is elegant, well-specified, and well-cited for a reason. Its load-bearing assumption is that the baseline arrivals follow a constant-rate Poisson process: trade arrivals are independent and occur at a stationary average rate Λ.
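A minimal sketch of the intensity function above, to make the roles of A and κ concrete. The function names and the parameter values in the usage note are illustrative assumptions, not taken from the paper; the fill probability simply applies the constant-rate Poisson assumption over a fixed horizon.

```python
import math


def fill_intensity(delta: float, A: float, kappa: float) -> float:
    """Execution intensity lambda(delta) = A * exp(-kappa * delta).

    delta: distance of the quote from mid, A: baseline arrival intensity,
    kappa: decay coefficient. Units are whatever the caller uses consistently.
    """
    return A * math.exp(-kappa * delta)


def fill_probability(delta: float, A: float, kappa: float, horizon: float) -> float:
    """Probability of at least one fill within `horizon`, assuming arrivals
    are Poisson with the constant rate given by fill_intensity."""
    return 1.0 - math.exp(-fill_intensity(delta, A, kappa) * horizon)
```

For example, with a hypothetical A of 50 trades per hour expressed per second (50 / 3600) and a hypothetical κ of 1.5 per tick, fill_probability(1.0, 50 / 3600, 1.5, 60.0) gives the chance that a quote one tick from mid is hit within a minute. Both inputs are exactly the quantities that become hard to estimate when the book is thin.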

On a liquid name trading thousands of times per hour, that assumption holds well enough to be useful. A sixty-second calibration window generates enough events to produce a stable estimate of A and κ. The noise in the arrival process averages out. The model has something real to work with.

At fifty trades an hour, a sixty-second window sees 0.83 events on average. You are not calibrating a parameter; you are fitting noise. The intensity estimate derived from that window carries uncertainty that swamps the signal. And every downstream decision the model makes (how aggressively to quote, how wide to go, when to pull) rests on a parameter estimate that is statistically indistinguishable from a prior with no data.
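The window arithmetic above can be checked directly. The sketch below assumes the same constant-rate Poisson arrivals the model assumes; the function name and the liquid-book comparison rate of 3,000 trades per hour are illustrative choices.

```python
import math


def window_stats(trades_per_hour: float, window_seconds: float) -> dict:
    """Expected event count in a calibration window, the chance the window
    is empty, and the relative standard error of a rate estimated from it.

    Under Poisson arrivals the count N has mean and variance equal to the
    expected count, so the estimator N / T has relative error 1 / sqrt(mean).
    """
    expected = trades_per_hour / 3600.0 * window_seconds
    return {
        "expected_events": expected,
        "p_zero_events": math.exp(-expected),
        "relative_std_error": 1.0 / math.sqrt(expected),
    }


thin = window_stats(50, 60)      # the thin instrument from the text
liquid = window_stats(3000, 60)  # hypothetical liquid comparison
```

Under these assumptions the thin window averages 0.83 events, is completely empty roughly 43% of the time, and yields a rate estimate whose standard error is larger than the rate itself. The liquid window averages 50 events and a relative error near 14%.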

The academic literature has caught up to this failure mode. Bergault and Cognéville (2024), published in Quantitative Finance Vol. 25(10) 2025 (arXiv:2410.06839), state directly that “in illiquid markets, characterized by significant gaps between order levels due to sparse trading volumes, traditional LOB models often fall short.” Their proposed remedy — modeling arrivals with inhomogeneous Poisson processes that can accommodate sporadic, non-stationary arrivals — confirms that the constant-rate assumption is the structural gap, not a rounding error in implementation.

The Guéant, Lehalle, and Fernandez-Tapia (2013) extension of Avellaneda-Stoikov (Mathematics and Financial Economics 7(4): 477–507) improves the inventory management piece with closed-form solutions, but it inherits the same arrival-rate assumption wholesale. Extending the model does not patch the calibration problem.

None of this means abandon the A-S framework. It means the framework requires infrastructure that compensates for what it cannot do when the book is sparse. That infrastructure lives between the book and the quote.

Why You Cannot Classify a Fill on a Thin Book

Order flow imbalance (OFI) is the standard diagnostic for pressure in the book. Cont, Kukanov, and Stoikov (2014) — “The Price Impact of Order Book Events,” Journal of Financial Econometrics 12(1): 47–88 — established the formal basis: OFI captures net signed queue-size changes at the best bid and ask and shows a linear relation to price changes with a slope inversely proportional to market depth. When OFI is positive, buy pressure dominates. When it is zero, the book is balanced.
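For reference, a compact sketch of the top-of-book OFI definition described above: each update to the best quotes contributes a signed queue-size change, and OFI is the sum of those contributions. The `TopOfBook` type and function names are hypothetical; the sign conventions follow the Cont, Kukanov, and Stoikov definition as I understand it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TopOfBook:
    bid_px: float
    bid_qty: float
    ask_px: float
    ask_qty: float


def ofi_increment(prev: TopOfBook, curr: TopOfBook) -> float:
    """Signed contribution of one best-quote update.

    Bid side: a higher or larger bid adds pressure, a lower or smaller bid
    removes it. Ask side mirrors this with the opposite sign.
    """
    e = 0.0
    if curr.bid_px >= prev.bid_px:
        e += curr.bid_qty
    if curr.bid_px <= prev.bid_px:
        e -= prev.bid_qty
    if curr.ask_px <= prev.ask_px:
        e -= curr.ask_qty
    if curr.ask_px >= prev.ask_px:
        e += prev.ask_qty
    return e


def ofi(snapshots: List[TopOfBook]) -> float:
    """Sum of increments over consecutive snapshots."""
    return sum(ofi_increment(a, b) for a, b in zip(snapshots, snapshots[1:]))
```

Note that a single snapshot, or a run of identical snapshots, sums to zero. Nothing in the value itself says whether that zero came from balanced activity or from no activity at all.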

That interpretation is correct on a liquid book. On a thin one, OFI of zero has a second meaning: nothing has printed in the last eight minutes. The metric is stale. It is not signaling equilibrium. It is reporting an absence of data as if it were data. A feature built on clock-time accumulation will silently misrepresent the state of the book whenever the book goes quiet, and thin books go quiet often.

The toxicity detection problem compounds this. Cartea, Duran-Martin, and Sanchez-Betancourt (2023), arXiv:2312.05827, define a toxic trade as one where “a client can unwind the trade within a given time window and make a profit (i.e., a loss for the broker).” In liquid markets, the unwinding window is measured in milliseconds. On a thin book, you cannot tell from the fill whether you were filled by informed flow until you observe whether the book refills, and refill takes seconds to minutes, not milliseconds.

Volume-Synchronized Probability of Informed Trading (VPIN), developed by Easley, Lopez de Prado, and O’Hara (2012) in the Review of Financial Studies 25(5): 1457–1493, partially addresses sparse-arrival problems by accumulating in volume-time rather than clock-time. This prevents the false equilibrium reading that comes from stale clock-time OFI. It is a meaningful improvement. But VPIN still requires a minimum trade density to generate a volume bucket in a reasonable time horizon — on the thinnest books, the bucket does not close before the information has decayed or the position has already moved.
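To illustrate the volume-time idea, here is a simplified bucket accumulator. It is a sketch under loud assumptions: trade direction is passed in by the caller (the published VPIN method uses bulk volume classification instead), quantities are integer lots, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Optional


class VolumeBuckets:
    """Accumulates signed volume into equal-volume buckets."""

    def __init__(self, bucket_volume: int, n_buckets: int):
        if bucket_volume <= 0 or n_buckets <= 0:
            raise ValueError("bucket_volume and n_buckets must be positive")
        self.bucket_volume = bucket_volume
        self.imbalances = deque(maxlen=n_buckets)  # |buy - sell| per closed bucket
        self._buy = 0
        self._sell = 0

    def on_trade(self, qty: int, is_buy: bool) -> int:
        """Add a trade, splitting it across buckets if needed.
        Returns how many buckets this trade closed."""
        closed = 0
        remaining = qty
        while remaining > 0:
            room = self.bucket_volume - (self._buy + self._sell)
            take = min(room, remaining)
            if is_buy:
                self._buy += take
            else:
                self._sell += take
            remaining -= take
            if self._buy + self._sell == self.bucket_volume:
                self.imbalances.append(abs(self._buy - self._sell))
                self._buy = self._sell = 0
                closed += 1
        return closed

    def vpin(self) -> Optional[float]:
        """Average imbalance fraction over the last n buckets,
        or None until enough buckets have closed."""
        if len(self.imbalances) < self.imbalances.maxlen:
            return None
        return sum(self.imbalances) / (len(self.imbalances) * self.bucket_volume)
```

The `None` return is the point for thin books: until enough volume has traded to close the required buckets, there is no reading, and the consumer has to handle that case explicitly rather than receive a stale number.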

What actually works as a real-time classifier on a sparse book is refill time. When a level is swept and the book refills in milliseconds, the flow was most likely uninformed: a participant exiting a position, not acting on superior information. When the level stays empty for minutes after a sweep, the fill was likely toxic. The informed participant moved the book and is in no hurry to let you fade it back.

The classifier is not theoretical. It is observable in real time. It requires no modeling of latent variables. It just requires that the infrastructure is watching the right thing — which means a pipeline built to detect refill dynamics, not one built to process OFI on a fixed clock interval and treat silence as signal.

The Real-Time Feature Pipeline That Makes the Model Survivable

The three components that make a market-making model survivable on a thin book are not improvements to the model. They are pre-processing layers that correct what the book is feeding it.

Staleness-aware OFI. Every OFI feature in the pipeline should carry a staleness flag — a timestamp of the last event that contributed to the reading, and a rule for how far back the most recent event can be before the feature is treated as stale rather than informative. An OFI of zero on a liquid book means balanced pressure. An OFI of zero on a book that last printed eight minutes ago means you have no information about current pressure whatsoever, and treating the two as equivalent will produce quotes that are wrong in the second case in a systematic way. The fix is architectural: the feature needs to know when it last had real input, and it needs to communicate that to the model.
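One way the staleness flag could look in code. This is a sketch, not a description of any production system: the class names, the cumulative (non-windowed) OFI value, and the `max_age_s` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


class OFIReading(NamedTuple):
    value: float
    age_s: float     # seconds since the last contributing event
    is_stale: bool   # True when the value should not be read as information


@dataclass
class StalenessAwareOFI:
    max_age_s: float                      # hypothetical per-instrument staleness rule
    value: float = 0.0
    last_event_ts: Optional[float] = None

    def on_event(self, increment: float, ts: float) -> None:
        """Apply one OFI increment and remember when it arrived."""
        self.value += increment
        self.last_event_ts = ts

    def read(self, now: float) -> OFIReading:
        """Return the value together with its age and staleness flag."""
        if self.last_event_ts is None:
            return OFIReading(self.value, float("inf"), True)
        age = now - self.last_event_ts
        return OFIReading(self.value, age, age > self.max_age_s)
```

The consuming model receives the flag alongside the value, so a zero that is eight minutes old arrives marked as stale instead of looking like balance. What the model does with a stale reading (widen, fall back to a prior, stop quoting) is a separate policy decision.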

Kolm, Turiel, and Westray (2023), “Deep order flow imbalance: Extracting alpha at multiple horizons from the limit order book,” Mathematical Finance 33(4): 1044–1081, demonstrate that extending OFI across multiple book levels materially improves predictive accuracy versus top-of-book-only OFI. Multi-level awareness does not solve the staleness problem on its own, but the architectural principle applies: the more context the feature carries about book state, the less likely the model is to treat absence as information.

Refill-time classifier. The second layer watches every level in the book following a sweep. It records time-to-refill and maintains a rolling baseline of what refill looks like on that instrument under normal conditions. A fast refill — within the normal range — is classified as uninformed flow. A slow or absent refill triggers an elevated toxicity flag. This flag should influence the model’s inventory stance in real time: wider quotes, reduced size, or pulled quotes until the book demonstrates it is not in a post-informed-trade state.
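The classifier described above could be sketched as follows. The rolling median baseline, the `slow_multiple` threshold, the warm-up rule, and all names are assumptions chosen for illustration; a real deployment would tune these per instrument.

```python
from collections import deque
from statistics import median
from typing import Dict, List, Optional, Tuple

Level = Tuple[str, int]  # (side, price in integer ticks)


class RefillClassifier:
    def __init__(self, baseline_len: int = 50, slow_multiple: float = 5.0,
                 min_samples: int = 5):
        self.baseline = deque(maxlen=baseline_len)  # recent normal refill times (s)
        self.slow_multiple = slow_multiple
        self.min_samples = min_samples
        self._swept_at: Dict[Level, float] = {}

    def threshold(self) -> Optional[float]:
        """Refill time above which a sweep is flagged, or None while warming up."""
        if len(self.baseline) < self.min_samples:
            return None
        return self.slow_multiple * median(self.baseline)

    def on_sweep(self, level: Level, ts: float) -> None:
        self._swept_at[level] = ts

    def on_refill(self, level: Level, ts: float) -> Optional[str]:
        """Classify a refill as 'uninformed', 'toxic', or 'unknown'."""
        swept = self._swept_at.pop(level, None)
        if swept is None:
            return None  # refill without a tracked sweep
        elapsed = ts - swept
        limit = self.threshold()
        if limit is None:
            label = "unknown"
        elif elapsed <= limit:
            label = "uninformed"
        else:
            label = "toxic"
        if label != "toxic":
            self.baseline.append(elapsed)  # keep the baseline to normal refills
        return label

    def overdue(self, now: float) -> List[Level]:
        """Swept levels that are still empty past the threshold."""
        limit = self.threshold()
        if limit is None:
            return []
        return [lvl for lvl, t in self._swept_at.items() if now - t > limit]
```

The `overdue` method matters as much as `on_refill`: on a thin book the toxic case is often the refill that never arrives, so the flag has to be raisable by polling elapsed time rather than only on the next book event.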

The design choice here matters. A timer-based hedge that fires every N milliseconds regardless of book state is operating blind. A fill-triggered hedge — one that fires on the fill event itself and adjusts the model’s parameters immediately — gives the system seconds of response time rather than waiting for the next timer tick. On a thin book, one toxic position can erase the spread revenue from dozens of clean trades. The hedge has to fire on the event, not on the schedule.

Fill-event-driven hedger. The third component is less about feature construction and more about execution timing. In liquid markets, inventory drift is slow enough that clock-based rebalancing is adequate. In illiquid ones, a single fill can be a material fraction of the day’s risk budget. The hedger needs to be wired to the fill event — not a 500ms timer, not a position reconciliation loop that runs every second. The event fires, the hedge fires. That is the architectural requirement.
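A minimal sketch of the wiring, under simplifying assumptions: `send_hedge` is a hypothetical order-routing hook supplied by the caller, the hedge is sized to bring inventory back to a fixed limit, and the hedge is assumed to fill in full (a real system would reconcile on the hedge's own fill events).

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Fill:
    instrument: str
    signed_qty: float  # positive when we bought, negative when we sold
    ts: float


@dataclass
class FillDrivenHedger:
    send_hedge: Callable[[str, float], None]  # hypothetical routing hook
    inventory_limit: float
    inventory: Dict[str, float] = field(default_factory=dict)

    def on_fill(self, fill: Fill) -> None:
        """Called from the fill handler itself, not from a timer loop."""
        pos = self.inventory.get(fill.instrument, 0.0) + fill.signed_qty
        if abs(pos) > self.inventory_limit:
            # Hedge only the excess over the limit, in the opposite direction.
            hedge_qty = -(pos - math.copysign(self.inventory_limit, pos))
            self.send_hedge(fill.instrument, hedge_qty)
            pos += hedge_qty  # assumption: hedge fills fully and immediately
        self.inventory[fill.instrument] = pos
```

The design point is that there is no scheduler anywhere in this path. The decision runs in the same call stack as the fill notification, so the delay between taking the position and responding to it is bounded by handler latency rather than by a polling interval.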

The quants keep the model. The infrastructure is what I get commissioned to build around it — the layer that makes the model survivable on a book that is not reliably there.

In production deployments on instruments with these characteristics — thin altcoin books, EM FX crosses, small-cap equity desks with exchange-mandated quoting obligations — the refill-time classifier has consistently explained more of the adverse-selection attribution than any adjustment to the underlying model. The feature-pipeline gap is almost always larger than the formula gap. Closing it is the work that does not show up in the quant review.

Illiquid MM Infrastructure Readiness Checklist

The following diagnostic is framed for a CTO or Head of Trading reviewing a market-making stack deployed or being deployed on illiquid instruments. Each question is binary. There is no partial credit.

1. Does your OFI feature carry a staleness flag? Specifically: does the feature propagate the timestamp of its most recent contributing event alongside the value itself, and does the consuming model have a rule for when to treat the feature as stale rather than informative?

2. Does your intensity calibration require a minimum event count before the estimate is trusted? A calibration window that accepts 0.83 events as a valid sample is producing a noise-dominated parameter. Is there a floor — a minimum event count — below which the system falls back to a prior rather than using the window estimate?

3. Does your inventory hedge trigger on the fill event rather than on a time interval? If the hedger fires on a schedule rather than on fill, the question is: how many basis points of P&L does the time lag cost on your worst-fill instruments, and is that cost visible in your post-trade attribution?

4. Can your refill detector distinguish millisecond recovery from multi-minute absence? Is there any component in the pipeline that classifies sweep events by the time-to-refill of the affected level? Or does the system treat all sweeps identically regardless of subsequent book behavior?

5. Is your OFI computed in volume-time or clock-time? If clock-time: does the system have any correction for periods where the volume contribution to a time bucket is near zero? A bucket that closes on the clock rather than on volume accumulation will systematically misrepresent pressure in thin markets.

6. Does your post-trade attribution break out toxic fills explicitly? Toxic flow is invisible in aggregate P&L until it is material. If the analysis does not distinguish fills where the book failed to refill from fills where it recovered normally, the attribution cannot tell you whether you are paying for infrastructure gaps or model gaps.

7. Is there a per-instrument liquidity regime detector that gates which features and parameters are active? An instrument that is liquid in the morning session and thin in the afternoon session is not the same instrument all day. A stack that does not detect the regime transition will run liquid-book assumptions into a thin-book environment for the hours where the misalignment is most expensive.

If the answer to any of the first four questions is no, the feature layer has a gap. The question worth bringing to the next architecture review is not which model to run — it is which of these gaps is costing the most, and in which order they should close.
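As an illustration of checklist item 2, the event-count floor can be as simple as the sketch below. The default floor of 30 events and the source of `prior_rate` (for example a long-horizon average for the instrument) are assumptions, not recommendations from the text.

```python
from typing import Tuple


def trusted_intensity(event_count: int, window_seconds: float,
                      prior_rate: float, min_events: int = 30) -> Tuple[float, str]:
    """Return (arrival rate per second, source of the estimate).

    Below the floor the window estimate is discarded and the prior is used,
    and the caller is told which one it received.
    """
    if event_count < min_events:
        return prior_rate, "prior"
    return event_count / window_seconds, "window"
```

Returning the source alongside the rate keeps the fallback visible downstream, so post-trade attribution can separate quotes priced off a live calibration from quotes priced off the prior.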

The Residual Risk You Cannot Engineer Away

Building this pipeline correctly buys real margin. It gives the model accurate features, a credible calibration, and a hedge that fires when it needs to. That is the infrastructure layer. It is the difference between a model that is theoretically sound and one that survives contact with a book that is not always there.

But there is an honest boundary to draw. Even with the pipeline right, a quiet book and a loaded one look identical until the fill goes against you. The infrastructure buys you seconds to react. It does not tell you which book you are in before the fill.

The standard for a survivable illiquid MM stack is a feature layer that responds to book state rather than a clock. If your infrastructure cannot distinguish an OFI reading of zero from an OFI that went stale eight minutes ago — if your hedger is on a 500ms timer rather than wired to the fill event — the pipeline gap is still open, and the model is absorbing costs the model cannot see.

The infrastructure question and the model question are separable. Teams that conflate them spend months on model refinement and never close the feature-layer gap. The gap does not close on its own.

This article was originally shared as a LinkedIn post.
