FX Interbank Trading System Architecture: Building a Unified Price Aggregation and Contribution Platform with Enterprise Integration Patterns

Ariel Silahian

Ariel Silahian is a senior technology executive in institutional electronic trading, with 30+ years across the buy and sell side (New York, Miami, London, Hong Kong). He is the author of "C++ High Performance for Financial Systems" (Packt) and the creator of VisualHFT, the open-source microstructure analytics stack. He writes on exchange architecture, market microstructure, and execution quality, and advises a select number of trading firms on infrastructure decisions that move P&L. Talk architecture: https://hftadvisory.com

A few years ago I spent the better part of two years inside an FX interbank system at a large sell-side institution. The bank ran roughly a dozen liquidity provider relationships, contributed prices to EBS and LSEG FX Matching, serviced an internal sales desk, and managed credit give-up through a prime-brokerage FIX gateway. The system had grown organically: each LP connectivity team had built its own FIX session manager, the contribution path to ECNs was a separate C++ monolith, and the sales blotter pulled from a shared database that was four seconds stale during data releases. The bank wanted one platform: LP prices in, aggregation and analytics in the middle, clean contribution out to venues, and a live blotter UI that would not freeze during NFP.

This article documents the architecture we built, structured as a discovery walk through the integration problem, the same way the Hohpe and Woolf Enterprise Integration Patterns book documents the bond pricing system: candidate options laid out, tradeoffs weighed concretely, a pattern chosen, and an honest account of what the pattern fixed and what it did not. The EIP names are not decorative. They are the vocabulary that lets you communicate the design unambiguously across team boundaries, and on a project spanning a C++ FIX connectivity team, a Java middleware team, and a trading desk that had opinions about every architectural choice, that vocabulary mattered.

The BIS April 2022 Triennial Survey put global FX turnover across all instruments at $7.5 trillion per day, of which spot was roughly $2.1 trillion. Electronic execution reached 59% of FX market turnover by the April 2025 BIS survey. Those numbers describe the scale of liquidity flow this architecture class must handle.

Building the System
Architecture with Patterns
Bridging FIX/C++ to the Message Bus
Structuring the Channels
Selecting a Message Channel
Problem Solving: Flash Quotes and Excessive Update Rate
Production Incident: The Dead Letter Queue Cascade
What This Architecture Cannot Handle
The FX Aggregation Resilience Checklist

Building the System

The high-level flow: LP price streams arrive from a dozen counterparties, an aggregation layer constructs a best-bid-offer (BBO) and applies per-tier spread and skew imposition, modified prices flow out to ECN venues (EBS for EUR/USD, USD/JPY, and USD/CHF; LSEG FX Matching for cable, AUD/USD, and USD/CAD), and a trader blotter UI displays everything live. Alongside the outbound ECN contribution runs an STP path over SWIFT MT300, and a prime-brokerage FIX session, which handles give-up and credit intermediation, sits between the platform and the executing venues.

The inherited legacy was four independent systems: a C++ LP connectivity layer managing FIX 4.4 sessions (two LPs on proprietary binary feeds), a prime-broker FIX gateway, a C++ contribution server with hard-coded ECN connections, and a back-office SWIFT MT300 engine. None talked to each other through a shared integration layer.

Architecture with Patterns

The first decision was integration style. The Hohpe and Woolf taxonomy covers four options: File Transfer, Shared Database, Remote Procedure Call, and Messaging.


File Transfer was rejected on latency grounds. LP price streams deliver, in my experience, several hundred to several thousand quotes per second per pair per LP during active sessions. File transfer granularity is seconds; LP quote validity windows are milliseconds.

Shared Database was the existing architecture. The database was simultaneously a staging area, a logging store, and a live-price bus, queried by polling from every consumer. Under data releases it was a thundering herd: dozens of polling connections hammering a SELECT while the C++ writer performed bulk inserts. The schema had become a de facto API contract; a column change required coordinating four teams. We wanted producers and consumers insulated from each other's storage details.

Remote Procedure Call has a specific failure mode in a price-distribution context. If the Pricing Gateway uses RPC to push prices to each blotter client, it must track every active client session and issue a concurrent blocking call on every new tick. With ten traders on two blotters each, a data-release quote spike means twenty concurrent blocking calls per update per pair per LP. The gateway becomes a fan-out traffic amplifier, and connection tracking, reconnection logic, and back-pressure handling all move into the pricing layer where they do not belong.

Messaging with a Publish-Subscribe Channel solves the fan-out problem cleanly: the gateway publishes to a channel; every interested subscriber receives it; the gateway does not track listeners. Clients with partial interest (a USD/JPY specialist who does not trade cable) simply do not subscribe to irrelevant channels. Disconnection and reconnection are the broker’s problem.

We chose a JMS broker, since the bank already ran JMS infrastructure for its equity OMS. The net result was a Message Bus: all Pricing Gateway-to-client and client-to-Pricing-Gateway communications flow over the bus. The Contribution Gateway, the blotter UI, and the analytics engine are all bus citizens. The C++ FIX connectivity layer was the exception.

Bridging FIX/C++ to the Message Bus

The hardest early problem was not the messaging architecture; it was the language and protocol boundary between the C++ FIX connectivity layer and the Java JMS bus.

The naive approach is a Message Translator: a component that reads an LP’s raw FIX quote (MarketDataSnapshotFullRefresh 35=W, or MarketDataIncrementalRefresh 35=X) and normalizes it into a canonical internal structure for publication onto the JMS bus. A Message Translator works when the integration boundary is semantic: two systems on the same messaging infrastructure using different formats.
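
Wherever it ends up living, the translation step itself is easy to state. Here is a minimal sketch in Python (the production components were C++ and Java; Python is used only for brevity), assuming FIX 4.4 tag=value encoding with SOH delimiters. Tags 35, 55, 269 (MDEntryType: 0 = Bid, 1 = Offer), 270 (MDEntryPx) and 271 (MDEntrySize) are standard FIX; the `CanonicalQuote` structure and `translate_snapshot` function are hypothetical stand-ins for the internal canonical format.

```python
from dataclasses import dataclass, field

SOH = "\x01"  # FIX field delimiter


@dataclass
class CanonicalQuote:
    """Hypothetical canonical price message published onto the bus."""
    symbol: str
    lp: str
    bids: list = field(default_factory=list)    # (price, size) tuples
    offers: list = field(default_factory=list)  # (price, size) tuples


def translate_snapshot(raw: str, lp: str) -> CanonicalQuote:
    """Normalize one MarketDataSnapshotFullRefresh (35=W) into a CanonicalQuote.

    Assumes each MD entry lists 269 (type) before 270 (price) before 271 (size).
    Entry types other than bid/offer (e.g. trades) are ignored.
    """
    pairs = [f.split("=", 1) for f in raw.split(SOH) if f]
    tags = [(int(tag), value) for tag, value in pairs]
    header = dict(tags)  # last value wins; fine for non-repeating tags 35 and 55
    if header.get(35) != "W":
        raise ValueError("expected MarketDataSnapshotFullRefresh (35=W)")

    quote = CanonicalQuote(symbol=header[55], lp=lp)
    entry = None
    for tag, value in tags:
        if tag == 269:                       # a new repeating-group entry starts
            entry = {"type": value}
        elif tag == 270 and entry is not None:
            entry["px"] = float(value)
        elif tag == 271 and entry is not None and "px" in entry:
            level = (entry["px"], float(value))
            if entry["type"] == "0":
                quote.bids.append(level)
            elif entry["type"] == "1":
                quote.offers.append(level)
    return quote
```

Incremental refreshes (35=X) carry an MDUpdateAction (tag 279) per entry and must be applied against prior state, so they need more than this stateless mapping.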

The problem here was deeper. The C++ FIX layer could not load a JMS client library. The Java JMS client could not safely load into a C++ process through JNI, because the bank had a standing prohibition on JNI in production after a JVM exception crossing a JNI boundary had corrupted a C++ thread’s stack. The two sides were in different languages, running in different processes, with no shared memory space.

The EIP solution is a Channel Adapter: a lightweight intermediary process whose sole job is to connect a non-messaging system (the C++ FIX layer) to a messaging system (the JMS bus), translating both protocol (FIX to JMS message) and transport (TCP FIX session to JMS destination). As a separate process it sidesteps the JNI constraint entirely.

We built two Channel Adapter processes: one C++ process owning FIX session management, subscribing to LP price updates via MarketDataRequest (35=V) and receiving the 35=W and 35=X stream; and one Java process owning a JMS connection and publishing canonical price messages onto the bus. The two communicated over CORBA, the bank’s standard cross-language IPC.

Together they constitute a Messaging Bridge: a component connecting two distinct messaging systems (the FIX TCP stream and the JMS bus) by translating across protocol and transport boundaries. The Message Translator we originally considered is effectively implemented by this Messaging Bridge, except the translation happens at the process level via two Channel Adapters rather than inside a single component. This is a pattern implementing a pattern, one of the most useful ideas in the EIP vocabulary.

The same Messaging Bridge structure appeared on the contribution side. The Contribution Gateway received contribution triggers from the JMS bus and translated them outbound through a C++ Channel Adapter managing venue FIX sessions: NewOrderSingle (35=D) to EBS across the iLink interface. (EBS has since migrated onto CME Globex, where iLink 3 Binary is now mandatory and iLink 2 is retired.)

Structuring the Channels

With the Message Bus established and the Messaging Bridge in place, the next question was the channel taxonomy.

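The subject hierarchy we designed (TIBCO-style hierarchical subjects map cleanly onto the dot-separated topic names and wildcard subscriptions most JMS brokers support, although the JMS specification itself does not define wildcards):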

fx.spot.eurusd.lp.citi: raw EUR/USD stream from Citi
fx.spot.eurusd.lp.db: raw EUR/USD stream from Deutsche Bank
fx.spot.eurusd.bbo: best bid and offer constructed across all LPs
fx.spot.eurusd.tier1: BBO with tier-1 spread applied (top-tier institutional clients)
fx.spot.eurusd.tier2: BBO with tier-2 spread applied (corporate flow)
fx.spot.cable.lp.barclays: raw cable (GBP/USD) stream from Barclays, and so on, per pair per LP

Channel count. Roughly 30 actively quoted spot pairs, a dozen LPs (5 to 8 per major pair in practice), two BBO channels per pair, and two or three tier channels per pair: even at the upper bound, 30 × 12 raw LP channels plus 30 × (2 + 3) BBO and tier channels comes to about 510, comfortably below 2,000. Manageable on any modern JMS broker.

First design question: per-LP-pair channels versus a smaller set of channels with subscribers using a Message Filter or Selective Consumer.

The case for fewer channels: simpler management, one fx.spot.eurusd.all topic, subscribers filter by LP. The case against: the filter runs per-consumer, per-message. During a data release, all twelve LPs are simultaneously bursting into that single channel. Every consumer (aggregation engine, BBO publisher, contribution gateway, blotter) receives every message and must discard what it does not need. The filtering is also stateless: it tells you what to drop but cannot encode LP priority ordering without adding logic to each consumer individually.

Per-LP channels make the filtering structural: a consumer that only needs the BBO subscribes to fx.spot.eurusd.bbo and receives nothing from the raw LP layer at all. The Pricing Gateway subscribes to fx.spot.eurusd.lp.* across all LPs. Zero runtime discard cost.
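
A sketch of how the hierarchy and the Pricing Gateway's wildcard subscription behave, assuming the common broker convention (TIBCO EMS and ActiveMQ both use it) that `*` matches exactly one dot-separated element and a trailing `>` matches one or more remaining elements. Wildcards are broker features rather than part of the JMS API, and the helper names here are illustrative.

```python
def topic(pair: str, *parts: str) -> str:
    """Build a name in the fx.spot.<pair>.<...> hierarchy, e.g. from 'EUR/USD'."""
    return ".".join(["fx", "spot", pair.lower().replace("/", ""), *parts])


def matches(pattern: str, name: str) -> bool:
    """Return True if topic `name` matches subscription `pattern`.

    '*' matches exactly one element; '>' (allowed only as the last element)
    matches one or more remaining elements.
    """
    p, n = pattern.split("."), name.split(".")
    for i, elem in enumerate(p):
        if elem == ">":
            return i == len(p) - 1 and len(n) > i
        if i >= len(n):
            return False
        if elem != "*" and elem != n[i]:
            return False
    return len(p) == len(n)
```

A BBO-only consumer subscribed to `fx.spot.eurusd.bbo` never matches any `lp` topic, so the broker delivers nothing from the raw layer to it and there is nothing to discard on the consumer side.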

Second design question: where does the Content-Based Router that applies spread/skew and collapses LP channels into tier channels live?

The tempting answer is the C++ Channel Adapter, which already receives every raw LP quote and could publish directly to fx.spot.eurusd.tier1. The problem: a Channel Adapter is supposed to be a generic Message Bus citizen with no business logic. The spread/skew matrix is owned by the trading desk and changes frequently; putting it in the C++ layer means a C++ deployment for every spread adjustment. It also splits business logic across two languages and two process boundaries.

The correct placement is the Pricing Gateway on the Java side. It subscribes to all LP raw channels, receives every 35=W and 35=X, applies aggregation (BBO construction) and spread/skew (tier pricing), and publishes to the tier channels. The Channel Adapter translates FIX to JMS and nothing else.
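
A minimal sketch of that Pricing Gateway step, assuming BBO means the highest LP bid and the lowest LP offer, and that tier pricing widens the BBO by half the tier spread on each side and then shifts both sides by a skew. The `TIERS` dictionary is a hypothetical stand-in for the desk-owned spread/skew matrix, prices use `Decimal` to avoid binary rounding in pip arithmetic, and the real aggregation and weighting rules were richer than this.

```python
from decimal import Decimal

PIP = Decimal("0.0001")  # EUR/USD pip; JPY pairs would use 0.01

# Hypothetical desk-owned spread/skew matrix, in pips per tier.
TIERS = {
    "tier1": {"spread_pips": Decimal("0.4"), "skew_pips": Decimal("0")},
    "tier2": {"spread_pips": Decimal("1.0"), "skew_pips": Decimal("0.2")},
}


def build_bbo(lp_quotes: dict) -> tuple:
    """Best bid = highest LP bid; best offer = lowest LP offer.

    No handling here for a crossed BBO across LPs, which a real gateway needs.
    """
    bid = max(q["bid"] for q in lp_quotes.values())
    offer = min(q["offer"] for q in lp_quotes.values())
    return bid, offer


def tier_price(bbo: tuple, tier: str) -> tuple:
    """Widen the BBO by half the tier spread per side, then apply skew.

    A positive skew raises both sides, leaning the desk toward buying.
    """
    bid, offer = bbo
    half_spread = TIERS[tier]["spread_pips"] * PIP / 2
    skew = TIERS[tier]["skew_pips"] * PIP
    return bid - half_spread + skew, offer + half_spread + skew


def price_eurusd(lp_quotes: dict) -> dict:
    """One set of raw LP quotes in; one message per downstream channel out."""
    bbo = build_bbo(lp_quotes)
    out = {"fx.spot.eurusd.bbo": bbo}
    for tier in TIERS:
        out[f"fx.spot.eurusd.{tier}"] = tier_price(bbo, tier)
    return out
```

Because `TIERS` is plain data on the Java side of the bridge, a spread adjustment is a configuration change to the Pricing Gateway, not a C++ deployment.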

Selecting a Message Channel

JMS provides two primitives: Point-to-Point Channel (Queue) and Publish-Subscribe Channel (Topic). Getting the choice wrong produces either duplicate processing or broken client behavior.

The scenario that forced the decision: a sales trader monitored EUR/USD tier-1 on two blotter sessions simultaneously. With a Point-to-Point channel, each message is delivered to exactly one consumer, so the two sessions see interleaved prices, not synchronized ones. Operationally untenable.

A Recipient List could fix this: the Pricing Gateway maintains a per-session subscription registry and explicitly addresses each message to each active session. It works, but it requires the gateway to track every session, handle reconnection, manage subscription lifecycle, and clean up stale entries on disconnect, the same complexity trap the RPC approach presented.

Publish-Subscribe on per-tier channels resolves it: both sessions subscribe to fx.spot.eurusd.tier1 and both receive every message. The Pricing Gateway publishes once. Session management is the broker’s responsibility.

The Publish-Subscribe failure mode on the server side: if two Contribution Gateway instances both subscribe to the same pub-sub BBO channel for redundancy, both receive every BBO update and both attempt to publish to EBS. A double contribution (two identical prices from the same institution in rapid succession) produces two live quotes on EBS from the same counterparty. That is a data-integrity problem.

The resolution: channel type is determined by direction, not component. Server-to-client price distribution (Pricing Gateway to blotter, Pricing Gateway to risk engine): Publish-Subscribe. Every consumer needs every message. Server-to-venue contribution (Contribution Gateway to EBS): Point-to-Point. Exactly one active contributor at a time, standard queue-based failover. Client-to-server commands (spread-adjustment commands, analytics-parameter changes): Point-to-Point. Processed exactly once, not broadcast.
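
The two channel semantics, and the direction rule, fit in a few lines. This is an in-process Python sketch, not JMS: `Topic` and `Queue` are hypothetical stand-ins that model delivery semantics only, and round-robin is just one dispatch policy a broker might use for a queue.

```python
from collections import defaultdict


class Topic:
    """Publish-Subscribe Channel: every subscriber receives every message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, deliver) -> None:
        self._subscribers.append(deliver)

    def publish(self, message) -> None:
        for deliver in self._subscribers:   # publisher never tracks who listens
            deliver(message)


class Queue:
    """Point-to-Point Channel: each message goes to exactly one consumer."""

    def __init__(self):
        self._consumers = []
        self._next = 0

    def subscribe(self, deliver) -> None:
        self._consumers.append(deliver)

    def publish(self, message) -> None:
        # Round-robin is one possible policy; the guarantee is "exactly one".
        deliver = self._consumers[self._next % len(self._consumers)]
        self._next += 1
        deliver(message)


# Channel type follows traffic direction, not the component.
CHANNEL_TYPE = {
    "server_to_client": Topic,   # Pricing Gateway to blotters, risk engine
    "server_to_venue": Queue,    # Contribution Gateway to EBS: one contributor
    "client_to_server": Queue,   # spread/analytics commands: processed once
}


def two_sessions(channel_cls, ticks):
    """Attach two blotter sessions to one channel; return what each saw."""
    seen = defaultdict(list)
    channel = channel_cls()
    channel.subscribe(seen["session_a"].append)
    channel.subscribe(seen["session_b"].append)
    for tick in ticks:
        channel.publish(tick)
    return dict(seen)
```

Two blotter sessions on a Point-to-Point channel each see half the ticks, which is the interleaving problem described above; on a Publish-Subscribe channel both see every tick, and the publisher never learns who is subscribed.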

Each channel carries traffic in one direction; the direction determines the type.

Problem Solving: Flash Quotes and Excessive Update Rate

About three weeks after go-live, the blotter UI began freezing during the 8:30 AM US data release window. EUR/USD alone was receiving tightly bunched updates from multiple LPs in short succession; the GUI thread could not process incoming JMS messages fast enough, and a backlog kept growing behind it. Two patterns address this: Message Filter and Aggregator.

A Message Filter with a time-based rule drops any quote arriving within N milliseconds of the previously accepted one for the same pair (a value I have calibrated in practice between 20 and 100ms depending on the use case). It is simple and provably bounds the message rate. The problem is data integrity. An FX spot quote is not a single scalar: it carries bid, ask, bid-size bands, ask-size bands, tenor, and spread imposition per tier. During a data release these fields update across successive messages: mid moves on tick one, size bands adjust on tick two, spread imposition recalculates on tick three. A filter dropping ticks two and three delivers a blotter price with a new mid but old size bands and old spread. Internally inconsistent.

An Aggregator holds a current snapshot of each pair’s full quote state. When a new 35=W or 35=X arrives, it merges the incoming fields into the snapshot: new mid overwrites old mid, new size bands overwrite old size bands. It emits a complete, internally consistent snapshot on request, not on every incoming message. The Aggregator is the right pattern when multiple messages together constitute a complete event and each message carries only a partial update.
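
The difference between the two patterns shows up as soon as three partial updates land inside one filter window. A sketch, assuming each update carries only the fields that changed and a 50 ms filter window (inside the 20 to 100 ms range above); `TimeFilter` and `SnapshotAggregator` are illustrative names, not a library API.

```python
from dataclasses import dataclass


@dataclass
class Update:
    pair: str
    ts_ms: int
    fields: dict        # partial update: only the fields that changed


class TimeFilter:
    """Message Filter: drop updates within window_ms of the last accepted one."""

    def __init__(self, window_ms: int):
        self.window_ms = window_ms
        self.last_accepted_ms = {}   # pair -> timestamp of last accepted update
        self.view = {}               # pair -> what a downstream blotter displays

    def on_update(self, u: Update) -> bool:
        last = self.last_accepted_ms.get(u.pair)
        if last is not None and u.ts_ms - last < self.window_ms:
            return False             # dropped, and its fields are lost with it
        self.last_accepted_ms[u.pair] = u.ts_ms
        self.view.setdefault(u.pair, {}).update(u.fields)
        return True


class SnapshotAggregator:
    """Aggregator: merge every partial update into a full per-pair snapshot."""

    def __init__(self):
        self.state = {}              # pair -> merged quote fields

    def on_update(self, u: Update) -> None:
        self.state.setdefault(u.pair, {}).update(u.fields)

    def snapshot(self, pair: str) -> dict:
        # Return a copy so callers cannot mutate the live state.
        return dict(self.state.get(pair, {}))
```

Feed both a full quote at 0 ms, then a new mid at 1000 ms, new size bands at 1010 ms and a new spread at 1020 ms: the filter's view pairs the new mid with the old size bands and old spread, while the aggregator's snapshot carries all three updates.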

The emit-timing question: time-driven emission produces unnecessary traffic in quiet conditions; event-driven emission on every incoming message is marginally better than no Aggregator during a rate spike. The correct answer is to make the consumer a Polling Consumer rather than an Event-Driven Consumer. The blotter sends a Command Message to the Aggregator, give me the current EUR/USD tier-1 snapshot, and the Aggregator replies with a Document Message: the full current state. The blotter controls the polling rate at approximately its render-loop frequency. During a data release with a hundred LP quotes per second, the blotter polls at 30 Hz and sees 30 fully consistent snapshots per second. The message queue does not grow; it is drained by design. The consumer controls the flow; the Aggregator absorbs the rate mismatch.
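
A sketch of the polling arrangement on a simulated clock, assuming 100 tier-1 updates per second and a 30 Hz render loop. The Command Message is a dict asking for a snapshot and the Document Message is a dict carrying it; both shapes, and the class and function names, are hypothetical.

```python
class QuoteAggregator:
    """Holds the latest full snapshot per channel and answers Command Messages."""

    def __init__(self):
        self.snapshots = {}          # channel -> merged quote fields
        self.updates_absorbed = 0

    def on_quote(self, channel: str, fields: dict) -> None:
        self.snapshots.setdefault(channel, {}).update(fields)
        self.updates_absorbed += 1

    def handle_command(self, command: dict) -> dict:
        """Command Message in, Document Message (full snapshot) out."""
        if command.get("type") != "get_snapshot":
            raise ValueError(f"unsupported command: {command!r}")
        channel = command["channel"]
        return {"type": "snapshot", "channel": channel,
                "body": dict(self.snapshots.get(channel, {}))}


def simulate_polling(quote_rate_hz: int, poll_rate_hz: int, seconds: int = 1):
    """Interleave LP-driven updates and blotter polls on a simulated clock (ms)."""
    channel = "fx.spot.eurusd.tier1"
    events = [(i * 1000 / quote_rate_hz, 0, i)
              for i in range(quote_rate_hz * seconds)]
    events += [(k * 1000 / poll_rate_hz, 1, k)
               for k in range(poll_rate_hz * seconds)]
    agg, rendered = QuoteAggregator(), []
    # Sort by time; at equal times, apply the update (kind 0) before the poll (kind 1).
    for _, kind, seq in sorted(events):
        if kind == 0:
            agg.on_quote(channel, {"seq": seq})
        else:
            reply = agg.handle_command({"type": "get_snapshot", "channel": channel})
            rendered.append(reply["body"])
    return agg.updates_absorbed, rendered
```

With these rates the aggregator absorbs all 100 updates while the blotter renders exactly 30 snapshots, each the latest complete state at the moment of the poll; nothing accumulates between the two sides.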

Last-look as an additional Aggregator input. When an LP streams quotes (35=W/35=X, in response to the taker's MarketDataRequest, 35=V) and a taker sends a NewOrderSingle (35=D) against one, the LP holds the order for its last-look window, then accepts or rejects it via ExecutionReport (35=8). A reject rate rising above the LP's historical baseline signals strategic behavior: the LP is quoting to gather information, not to provide liquidity. The Aggregator tracks each LP's rolling accept/reject ratio and de-weights LPs with rising reject rates in BBO construction. This is distinct from quote staleness (an old quote); it is quote quality (a strategically misleading quote). The Aggregator is the right home for this logic because it holds state across multiple messages from the same LP.
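
A sketch of the reject-rate tracking, assuming a fixed-size rolling window of last-look outcomes per LP and a per-LP historical baseline. The linear de-weighting rule and the `tolerance` parameter are hypothetical: the text does not specify the weighting function, and how the weight enters BBO construction (excluding an LP outright, or scaling its influence) is a further design choice.

```python
from collections import deque


class LastLookTracker:
    """Rolling last-look reject rate per LP, turned into a BBO weight.

    Hypothetical rule: weight 1.0 while the rolling reject rate is at or below
    the LP's historical baseline, falling linearly to 0.0 as the excess over
    baseline reaches `tolerance`.
    """

    def __init__(self, window: int = 200, tolerance: float = 0.10):
        self.window = window
        self.tolerance = tolerance
        self.outcomes = {}   # lp -> deque of bools, True = rejected on last look
        self.baseline = {}   # lp -> historical reject rate

    def record(self, lp: str, rejected: bool) -> None:
        """Feed one ExecutionReport (35=8) outcome for a 35=D sent to this LP."""
        self.outcomes.setdefault(lp, deque(maxlen=self.window)).append(rejected)

    def reject_rate(self, lp: str) -> float:
        window = self.outcomes.get(lp)
        return sum(window) / len(window) if window else 0.0

    def weight(self, lp: str) -> float:
        excess = self.reject_rate(lp) - self.baseline.get(lp, 0.0)
        if excess <= 0:
            return 1.0
        return max(0.0, 1.0 - excess / self.tolerance)
```

All of this state lives in process memory, which is exactly the restart exposure raised in the conclusion.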

Production Incident: The Dead Letter Queue Cascade

Six weeks after go-live the system went down during the European open. The JMS broker crashed and on restart found its persistent message store inconsistent. Recovery took eleven minutes: contribution to EBS and LSEG was offline, the blotter was dark.

Post-mortem traced the failure to the broker’s Dead Letter Channel, the queue where the broker parks messages that exceed their time-to-live or cannot be delivered. The dead-letter queue had grown to several hundred thousand messages, consuming the broker’s persistence storage limit. The broker deadlocked trying to write new dead-letter entries while simultaneously vacuum-compacting the store.

The root cause: a slow consumer on the Contribution Gateway. The EBS FIX session's acknowledgment latency had risen during the European open, and the consumer was doing a synchronous wait-for-ack before processing the next message. The BBO channel was producing faster than the Gateway could consume, so the in-flight message count grew. Messages passed their Message Expiration TTL (2 seconds on market data, reasonable for contribution, but the consumer was now taking around 4 seconds per message), and the broker routed the expired messages to the Dead Letter Channel. The queue had been growing silently for three days; no alert was configured on its depth.
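
The mechanics of the cascade can be reproduced with a toy model: one synchronous consumer, FIFO delivery, and expiration checked when the broker tries to dispatch a message. The 2-second TTL and roughly 4 seconds per message come from the incident; the 0.5-second publish interval and the alert depth of 50 are assumptions chosen to keep the numbers small.

```python
def simulate_consumer(publish_every_s: float, service_time_s: float,
                      ttl_s: float, n_messages: int, dlq_alert_depth: int):
    """Toy model of one synchronous consumer behind a queue with Message Expiration.

    Messages are published at fixed intervals and dispatched FIFO. When the
    consumer is free, the broker checks the head message's age: past the TTL it
    goes to the Dead Letter Channel (no consumer time spent); otherwise the
    consumer blocks for service_time_s (the synchronous wait-for-ack).
    Returns (delivered, dead_lettered, time the DLQ depth alert fired or None).
    """
    clock = 0.0
    delivered = dead_lettered = 0
    alert_at = None
    for k in range(n_messages):
        published = k * publish_every_s
        clock = max(clock, published)          # consumer idles until it exists
        if clock - published > ttl_s:
            dead_lettered += 1                 # expired: routed to the DLQ
            if alert_at is None and dead_lettered >= dlq_alert_depth:
                alert_at = clock               # the alert the incident lacked
            continue
        delivered += 1
        clock += service_time_s                # blocked on the venue ack
    return delivered, dead_lettered, alert_at
```

With a 0.2-second service time every message is delivered and the Dead Letter Channel stays empty; at 4 seconds per message most messages are dead-lettered, and nothing surfaces it except a depth alert.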

The fix candidates were Competing Consumers and Message Dispatcher.

Competing Consumers (multiple consumer instances on one channel, each message processed by exactly one of them) scales throughput on a Point-to-Point channel. The Contribution Gateway's channel was correctly Point-to-Point, but Competing Consumers here means two Contribution Gateway instances racing to contribute to EBS. EBS expects a single FIX session per counterparty; two concurrent contribution sessions either get rejected by EBS or produce a double-quoting scenario. Wrong tool.

Message Dispatcher is the right pattern. A single consumer (the Dispatcher) listens to the channel and delegates each incoming message to one of a pool of worker threads (the Performers), then returns immediately. The Performers do the actual work: constructing the FIX NewOrderSingle, sending it over the iLink session, waiting for acknowledgment, each on its own thread, multiple quotes in-flight simultaneously. The Dispatcher’s onMessage returns in microseconds regardless of how long the Performers take. The broker’s in-flight window never backs up.

We implemented a JMSListener Dispatcher holding a fixed-size pool of JMSListener Performers in the Contribution Gateway. Each incoming BBO message went to the first available Performer. The Dispatcher returned immediately. The Performers managed iLink acknowledgment waits independently.
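
A minimal Python sketch of the Dispatcher/Performer split, using the standard library `ThreadPoolExecutor` as the Performer pool. `ContributionDispatcher` and `make_fake_performer` are hypothetical; in the Java original, `on_message` corresponds to the JMS `onMessage` callback. One caveat is visible in the code: with several Performers in flight, contributions can complete out of arrival order.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ContributionDispatcher:
    """Message Dispatcher: one listener, a fixed pool of Performers.

    on_message only hands the message to the pool and returns, so a slow
    venue acknowledgment ties up a Performer thread, not the listener.
    Completion order across Performers is not guaranteed.
    """

    def __init__(self, performer, pool_size: int = 4):
        self._performer = performer
        self._pool = ThreadPoolExecutor(max_workers=pool_size,
                                        thread_name_prefix="performer")

    def on_message(self, message) -> None:
        self._pool.submit(self._performer, message)

    def close(self) -> None:
        # Wait for in-flight Performers to finish before shutting down.
        self._pool.shutdown(wait=True)


def make_fake_performer(ack_gate: threading.Event, sent: list, lock: threading.Lock):
    """Hypothetical Performer: 'send 35=D, then wait for the venue ack'."""
    def perform(message):
        ack_gate.wait(timeout=5)     # stands in for a variable-latency ack
        with lock:
            sent.append(message)
    return perform
```

Dispatching ten messages while every simulated ack is held back returns immediately; the listener is free the whole time, and the Performers finish once the acks arrive.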

We added two pieces of FX-specific hardening. First, a contribution-layer staleness check: before any Performer sends a NewOrderSingle to EBS, it checks the message timestamp. If the message age exceeds a configurable threshold (in my experience, 200ms to 1s depending on pair and venue), the Performer discards without contributing. A second line of defense behind Message Expiration: a message can survive the broker TTL and still be too stale to contribute.

Second, an empty-BBO circuit breaker: if the Aggregator determines that fewer than a minimum number of LPs (in practice, two) are actively quoting a pair, the Pricing Gateway stops publishing to the BBO channel and publishes instead to fx.spot.eurusd.bbo.suspended. On receiving a suspended message, the Contribution Gateway sends QuoteCancel (35=Z) to EBS and LSEG and withdraws the bank’s prices. Contributing on a one-LP or zero-LP BBO is more dangerous than not contributing at all.
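
Both hardening checks are small. A sketch, assuming the Pricing Gateway stamps each BBO message with a creation time and the number of LPs currently quoting; the per-pair age thresholds, field names, and returned action strings are hypothetical, with values picked from the ranges given above.

```python
from dataclasses import dataclass

MIN_ACTIVE_LPS = 2                 # the text's "in practice, two"
DEFAULT_MAX_AGE_MS = 1_000         # upper end of the 200 ms to 1 s range
MAX_AGE_MS = {"EUR/USD": 200}      # hypothetical tighter per-pair threshold


@dataclass
class BboMessage:
    pair: str
    bid: float
    offer: float
    created_ms: int     # stamped by the Pricing Gateway at publication
    active_lps: int     # LPs actively quoting this pair at publication


def bbo_channel(msg: BboMessage) -> str:
    """Pricing Gateway side: divert to .suspended when LP coverage is too thin."""
    base = "fx.spot." + msg.pair.lower().replace("/", "") + ".bbo"
    return base + ".suspended" if msg.active_lps < MIN_ACTIVE_LPS else base


def performer_action(channel: str, msg: BboMessage, now_ms: int) -> str:
    """Contribution side: decide what one Performer does with one message."""
    if channel.endswith(".suspended"):
        return "QuoteCancel(35=Z)"          # withdraw prices on EBS and LSEG
    age_ms = now_ms - msg.created_ms
    if age_ms > MAX_AGE_MS.get(msg.pair, DEFAULT_MAX_AGE_MS):
        return "discard_stale"              # survived broker TTL, still too old
    return "NewOrderSingle(35=D)"
```

The staleness check runs in the Performer because that is the last moment before the venue sees the price; the suspension decision runs in the Pricing Gateway because only the Aggregator knows how many LPs are live.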

In the same spirit as the bond pricing case study in the EIP book, I should be honest about the limits: the Message Dispatcher improved throughput and eliminated the dead-letter cascade under normal conditions. It did not fully solve the problem. The real issue was the synchronous ack-wait pattern being architecturally mismatched with variable-latency iLink acknowledgment. The full fix came months later when we refactored the contribution path to route the Pricing Gateway's output directly into a dedicated contribution queue, bypassing the shared BBO channel, and made the EBS session management fully asynchronous. Patterns help you design and maintain systems. They do not paper over structural mismatches in how latency budgets are allocated.
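
What "fully asynchronous" means at the session level: sending never waits for the acknowledgment, and acks are matched back to orders as they arrive. A sketch with `asyncio`, assuming acknowledgments are ExecutionReports (35=8) correlated by ClOrdID (tag 11), with price in tag 44; `AsyncVenueSession` is hypothetical, and its `transport_send` callable stands in for the real iLink session, which would also need timeouts for acks that never arrive.

```python
import asyncio
import itertools


class AsyncVenueSession:
    """Contribution session that never blocks on acknowledgments.

    Each order gets a ClOrdID (tag 11); the ExecutionReport (35=8) that echoes
    that ClOrdID resolves the matching future, in whatever order acks arrive.
    """

    def __init__(self, transport_send):
        self._send = transport_send        # hypothetical stand-in for the venue link
        self._pending = {}                 # ClOrdID -> asyncio.Future
        self._ids = itertools.count(1)

    def send_order(self, price: str) -> asyncio.Future:
        """Send a NewOrderSingle (35=D) and return immediately with a future."""
        cl_ord_id = f"C{next(self._ids)}"
        future = asyncio.get_running_loop().create_future()
        self._pending[cl_ord_id] = future
        self._send({"35": "D", "11": cl_ord_id, "44": price})
        return future

    def on_execution_report(self, report: dict) -> None:
        """Called by the session reader for each inbound 35=8."""
        future = self._pending.pop(report.get("11"), None)
        if future is not None and not future.done():
            future.set_result(report)

    def in_flight(self) -> int:
        return len(self._pending)
```

Several orders can be in flight at once, and an ack that arrives late or out of order resolves only its own future instead of stalling everything queued behind it.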

What This Architecture Cannot Handle

Simultaneous wholesale LP withdrawal. On January 15, 2015, the Swiss National Bank removed the EUR/CHF floor. Per FXCM reporting at the time, CHF spreads reached 2,000 to 3,000 pips and meaningful liquidity was absent for approximately 40 minutes. When every LP withdraws at once, the empty-BBO circuit breaker fires across the whole CHF universe and the platform stops contributing, which is correct behavior. The problem is positions already in flight and books built on pre-dislocation prices. A circuit breaker is not a position-size governor. The GBP flash crash of October 7, 2016, in which cable fell more than 6% in under two minutes (per BIS), is a milder case: LP coverage did not disappear entirely, contribution continued, and the circuit breaker may have fired briefly as LP coverage thinned and last-look reject rates spiked. January 2015 CHF is the hard boundary; October 2016 GBP is inside the survivable envelope.

Settlement scope is external. CLS settles over $8 trillion per day across 18 currencies with approximately 96% funding compression (CLS published operational data). The SWIFT MT300 confirmation feeds the CLS settlement cycle. NDF legs, FX swap second legs, and transactions in currencies outside the 18 CLS currencies settle bilaterally. None of this flows through the pricing and contribution layer described here; the integration boundary is the point-to-point STP channel.

Venue protocol evolution is an ongoing Channel Adapter versioning cost. EBS has migrated onto CME Globex; iLink 3 Binary is mandatory, iLink 2 retired; CME’s MDP market data uses conflated UDP with snapshots at approximately 5-millisecond intervals. The Channel Adapter connecting the Contribution Gateway to EBS must handle iLink 3 Binary session management, not the FIX 4.4 session management in place when this system was originally built. The SWIFT family migration is a parallel forcing function: the cross-border MT payment coexistence period ended November 22, 2025; fxtr.003 ISO 20022 for FX trade confirmations is on a later timeline but the direction is established. The STP path’s Channel Adapter will need re-versioning.

Pre-trade hard blocks. The Citi incident of May 2022 involved an equities basket rather than FX, but the control lesson transfers directly: a trader entered 284 orders intended to total $58 million but mis-keyed as $444 billion, and approximately $1.4 billion executed before the error was corrected. The FCA and PRA fined the bank over £61 million. The finding included the absence of a hard pre-trade block on notional size. The contribution layer is the last point before a price goes live to an ECN. The Contribution Gateway's Performer pool is the correct placement: before any Performer sends NewOrderSingle (35=D), it checks the contribution notional against a hard limit and rejects-and-alerts on breach. A soft warning is not a substitute.
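
A sketch of the Performer-level check, assuming notional is computed as quantity times price in the quote currency and a single configurable limit; `HardLimitBreach` and `enforce_notional_limit` are hypothetical names. The design point is that there is no override argument: a breach alerts and raises, and the order is never handed to the session.

```python
from decimal import Decimal


class HardLimitBreach(Exception):
    """Raised when an order's notional exceeds the hard limit."""


def enforce_notional_limit(quantity: Decimal, price: Decimal,
                           limit: Decimal, alert) -> Decimal:
    """Performer-level pre-trade check run before any 35=D leaves the gateway.

    Deliberately has no override parameter: a hard block cannot be clicked
    through. On breach it alerts and raises; the caller never sends the order.
    """
    notional = quantity * price        # quote-currency notional, e.g. USD for EUR/USD
    if notional > limit:
        alert(f"HARD BLOCK: notional {notional} exceeds limit {limit}")
        raise HardLimitBreach(notional)
    return notional
```

With a 100 million limit, a 50 million EUR/USD order at 1.08 passes with 54 million USD notional, while a 444 billion mis-key raises and fires exactly one alert.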

The FX Aggregation Resilience Checklist

These are the items I check when reviewing an FX aggregation and contribution platform. Each maps to a named pattern or named failure mode from the architecture above.

Channel Adapter isolation per LP. Each LP has its own C++ Channel Adapter process. An LP session crash or FIX protocol violation does not propagate to other LP sessions or the Java bus.

Messaging Bridge is the only cross-language boundary. The C++/Java boundary is crossed exclusively via the Messaging Bridge (two Channel Adapters over CORBA or shared-memory IPC). No JNI.

Publish-Subscribe for server-to-client price distribution; Point-to-Point for server-to-venue contribution and client-to-server commands. Per channel, per direction. Any deviation requires explicit justification.

Content-Based Router lives in the Pricing Gateway, not the Channel Adapter. Spread/skew matrix, tier assignment, and BBO construction belong on the Java side. The C++ Channel Adapter is a generic bus citizen with no pricing logic.

Aggregator, not Message Filter, for rate control on the blotter path. Message Filter creates internal inconsistency between bid, ask, and size-band fields. Aggregator holds full quote state and emits complete snapshots on request.

Blotter is a Polling Consumer. Command Messages to the Aggregator at render rate; no Event-Driven push. LP quote rate and UI render rate are decoupled structurally, not by throttling.

Message Dispatcher with Performer pool on the contribution path. Dispatcher returns immediately; Performers handle ECN session latency asynchronously. Dead Letter Channel depth monitored with an operational alert at a shallow threshold; in my experience, a few thousand messages is an early warning, not a crisis point.

Message Expiration TTL plus contribution-layer staleness check. Two independent layers: broker discards beyond TTL; Performer discards beyond age threshold. A message can survive the broker TTL and still be too stale to contribute.

Empty-BBO circuit breaker. If fewer than the minimum number of LPs are actively quoting a pair, publish to bbo.suspended and send QuoteCancel (35=Z) to all venues. Never contribute on a one-LP or zero-LP BBO.

Last-look reject-rate de-weighting. Aggregator tracks rolling accept/reject ratios on 35=D orders per LP. A reject rate rising above the LP's historical baseline means strategic LP behavior, not infrastructure failure, and down-weights that LP in BBO construction.

Settlement scope documented and bounded. CLS covers $8T/day across 18 currencies; NDF legs and out-of-scope currencies settle bilaterally. The STP path is a separate integration concern. Do not let the platform team assume contribution equals settlement coverage.

Pre-trade hard block on contribution notional. Performer-level check before any NewOrderSingle (35=D) is routed to an ECN. Configurable notional threshold; reject-and-alert on breach. The Citi May 2022 $444B vs $58M scenario is the reference point: a hard block that a soft warning cannot substitute for.

Conclusion

The pattern vocabulary (Publish-Subscribe Channel, Message Bus, Channel Adapter, Messaging Bridge, Content-Based Router, Aggregator, Polling Consumer, Message Dispatcher, Dead Letter Channel, Message Expiration) does not make design decisions for you. It names the options and the tradeoffs precisely enough to reason about them directly instead of talking around them. In a system spanning C++, Java, FIX, CORBA, JMS, and four distinct venue protocols, the naming layer is not trivial.

The gap I have not closed on this architecture as described: the Aggregator's BBO weighting logic and the last-look de-weighting algorithm both depend on rolling LP quality metrics computed in-process and not persisted across restarts. A broker restart or Pricing Gateway restart resets the LP quality weights to neutral. During the reset window (in my experience, three to five minutes while the rolling windows repopulate), BBO construction operates without LP quality discrimination. That is a known exposure. Whether you persist the quality metrics to a side store, accept the reset window as tolerable, or have found a third approach is the technical question this case study leaves open.

Drawn from 20+ years architecting electronic trading systems across venue classes. If you are auditing an FX aggregation or contribution stack, the Microstructure Diagnostics engagement applies this checklist to your live architecture.
