Building an algorithmic trading platform in Python 3

Vita Smid | PyCon CZ

June 9, 2017

Hi, I am Vita.

I am a software engineer specializing in difficult, mathy problems.

Quantlane

  • We develop and run a stock trading platform and strategies.
  • Small team, lean principles.
  • All back-end code is Python 3.5 / 3.6.
  • We also like Docker, React, Redis, Kafka, PostgreSQL, TensorFlow, …

Our challenge

How did we build this?

What mistakes did we make?

Chapter I

The Prototype

One small asyncio application

  • Only one data provider and a single stock exchange supported.
  • Most components polled each other.
  • Some independent components communicated through queues.

Decoupled components


import asyncio
import contextlib


class SomeComponent:

    def __init__(self, queue: asyncio.Queue) -> None:
        self._queue = queue
        self._is_running = asyncio.Event()
        self._run_task = None

    def start(self):
        self._is_running.set()
        self._run_task = asyncio.ensure_future(self._run())

    async def _run(self):
        while self._is_running.is_set():
            message = await self._queue.get()  # asyncio.Queue
            self._process(message)

    async def stop(self):
        self._is_running.clear()
        self._run_task.cancel()
        # A cancelled task raises CancelledError when awaited.
        with contextlib.suppress(asyncio.CancelledError):
            await self._run_task

Pickle persistence

  • Platform state represented as a single big dictionary with “namespaces”…
  • …serialised by Pickle and written to a file.
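
A minimal sketch of the scheme; the namespace keys are hypothetical:

import pickle

# One big dict; each top-level key acts as a "namespace".
state = {
    'orders': {'order-1': {'price': '15.10', 'quantity': 500}},
    'positions': {'AAPL': 100},
}

with open('state.pickle', 'wb') as f:
    pickle.dump(state, f)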

Web-based user interface

  • React
  • No business logic in the client.
  • Server state represented as collections:
    named lists of dicts.
  • Each collection is serialised to JSON and sent over WebSockets on every model change.
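
A rough sketch of the idea, assuming aiohttp WebSocketResponse objects in clients; the function and names are illustrative, not the actual implementation:

import json

async def broadcast(collection_name, rows, clients):
    # Push the entire collection to every connected client on each change.
    payload = json.dumps({'collection': collection_name, 'rows': rows})
    for websocket in clients:
        await websocket.send_str(payload)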

aiomanhole

  • Interactive Python session inside your running process.
  • The process is not interrupted, thanks to asyncio.
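
Starting one takes a few lines; a sketch assuming aiomanhole's start_manhole entry point, with a placeholder namespace:

from aiomanhole import start_manhole

app_state = {'answer': 42}  # placeholder for real application objects

# Everything in `namespace` becomes available in the interactive session,
# served on localhost:9999 without stopping the event loop.
start_manhole(port=9999, namespace=app_state)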

Chapter II

The Monolith

Big, multi-component monolith

  • Many data providers and many stock exchanges supported in the same system.
  • Hundreds of components in a single process.
  • Move towards a fully event-driven design.

Real publish-subscribe system


import aiopubsub

hub = aiopubsub.Hub()

subscriber = aiopubsub.Subscriber(hub, 'subscriber_id')
subscriber.subscribe(('some', 'namespace'))

publisher = aiopubsub.Publisher(hub, prefix = ('some',))
publisher.publish(('namespace',), 'Hello subscriber')

key, message = await subscriber.consume()
assert key == ('some', 'namespace')
assert message == 'Hello subscriber'

Redis persistence

  • State is still pickled, but saved in Redis.
  • Redis data structures (lists, hashes) are utilised for tiny atomic writes.
    
    lst = RedisBackedReadWritableList('redis-key', ...)
    lst.append('hi')  # writes 'hi' to Redis
    lst[0]            # reads 'hi' from memory, not from Redis
    del lst[0]        # deletes 'hi' from Redis
  • Writes are asynchronous and can be debounced.
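
A minimal sketch of the debouncing idea (the class and its write coroutine are hypothetical, not the actual implementation):

import asyncio

class DebouncedWriter:

    def __init__(self, write, delay=0.1):
        self._write = write  # coroutine function performing the Redis write
        self._delay = delay
        self._task = None

    def schedule(self, value):
        # Restart the timer on every call, so a burst of changes
        # collapses into a single write of the last value.
        if self._task is not None:
            self._task.cancel()
        self._task = asyncio.ensure_future(self._write_later(value))

    async def _write_later(self, value):
        await asyncio.sleep(self._delay)
        await self._write(value)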

Chapter III

The Ecosystem

Distributed platform

  • Trading is carried out by 20+ instances of various services.
  • Data is distributed through Apache Kafka.
  • Request-response communication is managed by a custom RPC system.
  • Everything is serialised to binary.

Apache Kafka

  • Data-producing processes decoupled from data-consuming processes.
  • Messages have fixed schemata with versioning and migrations.
  • Payloads are binary-encoded using Apache Avro.
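
A sketch of the producing side, assuming the aiokafka client and a hypothetical 'trades' topic (the talk does not name a specific Kafka library):

from aiokafka import AIOKafkaProducer

async def publish(avro_bytes: bytes) -> None:
    # The producer only knows the topic; consumers subscribe independently.
    producer = AIOKafkaProducer(bootstrap_servers='localhost:9092')
    await producer.start()
    try:
        await producer.send_and_wait('trades', avro_bytes)
    finally:
        await producer.stop()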

Custom RPC

  • asyncio + aiohttp
  • Standard Avro protocols
  • Server discovery via etcd (coming soon).
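
The actual RPC layer is not shown here; below is a purely illustrative sketch of binary request-response over aiohttp, the kind of call the stack above implies:

import aiohttp

async def call(url: str, request_bytes: bytes) -> bytes:
    # POST an Avro-encoded request body; read the Avro-encoded response.
    async with aiohttp.ClientSession() as session:
        async with session.post(url, data=request_bytes) as response:
            return await response.read()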

Custom user interface protocol

  • Several data structures: lists, records, hashmaps.
  • Element-level updates (insert/update/delete an item).
  • Serialised to Avro.

For example, setting two elements of a trades list…

trades[3] = {'price': '15.10', 'quantity': 500}
trades[4] = {'price': '15.11', 'quantity': 150}

…is expressed as one element-level update payload, tagged with a schema identifier…

schema_id = 123
payload = {'start': 3, 'stop': 5,
           'items': [{'price': '15.10', 'quantity': 500},
                     {'price': '15.11', 'quantity': 150}]}

…and serialised to Avro for the wire:

007b0014060a040a31352e3130e8070a31352e3131ac0200
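
A sketch of producing such bytes with fastavro, reusing the payload above. The schema here is an assumption made up to fit the example; the real protocol schema is not shown, and the hex above appears to prepend the schema_id, which this sketch omits.

import io

import fastavro

# Hypothetical schema matching the payload in this example.
SCHEMA = fastavro.parse_schema({
    'type': 'record', 'name': 'ListUpdate', 'fields': [
        {'name': 'start', 'type': 'int'},
        {'name': 'stop', 'type': 'int'},
        {'name': 'items', 'type': {'type': 'array', 'items': {
            'type': 'record', 'name': 'Trade', 'fields': [
                {'name': 'price', 'type': 'string'},
                {'name': 'quantity', 'type': 'int'},
            ],
        }}},
    ],
})

buffer = io.BytesIO()
fastavro.schemaless_writer(buffer, SCHEMA, payload)
print(buffer.getvalue().hex())  # compact binary: no field names on the wire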

Epilogue

8 takeaways

  1. asyncio is great for runtimes with many independent components.
  2. Decouple your components but don’t forget to invest in monitoring and debugging.
  3. Don’t use Pickle.
  4. Publish-subscribe works nicely within a single process and scales across processes naturally.
  5. Lean into distributed systems (like Kafka) slowly. Operational experience is priceless.
  6. Use fixed schemata… with evolution.
  7. Use binary formats like Avro.
  8. Invent your own protocols when it makes sense.

Thank you

quantlane.com