Improving performance

This page contains tips that may help to increase indexing speed.

Configure table indexes

Postgres indexes are auxiliary data structures that Postgres can use to speed up data lookups. A database index acts like a pointer to data in a table, much like the index of a printed book: if you look in the index first, you will find the data much faster than by scanning the whole book (or, in this case, the whole database).

You should add indexes on columns that often appear in WHERE clauses of your GraphQL queries and subscriptions.

DipDup ORM uses B-tree indexes by default. To set an index on a field, add index=True to the field definition:

from dipdup import fields
from dipdup.models import Model

class Trade(Model):
    id = fields.BigIntField(pk=True)
    amount = fields.BigIntField()
    level = fields.BigIntField(index=True)
    timestamp = fields.DatetimeField()
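
For example, a lookup that filters trades by level translates into a WHERE clause on that column, which is exactly what the index above speeds up. A minimal sketch, assuming the Trade model from above; the helper name and the package path are made up for illustration:

from demo_indexer.models import Trade  # hypothetical package name

async def count_trades_at_level(level: int) -> int:
    # Translates to: SELECT COUNT(*) FROM trade WHERE level = ...
    # With index=True on "level", Postgres can answer this via the B-tree index.
    return await Trade.filter(level=level).count()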

Perform heavy computations in a separate process

For most deferred calculations, you can use the built-in job scheduler. However, DipDup jobs are executed in the same asyncio event loop as the rest of the framework, so heavy ones can degrade indexing performance.
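
For reference, a scheduled job is a hook callback wired to the scheduler in your config. A minimal sketch, assuming a hook named calculate_stats backed by an SQL script of the same name; both names are illustrative:

from dipdup.context import HookContext

# Hypothetical hook module <package>/hooks/calculate_stats.py, referenced from the
# jobs section of the config. It runs in the indexer's event loop, so keep it light
# and push heavy work down to the database where possible.
async def calculate_stats(ctx: HookContext) -> None:
    await ctx.execute_sql('calculate_stats')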

If you decide to implement a separate service to perform heavy computations, you can implement an additional DipDup CLI command to run it. That way you can reuse the same config and environment variables. Create a new file cli.py in the project root directory:

import asyncclick as click
from dipdup.cli import cli, cli_wrapper
from dipdup.config import DipDupConfig
from dipdup.utils.database import tortoise_wrapper

@cli.command(help='Run heavy calculations')
@click.option('-k', '--key', help='Command option')
@click.pass_context
@cli_wrapper
async def heavy_stuff(ctx, key: str) -> None:
    config: DipDupConfig = ctx.obj.config
    url = config.database.connection_string
    models = f'{config.package}.models'

    async with tortoise_wrapper(url, models):
        # Perform the heavy computations here; DipDup ORM models are usable
        # once the Tortoise connection has been initialized.
        ...

if __name__ == '__main__':
    cli(prog_name='dipdup', standalone_mode=True)

Then use python -m dipdup_indexer.cli instead of dipdup as an entrypoint. Now you can call heavy-stuff like any other command. The dipdup.cli:cli group handles argument and config parsing, graceful shutdown, and other boilerplate. Keep in mind that DipDup ORM is not thread-safe.

python -m dipdup_indexer.cli -c dipdup.yaml heavy-stuff --key value

Or in Dockerfile:

ENTRYPOINT ["python", "-m", "dipdup_indexer.cli"]
CMD ["-c", "dipdup.yaml", "heavy-stuff", "--key", "value"]

Reduce disk I/O

Indexing produces a lot of disk I/O. During development, you can store the database in RAM instead. By default, DipDup uses an in-memory SQLite database that is dropped on exit. Using tmpfs instead lets you persist the database between process restarts until the system is rebooted. On many Linux distributions, /tmp is a tmpfs mount with a default size limit of 50% of RAM. The following commands are for Linux, but the process should be similar on macOS.

# Make sure tmpfs is mounted
$ df -h /tmp
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            16G  1.0G   15G   7% /tmp

# You can change the size of tmpfs without unmounting it
$ sudo mount -o remount,size=64G,noatime /tmp

# But make sure that you have enough swap for this
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            30Gi        16Gi       3.1Gi       1.3Gi        11Gi        12Gi
Swap:           31Gi       6.0Mi        31Gi

# Update database config to use tmpfs
$ grep database -A2 dipdup.yaml
database:
  kind: sqlite
  path: /tmp/uniswap.sqlite

# Once you've finished indexing, move the database from RAM to disk
$ mv /tmp/uniswap.sqlite ~/uniswap.sqlite

