Improving performance

This page contains tips that may help to increase indexing speed.

Configure table indexes

Postgres indexes are internal tables that Postgres can use to speed up data lookup. A database index acts like a pointer to data in a table, just like an index in a printed book. If you look in the index first, you will find the data much more quickly than by searching the whole book (or, in this case, the whole database).

You should add indexes on columns often appearing in WHERE clauses in your GraphQL queries and subscriptions.

DipDup ORM uses B-tree indexes by default. To index a field, add db_index=True to the field definition:

from dipdup import fields
from dipdup.models import Model


class Trade(Model):
    id = fields.BigIntField(primary_key=True)
    amount = fields.BigIntField()
    # Indexed: speeds up queries that filter on `level`
    level = fields.BigIntField(db_index=True)
    timestamp = fields.DatetimeField()
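
To see what an index changes, you can compare the query plan before and after creating one. The snippet below is a minimal sketch that uses the standard-library sqlite3 module as a stand-in for Postgres (where you would run EXPLAIN instead). The trade table mirrors the model above, and the index name idx_trade_level is made up for illustration.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a single string."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}").fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trade ("
    "id INTEGER PRIMARY KEY, amount INTEGER, level INTEGER, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO trade (amount, level, timestamp) VALUES (?, ?, ?)",
    [(i * 10, i % 1000, "2024-01-01") for i in range(10_000)],
)

sql = "SELECT * FROM trade WHERE level = 42"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(conn, sql)

# Hypothetical index name; this is what db_index=True asks the ORM to create.
conn.execute("CREATE INDEX idx_trade_level ON trade (level)")

# With the index the planner can search it instead of scanning.
plan_after = query_plan(conn, sql)

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a full table scan, while the second one references the new index. The exact wording of the plan depends on the SQLite version.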

Perform heavy computations in a separate process

For most deferred calculations, you can use the built-in job scheduler. However, DipDup jobs run in the same asyncio loop as the rest of the framework, so heavy jobs can slow down indexing.
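
The effect of sharing one event loop can be illustrated with plain asyncio. This is a sketch, not DipDup code: indexer, blocking_job and cooperative_job are hypothetical names, and the fake indexer only counts how many times it gets to run while a job executes. The cooperative variant is included purely for contrast.

```python
import asyncio


def heavy_sync(n: int) -> int:
    """CPU-bound work that never yields control to the event loop."""
    return sum(i * i for i in range(n))


async def blocking_job(n: int) -> int:
    # Declared async, but contains no await: the loop is stalled until it returns.
    return heavy_sync(n)


async def cooperative_job(n: int, chunk: int = 1_000) -> int:
    # Same result, but control goes back to the loop after every chunk.
    total = 0
    for start in range(0, n, chunk):
        total += sum(i * i for i in range(start, min(start + chunk, n)))
        await asyncio.sleep(0)
    return total


async def indexer(ticks: list) -> None:
    # Stand-in for the indexing loop: records every time it gets to run.
    while True:
        ticks.append(1)
        await asyncio.sleep(0)


async def run_alongside_indexer(job) -> tuple:
    """Run `job` next to the fake indexer; return (result, indexer iterations)."""
    ticks: list = []
    task = asyncio.create_task(indexer(ticks))
    result = await job
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return result, len(ticks)


blocked_result, blocked_ticks = asyncio.run(run_alongside_indexer(blocking_job(10_000)))
coop_result, coop_ticks = asyncio.run(run_alongside_indexer(cooperative_job(10_000)))

print(f"blocking job:    indexer ran {blocked_ticks} times")
print(f"cooperative job: indexer ran {coop_ticks} times")
```

While the blocking job runs, the stand-in indexer does not get a single iteration. Splitting work into chunks helps only when the work can be split; for truly heavy computations a separate process avoids the problem altogether.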

If you decide to implement a separate service to perform heavy computations, you can implement an additional DipDup CLI command to run it. That way you can reuse the same config and environment variables. Create a new file cli.py in the project root directory:

cli.py
from contextlib import AsyncExitStack

import click
from dipdup.cli import cli, _cli_wrapper
from dipdup.config import DipDupConfig
from dipdup.context import DipDupContext
from dipdup.utils.database import tortoise_wrapper


@cli.command(help='Run heavy calculations')
@click.option('-k', '--key', help='Command option')
@click.pass_context
@_cli_wrapper
async def heavy_stuff(ctx, key: str) -> None:
    config: DipDupConfig = ctx.obj.config
    url = config.database.connection_string
    models = f'{config.package}.models'

    async with tortoise_wrapper(url, models):
        ...


if __name__ == '__main__':
    cli(prog_name='dipdup', standalone_mode=True)

Then use python -m dipdup_indexer.cli instead of dipdup as the entrypoint. Now you can call heavy-stuff like any other command. The dipdup.cli:cli group handles argument and config parsing, graceful shutdown, and other boilerplate. Keep in mind that DipDup ORM is not thread-safe.

Terminal
python -m dipdup_indexer.cli -c dipdup.yaml heavy-stuff --key value

Or in Dockerfile:

Dockerfile
ENTRYPOINT ["python", "-m", "dipdup_indexer.cli"]
CMD ["-c", "dipdup.yaml", "heavy-stuff", "--key", "value"]

Reducing disk I/O

Indexing produces a lot of disk I/O. During development, you can store the database in RAM. By default, DipDup uses an in-memory SQLite database that is dropped on exit. Storing the database file on tmpfs instead lets it persist between process restarts until the system is rebooted. On many Linux distributions, tmpfs is mounted on /tmp by default with a size of 50% of RAM.

Terminal
# Make sure tmpfs is mounted
$ df -h /tmp
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            16G  1.0G   16G   7% /tmp

# You can change the size of tmpfs without unmounting it
$ sudo mount -o remount,size=64G,noatime /tmp

# But make sure that you have enough swap for this
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            30Gi        16Gi       3,1Gi       1,3Gi        11Gi        12Gi
Swap:           31Gi       6,0Mi        31Gi

# Update database config to use tmpfs
$ grep database -A2 dipdup.yaml
database:
  kind: sqlite
  path: /tmp/uniswap.sqlite

# After you've done indexing, move the database from RAM to disk
$ mv /tmp/uniswap.sqlite ~/uniswap.sqlite

The commands above were checked on Linux, but on macOS the process should be similar.
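
If you want to confirm from a script that the database path really lives on tmpfs, one option is to look it up in the mount table. The helper below is a minimal sketch that assumes the Linux /proc/mounts format and mount points without spaces; filesystem_type is a hypothetical name and not part of DipDup.

```python
import os
import posixpath
from typing import Optional


def filesystem_type(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount point that contains `path`.

    `mounts_text` is expected in /proc/mounts format:
    "<device> <mount point> <fs type> <options> <dump> <pass>" per line.
    """
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        mount_point, fs_type = parts[1], parts[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # Prefer the longest matching mount point; later entries win ties,
        # because a later mount shadows an earlier one on the same path.
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


# Sample mount table used for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""

print(filesystem_type("/tmp/uniswap.sqlite", SAMPLE_MOUNTS))
print(filesystem_type("/home/user/uniswap.sqlite", SAMPLE_MOUNTS))

# On a real Linux system, read the live mount table instead.
if os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        print(filesystem_type("/tmp/uniswap.sqlite", f.read()))
```

With the sample table, the first path resolves to tmpfs and the second to ext4, which is the state you end up in after moving the database from RAM to disk.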
