F.A.Q.
This page contains answers to the most frequently asked questions about DipDup.
General
What hardware do I need to run DipDup?
DipDup can run on any amd64/arm64 machine starting from 1 CPU core and 256M of RAM. Aim for a good single-threaded and disk I/O performance.
Actual RAM requirements depend on multiple factors: the number and complexity of indexes, the size of internal queues and caches, and the usage of CachedModel
. For the average project, 1GB is usually enough. If you're running DipDup on some ultra-low-end instance and getting OOMs, try the DIPDUP_LOW_MEMORY=1
environment variable.
Indexing
How to index similar but not identical contracts as a single entity?
Multiple contracts can provide the same interface but have a different storage structure. Examples are ERC20/ERC721/ERC1155 standard tokens on Ethereum and FA1.2/FA2 ones on Tezos. If you try to use the same typename for them, indexing will fail because of the storage mismatch. However, you can modify typeclasses manually. Edit types/<typename>/storage.py
file and comment out fields leaving only the ones used in your index (common for all contracts with the same interface).
class ContractStorage(BaseModel):
class Config:
extra = Extra.ignore
common_ledger: dict[str, str]
# unique_field_foo: str
# unique_field_bar: str
Don't forget Extra.ignore
Pydantic hint, otherwise, storage deserialization will fail. To restore the original typeclass, remove the modified file and run dipdup init
again. You can also add --force
flag to overwrite all ABIs and typeclasses.
How to use off-chain datasources?
DipDup provides convenient helpers to process off-chain data like market quotes or IPFS metadata. Follow the tips below to use them most efficiently.
- Do not perform off-chain requests in handers until necessary. Handlers need to be as fast as possible not to block the database transaction. Use hooks instead, enriching indexed data on-demand.
- Use generic
http
datasource for external APIs instead of plainaiohttp
requests. It makes available the same features DipDup uses for internal requests: retry with backoff, rate limiting, Prometheus integration etc. - Database tables that store off-chain data can be marked as immune, to speed up reindexing.
How to process indexes in a specific order?
Indexes of all kinds are fully independent. They are processed in parallel, have their own message queues, and don't share any state. It is one of the essential DipDup concepts, so there's no "official" way to manage the order of indexing.
Avoid using sync primitives like asyncio.Event
or asyncio.Lock
in handlers. Indexing will be stuck forever, waiting for the database transaction to complete.
Instead, save raw data in handlers and process it later with hooks when all conditions are met. For example, process data batch only when all indexes in the dipdup_index
table have reached a specific level.
Database
How to perform database migrations?
At the moment DipDup does not have a built-in migration system. Framework architecture implies that schema changes are rare and usually require reindexing. However, you can perform migrations yourself using third-party tools or write your own scripts and keep them in sql
project directory.
You may want to disable the schema hash check in config. Alternatively, call the schema approve
command after every schema change.
advanced:
reindex:
schema_modified: ignore
Now, let's prepare a migration script. To determine the changes you need to make, you can compare the SQL schema dump before and after modifying the models. Say you need to add a new field to one of the models.
class Event(Model):
...
+ timestamp = fields.DatetimeField()
dipdup schema export > old-schema.sql
# [make some changes]
dipdup schema export > new-schema.sql
diff old-schema.sql new-schema.sql
And here's SQL for the new column:
+ "timestamp" TIMESTAMP NOT NULL,
Now prepare the idempotent migration script and put it in the sql/on_restart
directory.
ALTER TABLE "event" ADD COLUMN IF NOT EXISTS "timestamp" TIMESTAMP NOT NULL;
SELECT dipdup_approve('public');
I get schema_modified
error, but I didn't change anything
schema_modified
error, but I didn't change anythingDipDup compares the current schema hash with the one stored in the database. If they don't match, it throws an error. If models were not modified, most likely the reason is incorrect model definitions. e.g. if you define a timestamp field like this…
timestamp = fields.DatetimeField(default=datetime.utcnow())
…schema will be different every time you run DipDup, because datetime.utcnow()
is evaluated on a module import.
$ dipdup schema export > schema.sql
$ dipdup schema export > same-schema.sql
$ diff schema.sql same-schema.sql
116c116
< "timestamp" TIMESTAMP NOT NULL DEFAULT '2023-04-19T21:16:31.183036',
---
> "timestamp" TIMESTAMP NOT NULL DEFAULT '2023-04-19T21:16:36.231221',
You need to make the following change in models.py:
< timestamp = fields.DatetimeField(default=datetime.utcnow())
> timestamp = fields.DatetimeField(auto_now=True)
We plan to improve field classes in future releases to accept callables as default values.
Why am I getting decimal.InvalidOperation
error?
decimal.InvalidOperation
error?If your models contain DecimalField
s, you may encounter this error when performing arithmetic operations. It's because the value is too big to fit into the current decimal context.
class Token(Model):
id = fields.TextField(primary_key=True)
volume = fields.DecimalField(decimal_places=18, max_digits=76)
...
The default decimal precision in Python is 28 digits. DipDup tries to increase it automatically by guessing the value from the schema. It works in most cases, but not for really big numbers. You can increase the precision manually in config.
advanced:
decimal_precision: 128
Don't forget to reindex after this change. When decimal context precision is adjusted you'll get a warning in the logs.
WARNING dipdup.database Decimal context precision has been updated: 28 -> 128
How to modify schema manually?
Drop an idempotent SQL script into sql/on_reindex/
directory. For example, here's how to create a Timescale hypertable:
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
ALTER TABLE swap DROP CONSTRAINT swap_pkey;
ALTER TABLE swap ADD PRIMARY KEY (id, timestamp);
SELECT create_hypertable('swap', 'timestamp', chunk_time_interval => 7776000);
Package
What is the symlink in the project root for?
DipDup project must be a valid discoverable Python package. Python searches for packages in site-packages
and the current working directory. This symlink is a hack that allows project root to serve as a working directory without requiring a subdirectory. It's created only when dipdup init
is executed from the project root. While this symlink trick should be harmless, if it interferes with any of your tools, use DIPDUP_NO_SYMLINK=1
environment variable to disable this behavior.
Maintenance
pipx, Poetry, PDM... What's the difference?
For historical reasons, Python package management is a mess. There are multiple tools and approaches to manage Python dependencies. Here's a brief comparison:
- pip is a general-purpose package manager. It's simple and robust, but you need to manage venvs and lock files manually.
- pipx is meant for applications. It installs packages into separate environments in
~/.local/pipx/venvs
and makes their CLI commands available from any path. pipx is a recommended way to install DipDup CLI. - Poetry and PDM are full-fledged project management tools. They handle venvs, lock files, dependency resolution, publishing, etc.
Using PDM/Poetry is not required to run DipDup, but strongly recommended. Choosing one over the other is a matter of personal preference. As of writing, Poetry is faster, more popular, and has a nicer CLI, while PDM is more PEP-compatible and allows dependency overrides.
You can choose the preferred tool (or none) when initializing a project with dipdup new
command. If you change your mind later, modify the replay.yaml
file and run dipdup init --base --force
.
Miscellaneous
Why it's called DipDup?
DipDup (/dɪp dʌp/) was initially developed as a grant project for the Tezos Foundation. Tezos is one of the few blockchains with its own contract language, completely different from Solidity. This low-level, stack-based language is called Michelson. The name "DipDup" comes from DIP {DUP}
, a small Michelson program that duplicates the second element on the stack. While today we support dozens of blockchains, Tezos holds a special place in our project's history.