5.1.0
Migration from 5.0 (optional)
- Run the `init` command. Now you have two conflicting hooks: `on_rollback` and `on_index_rollback`. Follow the guide below to perform the migration. A `ConflictingHooksError` exception will be raised until then.
What's New
Per-index rollback hook
In this release, we continue to improve the rollback-handling experience, which became much more important since the Ithaca protocol reached mainnet. Let's briefly recap how DipDup currently processes chain reorgs before calling a rollback hook:
- If the `buffer_size` option of a TzKT datasource is set to a non-zero value, and there are enough data messages buffered when a rollback occurs, the data is just dropped from the buffer, and indexing continues.
- If all indexes in the config are `operation` ones, we can attempt to process a single-level rollback. All operations from the rolled back block must be present in the next one for the rollback to succeed. If some operations are missing, the `on_rollback` hook will be called as usual.
- Finally, we can safely ignore indexes with a level lower than the rollback target. The index level is updated either on synchronization or when at least one related operation or big map diff has been extracted from a realtime message.

If none of these tricks works, we can't process a rollback without custom logic. Here's where the changes begin. Before this release, every project contained the `on_rollback` hook, which receives a `datasource: IndexDatasource` argument and from/to levels. Even if your deployment has thousands of indexes and only a couple of them are affected by a rollback, there was no easy way to find out which ones.

Now the `on_rollback` hook is deprecated and superseded by `on_index_rollback`. Choose one of the following options:
- You haven't touched the `on_rollback` hook since project creation. Run the `init` command and remove the `hooks/on_rollback` and `sql/on_rollback` directories in the project root. The default action (reindexing) has not changed.
- You have some custom logic in the `on_rollback` hook and want to leave it as-is for now. You can ignore the introduced changes at least until the next major release.
- You have implemented per-datasource rollback logic and are ready to switch to the per-index one. Run `init`, move your code to the `on_index_rollback` hook, and delete the `on_rollback` one. Note that you can access the rolled-back datasource via `index.datasource`. A sketch of the new hook follows this list.
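For reference, a minimal sketch of the new hook; the exact stub is generated by `init`, and the `Index` import path is an assumption:

from dipdup.context import HookContext
from dipdup.index import Index  # import path is an assumption

async def on_index_rollback(
    ctx: HookContext,
    index: Index,  # the affected index; its datasource is at index.datasource
    from_level: int,
    to_level: int,
) -> None:
    # The default action (reindexing) has not changed
    await ctx.execute_sql('on_index_rollback')
    await ctx.reindex()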
Token transfer index
Sometimes implementing an `operation` index is overkill for a specific task. An existing alternative is to use a `big_map` index to process only the diffs of selected big map paths. However, you still need a separate index for each contract of interest, which is very resource-consuming. A widespread case is indexing FA1.2/FA2 token contracts. So, this release introduces a new `token_transfer` index:
indexes:
  transfers:
    kind: token_transfer
    datasource: tzkt
    handlers:
      - callback: transfers
The `TokenTransferData` object is passed to the handler on each transfer and contains only the information needed to process it.
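A minimal handler sketch to pair with the config above; this is roughly what `init` would generate for the `transfers` callback:

from dipdup.context import HandlerContext
from dipdup.models import TokenTransferData

async def transfers(
    ctx: HandlerContext,
    token_transfer: TokenTransferData,
) -> None:
    # Persist or aggregate the transfer here
    ...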
`config env` command to generate env-files
Generally, it's good to separate a project config from deployment parameters, and DipDup has multiple options to achieve this. First of all, multiple configs can be chained successively, overriding top-level sections. Second, the DipDup config can contain docker-compose-style environment variable declarations. Let's say your config contains the following:
database:
  kind: postgres
  host: db
  port: 5432
  user: ${POSTGRES_USER:-dipdup}
  password: ${POSTGRES_PASSWORD:-changeme}
  database: ${POSTGRES_DB:-dipdup}
You can generate an env-file to use with this exact config:
$ dipdup -c dipdup.yml -c dipdup.docker.yml config env
POSTGRES_USER=dipdup
POSTGRES_PASSWORD=changeme
POSTGRES_DB=dipdup
The environment of your current shell is also taken into account:
$ POSTGRES_DB=foobar dipdup -c dipdup.yml -c dipdup.docker.yml config env
POSTGRES_USER=dipdup
POSTGRES_PASSWORD=changeme
POSTGRES_DB=foobar # <- set from current env
Use the `-f <filename>` option to save the output to disk instead of printing it to stdout. After you have modified the env-file to your needs, apply it whichever way is more convenient to you:
With the `dipdup --env-file / -e` option:
dipdup -e prod.env <...> run
When using docker-compose:
services:
  indexer:
    ...
    env_file: prod.env
Keeping framework up-to-date
A bunch of new tags are now pushed to Docker Hub on each release in addition to the `X.Y.Z` one: `X.Y` and `X`. That way, you can stick to a specific release without the risk of leaving a minor/major update unattended (friends don't let friends use `latest` 😉). The `-pytezos` flavor is also available for each tag.
FROM dipdup/dipdup:5.1
...
In addition, DipDup will poll GitHub for new releases on each command that takes a reasonably long time to execute and will print a warning when you're running an outdated version. You can disable these checks with the `advanced.skip_version_check` flag.
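A one-line config sketch, using the `advanced` section syntax shown elsewhere in these notes:

advanced:
  skip_version_check: True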
Pro tip: you can also enable notifications on the GitHub repo page with 👁 Watch -> Custom -> tick Releases -> Apply to never miss a fresh DipDup release.
Changelog
See full 5.1.0 changelog here.
5.0.0
⚠ Breaking Changes
- Python versions 3.8 and 3.9 are no longer supported.
- `bcd` datasource has been removed.
- Two internal tables were added: `dipdup_contract_metadata` and `dipdup_token_metadata`.
- Some methods of the `tzkt` datasource have changed their signatures and behavior.
- Dummy `advanced.oneshot` config flag has been removed.
- Dummy `schema approve --hashes` command flag has been removed.
- `docker init` command has been removed.
- `ReindexingReason` enumeration items have been changed.
Migration from 4.x
- Ensure that you have a `python = "^3.10"` dependency in `pyproject.toml`.
- Remove `bcd` datasources from the config. Use the `metadata` datasource instead to fetch contract and token metadata.
- Update `tzkt` datasource method calls as described below.
- Run the `dipdup schema approve` command on every database you use with 5.0.0.
- Update usage of the `ReindexingReason` enumeration if needed.
What's New
Process realtime messages with lag
Chain reorgs have occurred much more frequently since the Ithaca protocol reached mainnet. The preferable way to deal with rollbacks is the `on_rollback` hook. But if the logic of your indexer is too complex, you can buffer an arbitrary number of levels before processing to avoid reindexing.
datasources:
  tzkt_mainnet:
    kind: tzkt
    url: https://api.tzkt.io
    buffer_size: 2
DipDup tries to remove backtracked operations from the buffer instead of emitting a rollback. Ithaca guarantees operation finality after one block and block finality after two, so to completely avoid reorgs, set `buffer_size` to 2.
BCD API takedown
Better Call Dev API was officially deprecated in February. Thus, it's time to say goodbye to the `bcd` datasource. In DipDup, it served the sole purpose of fetching contract and token metadata. Now there's a separate `metadata` datasource which does the same thing, but better. If you have used the `bcd` datasource for custom requests, see the How to migrate from BCD to TzKT API article.
TzKT batch request pagination
Historically, most `TzktDatasource` methods had page-iteration logic hidden inside. The number of items returned by TzKT in a single request is configured in `HTTPConfig.batch_size` and defaults to 10,000. Before this release, three requests would be performed by the `get_big_map` method to fetch 25,000 big map keys, leading to performance degradation and extensive memory usage.
affected method | response size in 4.x | response size in 5.x |
---|---|---|
`get_similar_contracts` | unlimited | max. `datasource.request_limit` |
`get_originated_contracts` | unlimited | max. `datasource.request_limit` |
`get_big_map` | unlimited | max. `datasource.request_limit` |
`get_contract_big_maps` | unlimited | max. `datasource.request_limit` |
`get_quotes` | first `datasource.request_limit` | max. `datasource.request_limit` |
All paginated methods now behave the same way. You can either iterate over pages manually or use `iter_...` helpers:
datasource = ctx.get_tzkt_datasource('tzkt_mainnet')
batch_iter = datasource.iter_big_map(
    big_map_id=big_map_id,
    level=last_level,
)
async for key_batch in batch_iter:
    for key in key_batch:
        ...
Metadata interface for TzKT integration
Starting with 5.0, you can store and expose custom contract and token metadata in the same format the DipDup Metadata service does for TZIP-compatible metadata. Enable this feature with the `advanced.metadata_interface` flag, then update metadata in any callback:
await ctx.update_contract_metadata(
    network='mainnet',
    address='KT1...',
    metadata={'foo': 'bar'},
)
Metadata is stored in the `dipdup_contract_metadata` and `dipdup_token_metadata` tables and is available via GraphQL and REST APIs.
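Token metadata can be updated the same way; a hedged sketch assuming a symmetric `update_token_metadata` method with a `token_id` argument:

await ctx.update_token_metadata(
    network='mainnet',
    address='KT1...',
    token_id='0',
    metadata={'name': 'My Token', 'decimals': '8'},  # illustrative payload
)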
Prometheus integration
This version introduces initial Prometheus integration. It can help you set up monitoring, find performance issues in your code, and so on. To enable this integration, add the following lines to the config:
prometheus:
  host: 0.0.0.0
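On the Prometheus side, a minimal scrape job might look like the sketch below; the job name and the target host:port are assumptions, since the listen port depends on your deployment:

scrape_configs:
  - job_name: dipdup  # name is arbitrary
    static_configs:
      - targets: ['indexer:8000']  # host and port are assumptions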
Changes since 4.2.7
Added
- config: Added `custom` section to store arbitrary user data.
- metadata: Added `metadata_interface` feature flag to expose metadata in TzKT format.
- prometheus: Added ability to expose Prometheus metrics.
- tzkt: Added ability to process realtime messages with lag.
- tzkt: Added missing fields to the `HeadBlockData` model.
- tzkt: Added `iter_...` methods to iterate over item batches.
Fixed
- config: Fixed default SQLite path (`:memory:`).
- prometheus: Fixed invalid metric labels.
- tzkt: Fixed pagination in several getter methods.
- tzkt: Fixed data loss when the `skip_history` option is enabled.
- tzkt: Fixed crash in methods that do not support cursor pagination.
- tzkt: Fixed possible OOM while calling methods that support pagination.
- tzkt: Fixed possible data loss in `get_originations` and `get_quotes` methods.
Changed
- tzkt: Added `offset` and `limit` arguments to all methods that support pagination.
Removed
- bcd: Removed `bcd` datasource and config section.
- cli: Removed `docker init` command.
- cli: Removed dummy `schema approve --hashes` flag.
- config: Removed dummy `advanced.oneshot` flag.
Performance
- dipdup: Use the fast `orjson` library instead of built-in `json` where possible.
4.2.0
What's new
`ipfs` datasource
While working with contract/token metadata, a typical scenario is to fetch it from IPFS. DipDup now has a separate datasource to perform such requests.
datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs
You can use this datasource within any callback. Output is either JSON or binary data.
ipfs = ctx.get_ipfs_datasource('ipfs')
file = await ipfs.get('QmdCz7XGkBtd5DFmpDPDN3KFRmpkQHJsDgGiG16cgVbUYu')
assert file[:4].decode()[1:] == 'PDF'
file = await ipfs.get('QmSgSC7geYH3Ae4SpUHy4KutxqNH9ESKBGXoCN4JQdbtEz/package.json')
assert file['name'] == 'json-buffer'
You can tune HTTP connection parameters with the `http` config field, just like for any other datasource.
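For instance, a hedged sketch of retry tuning; the `retry_count` and `retry_sleep` field names are assumptions about `HTTPConfig`:

datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs
    http:
      retry_count: 3  # assumed field: attempts before giving up
      retry_sleep: 1  # assumed field: seconds between attempts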
Sending arbitrary requests
DipDup datasources do not cover all available methods of the underlying APIs. Let's say you want to fetch the current protocol of the chain you're indexing from TzKT:
tzkt = ctx.get_tzkt_datasource('tzkt_mainnet')
protocol_json = await tzkt.request(
    method='get',
    url='v1/protocols/current',
    cache=False,
    weight=1,  # ratelimiter leaky-bucket drops
)
assert protocol_json['hash'] == 'PtHangz2aRngywmSRGGvrcTyMbbdpWdpFKuS4uMWxg2RaH9i1qx'
Datasource HTTP connection parameters (ratelimit, backoff, etc.) are applied on every request.
Firing hooks outside of the current transaction
When configuring a hook, you can instruct DipDup to wrap it in a single database transaction:
hooks:
  my_hook:
    callback: my_hook
    atomic: True
Until now, such hooks could only be fired according to `jobs` schedules, but not from a handler or another atomic hook using the `ctx.fire_hook` method. This limitation is now eliminated - use the `wait` argument to escape the current transaction:
async def handler(ctx: HandlerContext, ...) -> None:
    await ctx.fire_hook('atomic_hook', wait=False)
Spin up a new project with a single command
Cookiecutter is an excellent `jinja2` wrapper for initializing hello-world templates of various frameworks and toolkits interactively. Install the `python-cookiecutter` package system-wide, then call:
cookiecutter https://github.com/dipdup-io/cookiecutter-dipdup
Advanced scheduler configuration
DipDup utilizes the `apscheduler` library to run hooks according to schedules in the `jobs` config section. In the following example, `apscheduler` spawns up to three instances of the same job every time the trigger fires, even if previous runs are still in progress:
advanced:
  scheduler:
    apscheduler.job_defaults.coalesce: True
    apscheduler.job_defaults.max_instances: 3
See the `apscheduler` docs for details.

Note that you can't use executors from the `apscheduler.executors.pool` module - a `ConfigurationError` exception will be raised. If you're into multiprocessing, I'll explain why in the next paragraph.
About the present and future of multiprocessing
It's impossible to use `apscheduler` pool executors with hooks because `HookContext` is not pickle-serializable. So they are now forbidden in the `advanced.scheduler` config. However, thread/process pools can come in handy in many situations, and it would be nice to have them in the DipDup context. For now, I can suggest implementing custom commands as a workaround to perform resource-hungry tasks within them. Put the following code in `dipdup_indexer/cli.py`:
from contextlib import AsyncExitStack

import asyncclick as click

from dipdup.cli import cli, cli_wrapper
from dipdup.config import DipDupConfig
from dipdup.context import DipDupContext
from dipdup.utils.database import tortoise_wrapper


@cli.command(help='Run heavy calculations')
@click.pass_context
@cli_wrapper
async def do_something_heavy(ctx):
    config: DipDupConfig = ctx.obj.config
    url = config.database.connection_string
    models = f'{config.package}.models'

    async with AsyncExitStack() as stack:
        await stack.enter_async_context(tortoise_wrapper(url, models))
        ...


if __name__ == '__main__':
    cli(prog_name='dipdup', standalone_mode=False)
Then use `python -m dipdup_indexer.cli` instead of `dipdup` as an entrypoint. Now you can call `do-something-heavy` like any other `dipdup` command. The `dipdup.cli:cli` group handles argument and config parsing, graceful shutdown, and other boilerplate. The rest is on you; use `dipdup.dipdup:DipDup.run` as a reference. And keep in mind that Tortoise ORM is not thread-safe. I aim to implement `ctx.pool_apply` and `ctx.pool_map` methods to execute code in pools with magic within existing DipDup hooks, but there's no ETA yet.
That's all, folks! As always, your feedback is very welcome 🤙
4.1.0
Migration from 4.0 (optional)
- Run `dipdup schema init` on the existing database to enable the `dipdup_head_status` view and REST endpoint.
What's New
Index only the current state of big maps
`big_map` indexes allow achieving faster processing times than `operation` ones when storage updates are the only on-chain data your dapp needs to function. With this DipDup release, you can go even further and index only the current storage state, ignoring historical changes.
indexes:
  foo:
    kind: big_map
    ...
    skip_history: never|once|always
When this option is set to `once`, DipDup will skip historical changes only on initial sync and switch to regular indexing afterward. When the value is `always`, DipDup will fetch all big map keys on every restart. The preferable mode depends on your workload.

All big map diffs DipDup passes to handlers during fast sync have the `action` field set to `BigMapAction.ADD_KEY`. Keep in mind that DipDup fetches all keys in this mode, including ones removed from the big map. If needed, you can filter out the latter with the `BigMapDiff.data.active` field, as in the sketch below.
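A hedged handler sketch of that filter; the handler name and the untyped `BigMapDiff` annotation are illustrative, since real handlers are generated by `init` with typed key/value models:

from dipdup.context import HandlerContext
from dipdup.models import BigMapDiff

async def on_ledger_update(  # illustrative name
    ctx: HandlerContext,
    ledger: BigMapDiff,  # generated handlers receive typed BigMapDiff[Key, Value]
) -> None:
    # During skip_history sync every diff arrives as ADD_KEY,
    # even for keys later removed from the big map.
    if not ledger.data.active:
        return
    ...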
New datasource for contract and token metadata
Since the first version, DipDup has allowed fetching token metadata from the Better Call Dev API with the `bcd` datasource. Now it's time for a better solution. Firstly, BCD is far from reliable in terms of metadata indexing. Secondly, spinning up your own instance of BCD requires significant effort and computing power. Lastly, we plan to deprecate the Better Call Dev API soon (but don't worry - it won't affect the explorer frontend).

Luckily, we have dipdup-metadata, a standalone companion indexer for DipDup written in Go. Configure a new datasource in the following way:
datasources:
  metadata:
    kind: metadata
    url: https://metadata.dipdup.net
    network: mainnet|ghostnet|limanet
Now you can use it anywhere in your callbacks:
datasource = ctx.datasources['metadata']
token_metadata = await datasource.get_token_metadata(address, token_id)
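Contract metadata can presumably be fetched the same way; the `get_contract_metadata` method name is an assumption, symmetric to the token call above:

contract_metadata = await datasource.get_contract_metadata(address)  # assumed method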
The `bcd` datasource will remain available for a while, but we discourage using it for metadata processing.
Nested packages for hooks and handlers
Callback modules no longer have to live in the top-level `hooks`/`handlers` directories. Add one or more dots to the callback name to define nested packages:
package: indexer
hooks:
  foo.bar:
    callback: foo.bar
After running the `init` command, you'll get the following directory tree (shortened for readability):
indexer
├── hooks
│   ├── foo
│   │   ├── bar.py
│   │   └── __init__.py
│   └── __init__.py
└── sql
    └── foo
        └── bar
            └── .keep
The same rules apply to handler callbacks. Note that the `callback` field must be a valid Python package name - lowercase letters, underscores, and dots.
New CLI commands and flags
- `schema init` is a new command to prepare a database for running DipDup. It creates tables based on your models, then calls the `on_reindex` SQL hook to finish preparation - the same things DipDup does when run on a clean database.
- The `hasura configure --force` flag allows configuring Hasura even if the metadata hash matches the one saved in the database. It may come in handy during development.
- The `init --keep-schemas` flag makes DipDup preserve contract JSONSchemas. Usually, they are removed after generating typeclasses with `datamodel-codegen`, but you can keep them to convert to other formats or to troubleshoot codegen issues.
Built-in `dipdup_head_status` view and REST endpoint
DipDup maintains several internal models to keep its state. As Hasura generates GraphQL queries and REST endpoints for those models, you can use them for monitoring. However, some SaaS monitoring solutions can only check whether an HTTP response contains a specific word or not. For such cases, the `dipdup_head_status` view was added - a simplified representation of the `dipdup_head` table. It returns `OK` when the datasource received a head less than two minutes ago and `OUTDATED` otherwise. The latter means that something's stuck: either DipDup (e.g., because of a database deadlock) or the TzKT instance. Or maybe the whole Tezos blockchain, but in that case, you have bigger problems than indexing.
$ curl "http://127.0.0.1:41000/api/rest/dipdupHeadStatus?name=https%3A%2F%2Fapi.tzkt.io"
{"dipdupHeadStatus":[{"status":"OUTDATED"}]}%
Note that the `dipdup_head` update may be delayed during sync even if the `--early-realtime` flag is enabled, so don't rely exclusively on this endpoint.
Changelog
Added
- cli: Added `schema init` command to initialize database schema.
- cli: Added `--force` flag to the `hasura configure` command.
- codegen: Added support for subpackages inside callback directories.
- hasura: Added `dipdup_head_status` view and REST endpoint.
- index: Added an ability to skip historical data while synchronizing `big_map` indexes.
- metadata: Added `metadata` datasource.
- tzkt: Added `get_big_map` and `get_contract_big_maps` datasource methods.
4.0.0
⚠ Breaking Changes
- `run --oneshot` option is removed. The oneshot mode (DipDup stops after the sync is finished) applies automatically when the `last_level` field is set in the index config.
- `clear-cache` command is removed. Use `cache clear` instead.
Migration from 3.x
- Run the `dipdup init` command to generate `on_synchronized` hook stubs.
- Run the `dipdup schema approve` command on every database you want to use with 4.0.0. Running `dipdup migrate` is not necessary since `spec_version` hasn't changed in this release.
What's New
Performance optimizations
Overall indexing performance has been significantly improved. Key highlights:
- Configuration files are loaded 10x faster. The more indexes in the project, the more noticeable the difference is.
- Significantly reduced CPU usage in realtime mode.
- Default datasource HTTP connection options are optimized for a reasonable balance between resource consumption and indexing speed.

Also, two new flags were added to improve DipDup performance in several scenarios: `merge_subscriptions` and `early_realtime`. See this paragraph for details.
Configurable action on reindex
There are several reasons that trigger reindexing:
reason | description |
---|---|
`manual` | Reindexing triggered manually from a callback with `ctx.reindex`. |
`migration` | An applied migration requires reindexing. Check the release notes before switching between major DipDup versions to be prepared. |
`rollback` | A reorg message was received from TzKT and cannot be processed. |
`config_modified` | One of the index configs has been modified. |
`schema_modified` | The database schema has been modified. Try to avoid manual schema modifications in favor of SQL hooks. |
Now it is possible to configure the desired action for reindexing triggered by each specific reason.
action | description |
---|---|
`exception` (default) | Raise `ReindexingRequiredError` and quit with an error code. The safest option, since you can trigger reindexing accidentally, e.g., by a typo in the config. Don't forget to set up the correct restart policy when using it with containers. |
`wipe` | Drop the whole database and start indexing from scratch. Be careful with this option! |
`ignore` | Ignore the event and continue indexing as usual. It can lead to unexpected side effects up to data corruption; make sure you know what you are doing. |
To configure actions for each reason, add the following section to DipDup config:
advanced:
  reindex:
    manual: wipe
    migration: exception
    rollback: ignore
    config_modified: exception
    schema_modified: exception
New CLI commands and flags
command or flag | description |
---|---|
`cache show` | Get information about file caches used by DipDup. |
`config export` | Print the config after resolving all links and variables. Add the `--unsafe` option to substitute environment variables; otherwise, default values from the config will be used. |
`run --early-realtime` | Establish a realtime connection before all indexes are synchronized. |
`run --merge-subscriptions` | Subscribe to all operations/big map diffs during realtime indexing. This flag helps to avoid reaching the TzKT subscription limit (currently 10,000 channels). Keep in mind that this option could significantly increase RAM consumption depending on the time required to perform a sync. |
`status` | Print the current status of indexes from the database. |
`advanced` top-level config section
This config section allows users to tune system-wide options, either experimental or unsuitable for generic configurations.

field | description |
---|---|
`early_realtime`, `merge_subscriptions`, `postpone_jobs` | Another way to set `run` command flags. Useful for maintaining per-deployment configurations. |
`reindex` | Configure the action to take when reindexing is triggered. See this paragraph for details. |
CLI flags have priority over the self-titled `AdvancedConfig` fields.
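Putting the table together, a sketch of the section (values are illustrative):

advanced:
  early_realtime: True
  merge_subscriptions: False
  postpone_jobs: False
  reindex:
    manual: wipe
    migration: exception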
`aiosignalrcore` replaced with `pysignalr`
It may not be the most noticeable improvement for the end user, but it still deserves a separate paragraph in this article.
Historically, DipDup used our own fork of the `signalrcore` library named `aiosignalrcore`. This project aimed to replace the synchronous `websocket-client` library with the asyncio-ready `websockets`. Later we discovered that the required changes made it hard to maintain backward compatibility, so we decided to rewrite the library from scratch. Now you have both a modern and reliable library for the SignalR protocol and a much more stable DipDup. Ain't it nice?
Changes since 3.1.3
This is a combined changelog of the -rc versions released between the previous stable release and this one.
Added
- cli: Added `run --early-realtime` flag to establish a realtime connection before all indexes are synchronized.
- cli: Added `run --merge-subscriptions` flag to subscribe to all operations/big map diffs during realtime indexing.
- cli: Added `status` command to print the current status of indexes from the database.
- cli: Added `config export [--unsafe]` command to print the config after resolving all links and variables.
- cli: Added `cache show` command to get information about file caches used by DipDup.
- config: Added `first_level` and `last_level` optional fields to `TemplateIndexConfig`. These limits are applied after the ones from the template itself.
- config: Added `daemon` boolean field to `JobConfig` to run a single callback indefinitely. Conflicts with `crontab` and `interval` fields.
- config: Added `advanced` top-level section.
- hooks: Added `on_synchronized` hook, which fires each time all indexes reach realtime state.
Fixed
- cli: Fixed config not being verified when invoking some commands.
- cli: Fixed crashes and output inconsistency when piping DipDup commands.
- cli: Fixed missing `schema approve --hashes` argument.
- cli: Fixed `schema wipe --immune` flag being ignored.
- codegen: Fixed contract address being used instead of an alias when the typename is not set.
- codegen: Fixed generating callback arguments for untyped operations.
- codegen: Fixed missing imports in handlers generated during init.
- coinbase: Fixed possible data inconsistency caused by caching enabled for the `get_candles` method.
- hasura: Fixed unnecessary reconfiguration on restart.
- http: Fixed increasing sleep time between failed request attempts.
- index: Fixed `CallbackError` being raised instead of `ReindexingRequiredError` in some cases.
- index: Fixed crash while processing storage of some contracts.
- index: Fixed incorrect log messages, removed duplicate ones.
- index: Fixed invocation of the head index callback.
- index: Fixed matching of untyped operations filtered by the `source` field (@pravin-d).
- tzkt: Fixed filtering of big map diffs by path.
- tzkt: Fixed `get_originated_contracts` and `get_similar_contracts` methods whose output was limited to the `HTTPConfig.batch_size` field.
- tzkt: Fixed lots of SignalR bugs by replacing the `aiosignalrcore` library with `pysignalr`.
- tzkt: Fixed processing of operations with the `default` entrypoint.
- tzkt: Fixed a regression in processing migration originations.
- tzkt: Fixed resubscribing when realtime connectivity is lost for a long time.
- tzkt: Fixed sending useless subscription requests when adding indexes at runtime.
Changed
- cli: `schema wipe` command now requires confirmation when invoked in the interactive shell.
- cli: `schema approve` command now also causes a recalculation of schema and index config hashes.
- index: DipDup will recalculate respective hashes if reindexing is triggered with `config_modified: ignore` or `schema_modified: ignore` in the advanced config.
Removed
- cli: Removed deprecated `run --oneshot` argument and `clear-cache` command.
Performance
- config: Configuration files are loaded 10x faster.
- index: Checks performed on each iteration of the main DipDup loop are slightly faster now.
- index: The number of operations processed by the matcher was reduced by 40-95%, depending on the number of addresses and entrypoints used.
- tzkt: Improved performance of response deserialization.
- tzkt: The rate limit was increased. Try setting `connection_timeout` to a higher value if requests fail with a `ConnectionTimeout` exception; see the sketch below.
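A hedged config sketch for raising that timeout; `connection_timeout` sits in the datasource's `http` section, and the value is illustrative:

datasources:
  tzkt_mainnet:
    kind: tzkt
    url: https://api.tzkt.io
    http:
      connection_timeout: 60  # seconds; illustrative value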