research.wrangle_vault_prices
Documentation for eth_defi.research.wrangle_vault_prices Python module.
Clean vault price data.
Denormalise data to a single DataFrame
Remove abnormalities in the price data
Reduce data by removing hourly changes that are below our epsilon threshold
Generate returns data
Module Attributes
For manual debugging, we process these vaults first |
|
Vault state columns added by the historical scanner. |
Functions
|
Add denormalised data to the prices DataFrame. |
|
Ensure all vaults have unique human-readable name. |
|
Calculate returns for each vault. |
|
Check that we have metadata for all vaults in the prices DataFrame. |
|
TVL-based threshold filtering of returns. |
|
Clean returns data by removing rows with NaN or infinite values. |
|
Ensure vault state columns are present in the DataFrame. |
|
Dedpulicate data rows with epsilon. |
|
Reduce vaults to stablecoin vaults only. |
|
Fix out rows with share price that is too high. |
|
Forward fill missing vault prices up to max_gap_hours. |
A command line script entry point to take raw scanned vault price data and clean it up to a format that can be analysed. |
|
|
Build a dictionary of vaults by their chain-address id. |
|
Preprocess vault data for further analysis. |
|
Remove initial inactive period from each vault's price history. |
|
Set up the order of vaults for processing. |
- PRIORITY_SORT_IDS = ['8453-0x0d877dc7c8fa3ad980dfdb18b48ec9f8768359c4']
For manual debugging, we process these vaults first
- get_vaults_by_id(rows)
Build a dictionary of vaults by their chain-address id.
- Parameters
rows (dict[eth_defi.vault.base.VaultSpec, eth_defi.vault.vaultdb.VaultRow]) – Metadata rows from vault database
- Returns
Dictionary of vaults by their chain-address id
- Return type
- VAULT_STATE_COLUMNS = {'deposits_open': '', 'max_deposit': nan, 'max_redeem': nan, 'redemption_open': '', 'trading': ''}
Vault state columns added by the historical scanner. Ensure these are always present in cleaned data, even when processing old scan data that lacks them.
- ensure_vault_state_columns(prices_df)
Ensure vault state columns are present in the DataFrame.
Adds missing columns with default values for backward compatibility with raw scan data generated before these fields were added.
- Parameters
prices_df (pandas.core.frame.DataFrame) –
- Return type
pandas.core.frame.DataFrame
- assign_unique_names(rows, prices_df, logger=<built-in function print>, duplicate_nav_threshold=1000)
Ensure all vaults have unique human-readable name.
Rerwrite metadata rows
Find duplicate vault names
Add a running counter to the name to make it unique
- Parameters
rows (dict[eth_defi.vault.base.VaultSpec, eth_defi.vault.vaultdb.VaultRow]) –
prices_df (pandas.core.frame.DataFrame) –
- Return type
pandas.core.frame.DataFrame
- add_denormalised_vault_data(rows, prices_df, logger=<built-in function print>)
Add denormalised data to the prices DataFrame.
Take data from vault database and duplicate it across every row
Add protocol name and event count columns
- Parameters
rows (dict[eth_typing.evm.HexAddress, eth_defi.vault.vaultdb.VaultRow]) –
prices_df (pandas.core.frame.DataFrame) –
- Return type
pandas.core.frame.DataFrame
- filter_vaults_by_stablecoin(rows, prices_df, logger=<built-in function print>)
Reduce vaults to stablecoin vaults only.
In this notebooks, we focus on stablecoin yield
Do not consider WETH, other native token vaults, as their returns calculation would need to match the appreciation of underlying assets
[is_stablecoin_like](https://web3-ethereum-defi.readthedocs.io/api/core/_autosummary/eth_defi.token.is_stablecoin_like.html?highlight=is_stablecoin_like#eth_defi.token.is_stablecoin_like) supports GHO, crvUSD and other DeFi/algorithmic stablecoins
Note that this picks up very few EUR and other fiat-nominated vaults
- Parameters
rows (dict[eth_typing.evm.HexAddress, eth_defi.vault.vaultdb.VaultRow]) –
prices_df (pandas.core.frame.DataFrame) –
- Return type
pandas.core.frame.DataFrame
- calculate_vault_returns(prices_df, logger=<built-in function print>)
Calculate returns for each vault.
Filter out reads for which we did not get a proper share price
Add
returns_1hcolumns
Example of input data:
chain address block_number timestamp share_price ... errors id name event_count protocol 207 42161 0x487cdc7d21ac8765eff6c0e681aea36ae1594471 13294721 2022-05-30 19:59:22 1.0 ... 42161-0x487cdc7d21ac8765eff6c0e681aea36ae1594471 LDAI 17 <unknown ERC-4626>
- Parameters
prices_df (pandas.core.frame.DataFrame) –
- clean_returns(rows, prices_df, logger=<built-in function print>, outlier_threshold=0.5, display=<function <lambda>>, returns_col='returns_1h')
Clean returns data by removing rows with NaN or infinite values.
- In returns data we have outliers that are likely not real returns, or one-time events that cannot repeat.
Floating point errors: [Share price may jumps wildly when a vault TVL is near zero](https://x.com/0xSEM/status/1914748782102630455)
Bugs: Vault share price method to estimate returns does not work for a particular airdrop
Airdrops: Vault gets an irregular rewards that will not repeat, and thus are not good to estimate the future performance
We clean returns by doing an assumptions - Daily returns higher than static outlier - Daily TVL max does not make sense - Daily TVL min does not make sense - Daily TVL % below lifetime average TVL
- Parameters
rows (dict[eth_typing.evm.HexAddress, eth_defi.vault.vaultdb.VaultRow]) –
prices_df (pandas.core.frame.DataFrame) –
display (Callable) –
- Return type
pandas.core.frame.DataFrame
- clean_by_tvl(rows, prices_df, logger=<built-in function print>, tvl_threshold_min=1000.0, tvl_threshold_max=99000000000, tvl_threshold_min_dynamic=0.02, returns_col='returns_1h')
TVL-based threshold filtering of returns.
Clean returns from TVL-manipulation outliers
See https://x.com/moo9000/status/1914746350216077544 for manipulation example
- Parameters
rows (dict[eth_typing.evm.HexAddress, eth_defi.vault.vaultdb.VaultRow]) –
prices_df (pandas.core.frame.DataFrame) –
- Return type
pandas.core.frame.DataFrame
- filter_unneeded_row(prices_df, logger=<built-in function print>, epsilon=0.0025)
Dedpulicate data rows with epsilon.
Reduce data size by elimating rows where the value changes is too little
Remove rows where the total asset/share price/total supply change has been too small
Note
This filter conly yields 2% savings in row count, so it turned out not to be worth of the problems.
- Parameters
prices_df (pandas.core.frame.DataFrame) – Assume sorted by timestsamp
epsilon – Tolerance for floating point comparison
- Return type
pandas.core.frame.DataFrame
- remove_inactive_lead_time(prices_df, logger=<built-in function print>)
Remove initial inactive period from each vault’s price history.
At the beginning of a vault’s lifecycle, total supply may remain constant while the vault is inactive (e.g., 1, 1000, etc.)
When the vault activates, the share price may jump, causing abnormal returns
This function removes the initial rows where total_supply hasn’t changed
Uses exact equality for comparison
Skips initial rows with zero or NaN total_supply to find first valid value
- Parameters
prices_df (pandas.core.frame.DataFrame) – Price data with ‘id’ and ‘total_supply’ columns. Assumes data is sorted by timestamp within each vault.
- Returns
DataFrame with inactive lead time removed for each vault
- Return type
pandas.core.frame.DataFrame
Fix out rows with share price that is too high.
Sometimes share price jump to an outlier value and back
This caused abnormal returns in returns calculations, messing all volatility numbers, sharpe, charts, etc.
The root cause is bad oracles, fat fingers, MEV trades, etc.
See
check-share-pricescript for inspecting individual prices
Case Fluegel DAO:
timestamp
chain
address
block_number
share_price
total_assets
total_supply
2024-07-16 15:02:57
8453
0x277a3c57f3236a7d458576074d7c3d7046eb26c
17176415
1.60
373,740.21
232,929.92
2024-07-16 16:02:57
8453
0x277a3c57f3236a7d458576074d7c3d7046eb26c
17178215
1.63
379,832.59
232,929.92
2024-07-16 17:02:57
8453
0x277a3c57f3236a7d458576074d7c3d7046eb26c
17180015
0.33
75,744.97
232,929.92
2024-07-16 18:02:57
8453
0x277a3c57f3236a7d458576074d7c3d7046eb26c
17181815
1.64
382,282.78
232,929.92
Case Untangle Finance:
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-12 23:14:19 (3206): fixing: 1.038721 -> 1.038721, prev: 1.038827, next: 0.444865 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 00:14:13 (3207): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 01:14:09 (3208): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 02:14:03 (3209): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 03:14:01 (3210): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 04:13:56 (3211): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 05:13:53 (3212): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.468629 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 06:13:51 (3213): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.468629 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 07:13:45 (3214): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.468629 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 08:13:40 (3215): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.468629 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 09:13:37 (3216): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.482511 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 10:13:27 (3217): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.482511 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 11:13:21 (3218): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.482511 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 00:11:51 (3230): fixing: 0.444865 -> 1.0405275, prev: 1.038721, next: 1.042334 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 01:11:42 (3231): fixing: 0.444865 -> 1.0406325, prev: 1.038931, next: 1.042334 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 02:11:33 (3232): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 03:11:36 (3233): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 04:11:26 (3234): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 05:11:17 (3235): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 06:11:10 (3236): fixing: 0.468629 -> 1.0407335, prev: 1.038931, next: 1.042536 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 07:11:01 (3237): fixing: 0.468629 -> 1.040835, prev: 1.039134, next: 1.042536 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 08:10:52 (3238): fixing: 0.468629 -> 1.0406445, prev: 1.039134, next: 1.042155 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 11:10:26 (3239): fixing: 0.468629 -> 1.0406445, prev: 1.039134, next: 1.042155 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 12:10:18 (3240): fixing: 0.482511 -> 1.0406445, prev: 1.039134, next: 1.042155 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 13:10:09 (3241): fixing: 0.482511 -> 1.0406445, prev: 1.039134, next: 1.042155 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 14:10:01 (3242): fixing: 0.482511 -> 1.0406445, prev: 1.039134, next: 1.042155 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 04:08:39 (3254): fixing: 1.042334 -> 1.042334, prev: 0.444865, next: 1.04251 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 05:08:33 (3255): fixing: 1.042334 -> 1.042334, prev: 0.444865, next: 1.04251 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 06:08:29 (3256): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 07:08:24 (3257): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 08:08:20 (3258): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 09:08:12 (3259): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 10:08:07 (3260): fixing: 1.042536 -> 1.042536, prev: 0.468629, next: 1.042519 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 11:08:03 (3261): fixing: 1.042536 -> 1.042536, prev: 0.468629, next: 1.042519 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 12:07:55 (3262): fixing: 1.042155 -> 1.042155, prev: 0.468629, next: 1.042519 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 13:07:51 (3263): fixing: 1.042155 -> 1.042155, prev: 0.468629, next: 1.042519 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 14:07:49 (3264): fixing: 1.042155 -> 1.042155, prev: 0.482511, next: 1.042519 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 15:07:42 (3265): fixing: 1.042155 -> 1.042155, prev: 0.482511, next: 1.042519 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 16:07:36 (3266): fixing: 1.042155 -> 1.042155, prev: 0.482511, next: 1.042354 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-23 02:04:05 (3433): fixing: 1.036482 -> 1.036482, prev: 1.126302, next: 0.487429 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-24 06:01:21 (3457): fixing: 0.487429 -> 1.041995, prev: 1.036482, next: 1.047508
- Parameters
prices_df (pandas.core.frame.DataFrame) –
- Return type
pandas.core.frame.DataFrame
- sort_and_index_vault_prices(prices_df, priority_ids)
Set up the order of vaults for processing.
If we do debugging we want vaults we debug go first, as the pipeline takes several minutes to run
- process_raw_vault_scan_data(rows, prices_df, logger=<built-in function print>, display=<function <lambda>>, diagnose_vault_id=None)
Preprocess vault data for further analysis.
Assign unique names to vaults
Add denormalised vault data to prices DataFrame
Filter out non-stablecoin vaults
Calculate returns, rolling metrics
- Parameters
rows (dict[eth_defi.vault.base.VaultSpec, eth_defi.vault.vaultdb.VaultRow] | eth_defi.vault.vaultdb.VaultDatabase) – Metadata rows from vault database
logger – Notebook / console printer function
display (Callable) – Display Pandas DataFrame function
prices_df (pandas.core.frame.DataFrame) –
diagnose_vault_id (str | None) –
- Return type
pandas.core.frame.DataFrame
- check_missing_metadata(rows, price_ids, logger=<built-in function print>)
Check that we have metadata for all vaults in the prices DataFrame.
Vault id is in format: 56-0x10c90bfcfb3d2a7ae814da1548ae3a7fc31c35a0’
- Parameters
rows (dict) – Metadata rows from vault database
price_ids (pandas.core.series.Series) –
- generate_cleaned_vault_datasets(vault_db_path=PosixPath('/home/runner/.tradingstrategy/vaults/vault-metadata-db.pickle'), price_df_path=PosixPath('/home/runner/.tradingstrategy/vaults/vault-prices-1h.parquet'), cleaned_price_df_path=PosixPath('/home/runner/.tradingstrategy/vaults/cleaned-vault-prices-1h.parquet'), logger=<built-in function print>, display=<function display>, diagnose_vault_id=None)
A command line script entry point to take raw scanned vault price data and clean it up to a format that can be analysed.
Reads
vault-prices-1h.parquetand generatesvault-prices-1h-cleaned.parquetCalculate returns and various performance metrics to be included with prices data
Clean returns from abnormalities
Note
Drops non-stablecoin vaults. The cleaning is currently applicable for stable vaults only.
- Parameters
diagnose_vault_id (str | None) –
- forward_fill_vault(vault_df)
Forward fill missing vault prices up to max_gap_hours.
For displaying, calculating metrics, etc. we want continuous time series
Align random sample interval to 1h
- Parameters
vault_df (pandas.core.frame.DataFrame) –
Price data for a single vault.
Assume 1h price data.
- Return type
pandas.core.frame.DataFrame