research.wrangle_vault_prices

Documentation for eth_defi.research.wrangle_vault_prices Python module.

Clean vault price data.

  • Denormalise data to a single DataFrame

  • Remove abnormalities in the price data

  • Reduce data by removing hourly changes that are below our epsilon threshold

  • Generate returns data

Module Attributes

PRIORITY_SORT_IDS

For manual debugging, we process these vaults first

VAULT_STATE_COLUMNS

Vault state columns added by the historical scanner.

Functions

add_denormalised_vault_data(rows, prices_df)

Add denormalised data to the prices DataFrame.

assign_unique_names(rows, prices_df[, ...])

Ensure all vaults have unique human-readable name.

calculate_vault_returns(prices_df[, logger])

Calculate returns for each vault.

check_missing_metadata(rows, price_ids[, logger])

Check that we have metadata for all vaults in the prices DataFrame.

clean_by_tvl(rows, prices_df[, logger, ...])

TVL-based threshold filtering of returns.

clean_returns(rows, prices_df[, logger, ...])

Clean returns data by removing rows with NaN or infinite values.

ensure_vault_state_columns(prices_df)

Ensure vault state columns are present in the DataFrame.

filter_unneeded_row(prices_df[, logger, epsilon])

Dedpulicate data rows with epsilon.

filter_vaults_by_stablecoin(rows, prices_df)

Reduce vaults to stablecoin vaults only.

fix_outlier_share_prices(prices_df[, ...])

Fix out rows with share price that is too high.

forward_fill_vault(vault_df)

Forward fill missing vault prices up to max_gap_hours.

generate_cleaned_vault_datasets([...])

A command line script entry point to take raw scanned vault price data and clean it up to a format that can be analysed.

get_vaults_by_id(rows)

Build a dictionary of vaults by their chain-address id.

process_raw_vault_scan_data(rows, prices_df)

Preprocess vault data for further analysis.

remove_inactive_lead_time(prices_df[, logger])

Remove initial inactive period from each vault's price history.

sort_and_index_vault_prices(prices_df, ...)

Set up the order of vaults for processing.

PRIORITY_SORT_IDS = ['8453-0x0d877dc7c8fa3ad980dfdb18b48ec9f8768359c4']

For manual debugging, we process these vaults first

get_vaults_by_id(rows)

Build a dictionary of vaults by their chain-address id.

Parameters

rows (dict[eth_defi.vault.base.VaultSpec, eth_defi.vault.vaultdb.VaultRow]) – Metadata rows from vault database

Returns

Dictionary of vaults by their chain-address id

Return type

dict[str, eth_defi.vault.vaultdb.VaultRow]

VAULT_STATE_COLUMNS = {'deposits_open': '', 'max_deposit': nan, 'max_redeem': nan, 'redemption_open': '', 'trading': ''}

Vault state columns added by the historical scanner. Ensure these are always present in cleaned data, even when processing old scan data that lacks them.

ensure_vault_state_columns(prices_df)

Ensure vault state columns are present in the DataFrame.

  • Adds missing columns with default values for backward compatibility with raw scan data generated before these fields were added.

Parameters

prices_df (pandas.core.frame.DataFrame) –

Return type

pandas.core.frame.DataFrame

assign_unique_names(rows, prices_df, logger=<built-in function print>, duplicate_nav_threshold=1000)

Ensure all vaults have unique human-readable name.

  • Rerwrite metadata rows

  • Find duplicate vault names

  • Add a running counter to the name to make it unique

Parameters
Return type

pandas.core.frame.DataFrame

add_denormalised_vault_data(rows, prices_df, logger=<built-in function print>)

Add denormalised data to the prices DataFrame.

  • Take data from vault database and duplicate it across every row

  • Add protocol name and event count columns

Parameters
Return type

pandas.core.frame.DataFrame

filter_vaults_by_stablecoin(rows, prices_df, logger=<built-in function print>)

Reduce vaults to stablecoin vaults only.

Parameters
Return type

pandas.core.frame.DataFrame

calculate_vault_returns(prices_df, logger=<built-in function print>)

Calculate returns for each vault.

  • Filter out reads for which we did not get a proper share price

  • Add returns_1h columns

Example of input data:

     chain                                     address  block_number           timestamp  share_price  ...  errors                                                id  name  event_count            protocol
207  42161  0x487cdc7d21ac8765eff6c0e681aea36ae1594471      13294721 2022-05-30 19:59:22          1.0  ...          42161-0x487cdc7d21ac8765eff6c0e681aea36ae1594471  LDAI           17  <unknown ERC-4626>
Parameters

prices_df (pandas.core.frame.DataFrame) –

clean_returns(rows, prices_df, logger=<built-in function print>, outlier_threshold=0.5, display=<function <lambda>>, returns_col='returns_1h')

Clean returns data by removing rows with NaN or infinite values.

  • In returns data we have outliers that are likely not real returns, or one-time events that cannot repeat.
    • Floating point errors: [Share price may jumps wildly when a vault TVL is near zero](https://x.com/0xSEM/status/1914748782102630455)

    • Bugs: Vault share price method to estimate returns does not work for a particular airdrop

    • Airdrops: Vault gets an irregular rewards that will not repeat, and thus are not good to estimate the future performance

  • We clean returns by doing an assumptions - Daily returns higher than static outlier - Daily TVL max does not make sense - Daily TVL min does not make sense - Daily TVL % below lifetime average TVL

Parameters
Return type

pandas.core.frame.DataFrame

clean_by_tvl(rows, prices_df, logger=<built-in function print>, tvl_threshold_min=1000.0, tvl_threshold_max=99000000000, tvl_threshold_min_dynamic=0.02, returns_col='returns_1h')

TVL-based threshold filtering of returns.

Parameters
Return type

pandas.core.frame.DataFrame

filter_unneeded_row(prices_df, logger=<built-in function print>, epsilon=0.0025)

Dedpulicate data rows with epsilon.

  • Reduce data size by elimating rows where the value changes is too little

  • Remove rows where the total asset/share price/total supply change has been too small

Note

This filter conly yields 2% savings in row count, so it turned out not to be worth of the problems.

Parameters
  • prices_df (pandas.core.frame.DataFrame) – Assume sorted by timestsamp

  • epsilon – Tolerance for floating point comparison

Return type

pandas.core.frame.DataFrame

remove_inactive_lead_time(prices_df, logger=<built-in function print>)

Remove initial inactive period from each vault’s price history.

  • At the beginning of a vault’s lifecycle, total supply may remain constant while the vault is inactive (e.g., 1, 1000, etc.)

  • When the vault activates, the share price may jump, causing abnormal returns

  • This function removes the initial rows where total_supply hasn’t changed

  • Uses exact equality for comparison

  • Skips initial rows with zero or NaN total_supply to find first valid value

Parameters

prices_df (pandas.core.frame.DataFrame) – Price data with ‘id’ and ‘total_supply’ columns. Assumes data is sorted by timestamp within each vault.

Returns

DataFrame with inactive lead time removed for each vault

Return type

pandas.core.frame.DataFrame

fix_outlier_share_prices(prices_df, logger=<built-in function print>, max_diff=0.33, look_back=24, look_ahead=24)

Fix out rows with share price that is too high.

  • Sometimes share price jump to an outlier value and back

  • This caused abnormal returns in returns calculations, messing all volatility numbers, sharpe, charts, etc.

  • The root cause is bad oracles, fat fingers, MEV trades, etc.

  • See check-share-price script for inspecting individual prices

Case Fluegel DAO:

timestamp

chain

address

block_number

share_price

total_assets

total_supply

2024-07-16 15:02:57

8453

0x277a3c57f3236a7d458576074d7c3d7046eb26c

17176415

1.60

373,740.21

232,929.92

2024-07-16 16:02:57

8453

0x277a3c57f3236a7d458576074d7c3d7046eb26c

17178215

1.63

379,832.59

232,929.92

2024-07-16 17:02:57

8453

0x277a3c57f3236a7d458576074d7c3d7046eb26c

17180015

0.33

75,744.97

232,929.92

2024-07-16 18:02:57

8453

0x277a3c57f3236a7d458576074d7c3d7046eb26c

17181815

1.64

382,282.78

232,929.92

Case Untangle Finance:

Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-12 23:14:19 (3206): fixing: 1.038721 -> 1.038721, prev: 1.038827, next: 0.444865
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 00:14:13 (3207): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 01:14:09 (3208): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 02:14:03 (3209): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 03:14:01 (3210): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 04:13:56 (3211): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 05:13:53 (3212): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.468629
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 06:13:51 (3213): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.468629
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 07:13:45 (3214): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.468629
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 08:13:40 (3215): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.468629
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 09:13:37 (3216): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.482511
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 10:13:27 (3217): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.482511
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 11:13:21 (3218): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.482511
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 00:11:51 (3230): fixing: 0.444865 -> 1.0405275, prev: 1.038721, next: 1.042334
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 01:11:42 (3231): fixing: 0.444865 -> 1.0406325, prev: 1.038931, next: 1.042334
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 02:11:33 (3232): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 03:11:36 (3233): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 04:11:26 (3234): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 05:11:17 (3235): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 06:11:10 (3236): fixing: 0.468629 -> 1.0407335, prev: 1.038931, next: 1.042536
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 07:11:01 (3237): fixing: 0.468629 -> 1.040835, prev: 1.039134, next: 1.042536
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 08:10:52 (3238): fixing: 0.468629 -> 1.0406445, prev: 1.039134, next: 1.042155
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 11:10:26 (3239): fixing: 0.468629 -> 1.0406445, prev: 1.039134, next: 1.042155
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 12:10:18 (3240): fixing: 0.482511 -> 1.0406445, prev: 1.039134, next: 1.042155
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 13:10:09 (3241): fixing: 0.482511 -> 1.0406445, prev: 1.039134, next: 1.042155
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 14:10:01 (3242): fixing: 0.482511 -> 1.0406445, prev: 1.039134, next: 1.042155
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 04:08:39 (3254): fixing: 1.042334 -> 1.042334, prev: 0.444865, next: 1.04251
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 05:08:33 (3255): fixing: 1.042334 -> 1.042334, prev: 0.444865, next: 1.04251
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 06:08:29 (3256): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 07:08:24 (3257): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 08:08:20 (3258): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 09:08:12 (3259): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 10:08:07 (3260): fixing: 1.042536 -> 1.042536, prev: 0.468629, next: 1.042519
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 11:08:03 (3261): fixing: 1.042536 -> 1.042536, prev: 0.468629, next: 1.042519
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 12:07:55 (3262): fixing: 1.042155 -> 1.042155, prev: 0.468629, next: 1.042519
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 13:07:51 (3263): fixing: 1.042155 -> 1.042155, prev: 0.468629, next: 1.042519
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 14:07:49 (3264): fixing: 1.042155 -> 1.042155, prev: 0.482511, next: 1.042519
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 15:07:42 (3265): fixing: 1.042155 -> 1.042155, prev: 0.482511, next: 1.042519
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 16:07:36 (3266): fixing: 1.042155 -> 1.042155, prev: 0.482511, next: 1.042354
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-23 02:04:05 (3433): fixing: 1.036482 -> 1.036482, prev: 1.126302, next: 0.487429
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-24 06:01:21 (3457): fixing: 0.487429 -> 1.041995, prev: 1.036482, next: 1.047508
Parameters

prices_df (pandas.core.frame.DataFrame) –

Return type

pandas.core.frame.DataFrame

sort_and_index_vault_prices(prices_df, priority_ids)

Set up the order of vaults for processing.

  • If we do debugging we want vaults we debug go first, as the pipeline takes several minutes to run

Parameters
  • prices_df (pandas.core.frame.DataFrame) –

  • priority_ids (list[str]) –

process_raw_vault_scan_data(rows, prices_df, logger=<built-in function print>, display=<function <lambda>>, diagnose_vault_id=None)

Preprocess vault data for further analysis.

  • Assign unique names to vaults

  • Add denormalised vault data to prices DataFrame

  • Filter out non-stablecoin vaults

  • Calculate returns, rolling metrics

Parameters
Return type

pandas.core.frame.DataFrame

check_missing_metadata(rows, price_ids, logger=<built-in function print>)

Check that we have metadata for all vaults in the prices DataFrame.

Vault id is in format: 56-0x10c90bfcfb3d2a7ae814da1548ae3a7fc31c35a0’

Parameters
  • rows (dict) – Metadata rows from vault database

  • price_ids (pandas.core.series.Series) –

generate_cleaned_vault_datasets(vault_db_path=PosixPath('/home/runner/.tradingstrategy/vaults/vault-metadata-db.pickle'), price_df_path=PosixPath('/home/runner/.tradingstrategy/vaults/vault-prices-1h.parquet'), cleaned_price_df_path=PosixPath('/home/runner/.tradingstrategy/vaults/cleaned-vault-prices-1h.parquet'), logger=<built-in function print>, display=<function display>, diagnose_vault_id=None)

A command line script entry point to take raw scanned vault price data and clean it up to a format that can be analysed.

  • Reads vault-prices-1h.parquet and generates vault-prices-1h-cleaned.parquet

  • Calculate returns and various performance metrics to be included with prices data

  • Clean returns from abnormalities

Note

Drops non-stablecoin vaults. The cleaning is currently applicable for stable vaults only.

Parameters

diagnose_vault_id (str | None) –

forward_fill_vault(vault_df)

Forward fill missing vault prices up to max_gap_hours.

  • For displaying, calculating metrics, etc. we want continuous time series

  • Align random sample interval to 1h

Parameters

vault_df (pandas.core.frame.DataFrame) –

Price data for a single vault.

Assume 1h price data.

Return type

pandas.core.frame.DataFrame