Using Local Data Paths with zea

Most zea examples use Hugging Face links for convenience, but you can also work with local datasets by configuring a users.yaml file that points to your data root. This notebook shows how to set up local paths and load data from your own storage.


[ ]:
%%capture
%pip install zea
[ ]:
config_picmus_rf = "hf://zeahub/configs/config_picmus_rf.yaml"

Setting up your users.yaml

Hardcoded absolute paths make code difficult to share or to run on different machines. To avoid them, zea uses a users.yaml file to define local data paths: you specify a local data root once, and zea resolves paths dynamically relative to that root.

Create a users.yaml file in your project directory. This file tells zea where your local data is stored. Example content:

data_root: /home/your_username/data

Replace /home/your_username/data with the actual path to your data directory.

Tip: You can auto-generate this file by running:

python -m zea.datapaths

and following the prompts.
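
If you would rather create the file from a script than by hand or through the prompts, the single-key layout shown above can be written with the standard library alone. This is a minimal sketch: write_minimal_users_yaml is a hypothetical helper, not part of zea, and it only covers the simple data_root case.

```python
import tempfile
from pathlib import Path


def write_minimal_users_yaml(target_dir, data_root):
    """Write a users.yaml holding only a data_root key (hypothetical helper, not a zea API)."""
    users_file = Path(target_dir) / "users.yaml"
    # as_posix() keeps forward slashes, matching the examples in this notebook
    users_file.write_text(f"data_root: {Path(data_root).as_posix()}\n", encoding="utf-8")
    return users_file


# Demo in a temporary directory so nothing in your project is overwritten
with tempfile.TemporaryDirectory() as tmp:
    path = write_minimal_users_yaml(tmp, "/home/your_username/data")
    content = path.read_text(encoding="utf-8")
    print(content)
```

In a real project you would pass your project directory instead of the temporary one.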

Using Local Data Paths

Once your users.yaml is set up, you can load data from your local data root. Here’s a minimal example:

[4]:
from zea import set_data_paths

user = set_data_paths("users.yaml")

data_root = user.data_root
username = user.username

print(f"🔔 Hi {username}! You are using data from {data_root}")
🔔 Hi devcontainer15! You are using data from //home/devcontainer15/data
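
With the data root in hand, dataset locations can be expressed relative to it, so only the root differs between machines. The sketch below assumes a hypothetical dataset folder my_dataset/train and uses a literal string as a stand-in for user.data_root so that it runs without zea installed.

```python
from pathlib import PurePosixPath

# Stand-in for user.data_root; in the notebook this value comes from set_data_paths
data_root = "/home/your_username/data"

# Hypothetical dataset location, stored relative to the data root
relative_dataset = "my_dataset/train"

# Join the machine-specific root with the machine-independent relative part
dataset_path = PurePosixPath(data_root) / relative_dataset
print(dataset_path)
```

Only the relative part needs to appear in shared code; each collaborator supplies their own root through users.yaml.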

Advanced Data Path Configuration

The example above uses the simplest possible users.yaml, with just a data_root key. The file supports more advanced options as well. For example, you can specify multiple data roots for different projects, users, and machines. You can also define separate paths for local and remote data (for instance, if you use remote storage). Let’s look at a more advanced example.

Example: Complex users.yaml Layout

For collaborative projects or when working across multiple machines and operating systems, you can use a more structured users.yaml file. Here is an example:

alice:
  workstation1:
    system: linux
    data_root:
      local: /mnt/data/alice
      remote: /mnt/remote/alice
    output: /mnt/data/alice/output
  laptop:
    system: windows
    data_root: D:/data/alice
    output: D:/data/alice/output
bob:
  server:
    system: linux
    data_root:
      local: /mnt/data/bob
      remote: /mnt/remote/bob
  system: linux
  data_root: /mnt/data/bob
  output: /mnt/data/bob/output
# Default fallback if no user/machine matches
data_root: /mnt/shared/data
output: /mnt/shared/output
  • Each user can have different machines, each with their own system and data_root.

  • data_root can be a string or a dictionary with local and remote keys.

  • If no user or machine matches, the default data_root at the bottom is used.
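
The lookup order described in the bullets above can be sketched in plain Python. This is only an illustration of the idea, not zea's actual implementation: resolve_data_root is a hypothetical function, the dictionary mirrors part of the YAML example, and zea may apply additional rules (such as user-level defaults or matching on system) that are not modeled here.

```python
# Plain-dict mirror of part of the example above (a real file would be parsed from YAML)
users = {
    "alice": {
        "workstation1": {
            "system": "linux",
            "data_root": {"local": "/mnt/data/alice", "remote": "/mnt/remote/alice"},
        },
        "laptop": {"system": "windows", "data_root": "D:/data/alice"},
    },
    "data_root": "/mnt/shared/data",  # default fallback
}


def resolve_data_root(users, username, hostname, local=True):
    """Illustrative lookup: user -> machine -> top-level default (not zea's code)."""
    user_entry = users.get(username)
    machine_entry = user_entry.get(hostname) if isinstance(user_entry, dict) else None
    if not isinstance(machine_entry, dict) or "data_root" not in machine_entry:
        machine_entry = users  # no match: fall back to the top-level default
    root = machine_entry["data_root"]
    if isinstance(root, dict):
        # data_root given as a dictionary: choose the local or remote key
        return root["local" if local else "remote"]
    return root  # data_root given as a plain string


print(resolve_data_root(users, "alice", "workstation1"))
print(resolve_data_root(users, "alice", "workstation1", local=False))
print(resolve_data_root(users, "alice", "laptop"))
print(resolve_data_root(users, "carol", "unknown-host"))
```

The last call uses a user that is not in the file, so it falls through to the shared default.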

[2]:
# Example: Select remote data root (if defined in users.yaml)
user_remote = set_data_paths("users.yaml", local=False)
print("Remote data root:", user_remote.data_root)
user_local = set_data_paths("users.yaml", local=True)
print("Local data root:", user_local.data_root)
Remote data root: /mnt/z/Ultrasound-BMd/data
Local data root: //home/devcontainer15/data

Full Environment Setup with setup

For convenience, zea provides a setup function that configures everything in one step: config, data paths, and device (GPU/CPU).

  • This will prompt for missing user profiles if needed, set up data paths, and initialize the device.

  • Use this in your main scripts for reproducible and portable setups.

[ ]:
from zea.internal.setup_zea import setup

# config_path: path to your config YAML file
# user_config: path to your users.yaml file
config = setup(config_path=config_picmus_rf, user_config="users.yaml")

data_root = config.data.user.data_root
device = config.device
zea: Using config file: hf://zeahub/configs/config_picmus_rf.yaml
zea: Git branch and commit: feature/clean-up=9ed781092df9d7fd78d402cadb278a8751f8e34a
-------------------GPU settings-------------------
     memory
GPU
0         968
1       11011
2       11011
3         988
4       11011
5       11011
6         246
7         690
Selecting 1 GPU based on available memory.
Selected GPU 1 with Free Memory: 11011.00 MiB
Hiding GPUs [0, 2, 3, 4, 5, 6, 7] from the system.
--------------------------------------------------
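
The device selection reported in the log can be approximated as follows. This is a sketch of the general technique (pick the GPU with the most free memory, then hide the others through the CUDA_VISIBLE_DEVICES environment variable), not zea's own code. The memory figures are copied from the log rather than queried from the hardware.

```python
import os

# Free memory per GPU in MiB, copied from the log above
free_memory_mib = {0: 968, 1: 11011, 2: 11011, 3: 988, 4: 11011, 5: 11011, 6: 246, 7: 690}

# Pick the GPU with the most free memory; on ties, max() keeps the first one seen
selected = max(free_memory_mib, key=free_memory_mib.get)
hidden = [gpu for gpu in free_memory_mib if gpu != selected]

# CUDA-based frameworks only see the devices listed in CUDA_VISIBLE_DEVICES,
# provided it is set before the framework initializes CUDA
os.environ["CUDA_VISIBLE_DEVICES"] = str(selected)

print(f"Selected GPU {selected}, hiding {hidden}")
```

Because the variable must be set before the deep learning framework initializes CUDA, this kind of selection belongs at the very start of a script.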

Summary

  • Use users.yaml to manage local/remote data roots for different users and systems.

  • Use set_data_paths to resolve your data root dynamically.

  • For advanced setups, structure users.yaml with users, hostnames, and local/remote keys.

  • Use setup as a one-liner to initialize config, data paths, and device.