Using Local Data Paths with zea
Most zea examples load data from Hugging Face (hf://) paths for convenience, but you can also work with local datasets by configuring a users.yaml file that points to your data root. This notebook shows how to set up local data paths so that zea can find data on your own storage.
[ ]:
%%capture
%pip install zea
[ ]:
# Example config hosted on Hugging Face; passed to setup() later in this notebook
config_picmus_rf = "hf://zeahub/configs/config_picmus_rf.yaml"
Setting up your users.yaml
Many codebases are littered with hardcoded absolute paths, which make it difficult to share code or run it on different machines. To avoid this, zea uses a users.yaml file to define local data paths: each user specifies a local data root, and zea resolves data paths dynamically, relative to that root.
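The idea of resolving paths relative to a data root, instead of hardcoding them, can be sketched in plain Python. The helper resolve_under_root is hypothetical (it is not part of zea), and the file name picmus/rf.hdf5 is made up for illustration. The helper joins a relative path onto whatever data root the current user has configured and refuses paths that would escape that root.

```python
from pathlib import Path, PurePosixPath


def resolve_under_root(data_root, relative_path):
    """Join relative_path onto data_root (hypothetical helper, not part of zea).

    Scripts store only the relative part, so the same code works for every
    user whose users.yaml points at a different data root.
    """
    relative = PurePosixPath(relative_path)
    # Reject absolute paths and ".." components, which would escape the root
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"expected a path relative to the data root, got {relative_path!r}")
    return Path(data_root).joinpath(*relative.parts)


# The same relative path resolves to a different location for each data root
print(resolve_under_root("/home/alice/data", "picmus/rf.hdf5"))
print(resolve_under_root("/mnt/shared/data", "picmus/rf.hdf5"))
```

Keeping only relative paths in code and configs means that moving the data, or handing the project to a colleague, requires changing nothing but the data root.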
Create a users.yaml file in your project directory. This file tells zea where your local data is stored. Example content:
data_root: /home/your_username/data
Replace /home/your_username/data with the actual path to your data directory.
Tip: You can auto-generate this file by running python -m zea.datapaths and following the prompts.
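If you need to create the minimal file non-interactively (for example in a CI job or a container build), a few lines of standard-library Python are enough. write_minimal_users_yaml is a hypothetical helper, not a zea API, and it only writes the single data_root key shown above. The path is emitted as a JSON string, which YAML also accepts as a double-quoted scalar, so paths containing spaces, colons or # characters survive intact.

```python
import json
from pathlib import Path


def write_minimal_users_yaml(data_root, path="users.yaml"):
    """Write a one-key users.yaml pointing at data_root (hypothetical helper).

    Returns the text that was written.
    """
    # Expand ~ and use forward slashes, which work on Linux and Windows alike
    root = Path(data_root).expanduser().as_posix()
    # json.dumps yields a double-quoted string, which is also valid YAML
    text = f"data_root: {json.dumps(root, ensure_ascii=False)}\n"
    Path(path).write_text(text, encoding="utf-8")
    return text
```

For example, write_minimal_users_yaml("/srv/my data") writes the line data_root: "/srv/my data" to users.yaml in the current directory.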
Using Local Data Paths
Once your users.yaml is set up, you can resolve your local data root from Python. Here's a minimal example:
[4]:
from zea import set_data_paths

# Read users.yaml and resolve the data root for the current user and machine
user = set_data_paths("users.yaml")
data_root = user.data_root
username = user.username
print(f"🔔 Hi {username}! You are using data from {data_root}")
🔔 Hi devcontainer15! You are using data from //home/devcontainer15/data
Advanced Data Path Configuration
The example above uses the simplest possible users.yaml, with only a data_root key. However, users.yaml supports more advanced options. For example, you can specify different data roots for different projects, users, and machines. You can also define separate local and remote paths, for instance when some of your data lives on remote storage. Let's look at a more advanced example.
Example: Complex users.yaml Layout
For collaborative projects, or when working across multiple machines and operating systems, you can use a more structured users.yaml file. Here is an example:
alice:
  workstation1:
    system: linux
    data_root:
      local: /mnt/data/alice
      remote: /mnt/remote/alice
    output: /mnt/data/alice/output
  laptop:
    system: windows
    data_root: D:/data/alice
    output: D:/data/alice/output
bob:
  server:
    system: linux
    data_root:
      local: /mnt/data/bob
      remote: /mnt/remote/bob
  system: linux
  data_root: /mnt/data/bob
  output: /mnt/data/bob/output
# Default fallback if no user/machine matches
data_root: /mnt/shared/data
output: /mnt/shared/output
- Each user can have several machines (identified by hostname), each with its own system and data_root.
- data_root can be either a string or a dictionary with local and remote keys.
- If no user or machine matches, the default data_root at the bottom of the file is used.
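The lookup rules above can be summarized in a short sketch. resolve_data_root is a hypothetical function written for illustration, not zea's implementation. It takes the parsed users.yaml as a plain dict (as a YAML loader would return it), looks up the entry for a username and hostname, falls back to the top-level data_root, and picks the local or remote path when data_root is a dictionary. It ignores the system key and zea-specific behavior such as prompting for missing profiles.

```python
import getpass
import socket


def resolve_data_root(config, username=None, hostname=None, local=True):
    """Return the data root for a user/machine from a parsed users.yaml dict.

    Hypothetical sketch of the lookup rules, not zea's implementation.
    """
    if username is None:
        username = getpass.getuser()
    if hostname is None:
        hostname = socket.gethostname()

    # Use the machine entry only if both the user and the hostname are listed;
    # otherwise fall back to the top-level (default) keys
    entry = config
    user_entry = config.get(username)
    if isinstance(user_entry, dict) and isinstance(user_entry.get(hostname), dict):
        entry = user_entry[hostname]

    data_root = entry["data_root"]
    # A dict holds separate local/remote roots; in this sketch a plain
    # string serves as both
    if isinstance(data_root, dict):
        data_root = data_root["local" if local else "remote"]
    return data_root


# Part of the users.yaml example above, as a YAML loader would return it
example_config = {
    "alice": {
        "workstation1": {
            "system": "linux",
            "data_root": {"local": "/mnt/data/alice", "remote": "/mnt/remote/alice"},
            "output": "/mnt/data/alice/output",
        },
        "laptop": {
            "system": "windows",
            "data_root": "D:/data/alice",
            "output": "D:/data/alice/output",
        },
    },
    "data_root": "/mnt/shared/data",
    "output": "/mnt/shared/output",
}

print(resolve_data_root(example_config, "alice", "workstation1"))  # /mnt/data/alice
print(resolve_data_root(example_config, "alice", "workstation1", local=False))  # /mnt/remote/alice
print(resolve_data_root(example_config, "carol", "some-host"))  # /mnt/shared/data
```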
[2]:
# Example: Select remote data root (if defined in users.yaml)
user_remote = set_data_paths("users.yaml", local=False)
print("Remote data root:", user_remote.data_root)
user_local = set_data_paths("users.yaml", local=True)
print("Local data root:", user_local.data_root)
Remote data root: /mnt/z/Ultrasound-BMd/data
Local data root: //home/devcontainer15/data
Full Environment Setup with setup
For convenience, zea provides a setup function that configures everything in one step: config, data paths, and device (GPU/CPU). It prompts you to create a user profile if none exists yet, sets up the data paths, and initializes the device. Use it in your main scripts for reproducible and portable setups.
[ ]:
from zea.internal.setup_zea import setup
# config_path: path to your config YAML file
# user_config: path to your users.yaml file
config = setup(config_path=config_picmus_rf, user_config="users.yaml")
data_root = config.data.user.data_root
device = config.device
zea: Using config file: hf://zeahub/configs/config_picmus_rf.yaml
zea: Git branch and commit: feature/clean-up=9ed781092df9d7fd78d402cadb278a8751f8e34a
-------------------GPU settings-------------------
     memory
GPU
0       968
1     11011
2     11011
3       988
4     11011
5     11011
6       246
7       690
Selecting 1 GPU based on available memory.
Selected GPU 1 with Free Memory: 11011.00 MiB
Hiding GPUs [0, 2, 3, 4, 5, 6, 7] from the system.
--------------------------------------------------
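The device messages in the log above suggest how the GPU is chosen: setup ranks the GPUs by free memory, selects one, and hides the rest. The sketch below reproduces that selection with the numbers from the log. select_gpus and hide_other_gpus are hypothetical helpers, not zea functions, and hiding GPUs through the CUDA_VISIBLE_DEVICES environment variable is an assumption about the mechanism. It is, however, the standard way to restrict which GPUs CUDA-based frameworks can see, provided it is set before the framework initializes CUDA.

```python
import os


def select_gpus(free_memory_mib, num_gpus=1):
    """Return (selected, hidden) GPU indices, preferring the most free memory.

    Hypothetical helper that mirrors the log messages, not zea's implementation.
    """
    # sorted() is stable even with reverse=True, so among GPUs with equal
    # free memory the lower index comes first
    ranked = sorted(range(len(free_memory_mib)),
                    key=lambda i: free_memory_mib[i], reverse=True)
    selected = sorted(ranked[:num_gpus])
    hidden = [i for i in range(len(free_memory_mib)) if i not in selected]
    return selected, hidden


def hide_other_gpus(selected):
    """Expose only the selected GPUs to CUDA (must run before CUDA is initialized)."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in selected)


# Free memory per GPU in MiB, copied from the log above
free_memory = [968, 11011, 11011, 988, 11011, 11011, 246, 690]
selected, hidden = select_gpus(free_memory)
print("Selected:", selected, "Hidden:", hidden)  # Selected: [1] Hidden: [0, 2, 3, 4, 5, 6, 7]
```

Calling hide_other_gpus(selected) at the very start of a script, before importing a deep learning framework, would then limit that process to the chosen GPU.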
Summary
- Use users.yaml to manage local/remote data roots for different users and systems.
- Use set_data_paths to resolve your data root dynamically.
- For advanced setups, structure users.yaml with users, hostnames, and local/remote keys.
- Use setup as a one-liner to initialize the config, data paths, and device.