{ "cells": [ { "cell_type": "markdown", "id": "925409c8", "metadata": {}, "source": [ "# Using Local Data Paths with `zea`\n", "\n", "Most zea examples use Hugging Face links for convenience, but you can also work with local datasets by configuring a `users.yaml` file that points to your data root. This notebook shows how to set up local paths and load data from your own storage." ] }, { "cell_type": "markdown", "id": "e8eb21ff", "metadata": {}, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tue-bmd/zea/blob/main/docs/source/notebooks/data/zea_local_data.ipynb)\n", " \n", "[![View on GitHub](https://img.shields.io/badge/GitHub-View%20Source-blue?logo=github)](https://github.com/tue-bmd/zea/blob/main/docs/source/notebooks/data/zea_local_data.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "id": "f6e1f554", "metadata": {}, "outputs": [], "source": [ "%%capture\n", "%pip install zea" ] }, { "cell_type": "code", "execution_count": null, "id": "613908ac", "metadata": { "tags": [ "parameters" ] }, "outputs": [], "source": [ "config_picmus_rf = \"hf://zeahub/configs/config_picmus_rf.yaml\"" ] }, { "cell_type": "markdown", "id": "9c5f1be2", "metadata": {}, "source": [ "## Setting up your `users.yaml`\n", "\n", "Many codebases and projects are littered with hardcoded absolute paths, which can make it difficult to share code or run it on different machines. To avoid this, zea makes use of a `users.yaml` file to define local data paths. The idea is that users can specify a local data root, and zea will use this to resolve paths dynamically, relative to the user's data root.\n", "\n", "Create a `users.yaml` file in your project directory. This file tells zea where your local data is stored. Example content:\n", "\n", "```yaml\n", "data_root: /home/your_username/data\n", "```\n", "\n", "Replace `/home/your_username/data` with the actual path to your data directory.\n", "\n", "> **Tip:** You can auto-generate this file by running:\n", "> ```\n", "> python -m zea.datapaths\n", "> ```\n", "> and following the prompts." ] }, { "cell_type": "markdown", "id": "60e92c98", "metadata": {}, "source": [ "## Using Local Data Paths\n", "\n", "Once your `users.yaml` is set up, you can load data from your local data root. Here's a minimal example:" ] }, { "cell_type": "code", "execution_count": 4, "id": "5225fcd8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔔 Hi devcontainer15! You are using data from //home/devcontainer15/data\n" ] } ], "source": [ "from zea import set_data_paths\n", "\n", "user = set_data_paths(\"users.yaml\")\n", "\n", "data_root = user.data_root\n", "username = user.username\n", "\n", "print(f\"🔔 Hi {username}! You are using data from {data_root}\")" ] }, { "cell_type": "markdown", "id": "ebec6ada", "metadata": {}, "source": [ "## Advanced Data Path Configuration\n", "\n", "In the above example, we use the most simple configuration in `users.yaml`, with just a `data_root` key. However, there are many more advanced options you can configure using `users.yaml`. For example, you can specify multiple data roots, for different projects, users and machines. Additionally, you can define a path for local and remote data (if you use for instance a remote storage). Let's have a look at a more advanced example." ] }, { "cell_type": "markdown", "id": "0727e2fd", "metadata": {}, "source": [ "### Example: Complex `users.yaml` Layout\n", "\n", "For collaborative projects or when working across multiple machines and operating systems, you can use a more structured `users.yaml` file. Here is an example:\n", "\n", "```yaml\n", "alice:\n", " workstation1:\n", " system: linux\n", " data_root:\n", " local: /mnt/data/alice\n", " remote: /mnt/remote/alice\n", " output: /mnt/data/alice/output\n", " laptop:\n", " system: windows\n", " data_root: D:/data/alice\n", " output: D:/data/alice/output\n", "bob:\n", " server:\n", " system: linux\n", " data_root:\n", " local: /mnt/data/bob\n", " remote: /mnt/remote/bob\n", " system: linux\n", " data_root: /mnt/data/bob\n", " output: /mnt/data/bob/output\n", "# Default fallback if no user/machine matches\n", "data_root: /mnt/shared/data\n", "output: /mnt/shared/output\n", "```\n", "\n", "- Each user can have different machines, each with their own `system` and `data_root`.\n", "- `data_root` can be a string or a dictionary with `local` and `remote` keys.\n", "- If no user or machine matches, the default `data_root` at the bottom is used." ] }, { "cell_type": "code", "execution_count": 2, "id": "7fa9c2d9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Remote data root: /mnt/z/Ultrasound-BMd/data\n", "Local data root: //home/devcontainer15/data\n" ] } ], "source": [ "# Example: Select remote data root (if defined in users.yaml)\n", "user_remote = set_data_paths(\"users.yaml\", local=False)\n", "print(\"Remote data root:\", user_remote.data_root)\n", "user_local = set_data_paths(\"users.yaml\", local=True)\n", "print(\"Local data root:\", user_local.data_root)" ] }, { "cell_type": "markdown", "id": "a4f9f007", "metadata": {}, "source": [ "## Full Environment Setup with `setup`\n", "\n", "For convenience, zea provides a `setup` function that configures everything in one step: config, data paths, and device (GPU/CPU).\n", "\n", "- This will prompt for missing user profiles if needed, set up data paths, and initialize the device.\n", "- Use this in your main scripts for reproducible and portable setups." ] }, { "cell_type": "code", "execution_count": null, "id": "a2de03cd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m\u001b[38;5;36mzea\u001b[0m\u001b[0m: Using config file: \u001b[33mhf://zeahub/configs/config_picmus_rf.yaml\u001b[0m\n", "\u001b[1m\u001b[38;5;36mzea\u001b[0m\u001b[0m: Git branch and commit: feature/clean-up=9ed781092df9d7fd78d402cadb278a8751f8e34a\n", "\u001b[1m\u001b[38;5;36mzea\u001b[0m\u001b[0m: Git branch and commit: feature/clean-up=9ed781092df9d7fd78d402cadb278a8751f8e34a\n", "-------------------GPU settings-------------------\n", "-------------------GPU settings-------------------\n", " memory\n", "GPU \n", "0 968\n", "1 11011\n", "2 11011\n", "3 988\n", "4 11011\n", "5 11011\n", "6 246\n", "7 690\n", "Selecting 1 GPU based on available memory.\n", "Selected GPU 1 with Free Memory: 11011.00 MiB\n", "Hiding GPUs [0, 2, 3, 4, 5, 6, 7] from the system.\n", "--------------------------------------------------\n", " memory\n", "GPU \n", "0 968\n", "1 11011\n", "2 11011\n", "3 988\n", "4 11011\n", "5 11011\n", "6 246\n", "7 690\n", "Selecting 1 GPU based on available memory.\n", "Selected GPU 1 with Free Memory: 11011.00 MiB\n", "Hiding GPUs [0, 2, 3, 4, 5, 6, 7] from the system.\n", "--------------------------------------------------\n" ] } ], "source": [ "from zea.internal.setup_zea import setup\n", "\n", "# config_path: path to your config YAML file\n", "# user_config: path to your users.yaml file\n", "config = setup(config_path=config_picmus_rf, user_config=\"users.yaml\")\n", "\n", "data_root = config.data.user.data_root\n", "device = config.device" ] }, { "cell_type": "markdown", "id": "c967677b", "metadata": {}, "source": [ "## Summary\n", "\n", "- Use `users.yaml` to manage local/remote data roots for different users and systems.\n", "- Use `set_data_paths` to resolve your data root dynamically.\n", "- For advanced setups, structure `users.yaml` with users, hostnames, and local/remote keys.\n", "- Use `setup` for a one-liner to initialize config, data paths, and device." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }