Skip to content

OpenMined/syft-datasets-experimental

πŸ” Syft-Datasets

Find Data Fast. Build Projects Faster.

The easiest way to discover and select datasets in the SyftBox ecosystem. Every great Syft project starts with finding the right dataβ€”syft-datasets makes that effortless.

🎬 See It In Action

Syft-Datasets Interactive Demo

Search, select, and generate code for your datasets in seconds

πŸš€ What This Does

Problem: Finding and working with datasets across SyftBox datasites is tedious
Solution: Beautiful interactive interface that makes dataset discovery a breeze

import syft_datasets as syd

# 1. Browse all available datasets
syd.datasets  # Shows interactive table in Jupyter

# 2. Search for what you need  
crop_data = syd.datasets.search("crop")

# 3. Select datasets visually with checkboxes
# 4. Click "Generate Code" β†’ automatic copy to clipboard
# 5. Paste and use immediately

# Selected datasets:
datasets = [syd.datasets[i] for i in [0, 1, 5]]

⚑ Key Features

  • 🎨 Interactive Jupyter UI - Beautiful HTML tables with real-time search
  • πŸ” Smart Search - Find datasets by name, email, or keyword
  • β˜‘οΈ Visual Selection - Checkbox interface with one-click code generation
  • πŸ“‹ Copy-Paste Ready - Generated code works immediately
  • 🌐 Cross-Datasite - Discovers datasets across all your connected datasites
  • πŸ›‘οΈ Privacy-First - Respects SyftBox security model

πŸ“¦ Installation

pip install syft-datasets

🎯 Quick Start

import syft_datasets as syd

# Interactive dataset browsing (run in Jupyter)
syd.datasets

# Search for specific datasets
financial_data = syd.datasets.search("financial")
andrew_datasets = syd.datasets.filter_by_email("andrew")

# Use datasets by index
first_dataset = syd.datasets[0]
first_five = syd.datasets[:5]
specific_ones = syd.datasets.get_by_indices([0, 2, 7])

πŸ”— Perfect With Syft-NSAI

Use your selected datasets with OpenAI-compatible chat:

import syft_datasets as syd
import syft_nsai as nsai

# 1. Find your data
selected_datasets = [syd.datasets[i] for i in [0, 1, 5]]

# 2. Analyze with AI
response = nsai.client.chat.completions.create(
    model=selected_datasets,
    messages=[{"role": "user", "content": "What insights can you find?"}]
)

🎨 Interactive Features

The Jupyter Interface Includes:

  • Real-time search box - Filter as you type
  • Sortable columns - Email, dataset name, URLs
  • Checkbox selection - Visual dataset picking
  • One-click code generation - Automatic clipboard copy
  • Connection status - See your SyftBox status at a glance

Programmatic Interface:

# Get dataset information
dataset = syd.datasets[0]
print(f"Dataset: {dataset.name}")
print(f"From: {dataset.email}")  
print(f"URL: {dataset.syft_url}")

# Utility methods
syd.datasets.list_unique_emails()    # All available emails
syd.datasets.list_unique_names()     # All dataset names  
len(syd.datasets)                    # Count datasets
syd.datasets.help()                  # Show help

πŸ› οΈ Advanced Usage

Chain Operations

# Find crop datasets from specific users
crop_data = syd.datasets.search("crop").filter_by_email("andrew")

# Select datasets matching patterns
ml_datasets = [ds for ds in syd.datasets 
               if any(kw in ds.name.lower() for kw in ['model', 'train'])]

Custom Workflows

# Group datasets by domain
from collections import defaultdict
by_domain = defaultdict(list)
for ds in syd.datasets:
    domain = ds.email.split('@')[1]
    by_domain[domain].append(ds)

βš™οΈ Requirements

  • SyftBox installed and running
  • Jupyter notebook environment
  • Python 3.9+

🚦 Connection Status

Syft-datasets automatically checks:

  • βœ… SyftBox filesystem access
  • βœ… SyftBox app running status
  • πŸ“Š Available datasites and datasets

🀝 Contributing

# Development setup
git clone https://github.com/OpenMined/syft-datasets.git
cd syft-datasets
pip install -e ".[dev]"

# Run tests
pytest

# Format code  
ruff format

πŸ“„ License

Apache 2.0 License - see LICENSE file.


Start your next Syft project rightβ€”find your data first. 🎯

About

Interactive dataset discovery and management for SyftBox applications

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Sponsor this project

 

Packages

No packages published