
Commit 48a64d3

Add images and csv dataset source to book (#2179)
1 parent e1fed79 commit 48a64d3

File tree

1 file changed: +50 -1 lines changed


burn-book/src/building-blocks/dataset.md

@@ -104,7 +104,7 @@ dataset to use should be based on the dataset's size as well as its intended pur
 
 ## Sources
 
-For now, there is only one dataset source available with Burn, but more to come!
+For now, there are only a couple of dataset sources available with Burn, but more to come!
 
 ### Hugging Face
 
@@ -131,6 +131,55 @@ fn main() {
 We see that items must derive `serde::Serialize`, `serde::Deserialize`, `Clone`, and `Debug`, but
 those are the only requirements.
 
+### Images
+
+`ImageFolderDataset` is a generic vision dataset used to load images from disk. It currently
+supports multi-class and multi-label classification tasks.
+
+```rust, ignore
+// Create an image classification dataset from the root folder,
+// where the images for each class are stored in their respective folder.
+//
+// For example:
+// root/dog/dog1.png
+// root/dog/dog2.png
+// ...
+// root/cat/cat1.png
+let dataset = ImageFolderDataset::new_classification("path/to/dataset/root").unwrap();
+```
+
+```rust, ignore
+// Create a multi-label image classification dataset from a list of items,
+// where each item is a tuple `(image path, labels)`, and a list of classes
+// in the dataset.
+//
+// For example:
+let items = vec![
+    ("root/dog/dog1.png", vec!["animal".to_string(), "dog".to_string()]),
+    ("root/cat/cat1.png", vec!["animal".to_string(), "cat".to_string()]),
+];
+let dataset = ImageFolderDataset::new_multilabel_classification_with_items(
+    items,
+    &["animal", "cat", "dog"],
+)
+.unwrap();
+```
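As an editorial aside, the folder-per-class convention the added docs describe can be illustrated without Burn itself. The following is a minimal, self-contained sketch, not Burn's actual implementation: `label_from_path` is a hypothetical helper showing how a class label can be derived from an image's parent directory, which is the layout `new_classification` expects.

```rust
use std::path::Path;

// Hypothetical helper (not part of Burn): derive a class label from the
// parent directory of an image path, mirroring the folder-per-class layout.
fn label_from_path(path: &str) -> Option<String> {
    Path::new(path)
        .parent()
        .and_then(|dir| dir.file_name())
        .map(|name| name.to_string_lossy().into_owned())
}

fn main() {
    // "root/dog/dog1.png" sits inside the "dog" folder, so its label is "dog".
    assert_eq!(label_from_path("root/dog/dog1.png"), Some("dog".to_string()));
    assert_eq!(label_from_path("root/cat/cat1.png"), Some("cat".to_string()));
}
```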
+
+### Comma-Separated Values (CSV)
+
+Loading records in-memory from a simple CSV file is straightforward with `InMemDataset`:
+
+```rust, ignore
+// Build a dataset from a csv file with a tab ('\t') delimiter.
+// The reader can be configured for your particular file.
+let mut rdr = csv::ReaderBuilder::new();
+let rdr = rdr.delimiter(b'\t');
+
+let dataset = InMemDataset::from_csv("path/to/csv", rdr).unwrap();
+```
+
+Note that this requires the `csv` crate.
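As a rough, editorial illustration of what such an in-memory CSV load amounts to, here is a self-contained sketch using only the standard library. `Record` and `parse_tsv` are hypothetical names; this is neither Burn's nor the `csv` crate's actual code, just the underlying idea of parsing tab-delimited rows into typed items held in memory.

```rust
// Hypothetical record type; real Burn items would also derive the serde traits.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    name: String,
    value: i64,
}

// Hypothetical parser: split each tab-delimited line into a typed record,
// skipping the header row and silently dropping malformed lines.
fn parse_tsv(data: &str) -> Vec<Record> {
    data.lines()
        .skip(1) // skip the header row
        .filter_map(|line| {
            let mut fields = line.split('\t');
            let name = fields.next()?.to_string();
            let value = fields.next()?.parse().ok()?;
            Some(Record { name, value })
        })
        .collect()
}

fn main() {
    let data = "name\tvalue\nalpha\t1\nbeta\t2";
    let records = parse_tsv(data);
    assert_eq!(records.len(), 2);
    assert_eq!(records[0], Record { name: "alpha".to_string(), value: 1 });
}
```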
+
 **What about streaming datasets?**
 
 There is no streaming dataset API with Burn, and this is by design! The learner struct will iterate
