data sequence and customized csv dataset #2733
-
`burn/crates/burn-dataset/src/dataset/in_memory.rs`, lines 72 to 88 (commit 245fbcd)

If that doesn't fit your needs, you can implement your own parsing to create the …
You cannot construct a tensor directly from a string. For NLP tasks, you first need to convert the string into tokens. This can be done in many different ways, so the implementation is up to the user. Modern techniques rely on tokenization: strings (e.g., sentences) are split into smaller units called tokens (e.g., words, subwords, or characters), and each token is mapped to a unique integer using a vocabulary. See, for example, the tokenizer in the text classification example.
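To illustrate the string-to-integer step, here is a minimal sketch of a vocabulary-based tokenizer in plain Rust (standard library only). It is not the tokenizer from the Burn text classification example: the `SimpleTokenizer` type, the whitespace splitting, the lowercasing, and the reserved `<unk>` id 0 are all hypothetical choices made for illustration. The resulting ids are what you would later turn into an integer tensor.

```rust
use std::collections::HashMap;

/// Hypothetical whitespace tokenizer backed by a fixed vocabulary.
/// Words not present in the vocabulary map to the reserved `<unk>` id.
struct SimpleTokenizer {
    vocab: HashMap<String, usize>,
}

impl SimpleTokenizer {
    const UNK_ID: usize = 0;

    /// Builds the vocabulary from a corpus: id 0 is reserved for `<unk>`,
    /// and every new lowercase word gets the next free id in order of first appearance.
    fn from_corpus(corpus: &[&str]) -> Self {
        let mut vocab = HashMap::new();
        vocab.insert("<unk>".to_string(), Self::UNK_ID);
        for sentence in corpus {
            for word in sentence.split_whitespace() {
                let next_id = vocab.len();
                // Only inserts if the word has not been seen before.
                vocab.entry(word.to_lowercase()).or_insert(next_id);
            }
        }
        Self { vocab }
    }

    /// Splits the text on whitespace and maps each token to its integer id.
    fn encode(&self, text: &str) -> Vec<usize> {
        text.split_whitespace()
            .map(|w| self.vocab.get(&w.to_lowercase()).copied().unwrap_or(Self::UNK_ID))
            .collect()
    }
}

fn main() {
    let tokenizer = SimpleTokenizer::from_corpus(&["the cat sat", "the dog ran"]);
    let ids = tokenizer.encode("The cat ran away");
    println!("{:?}", ids);
}
```

Real tokenizers (subword schemes such as BPE or WordPiece) are far more involved, but the output has the same shape: a sequence of integer ids per input string.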
-
Hi,
We can construct an `InMemDataset` from a CSV file by following Burn's example (https://github.com/tracel-ai/burn/blob/main/examples/custom-csv-dataset/src/dataset.rs). But when the CSV is very wide (e.g., 1000 columns), it is impractical to manually write a struct with every column as a field. Is there an easy way?
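One way around a struct with 1000 fields is to treat each row as a vector of numbers. The sketch below uses only the standard library and assumes a purely numeric CSV with a header row, no quoted fields, and no embedded commas; `parse_numeric_csv` is a hypothetical helper, not part of Burn. For CSVs with quoting or mixed types, a proper CSV parser such as the `csv` crate is the safer choice.

```rust
/// Hypothetical helper: parses a purely numeric CSV (header row followed by
/// comma-separated numbers) into the header names plus one `Vec<f32>` per row,
/// so no struct with one field per column is needed.
/// Assumes no quoted fields, no embedded commas, and a fixed width per row.
fn parse_numeric_csv(content: &str) -> Result<(Vec<String>, Vec<Vec<f32>>), String> {
    // Skip blank lines (e.g. a trailing newline at the end of the file).
    let mut lines = content.lines().filter(|l| !l.trim().is_empty());

    let header: Vec<String> = lines
        .next()
        .ok_or("empty CSV")?
        .split(',')
        .map(|s| s.trim().to_string())
        .collect();

    let mut rows = Vec::new();
    for (i, line) in lines.enumerate() {
        // Parse every field as f32; fail on the first non-numeric field.
        let row: Vec<f32> = line
            .split(',')
            .map(|field| field.trim().parse::<f32>())
            .collect::<Result<_, _>>()
            .map_err(|e| format!("row {}: {}", i + 1, e))?;
        if row.len() != header.len() {
            return Err(format!(
                "row {}: expected {} columns, got {}",
                i + 1,
                header.len(),
                row.len()
            ));
        }
        rows.push(row);
    }
    Ok((header, rows))
}

fn main() {
    let csv = "a,b,c\n1,2,3\n4.5,5,6\n";
    let (header, rows) = parse_numeric_csv(csv).expect("invalid CSV");
    println!("{} columns, {} rows", header.len(), rows.len());
}
```

Each row is then a plain `Vec<f32>`, which could serve as the dataset item type; assuming your Burn version provides `InMemDataset::new` taking a `Vec` of items, the parsed rows could be wrapped directly instead of going through a per-column struct.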
Also, how can I construct a tensor from a string of digits? Here is my intention:
This would be useful when implementing an LSTM, since it needs sequences as input.
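If "a string of digits" means numeric text such as `"1, 2, 3, 4, 5"`, one approach is to parse it into `f32` values first and then cut the series into fixed-length windows for sequence models. This is a minimal sketch under that assumption, standard library only; `parse_series` and `make_windows` are hypothetical helpers, and the next-value target is just one possible setup.

```rust
/// Hypothetical helper: parses numbers separated by commas and/or whitespace.
fn parse_series(s: &str) -> Result<Vec<f32>, std::num::ParseFloatError> {
    s.split(|c: char| c == ',' || c.is_whitespace())
        .filter(|t| !t.is_empty())
        .map(|t| t.parse::<f32>())
        .collect()
}

/// Hypothetical helper: builds (input sequence, next value) pairs with a sliding
/// window of `seq_len` steps. Returns an empty Vec if the series is too short.
fn make_windows(series: &[f32], seq_len: usize) -> Vec<(Vec<f32>, f32)> {
    series
        .windows(seq_len + 1)
        .map(|w| (w[..seq_len].to_vec(), w[seq_len]))
        .collect()
}

fn main() {
    let series = parse_series("1, 2, 3, 4, 5").expect("invalid number");
    let pairs = make_windows(&series, 3);

    // Flatten the inputs row-major: logical shape [num_windows, seq_len].
    let flat: Vec<f32> = pairs.iter().flat_map(|(x, _)| x.iter().copied()).collect();
    println!("{} windows, flat length {}", pairs.len(), flat.len());
}
```

The flat buffer plus its shape is what a tensor constructor typically needs. In recent Burn versions this may be done with `TensorData::new(values, shape)` followed by `Tensor::from_data`, reshaped to `[batch_size, seq_len, 1]` for a single-feature LSTM input; check the exact API for the Burn version you use.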