Jeli-ASR is a multidimensional package developed to promote the use of the Bambara language. It grew out of an initiative to develop the Bambara language and its cultural values. The package consists of an ASR model under ongoing development, a mini corpus of griot narrations in [audio](https://zenodo.org/record/6997806), their transcriptions in EAF, the [ELAN format](https://archive.mpi.nl/tla/elan/download), and a tool that can output the transcriptions as raw text or JSON.
4
+
This is a multidimensional open-source package consisting of a dataset and an ASR model. The dataset consists of the transcriptions of 30 hours of griot stories and narrations, and their translations. The corresponding [audio](https://zenodo.org/record/7094702) is hosted on Zenodo. The ASR model is an ongoing attempt at automatic speech recognition for Bambara.
5
+
6
+
## Dataset
7
+
The Griots corpus is a speech corpus containing both audio and its accompanying transcribed text. You can find the intent, the approaches, a detailed look, and a thorough explanation of the dataset on the [Data-Card](./docs/DataCard.pdf). It contains about 28k utterances and clips (and counting). There are two sub-datasets: Griots Narrations and Street Interviews.
8
+
9
+
### Griots Narrations
10
+
These are recordings of 30 griots (23 male, 7 female) talking about various subjects in a controlled environment. *The subjects are culture-oriented*.
11
+
12
+
### Street Interviews
13
+
Alongside the griots' narrations, a smaller sample of individuals were interviewed about the importance of Bambara in technology. These interviews were conducted on the street, with background noise.
14
+
15
+
**N.B.**: Not all of these audio recordings have been transcribed.
5
16
6
17
## ASR - Model
7
-
[TODO]
18
+
### Kaldi
19
+
### Wav2Vec
20
+
### Espnet
8
21
9
-
## Corpus
10
-
The Griots corpus is a speech corpus containing both audio and its accompanying transcribed text. You can find the intent, the approaches, a detailed look, and a thorough explanation of the dataset on the [Data-Card](). Refer to the following list of recordings and the general meta information about the recordings:
22
+
<!-- ### Keras Transformer -->
11
23
12
-
### Griots Narrations
24
+
## jelipkg toolkit (Jeli => Griot in Bambara)
25
+
<code>jelipkg</code> is a sub-package that serves as an entry point to the corpus. It is a Python package that lets you browse and download the corpus at your convenience. You can download the textual data in either raw text or JSON format, and the audio either in batch or as individual clips (utterances).
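
As a rough illustration of what turning an EAF transcription into raw text or JSON involves, here is a minimal sketch that uses only the Python standard library. It does not use <code>jelipkg</code>, whose API is not shown here. The function names (`parse_eaf`, `to_raw_text`, `to_json`), the output field names, and the sample document with its placeholder text are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET


def parse_eaf(eaf_xml):
    """Return the time-aligned annotations of an EAF document as a list of dicts."""
    root = ET.fromstring(eaf_xml)
    # Map each time slot id to its offset in milliseconds.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    utterances = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            utterances.append({
                "tier": tier.get("TIER_ID"),
                "start_ms": slots.get(ann.get("TIME_SLOT_REF1")),
                "end_ms": slots.get(ann.get("TIME_SLOT_REF2")),
                "text": (ann.findtext("ANNOTATION_VALUE") or "").strip(),
            })
    # Order by start time; annotations without a start time go last.
    utterances.sort(key=lambda u: (u["start_ms"] is None, u["start_ms"] or 0))
    return utterances


def to_raw_text(utterances):
    """One utterance per line, without timing information."""
    return "\n".join(u["text"] for u in utterances)


def to_json(utterances):
    """JSON with timing; ensure_ascii=False keeps characters such as ɛ and ɔ readable."""
    return json.dumps(utterances, ensure_ascii=False, indent=2)


# Hypothetical two-utterance document with placeholder text (not corpus data).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="4200"/>
  </TIME_ORDER>
  <TIER TIER_ID="speaker">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>second utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

if __name__ == "__main__":
    utts = parse_eaf(SAMPLE_EAF)
    print(to_raw_text(utts))
    print(to_json(utts))
```

For a file on disk, `ET.parse(path).getroot()` can replace the `ET.fromstring` call. The start and end offsets are also what one would need to cut a long recording into per-utterance clips.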
This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
Jeli-ASR is a multidimensional package developed to promote the use of the Bambara language. It grew out of an initiative to develop the Bambara language and its cultural values. The package consists of an ASR model under ongoing development, a mini corpus of griot narrations in [audio](https://zenodo.org/record/6997806), their transcriptions in ***eaf***, the [ELAN format](https://archive.mpi.nl/tla/elan/download), and a tool to download and extract the dataset.
5
+
6
+
## ASR - Model
7
+
[TODO]
8
+
9
+
## Corpus
10
+
The Griots corpus is a speech corpus containing both audio and its accompanying transcribed text. You can find the intent, the approaches, a detailed look, and a thorough explanation of the dataset on the [Data-Card (coming)](). Refer to the following list of recordings and the general meta information about the recordings: