The NaijaVoices Dataset


(DOI: 10.57967/hf/3257)
The NaijaVoices dataset captures the essence of Nigerian culture in the following ways:

  • ✅ Authentic, expert-generated, contextualized sentences. The kind of originality you won't see on the internet!
  • ✅ 1,800+ hours of quality recordings from more than 5,000 diverse speakers.
  • ✅ Coverage of the three major Nigerian languages (Igbo, Yoruba, and Hausa), along with our many speaking styles: young and elderly voices, diverse intonations, dialects, accents, and more.

Our dataset is released under the CC BY-NC-SA 4.0 license. For commercial use, please check out our memberships.

For partnerships & collaborations, reach out to our community at [email protected].

⬇️ Listen to some samples below.

Naijiria ga-ebido ntuliaka nke afọ a.

Onye isi nchekwa ga-enye nkọwa nke ihe ọ bụla mere n'ime ụlọ akwụkwọ ahụ.

Mba, anyi eleghị televishon

Enwere m ntụkwasị obi na ọ ga-enwe ntụkwasị obi na onwe ya n'oge ngosi

Mo ma lọ sí ọjà àárín gbùngbùn Kánò láti ra ata tí kò gbowó lórí

Arábìnrin yẹn àti òfófó ti di ọ̀rẹ́.

Wọn ní ki n wa gba ike Ìgbowó pélébé mi ni ọ̀la

Ò̩gbé̩ni Rauf Aré̩gbé̩s̩o̩lá fi ìgbà jé̩ gómìnà Ìpínlè̩ Ò̩s̩un

Folosi maganine da masu juna biyu ake amfani dashi.

Wacce cuta ce take da saurin kissa?

Dolle me yi yarjejeniyar zaman lafiya yanzu!

Tamim yana da rauni a ɓangaren dama na fuskarsa.

Explore Membership to Access Full Dataset