Data visualisation of Mozilla Common Voice v9 dataset metadata coverage in Observable

This visualisation uses “@d3/stacked-horizontal-bar-chart” to chart metadata coverage in the Common Voice dataset.

The original data is taken from the Common Voice cv-dataset repository.

Splits by age range – shows how many clips have been provided by speakers of different age ranges for each locale (language)
Splits by gender – shows how many clips have been provided by speakers of different genders for each locale (language)
Average utterance duration by language – shows the average utterance length, in seconds, for each language
Total hours versus validated hours by language – compares the number of hours recorded with the number of hours validated for each language
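
A stacked bar chart usually expects long-form rows, one per (locale, category) pair, so the per-locale metadata needs flattening before it is charted. The sketch below shows one way this could be done in plain JavaScript. The input shape (a `splits` object per locale, plus `totalHrs` and `validHrs` fields), the helper names `splitToRows` and `hoursToRows`, the "undeclared" label and all numbers are assumptions for illustration; they are not taken from the notebook or from the real dataset.

```javascript
// Illustrative input only: the field names (splits.age, totalHrs, validHrs)
// and every number here are assumptions, not values from the real dataset.
const sampleLocales = {
  en: {
    splits: { age: { twenties: 0.5, thirties: 0.3, "": 0.2 } },
    totalHrs: 10,
    validHrs: 8
  },
  cy: {
    splits: { age: { twenties: 0.25, "": 0.75 } },
    totalHrs: 4,
    validHrs: 3
  }
};

// Flatten one split (for example "age" or "gender") into long-form rows,
// one row per (locale, category) pair.
function splitToRows(locales, splitName) {
  const rows = [];
  for (const [locale, stats] of Object.entries(locales)) {
    const split = (stats.splits && stats.splits[splitName]) || {};
    for (const [category, value] of Object.entries(split)) {
      // Assumption: an empty key means the speaker gave no metadata.
      rows.push({
        locale,
        category: category === "" ? "undeclared" : category,
        value
      });
    }
  }
  return rows;
}

// Emit two rows per locale so total and validated hours can be compared.
function hoursToRows(locales) {
  return Object.entries(locales).flatMap(([locale, stats]) => [
    { locale, category: "total", value: stats.totalHrs },
    { locale, category: "validated", value: stats.validHrs }
  ]);
}

const ageRows = splitToRows(sampleLocales, "age");
const hourRows = hoursToRows(sampleLocales);
console.log(ageRows);
console.log(hourRows);
```

Rows in this shape can be passed to a chart that takes one accessor for the bar length (`value`), one for the bar position (`locale`) and one for the stack colour (`category`). Locales with no entry for the requested split simply produce no rows.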