Pronunciation Visualization for Cebuano

Fullstack Engineer

Language Learning Advisor
We are launching the first version of Dictionarying's pronunciation visualization engine, and we chose Cebuano as our first language. This post explains why Cebuano, what the science behind the visualization actually is, and how we built it — including the specific acoustic measurements that drive every animation and waveform on the platform.
Why Cebuano
Cebuano (also called Bisaya or Binisaya) is the second most widely spoken language in the Philippines, with an estimated 20–27 million native speakers primarily in the Visayas and Mindanao regions. Despite this scale, it is significantly underrepresented in digital language-learning tools compared to Tagalog or Filipino.
More importantly for our purposes, Cebuano has a well-documented phonological stress system — one that is rich enough to be genuinely useful for learners, and specific enough that imprecise pronunciation causes real misunderstandings. Stress in Cebuano is lexically contrastive: the same sequence of sounds can mean different things depending on which syllable carries stress. This makes pronunciation accuracy meaningful, not cosmetic.
That combination of a large speaker base, thin coverage in existing tools, and a phonologically interesting stress system made Cebuano the right first language for a dictionary built around making pronunciation visible.
The Acoustic Basis of the Visualization
Every animation, waveform, and stress marker on the platform is grounded in acoustic measurements of native speaker speech. We do not interpolate or approximate. Here is what the data shows.

Duration is the Primary Cue
The most comprehensive experimental study of Cebuano stress to date — Xu (2020), Cebuano Stress: Phonetic Cues and Phonological Pattern — analysed stressed and unstressed syllables in disyllabic words using Praat-based acoustic measurement across multiple native speakers. The findings are unambiguous:
Stressed syllables are substantially longer. The mean duration difference between stressed and unstressed syllables was +54.2 ms — a large, consistent, and perceptually salient effect with minimal overlap between distributions. This is the primary cue to stress in Cebuano, and it is the most prominent feature encoded in our visualizations. When you see a syllable marker expand on screen, that proportional expansion directly reflects this durational relationship.
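As a rough illustration of how a durational relationship can become on-screen sizing, the sketch below scales each syllable marker by the syllable's measured duration relative to the mean unstressed duration in the same word. The `Syllable` type, the `marker_widths` function, the `base_px` parameter, and the sample durations are all hypothetical; this is not the platform's rendering code.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    text: str           # orthographic syllable, e.g. "ba"
    duration_ms: float  # measured duration from an annotated recording
    stressed: bool      # lexical stress flag for this syllable


def marker_widths(syllables: list[Syllable], base_px: float = 40.0) -> list[float]:
    """Return one marker width per syllable (hypothetical scheme).

    Widths are proportional to duration, normalised so that a syllable as long
    as the word's mean unstressed syllable gets exactly `base_px`.
    """
    unstressed = [s.duration_ms for s in syllables if not s.stressed]
    if not unstressed:
        raise ValueError("need at least one unstressed syllable as a reference")
    reference_ms = sum(unstressed) / len(unstressed)
    return [base_px * s.duration_ms / reference_ms for s in syllables]


# Illustrative durations only, not measurements: tra-BA-ho
word = [
    Syllable("tra", 120.0, stressed=False),
    Syllable("ba", 180.0, stressed=True),
    Syllable("ho", 120.0, stressed=False),
]
print(marker_widths(word))  # [40.0, 60.0, 40.0]
```

Normalising against the same word's unstressed syllables is one way to keep the sizing relative, so that an overall slower recording does not enlarge every marker; other reference choices would work as well.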
Pitch and Intensity are Secondary Cues
The same study found two additional acoustic correlates of stress, both statistically significant but with considerably smaller effect sizes:
- Fundamental Frequency (F0): Stressed syllables showed a mean pitch increase of +9.23 Hz compared to adjacent unstressed syllables. This effect is real but small — the box-plot distributions overlap substantially, meaning F0 alone is not a reliable stress cue in Cebuano.
- Intensity: Stressed syllables were louder by a mean of +2.14 dB. Again, the difference is statistically significant but makes a weak perceptual cue, with high overlap between distributions.
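To make the gap between a significant mean difference and a reliable cue concrete, here is a minimal sketch of how paired stressed-minus-unstressed differences could be summarised per cue. Alongside the mean difference (the kind of figure Xu (2020) reports), it computes the share of tokens in which the stressed syllable actually scores higher; a cue with a positive mean but a share near 0.5 is a poor guide in individual words. The `Token` type, the cue names, and the toy values are assumptions for illustration, not data from the study.

```python
import statistics
from dataclasses import dataclass

CUES = ("duration_ms", "f0_hz", "intensity_db")


@dataclass
class Token:
    """Measurements for the stressed and unstressed syllable of one word token."""
    stressed: dict[str, float]
    unstressed: dict[str, float]


def summarise_cues(tokens: list[Token]) -> dict[str, tuple[float, float]]:
    """Per cue: (mean stressed-minus-unstressed difference, share of tokens where it is > 0)."""
    summary = {}
    for cue in CUES:
        diffs = [t.stressed[cue] - t.unstressed[cue] for t in tokens]
        mean_diff = statistics.mean(diffs)  # raises StatisticsError on empty input
        share_positive = sum(d > 0 for d in diffs) / len(diffs)
        summary[cue] = (mean_diff, share_positive)
    return summary


# Toy values for illustration only, not measurements
tokens = [
    Token({"duration_ms": 180.0, "f0_hz": 210.0, "intensity_db": 70.0},
          {"duration_ms": 120.0, "f0_hz": 200.0, "intensity_db": 68.0}),
    Token({"duration_ms": 170.0, "f0_hz": 195.0, "intensity_db": 66.0},
          {"duration_ms": 120.0, "f0_hz": 200.0, "intensity_db": 67.0}),
]
for cue, (mean_diff, share) in summarise_cues(tokens).items():
    print(f"{cue}: mean {mean_diff:+.1f}, stressed higher in {share:.0%} of tokens")
```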
The practical implication: in a phrase like Unsa diay imong trabaho? (roughly "So what is your job?"), the stressed syllables (ÚN-, DI-, MÓNG, BÁ-) carry pitch and amplitude that are slightly elevated relative to the unstressed syllables (sa, ay, i-, tra-, ho), but the dominant perceptual marker is duration. Our waveform visualization reflects all three dimensions (duration, F0 contour, and amplitude envelope) so learners can observe the full acoustic picture, not just the most prominent feature.
This finding also aligns with the broader typological picture for Philippine languages. Shryock's (1993) foundational metrical analysis of Cebuano described stress assignment in terms of an iambic foot structure, with the penultimate syllable as the default stress position — a pattern confirmed and refined by the acoustic evidence in Xu (2020).
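The idea of a default stress position can be sketched as a lookup with a fallback: use a lexically specified stress index when the dictionary entry has one, otherwise stress the penultimate syllable. This is a deliberate simplification that ignores the foot structure and syllable-weight conditions in Shryock's (1993) analysis; the `stress_index` and `render` helpers are hypothetical, and the syllabifications are supplied by hand.

```python
from typing import Optional


def stress_index(syllables: list[str], lexical: Optional[int] = None) -> int:
    """Index of the stressed syllable: a lexical marking wins, else the penultimate default."""
    if not syllables:
        raise ValueError("word has no syllables")
    if lexical is not None:
        if not 0 <= lexical < len(syllables):
            raise IndexError("lexical stress index out of range")
        return lexical
    return max(len(syllables) - 2, 0)  # a monosyllable takes stress on its only syllable


def render(syllables: list[str], lexical: Optional[int] = None) -> str:
    """Hyphenate the word and upper-case the stressed syllable, e.g. tra-BA-ho."""
    i = stress_index(syllables, lexical)
    return "-".join(s.upper() if j == i else s for j, s in enumerate(syllables))


print(render(["tra", "ba", "ho"]))       # tra-BA-ho (penultimate default)
print(render(["i", "mong"], lexical=1))  # i-MONG (stress supplied by the dictionary entry)
```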
Intonation vs. Lexical Stress
An important distinction for learners: Cebuano has both lexical stress (which syllable within a word carries prominence) and phrasal intonation (how F0 moves across a whole utterance). In questions, global F0 typically rises toward the end of the phrase, but this rise sits on top of the lexical stress pattern rather than replacing it. Stressed syllables within a question still show their characteristic F0 peak and duration relative to their local context.
Our visualizations handle these two layers separately: the stress markers encode lexical prominence, while the prosody overlay encodes phrasal intonation. This separation is deliberate and reflects current phonetic understanding of how the two systems interact.
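One simple way to keep the two layers apart, sketched here under a strong simplifying assumption, is to fit a straight line to the utterance's F0 track as a stand-in for the phrasal contour and look for local peaks in what remains. A real intonation model would be more elaborate than a linear trend; `split_layers`, `residual_peaks`, and the sample track are hypothetical and not the platform's implementation.

```python
def split_layers(times: list[float], f0: list[float]) -> tuple[list[float], list[float]]:
    """Split an F0 track into a least-squares linear trend (phrasal layer) and the residual."""
    n = len(times)
    if n < 2 or n != len(f0):
        raise ValueError("need at least two (time, F0) pairs of equal length")
    mean_t = sum(times) / n
    mean_f = sum(f0) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be equal")
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0))
    slope = sxy / sxx
    intercept = mean_f - slope * mean_t
    trend = [intercept + slope * t for t in times]
    residual = [f - tr for f, tr in zip(f0, trend)]
    return trend, residual


def residual_peaks(residual: list[float]) -> list[int]:
    """Indices of interior local maxima that sit above the trend (residual > 0)."""
    return [
        i for i in range(1, len(residual) - 1)
        if residual[i] > 0 and residual[i] > residual[i - 1] and residual[i] > residual[i + 1]
    ]


# Toy rising contour (100 Hz + 10 Hz per step) with a local bump at step 1
times = [0.0, 1.0, 2.0, 3.0, 4.0]
f0 = [100.0, 116.0, 120.0, 130.0, 140.0]
trend, residual = split_layers(times, f0)
print(residual_peaks(residual))  # [1]
```

In this toy track the steady rise, standing in for a question contour, is absorbed by the trend, while the local bump at step 1 survives in the residual as a candidate lexical peak.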
How We Built It
Acoustic Analysis Workflow
We build pronunciation models from native-speaker recordings analysed in Praat, the standard acoustic analysis tool in phonetics research, developed by Paul Boersma and David Weenink at the Institute of Phonetic Sciences, University of Amsterdam. Praat lets us extract precise measurements of duration, F0 trajectory, and intensity envelope at the phoneme level for every recorded token.
From these measurements we derive:
- Duration ratios between stressed and unstressed syllables in each word and phrase — these drive the proportional sizing of syllable markers in the animation.
- F0 trajectories normalised to the speaker's pitch range — these drive the pitch contour overlay.
- Intensity envelopes — these drive the amplitude visualization in the waveform.
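The F0 and intensity derivations can be sketched as follows, with two assumptions flagged up front: the speaker's pitch range is taken as a given floor and ceiling in Hz (in practice it might be estimated from that speaker's voiced frames), and F0 is normalised on a semitone scale so that equal pitch intervals map to equal distances on screen. The intensity envelope here is a plain frame-wise RMS level in dB. The function names, frame sizes, and reference level are hypothetical, and this is not a reimplementation of Praat's own pitch or intensity analyses.

```python
import math
from typing import Optional


def normalise_f0(track: list[Optional[float]], floor_hz: float, ceiling_hz: float) -> list[Optional[float]]:
    """Map each F0 value onto 0..1 within the speaker's range, measured in semitones.

    Unvoiced frames (None or <= 0) stay None; values outside the range are clipped.
    """
    if not 0 < floor_hz < ceiling_hz:
        raise ValueError("expected 0 < floor_hz < ceiling_hz")
    span_st = 12.0 * math.log2(ceiling_hz / floor_hz)
    out: list[Optional[float]] = []
    for f in track:
        if f is None or f <= 0:
            out.append(None)
            continue
        position = 12.0 * math.log2(f / floor_hz) / span_st
        out.append(min(1.0, max(0.0, position)))
    return out


def intensity_envelope_db(samples: list[float], frame: int, hop: int, ref: float = 1.0) -> list[float]:
    """Frame-wise RMS level in dB relative to `ref` (a small floor avoids log of zero)."""
    if frame <= 0 or hop <= 0:
        raise ValueError("frame and hop must be positive")
    envelope = []
    for start in range(0, len(samples) - frame + 1, hop):
        window = samples[start:start + frame]
        rms = math.sqrt(sum(x * x for x in window) / frame)
        envelope.append(20.0 * math.log10(max(rms, 1e-12) / ref))
    return envelope


print(normalise_f0([100.0, 200.0, None, 400.0], floor_hz=100.0, ceiling_hz=400.0))
# [0.0, 0.5, None, 1.0]
```

Working in semitones rather than raw Hz means that a higher-voiced and a lower-voiced speaker produce comparable contours for the same relative pitch movement.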
This workflow is informed by the methodology established in the Seeing Speech project at the University of Glasgow, which pioneered the use of articulatory visualization for language learning and whose ultrasound-based approach to making speech visible inspired the founding of this platform.
Phonological Grounding
Beyond the acoustic measurements, the stress assignment patterns are cross-referenced against the phonological and phonetic literature on Cebuano and its relatives. Our primary sources:
- Shryock (1993) — the foundational metrical analysis of Cebuano stress, establishing the iambic foot and penultimate default: A metrical analysis of stress in Cebuano, Lingua 91, pp. 103–148
- Xu (2020) — the experimental acoustic study: Cebuano Stress: Phonetic Cues and Phonological Pattern
- Himmelmann & Kaufman (2018) — the typological overview of prosody across Austronesian languages, situating Cebuano within its broader language family: Prosodic systems: Austronesia
- Liwanag (2012) — a cross-linguistic comparison that directly contrasts stress cues in three Philippine languages, confirming the primacy of duration in Cebuano relative to its close relatives: Acoustic Correlates of Stress in Ilocano, Cebuano, and Tagalog
What Is Visualised
For each Cebuano word and phrase currently on the platform, users can observe:
- Articulatory animation — showing lip, tongue, and jaw position during production of each sound, based on articulatory data cross-referenced with the Seeing Speech database
- Syllable stress markers — proportionally scaled to reflect the durational difference between stressed and unstressed syllables
- Waveform display — showing the full acoustic signal with F0 contour and amplitude envelope visible at the phoneme level
- Prosodic overlay for phrases — showing the intonation contour across the full utterance, distinguishing phrasal pitch movement from lexical stress peaks
What Comes Next
Cebuano is the foundation. The acoustic analysis pipeline, the annotation workflow, and the visualization rendering engine are now established — built to generalize to additional languages without rebuilding from scratch.
The choice of subsequent languages will be guided by the same criteria that led us to Cebuano: speaker population size, underrepresentation in existing learning tools, and phonological richness that makes visual feedback genuinely useful. We will announce the next language once the models reach the quality threshold we set for Cebuano.
You can explore the Cebuano pronunciation tool at dictionarying.com/cebuano.
If you have questions about the methodology, want to report an error in a word’s stress annotation, or suggest a language for future development, you can contact us via dictionarying.com/contact.
References
- Xu, S. C. A. (2020). Cebuano Stress: Phonetic Cues and Phonological Pattern. https://www.scangelaxu.com/pdf/Xu_CebuanoStress2020.pdf
- Shryock, A. (1993). A metrical analysis of stress in Cebuano. Lingua, 91, 103–148. https://www.sciencedirect.com/science/article/abs/pii/002438419390010T
- Himmelmann, N. P. & Kaufman, D. (2018). Prosodic systems: Austronesia. https://bahasawan.com/wp-content/uploads/2019/11/Himmelmann-and-Kaufman-to-appear-Austronesian-Prosodic-Systems.pdf
- Liwanag, M. H. C. (2012). Acoustic Correlates of Stress in Ilocano, Cebuano, and Tagalog. The 2nd Philippine Conference-Workshop on Mother Tongue-based Multilingual Education. https://www.researchgate.net/publication/361242190
- Seeing Speech project, University of Glasgow. https://www.seeingspeech.ac.uk/
- Institute of Phonetic Sciences, University of Amsterdam. https://www.uva.nl/en/research/research-institutes/uil-ots/phonetic-sciences.html
- Boersma, P. & Weenink, D. Praat: doing phonetics by computer. https://www.fon.hum.uva.nl/praat/