A cross-linguistic test of one of the oldest quantitative laws of language: the more constituents a verb has, the shorter each one tends to be. Built on Universal Dependencies v2.17, this companion site lets you browse every language’s log-log regression, tweak thresholds, and reproduce every figure in the paper.
Across 180 languages from every major family, verbs with more constituents tend to have shorter constituents on average: the predicted Menzerath-Altmann compression really shows up cross-linguistically.
Several languages display a flat or even opposite trend (Anti-MAL). Old East Slavic and Naija are headline examples — their per-language plots are featured below.
The MAL effect is stronger after the verb (RMAL), while Anti-MAL is concentrated before the verb (LMAL). Length-based ordering does not apply symmetrically.
VO languages compress more strongly on the postverbal side; OV languages on the preverbal side. Fisher’s exact tests confirm that this asymmetry is significant (all p-values are reported on the site).
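To make the kind of test involved concrete, here is a minimal sketch of a two-sided Fisher’s exact test on a 2×2 table (word order × side with the stronger compression). The function name and the counts are hypothetical toy values chosen for illustration, not the paper’s data, and the paper’s own tables may be laid out differently.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Toy counts (NOT from the paper):
#                stronger after verb   stronger before verb
# VO languages            8                     2
# OV languages            1                     5
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p_value:.4f}")
```

The small tolerance factor guards against floating-point ties when comparing table probabilities.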
A permutation test on the slope β(1→max) flags only 40 of 185 languages (21.6%) as showing a significantly positive MAL effect (α=0.05); 11 go the other way. The p-values are listed in the big effect table, alongside an explanation of how the test works.
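As a rough illustration of how such a permutation test can work, the sketch below shuffles constituent sizes across verbs and counts how often the shuffled data yields a β at least as large as the observed one. Several things here are assumptions rather than the paper’s procedure: the shuffling scheme, the one-sided p-value, the convention that β is the negated log-log slope (so β > 0 means MAL), the function names, and the synthetic data.

```python
import math
import random
from collections import defaultdict


def mal_beta(observations):
    """Negated log-log slope of mean constituent size against constituent count."""
    sizes_by_n = defaultdict(list)
    for n, size in observations:
        sizes_by_n[n].append(size)
    ns = sorted(sizes_by_n)
    xs = [math.log(n) for n in ns]
    ys = [math.log(sum(sizes_by_n[n]) / len(sizes_by_n[n])) for n in ns]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx


def permutation_test(observations, n_perm=1000, seed=0):
    """One-sided p-value for beta > 0, obtained by shuffling sizes across verbs."""
    rng = random.Random(seed)
    observed = mal_beta(observations)
    counts = [n for n, _ in observations]
    sizes = [size for _, size in observations]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(sizes)  # break any link between count and size
        if mal_beta(zip(counts, sizes)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic observations (NOT real treebank data): 40 verbs for each
# constituent count n, with sizes scattered around 6 * n^-0.3.
data_rng = random.Random(1)
observations = [
    (n, max(1.0, data_rng.gauss(6.0 * n ** -0.3, 1.0)))
    for n in range(1, 6)
    for _ in range(40)
]
beta, p_value = permutation_test(observations)
print(f"beta = {beta:+.3f}, p = {p_value:.4f}")
```

Testing for Anti-MAL under the same assumptions would simply flip the comparison to count shuffles with β at most as small as the observed value.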
Live log-log regressions computed from the cached data — click any title to jump to the language’s row in the big effect table.
German · MAL · β = +0.171 MAL
A clean negative slope: every additional dependent shrinks the average constituent. Almost perfect power law (R² ≈ 0.99).
Old East Slavic · MAL · β = -0.255 Anti-MAL
Going the other way: verbs with more constituents come with longer constituents. One of only a handful of languages doing this.
Naija · Left MAL · β = -0.734 Anti-MAL
Preverbally, Naija contradicts MAL more strongly than any other language in the sample — a near-perfect positive power law.
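The profiles above read β and R² off a straight-line fit in log-log space. The following sketch shows one way to compute both from (number of constituents, mean constituent size) pairs. The sign convention (β as the negated slope, so that a falling curve gives β > 0) is inferred from the labels above rather than taken from the paper, and the function name and data points are hypothetical.

```python
import math


def loglog_fit(points):
    """Fit log(mean_size) = a + slope * log(n); return (beta, r_squared).

    beta is the NEGATED slope, so beta > 0 means constituents shrink (MAL)
    and beta < 0 means they grow (Anti-MAL).
    """
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(size) for _, size in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # For simple linear regression, R^2 is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return -slope, r_squared


# Synthetic points following an exact power law, size = 6 * n^-0.2
# (illustrative only, not a real language profile).
points = [(n, 6.0 * n ** -0.2) for n in range(1, 7)]
beta, r_squared = loglog_fit(points)
print(f"beta = {beta:+.3f}, R^2 = {r_squared:.3f}")
```

A perfect power law is a straight line in log-log space, which is why R² close to 1 is read as a near-perfect power law.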
Reference paper:
Pegah Faghiri, Kim Gerdes, Sylvain Kahane (2026). Verifying the Menzerath-Altmann law in the verbal domain in 180 languages. UDW26 @ LREC 2026.
This site is the complete data presentation of the paper. It complements the printed analysis with sortable tables, clickable per-language log-log regressions, and the underlying notebook commentary.
Each row maps one section of the paper to the page on this site that illustrates it interactively.
| Paper section | On this site |
|---|---|
| §1 Introduction | index.html — overview and references |
| §2 The Menzerath-Altmann law | Notebook Plots & Commentary — historical/empirical context |
| §3 UD constituent extraction | Notebook Plots & Commentary — construction of MAL_n from UD treebanks |
| §4 The MAL effect metric — β(1→∞) | MAL Effect — slopes for every language; more on MAL shows R² distributions and β(1→2) vs β(2→max) |
| §4.2 MAL compliance | MAL Compliance — local change scores β(n→n+1) and high/middle/low classification |
| §5.1 Results (MAL effect) | MAL Effect — paper Tables 1–3 reproduced at top of page; consistency scatter for the RMAL bias / Anti-LMAL bias |
| §5.2 Results (MAL compliance) | MAL Compliance Summary — paper Table 4 reproduced; box plots and family×transition heatmap |
| §5.3 Zoom on outliers | MAL Effect — green box at top with anchor links to each discussed language (Old East Slavic, Naija, Occitan, Bambara, Egyptian, Khoekhoe, Western Armenian, Gothic, Latin) |
| Appendix A — per-language plots | MAL Effect — the big sortable table at the bottom is the interactive equivalent of tab:all-miniplots |
| §6 Conclusion | Notebook Plots & Commentary |
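The local change scores β(n→n+1) mentioned for §4.2 can be pictured as slopes between neighbouring points of the log-log curve. The sketch below assumes exactly that definition (the negated slope between the points for n and n+1), which may differ from the paper’s. It leaves out the high/middle/low classification because its thresholds are not stated here. The function name and mean sizes are made up.

```python
import math


def local_change_scores(mean_size_by_n):
    """Negated log-log slope for each transition n -> n+1 that has data."""
    scores = {}
    for n in sorted(mean_size_by_n):
        if n + 1 in mean_size_by_n:
            rise = math.log(mean_size_by_n[n + 1]) - math.log(mean_size_by_n[n])
            run = math.log(n + 1) - math.log(n)
            scores[(n, n + 1)] = -rise / run
    return scores


# Made-up mean constituent sizes per number of constituents.
means = {1: 6.0, 2: 5.0, 3: 5.2, 4: 4.6}
scores = local_change_scores(means)
for (start, end), score in scores.items():
    label = "MAL-compliant" if score > 0 else "not MAL-compliant"
    print(f"beta({start}->{end}) = {score:+.3f}  {label}")
```

Under this reading, a language can have a positive overall β while still failing MAL at individual transitions, as the 2→3 step does in the toy data.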
All scripts, notebooks, and the source of this site are available on GitHub: