Verifying the Menzerath-Altmann law in the verbal domain in 180 languages

A cross-linguistic test of one of the oldest quantitative laws of language: the more constituents a verb has, the shorter each one tends to be. Built on Universal Dependencies v2.17, this companion site lets you browse every language’s log-log regression, tweak thresholds, and reproduce every number in the paper.
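To make "number of constituents" and "constituent length" concrete on a UD tree, here is a minimal sketch. It assumes that each non-punctuation dependent subtree of a VERB counts as one constituent and that length is measured in words. These choices, the toy sentences, and the function names (`verb_constituents`, `mal_table`) are hypothetical and are not the paper's extraction pipeline. A real CoNLL-U reader would also need to skip multiword-token ranges and empty nodes.

```python
from collections import defaultdict

# Hypothetical toy trees: (id, form, upos, head, deprel); head 0 = root.
SENT_1 = [  # "The old man saw a dog yesterday ."
    (1, "The", "DET", 3, "det"),
    (2, "old", "ADJ", 3, "amod"),
    (3, "man", "NOUN", 4, "nsubj"),
    (4, "saw", "VERB", 0, "root"),
    (5, "a", "DET", 6, "det"),
    (6, "dog", "NOUN", 4, "obj"),
    (7, "yesterday", "ADV", 4, "advmod"),
    (8, ".", "PUNCT", 4, "punct"),
]
SENT_2 = [  # "Dogs bark ."
    (1, "Dogs", "NOUN", 2, "nsubj"),
    (2, "bark", "VERB", 0, "root"),
    (3, ".", "PUNCT", 2, "punct"),
]

def subtree_size(node, children):
    """Number of words in the subtree rooted at `node`, the node itself included."""
    return 1 + sum(subtree_size(child, children) for child in children[node])

def verb_constituents(sentence):
    """For each VERB, list the sizes (in words) of its dependent subtrees.
    Assumption (hypothetical): one dependent subtree = one constituent; punct ignored."""
    children = defaultdict(list)
    deprel = {}
    for tid, _form, _upos, head, rel in sentence:
        children[head].append(tid)
        deprel[tid] = rel
    configurations = []
    for tid, _form, upos, _head, _rel in sentence:
        if upos == "VERB":
            sizes = [subtree_size(c, children) for c in children[tid] if deprel[c] != "punct"]
            if sizes:
                configurations.append(sizes)
    return configurations

def mal_table(configurations):
    """Map n (constituents per verb) -> mean constituent length over all such verbs."""
    total_len = defaultdict(int)
    n_constituents = defaultdict(int)
    for sizes in configurations:
        n = len(sizes)
        total_len[n] += sum(sizes)
        n_constituents[n] += n
    return {n: total_len[n] / n_constituents[n] for n in sorted(total_len)}

configs = verb_constituents(SENT_1) + verb_constituents(SENT_2)
print(configs)             # [[3, 2, 1], [1]]
print(mal_table(configs))  # {1: 1.0, 3: 2.0}
```

Two sentences are far too few to show a trend. On real treebanks, the resulting n → mean-length table is what gets plotted on log-log axes.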

Pegah Faghiri, Kim Gerdes, Sylvain Kahane · LREC 2026 / UDW 2026

📄 Read the paper (PDF) · Featured language profiles · 🔍 Interactive explorer
186 languages analysed from UD v2.17
62/131 (47%) show a clear MAL preference (β > 0.1)
29 languages with Anti-LMAL (the preverbal domain goes the wrong way)
81 languages with strong RMAL (postverbal compression)
21.6% (40/185) pass the universality test (β > 0, p < 0.05); 11 are significantly anti-MAL.

Key findings

🌍 MAL is a widespread preference…

Across 180 languages from every major family, verbs with more constituents tend to have shorter constituents on average: the compression predicted by the Menzerath-Altmann law does show up cross-linguistically.

⚠️ …but not an absolute universal

Several languages display a flat or even opposite trend (Anti-MAL). Old East Slavic and Naija are headline examples — their per-language plots are featured below.

↔️ The verb is asymmetric

The MAL effect is stronger after the verb (RMAL), while Anti-MAL is concentrated before the verb (LMAL). Length-based ordering does not apply symmetrically.

🔁 Word order matters

VO languages compress more strongly on the postverbal side, OV languages on the preverbal side. Fisher’s exact tests confirm that this difference is significant.
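Fisher's exact test checks whether two binary properties of languages are independent, given the table margins. Here that could mean basic order (VO vs OV) crossed with which side of the verb compresses more strongly. Below is a minimal stdlib-only sketch of the two-sided version. The 2×2 counts in the usage line are made up for illustration and are not the paper's data. `scipy.stats.fisher_exact` implements the same test.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    With row and column totals fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every possible table
    that is no more likely than the observed one.
    """
    row1, col1, total = a + b, a + c, a + b + c + d

    def prob(x):
        # P(top-left cell == x) given the fixed margins
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    p_observed = prob(a)
    lowest, highest = max(0, row1 + col1 - total), min(row1, col1)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = prob(x)
        if p <= p_observed * (1 + 1e-7):  # small tolerance for floating-point ties
            p_value += p
    return min(p_value, 1.0)

# Hypothetical counts: rows = VO / OV, columns = stronger RMAL / stronger LMAL.
print(fisher_exact_2x2(8, 2, 1, 5))  # ≈ 0.035
```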

🧪 Universality? Only for some.

A permutation test on the slope β(1→max) flags only 40 of 185 languages (21.6%) as showing a significantly positive MAL effect (α = 0.05); 11 show a significant effect in the opposite direction. The per-language p-values are listed in the big effect table.
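Here is one possible reading of such a test, as a minimal sketch under stated assumptions. Per-verb observations (number of constituents n, mean constituent length) are fitted on log-log axes, and β is taken as the negated OLS slope, so β > 0 means compression. The null distribution comes from shuffling lengths across verbs, and the p-value is one-sided with the usual add-one correction. The function names, the permuted unit, and the toy data are all hypothetical, not the paper's code.

```python
import math
import random

def mal_beta(ns, lengths):
    """Negated OLS slope of log(length) on log(n); > 0 means compression (assumed sign)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(y) for y in lengths]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx

def permutation_p_value(ns, lengths, n_perm=2000, seed=0):
    """Hypothetical one-sided permutation test of H1: beta > 0.

    Shuffling the lengths across observations destroys any link between n and
    length; the p-value is the share of shuffles (plus the observed data, the
    add-one correction) whose beta is at least the observed one.
    """
    rng = random.Random(seed)
    observed = mal_beta(ns, lengths)
    shuffled = list(lengths)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mal_beta(ns, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy per-verb observations (hypothetical) following y = 3 / n exactly, so beta = 1.
ns = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
lengths = [3.0 / n for n in ns]
beta, p = permutation_p_value(ns, lengths)
print(round(beta, 3), p < 0.05)  # 1.0 True
```

A perfectly flat language (all lengths equal) gets p = 1, because every shuffle reproduces the observed slope.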

See it in three languages

Live log-log regressions computed from the cached data — click any title to jump to the language’s row in the big effect table.
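Each per-language plot is a straight-line fit on log-log axes. Assume the classical Menzerath-Altmann form y = a·n^(-b), where y is the mean constituent length of verbs with n constituents. Taking logs gives log y = log a - b·log n, so an ordinary least-squares line through the (log n, log y) points recovers b as the negated slope and also yields an R². This sketch fits the aggregated points without frequency weights. `fit_mal` and that unweighted choice are assumptions for illustration.

```python
import math

def fit_mal(mal_n):
    """Unweighted OLS fit of log y = log a - b * log n (needs at least two distinct n).

    `mal_n` maps n (constituents per verb) -> mean constituent length y.
    Returns (a, b, r2). b > 0 means constituents shrink as n grows.
    r2 is NaN when all y are equal (no variance to explain).
    """
    ns = sorted(mal_n)
    xs = [math.log(n) for n in ns]
    ys = [math.log(mal_n[n]) for n in ns]
    k = len(xs)
    mean_x, mean_y = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy * sxy / (sxx * syy) if syy > 0 else math.nan
    return math.exp(intercept), -slope, r2

# Toy data lying exactly on y = 4 * n**-0.3
a, b, r2 = fit_mal({n: 4.0 * n ** -0.3 for n in range(1, 7)})
print(round(a, 3), round(b, 3), round(r2, 3))  # 4.0 0.3 1.0
```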

Reference paper:

Pegah Faghiri, Kim Gerdes, Sylvain Kahane (2026). Verifying the Menzerath-Altmann law in the verbal domain in 180 languages. UDW26 @ LREC 2026.

This site is the complete data presentation of the paper. It complements the printed analysis with sortable tables, clickable per-language log-log regressions, and the underlying notebook commentary.

Reading guide — paper ↔ web pages

Each row maps one section of the paper to the page on this site that illustrates it interactively.

Paper section | On this site
§1 Introduction | index.html: overview and references
§2 The Menzerath-Altmann law | Notebook Plots & Commentary: historical and empirical context
§3 UD constituent extraction | Notebook Plots & Commentary: construction of MAL_n from UD treebanks
§4 The MAL effect metric, β(1→∞) | MAL Effect: slopes for every language; "more on MAL" shows R² distributions and β(1→2) vs β(2→max)
§4.2 MAL compliance | MAL Compliance: local change scores β(n→n+1) and high/middle/low classification
§5.1 Results (MAL effect) | MAL Effect: paper Tables 1–3 reproduced at the top of the page; consistency scatter plots for the RMAL bias and the Anti-LMAL bias
§5.2 Results (MAL compliance) | MAL Compliance Summary: paper Table 4 reproduced; box plots and family×transition heatmap
§5.3 Zoom on outliers | MAL Effect: green box at the top with anchor links to each discussed language (Old East Slavic, Naija, Occitan, Bambara, Egyptian, Khoekhoe, Western Armenian, Gothic, Latin)
Appendix A: per-language plots | MAL Effect: the big sortable table at the bottom is the interactive equivalent of the paper's appendix table of per-language mini-plots
§6 Conclusion | Notebook Plots & Commentary
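The §4.2 local change scores β(n→n+1) can be pictured as two-point slopes on the same log-log axes. This sketch assumes β(n→m) = -(log y_m - log y_n)/(log m - log n), so a positive score means constituents shrink between n and m. On one possible reading, β(2→max) is the two-point slope from 2 to the largest observed n. These definitions, the function names, and the toy values are hypothetical. The site's high/middle/low cut-offs are not reproduced.

```python
import math

def local_beta(mal_n, n, m):
    """Assumed two-point MAL slope between counts n < m on log-log axes.
    Positive when mean constituent length drops from n to m."""
    return -(math.log(mal_n[m]) - math.log(mal_n[n])) / (math.log(m) - math.log(n))

def change_scores(mal_n):
    """beta(n -> n+1) for every pair of consecutive observed counts."""
    ns = sorted(mal_n)
    return {(n, m): local_beta(mal_n, n, m) for n, m in zip(ns, ns[1:]) if m == n + 1}

# Toy MAL_n values (hypothetical): shrink, shrink slightly, then grow.
toy = {1: 3.0, 2: 2.0, 3: 1.9, 4: 2.5}
scores = change_scores(toy)
for step, beta in scores.items():
    print(step, round(beta, 3))
print("beta(2->max):", round(local_beta(toy, 2, max(toy)), 3))
```

For data lying exactly on a power law y = a·n^(-b), every local score equals b. Deviations from that constant are what the compliance classification is meant to capture.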

Browse the data

Source code

All scripts, notebooks, and the source of this site are available on GitHub:

↗ https://github.com/typometrics/UDW26-Menzerath