Verifying the Menzerath-Altmann law in the verbal domain in 180 languages

A cross-linguistic test of one of the oldest quantitative laws of language: the more constituents a verb has, the shorter each one tends to be. Built on Universal Dependencies v2.17, this companion site lets you browse every language’s log-log regression, tweak thresholds, and reproduce every figure in the paper.

Pegah Faghiri, Kim Gerdes, Sylvain Kahane · LREC 2026 / UDW 2026

186

Languages analysed from UD v2.17

62/131

47% show a clear MAL preference (β > 0.1)

Languages with Anti-LMAL (preverbal domain goes the wrong way)

Languages with strong RMAL (postverbal compression)

21.6%

40/185 pass the universality test (β > 0, p < 0.05); 11 are significantly Anti-MAL.

Key findings

🌍 MAL is a widespread preference…

Across 180 languages from every major family, verbs with more constituents tend to have shorter constituents on average: the predicted Menzerath-Altmann compression really shows up cross-linguistically.

⚠️ …but not an absolute universal

Several languages display a flat or even opposite trend (Anti-MAL). Old East Slavic and Naija are headline examples — their per-language plots are featured below.

↔️ The verb is asymmetric

The MAL effect is stronger after the verb (RMAL), while Anti-MAL is concentrated before the verb (LMAL). Length-based ordering does not apply symmetrically.

🔁 Word order matters

VO languages compress more strongly on the postverbal side; OV languages on the preverbal side. Fisher’s exact tests confirm that this association is significant.
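
As a rough illustration of the kind of test mentioned above, here is a minimal sketch of a two-sided Fisher's exact test on a 2×2 table, written with the standard library only. The function name `fisher_exact_two_sided` and the counts are hypothetical; they are not the paper's contingency table, and the site may build its tables differently.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so floating-point ties count as "as extreme".
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts, for illustration only:
# rows = VO / OV languages, columns = stronger postverbal / stronger preverbal effect.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"two-sided p = {p_value:.4f}")
```

With real data, the four cells would be the language counts for each combination of word-order type and compression side.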

🧪 Universality? Only for some.

A permutation test on the slope β(1→max) flags only 40 of 185 languages (21.6%) as showing a significantly positive MAL effect (α = 0.05); 11 go significantly the other way. The p-values are listed in the big effect table.
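
The general idea of a permutation test on a slope can be sketched as follows. This is a generic version under stated assumptions, not the paper's procedure: it assumes one (n, size) pair per observation, shuffles the sizes against the constituent counts, and reports a one-sided p-value for β > 0. The helper names and the toy numbers are hypothetical.

```python
import math
import random

def beta_from_pairs(ns, sizes):
    """Return beta, the negated OLS slope of log(size) on log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(s) for s in sizes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx

def permutation_p_value(ns, sizes, n_perm=2000, seed=0):
    """One-sided permutation p-value for the hypothesis beta > 0."""
    rng = random.Random(seed)
    observed = beta_from_pairs(ns, sizes)
    shuffled = list(sizes)
    hits = 0
    for _ in range(n_perm):
        # Break any link between n and size, then refit the slope.
        rng.shuffle(shuffled)
        if beta_from_pairs(ns, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Made-up observations (n = number of constituents, size = mean constituent length).
toy_ns = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
toy_sizes = [4.0, 3.8, 3.5, 3.4, 3.1, 3.0, 2.9, 2.7, 2.6, 2.5, 2.4, 2.3]
beta, p = permutation_p_value(toy_ns, toy_sizes)
print(f"beta = {beta:+.3f}, one-sided p = {p:.4f}")
```

A language would count as significantly MAL under this sketch when β > 0 and p < 0.05; testing for a significantly negative β would use the mirrored comparison.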

See it in three languages

Live log-log regressions computed from the cached data — click any title to jump to the language’s row in the big effect table.

German — textbook MAL →

German · MAL · β = +0.171 MAL

A clean negative slope on the log-log plot (hence the positive β): every additional dependent shrinks the average constituent. Almost perfect power law (R² ≈ 0.99).

Old East Slavic — Anti-MAL →

OldEastSlavic · MAL · β = -0.255 Anti-MAL

Going the other way: verbs with more constituents come with longer constituents. One of only a handful of languages doing this.

Naija — strongest Anti-LMAL →

Naija · Left MAL · β = -0.734 Anti-MAL

Preverbally, Naija contradicts MAL more strongly than any other language in the sample — a near-perfect positive power law.
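
The sign convention in the three profiles above (a falling line gives a positive β, a rising line a negative one) can be illustrated with a small log-log fit. This is a minimal sketch assuming an unweighted ordinary least squares fit of log mean constituent size on log number of constituents, with β taken as the negated slope. The function name `mal_beta` and the toy values are hypothetical, and the site's own computation may weight or filter the points differently.

```python
import math

def mal_beta(mean_sizes):
    """Fit log(size) = log(a) - beta * log(n) by ordinary least squares.

    mean_sizes maps n (number of constituents) to the mean constituent size.
    Returns (beta, r_squared); beta > 0 means constituents shrink as n grows.
    """
    ns = sorted(mean_sizes)
    xs = [math.log(n) for n in ns]
    ys = [math.log(mean_sizes[n]) for n in ns]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return -slope, r_squared

# Made-up values, not taken from any treebank: an exact power law with beta = 0.2.
toy = {n: 3.0 * n ** -0.2 for n in range(1, 7)}
beta, r2 = mal_beta(toy)
print(f"beta = {beta:+.3f}, R^2 = {r2:.2f}")
```

Feeding the same function a rising series yields a negative β, which is how an Anti-MAL profile would appear under this convention.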

Reference paper:

Pegah Faghiri, Kim Gerdes, Sylvain Kahane (2026). Verifying the Menzerath-Altmann law in the verbal domain in 180 languages. UDW26 @ LREC 2026.

This site is the complete data presentation of the paper. It complements the printed analysis with sortable tables, clickable per-language log-log regressions, and the underlying notebook commentary.

Reading guide — paper ↔ web pages

Each row maps one section of the paper to the page on this site that illustrates it interactively.

Paper section	On this site
§1 Introduction	index.html — overview and references
§2 The Menzerath-Altmann law	Notebook Plots & Commentary — historical/empirical context
§3 UD constituent extraction	Notebook Plots & Commentary — construction of MAL_n from UD treebanks
§4 The MAL effect metric — β(1→∞)	MAL Effect — slopes for every language; more on MAL shows R² distributions and β(1→2) vs β(2→max)
§4.2 MAL compliance	MAL Compliance — local change scores β(n→n+1) and high/middle/low classification
§5.1 Results (MAL effect)	MAL Effect — paper Tables 1–3 reproduced at top of page; consistency scatter for the RMAL bias / Anti-LMAL bias
§5.2 Results (MAL compliance)	MAL Compliance Summary — paper Table 4 reproduced; box plots and family×transition heatmap
§5.3 Zoom on outliers	MAL Effect — green box at top with anchor links to each discussed language (Old East Slavic, Naija, Occitan, Bambara, Egyptian, Khoekhoe, Western Armenian, Gothic, Latin)
Appendix A — per-language plots	MAL Effect — the big sortable table at the bottom is the interactive equivalent of `tab:all-miniplots`
§6 Conclusion	Notebook Plots & Commentary