Typology dashboard: family & VO maps, MAL profiles, compliance, and per-language curves

Pegah Faghiri, Kim Gerdes, Sylvain Kahane (2026). Verifying the Menzerath-Altmann law in the verbal domain in 180 languages. UDW26 @ LREC 2026.

The MAL-effect map on the per-direction pages (MAL, LMAL, RMAL) is colored by β. The two maps below plot the same WALS coordinates but color them by qualitative typological categories (language family and basic word order), so the geographic clustering of OV languages in Eurasia and the spread of language families across continents can be inspected directly.

Map by language family

With 33 families on a single chart, each language is encoded along two visual dimensions so that legend entries are easier to match to map markers. Marker shape indicates the genealogical macro-group (Indo-European, Afro-Asiatic, Other Eurasian, Sub-Saharan, Pacific, Northern Americas, South American); marker color identifies the specific family within that macro-group and is drawn from a dedicated sub-palette, so within any one shape the colors are mutually distinct. Entries that fit none of these macro-groups (Mixed, other) are listed under Other / unclassified.

Family (color) · Macro-group (shape)
  • Indo-European (1)
      • Indo-European
  • Afro-Asiatic (1)
      • Afro-Asiatic
  • Other Eurasian (16)
      • Altaic
      • Austro-Asiatic
      • Basque
      • Chukotko-Kamchatkan
      • Constructed
      • Dravidian
      • Japanese
      • Kartvelian
      • Korean
      • Mongolic
      • Northwest Caucasian
      • Sino-Tibetan
      • Tai-Kadai
      • Tungusic
      • Turkic
      • Uralic
  • Sub-Saharan (3)
      • Khoe-Kwadi
      • Mande
      • Niger-Congo
  • Pacific (2)
      • Austronesian
      • Pama-Nyungan
  • Northern Americas (5)
      • Chibchan
      • Eskimo-Aleut
      • Mayan
      • Na-Dene
      • Uto-Aztecan
  • South American (5)
      • Arauan
      • Arawakan
      • Bororoan
      • Macro-Ge
      • Tupian
  • Other / unclassified (2)
      • Mixed
      • other
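The shape-plus-color encoding described above can be sketched as follows. This is only an illustration: the shape names, the evenly spaced hues, and the function name `build_styles` are assumptions, not the dashboard's actual palette or code, and only three macro-groups from the legend are included to keep it short.

```python
import colorsys

# Subset of the legend above; the shape names are hypothetical placeholders.
MACRO_GROUPS = {
    "Sub-Saharan": ["Khoe-Kwadi", "Mande", "Niger-Congo"],
    "Pacific": ["Austronesian", "Pama-Nyungan"],
    "South American": ["Arauan", "Arawakan", "Bororoan", "Macro-Ge", "Tupian"],
}
SHAPES = {"Sub-Saharan": "triangle", "Pacific": "diamond", "South American": "square"}


def build_styles(groups, shapes):
    """Map each family to a marker shape (its macro-group) and a color.

    Colors come from a per-group sub-palette. Here the sub-palette is simply
    evenly spaced hues (an assumption), which guarantees that families
    sharing a shape never share a color.
    """
    styles = {}
    for group, families in groups.items():
        for i, family in enumerate(families):
            r, g, b = colorsys.hsv_to_rgb(i / len(families), 0.75, 0.85)
            color = "#{:02x}{:02x}{:02x}".format(
                round(r * 255), round(g * 255), round(b * 255)
            )
            styles[family] = {"shape": shapes[group], "color": color}
    return styles


STYLES = build_styles(MACRO_GROUPS, SHAPES)
```

Colors may repeat across macro-groups in this sketch; that is acceptable because the marker shape already separates the groups.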

Map by VO/OV/NDO type

Word order
  • VO
  • OV
  • NDO (no dominant order)
  • unknown

MAL profile map (PCA of per-language MAL/LMAL/RMAL curves)

Each language is summarized by its MAL, LMAL, and RMAL values at n = 2..5 (3 directions × 4 sizes = 12 features). After standardization, these vectors are projected to 2D with PCA: the horizontal axis (PC1) explains 64.7% of the variance, the vertical axis (PC2) 13.1%.

Dots sit at their real PCA positions. Labels are pushed apart by a force simulation, with a leader line back to the dot whenever a label is displaced, so every language stays readable. When several languages collapse onto the same projected coordinate, their dots are fanned out on a small dashed ring drawn at that exact PCA position: the ring marks the true location, the dots its members. Color and shape encoding mirrors the family map above. Hover any point to see its MAL/LMAL/RMAL profile.

The orange arrows are the biplot loadings: each of the 12 input features (direction × n) is shown as a vector from the centroid. Features pointing the same way co-vary across languages, and a feature pointing toward a region of the cloud is high for the languages in that region.
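The standardize-then-project step can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: the input is synthetic stand-in data (not the paper's measurements), the eigenvectors are found by power iteration with deflation rather than whatever routine the dashboard uses, and all function and variable names are hypothetical.

```python
import math
import random


def standardize(rows):
    """Z-score every column (assumes no column is constant)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(z):
    """Covariance matrix of rows that are already centered."""
    n, d = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / n for b in range(d)] for a in range(d)]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def top_components(cov, k=2, iters=300):
    """Leading k (eigenvalue, unit eigenvector) pairs via power iteration."""
    work = [row[:] for row in cov]
    rng = random.Random(0)
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in work]
        for _ in range(iters):
            w = matvec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(work, v)))
        pairs.append((lam, v))
        # Deflate: subtract the component just found before the next search.
        d = len(v)
        work = [[work[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return pairs


# Synthetic stand-in data: 20 made-up "languages" x 12 features.
DIRECTIONS = ["MAL", "LMAL", "RMAL"]
FEATURES = [f"{d} n={n}" for d in DIRECTIONS for n in range(2, 6)]
data_rng = random.Random(42)
rows = []
for _ in range(20):
    level = data_rng.gauss(0.0, 1.0)  # shared latent factor per language
    rows.append([3.0 + level - 0.2 * n + data_rng.gauss(0.0, 0.5)
                 for _d in DIRECTIONS for n in range(2, 6)])

z = standardize(rows)
cov = covariance(z)
(lam1, pc1), (lam2, pc2) = top_components(cov)
total = sum(cov[j][j] for j in range(len(cov)))  # total variance (trace)
explained = [lam1 / total, lam2 / total]         # share of variance per axis
scores = [(sum(a * b for a, b in zip(r, pc1)),   # 2D position of each language
           sum(a * b for a, b in zip(r, pc2))) for r in z]
loadings = {name: (pc1[j], pc2[j]) for j, name in enumerate(FEATURES)}  # biplot arrows
```

Here `explained` plays the role of the percentages quoted for PC1 and PC2, `scores` gives the dot positions, and `loadings` gives one arrow per feature. The sign of each component is arbitrary, so a plot may appear mirrored relative to another implementation.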

MAL-compliance map (PCA of per-language MAL/LMAL/RMAL compliance scores)

Each language is described by its three MAL-compliance scores: the decrease ratio for MAL, LMAL, and RMAL, that is, the share of the observed average-size curve that decreases monotonically as n grows. The 3-vector is standardized and projected to 2D with PCA: PC1 explains 50.0% of the variance, PC2 30.4%. Languages with similar compliance profiles across the three directions cluster together. Dots sit at their real PCA positions, except where several languages would land on exactly the same coordinate. In that case the dots are fanned out on a small dashed ring drawn at that coordinate, so each language gets its own label slot while the ring itself still marks the true PCA position. Labels are pushed apart and connected to their dots with leader lines. Color and shape encoding is the same as in the family map above.
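One possible reading of the decrease ratio is sketched below: the fraction of consecutive steps n → n+1 at which the average size strictly drops. This interpretation is an assumption, as the exact definition is given in the paper rather than on this page, and the curve values are made-up placeholders.

```python
def decrease_ratio(curve):
    """Share of consecutive steps n -> n+1 where the value strictly decreases.

    `curve` holds the average sizes at n = 2, 3, 4, 5 (in that order).
    Assumed definition; the paper may define compliance differently.
    """
    steps = list(zip(curve, curve[1:]))
    if not steps:
        raise ValueError("need at least two points")
    return sum(later < earlier for earlier, later in steps) / len(steps)


# Made-up curves for one hypothetical language, one per direction.
curves = {
    "MAL": [3.1, 2.6, 2.4, 2.2],   # drops at every step
    "LMAL": [3.0, 3.2, 2.8, 2.9],  # drops at one step out of three
    "RMAL": [2.9, 2.5, 2.6, 2.3],  # drops at two steps out of three
}
compliance_vector = [decrease_ratio(curves[d]) for d in ("MAL", "LMAL", "RMAL")]
```

With only four values of n there are three steps, so under this reading each score can only take the values 0, 1/3, 2/3, or 1. That would explain why several languages can land on exactly the same PCA coordinate.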

MAL spaghetti panels (every language overlaid, by direction)

Each thin gray line shows one language's average dependent length as a function of sub-tree size n = 2..5. The thick dark line is the cross-language median; the shaded band is the interquartile range (25th to 75th percentile). The Menzerath-Altmann pattern (a decreasing curve) is directly visible: the median drops from n = 2 to n = 5 in all three panels, and most individual languages follow the same trend. Click families in the sidebar to highlight their curves in their assigned color (the other curves fade further); multiple families can be active at once. Hover a curve to identify the language.
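The median line and interquartile band can be computed per value of n as in the sketch below. The curves are made-up placeholders, the function name `summarize` is hypothetical, and the "inclusive" quantile method is an assumption, since the dashboard does not state which interpolation it uses.

```python
import statistics

# Made-up curves: language -> average dependent length at n = 2, 3, 4, 5.
toy_curves = {
    "lang_a": [3.0, 2.7, 2.5, 2.4],
    "lang_b": [2.8, 2.6, 2.5, 2.3],
    "lang_c": [3.4, 3.0, 2.9, 2.6],
    "lang_d": [2.6, 2.7, 2.4, 2.2],
    "lang_e": [3.2, 2.9, 2.6, 2.5],
}


def summarize(curves, sizes=(2, 3, 4, 5)):
    """Cross-language quartiles at each sub-tree size n."""
    summary = []
    for i, n in enumerate(sizes):
        values = [curve[i] for curve in curves.values()]
        # quantiles(..., n=4) returns the three quartile cut points.
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary.append({"n": n, "q1": q1, "median": median, "q3": q3})
    return summary


band = summarize(toy_curves)
medians = [row["median"] for row in band]
```

In this toy sample the median falls at every step even though `lang_d` rises from n = 2 to n = 3, which is the kind of individual deviation the gray lines make visible.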

Highlight families

Compliance ridgeline by macro-group

For each direction (MAL, LMAL, RMAL) we draw a Gaussian-smoothed density of per-language compliance, one ridge per macro-group. Tight ridges centered near 1 indicate macro-groups whose languages follow the Menzerath-Altmann law closely; broad or shifted ridges mark macro-groups that deviate. The contrast between the LMAL and RMAL panels makes this the paper's summary chart for the LMAL/RMAL asymmetry.
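A single ridge can be sketched as a Gaussian kernel density estimate over the compliance scores of one macro-group. Reading "Gaussian density" as a kernel density estimate is an assumption, as are the fixed bandwidth of 0.1, the group names, and the made-up scores.

```python
import math


def gaussian_kde(samples, bandwidth, grid):
    """Average of Gaussian bumps centered on each sample, evaluated on `grid`."""
    scale = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


# Made-up compliance scores for two hypothetical macro-groups.
toy_groups = {
    "group_tight": [1.0, 1.0, 0.95, 1.0, 0.9],  # clustered near 1
    "group_broad": [0.3, 0.6, 1.0, 0.5, 0.8],   # spread out
}
grid = [i / 100 for i in range(101)]  # compliance axis from 0 to 1
ridges = {name: gaussian_kde(vals, 0.1, grid) for name, vals in toy_groups.items()}

# Location of each ridge's peak on the compliance axis.
peaks = {name: grid[dens.index(max(dens))] for name, dens in ridges.items()}
```

Because compliance is bounded by 0 and 1, a plain Gaussian kernel lets some density leak past the boundaries; a production chart might clip or reflect at the edges, which this sketch does not attempt.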

Per-language MAL curve grid (small multiples)

One small panel (card) per language: the three Menzerath-Altmann curves (MAL, LMAL, RMAL) are plotted on a common y-axis so that cards are directly comparable. Sort by family or by any of the compliance scores to scan the whole sample at a glance, and filter by macro-group to focus on one region of the genealogical tree. Hover any card for the exact values.
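The sort and filter controls amount to the following logic. This is a sketch under assumptions: the `Card` record, the function `select_cards`, the placeholder language names, and all scores are hypothetical and not taken from the dashboard.

```python
from dataclasses import dataclass


@dataclass
class Card:
    language: str
    family: str
    macro_group: str
    compliance: dict  # {"MAL": ..., "LMAL": ..., "RMAL": ...}


def select_cards(cards, sort_by="family", macro_group=None):
    """Filter by macro-group (if given), then sort.

    sort_by is "family" or one of the compliance keys. Compliance sorts are
    descending, with the language name as a deterministic tie-breaker.
    """
    kept = [c for c in cards if macro_group is None or c.macro_group == macro_group]
    if sort_by == "family":
        return sorted(kept, key=lambda c: (c.family, c.language))
    return sorted(kept, key=lambda c: (-c.compliance[sort_by], c.language))


# Placeholder cards with made-up scores.
cards = [
    Card("lang_a", "Turkic", "Other Eurasian", {"MAL": 1.0, "LMAL": 0.67, "RMAL": 1.0}),
    Card("lang_b", "Mayan", "Northern Americas", {"MAL": 0.67, "LMAL": 0.33, "RMAL": 1.0}),
    Card("lang_c", "Basque", "Other Eurasian", {"MAL": 0.33, "LMAL": 0.67, "RMAL": 0.33}),
]
by_family = [c.language for c in select_cards(cards)]
by_mal = [c.language for c in select_cards(cards, sort_by="MAL")]
eurasian = [c.language for c in select_cards(cards, macro_group="Other Eurasian")]
```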