MAL Notebook — Plots & Commentary

Pegah Faghiri, Kim Gerdes, Sylvain Kahane (2026). Verifying the Menzerath-Altmann law in the verbal domain in 180 languages. UDW26 @ LREC 2026.

This page mirrors the analysis notebook in order: Markdown commentary (including tables) followed by any plots saved by the corresponding code cells.


8. Menzerath-Altmann Law (MAL) Analysis

Summary: Comprehensive analysis of the Menzerath-Altmann Law across 200+ languages from Universal Dependencies, examining whether constituent size decreases as the number of dependents increases.


Theoretical Background

The Menzerath-Altmann Law (MAL) states that the larger a linguistic construct, the smaller its constituents tend to be. In a verb-centered analysis:
- As the number of verbal dependents (n) increases, the average constituent size should decrease.
- A language with a high MAL effect score shows a clear decreasing trend in constituent size as n grows.


Notebook Structure

| Section | Contents |
| --- | --- |
| 1. Setup and Data Loading | Import libraries, load metadata and position statistics |
| 2. Core MAL Computation | Compute MAL_n scores (total dependents), compliance metrics, heatmaps, HTML report |
| 3. MAL Dynamics Analysis | Step-by-step compliance, heatmaps |
| 4. Language Family Analysis | Family-level aggregation, statistical significance tests |
| 5. Typological Analysis | MAL asymmetry, decay rates, trajectory clustering |
| 6. Summary and Export | Summary statistics, universality tests, data export |
| 7. Generate UD Language Maps | Interactive maps of UD languages |

Key Measures Computed

| Measure | Description |
| --- | --- |
| MAL_n | Average constituent size with exactly n total dependents |
| MAL_left_n / MAL_right_n | Directional MAL scores (n dependents on one side, any number on the other) |
| MAL effect score | Negative slope of MAL_n ~ n, normalized by MAL_1 (higher = stronger MAL effect) |
| MAL Asymmetry | Right-side minus left-side MAL effect score |
| Decay Rate | How quickly constituent size drops (early vs late transitions) |
| Trajectory Cluster | Grouping of languages by MAL curve shape |

Prerequisites

| Notebook | Required Data | Purpose |
| --- | --- | --- |
| 01_data_preparation_and_validation.ipynb | metadata.pkl | Language names, groups, colors |
| 04_data_processing.ipynb | all_langs_position2sizes.pkl, all_langs_position2num.pkl | Raw position statistics |
| (Optional) 05_comparative_visualization.ipynb | vo_vs_hi_scores.csv | VO scores for correlation |

Inputs / Outputs

Inputs:
- data/metadata.pkl
- data/all_langs_position2sizes.pkl, data/all_langs_position2num.pkl
- data/vo_vs_hi_scores.csv (for VO correlation)

Outputs:
- data/lang2MAL_full.pkl: full MAL data per language
- data/mal_compliance_scores.csv: MAL effect scores
- data/mal_asymmetry.csv: directional asymmetry scores
- data/mal_decay_rates.csv: decay pattern analysis
- data/mal_trajectories.csv: trajectory cluster assignments
- plots/mal_*.png: visualizations

Runtime: ~2-3 minutes


1. Setup and Data Loading

2. Core MAL Computation (Total Dependents)

This section computes the primary MAL measure, based on the total number of dependents.

2.1 MAL_n Scores

MAL_n = average constituent size when a verb has n dependents in total (left + right = n).

For n=4, this includes all configurations:
- VXXXX (0 left, 4 right) → bilateral_L0_R4_*
- XVXXX (1 left, 3 right) → bilateral_L1_R3_*
- XXVXX (2 left, 2 right) → bilateral_L2_R2_*
- XXXVX (3 left, 1 right) → bilateral_L3_R1_*
- XXXXV (4 left, 0 right) → bilateral_L4_R0_*

Note: this computation uses the bilateral keys in the data. The notebook also computes directional MAL (MAL_right_n, MAL_left_n) for the directional analyses in Sections 2.8 and 5.1.
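A minimal sketch of how MAL_n could be pooled from the bilateral keys, assuming (hypothetically) that each language's entry in all_langs_position2sizes.pkl is a dict mapping keys of the form bilateral_L{left}_R{right}_<suffix> to lists of constituent sizes in words. The function name mal_curve, the side switch for the directional variants, and the min_count cutoff of 10 (borrowed from the ≥10-occurrence threshold mentioned in the MAL_n curve interpretation) are illustrative assumptions, not the notebook's code. Sizes are averaged geometrically, following the heatmap explanation in Section 3.1, and every size stored under a key is pooled; whether the notebook restricts directional averages to the dependents on one side is not stated here.

```python
import math
import re
from collections import defaultdict

# Hypothetical key layout: "bilateral_L{left}_R{right}_{suffix}" (suffix ignored).
BILATERAL_KEY = re.compile(r"^bilateral_L(\d+)_R(\d+)_")


def mal_curve(position2sizes, side="total", min_count=10):
    """Return {n: geometric-mean constituent size} for one language.

    side="total": n = left + right   (MAL_n)
    side="right": n = right count, any left count (MAL_right_n)
    side="left":  n = left count, any right count (MAL_left_n)
    Values of n backed by fewer than `min_count` sizes are dropped.
    """
    log_sizes = defaultdict(list)
    for key, sizes in position2sizes.items():
        match = BILATERAL_KEY.match(key)
        if match is None:
            continue  # not a bilateral key
        left, right = int(match.group(1)), int(match.group(2))
        n = {"total": left + right, "left": left, "right": right}[side]
        if n == 0:
            continue  # no dependents on the counted side
        log_sizes[n].extend(math.log(s) for s in sizes if s > 0)
    # Geometric mean = exp(mean of logs), kept only for well-attested n
    return {
        n: math.exp(sum(logs) / len(logs))
        for n, logs in sorted(log_sizes.items())
        if len(logs) >= min_count
    }
```

For example, mal_curve(sizes_for_one_language) yields the total curve, and side="right" gives the MAL_right_n variant from the same keys.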

2.2 MAL Effect Scores

The MAL Effect Score quantifies how well a language follows the Menzerath-Altmann Law.

Intuition

If MAL holds, then as n (number of dependents) increases, the average constituent size (MAL_n) should decrease. A language with a high MAL effect score will typically show a decreasing trend: MAL_1 > MAL_2 > MAL_3 > MAL_4.

How It's Computed

We fit a linear regression: $\text{MAL}_n = \alpha + \beta \cdot n$

| Metric | Formula | Interpretation |
| --- | --- | --- |
| Slope (β) | From regression | Negative = sizes decrease with n (MAL holds) |
| Normalized Slope | $\beta / \text{MAL}_1$ | Scale-independent version |
| MAL effect score | $-\beta / \text{MAL}_1$ | Higher = stronger MAL effect |
| Spearman ρ | Rank correlation of n vs MAL_n | More negative = stronger monotonic decrease |
| Decrease Ratio | $\text{MAL}_1 / \text{MAL}_{\max}$ | > 1 means sizes decreased overall ($\text{MAL}_{\max}$ = MAL_n at the largest n) |

Example

If a language has MAL_1=3.5, MAL_2=2.8, MAL_3=2.2, MAL_4=1.9:
- Slope β = -0.54 (negative → good)
- Normalized slope ≈ -0.15
- MAL effect score ≈ 0.15 (positive → follows MAL)
- Spearman ρ = -1.0 (perfect monotonic decrease)
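The metrics in the table can be computed with the standard library alone. This sketch assumes the curve is a plain {n: MAL_n} dict that contains n = 1; the helper names are hypothetical. On the example values above it gives β = -0.54, an effect score of about 0.154, ρ = -1 and a decrease ratio of about 1.84.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mal_effect_metrics(mal):
    """Metrics from a {n: MAL_n} curve; MAL_1 must be present."""
    ns = sorted(mal)
    ys = [mal[n] for n in ns]
    n_bar, y_bar = mean(ns), mean(ys)
    # Ordinary least-squares slope of MAL_n = alpha + beta * n
    beta = sum((n - n_bar) * (y - y_bar) for n, y in zip(ns, ys)) / sum(
        (n - n_bar) ** 2 for n in ns
    )
    mal_1 = mal[1]
    return {
        "slope": beta,
        "normalized_slope": beta / mal_1,
        "mal_effect_score": -beta / mal_1,
        "spearman_rho": pearson(average_ranks(ns), average_ranks(ys)),
        "decrease_ratio": mal_1 / mal[ns[-1]],  # MAL_1 / MAL at the largest n
    }
```

Spearman's ρ is computed here as the Pearson correlation of average ranks, which is its standard definition in the presence of ties; scipy.stats.spearmanr would be the usual library route.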

2.3 MAL_n Curves Visualization

2.4 VO Score Correlation

Investigates the relationship between MAL effect scores and word-order typology (VO score).

(The visualization below belongs to Section 2.3: MAL_n curves.)

MAL_n Curves (Total Dependents)

Figure: plots/mal_n_total_curves.png

Interpretation: MAL_n Curves

The plot above shows the Menzerath-Altmann Law (MAL) across languages. Each line represents a language, with the x-axis showing the total number of dependents (n) and the y-axis showing the average constituent size (MAL_n).

Key observations:

  1. Strong MAL effect for n=1 to 3: The mean curve (black line) shows a clear downward trend from n=1 to n=3, consistent with MAL: as the number of dependents increases, the average size of each constituent decreases.

  2. Upturn at higher n values: The mean curve rises from n=3 to n=6. This is likely a sampling artifact: at higher n values, only languages with sufficient data (≥10 occurrences) contribute to the mean. These tend to be languages with larger corpora or more complex verbal constructions, which may have systematically different properties than the full set of languages.

  3. Survivor bias: As n increases, fewer languages have enough data to be included. The languages that "survive" to n=5 or n=6 are not a random sample — they may represent specific typological profiles or simply better-resourced languages.

  4. Cross-linguistic pattern (n=1 to 3): The consistent downward pattern across diverse language families (indicated by colors) at low n suggests that MAL is a robust cross-linguistic tendency, plausibly driven by cognitive processing constraints and communicative efficiency.

  5. Variation around the mean: The spread of individual language curves indicates cross-linguistic variation in the strength of the MAL effect, which may correlate with typological features such as head-directionality (VO vs OV).
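The survivor-bias point can be made concrete by tracking how many languages stand behind each point of the mean curve. This is a sketch under assumed inputs: per-language {n: MAL_n} curves plus per-language observation counts, with the ≥10 cutoff applied per language and per n; the function name and data layout are hypothetical.

```python
def mean_curve(lang2mal, lang2counts, min_count=10):
    """Cross-linguistic mean MAL_n and the number of languages behind each point.

    lang2mal:    {language: {n: MAL_n}}
    lang2counts: {language: {n: number of observations at n}}
    A language contributes at n only if it has >= min_count observations there.
    """
    all_ns = sorted({n for curve in lang2mal.values() for n in curve})
    result = {}
    for n in all_ns:
        values = [
            curve[n]
            for lang, curve in lang2mal.items()
            if n in curve and lang2counts.get(lang, {}).get(n, 0) >= min_count
        ]
        if values:
            # (mean over contributing languages, how many contributed)
            result[n] = (sum(values) / len(values), len(values))
    return result
```

Plotting the second element of each tuple alongside the mean curve shows directly where the mean starts to rest on only a handful of languages.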

2.5 MAL vs VO Scatter Plot

2.6 Spearman vs VO Scatter

2.7 MAL effect score vs Spearman Scatter

Plot the relationship between MAL effect scores and Spearman correlation coefficients across languages.

Interpretation: MAL effect score vs Spearman Correlation

What this plot shows: This scatter plot compares two different metrics of MAL adherence: the MAL effect score ($-\beta / \text{MAL}_1$) and Spearman's ρ between n and MAL_n.

Expected relationship: Since both metrics measure the same underlying phenomenon (constituent size decreasing as the number of dependents grows), we expect a strong negative correlation: languages with more negative Spearman ρ should have higher MAL effect scores.

Key observations:
- A tight linear relationship would indicate that both metrics capture the same phenomenon consistently
- Outliers may indicate languages where the MAL relationship is non-monotonic or non-linear
- The regression line slope indicates how the two metrics scale relative to each other


Ceiling/Floor Effects and Discriminative Power:

A notable pattern in this plot is that languages with extreme Spearman values (approaching -1 or +1) show considerable vertical spread in their MAL effect scores. This is a ceiling effect (a floor effect at -1): once ρ reaches its bound, it cannot register any further differences in how strongly sizes decrease.

For example, two languages might both have Spearman $\rho = -1$ (perfect negative monotonic relationship), but one might show a steep decline in constituent size (high MAL effect score) while the other shows only a gradual decline (lower MAL effect score). The Spearman correlation conflates these cases; the MAL effect score differentiates them.
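The ceiling effect can be reproduced with two hypothetical curves: both decrease strictly, so both get ρ = -1, yet their MAL effect scores differ by a factor of ten (about 0.333 vs 0.033). The curves are invented for illustration, and the Spearman helper uses the no-ties shortcut formula, so it is only valid for curves without repeated values.

```python
from statistics import mean


def effect_score(mal):
    """-beta / MAL_1 for a {n: MAL_n} curve, beta being the OLS slope."""
    ns = sorted(mal)
    ys = [mal[n] for n in ns]
    n_bar, y_bar = mean(ns), mean(ys)
    beta = sum((n - n_bar) * (y - y_bar) for n, y in zip(ns, ys)) / sum(
        (n - n_bar) ** 2 for n in ns
    )
    return -beta / mal[1]


def spearman_no_ties(mal):
    """Spearman's rho between n and MAL_n; valid only when no values are tied."""
    ns = sorted(mal)
    ys = [mal[n] for n in ns]
    k = len(ns)
    y_rank = {y: r for r, y in enumerate(sorted(ys), start=1)}
    # n is already in rank order, so its rank is position + 1
    d_squared = sum((i + 1 - y_rank[y]) ** 2 for i, y in enumerate(ys))
    return 1 - 6 * d_squared / (k * (k * k - 1))


steep = {1: 3.0, 2: 2.0, 3: 1.0}   # hypothetical language, steep decline
gentle = {1: 3.0, 2: 2.9, 3: 2.8}  # hypothetical language, gentle decline
for name, curve in [("steep", steep), ("gentle", gentle)]:
    print(name, spearman_no_ties(curve), round(effect_score(curve), 3))
```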

Implication for metric choice:

This suggests that the MAL effect score is the more informative metric for cross-linguistic comparison, particularly when:
  1. Many languages cluster at extreme Spearman values
  2. The research question concerns the strength of the MAL effect, not just its presence
  3. Fine-grained distinctions between languages with similar monotonic patterns are needed

The Spearman correlation remains useful as a robustness check (confirming monotonicity) and for cases where the relationship may be non-linear, but the MAL effect score provides greater discriminative resolution across the full range of MAL behavior.

2.8 Directional MAL Analysis (Left vs Right)

Directional MAL Curves

This section analyzes MAL separately for left-side and right-side dependents.

Figure: plots/mal_directional_curves.png

Interpretation: Directional MAL Curves

What these plots show:
- Left panel (MAL_right_n): Average constituent size when a head has n right-side dependents (with any number of left dependents). A decreasing curve means constituents shrink as more right dependents are added.
- Right panel (MAL_left_n): Average constituent size when a head has n left-side dependents (with any number of right dependents).

Key observations:
- Both curves typically show negative slopes (decreasing size with more dependents), consistent with MAL operating on both sides of the head
- The steepness of each curve reflects how strongly that side exhibits MAL
- Languages where the right curve is steeper than the left → stronger MAL effect on the right (expected for VO languages under the hypothesis tested in Section 5.1)
- Languages where the left curve is steeper than the right → stronger MAL effect on the left (expected for OV languages under the same hypothesis)
- The black mean line shows the cross-linguistic average trend


Additional Directional MAL Measures to Consider

| Measure | Description | Linguistic Interpretation |
| --- | --- | --- |
| Directional Slope Ratio | slope_right / slope_left | With both slopes negative, values > 1 indicate stronger MAL on the right side; < 1 indicates left-side dominance |
| Directional R² Comparison | Compare R² of right vs left linear fits | Which side shows more consistent MAL behavior? |
| Crossover Point | At what n does left MAL = right MAL? | Identifies the asymmetry threshold |
| Directional MAL Difference | MAL_right_n - MAL_left_n for each n | How size differs by direction at each dependent count |
| Interaction Effect | Size when n_left × n_right are jointly considered | Does having dependents on both sides amplify or dampen MAL? |
| Positional Decay | How does MAL vary by position within left/right dependents? | First dependent vs. second vs. third on each side |
| Head-Dependent Asymmetry | Compare dependent size reduction left vs right of their own heads | Recursive MAL within the dependency tree |
| Directional Compliance Rate | % of languages with slope < 0 for each direction | Is MAL more universal on one side? |
| Variance by Direction | Std of constituent sizes at each n per direction | Which direction has a more stable MAL effect? |
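Two of the candidate measures, the Directional Slope Ratio and the Directional Compliance Rate, could be sketched as follows. The input layout ({language: {"left": {n: MAL_left_n}, "right": {n: MAL_right_n}}}) and the function names are assumptions. The ratio is only recorded when both slopes are negative, because with opposite signs a ratio above or below 1 no longer says which side has the stronger MAL effect.

```python
from statistics import mean


def ols_slope(curve):
    """OLS slope of the MAL value on n for a {n: value} curve."""
    ns = sorted(curve)
    ys = [curve[n] for n in ns]
    n_bar, y_bar = mean(ns), mean(ys)
    return sum((n - n_bar) * (y - y_bar) for n, y in zip(ns, ys)) / sum(
        (n - n_bar) ** 2 for n in ns
    )


def directional_summary(lang2directional):
    """Per-language slope ratios and the share of languages with slope < 0 per side."""
    ratios, neg_left, neg_right = {}, 0, 0
    for lang, curves in lang2directional.items():
        s_left, s_right = ols_slope(curves["left"]), ols_slope(curves["right"])
        neg_left += s_left < 0
        neg_right += s_right < 0
        # Ratio only interpretable when both sides decrease
        if s_left < 0 and s_right < 0:
            ratios[lang] = s_right / s_left
    total = len(lang2directional)
    return ratios, {"left": neg_left / total, "right": neg_right / total}
```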

2.9 MAL HTML Report with Heatmap

3. MAL Dynamics Analysis

This section analyzes the step-by-step dynamics of MAL: how constituent size changes from one value of n to the next.

3.1 Step-by-Step Compliance

Analyzes which specific transitions follow MAL for each language:

| Measure | Description |
| --- | --- |
| Step Compliance | For each transition (1→2, 2→3, 3→4): is MAL_n > MAL_{n+1}? |
| Compliance Category | Fully compliant / First-step only / Partial / Anti-MAL |
| Compliance Count | Number of decreasing transitions (0 to 3 for n=1..4) |
| Weighted Score | Early transitions weighted more: w₁(1→2) + w₂(2→3) + w₃(3→4), with w₁ > w₂ > w₃ |

Categories:
- 🟢 Fully MAL-conformant: All steps decrease (MAL₁ > MAL₂ > MAL₃ > MAL₄)
- 🟢 First-step compliant: At least MAL₁ > MAL₂ (the most important step linguistically)
- 🟡 Partial: Some steps decrease, but not monotonically
- 🔴 Anti-MAL: Sizes increase with n (MAL₁ < MAL₄)
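The step-compliance measures and categories could be implemented as follows. This sketch assumes a {n: MAL_n} curve with n = 1..4; the weights (0.5, 0.3, 0.2) and the order in which categories are checked (Fully, then First-step, then Anti-MAL, then Partial) are hypothetical choices, since the text does not fix them. The example row used in the heatmap explanation (2.50 → 1.80 → 1.45 → 1.20) comes out as Fully, with all three steps decreasing.

```python
def step_compliance(mal, weights=(0.5, 0.3, 0.2)):
    """Classify a {n: MAL_n} curve with n = 1..4 into the MAL categories.

    The weights (w1 > w2 > w3) and the category precedence
    Fully > First > Anti > Partial are illustrative assumptions.
    """
    values = [mal[n] for n in (1, 2, 3, 4)]
    steps = [values[i] > values[i + 1] for i in range(3)]  # 1->2, 2->3, 3->4
    count = sum(steps)
    weighted = sum(w * s for w, s in zip(weights, steps))
    if count == 3:
        category = "Fully"
    elif steps[0]:
        category = "First"
    elif values[0] < values[3]:
        category = "Anti"
    else:
        category = "Partial"
    return {"steps": steps, "count": count, "weighted": weighted, "category": category}
```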

Understanding the MAL Values Heatmap

What are MAL values?

MAL_n represents the average constituent size (in words) when a verb has exactly n total dependents. It's computed as the geometric mean of all constituent sizes across all verb configurations with n dependents.

What is "dependent count" (n)?

The total number of dependents of a verb, regardless of whether they appear on the left or right side. This is the sum of left + right dependents.

How to interpret the heatmap:

| Reading | Interpretation |
| --- | --- |
| Color intensity | Darker red = larger constituent sizes; lighter yellow = smaller sizes |
| ↓ arrows | Size decreased from previous n → MAL-conformant transition |
| ↑ arrows | Size increased from previous n → anti-MAL transition |
| Row order | Languages sorted by MAL effect score (most compliant at top) |
| Category labels | Right side shows compliance category (Fully, First, Partial, Anti) |

Example reading: If a row shows 2.50 → ↓1.80 → ↓1.45 → ↓1.20, this language is fully MAL-conformant because constituent sizes consistently decrease as the number of dependents increases (2.50 > 1.80 > 1.45 > 1.20).

Figure: plots/mal_step_compliance_heatmap.png

Interpreting the Results

The heatmap displays languages sorted by MAL effect score (highest at top). Since only the top 60 languages are shown, these are predominantly the most MAL-conformant ones, so most are labeled "Fully" (fully MAL-conformant). Languages with partial or anti-MAL patterns appear further down the list and may not be visible in this truncated view.

What does being at the top mean?

Yakut, at the top of the list, is the language with the strongest MAL effect among those shown. Its values:
- MAL_1 ≈ 1.37 → MAL_2 ≈ 1.35 → MAL_3 ≈ 1.15 → MAL_4 ≈ (lower)

This means:
- When a Yakut verb has 1 dependent, its average constituent size is ~1.37 words
- When it has 4 dependents, its constituents are smaller on average
- All transitions show ↓ (decreasing), so Yakut is fully MAL-conformant

Why do some languages have larger MAL_1 values?

Languages like Galician (MAL_1 ≈ 5.82) or Catalan (MAL_1 ≈ 4.96) have larger baseline constituent sizes. This does not mean they are "less compliant": they still show consistent decreases. The MAL effect score is normalized by MAL_1, so languages with different absolute sizes can be fairly compared.

Key insight: Nearly all of the languages shown here follow MAL, with constituent sizes consistently shrinking as the number of dependents increases. Because this view is restricted to the highest-scoring languages, it overstates how widespread full conformance is; Section 6.2 reports the rates for the whole sample.

Figure: plots/mal_compliance_categories.png
Figure: plots/mal_beta_by_family.png

3.2 Statistical Tests & Interpretation

Reading the Bar Plot:
- Each bar shows the mean MAL effect score for all languages in that family
- Error bars represent ±1 standard deviation within each family
- Numbers in parentheses indicate sample size (number of languages/treebanks)
- Higher (more positive) values = stronger MAL effect (more dependents → smaller constituent size)

Statistical Significance:

  1. Global MAL Effect: The one-sample t-test determines whether languages overall exhibit MAL behavior (mean MAL effect score ≠ 0). A significant result indicates that MAL is a genuine cross-linguistic tendency.

  2. Between-Family Differences: The Kruskal-Wallis test (a non-parametric analogue of one-way ANOVA) tests whether families differ significantly in their MAL effect score. If significant (p < 0.05), family membership matters for MAL strength.

  3. Per-Family Deviations: Individual t-tests identify which families deviate significantly from the global mean:
     - Families with significantly higher scores show a stronger MAL effect than average
     - Families with significantly lower scores may have other structural factors counteracting MAL

Caveats:
- Sample sizes vary greatly across families (some have 50+ treebanks, others only 3-5)
- Small families have less statistical power and larger confidence intervals
- Family groupings may obscure within-family diversity
- Significance stars: * p < 0.05, ** p < 0.01, *** p < 0.001
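For reference, the two test statistics can be computed with the standard library as below. This sketch returns statistics only; in practice scipy.stats.ttest_1samp and scipy.stats.kruskal return the same statistics together with p-values. The function names are hypothetical, and the Kruskal-Wallis version applies the usual tie correction.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, popmean=0.0):
    """t statistic for H0: mean(values) == popmean (df = len(values) - 1)."""
    return (mean(values) - popmean) / (stdev(values) / math.sqrt(len(values)))


def kruskal_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    # Average rank for each distinct value, plus the tie-correction sum
    rank_of, tie_term, i = {}, 0, 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12 / (n_total * (n_total + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n_total + 1)
    return h / (1 - tie_term / (n_total ** 3 - n_total))
```

A per-family deviation test from point 3 would be one_sample_t(family_scores, popmean=global_mean), with global_mean the mean MAL effect score over all languages.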

4. Language Family Analysis

4.1 MAL effect score by Family

5. Typological Analysis

5.1 MAL Asymmetry (Left vs Right)

Question: Does MAL effect score differ between left-side and right-side dependents? Does this correlate with head-directionality?

We compute:
- MAL Asymmetry = MAL_compliance_right - MAL_compliance_left
- Positive asymmetry → stronger MAL effect on right-side dependents
- Negative asymmetry → stronger MAL effect on left-side dependents

Hypothesis: VO (head-initial) languages may show stronger right-side MAL, while OV (head-final) languages show stronger left-side MAL.
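A sketch of the per-language asymmetry, assuming the directional curves are {n: value} dicts containing n = 1 and that the directional MAL effect score uses the same -β/MAL_1 normalization as the total score (an assumption; the text does not spell it out). Names are hypothetical.

```python
from statistics import mean


def effect_score(curve):
    """-beta / value at n = 1, with beta the OLS slope of the curve on n."""
    ns = sorted(curve)
    ys = [curve[n] for n in ns]
    n_bar, y_bar = mean(ns), mean(ys)
    beta = sum((n - n_bar) * (y - y_bar) for n, y in zip(ns, ys)) / sum(
        (n - n_bar) ** 2 for n in ns
    )
    return -beta / curve[1]


def mal_asymmetry(mal_left, mal_right):
    """Right-minus-left MAL effect score and a label for its sign."""
    asym = effect_score(mal_right) - effect_score(mal_left)
    if asym > 0:
        label = "stronger on the right"
    elif asym < 0:
        label = "stronger on the left"
    else:
        label = "symmetric"
    return asym, label
```

Correlating the returned asymmetry values with the VO scores from vo_vs_hi_scores.csv would then address the hypothesis directly.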

Figure: plots/mal_beta_asymmetry_left_vs_right.png
Figure: plots/mal_beta_asymmetry_by_family.png
Figure: plots/mal_beta_asymmetry_vs_vo.png

Interpreting the MAL Asymmetry Results

Left Panel: Left vs Right MAL effect score Scatter
- Each point represents a language
- Above the diagonal: Languages where MAL is stronger on the right side (right dependents shrink more as their count increases)
- Below the diagonal: Languages where MAL is stronger on the left side
- Languages near the diagonal have symmetric MAL across both sides

Middle Panel: Asymmetry by Language Family
- Bars to the right of zero: Families with stronger right-side MAL
- Bars to the left of zero: Families with stronger left-side MAL
- Error bars show within-family variation

Right Panel: Asymmetry vs VO Score
- Tests whether word order predicts which side shows stronger MAL
- Hypothesis: VO (head-initial) languages may show stronger right-side MAL because right dependents are more common
- Positive correlation would support this hypothesis
- Weak/no correlation suggests MAL asymmetry is independent of basic word order

Key Questions Answered:
  1. Is MAL universal across both directions, or does one side dominate?
  2. Do language families show consistent directional biases?
  3. Does head-directionality predict MAL asymmetry?

5.2 Decay Rate Analysis

Question: Beyond binary compliance (yes/no decrease), how steep is the MAL curve?

We compute:
- Early decay rate = (MAL_1 - MAL_2) / MAL_1: relative drop in the first step
- Late decay rate = (MAL_3 - MAL_4) / MAL_3: relative drop in the last step
- Total decay = (MAL_1 - MAL_max) / MAL_1: overall shrinkage, with MAL_max the value at the largest n

This reveals whether languages have "front-loaded" decay (big drop early, then flat) or "gradual" decay (consistent shrinkage throughout).
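The three decay measures can be computed directly from a curve with n = 1..4. The front-loaded/back-loaded/gradual rule and the 0.05 margin are hypothetical thresholds for illustration; the notebook's actual cut-offs are not given here. MAL_max is taken as the value at the largest available n. On the worked example from Section 2.2 (3.5, 2.8, 2.2, 1.9) the early rate is 0.20 and the late rate about 0.14, so it is classed as front-loaded.

```python
def decay_rates(mal, margin=0.05):
    """Early, late and total decay for a {n: MAL_n} curve with n = 1..4 present.

    The classification rule and the 0.05 margin are illustrative assumptions.
    """
    early = (mal[1] - mal[2]) / mal[1]
    late = (mal[3] - mal[4]) / mal[3]
    total = (mal[1] - mal[max(mal)]) / mal[1]  # MAL_max: value at the largest n
    if early - late > margin:
        pattern = "front-loaded"
    elif late - early > margin:
        pattern = "back-loaded"
    else:
        pattern = "gradual"
    return {"early": early, "late": late, "total": total, "pattern": pattern}
```

Languages lacking any of n = 1..4 raise a KeyError here; in a full pipeline they would be the natural candidates for the "Unknown" category.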

Figure: plots/mal_decay_rates.png

Interpreting the Decay Rate Results

Top-Left: Distribution of Decay Rates by Transition
- Histograms show how decay rates are distributed across languages
- Positive values = constituent size decreased (MAL-conformant)
- Negative values = constituent size increased (anti-MAL)
- Compare the distributions: Is early decay (blue) typically larger than late decay (red)?

Top-Right: Early vs Late Decay Scatter
- Each point is a language
- Below diagonal: "Front-loaded" languages, where most size reduction happens early (1→2)
- Above diagonal: "Back-loaded" languages, where most reduction happens late (3→4)
- Near diagonal: "Gradual" languages, with consistent decay throughout

Bottom-Left: Decay Rates by Family
- Grouped bars show mean decay at each transition for each family
- Families with tall blue bars but short red bars have front-loaded MAL
- Families with similar bar heights have gradual MAL

Bottom-Right: Decay Pattern Distribution (Pie Chart)
- Shows the proportion of languages in each decay category
- Front-loaded: Big drop from n=1→2, then flattens
- Gradual: Consistent shrinkage at each step
- Back-loaded: Small early drop, bigger late drop
- Unknown: Insufficient data or mixed patterns

Key Insight: If most languages are "front-loaded," it suggests that the constraint to shrink constituents is strongest when going from 1 to 2 dependents—the first "competition" for space around the verb.

5.3 Trajectory Clustering

Question: What are the different "shapes" of MAL curves across languages? Can we identify clusters or typological patterns?

We:
  1. Normalize MAL values: MAL_n_norm = MAL_n / MAL_1 (so all curves start at 1.0)
  2. Visualize trajectories using parallel coordinates plots
  3. Cluster languages by trajectory shape to identify "types"

This reveals whether languages have similar MAL "signatures" across families.
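A self-contained sketch of the normalization and clustering steps. It uses a plain Lloyd's k-means with deterministic seeding (the first k trajectories) to stay dependency-free; the notebook may well use a library implementation such as scikit-learn's KMeans with k-means++ seeding, and the choice of n = 1..4 and all names here are assumptions. Languages missing any of these n values are skipped rather than imputed.

```python
import math


def normalize(mal, ns=(1, 2, 3, 4)):
    """MAL_n / MAL_1 at the given n; None if the language lacks any of them."""
    if any(n not in mal for n in ns):
        return None
    return [mal[n] / mal[1] for n in ns]


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means, centroids seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]

    def nearest(point):
        return min(range(k), key=lambda c: math.dist(point, centroids[c]))

    labels = None
    for _ in range(iters):
        new_labels = [nearest(p) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```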

Figure: plots/mal_trajectories.png

Interpreting the Trajectory Analysis Results

Top-Left: All Normalized Trajectories by Family
- Each thin line is a language's MAL trajectory, normalized so MAL_1 = 1.0
- Lines going down = constituent sizes shrink as n increases (MAL-conformant)
- Lines staying flat = weak or no MAL effect
- Lines going up = anti-MAL (rare)
- The black line shows the global mean trajectory

Top-Right: Trajectory Clusters
- K-means clustering identifies distinct trajectory "shapes"
- Each cluster's mean trajectory is shown as a thick line
- Clusters might include:
  - Steep decline: Drops rapidly to ~0.5-0.6 by n=4
  - Gradual decline: Steady decrease to ~0.7-0.8
  - Flat/Weak MAL: Stays near 1.0 (little shrinkage)
  - Early steep: Big drop at n=2, then flattens

Bottom-Left: Mean Trajectory by Language Family
- Each family's average trajectory
- Families with lower endpoints (right side) have stronger MAL
- Families with steeper slopes show more dramatic shrinkage
- Compare families: Do related languages behave similarly?

Bottom-Right: Cluster Distribution by Family
- Shows which trajectory types are common in each family
- Families dominated by one color have consistent MAL behavior
- Mixed-color families have diverse MAL patterns

Key Questions Answered:
  1. Are there typologically distinct MAL "signatures"?
  2. Do language families cluster together in trajectory space?
  3. What is the typical amount of constituent shrinkage (e.g., 20%, 40%)?

6. Summary and Export

6.1 Summary Statistics

6.2 Cross-Linguistic Universality Test

Question: Is MAL a genuine cross-linguistic universal, or could the observed effects arise by chance?

For each language, we test whether the MAL effect is statistically significant using:
  1. Permutation test: Shuffle the MAL_n values and check if the observed slope is extreme
  2. One-sample t-test: Is the slope significantly different from 0?
  3. Bootstrap confidence intervals: 95% CI for the slope

A language shows significant MAL if its slope is negative and statistically significant (p < 0.05).

We report:
- % of languages with a significant MAL effect
- % by language family
- Effect sizes (Cohen's d)
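One way to run the permutation test from step 1 is to enumerate every ordering of the MAL_n values instead of drawing random shuffles, which is cheap for four to six points. This is a sketch, not necessarily the notebook's procedure: it is one-sided (testing for a negative slope), uses a small tolerance so that floating-point noise does not break ties, and the names are hypothetical.

```python
from itertools import permutations
from statistics import mean


def ols_slope(ns, ys):
    n_bar, y_bar = mean(ns), mean(ys)
    return sum((n - n_bar) * (y - y_bar) for n, y in zip(ns, ys)) / sum(
        (n - n_bar) ** 2 for n in ns
    )


def exact_permutation_p(mal, tol=1e-12):
    """Observed slope and one-sided p-value: the share of all orderings of
    the MAL_n values whose slope is at most the observed slope."""
    ns = sorted(mal)
    ys = [mal[n] for n in ns]
    observed = ols_slope(ns, ys)
    perms = list(permutations(ys))
    hits = sum(ols_slope(ns, p) <= observed + tol for p in perms)
    return observed, hits / len(perms)
```

A consequence worth keeping in mind: with four distinct points the smallest attainable p-value is 1/24 ≈ 0.042, and with three points it is 1/6, so a language with only three usable n values can never reach p < 0.05 under this test.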

Figure: plots/mal_universality_test_beta.png

Interpretation: Cross-Linguistic Universality

Results Summary (n=165 languages):

| Category | Count | Percentage |
| --- | --- | --- |
| Negative slope (MAL direction) | 115 | 69.7% |
| Significant slope (p<0.05) | 63 | 38.2% |
| Significant MAL effect | 51 | 30.9% |
| Significant anti-MAL | 12 | 7.3% |

Key Findings:

  1. ~70% of languages show MAL direction: The majority of languages exhibit the expected pattern where constituent size decreases as the number of dependents increases.

  2. ~31% show statistically significant MAL: About one-third of languages have a MAL effect strong enough to be statistically significant via permutation test. This is far above the 5% expected by chance (binomial test p < 0.001).

  3. Mean effect size = -0.66: This is a medium-to-large effect (Cohen's d), indicating that MAL is not just statistically significant but also substantively meaningful.

  4. Anti-MAL is rare (~7%): Only 12 languages show significant effects in the opposite direction, suggesting MAL violations are uncommon.

Conclusion: MAL is a genuine cross-linguistic tendency. While not universal in the strict sense (not all languages show significant effects), the pattern is far more prevalent than chance would predict, with 70% showing the expected direction and 31% reaching statistical significance.
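The binomial comparison in finding 2 can be checked with an exact upper-tail probability, using the counts from the table (51 of 165) and the 5% per-language chance rate stated above. This is a one-sided check written for illustration; the exact test variant the notebook used is not stated.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_languages, n_significant, chance_rate = 165, 51, 0.05
p_value = binomial_upper_tail(n_significant, n_languages, chance_rate)
print(f"expected by chance: {n_languages * chance_rate:.2f} languages")
print(f"P(X >= {n_significant}) = {p_value:.2e}")
```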

Pie Chart Interpretation:
- Left pie: Shows the split between languages with significant MAL (green, ~31%) vs. not significant (red, ~69%)
- Right pie: Breaks down by direction: green for significant MAL, red for significant anti-MAL, gray for non-significant

Figure: plots/mal_n_total_curves.png

6.3 Export to HTML Report

Generates an HTML report combining all the plots and markdown text from this notebook and adds it to index.html.

7. Generate UD Language Maps

This cell generates the UD_maps.html file, which contains two interactive maps:
  1. Map 1: Languages by family (equal-sized dots, colored by language group)
  2. Map 2: Languages by corpus size (dot size proportional to token count)

Hovering over a language dot shows its statistics in the info box below each map.

Built from 08_menzerath_altmann_analysis.ipynb: 32 markdown cells, 13 plots embedded.