What Is the 1-Norm?
Last updated: April 15, 2026
Key Facts
- The 1-norm is defined as the sum of absolute values of vector components
- For the vector [−2, 3, 4], the 1-norm equals **9**
- Also known as the **Manhattan norm** due to its grid-like distance analogy
- Used in **Lasso regression** to enforce sparsity in models
- Differs from 2-norm, which calculates Euclidean distance using square roots
Overview
The 1-norm, formally known as the L1 norm, is a mathematical method for measuring the magnitude of a vector by summing the absolute values of its components. Unlike the more common Euclidean (L2) norm, the 1-norm does not involve squaring or square roots, making it computationally simpler in certain applications.
It is particularly useful in scenarios where robustness to outliers is important, as it penalizes large deviations less severely than squared norms. The 1-norm plays a critical role in optimization, signal processing, and machine learning, especially in regularization techniques that promote sparsity.
- Definition: The 1-norm of a vector is the sum of the absolute values of its elements; for vector v = [−2, 5, 1], the 1-norm is |−2| + |5| + |1| = 8.
- Manhattan distance: Also called the taxicab norm because it mimics the path a taxi takes in a grid-based city layout, summing horizontal and vertical distances without diagonals.
- Mathematical notation: Denoted as ||x||₁, where the subscript 1 indicates the norm type, distinguishing it from L2 or L∞ norms.
- Sparsity promotion: In machine learning, the 1-norm is used in Lasso regression to shrink coefficients toward zero, effectively selecting fewer features.
- Robustness: Because it uses absolute values, the 1-norm is less sensitive to extreme values than the 2-norm, which squares differences and amplifies large errors.
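The definition above can be sketched in a few lines of Python. This is a minimal illustration, not production code; the function name `l1_norm` is chosen here for clarity:

```python
def l1_norm(v):
    """Return the 1-norm of v: the sum of the absolute values of its components."""
    return sum(abs(x) for x in v)

print(l1_norm([-2, 5, 1]))  # 8, matching the definition above
print(l1_norm([-2, 3, 4]))  # 9
```

Note that no squaring or square roots are involved, which is what makes the 1-norm cheap to evaluate.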
How It Works
Understanding the 1-norm involves breaking down how it computes vector magnitude differently from other norms, focusing on absolute values rather than squared ones.
- Vector components: Each element in a vector is treated independently; the 1-norm sums the absolute values of all entries without considering direction.
- Formula: For an n-dimensional vector x = [x₁, x₂, ..., xₙ], the 1-norm is ||x||₁ = Σᵢ₌₁ⁿ |xᵢ|, a non-negative scalar.
- Example calculation: A vector [−3, 0, 4] has a 1-norm of |−3| + |0| + |4| = 7, reflecting total displacement without diagonal shortcuts.
- Geometric interpretation: In 2D space, the 1-norm forms a diamond-shaped unit ball, contrasting with the circular shape of the 2-norm.
- Computational efficiency: Since it avoids squaring and square roots, the 1-norm is faster to compute in high-dimensional data environments like image processing.
- Optimization use: The 1-norm is used in compressed sensing and sparse coding, where minimizing ||x||₁ helps recover signals from incomplete measurements.
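In practice, the 1-norm is rarely computed by hand; NumPy's `linalg.norm` accepts an `ord` parameter selecting the norm type. A short sketch using the example vector [−3, 0, 4] from above:

```python
import numpy as np

x = np.array([-3.0, 0.0, 4.0])

# ord=1 selects the 1-norm: sum of absolute values
print(np.linalg.norm(x, ord=1))  # 7.0

# ord=2 (the Euclidean norm) shown for contrast
print(np.linalg.norm(x, ord=2))  # 5.0
```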
Comparison at a Glance
Below is a comparison of the 1-norm with other common vector norms:
| Norm Type | Formula | Example (Vector [3, −4]) | Key Feature |
|---|---|---|---|
| 1-norm (L1) | Σ |xᵢ| | |3| + |−4| = 7 | Encourages sparsity, robust to outliers |
| 2-norm (L2) | √(Σ xᵢ²) | √(9 + 16) = √25 = 5 | Euclidean distance, smooth optimization |
| Infinity norm (L∞) | max(|xᵢ|) | max(|3|, |−4|) = 4 | Measures the largest component |
| L0 "norm" | Count of non-zero entries | 2 non-zero values → 2 | Measures sparsity directly, not a true norm |
| Weighted 1-norm | Σ wᵢ|xᵢ| | with w = [1, 2]: 1·|3| + 2·|−4| = 11 | Allows variable penalties per component |
This table highlights how the 1-norm differs from alternatives in both calculation and application. While the 2-norm is ideal for geometric accuracy, the 1-norm excels in feature selection and noise resilience, especially in high-dimensional datasets like genomics or text analysis.
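The table's example column can be verified directly in NumPy. A quick sketch reproducing each row for the vector [3, −4] (the weight vector `w = [1, 2]` comes from the weighted-norm row):

```python
import numpy as np

x = np.array([3.0, -4.0])
w = np.array([1.0, 2.0])  # weights for the weighted 1-norm row

l1 = np.sum(np.abs(x))               # 1-norm: 7.0
l2 = np.sqrt(np.sum(x ** 2))         # 2-norm: 5.0
linf = np.max(np.abs(x))             # infinity norm: 4.0
l0 = np.count_nonzero(x)             # L0 "norm": 2 non-zero entries
weighted_l1 = np.sum(w * np.abs(x))  # weighted 1-norm: 11.0

print(l1, l2, linf, l0, weighted_l1)
```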
Why It Matters
The 1-norm is more than a mathematical curiosity—it has real-world impact across data science, engineering, and statistics. Its ability to induce sparsity makes it indispensable in modern machine learning pipelines.
- Lasso regression: Uses 1-norm regularization to prevent overfitting by shrinking irrelevant coefficients to zero, improving model interpretability.
- Image denoising: Total variation (based on 1-norm) removes noise while preserving edges in photographs and medical scans.
- Compressed sensing: Enables reconstruction of signals from fewer samples, crucial in MRI imaging and wireless transmission.
- Feature selection: In high-dimensional datasets, minimizing the 1-norm helps identify the most relevant variables automatically.
- Robust statistics: Less influenced by extreme values than L2 methods, making it suitable for financial or outlier-prone data.
- Computational scalability: Efficient algorithms like coordinate descent make 1-norm optimization feasible for big data applications.
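The sparsity-inducing behavior described above comes from the 1-norm's proximal operator, known as soft-thresholding, which is the core update inside coordinate-descent solvers for Lasso. A minimal sketch (the function name `soft_threshold` and the example coefficients are illustrative, not from any particular library):

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1: shrink each component toward
    zero by lam, and set components with |v_i| <= lam exactly to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

coeffs = np.array([2.5, -0.3, 0.8, -4.0])
print(soft_threshold(coeffs, 1.0))  # [ 1.5  -0.   0.  -3. ]
```

The small coefficients are driven exactly to zero rather than merely shrunk, which is why 1-norm regularization performs feature selection while 2-norm (ridge) regularization does not.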
As data complexity grows, the 1-norm remains a foundational tool for extracting meaningful patterns without overcomplication. Its balance of simplicity, efficiency, and effectiveness ensures continued relevance in both theoretical and applied fields.