# owid-catalog

> Python library for accessing Our World in Data's catalog of research data. Version: 1.0.1

# Catalog Library

The `owid-catalog` library is the foundation of Our World in Data's data management system. It serves two main purposes:

1. **Data API**: Access OWID data through unified client interfaces. We provide a reference for the most important objects and methods.
2. **Data Structures**: Enhanced pandas DataFrames with rich metadata support.

## Installation

```bash
pip install owid-catalog
```

## Quick Start

### Accessing Data via API

```python
from owid.catalog import Client

client = Client()

# Get chart data
tb = client.charts.fetch("life-expectancy")

# Search for indicators
results = client.indicators.search("renewable energy")
tb = results[0].fetch()  # single-column Table

# Query catalog tables
results = client.tables.search(table="population", namespace="un")
tb = results[0].fetch()
```

### Working with Data Structures

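```python
import pandas as pd
from owid.catalog import Table

# Toy input data (placeholder values)
df = pd.DataFrame(
    {"country": ["A", "A", "B"], "year": [1999, 2020, 2020], "population": [100, 110, 50]}
)

# Tables are pandas DataFrames with metadata
tb = Table(df)
tb.metadata.short_name = "population"

# Metadata propagates through operations
tb_filtered = tb[tb["year"] > 2000]  # Keeps metadata
tb_grouped = tb.groupby("country").sum()  # Preserves metadata
```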

---

# owid-catalog: Data APIs

The Data API provides unified access to OWID's published data through a simple client interface.

## Quick Reference
The API library is centered around the `Client` class, which provides quick access to different data APIs: `IndicatorsAPI`, `TablesAPI`, and `ChartsAPI`. Each API provides methods `search()` and `fetch()` for discovering and retrieving data, respectively.

For example, to fetch a table by its path:

```python
from owid.catalog import Client

client = Client()
tb = client.tables.fetch("garden/un/2024-07-12/un_wpp/population")
```

For convenience, the library provides functions for the most common use cases:

```python
from owid.catalog import search, fetch

# Search for charts (default)
results = search("population")
tb = results[0].fetch()

# Direct fetch (by chart slug or table path)
tb = fetch("life-expectancy")
tb = fetch("garden/un/2024-07-12/un_wpp/population")
```

### Lazy Loading

All `fetch()` methods return `Table`-like objects: pandas DataFrames extended with metadata attributes that describe the data.

```python
tb = client.charts.fetch("life-expectancy")
tb.metadata  # Available immediately
tb["life_expectancy_0"].metadata  # Column metadata available
```

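Optionally, you can skip loading the data rows by passing `load_data=False` to any `fetch()` method. This loads only the structure (columns and metadata), which is useful when you just need to inspect a resource before downloading its data.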


### Path Formats

Different APIs use different path conventions:

- **Charts**: `"life-expectancy"` (simple slug), `"years-of-schooling?metric_type=expected_years_schooling&level=primary&sex=boys"` (with query params), or `"https://ourworldindata.org/grapher/life-expectancy"` (full URL)
- **Tables**: `"garden/un/2024-07-12/un_wpp/population"` (channel/namespace/version/dataset/table)
- **Indicators**: `"garden/un/2024-07-12/un_wpp/population#population"` (table path + #column)
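As a rough illustration of these conventions, the sketch below guesses which API a path belongs to. `classify_path` and `KNOWN_CHANNELS` are hypothetical names, not part of `owid-catalog`, and the set of channels is an assumption based on the ones mentioned in this document; the library's own auto-detection in `fetch()` may use different rules.

```python
from typing import Literal

PathKind = Literal["chart", "table", "indicator"]

# Channels mentioned in this document; the real catalog may have others (assumption)
KNOWN_CHANNELS = {"garden", "grapher", "meadow"}


def classify_path(path: str) -> PathKind:
    """Guess which API a path belongs to (hypothetical helper, not part of owid-catalog)."""
    if path.startswith(("http://", "https://")):
        return "chart"  # full grapher or explorer URL
    if "#" in path:
        return "indicator"  # table path + "#column"
    parts = path.split("/")
    if len(parts) == 5 and parts[0] in KNOWN_CHANNELS:
        return "table"  # channel/namespace/version/dataset/table
    return "chart"  # plain slug, optionally with ?query params


print(classify_path("life-expectancy"))  # chart
print(classify_path("garden/un/2024-07-12/un_wpp/population"))  # table
print(classify_path("garden/un/2024-07-12/un_wpp/population#population"))  # indicator
```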

## API Reference

### API result types

Result objects returned by `fetch()` and `search()` methods.

---

We have created a Python library to provide easy access to our large data catalog. It also supports our ETL work, since it contains various methods and objects that are essential to our data-wrangling processes.


Currently, this library lives in the `etl` repository, under `lib/catalog`.

### Installation
Simply install it from PyPI:

```shell
pip install owid-catalog
```

### Update release
After working on your changes in the library, publishing to PyPI is automated:

1. **Bump the version** in `lib/catalog/pyproject.toml`
2. **Update the changelog** in `lib/catalog/README.md`
3. **Commit and push to `master`**: the package is then published to PyPI automatically via GitHub Actions

The workflow triggers automatically when `lib/catalog/pyproject.toml` changes on the master branch. It includes a safety check to ensure the version was actually bumped before publishing.

**Manual trigger:** If needed, you can also trigger the workflow manually by clicking `Run Workflow` in GitHub Actions.
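The workflow's version safety check is not shown in this document. A minimal sketch of that kind of check, assuming plain numeric `MAJOR.MINOR.PATCH` versions like the `1.0.1` in the header, compares versions as integer tuples rather than strings, so that `1.0.10` correctly counts as newer than `1.0.9`. `parse_version` and `is_version_bumped` are hypothetical helpers; for pre-release tags, something like `packaging.version.Version` would be more robust.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain "MAJOR.MINOR.PATCH" string into a comparable tuple of ints.

    Assumes purely numeric components (no pre-release tags such as "1.0.1rc1").
    """
    return tuple(int(part) for part in version.split("."))


def is_version_bumped(previous: str, current: str) -> bool:
    """Hypothetical check: allow publishing only if the new version is strictly greater."""
    return parse_version(current) > parse_version(previous)


print(is_version_bumped("1.0.0", "1.0.1"))  # True: safe to publish
print(is_version_bumped("1.0.1", "1.0.1"))  # False: version was not bumped
```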

### Generate `llms.txt`

The library ships an `llms.txt` file (at `docs/libraries/catalog/llms.txt`) that is auto-generated from module docstrings and documentation markdown files. To regenerate it after changing docstrings or docs:

```shell
make docs.llms
```

This runs `docs/ignore/others/bake_llms_txt.py`, which inspects the public API surface and doc files so the output stays in sync with the codebase.

---

# owid-catalog: Data Structures and Processing

Enhanced pandas data structures with rich metadata support for OWID's data processing pipelines.

## Quick Reference

```python
from owid.catalog import Dataset, Table, Variable
from owid.catalog import processing as pr

# Create a table with metadata
tb = Table(df, metadata={"short_name": "population"})
```

### Metadata Hierarchy

```
Dataset
├── metadata: DatasetMeta (sources, licenses, title)
└── Tables
    ├── metadata: TableMeta (table-level info)
    └── Variables (columns)
        └── metadata: VariableMeta (unit, description, sources)
```

### Metadata Propagation
As the table is processed, metadata is preserved and propagated to resulting tables and variables.

```python
# Boolean filtering
tb_filtered = tb[tb["year"] > 2000]  # Keeps metadata
# Label-based selection
tb_loc = tb.loc[tb["country"] == "USA"]  # Keeps metadata
# Sorting
tb_sorted = tb.sort_values("gdp_per_capita")  # Keeps metadata
# Column operations
tb["gdp_per_capita_thousands"] = tb["gdp_per_capita"] / 1000

# Merging
tb_merged = pr.merge(tb1, tb2, on="country")  # Merges metadata
# Concatenating
tb_concat = pr.concat([tb1, tb2])  # Combines metadata
# Pivoting
tb_pivot = pr.pivot(tb, index="year", ...)  # Adjusts metadata
# Melting
tb_melted = pr.melt(tb, ...)
```

### File Formats

Tables support multiple formats with automatic detection: feather, parquet, and CSV. Metadata is stored separately in `.meta.json` files.
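To make the "data file plus separate metadata file" layout concrete, here is a minimal standard-library sketch that writes a CSV file with a JSON sidecar next to it and reads both back. The `<stem>.meta.json` naming, the metadata keys, and the helper names are assumptions for illustration; the library's actual metadata schema is richer, and it also handles feather and parquet.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_with_sidecar(rows: list[dict], metadata: dict, data_path: Path) -> Path:
    """Write rows as CSV and the metadata next to it as <stem>.meta.json (hypothetical helper)."""
    with data_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    meta_path = data_path.with_name(data_path.stem + ".meta.json")
    meta_path.write_text(json.dumps(metadata, indent=2))
    return meta_path


def load_with_sidecar(data_path: Path) -> tuple[list[dict], dict]:
    """Read the CSV rows (values come back as strings) and the sidecar metadata."""
    with data_path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    meta_path = data_path.with_name(data_path.stem + ".meta.json")
    return rows, json.loads(meta_path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "population.csv"
    save_with_sidecar(
        [{"country": "A", "year": 2020, "population": 100}],
        {"short_name": "population", "title": "Population"},
        data_path,
    )
    files = sorted(p.name for p in Path(tmp).iterdir())
    rows, meta = load_with_sidecar(data_path)

print(files)  # ['population.csv', 'population.meta.json']
print(meta["short_name"])  # population
```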

## Reference

- `processing` (imported as `pr`): Metadata-aware alternatives to pandas functions.
- `Dataset`: Container for multiple tables with shared metadata.
- `Table`: pandas DataFrame with column-level metadata.
- `Variable`: pandas Series with metadata.

---

# API Reference (owid.catalog.api)

## quick

Quick access functions for data discovery and retrieval.

### `fetch(path: 'str') -> 'Table | ChartTable'`

Fetch data directly by path (auto-detects tables, indicators, or charts).

This function downloads the data associated with the given path. It auto-detects
whether you're accessing a table, indicator, or chart based on the path format.

Args:
    path: Path to the data resource:

        - Table: "channel/namespace/version/dataset/table"
        - Indicator: "channel/namespace/version/dataset/table#variable"
        - Chart slug: "life-expectancy"
        - Chart URL: "https://ourworldindata.org/grapher/life-expectancy"
        - Chart slug with query params: "years-of-schooling?metric_type=expected_years_schooling&level=primary&sex=boys"
        - Explorer URL: "https://ourworldindata.org/explorers/energy"

Returns:
    Table (for tables or indicators) or ChartTable (for charts)

Raises:
    ValueError: If path format is invalid or resource not found

Example:
    ```python
    # Fetch table
    tb = fetch("garden/un/2024-07-12/un_wpp/population")
    print(tb.shape)
    print(tb.metadata)

    # Fetch indicator as Table (single column)
    tb = fetch("garden/un/2024-07-12/un_wpp/population#population")
    print(tb.columns)

    # Fetch chart data (slug auto-detected)
    tb = fetch("life-expectancy")
    print(tb.metadata.title)

    # Fetch chart with query params
    tb = fetch("years-of-schooling?metric_type=expected_years_schooling&level=primary&sex=boys")

    # Fetch chart by full URL
    tb = fetch("https://ourworldindata.org/grapher/life-expectancy")

    # Fetch from grapher channel
    tb = fetch("grapher/demography/2025-10-22/life_expectancy/life_expectancy_at_birth")
    ```


### `search(name: 'str | None' = None, *, kind: "Literal['table', 'indicator', 'chart']" = 'chart', limit: 'int' = 10, namespace: 'str | None' = None, version: 'str | None' = None, dataset: 'str | None' = None, channel: 'str | None' = None, match: "Literal['exact', 'contains', 'regex', 'fuzzy']" = 'fuzzy', fuzzy_threshold: 'int' = 70, case: 'bool' = False, latest: 'bool' = False) -> 'ResponseSet[TableResult] | ResponseSet[IndicatorResult] | ResponseSet[ChartResult]'`

Search for available data without downloading (for browsing/discovery).

This function searches for data in the catalog and returns a ResponseSet of results
without downloading the actual data. Use this to explore and find the exact path or
slug, then use fetch() to download the data.

Args:
    name: Name or pattern to search for (e.g., "population", "gdp", "life-expectancy").
        Required for indicators and charts. Optional for tables (can filter by other params).
    kind: What to search for (default: "chart"):

        - "chart": Search published charts (returns ResponseSet[ChartResult])
        - "table": Search catalog tables (returns ResponseSet[TableResult])
        - "indicator": Search indicators/variables (returns ResponseSet[IndicatorResult])
    limit: Maximum number of results to return (default: 10)
    namespace: Filter by namespace (e.g., "un", "worldbank"). Only for tables.
    version: Filter by specific version (e.g., "2024-01-15"). Only for tables.
    dataset: Filter by dataset name. Only for tables.
    channel: Filter by channel (e.g., "garden", "grapher"). Only for tables.
    match: Matching mode for the `name` field (default: "fuzzy" for typo tolerance). Only for tables:

        - "fuzzy": Typo-tolerant similarity matching
        - "exact": Exact string match
        - "contains": Substring match
        - "regex": Regular expression
    fuzzy_threshold: Minimum similarity score 0-100 for fuzzy matching on the `name` field (default: 70). Only for tables.
    case: Case-sensitive search (default: False). Only for tables.
    latest: If True, keep only the latest version of each result
        (grouped by namespace/dataset/table or indicator). Only for tables and indicators.
        Note: results without a version are dropped when this is enabled.

Returns:
    Search results. Results can be indexed, iterated, and provide access to metadata without downloading data.

Example:
    ```python
    # Search for charts (default)
    results = search("population")
    print(f"Found {len(results)} charts")
    print(results[0].slug)  # Access chart slug without downloading data

    # Search for tables
    results = search("population", kind="table")
    print(results[0].path)

    # Search for indicators
    results = search("gdp", kind="indicator")
    print(results[0].title)

    # Exact match for tables
    results = search("population", kind="table", match="exact")

    # Filter tables by namespace and version
    results = search("wdi", kind="table", namespace="worldbank_wdi", version="2024-01-10")

    # Then fetch the data you need:
    tb = results[0].fetch()
    ```

Warning:
    For indicators and charts, filtering parameters (namespace, version, dataset, channel)
    are ignored as they don't apply to those search types.
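The `latest=True` option can be pictured as a group-by-and-keep-max over versions. The sketch below is a hypothetical re-creation, not the library's code: it groups by namespace/dataset/table, keeps the highest version string, and drops unversioned results as the docstring describes. It relies on ISO-formatted versions such as `2024-07-12` sorting correctly as plain strings. The exact grouping keys differ per API (`TablesAPI.search` also groups by channel; `IndicatorsAPI.search` groups by namespace, dataset, and column_name), and the versions below are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Minimal stand-in for a search result (hypothetical, not TableResult)."""

    namespace: str
    dataset: str
    table: str
    version: Optional[str]


def keep_latest(hits: list[Hit]) -> list[Hit]:
    """Keep the newest version per (namespace, dataset, table); drop unversioned hits."""
    best: dict[tuple[str, str, str], Hit] = {}
    for hit in hits:
        if hit.version is None:
            continue  # results without a version are dropped
        key = (hit.namespace, hit.dataset, hit.table)
        # ISO dates such as "2024-07-12" compare correctly as plain strings
        if key not in best or hit.version > best[key].version:
            best[key] = hit
    return list(best.values())


hits = [
    Hit("un", "un_wpp", "population", "2022-07-11"),
    Hit("un", "un_wpp", "population", "2024-07-12"),
    Hit("worldbank_wdi", "wdi", "wdi", "2024-01-10"),
    Hit("un", "un_wpp", "population", None),
]
latest = keep_latest(hits)
print([(h.dataset, h.version) for h in latest])  # [('un_wpp', '2024-07-12'), ('wdi', '2024-01-10')]
```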


## client

### Client

Unified client for all OWID data APIs.

Provides access to our main APIs:

- ChartsAPI: Fetch and search for published charts
- IndicatorsAPI: Semantic search for data indicators
- TablesAPI: Query and load tables from the data catalog

Attributes:
    charts: ChartsAPI instance for chart operations and search.
    indicators: IndicatorsAPI instance for indicator search.
    tables: TablesAPI instance for catalog operations.

Example:
    ```python
    from owid.catalog import Client

    client = Client()

    # Charts: Published visualizations
    results = client.charts.search("climate change")
    chart = client.charts.fetch("life-expectancy")

    # Tables: Catalog datasets
    results = client.tables.search(table="population", namespace="un")
    tb = client.tables.fetch("garden/un/2024-07-12/un_wpp/population")

    # Indicators: Semantic search for data series
    results = client.indicators.search("renewable energy")
    tb = client.indicators.fetch("garden/un/2024-07-12/un_wpp/population#population")

    # Custom URLs (e.g., for staging environments)
    staging_client = Client(catalog_url="https://staging-catalog.example.com/")
    ```


## charts

- **`ChartNotFoundError`**: Raised when a chart does not exist.

### ChartResult

An OWID chart (from fetch or search).

Which fields are populated depends on the source:
- fetch(): Provides config and metadata
- search(): Provides subtitle, available_entities, num_related_articles, published_at, last_updated, popularity

Core fields (slug, title, url) are always populated.

Attributes:
    slug: Chart URL identifier (e.g., "life-expectancy").
    title: Chart title.
    url: Full URL to the interactive chart.
    config: Raw grapher configuration dict (from fetch).
    metadata: Chart metadata dict including column info (from fetch).
    subtitle: Chart subtitle/description (from search).
    available_entities: List of entities/countries in the chart (from search).
    num_related_articles: Number of related articles (from search).
    published_at: When the chart was first published (from search).
    last_updated: When the chart was last updated (from search).
    popularity: Popularity score (0.0 to 1.0) based on analytics views (from search).

#### `ChartResult.chart_base_url` (property)

Base URL for this chart type (grapher or explorer, derived from site_url and type).

#### `ChartResult.description` (property)

Return a string description of the chart result.

#### `fetch(self, *, load_data: 'bool' = True) -> 'ChartTable'`

Fetch chart data as ChartTable with rich metadata.

Args:
    load_data: If True (default), load full chart data.
               If False, load only structure (columns and metadata) without rows.

Returns:
    ChartTable with chart data and chart_config. Column metadata (unit, description, etc.)
    is populated from the chart's metadata.json.

Note:
    Explorer views (``type="explorerView"``) are best-effort. Some explorers
    may return 503 or other errors from their CSV endpoint. In those cases an
    :class:`ExplorerFetchError` is raised with details.

Example:
    ```python
    result = client.charts.search("life expectancy")[0]
    tb = result.fetch()
    print(tb.head())
    print(tb["life_expectancy_0"].metadata.unit)
    ```

#### `ChartResult.url` (property)

Full URL to the interactive chart (built from chart_base_url, slug, and query_params).


### ChartsAPI

API for accessing OWID chart data and metadata.

Provides methods to fetch data and metadata from published charts
on ourworldindata.org. Also includes search functionality to find
charts by keywords.

Example:
    ```python
    from owid.catalog import Client

    client = Client()

    # Fetch chart data as ChartTable
    tb = client.charts.fetch("life-expectancy")
    print(tb.head())
    print(tb["life_expectancy_0"].metadata.unit)
    print(tb.metadata.chart_config.get("title"))  # Access chart config

    # Search for charts
    results = client.charts.search("gdp per capita")
    tb = results[0].fetch()  # Fetch chart data as ChartTable
    ```

#### `ChartsAPI.base_url` (property)

Base URL for the Grapher (read-only).

#### `fetch(self, slug_or_url: 'str', *, type: 'ChartType | None' = None, load_data: 'bool' = True, timeout: 'int | None' = None) -> 'ChartTable'`

Fetch chart data as a ChartTable with rich metadata.

Accepts a chart slug, a slug with query parameters, or a full URL. The slug,
query parameters, and chart type are extracted automatically.

Args:
    slug_or_url: One of:

        - Chart slug: ``"life-expectancy"``
        - Slug with query params: ``"education-spending?level=primary&spending_type=gdp_share"``
        - Full grapher URL: ``"https://ourworldindata.org/grapher/life-expectancy?tab=table"``
        - Full explorer URL: ``"https://ourworldindata.org/explorers/covid?Metric=Cases"``
    type: Override the chart type. Defaults to ``"chart"`` (grapher).
        Use ``"explorerView"`` for explorer views. Auto-detected from full URLs.
    load_data: If True (default), load full chart data.
               If False, load only structure (columns and metadata) without rows.
    timeout: HTTP request timeout in seconds. Defaults to client timeout.

Returns:
    ChartTable with chart data and chart_config. Column metadata (unit, description, etc.)
    is populated from the chart's metadata.json. Chart config is accessible via .metadata.chart_config.

Note:
    Explorer views are best-effort. Some explorers may return 503 or other errors
    from their CSV endpoint.

Example:
    ```python
    # Fetch a grapher chart by slug
    tb = client.charts.fetch("life-expectancy")

    # Fetch with query params (e.g., a multiDim view)
    tb = client.charts.fetch("education-spending?level=primary&spending_type=gdp_share")

    # Fetch from a full URL (type and query params auto-detected)
    tb = client.charts.fetch("https://ourworldindata.org/explorers/covid?Metric=Cases")

    # Explicitly fetch an explorer view
    tb = client.charts.fetch("covid?Metric=Cases", type="explorerView")
    ```

#### `search(self, query: 'str', *, countries: 'list[str] | None' = None, topics: 'list[str] | None' = None, require_all_countries: 'bool' = False, limit: 'int' = 10, page: 'int' = 0, timeout: 'int | None' = None) -> 'ResponseSet[ChartResult]'`

Search for charts matching a query.

Args:
    query: Search query string.
    countries: Optional list of country names to filter by.
    topics: Optional list of topic names to filter by.
    require_all_countries: If True, only return charts with ALL
        specified countries. Default False (any country matches).
    limit: Maximum results to return (1-100). Default 10.
    page: Page number for pagination (0-indexed). Default 0.
    timeout: HTTP request timeout in seconds. Defaults to client timeout.

Returns:
    ResponseSet containing ChartResult objects, sorted by popularity (most viewed first).
    Each result includes a `popularity` field (0.0-1.0) based on analytics views.

Example:
    ```python
    # Basic search (sorted by popularity)
    results = client.charts.search("life expectancy")
    for chart in results:
        print(f"{chart.title}: popularity={chart.popularity:.3f}")

    # Filter by countries
    results = client.charts.search(
        "gdp",
        countries=["France", "Germany"],
        require_all_countries=True
    )

    # Get data from search results
    tb = results[0].fetch()
    ```


- **`ExplorerFetchError`**: Raised when an explorer view cannot be fetched (e.g., 503 from CSV endpoint).

- **`LicenseError`**: Raised when chart data cannot be downloaded due to licensing.

### ParsedSlug

Result of parsing a chart slug or URL.


### `parse_chart_slug(slug_or_url: 'str') -> 'ParsedSlug'`

Extract slug, query params, and type from a URL or plain slug.

Args:
    slug_or_url: Chart slug, grapher URL, or explorer URL.

Returns:
    ParsedSlug with slug, query_params, and type.

Raises:
    ValueError: If URL is not a valid grapher or explorer URL.
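For intuition, here is a standard-library sketch of the parsing that `parse_chart_slug` describes, built on `urllib.parse`. `ParsedSlugSketch` and `parse_chart_slug_sketch` are hypothetical stand-ins: the real function may normalize or drop some query parameters (such as `tab`) and may validate the host, which this sketch does not do, so staging URLs also parse.

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qsl, urlsplit


@dataclass
class ParsedSlugSketch:
    """Hypothetical stand-in for ParsedSlug; field names mirror the docstring above."""

    slug: str
    query_params: dict[str, str] = field(default_factory=dict)
    type: str = "chart"


def parse_chart_slug_sketch(slug_or_url: str) -> ParsedSlugSketch:
    parts = urlsplit(slug_or_url)
    params = dict(parse_qsl(parts.query))
    if not parts.scheme:
        # Plain slug, possibly with "?key=value" params; defaults to a grapher chart
        return ParsedSlugSketch(parts.path, params)
    # Full URL: the path should look like /grapher/<slug> or /explorers/<slug>
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) == 2 and segments[0] == "grapher":
        return ParsedSlugSketch(segments[1], params, "chart")
    if len(segments) == 2 and segments[0] == "explorers":
        return ParsedSlugSketch(segments[1], params, "explorerView")
    raise ValueError(f"Not a grapher or explorer URL: {slug_or_url}")


for example in [
    "life-expectancy",
    "education-spending?level=primary&spending_type=gdp_share",
    "https://ourworldindata.org/grapher/life-expectancy?tab=table",
    "https://ourworldindata.org/explorers/covid?Metric=Cases",
]:
    print(parse_chart_slug_sketch(example))
```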


## tables

- **`CatalogVersionError`**: Raised when catalog format version is newer than library version.

### TableResult

A table found in the catalog.

Attributes:
    table: Table name.
    path: Full path to the table.
    channel: Data channel (garden, meadow, etc.).
    namespace: Data provider namespace.
    version: Version string.
    dataset: Dataset name.
    dimensions: List of dimension columns.
    title: Human-readable title (from table or dataset metadata).
    description: Detailed description (from table or dataset metadata).
    is_public: Whether the data is publicly accessible.
    formats: List of available formats.
    popularity: Popularity score (0.0 to 1.0) based on analytics views.

#### `fetch(self, *, load_data: 'bool' = True) -> 'Table'`

Fetch table data.

Args:
    load_data: If True (default), load full table data.
               If False, load only structure (columns and metadata) without rows.

Returns:
    Table with data and metadata (or just metadata if load_data=False).

Example:
    ```python
    result = client.tables.search(table="population")[0]
    tb = result.fetch()
    print(tb.head())
    print(tb.columns)
    ```


### TablesAPI

API for querying and loading tables from the OWID catalog.

Provides methods to search for tables by various criteria and
load table data from the catalog.

Example:
    ```python
    from owid.catalog import Client

    client = Client()

    # Search for tables
    results = client.tables.search(table="population", namespace="un")

    # Load the first result
    table = results[0].fetch()

    # Fetch table directly by path
    tb = client.tables.fetch("garden/un/2024-07-12/un_wpp/population")
    print(tb.head())
    ```

#### `TablesAPI.catalog_url` (property)

Base URL for the catalog (read-only).

#### `fetch(self, path: 'str', *, load_data: 'bool' = True, formats: 'list[str] | None' = None, is_public: 'bool' = True, timeout: 'int | None' = None) -> 'Table'`

Fetch a table by catalog path.

Loads the table directly from the catalog.

Args:
    path: Full catalog path (e.g., "garden/un/2024-07-12/un_wpp/population").
    load_data: If True (default), load full table data.
               If False, load only table structure (columns and metadata) without rows.
    formats: List of formats to try. If None, tries all supported formats.
    is_public: Whether the table is publicly accessible. Default True.
    timeout: HTTP request timeout in seconds (currently unused; reserved for future use).

Returns:
    Table with data and metadata (or just metadata if load_data=False).

Raises:
    ValueError: If path format is invalid.
    KeyError: If table not found at path.

Example:
    ```python
    # Load table with data
    tb = client.tables.fetch("garden/un/2024-07-12/un_wpp/population")
    print(tb.head())

    # Load only metadata (no data rows)
    tb = client.tables.fetch("garden/un/2024-07-12/un_wpp/population", load_data=False)
    print(tb.columns)
    ```

#### `search(self, table: 'str | None' = None, namespace: 'str | None' = None, version: 'str | None' = None, dataset: 'str | None' = None, channel: 'str | None' = None, case: 'bool' = False, match: "Literal['exact', 'contains', 'regex', 'fuzzy']" = 'exact', fuzzy_threshold: 'int' = 70, timeout: 'int | None' = None, refresh_index: 'bool' = False, latest: 'bool' = False) -> 'ResponseSet[TableResult]'`

Search the catalog for tables matching criteria.

Args:
    table: Table name pattern to search for
    namespace: Filter by namespace (exact match)
    version: Filter by version (exact match)
    dataset: Dataset name pattern to search for
    channel: Filter by channel (exact match). Defaults to 'garden' if not specified.
    case: Case-sensitive search (default: False)
    match: How to match table/dataset names (default: "exact"):
        - "fuzzy": Typo-tolerant similarity matching
        - "exact": Exact string match
        - "contains": Substring match
        - "regex": Regular expression pattern
    fuzzy_threshold: Minimum similarity score 0-100 for fuzzy matching.
        Only used when match="fuzzy". (default: 70)
    timeout: HTTP request timeout in seconds for catalog loading. Defaults to client timeout.
    refresh_index: If True, force re-download of the catalog index. Default False.
    latest: If True, keep only the latest version of each table
        (grouped by namespace, dataset, table, channel). Default False.
        Note: results without a version are dropped when this is enabled.

Returns:
    ResponseSet containing matching TableResult objects, sorted by popularity (most viewed first).
    If match="fuzzy", results are sorted by fuzzy relevance score instead.
    Each result includes a `popularity` field (0.0-1.0) based on analytics views.

Example:
    ```python
    # Exact match (default) - searches garden channel by default
    results = client.tables.search(table="population")

    # Substring match
    results = client.tables.search(table="pop", match="contains")

    # Regex search
    results = client.tables.search(table="population.*density", match="regex")

    # Fuzzy search sorted by relevance
    results = client.tables.search(table="populaton", match="fuzzy")

    # Case-sensitive fuzzy search with custom threshold
    results = client.tables.search(table="GDP", match="fuzzy", case=True, fuzzy_threshold=85)

    # Filter by namespace and version
    results = client.tables.search(
        table="wdi",
        namespace="worldbank_wdi",
        version="2025-09-08",
    )

    # Search in a specific channel
    results = client.tables.search(
        table="wdi",
        namespace="worldbank_wdi",
        version="2025-09-08",
        channel="meadow",
    )

    # Load a specific result
    tb = results[0].fetch()
    ```
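The four `match` modes can be illustrated with a small predicate. This is a hypothetical sketch, not the library's implementation: it uses `difflib.SequenceMatcher` as a stand-in fuzzy scorer scaled to 0-100, and the library may use a different similarity algorithm. It only decides whether a name matches; it does not rank results.

```python
import difflib
import re
from typing import Literal

MatchMode = Literal["exact", "contains", "regex", "fuzzy"]


def name_matches(
    pattern: str,
    name: str,
    match: MatchMode = "exact",
    case: bool = False,
    fuzzy_threshold: int = 70,
) -> bool:
    """Return True if `name` matches `pattern` under the given mode (hypothetical sketch)."""
    if match == "regex":
        # Use a flag for case-insensitivity so the regex pattern itself is not altered
        flags = 0 if case else re.IGNORECASE
        return re.search(pattern, name, flags) is not None
    if not case:
        pattern, name = pattern.casefold(), name.casefold()
    if match == "exact":
        return name == pattern
    if match == "contains":
        return pattern in name
    if match == "fuzzy":
        # difflib stands in for the real scorer; ratio() is in [0, 1], scaled to 0-100
        score = 100 * difflib.SequenceMatcher(None, pattern, name).ratio()
        return score >= fuzzy_threshold
    raise ValueError(f"Unknown match mode: {match!r}")


names = ["population", "population_density", "gdp"]
print([n for n in names if name_matches("pop", n, match="contains")])  # ['population', 'population_density']
print([n for n in names if name_matches("populaton", n, match="fuzzy")])  # ['population']
```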


## indicators

### IndicatorResult

An indicator found via semantic search.

Attributes:
    title: Indicator title/name.
    indicator_id: Unique indicator ID.
    path: Path in the catalog (e.g., "grapher/un/2024-07-12/un_wpp/population#population").
    channel: Data channel (parsed from path).
    namespace: Data provider namespace (parsed from path).
    version: Version string (parsed from path).
    dataset: Dataset name (parsed from path).
    column_name: Column name in the table.
    description: Full indicator description.
    unit: Unit of measurement.
    score: Semantic similarity score (0-1).
    n_charts: Number of charts using this indicator.
    popularity: Popularity score (0.0 to 1.0) based on analytics views.

#### `fetch(self, *, load_data: 'bool' = True) -> 'Table'`

Fetch indicator data as a single-column Table.

Args:
    load_data: If True (default), load full indicator data.
               If False, load only structure (columns and metadata) without rows.

Returns:
    Table with the indicator column (plus index). Metadata is preserved.

Example:
    ```python
    result = client.indicators.search("population")[0]
    tb = result.fetch()
    print(tb.head())
    print(tb[tb.columns[0]].metadata.unit)
    ```

#### `fetch_table(self, *, load_data: 'bool' = True) -> 'Table'`

Fetch the full table containing this indicator.

Args:
    load_data: If True (default), load full table data.
               If False, load only structure (columns and metadata) without rows.

Returns:
    Table with all columns including this indicator.

Example:
    ```python
    result = client.indicators.search("population")[0]
    tb = result.fetch_table()
    print(tb.columns)
    ```

#### `model_post_init(self, _IndicatorResult__context: 'Any') -> 'None'`

Parse dataset, version, namespace, channel from path.
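A minimal sketch of that parsing step, assuming paths always follow the `channel/namespace/version/dataset/table#column` format documented for `IndicatorsAPI.fetch()`. `IndicatorPath` and `parse_indicator_path` are hypothetical names used only for illustration; the real method fills attributes on `IndicatorResult` directly.

```python
from typing import NamedTuple


class IndicatorPath(NamedTuple):
    """Components of "channel/namespace/version/dataset/table#column" (hypothetical type)."""

    channel: str
    namespace: str
    version: str
    dataset: str
    table: str
    column: str


def parse_indicator_path(path: str) -> IndicatorPath:
    table_path, sep, column = path.partition("#")
    parts = table_path.split("/")
    if not sep or not column or len(parts) != 5:
        raise ValueError(f"Expected 'channel/namespace/version/dataset/table#column', got {path!r}")
    return IndicatorPath(*parts, column)


print(parse_indicator_path("grapher/un/2024-07-12/un_wpp/population#population"))
```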


### IndicatorsAPI

API for semantic search of OWID indicators.

Uses the search.owid.io service to find indicators using
natural language queries and vector embeddings.

Example:
    ```python
    from owid.catalog import Client

    client = Client()

    # Search for indicators
    results = client.indicators.search("solar power generation")
    for ind in results:
        print(f"{ind.title} (score: {ind.score:.2f})")

    # Fetch the indicator data as a single-column Table
    tb = results[0].fetch()

    # Or fetch the full table containing the indicator
    full_table = results[0].fetch_table()
    ```

#### `IndicatorsAPI.catalog_url` (property)

Base URL for the catalog (read-only).

#### `fetch(self, path: 'str', *, load_data: 'bool' = True, timeout: 'int | None' = None) -> 'Table'`

Fetch a specific indicator by catalog path.

Args:
    path: Catalog path in format "channel/namespace/version/dataset/table#column"
    load_data: If True (default), load full indicator data.
               If False, load only structure (columns and metadata) without rows.
    timeout: HTTP request timeout in seconds (reserved for future use).

Returns:
    Table with a single indicator column (plus index). Metadata is preserved.

Raises:
    ValueError: If path format is invalid, table not found, or column doesn't exist.

Example:
    ```python
    # Fetch indicator by path
    tb = client.indicators.fetch("garden/un/2024-07-12/un_wpp/population#population")
    print(tb.head())
    print(tb["population"].metadata.unit)
    ```

#### `search(self, query: 'str', *, limit: 'int' = 10, show_legacy: 'bool' = False, latest: 'bool' = False, sort_by: "Literal['relevance', 'similarity']" = 'relevance', timeout: 'int | None' = None) -> 'ResponseSet[IndicatorResult]'`

Search for indicators using natural language.

Uses semantic search to find indicators that match the
meaning of your query, not just keyword matching.

Args:
    query: Natural language search query
        (e.g., "renewable energy capacity", "child mortality rate").
    limit: Maximum number of results to return. Default 10.
    show_legacy: If True, show pre-ETL indicators only. Default False.
    latest: If True, keep only the latest version of each indicator
        (grouped by namespace, dataset, column_name). Default False.
        Note: results without a version are dropped when this is enabled.
    sort_by: How to sort results (default: "relevance"):

        - "relevance": Combined score blending semantic similarity (60%) and popularity (40%).
        - "similarity": Sort by semantic similarity score only.
    timeout: HTTP request timeout in seconds. Defaults to client timeout.

Returns:
    ResponseSet containing IndicatorResult objects, sorted according to `sort_by`.
    Each result includes a `popularity` field (0.0-1.0) based on analytics views.

Example:
    ```python
    # Search for indicators (sorted by relevance by default)
    results = client.indicators.search("CO2 emissions per capita")

    # View results
    for ind in results:
        print(f"{ind.title}")
        print(f"  Score: {ind.score:.3f}")
        print(f"  Popularity: {ind.popularity:.3f}")

    # Load data from top result
    tb = results[0].fetch()

    # Sort by semantic similarity only (original behavior)
    results = client.indicators.search("CO2 emissions", sort_by="similarity")
    ```
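The default `"relevance"` ordering is described as a 60/40 blend of semantic similarity and popularity. A minimal sketch of that blend, assuming both scores are on a 0-1 scale and combined as a plain weighted sum (the service's exact formula is not documented here); the candidate scores are made up for illustration:

```python
def relevance(similarity: float, popularity: float) -> float:
    """Blend semantic similarity (60%) and popularity (40%); assumes both are in [0, 1]."""
    return 0.6 * similarity + 0.4 * popularity


# Illustrative (similarity, popularity) pairs for three hypothetical indicators
candidates = {"a": (0.90, 0.10), "b": (0.80, 0.70), "c": (0.85, 0.00)}

by_similarity = sorted(candidates, key=lambda k: candidates[k][0], reverse=True)
by_relevance = sorted(candidates, key=lambda k: relevance(*candidates[k]), reverse=True)
print(by_similarity)  # ['a', 'c', 'b']
print(by_relevance)  # ['b', 'a', 'c']
```

A popular but slightly less similar indicator ("b") can thus outrank the closest semantic match, which is what `sort_by="similarity"` avoids.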

#### `IndicatorsAPI.search_url` (property)

URL for the indicators search API (read-only).


## models

### ResponseSet

Generic container for API responses.

Provides iteration, indexing, and conversion to CatalogFrame
for backwards compatibility.

Attributes:
    items: List of result objects.
    query: The query that produced these results.
    total_count: Total number of results available (may be more than len(items)).

#### `filter(self, predicate: 'Callable[[T], bool]') -> 'ResponseSet[T]'`

Filter results by predicate function.

Returns a new ResponseSet with only items that match the predicate.
The predicate should return True for items to keep.

Args:
    predicate: Function that takes an item of results (e.g. ChartResult) and returns True/False.

Returns:
    New ResponseSet with filtered results.

Example:
    ```py
    >>> # Filter results by version
    >>> results.filter(lambda r: r.version > '2024')

    >>> # Filter by namespace
    >>> results.filter(lambda r: r.namespace == "worldbank")

    >>> # Chain multiple filters
    >>> results.filter(lambda r: r.version > '2024').filter(lambda r: r.namespace == "un")
    ```
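
The predicates above compare `version` as a plain string, so the comparison is lexicographic. The sketch below reproduces the same predicates with a list comprehension over a hypothetical `Result` dataclass. It does not use the library's `ResponseSet`. Zero-padded ISO dates such as `2024-07-12` order correctly, but a non-date version string such as `latest` (if your results contain one) also passes `> '2024'`.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for a TableResult/IndicatorResult."""

    namespace: str
    version: str


results = [
    Result("un", "2022-07-11"),
    Result("un", "2024-07-12"),
    Result("worldbank", "2024-05-20"),
    Result("regions", "latest"),
]

# Same predicate as results.filter(lambda r: r.version > '2024'): string comparison
recent = [r for r in results if r.version > "2024"]

# Chaining two filters keeps only items that satisfy both predicates
recent_un = [r for r in recent if r.namespace == "un"]

print([r.version for r in recent])
print(recent_un)
```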

#### `latest(self, by: 'str | None' = None) -> 'T'`

Get the most recent result.

Returns the single item with the highest value for the sort key.

Args:
    by: Attribute name to sort by. If None (default), auto-detects:
        - ChartResult: uses last_updated (as ISO string with time)
        - TableResult/IndicatorResult: uses version

Returns:
    Single item with the highest value for the specified field.

Raises:
    ValueError: If no results are available.
    AttributeError: If the specified attribute doesn't exist on the results.

Example:
    ```py
    >>> # For TableResult/IndicatorResult - auto-detects version
    >>> latest_table = results.latest()
    >>> tb = latest_table.fetch()

    >>> # For ChartResult - auto-detects last_updated
    >>> latest_chart = chart_results.latest()
    ```
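
The behavior described for `latest()` amounts to taking the maximum over one attribute. The sketch below is an assumption, not the library code. The `TableHit` class and the `latest` function are hypothetical, and the auto-detection rule (use `version` when present, otherwise `last_updated`) is inferred from the description above.

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import Any, Optional, Sequence


@dataclass
class TableHit:
    """Hypothetical result type that carries a `version` field."""

    title: str
    version: str


def latest(items: Sequence[Any], by: Optional[str] = None) -> Any:
    """Sketch of the documented ResponseSet.latest() behavior (not the library code)."""
    if not items:
        raise ValueError("No results available")
    if by is None:
        # Assumed auto-detection: table/indicator results have `version`,
        # chart results fall back to `last_updated`
        by = "version" if hasattr(items[0], "version") else "last_updated"
    # attrgetter raises AttributeError when the attribute does not exist
    return max(items, key=attrgetter(by))


items = [TableHit("un_wpp", "2022-07-11"), TableHit("un_wpp", "2024-07-12")]
print(latest(items).version)
```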

#### `model_post_init(self, _ResponseSet__context: 'Any') -> 'None'`

Set total_count to length of results if not provided.

#### `set_ui_advanced(self) -> 'ResponseSet[T]'`

Switch to advanced display showing all fields (type, slug, popularity, etc.).

Returns:
    Self (for chaining).

Example:
    ```py
    >>> results.set_ui_advanced()
    ```

#### `set_ui_basic(self) -> 'ResponseSet[T]'`

Switch to basic display showing only key fields (title, description, url).

Returns:
    Self (for chaining).

Example:
    ```py
    >>> results.set_ui_basic()
    ```

#### `sort_by(self, key: 'str | Callable[[T], Any]', *, reverse: 'bool' = False) -> 'ResponseSet[T]'`

Sort results by attribute name or key function.

Returns a new ResponseSet with items sorted by the specified key.

Args:
    key: Either an attribute name (string) or a function that extracts a comparison key from each item.
    reverse: If True, sort in descending order (default: False).

Returns:
    New ResponseSet with sorted results.

Example:
    ```py
    >>> # Sort by version (ascending)
    >>> results.sort_by('version')

    >>> # Sort by version (descending - latest first)
    >>> results.sort_by('version', reverse=True)

    >>> # Sort by custom function (e.g., by score)
    >>> results.sort_by(lambda r: r.score, reverse=True)

    >>> # Chain sorting and filtering
    >>> results.filter(lambda r: r.version > '2024').sort_by('version', reverse=True)
    ```
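
`sort_by` accepts either an attribute name or a key function. A minimal way to support both is to turn a string into `operator.attrgetter` and pass everything else through to `sorted`. The sketch below illustrates that idea with a hypothetical `Hit` dataclass and a standalone `sort_by` function. It is not the library's implementation. Like the documented method, it returns a new sequence and leaves the input unchanged.

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import Any, Callable, List, Union


@dataclass
class Hit:
    """Hypothetical result with a version and a search score."""

    version: str
    score: float


def sort_by(items: List[Any], key: Union[str, Callable[[Any], Any]], *, reverse: bool = False) -> List[Any]:
    """Sketch: a string key is an attribute name; anything else is called on each item."""
    key_func = attrgetter(key) if isinstance(key, str) else key
    # sorted() builds a new list, so the input list is not modified
    return sorted(items, key=key_func, reverse=reverse)


hits = [Hit("2023-06-01", 0.7), Hit("2024-07-12", 0.4), Hit("2022-01-15", 0.9)]
print([h.version for h in sort_by(hits, "version", reverse=True)])
print([h.score for h in sort_by(hits, lambda h: h.score, reverse=True)])
```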

#### `to_dict(self) -> 'list[dict[str, Any]]'`

Convert results to a list of plain dictionaries.

Useful for serializing results for AI/LLM context windows
or any scenario where you need simple dict representations.

Returns:
    List of dictionaries, one per result item.

Example:
    ```py
    >>> results = client.charts.search("gdp")
    >>> results.to_dict()
    [{'slug': 'gdp-per-capita', 'title': 'GDP per capita', ...}, ...]
    ```

#### `to_frame(self, all_fields: 'bool | None' = None) -> 'pd.DataFrame'`

Convert results to a DataFrame.

Args:
    all_fields: If True, show all fields. If False, show only key fields.
        If None (default), use the instance's _ui_advanced setting.

Returns:
    DataFrame with one row per result.


### `get_thumbnail_url(url: 'str') -> 'str'`

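Convert a chart URL into the URL of its PNG thumbnail by adding `.png` to the path while keeping the query string. For example, `https://ourworldindata.org/grapher/life-expectancy?country=~CHN` becomes `https://ourworldindata.org/grapher/life-expectancy.png?country=~CHN`.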
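
The transformation in the example above can be sketched with the standard library's `urllib.parse`: split the URL, append `.png` to the path, and reassemble it with the query string untouched. `thumbnail_url` is a hypothetical name, and this is an illustration of the mapping, not the library's own code.

```python
from urllib.parse import urlsplit, urlunsplit


def thumbnail_url(url: str) -> str:
    """Sketch: append ".png" to the URL path and keep the query string unchanged."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=parts.path + ".png"))


print(thumbnail_url("https://ourworldindata.org/grapher/life-expectancy?country=~CHN"))
```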


---

# Core Reference (owid.catalog.core)

## tables

### Table

`Table` extends `pandas.DataFrame`. All standard DataFrame methods are available. Only methods that this class adds or overrides are listed below.


Enhanced pandas DataFrame with rich metadata support.

Table extends pandas DataFrame to include metadata at both the table level
and individual column level. It's the primary data structure for ETL operations.

Attributes:
    metadata: Table-level metadata (title, description, sources, etc).
    _fields: Dictionary mapping column names to their VariableMeta objects.
    DEBUG: Set to True to enable metadata validation debugging.

Example:
    Create a table from a DataFrame:
    ```python
    import pandas as pd
    from owid.catalog import Table

    df = pd.DataFrame({"country": ["USA", "UK"], "gdp": [20, 3]})
    table = Table(df, short_name="gdp")
    ```

    Create with metadata:
    ```python
    meta = TableMeta(short_name="gdp", title="GDP by country")
    table = Table(df, metadata=meta)
    ```

    Copy metadata from another table:
    ```python
    new_table = Table(df, like=old_table)
    ```

#### `Table.all_columns` (property)

Get names of all columns including index levels.

Returns both regular columns and index names in a single list,
useful for iterating over all variables in the table.

Returns:
    List of all column names and index level names.

Example:
    ```python
    table = table.set_index(["country", "year"])
    print(table.all_columns)  # ["country", "year", "gdp", "population"]
    ```
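
In plain pandas terms, `all_columns` corresponds roughly to the index level names followed by the regular columns. The sketch below assumes that ordering, which matches the example above, and assumes that unnamed index levels are skipped, as `primary_key` does. It uses a plain DataFrame with made-up values rather than a `Table`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["USA", "UK"],
        "year": [2020, 2020],
        "gdp": [20.9, 2.7],
        "population": [331, 67],
    }
).set_index(["country", "year"])

# Assumed equivalent of Table.all_columns: named index levels first, then columns
all_columns = [name for name in df.index.names if name is not None] + list(df.columns)
print(all_columns)
```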

#### `astype(self, *args: 'Any', **kwargs: 'Any') -> 'Table'`

Cast table columns to specified dtype(s).

Convert one or more columns to a specified data type. Wrapper
around pandas astype that returns a Table.

Args:
    *args: Positional arguments passed to pandas.DataFrame.astype.
    **kwargs: Keyword arguments passed to pandas.DataFrame.astype.

Returns:
    Table with columns cast to specified types.

Example:
    Cast single column:
    ```python
    table = table.astype({"population": int})
    ```

    Cast multiple columns:
    ```python
    table = table.astype({"year": int, "gdp": float})
    ```

    Cast all columns:
    ```python
    table = table.astype(str)
    ```

#### `check_metadata(self, ignore_columns: 'list[str] | None' = None) -> 'None'`

Check that all variables in the table have origins.

#### `Table.codebook` (property)

Generate a human-readable codebook for this table.

Creates a DataFrame summarizing all variables in the table with their
titles, descriptions, units, and source attributions.

Returns:
    DataFrame with columns:

        - column: Column name (including index columns)
        - title: Title from metadata (title_public > display.name > title)
        - description: Short description of the indicator
        - unit: Unit of measurement with short unit in parentheses
        - source: Formatted source attribution with URLs

Example:
    ```python
    codebook = table.codebook
    print(codebook.to_markdown())
    ```
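
The `title` column follows a fallback chain: `title_public`, then `display.name`, then `title`. The sketch below resolves that precedence over a plain dictionary. The dictionary layout is a hypothetical stand-in for `VariableMeta`. It also assumes that empty values fall through to the next candidate.

```python
from typing import Any, Dict, Optional


def codebook_title(meta: Dict[str, Any]) -> Optional[str]:
    """Sketch of the documented precedence: title_public > display.name > title."""
    display = meta.get("display") or {}
    # `or` skips missing (None) and empty values and moves to the next candidate
    return meta.get("title_public") or display.get("name") or meta.get("title")


print(codebook_title({"title": "gdp", "display": {"name": "GDP"}}))
```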

#### `copy(self, deep: 'bool' = True) -> 'Table'`

Create a copy of the table with all metadata.

Args:
    deep: If True (default), make a deep copy of the data and metadata.
        If False, creates a shallow copy.

Returns:
    A new Table with copied data and metadata.

Example:
    ```python
    table_copy = table.copy()  # Deep copy
    table_copy = table.copy(deep=False)  # Shallow copy
    ```

#### `copy_metadata(self, from_table: 'Table', deep: 'bool' = False) -> 'Table'`

Copy metadata from another table to this table.

Copies both table-level metadata and variable-level metadata for all
matching columns. Useful for preserving metadata after transformations.

Args:
    from_table: Source table to copy metadata from.
    deep: If True, make a deep copy of the metadata. Default is False.

Returns:
    Self, for method chaining.

Example:
    ```python
    new_table = Table(transformed_df)
    new_table.copy_metadata(original_table)
    ```

#### `drop(self, *args: 'Any', **kwargs: 'Any') -> 'Table'`

Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and axis.
Wrapper around pandas drop that returns a Table.

Args:
    *args: Positional arguments passed to pandas.DataFrame.drop.
    **kwargs: Keyword arguments passed to pandas.DataFrame.drop.

Returns:
    Table with specified labels dropped.

Example:
    Drop columns:
    ```python
    table = table.drop(columns=["column1", "column2"])
    ```

    Drop rows by index:
    ```python
    table = table.drop(index=["row1", "row2"])
    ```

    Drop columns with axis parameter:
    ```python
    table = table.drop(["column1"], axis=1)
    ```

#### `equals_table(self, table: 'Table') -> 'bool'`

Check if two tables are equal including metadata.

Compares both data and metadata for equality. This is more
comprehensive than pandas equals() which only checks data.

Args:
    table: Table to compare with.

Returns:
    True if tables have identical data, metadata, and variable
    metadata. False otherwise.

Note:
    NaN values are handled specially so that comparisons stay consistent
    when the data contains NaN.

Example:
    ```python
    if table1.equals_table(table2):
        print("Tables are identical")
    ```

#### `fillna(self, value: 'Any' = None, **kwargs: 'Any') -> 'Table'`

Fill missing values, like `pandas.DataFrame.fillna`. If `value` is a Table, its metadata is also
transferred to the filled table.

#### `filter(self, *args: 'Any', **kwargs: 'Any') -> 'Table'`

Subset rows or columns based on their labels.

Filter the table to include only specified rows or columns by name.
Wrapper around pandas filter that returns a Table.

Args:
    *args: Positional arguments passed to pandas.DataFrame.filter.
    **kwargs: Keyword arguments passed to pandas.DataFrame.filter.
        Common kwargs include:
        - items: List of axis labels to select
        - like: Keep labels matching this string pattern
        - regex: Keep labels matching this regex pattern
        - axis: Axis to filter on (0 for rows, 1 for columns)

Returns:
    Filtered Table with only selected labels.

Example:
    Filter columns by exact names:
    ```python
    table = table.filter(items=["country", "year", "gdp"])
    ```

    Filter columns containing pattern:
    ```python
    table = table.filter(like="population")
    ```

    Filter columns with regex:
    ```python
    table = table.filter(regex="^gdp_.*")
    ```

#### `format(self, keys: 'str | list[str] | None' = None, verify_integrity: 'bool' = True, underscore: 'bool' = True, sort_rows: 'bool' = True, sort_columns: 'bool' = False, short_name: 'str | None' = None, **kwargs: 'Any') -> 'Table'`

Format the table according to OWID standards.

Applies standard OWID formatting: underscores column names, sets index,
verifies uniqueness, and sorts data. This is a convenience method that
chains multiple operations commonly used in ETL workflows.

Note:
    Underscoring happens first, so use underscored key names in the
    keys parameter (e.g., use 'country' if original had 'Country').

Args:
    keys: Index column name(s). If None, uses ["country", "year"].
    verify_integrity: If True (default), raise error if index has
        duplicate entries.
    underscore: If True (default), convert column names to snake_case
        format. Disable if names are already properly formatted.
    sort_rows: If True (default), sort rows by index in ascending order.
    sort_columns: If True, sort columns alphabetically. Default is False.
    short_name: Optional short name to assign to table metadata.
    **kwargs: Additional arguments passed to the underscore() method.

Returns:
    Formatted Table with standardized structure and metadata.

Raises:
    KeyError: If specified keys are not found in table columns.
    ValueError: If verify_integrity=True and index has duplicates.

Example:
    Basic formatting with default country/year index:
    ```python
    table = table.format()
    ```

    Equivalent to:
    ```python
    table = table.underscore().set_index(
        ["country", "year"], verify_integrity=True
    ).sort_index()
    ```

    Custom index columns:
    ```python
    table = table.format(["country", "year", "sex"])
    ```

    Skip underscoring if already formatted:
    ```python
    table = table.format(underscore=False, keys=["country", "year"])
    ```

    Format with custom table name:
    ```python
    table = table.format(short_name="population_density")
    ```
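
The chain that `format()` wraps can be reproduced with plain pandas, following the "Equivalent to" snippet above: underscore the column names, set the index with `verify_integrity=True`, then sort. In the sketch below, `to_snake` is a deliberately simplified stand-in for `underscore()`. The library's real naming rules are more thorough. `format_like_owid` and the sample data are hypothetical.

```python
import re

import pandas as pd


def to_snake(name: str) -> str:
    """Simplified snake_case (assumed rules): lowercase, non-alphanumerics become "_"."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def format_like_owid(df: pd.DataFrame, keys=("country", "year")) -> pd.DataFrame:
    """Sketch of Table.format() with plain pandas: underscore, set index, sort rows."""
    out = df.rename(columns=to_snake)
    # verify_integrity=True raises ValueError if the index has duplicate entries
    out = out.set_index(list(keys), verify_integrity=True)
    return out.sort_index()


raw = pd.DataFrame({"Country": ["USA", "UK"], "Year": [2020, 2020], "GDP per capita": [63.0, 40.0]})
formatted = format_like_owid(raw)
print(formatted)
```

Because underscoring runs first, the keys are given as `country` and `year` even though the raw columns were `Country` and `Year`.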

#### `from_records(*args: 'Any', **kwargs: 'Any') -> 'Table'`

Calling `Table.from_records` returns a Table, but it does not call `__init__` and therefore misses metadata.

#### `get_column_or_index(self, name: 'str') -> 'indicators.Indicator'`

Get a variable by name from either columns or index.

Retrieves an Indicator from the table, checking both regular columns
and index levels. This is useful when you don't know whether a
variable is stored as a column or index.

Args:
    name: Name of the variable to retrieve.

Returns:
    Indicator object with data and metadata.

Raises:
    ValueError: If name is not found in either columns or index.

Example:
    ```python
    var = table.get_column_or_index("country")  # Works for column or index
    print(var.metadata.title)
    ```

#### `groupby(self, *args: 'Any', observed: 'bool' = True, **kwargs: 'Any') -> 'TableGroupBy'`

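Group by one or more keys while preserving metadata. Defaults to `observed=True`, so for categorical group keys only the categories that actually occur in the data appear in the result.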
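
The effect of the `observed` flag can be shown with plain pandas. With a categorical group key, `observed=True` (the default for `Table.groupby`) returns only the categories present in the data, whereas `observed=False` also emits declared but unused categories. The region names and values below are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {
        # "Oceania" is a declared category that never appears in the data
        "region": pd.Categorical(["Africa", "Asia", "Asia"], categories=["Africa", "Asia", "Oceania"]),
        "population": [10, 20, 30],
    }
)

observed = df.groupby("region", observed=True)["population"].sum()
unobserved = df.groupby("region", observed=False)["population"].sum()
print(observed)
print(unobserved)
```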

#### `join(self, other: 'pd.DataFrame | Table', *args: 'Any', **kwargs: 'Any') -> 'Table'`

Join tables while preserving metadata.

Extends pandas join with proper type signature for Table.
Metadata from both tables is preserved in the result.

Args:
    other: Table or DataFrame to join with.
    *args: Positional arguments passed to pandas.DataFrame.join.
    **kwargs: Keyword arguments passed to pandas.DataFrame.join.
        Supports all pandas join parameters.

Returns:
    Joined table with combined metadata.

Example:
    ```python
    joined = table1.join(table2, on="country")
    joined = table1.join(table2, how="outer")
    ```

#### `Table.m` (property)

Metadata alias for shorter access (table.m instead of table.metadata).

#### `melt(self, id_vars: 'tuple[str] | list[str] | str | None' = None, value_vars: 'tuple[str] | list[str] | str | None' = None, var_name: 'str' = 'variable', value_name: 'str' = 'value', short_name: 'str | None' = None, *args: 'Any', **kwargs: 'Any') -> 'Table'`

Unpivot table from wide to long format.

Converts columns into rows, transforming wide-format data into
long-format. Wrapper around pandas melt that preserves metadata.
See owid.catalog.tables.melt() for full documentation.

Args:
    id_vars: Column(s) to use as identifier variables (not melted).
    value_vars: Column(s) to unpivot. If None, uses all columns
        except id_vars.
    var_name: Name for the variable column. Default is "variable".
    value_name: Name for the value column. Default is "value".
    short_name: Optional short name for resulting table metadata.
    *args: Additional positional arguments passed to melt().
    **kwargs: Additional keyword arguments passed to melt().

Returns:
    Melted Table in long format with preserved metadata.

Example:
    Melt all columns except country and year:
    ```python
    >>> long_table = table.melt(id_vars=["country", "year"])

    >>> # Melt specific columns:
    >>> long_table = table.melt(
    ...     id_vars=["country", "year"],
    ...     value_vars=["gdp", "population"]
    ... )

    >>> # Custom column names:
    >>> long_table = table.melt(
    ...     id_vars="country",
    ...     var_name="indicator",
    ...     value_name="measurement"
    ... )
    ```

#### `merge(self, right: 'Any', *args: 'Any', **kwargs: 'Any') -> 'Table'`

Merge with another DataFrame or Table.

Wrapper around pandas merge that preserves Table metadata.
See owid.catalog.tables.merge() for full documentation.

Args:
    right: DataFrame or Table to merge with.
    *args: Positional arguments passed to merge().
    **kwargs: Keyword arguments passed to merge().

Returns:
    Merged Table with combined metadata.

Example:
    ```python
    result = table1.merge(table2, on="country")
    result = table1.merge(table2, left_on="code", right_on="country_code")
    ```

#### `metadata_filename(self, path: 'str')`

#### `pivot(self, *, index: 'str | list[str] | None' = None, columns: 'str | list[str] | None' = None, values: 'str | list[str] | None' = None, join_column_levels_with: 'str | None' = None, short_name: 'str | None' = None, fill_dimensions: 'bool' = True, **kwargs: 'Any') -> 'Table'`

Reshape table from long to wide format.

Converts rows into columns, transforming long-format data into
wide-format. Wrapper around pandas pivot that preserves metadata.
See owid.catalog.tables.pivot() for full documentation.

Args:
    index: Column(s) to use for the new index. If None, uses
        existing index.
    columns: Column(s) whose unique values become new columns.
    values: Column(s) whose values fill the new columns. If None, uses
        all remaining columns.
    join_column_levels_with: If pivoting creates multi-level columns,
        join them with this separator (e.g., "_").
    short_name: Optional short name for resulting table metadata.
    fill_dimensions: If True, fill missing dimension values.
        Default is True.
    **kwargs: Additional arguments passed to pivot().

Returns:
    Pivoted Table in wide format with preserved metadata.

Example:
    ```python
    >>> # Basic pivot:
    >>> wide = table.pivot(
    ...     index="country",
    ...     columns="year",
    ...     values="gdp"
    ... )

    >>> # Flatten multi-level columns:
    >>> wide = table.pivot(
    ...     index="country",
    ...     columns=["year", "sex"],
    ...     values="population",
    ...     join_column_levels_with="_"
    ... )
    ```
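
Pivoting on two columns produces multi-level column labels. `join_column_levels_with="_"` is described as joining those levels into flat names. The sketch below performs a similar flattening by hand in plain pandas. The `year_sex` naming order is an assumption about what the library produces, and the data is made up.

```python
import pandas as pd

long = pd.DataFrame(
    {
        "country": ["UK", "UK", "UK", "UK"],
        "year": [2000, 2000, 2010, 2010],
        "sex": ["female", "male", "female", "male"],
        "population": [30, 29, 32, 31],
    }
)

wide = long.pivot(index="country", columns=["year", "sex"], values="population")
# Flatten the (year, sex) MultiIndex columns by joining the levels with "_"
wide.columns = ["_".join(str(level) for level in col) for col in wide.columns]
print(wide)
```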

#### `Table.primary_key` (property)

Get the table's primary key column names.

Returns the names of index levels, which serve as the table's
primary key for identifying unique rows.

Returns:
    List of index level names (excluding None values).

Example:
    ```python
    table = table.set_index(["country", "year"])
    print(table.primary_key)  # ["country", "year"]
    ```

#### `prune_metadata(self) -> 'Table'`

Remove metadata for columns no longer in the table.

Cleans up the internal metadata dictionary to remove entries for columns
that have been dropped. Useful after column filtering or selection operations.

Returns:
    Self, for method chaining.

Example:
    ```python
    subset = table[["country", "gdp"]]  # Only 2 columns
    subset.prune_metadata()  # Remove metadata for dropped columns
    ```

#### `read(path: 'str | Path', **kwargs: 'Any') -> 'Table'`

Read a table from disk in any supported format.

Automatically detects the format from file extension and loads
the table with its metadata. Supports .csv, .feather, and .parquet.

Args:
    path: Path to the file to read. Extension determines format.
    **kwargs: Additional arguments passed to format-specific reader.

Returns:
    Loaded Table with data and metadata.

Raises:
    ValueError: If file extension is not recognized.

Example:
    ```python
    table = Table.read("data.feather")
    table = Table.read("data.csv")
    table = Table.read("data.parquet")
    ```
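
Format detection by extension can be pictured as a lookup from file suffix to reader, with an error for anything unrecognized. The sketch below returns reader names instead of calling them so that it stays self-contained. `READERS` and `pick_reader` are hypothetical, and the real dispatch inside `Table.read` may differ.

```python
from pathlib import Path
from typing import Union

# Hypothetical mapping from file extension to the reader that handles it
READERS = {".csv": "read_csv", ".feather": "read_feather", ".parquet": "read_parquet"}


def pick_reader(path: Union[str, Path]) -> str:
    """Sketch: choose a reader from the file extension, raising on unknown formats."""
    suffix = Path(path).suffix
    if suffix not in READERS:
        raise ValueError(f"Unrecognized file extension: {suffix!r}")
    return READERS[suffix]


print(pick_reader("data.feather"))
```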

#### `read_csv(path: 'str | Path', **kwargs: 'Any') -> 'Table'`

Read table from CSV file with accompanying metadata.

Loads a table from a CSV file and its associated .meta.json metadata file.
For example, reads both "data.csv" and "data.meta.json".

Args:
    path: Path to the CSV file (must end with .csv).
    **kwargs: Additional arguments passed to the internal metadata loader.

Returns:
    Table with data and metadata loaded.

Raises:
    ValueError: If path doesn't end with .csv.

Example:
    ```python
    table = Table.read_csv("data.csv")
    table = Table.read_csv(Path("data.csv"))
    ```
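
The readers and writers pair each data file with a `.meta.json` sidecar that shares its stem, for example `data.csv` and `data.meta.json`. The sketch below derives that sidecar path with `pathlib`. `metadata_sidecar` is a hypothetical helper that follows the naming rule stated in these docs, not a library function.

```python
from pathlib import Path
from typing import Union


def metadata_sidecar(data_path: Union[str, Path]) -> Path:
    """Sketch: "data.csv" -> "data.meta.json" in the same directory."""
    p = Path(data_path)
    # Replace only the final extension with ".meta.json"
    return p.parent / (p.stem + ".meta.json")


print(metadata_sidecar("data.csv"))
```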

#### `read_feather(path: 'str | Path', load_data: 'bool' = True, **kwargs: 'Any') -> 'Table'`

Read table from Feather file with accompanying metadata.

Loads a table from a Feather file and its associated .meta.json metadata file.
Supports both local file paths and URLs.

Args:
    path: Path or URL to the Feather file (must end with .feather).
    load_data: If True, load the actual data. If False, only load metadata
        and column structure (useful for inspecting large files).
    **kwargs: Additional arguments passed to the internal metadata loader.

Returns:
    Table with data and metadata loaded.

Raises:
    ValueError: If path doesn't end with .feather.

Example:
    ```python
    table = Table.read_feather("data.feather")
    table = Table.read_feather("https://example.com/data.feather")
    metadata_only = Table.read_feather("data.feather", load_data=False)
    ```

#### `read_json(path: 'str | Path', **kwargs: 'Any') -> 'Table'`

Read a table from a JSON file and its accompanying .meta.json metadata sidecar.

The path may be a local file path or a URL.

#### `read_parquet(path: 'str | Path', **kwargs: 'Any') -> 'Table'`

Read table from Parquet file with accompanying metadata.

Loads a table from a Parquet file and its associated .meta.json metadata file.
Supports both local file paths and URLs.

Args:
    path: Path or URL to the Parquet file (must end with .parquet).
    **kwargs: Additional arguments passed to the internal metadata loader.

Returns:
    Table with data and metadata loaded.

Raises:
    ValueError: If path doesn't end with .parquet.

Example:
    ```python
    table = Table.read_parquet("data.parquet")
    table = Table.read_parquet("https://example.com/data.parquet")
    ```

#### `reindex(self, *args: 'Any', **kwargs: 'Any') -> 'Table'`

Conform table to new index with optional filling logic.

Create a new Table that conforms to the given index. Labels that were not in the
original index get missing values unless `fill_value` or `method` is given.
Wrapper around pandas reindex.

Args:
    *args: Positional arguments passed to pandas.DataFrame.reindex.
    **kwargs: Keyword arguments passed to pandas.DataFrame.reindex.

Returns:
    Table conformed to new index.

Example:
    Reindex with new labels:
    ```python
    table = table.reindex(["A", "B", "C", "D"])
    ```

    Fill missing values:
    ```python
    table = table.reindex(new_index, fill_value=0)
    ```

    Forward fill:
    ```python
    table = table.reindex(new_index, method="ffill")
    ```

#### `rename(self, *args: 'Any', **kwargs: 'Any') -> 'Table | None'`

Rename columns while preserving their metadata.

Extends pandas rename to maintain variable metadata when renaming columns
or index levels. Metadata follows the renamed columns automatically.

Args:
    *args: Positional arguments passed to pandas.DataFrame.rename.
    **kwargs: Keyword arguments passed to pandas.DataFrame.rename.
        Supports all pandas rename parameters including mapper, index,
        columns, and inplace.

Returns:
    Renamed table if inplace=False (default), None if inplace=True.

Example:
    ```python
    new_table = table.rename(columns={"old_name": "new_name"})
    table.rename(columns={"gdp": "gdp_usd"}, inplace=True)
    ```

#### `rename_index_names(self, renames: 'dict[str, str]') -> 'Table'`

Rename index level names using a mapping from old names to new names.

#### `reset_index(self, level: 'Any' = None, *, inplace: 'bool' = False, **kwargs: 'Any') -> 'Table | None'`

Reset the index to the default integer index.

Extends `pandas.DataFrame.reset_index` with proper type signature for Table.
Converts index levels to regular columns.

Args:
    level: Index level(s) to reset. If None, resets all levels.
    inplace: If True, modify the table in place. Default is False.
    **kwargs: Additional arguments passed to pandas.DataFrame.reset_index.

Returns:
    Table with reset index if inplace=False, None if inplace=True.

Example:
    ```python
    new_table = table.reset_index()  # Reset all index levels
    new_table = table.reset_index(level="country")  # Reset one level
    table.reset_index(inplace=True)  # Modify in place
    ```

#### `rolling(self, *args: 'Any', **kwargs: 'Any') -> 'TableRolling'`

Rolling operation that preserves metadata.

#### `set_index(self, keys: 'str | list[str]', **kwargs: 'Any') -> 'Table | None'`

Set the DataFrame index using specified columns.

Extends pandas set_index to update table metadata with primary key
and dimension information. The index columns become the table's
identifying dimensions.

Args:
    keys: Column name or list of column names to set as index.
    **kwargs: Additional arguments passed to pandas.DataFrame.set_index.

Returns:
    Table with new index if inplace=False, None if inplace=True.

Example:
    ```python
    table = table.set_index("country")
    table = table.set_index(["country", "year"])
    table.set_index("country", inplace=True)
    ```

#### `to(self, path: 'str | Path', repack: 'bool' = True) -> 'None'`

Save this table to disk in a supported format.

The format is automatically detected from the file extension
(.csv, .feather, or .parquet).

Args:
    path: Output file path. Extension determines format.
    repack: If True, optimize column dtypes to reduce file size.
        Set to False for very large tables if optimization fails.

Example:
    ```python
    table.to("data.feather")  # Save as Feather with optimization
    table.to("data.csv")  # Save as CSV
    table.to("data.parquet", repack=False)  # Skip optimization
    ```

#### `to_csv(self, path: 'Any | None' = None, **kwargs: 'Any') -> 'None | str'`

Save table as CSV with accompanying metadata file.

Saves both the data as CSV and metadata as a separate JSON file.
For example, "mytable.csv" will have metadata at "mytable.meta.json".

Args:
    path: Output CSV path. If None, returns CSV as string.
    **kwargs: Additional arguments passed to pandas.DataFrame.to_csv.
        By default, includes index only if table has a primary key.

Returns:
    CSV string if path is None, otherwise None.

Example:
    ```python
    table.to_csv("data.csv")  # Saves data.csv and data.meta.json
    csv_str = table.to_csv()  # Returns CSV as string
    ```

#### `to_excel(self, excel_writer: 'Any', with_metadata: 'bool' = True, sheet_name: 'str' = 'data', metadata_sheet_name: 'str' = 'metadata', **kwargs: 'Any') -> 'None'`

Save table to Excel file with optional metadata codebook.

Exports the table data to an Excel file, optionally including a separate
sheet with the codebook metadata.

Args:
    excel_writer: File path or ExcelWriter object to save to.
    with_metadata: If True, include a metadata codebook sheet. Default is True.
    sheet_name: Name for the data sheet. Default is "data".
    metadata_sheet_name: Name for the metadata sheet. Default is "metadata".
    **kwargs: Additional arguments passed to pandas.DataFrame.to_excel.

Example:
    ```python
    table.to_excel("output.xlsx")  # With metadata
    table.to_excel("output.xlsx", with_metadata=False)  # Data only
    ```

#### `to_feather(self, path: 'Any', repack: 'bool' = True, compression: "Literal['zstd', 'lz4', 'uncompressed']" = 'zstd', **kwargs: 'Any') -> 'None'`

Save table as Feather file with accompanying metadata.

Saves the table in Apache Arrow Feather format with a separate JSON
metadata file. For example, "mytable.feather" will have metadata at
"mytable.meta.json".

Note:
    Feather format cannot store indexes, so the index is reset before
    saving and restored when reading.

Args:
    path: Output file path (must end with .feather).
    repack: If True, optimize column dtypes to reduce file size.
        Set to False for very large tables if repacking is slow.
    compression: Compression algorithm to use. Options are:
        - "zstd" (default): High compression ratio
        - "lz4": Faster compression
        - "uncompressed": No compression
    **kwargs: Additional arguments passed to pandas.DataFrame.to_feather.

Raises:
    ValueError: If path doesn't end with .feather or if index names
        overlap with column names.

Example:
    ```python
    table.to_feather("data.feather")  # With compression
    table.to_feather("data.feather", repack=False)  # Skip optimization
    table.to_feather("data.feather", compression="lz4")  # Fast compression
    ```

#### `to_json(self, path: 'Any | None' = None, **kwargs: 'Any') -> 'None | str'`

Save this table as a JSON file plus accompanying JSON metadata file.
If the table is stored at "mytable.json", the metadata will be at
"mytable.meta.json".

By default, uses orient="records" which outputs a simple array of objects
without schema information. The index is reset and included as regular columns.
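
The default output shape described above matches what plain pandas produces when the index is reset and the frame is written with `orient="records"`. The sketch below shows that shape on a small made-up frame. It does not write the metadata sidecar.

```python
import json

import pandas as pd

df = pd.DataFrame({"country": ["USA", "UK"], "gdp": [20, 3]}).set_index("country")

# Reset the index so it becomes a regular column, then write one object per row
payload = df.reset_index().to_json(orient="records")
records = json.loads(payload)
print(payload)
```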

#### `to_parquet(self, path: 'Any', repack: 'bool' = True) -> 'None'`

Save table as Parquet file with metadata sidecar.

Saves the table in Apache Parquet format with a separate JSON metadata file.
Parquet provides efficient columnar storage and compression.

Note:
    Metadata is stored in a separate .meta.json file rather than embedded
    in the Parquet schema to enable efficient partial reading of large files.

Args:
    path: Output file path (must end with .parquet).
    repack: If True, optimize column dtypes to reduce file size.
        Set to False for very large tables if repacking is slow.

Raises:
    ValueError: If path doesn't end with .parquet.

Example:
    ```python
    table.to_parquet("data.parquet")  # With optimization
    table.to_parquet("data.parquet", repack=False)  # Skip optimization
    ```

#### `underscore(self, collision: "Literal['raise', 'rename', 'ignore']" = 'raise', inplace: 'bool' = False, camel_to_snake: 'bool' = False) -> 'Table'`

Convert column and index names to underscore format.

Converts all column names and index names to snake_case format.
In rare cases where two columns map to the same underscored name,
the collision parameter controls the behavior.

Args:
    collision: How to handle naming collisions:
        - "raise" (default): Raise ValueError if collision occurs
        - "rename": Append numbered suffix to duplicates
        - "ignore": Keep first occurrence
    inplace: If True, modify the table in place. Default is False.
    camel_to_snake: If True, convert camelCase to snake_case.
        Default is False (only converts spaces and special chars).

Returns:
    Table with underscored names (or None if inplace=True).

Example:
    Basic underscoring:
    ```python
    table = table.underscore()
    ```

    Convert camelCase:
    ```python
    table = table.underscore(camel_to_snake=True)
    ```

    Handle collisions:
    ```python
    table = table.underscore(collision="rename")
    ```

    Modify in place:
    ```python
    table.underscore(inplace=True)
    ```
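
The `collision` modes matter when two distinct names map to the same underscored name, for example `GDP` and `gdp`. The sketch below models only `"raise"` and `"rename"`. Its naming rule is a simplified assumption, and its `_1`, `_2` suffix format is also an assumption. `underscore_name` and `underscore_all` are hypothetical helpers, not the library's implementation.

```python
import re
from typing import List, Set


def underscore_name(name: str) -> str:
    """Simplified rule (assumption): lowercase, collapse non-alphanumerics to "_"."""
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


def underscore_all(names: List[str], collision: str = "raise") -> List[str]:
    """Sketch of the "raise" and "rename" collision modes."""
    if collision not in ("raise", "rename"):
        raise ValueError("This sketch only models 'raise' and 'rename'")
    used: Set[str] = set()
    result: List[str] = []
    for name in names:
        new = underscore_name(name)
        if new in used:
            if collision == "raise":
                raise ValueError(f"Names collide after underscoring: {new!r}")
            # Assumed "rename" format: first free name among new_1, new_2, ...
            suffix = 1
            while f"{new}_{suffix}" in used:
                suffix += 1
            new = f"{new}_{suffix}"
        used.add(new)
        result.append(new)
    return result


print(underscore_all(["GDP", "gdp", "Population"], collision="rename"))
```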

#### `update_metadata(self, **kwargs: 'Any') -> 'Table'`

Update table-level metadata fields.

Convenience method to update multiple metadata fields at once.

Args:
    **kwargs: Metadata field names and values to update.
        Must be valid TableMeta attributes.

Returns:
    Self, for method chaining.

Raises:
    AssertionError: If any field name is not a valid TableMeta attribute.

Example:
    ```python
    table.update_metadata(title="GDP Data", description="GDP by country")
    table.update_metadata(short_name="gdp_data")
    ```
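
The documented validation (an AssertionError for field names that are not TableMeta attributes) can be sketched with a dataclass and `dataclasses.fields`. `MiniTableMeta` is a hypothetical, trimmed-down stand-in for `TableMeta`, and `update_metadata` here is an illustration only. Note that `assert` statements are skipped when Python runs with `-O`.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class MiniTableMeta:
    """Hypothetical, trimmed-down stand-in for TableMeta."""

    short_name: Optional[str] = None
    title: Optional[str] = None
    description: Optional[str] = None


def update_metadata(meta: MiniTableMeta, **kwargs: Any) -> MiniTableMeta:
    """Sketch: set several fields at once, rejecting unknown field names."""
    valid = {f.name for f in fields(meta)}
    for key, value in kwargs.items():
        # Mirrors the documented AssertionError for invalid field names
        assert key in valid, f"{key!r} is not a valid metadata field"
        setattr(meta, key, value)
    # Returning the same object allows method-style chaining
    return meta


meta = update_metadata(MiniTableMeta(), title="GDP Data", short_name="gdp_data")
print(meta)
```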

#### `update_metadata_from_yaml(self, path: 'Path | str', table_name: 'str', yaml_params: 'dict[str, Any] | None' = None, extra_variables: "Literal['raise', 'ignore']" = 'raise', if_origins_exist: 'SOURCE_EXISTS_OPTIONS' = 'replace') -> 'None'`

Update table and variable metadata from a YAML file.

Loads metadata definitions from a .meta.yml file and updates both
table-level and variable-level metadata. This is the primary way
to add rich metadata in the ETL workflow.

Args:
    path: Path to the .meta.yml file with metadata definitions.
    table_name: Name of the table in the YAML file to load metadata from.
        Also updates the table's short_name to this value.
    yaml_params: Additional parameters to pass to the YAML loader.
    extra_variables: How to handle variables in YAML not in table:
        - "raise" (default): Raise exception
        - "ignore": Skip extra variables
    if_origins_exist: How to handle existing origins:
        - "replace" (default): Replace existing origin with new one
        - "append": Append new origin to existing origins
        - "fail": Raise exception if origin already exists

Example:
    ```python
    >>> table.update_metadata_from_yaml("dataset.meta.yml", "population")
    >>> table.update_metadata_from_yaml(
    ...     Path("dataset.meta.yml"),
    ...     "gdp_data",
    ...     extra_variables="ignore"
    ... )
    ```


### `merge(left: 'Table | pd.DataFrame', right: 'Table | pd.DataFrame', how: 'str' = 'inner', on: 'str | list[str] | None' = None, left_on: 'str | list[str] | None' = None, right_on: 'str | list[str] | None' = None, suffixes: 'tuple[str, str]' = ('_x', '_y'), short_name: 'str | None' = None, **kwargs: 'Any') -> 'Table'`


### `concat(objs: 'list[Table]', *, axis: 'int | str' = 0, join: 'str' = 'outer', ignore_index: 'bool' = False, short_name: 'str | None' = None, **kwargs: 'Any') -> 'Table'`


### `melt(frame: 'Table', id_vars: 'tuple[str] | list[str] | str | None' = None, value_vars: 'tuple[str] | list[str] | str | None' = None, var_name: 'str' = 'variable', value_name: 'str' = 'value', short_name: 'str | None' = None, *args: 'Any', **kwargs: 'Any') -> 'Table'`


### `pivot(data: 'Table', *, index: 'str | list[str] | None' = None, columns: 'str | list[str] | None' = None, values: 'str | list[str] | None' = None, join_column_levels_with: 'str | None' = None, short_name: 'str | None' = None, fill_dimensions: 'bool' = True, **kwargs: 'Any') -> 'Table'`


### `read_csv(filepath_or_buffer: 'str | Path | IO[AnyStr]', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, *args: 'Any', **kwargs: 'Any') -> 'Table'`


### `read_feather(filepath: 'str | Path | IO[AnyStr]', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, *args: 'Any', **kwargs: 'Any') -> 'Table'`


### `read_excel(io: 'str | Path | IO[AnyStr]', *args: 'Any', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, **kwargs: 'Any') -> 'Table'`


### `read_parquet(filepath_or_buffer: 'str | Path | IO[AnyStr]', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, *args: 'Any', **kwargs: 'Any') -> 'Table'`


### `read_from_df(data: 'pd.DataFrame', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False) -> 'Table'`


### `read_from_dict(data: 'dict[Any, Any]', *args: 'Any', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, **kwargs: 'Any') -> 'Table'`


### `multi_merge(tables: 'list[Table]', *args: 'Any', **kwargs: 'Any') -> 'Table'`

Merge multiple tables.

Helper function for merging more than two tables on common columns.

Args:
    tables: Tables to merge.
    *args, **kwargs: Additional arguments passed on to the underlying merge (for example `on` or `how`).

Returns:
    combined: Merged table.


### `keep_metadata(func: 'Callable[..., pd.DataFrame | pd.Series]') -> 'Callable[..., Table | indicators.Indicator]'`

Decorator that turns a function working on a DataFrame or Series into one that works
on a Table or Indicator (formerly `Variable`) and preserves metadata. If the decorated
function renames columns, their metadata is not copied.

Example:
    ```python
    import pandas as pd
    import owid.catalog.processing as pr

    @pr.keep_metadata
    def my_df_func(df: pd.DataFrame) -> pd.DataFrame:
        return df + 1

    tb = my_df_func(tb)


    @pr.keep_metadata
    def my_series_func(s: pd.Series) -> pd.Series:
        return s + 1

    tb.a = my_series_func(tb.a)
    ```


### `copy_metadata(from_table: 'Table', to_table: 'Table', deep: 'bool' = False) -> 'Table'`

Copy metadata from `from_table` to `to_table` and return the resulting Table.


### ExcelFile

Class for parsing tabular Excel sheets into DataFrame objects.

See read_excel for more documentation.

Parameters
----------
path_or_buffer : str, bytes, path object (pathlib.Path or py._path.local.LocalPath),
    A file-like object, xlrd workbook or openpyxl workbook.
    If a string or path object, expected to be a path to a
    .xls, .xlsx, .xlsb, .xlsm, .odf, .ods, or .odt file.
engine : str, default None
    If io is not a buffer or path, this must be set to identify io.
    Supported engines: ``xlrd``, ``openpyxl``, ``odf``, ``pyxlsb``, ``calamine``
    Engine compatibility :

    - ``xlrd`` supports old-style Excel files (.xls).
    - ``openpyxl`` supports newer Excel file formats.
    - ``odf`` supports OpenDocument file formats (.odf, .ods, .odt).
    - ``pyxlsb`` supports Binary Excel files.
    - ``calamine`` supports Excel (.xls, .xlsx, .xlsm, .xlsb)
      and OpenDocument (.ods) file formats.

    .. versionchanged:: 1.2.0

       The engine `xlrd <https://xlrd.readthedocs.io/en/latest/>`_
       now only supports old-style ``.xls`` files.
       When ``engine=None``, the following logic will be
       used to determine the engine:

       - If ``path_or_buffer`` is an OpenDocument format (.odf, .ods, .odt),
         then `odf <https://pypi.org/project/odfpy/>`_ will be used.
       - Otherwise if ``path_or_buffer`` is an xls format,
         ``xlrd`` will be used.
       - Otherwise if ``path_or_buffer`` is in xlsb format,
         `pyxlsb <https://pypi.org/project/pyxlsb/>`_ will be used.

       .. versionadded:: 1.3.0

       - Otherwise if `openpyxl <https://pypi.org/project/openpyxl/>`_ is installed,
         then ``openpyxl`` will be used.
       - Otherwise if ``xlrd >= 2.0`` is installed, a ``ValueError`` will be raised.

       .. warning::

        Please do not report issues when using ``xlrd`` to read ``.xlsx`` files.
        This is not supported, switch to using ``openpyxl`` instead.
engine_kwargs : dict, optional
    Arbitrary keyword arguments passed to excel engine.

Examples
--------
>>> file = pd.ExcelFile('myfile.xlsx')  # doctest: +SKIP
>>> with pd.ExcelFile("myfile.xls") as xls:  # doctest: +SKIP
...     df1 = pd.read_excel(xls, "Sheet1")  # doctest: +SKIP


## indicators

### Indicator

`Indicator` extends `pandas.Series`. All standard Series methods are available. Only methods unique to this class are listed below.


Enhanced pandas Series with indicator-level metadata support.

Indicator is a pandas Series subclass that stores rich metadata about individual
indicators. It serves as the column type in Table objects and automatically
propagates metadata through operations.

Note:
    This class was formerly called `Variable`. The old name is still available
    as an alias for backwards compatibility.

Key features:

- Automatic metadata propagation through arithmetic operations
- Processing log tracking for data provenance
- Integration with OWID catalog metadata system
- Support for rich metadata including sources, origins, licenses

Attributes:
    _name: Internal name storage for metadata mapping.
    _fields: Dictionary mapping indicator names to their VariableMeta objects.
    metadata: Indicator-level metadata accessible via `.metadata` or `.m` property.

Example:
    Create an indicator with metadata:

    ```python
    from owid.catalog import Indicator, VariableMeta

    ind = Indicator(
        [1, 2, 3],
        name="gdp",
        metadata=VariableMeta(
            title="GDP",
            unit="trillion USD",
            description="Gross Domestic Product"
        )
    )
    ```

    Access metadata using shortcuts:

    ```python
    print(ind.metadata.title)  # Full property access
    print(ind.m.title)         # Shorthand alias
    print(ind.title)           # Direct property access
    ```

    Metadata propagates through operations:

    ```python
    population = Indicator([10, 20, 30], name="population")
    gdp_per_capita = ind / population
    # Result combines metadata from both indicators
    ```

#### `Indicator.additional_info` (property)

#### `Indicator.checked_name` (property)

#### `copy_metadata(self, from_variable: 'Indicator', inplace: 'bool' = False) -> 'Indicator | None'`

Copy metadata from another indicator.

Args:
    from_variable: Source indicator to copy metadata from.
    inplace: If True, modifies the current indicator. If False, returns a new indicator.

Returns:
    New indicator with copied metadata if `inplace=False`, otherwise None.

Example:
    Create new indicator with copied metadata
    ```python
    new_ind = ind1.copy_metadata(from_variable=ind2)
    ```

    Copy metadata in-place
    ```python
    ind1.copy_metadata(from_variable=ind2, inplace=True)
    ```

#### `Indicator.description` (property)

#### `Indicator.description_from_producer` (property)

#### `Indicator.description_key` (property)

#### `Indicator.description_processing` (property)

#### `Indicator.description_short` (property)

#### `Indicator.dimensions` (property)

#### `Indicator.display` (property)

#### `Indicator.license` (property)

#### `Indicator.licenses` (property)

#### `Indicator.m` (property)

Metadata alias for shorter access.

Provides convenient shorthand access to indicator metadata.

Returns:
    The indicator's VariableMeta object.

Example:
    ```python
    # These are equivalent:
    ind.metadata.title
    ind.m.title
    ind.title  # Direct property access
    ```

#### `Indicator.metadata` (property)

#### `Indicator.original_short_name` (property)

#### `Indicator.original_title` (property)

#### `Indicator.origins` (property)

#### `Indicator.presentation` (property)

#### `Indicator.processing_level` (property)

#### `rolling(self, *args: 'Any', **kwargs: 'Any') -> 'IndicatorRolling'`

Create a rolling window operation that preserves metadata.

This method wraps pandas rolling operations while maintaining the indicator's metadata.

Args:
    *args: Arguments passed to `pandas.Series.rolling`.
    **kwargs: Keyword arguments passed to `pandas.Series.rolling`.

Returns:
    IndicatorRolling object that applies operations while preserving metadata.

Example:
    Calculate 7-day rolling average
    ```python
    rolling_avg = ind.rolling(window=7).mean()
    ```

    The result retains the original indicator's metadata
    ```python
    assert rolling_avg.metadata.title == ind.metadata.title
    ```

#### `set_categories(self, *args: 'Any', **kwargs: 'Any') -> 'Indicator'`

#### `Indicator.short_unit` (property)

#### `Indicator.sort` (property)

#### `Indicator.sources` (property)

#### `Indicator.title` (property)

#### `to_frame(self, name: 'str | None' = None) -> 'Table'`

Convert Indicator to a Table (single-column table).

When a new name is given, the indicator's metadata is copied to the renamed column
so that origins are not lost.

#### `Indicator.type` (property)

#### `Indicator.unit` (property)


### `copy_metadata(from_variable: 'Indicator', to_variable: 'Indicator', inplace: 'bool' = False) -> 'Indicator | None'`

Copy metadata from one indicator to another.

Args:
    from_variable: Source indicator to copy metadata from.
    to_variable: Target indicator to copy metadata to.
    inplace: If True, modifies `to_variable` in place. If False, returns a new indicator.

Returns:
    New indicator with copied metadata if `inplace=False`, otherwise None.

Example:
    Create new indicator with copied metadata
    ```python
    new_ind = copy_metadata(from_variable=source, to_variable=target)
    ```

    Copy metadata in-place
    ```python
    copy_metadata(from_variable=source, to_variable=target, inplace=True)
    ```


## datasets

### Dataset

A dataset is a folder containing data tables with metadata.

A Dataset represents a collection of related data tables stored in a directory.
Each dataset has an `index.json` file containing metadata about the dataset
and references to its tables.

Attributes:
    path: Path to the dataset directory.
    metadata: Dataset-level metadata (title, description, sources, etc).

Example:
    Load an existing dataset:

    ```python
    >>> ds = Dataset("data://garden/demography/2023-03-31/population")
    >>> table = ds["population"]
    ```

    Create a new dataset:
    ```python
    >>> ds = Dataset.create_empty("path/to/dataset")
    >>> ds.add(table)
    >>> ds.save()
    ```

#### `add(self, table: 'tables.Table', formats: 'list[FileFormat]' = ['feather'], repack: 'bool' = True) -> 'None'`

Add a table to this dataset.

Saves the table to the dataset's directory in the specified format(s).
By default, only the Feather format is written.

Args:
    table: The table to add to the dataset.
    formats: List of file formats to save (feather, parquet, csv).
        Defaults to `["feather"]`.
    repack: If True, optimize column dtypes to reduce file size
        (e.g. float64 -> float32). Set to False for very large dataframes
        if repacking fails or is too slow.

Raises:
    PrimaryKeyMissing: If table has no primary key and OWID_STRICT is set.
    NonUniqueIndex: If table index has duplicates and OWID_STRICT is set.

Example:
    ```python
    >>> ds.add(table)  # Save in default format
    >>> ds.add(table, formats=["csv"])  # Save only as CSV
    >>> ds.add(table, repack=False)  # Skip optimization
    ```

#### `Dataset.additional_info` (property)

#### `Dataset.channel` (property)

#### `checksum(self) -> 'str'`

Calculate MD5 checksum of all data and metadata in the dataset.

Generates a checksum that includes the dataset's index file and all
data files. Useful for detecting changes to the dataset.

Returns:
    MD5 checksum as a hexadecimal string.

Example:
    ```python
    >>> checksum = ds.checksum()
    >>> print(f"Dataset checksum: {checksum}")
    ```

#### `create_empty(path: 'str | Path', metadata: 'DatasetMeta | None' = None) -> 'Dataset'`

#### `Dataset.description` (property)

#### `index(self, catalog_path: 'Path' = PosixPath('/')) -> 'pd.DataFrame'`

Generate an index DataFrame describing all tables in this dataset.

Creates a summary DataFrame with one row per table, including metadata
like namespace, version, checksum, dimensions, and file paths.

Args:
    catalog_path: Base path for calculating relative paths. Defaults to "/".

Returns:
    DataFrame with columns: namespace, dataset, version, table, checksum, is_public,
    title, description, dimensions, path, channel, and formats.

Example:
    ```python
    >>> index = ds.index()
    >>> print(index[["table", "dimensions", "checksum"]])
    ```

#### `Dataset.is_public` (property)

#### `Dataset.licenses` (property)

#### `Dataset.m` (property)

Metadata alias for shorter access (ds.m instead of ds.metadata).

#### `Dataset.namespace` (property)

#### `Dataset.non_redistributable` (property)

#### `read(self, name: 'str | None' = None, reset_index: 'bool' = True, safe_types: 'bool' = True, reset_metadata: "Literal['keep', 'keep_origins', 'reset']" = 'keep', load_data: 'bool' = True) -> 'tables.Table'`

Read a table from the dataset with performance options.

This is an alternative to `ds[table_name]` with more control over
loading behavior for performance optimization.

Args:
    name: Name of the table to read. If None and dataset has only one
        table, reads that table automatically.
    reset_index: If True, don't set primary keys. This can make loading
        large multi-index datasets much faster. Default is True.
    safe_types: If True, convert numeric columns to nullable types
        (Float64, Int64) and categorical to string[pyarrow]. This increases
        memory usage but prevents type issues. Default is True.
    reset_metadata: Controls variable metadata reset behavior:
        - "keep": Leave metadata unchanged (default)
        - "keep_origins": Reset metadata but retain origins attribute
        - "reset": Reset all variable metadata
    load_data: If False, only load metadata without actual data. Useful
        when you only need to inspect metadata. Default is True.

Returns:
    The loaded table with data and metadata.

Raises:
    ValueError: If name is None but dataset contains multiple tables.
    KeyError: If the specified table name doesn't exist.

Example:
    Read single table with safe defaults
    ```python
    >>> table = ds.read()
    ```

    Keep index
    ```python
    >>> table = ds.read("population", reset_index=False)
    ```

    Faster, less memory
    ```python
    >>> table = ds.read("large_table", safe_types=False)
    ```

    Only metadata
    ```python
    >>> meta_only = ds.read(load_data=False)
    ```

#### `save(self) -> 'None'`

#### `Dataset.short_name` (property)

#### `Dataset.source_checksum` (property)

#### `Dataset.sources` (property)

#### `Dataset.table_names` (property)

#### `Dataset.title` (property)

#### `update_metadata(self, metadata_path: 'Path', yaml_params: 'dict[str, Any] | None' = None, if_source_exists: 'SOURCE_EXISTS_OPTIONS' = 'replace', if_origins_exist: 'SOURCE_EXISTS_OPTIONS' = 'replace', errors: "Literal['ignore', 'warn', 'raise']" = 'raise', extra_variables: "Literal['raise', 'ignore']" = 'raise') -> 'None'`

Update dataset and table metadata from a YAML file.

Loads metadata from a .meta.yml file and updates the dataset's metadata
and all referenced tables. This is the primary way to add rich metadata
to datasets in the ETL workflow.

Args:
    metadata_path: Path to the .meta.yml file with metadata definitions.
        See existing metadata files for examples of the expected structure.
    yaml_params: Additional parameters to pass to the YAML loader.
    if_source_exists: How to handle existing sources:
        - "replace" (default): Replace existing source with new one
        - "append": Append new source to existing sources
        - "fail": Raise exception if source already exists
    if_origins_exist: How to handle existing origins:
        - "replace" (default): Replace existing origin with new one
        - "append": Append new origin to existing origins
        - "fail": Raise exception if origin already exists
    errors: How to handle errors during update:
        - "raise" (default): Raise exception on errors
        - "warn": Issue warning but continue processing
        - "ignore": Silently ignore errors
    extra_variables: How to handle variables in metadata not in dataset:
        - "raise" (default): Raise exception
        - "ignore": Skip extra variables

Example:
    ```python
    >>> ds.update_metadata(Path("dataset.meta.yml"))
    >>> ds.update_metadata(
    ...     Path("dataset.meta.yml"),
    ...     if_origins_exist="append",
    ...     errors="warn"
    ... )
    ```

#### `Dataset.update_period_days` (property)

#### `Dataset.version` (property)


## meta

### MetaBase

Base class for all metadata objects in the catalog.

Provides common functionality for metadata serialization, hashing, comparison,
and persistence. All metadata classes (DatasetMeta, TableMeta, VariableMeta, etc.)
inherit from this base class.

Key features:

- JSON serialization/deserialization
- Deterministic hashing for deduplication
- Deep copying support
- File persistence (save/load)
- Dictionary conversion

Example:
    ```python
    from owid.catalog import DatasetMeta

    # Create metadata
    meta = DatasetMeta(title="GDP Data", short_name="gdp")

    # Save to file
    meta.save("metadata.json")

    # Load from file
    loaded = DatasetMeta.load("metadata.json")

    # Convert to dictionary
    d = meta.to_dict()

    # Create deep copy
    copy = meta.copy(deep=True)
    ```

#### `copy(self, deep: bool = True) -> Self`

Create a copy of the metadata object.

Args:
    deep: If True, creates a deep copy (copies nested objects).
        If False, creates a shallow copy.

Returns:
    Copy of the metadata object.

Example:
    ```python
    original = DatasetMeta(title="GDP")
    copy = original.copy(deep=True)
    copy.title = "Population"  # Doesn't affect original
    ```

#### `from_dict(d: dict[str, typing.Any]) -> ~T`

Create metadata object from dictionary.

Args:
    d: Dictionary with metadata fields.

Returns:
    New metadata object of the appropriate type.

Example:
    ```python
    d = {"title": "GDP", "short_name": "gdp"}
    meta = DatasetMeta.from_dict(d)
    ```

Note:
    This uses a custom implementation that's significantly faster than
    the default dataclasses_json method.

#### `load(filename: str) -> Self`

Load metadata from a JSON file.

Args:
    filename: Path to the JSON file containing metadata.

Returns:
    Metadata object loaded from the file.

Example:
    ```python
    meta = DatasetMeta.load("dataset_meta.json")
    print(meta.title)
    ```

#### `save(self, filename: str | Path) -> None`

Save metadata to a JSON file.

Args:
    filename: Path where the metadata should be saved.

Example:
    ```python
    meta = DatasetMeta(title="GDP")
    meta.save("dataset_meta.json")
    ```

#### `to_dict(self, encode_json: bool = False) -> dict[str, typing.Any]`

Convert metadata object to dictionary.

Args:
    encode_json: If True, encodes values for JSON serialization.

Returns:
    Dictionary representation of the metadata.

Example:
    ```python
    meta = DatasetMeta(title="GDP", short_name="gdp")
    d = meta.to_dict()
    print(d["title"])  # "GDP"
    ```

#### `update(self, **kwargs: dict[str, typing.Any]) -> None`

Update metadata fields with new values.

Args:
    **kwargs: Field names and their new values. None values are ignored.

Example:
    ```python
    meta = DatasetMeta(title="GDP")
    meta.update(title="GDP Data", description="Annual GDP figures")
    ```


### License

License information for data products.

Stores licensing details for datasets and variables, including the license
name and URL to the full license text.

Attributes:
    name: License name (e.g., "CC BY 4.0", "MIT", "Public Domain").
    url: URL to the full license text or information page.

Example:
    ```python
    from owid.catalog import License

    # Creative Commons license
    license = License(
        name="CC BY 4.0",
        url="https://creativecommons.org/licenses/by/4.0/"
    )

    # Check if license is defined
    if license:
        print(f"Licensed under: {license.name}")
    ```


### Source

Legacy source metadata for datasets.

Warning:
    **DEPRECATED**: Use `Origin` instead for new datasets. This class is
    maintained for backward compatibility only.

Source contains metadata about the origin of data in legacy format. Modern
datasets should use the `Origin` class which provides more comprehensive
metadata fields.

Attributes:
    name: Source name or identifier.
    description: Description of the source.
    url: URL to the source's main page.
    source_data_url: Direct URL to download the data.
    owid_data_url: OWID-hosted URL for the data.
    date_accessed: Date when the source was accessed (ISO format).
    publication_date: Date when the source was published.
    publication_year: Year of publication.
    published_by: Publisher or institution name (used in Grapher).

Example:
    ```python
    from owid.catalog import Source

    # Legacy usage (prefer Origin for new code)
    source = Source(
        name="World Bank",
        published_by="World Bank Group",
        url="https://data.worldbank.org"
    )
    ```

Note:
    In Grapher admin, only the first source of a dataset is visible and editable.
    The most important fields for Grapher are `published_by` and `description`.


### Origin

Comprehensive metadata about the origin of a data product.

Origin provides detailed provenance information for datasets, including
producer details, citations, URLs, publication dates, and licensing. This is
the modern replacement for the legacy `Source` class.

Attributes:
    producer: Name of the institution or author(s) that produced the data
        (e.g., "World Bank", "United Nations").
    title: Title of the original data product.
    description: Description of the data product and its methodology.
    title_snapshot: Title of the specific data subset extracted from the product.
        Only use if different from `title`.
    description_snapshot: Description of the snapshot subset. Use when the
        snapshot differs from the full data product.
    citation_full: Complete citation for the data product in academic format.
    attribution: Name to use for attribution (e.g., "V-Dem Institute" instead
        of individual authors). Defaults to `producer` if not provided.
    attribution_short: Short form of attribution for space-constrained contexts.
    version_producer: Version number or identifier from the data producer
        (e.g., "v12", "2023.1").
    url_main: Authoritative URL for the dataset's main page.
    url_download: Direct URL to download the dataset.
    date_accessed: ISO-format date when the dataset was accessed (YYYY-MM-DD).
    date_published: Publication date (YYYY-MM-DD), year (YYYY), or "latest"
        for continuously updated datasets.
    license: License information for the data product.

Example:
    ```python
    from owid.catalog import Origin, License

    # Comprehensive origin metadata
    origin = Origin(
        producer="World Bank",
        title="World Development Indicators",
        description="Annual indicators of development",
        attribution_short="World Bank",
        version_producer="2024",
        url_main="https://datatopics.worldbank.org/world-development-indicators/",
        url_download="https://databank.worldbank.org/data/download/WDI_CSV.zip",
        date_accessed="2024-01-15",
        date_published="2024",
        license=License(
            name="CC BY 4.0",
            url="https://creativecommons.org/licenses/by/4.0/"
        )
    )

    # Minimal origin (only required fields)
    origin_minimal = Origin(
        producer="UN",
        title="Population Data"
    )
    ```

Raises:
    ValueError: If `date_published` is not a valid year, date, or "latest".


### FaqLink

FaqLink(gdoc_id: str, fragment_id: str)


### VariablePresentationMeta

VariablePresentationMeta(grapher_config: dict[str, typing.Any] | None = None, title_public: str | None = None, title_variant: str | None = None, attribution_short: str | None = None, attribution: str | None = None, topic_tags: list[str] = <factory>, faqs: list[owid.catalog.core.meta.FaqLink] = <factory>)


### VariableMeta

Allowed fields for the `display` attribute used by Grapher:

- `name`
- `zeroDay`
- `yearIsDay`
- `includeInTable`
- `numDecimalPlaces`
- `conversionFactor`
- `entityAnnotationsMap`

The fields `unit` and `shortUnit` are copied from the `unit` and `short_unit` attributes
of the VariableMeta object.

NOTE: consider using a dedicated object for `display` instead of a dict, and possibly
underscoring the fields and converting them back to camelCase before inserting them into Grapher.

#### `render(self, dim_dict: dict[str, typing.Any], remove_dods: bool = False) -> 'VariableMeta'`

Render Jinja templates in all fields of VariableMeta and return a new VariableMeta object.

Args:
    dim_dict: Dictionary of dimension values used to render the templates.
    remove_dods: If True, remove references to details on demand from the text.

Example:
    ```python
    from owid.catalog import Dataset
    from etl import paths  # part of the OWID ETL repository, not owid-catalog

    ds = Dataset(paths.DATA_DIR / "garden/emissions/2025-02-12/ceds_air_pollutants")
    tb = ds["ceds_air_pollutants"]
    tb.emissions.m.render({"pollutant": "CO", "sector": "Transport"})
    ```

#### `VariableMeta.schema_version` (property)

Schema version is used to easily understand everywhere what metadata standard was used
for authoring this variable metadata. Defaults to 1 for our legacy variables. "Modern" variables
that fill in the presentation key and use origins should record 2 here.


### DatasetMeta

The metadata for an entire dataset, kept in JSON (e.g. `mydataset/index.json`).

The number of fields is limited, but should cover everything we get from a
Snapshot. Table-level and variable-level metadata offer much more room for detail.

#### `update_from_yaml(self, path: pathlib.Path | str, if_source_exists: Literal['fail', 'append', 'replace'] = 'fail') -> None`

Update this dataset's metadata from a YAML file. The main reason to do this is to manually override what goes into Grapher before an export.

#### `DatasetMeta.uri` (property)

Return unique URI for this dataset if


### TableDimension

`TableDimension` is a `dict`-based type, so instances are created like any `dict`: empty (`dict()`), from a mapping, from an iterable of `(key, value)` pairs, or from keyword arguments (e.g. `dict(one=1, two=2)`).


### TableMeta

TableMeta(short_name: str | None = None, title: str | None = None, description: str | None = None, dataset: owid.catalog.core.meta.DatasetMeta | None = None, primary_key: list[str] = <factory>, dimensions: list[owid.catalog.core.meta.TableDimension] | None = None)

#### `TableMeta.checked_name` (property)

#### `TableMeta.uri` (property)

Return unique URI for this table.


### `to_html(record: Any) -> str | None`


## processing

Common operations performed on tables and variables.

### ExcelFile

Class for parsing tabular Excel sheets into DataFrame objects.

See `pandas.read_excel` for more documentation.

Args:
    path_or_buffer: str, bytes, path object (pathlib.Path or py._path.local.LocalPath),
        a file-like object, an xlrd workbook or an openpyxl workbook.
        If a string or path object, it is expected to be a path to a
        .xls, .xlsx, .xlsb, .xlsm, .odf, .ods, or .odt file.
    engine: str, default None.
        If `path_or_buffer` is not a buffer or path, this must be set to identify it.
        Supported engines: `xlrd`, `openpyxl`, `odf`, `pyxlsb`, `calamine`.
        Engine compatibility:

        - `xlrd` supports old-style Excel files (.xls).
        - `openpyxl` supports newer Excel file formats.
        - `odf` supports OpenDocument file formats (.odf, .ods, .odt).
        - `pyxlsb` supports Binary Excel files.
        - `calamine` supports Excel (.xls, .xlsx, .xlsm, .xlsb)
          and OpenDocument (.ods) file formats.

        Changed in pandas 1.2.0: the `xlrd` engine (https://xlrd.readthedocs.io/en/latest/)
        now only supports old-style `.xls` files. When `engine=None`, the engine is
        chosen as follows:

        - If `path_or_buffer` is an OpenDocument format (.odf, .ods, .odt),
          `odf` (https://pypi.org/project/odfpy/) is used.
        - Otherwise, if `path_or_buffer` is an xls format, `xlrd` is used.
        - Otherwise, if `path_or_buffer` is in xlsb format,
          `pyxlsb` (https://pypi.org/project/pyxlsb/) is used (added in pandas 1.3.0).
        - Otherwise, if `openpyxl` (https://pypi.org/project/openpyxl/) is installed,
          `openpyxl` is used.
        - Otherwise, if `xlrd >= 2.0` is installed, a `ValueError` is raised.

        Warning: please do not report issues when using `xlrd` to read `.xlsx` files.
        This is not supported; switch to `openpyxl` instead.
    engine_kwargs: dict, optional.
        Arbitrary keyword arguments passed to the Excel engine.

Examples:
    ```python
    import pandas as pd

    # Open a workbook once so that several sheets can be read from it
    file = pd.ExcelFile("myfile.xlsx")

    # Used as a context manager, the file is closed automatically
    with pd.ExcelFile("myfile.xls") as xls:
        df1 = pd.read_excel(xls, "Sheet1")
    ```
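The `engine=None` rules listed above amount to a short decision chain. The sketch below restates them as a function. `choose_excel_engine` is hypothetical, it uses the file extension as a stand-in for the file format (the real logic may inspect the file contents instead), and raising `ValueError` when `openpyxl` is missing simplifies the last rule.

```python
from __future__ import annotations

from pathlib import Path


def choose_excel_engine(path: str | Path, openpyxl_installed: bool = True) -> str:
    """Hypothetical restatement of the engine-selection rules for `engine=None`."""
    # Assumption: the extension stands in for the detected file format.
    suffix = Path(path).suffix.lower()
    if suffix in {".odf", ".ods", ".odt"}:
        return "odf"  # OpenDocument formats
    if suffix == ".xls":
        return "xlrd"  # old-style Excel
    if suffix == ".xlsb":
        return "pyxlsb"  # binary Excel
    if openpyxl_installed:
        return "openpyxl"  # newer formats such as .xlsx and .xlsm
    # xlrd >= 2.0 cannot read these formats, so there is no fallback.
    raise ValueError("openpyxl is required; xlrd >= 2.0 only reads .xls files")


for name in ["report.ods", "legacy.XLS", "big.xlsb", "data.xlsx"]:
    print(name, "->", choose_excel_engine(name))
```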


### `concat(objs: 'list[Table]', *, axis: 'int | str' = 0, join: 'str' = 'outer', ignore_index: 'bool' = False, short_name: 'str | None' = None, **kwargs: 'Any') -> 'Table'`


### `ignore_warnings(ignore_warnings: collections.abc.Iterable[type] = (<class 'Warning'>,))`

Context manager that ignores warnings. By default all warnings are ignored; pass an iterable of specific warning classes, such as MetadataWarning or StepWarning, to ignore only those.

Usage:
    ```python
    with ignore_warnings():
        ds_garden = create_dataset(...)
    ```
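A context manager like this is typically built on the standard library's `warnings.catch_warnings`, which restores the warning filters on exit. The sketch below is a hypothetical stand-in (`ignore_warnings_sketch`), not the library's implementation, and `ExampleMetadataWarning` is a local class defined only for the demonstration.

```python
from __future__ import annotations

import warnings
from collections.abc import Iterable, Iterator
from contextlib import contextmanager


@contextmanager
def ignore_warnings_sketch(categories: Iterable[type[Warning]] = (Warning,)) -> Iterator[None]:
    """Hypothetical stand-in: silence the given warning categories inside the block."""
    # catch_warnings() saves the filter state on entry and restores it on exit.
    with warnings.catch_warnings():
        for category in categories:
            warnings.simplefilter("ignore", category)
        yield


class ExampleMetadataWarning(UserWarning):
    """Local warning class used only for this example."""


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with ignore_warnings_sketch([ExampleMetadataWarning]):
        warnings.warn("missing unit", ExampleMetadataWarning)  # suppressed
        warnings.warn("old API", DeprecationWarning)  # not in the list, still recorded

print([type(w.message).__name__ for w in caught])  # ['DeprecationWarning']
```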


### `keep_metadata(func: 'Callable[..., pd.DataFrame | pd.Series]') -> 'Callable[..., Table | indicators.Indicator]'`

Decorator that turns a function operating on a DataFrame or Series into one that operates on a Table or Variable and preserves its metadata. If the decorated function renames columns, their metadata is not copied.

Example:
    ```python
    import pandas as pd
    import owid.catalog.processing as pr

    @pr.keep_metadata
    def my_df_func(df: pd.DataFrame) -> pd.DataFrame:
        return df + 1

    # `tb` is an existing Table; the result keeps its metadata
    tb = my_df_func(tb)


    @pr.keep_metadata
    def my_series_func(s: pd.Series) -> pd.Series:
        return s + 1

    # Works on a single column (Variable) as well
    tb.a = my_series_func(tb.a)
    ```
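Why renamed columns lose their metadata can be seen in a toy version of the decorator: metadata is re-attached by column name after the wrapped function runs, so a column whose name changed has nothing to match. `MiniTable` and `keep_meta` are hypothetical; the real decorator works on `Table`/`Variable` objects and also supports Series.

```python
from __future__ import annotations

import functools
from collections.abc import Callable
from typing import Any

import pandas as pd


class MiniTable:
    """Hypothetical stand-in for Table: a DataFrame plus per-column metadata."""

    def __init__(self, df: pd.DataFrame, column_meta: dict[str, dict[str, Any]]):
        self.df = df
        self.column_meta = column_meta


def keep_meta(func: Callable[..., pd.DataFrame]) -> Callable[..., MiniTable]:
    @functools.wraps(func)
    def wrapper(tb: MiniTable, *args: Any, **kwargs: Any) -> MiniTable:
        # Run the plain-pandas function on the underlying DataFrame.
        result = func(tb.df, *args, **kwargs)
        # Re-attach metadata by column name; renamed columns find no match.
        meta = {c: dict(tb.column_meta[c]) for c in result.columns if c in tb.column_meta}
        return MiniTable(result, meta)

    return wrapper


@keep_meta
def add_one(df: pd.DataFrame) -> pd.DataFrame:
    return df + 1


@keep_meta
def rename_gdp(df: pd.DataFrame) -> pd.DataFrame:
    return df.rename(columns={"gdp": "gdp_usd"})


tb = MiniTable(pd.DataFrame({"gdp": [1.0, 2.0]}), {"gdp": {"unit": "dollars"}})
print(add_one(tb).column_meta)  # {'gdp': {'unit': 'dollars'}}
print(rename_gdp(tb).column_meta)  # {}
```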


### `melt(frame: 'Table', id_vars: 'tuple[str] | list[str] | str | None' = None, value_vars: 'tuple[str] | list[str] | str | None' = None, var_name: 'str' = 'variable', value_name: 'str' = 'value', short_name: 'str | None' = None, *args: 'Any', **kwargs: 'Any') -> 'Table'`


### `merge(left: 'Table | pd.DataFrame', right: 'Table | pd.DataFrame', how: 'str' = 'inner', on: 'str | list[str] | None' = None, left_on: 'str | list[str] | None' = None, right_on: 'str | list[str] | None' = None, suffixes: 'tuple[str, str]' = ('_x', '_y'), short_name: 'str | None' = None, **kwargs: 'Any') -> 'Table'`


### `multi_merge(tables: 'list[Table]', *args: 'Any', **kwargs: 'Any') -> 'Table'`

Merge multiple tables.

This is a helper function for merging more than two tables on common columns.

Args:
    tables: Tables to merge.

Returns:
    combined: Merged table.
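Conceptually, merging many tables on common columns is a left-to-right fold of pairwise merges. The pandas-only sketch below shows that idea. It is an assumption that `multi_merge` works this way and forwards its extra arguments to each merge; the real function operates on `Table` objects and also combines metadata, which this sketch does not. `multi_merge_sketch` is hypothetical.

```python
from __future__ import annotations

from functools import reduce
from typing import Any

import pandas as pd


def multi_merge_sketch(frames: list[pd.DataFrame], **merge_kwargs: Any) -> pd.DataFrame:
    """Hypothetical sketch: merge the frames pairwise, from left to right."""
    return reduce(lambda left, right: left.merge(right, **merge_kwargs), frames)


population = pd.DataFrame({"country": ["A", "B"], "year": [2020, 2020], "population": [10, 20]})
gdp = pd.DataFrame({"country": ["A", "B"], "year": [2020, 2020], "gdp": [1.5, 2.5]})
life = pd.DataFrame({"country": ["A"], "year": [2020], "life_expectancy": [70.0]})

# An outer merge keeps country B even though it has no life expectancy value.
combined = multi_merge_sketch([population, gdp, life], on=["country", "year"], how="outer")
print(combined)
```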


### `pivot(data: 'Table', *, index: 'str | list[str] | None' = None, columns: 'str | list[str] | None' = None, values: 'str | list[str] | None' = None, join_column_levels_with: 'str | None' = None, short_name: 'str | None' = None, fill_dimensions: 'bool' = True, **kwargs: 'Any') -> 'Table'`


### `read(filepath_or_buffer: 'str | Path | IO[AnyStr]', *args: 'Any', file_extension: 'str | None' = None, metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, **kwargs: 'Any') -> 'Table'`

Read a file based on extension, dispatching to the appropriate reader.

Args:
    filepath_or_buffer: Path to the file or file-like object to read.
    *args: Additional positional arguments passed to the format-specific reader.
    file_extension: File extension (without dot). If None, inferred from filepath.
    metadata: Table metadata.
    origin: Origin of the table data.
    underscore: True to make all column names snake case.
    **kwargs: Additional keyword arguments passed to the format-specific reader.

Returns:
    Table with data and metadata.

Note:
    For reading ZIP files, use Snapshot.extracted() context manager instead.
    See etl/snapshot.py for the recommended approach to handling archives.
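Extension-based dispatch can be sketched as a lookup table from extension to reader. The mapping and `read_sketch` below are hypothetical and use plain pandas readers; the real `read` dispatches to the format-specific functions in this module and attaches metadata and origin, which the sketch omits.

```python
from __future__ import annotations

import io
from collections.abc import Callable
from pathlib import Path
from typing import Any

import pandas as pd

# Hypothetical extension -> reader mapping (the real mapping may differ).
READERS: dict[str, Callable[..., pd.DataFrame]] = {
    "csv": pd.read_csv,
    "json": pd.read_json,
    "feather": pd.read_feather,
    "parquet": pd.read_parquet,
    "xlsx": pd.read_excel,
    "xls": pd.read_excel,
    "dta": pd.read_stata,
}


def read_sketch(
    filepath_or_buffer: Any, *args: Any, file_extension: str | None = None, **kwargs: Any
) -> pd.DataFrame:
    # As documented for `read`: the extension has no dot and, if None, comes from the path.
    if file_extension is None:
        if not isinstance(filepath_or_buffer, (str, Path)):
            raise ValueError("file_extension is required for file-like objects")
        file_extension = Path(filepath_or_buffer).suffix.lstrip(".")
    try:
        reader = READERS[file_extension.lower()]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {file_extension!r}") from None
    return reader(filepath_or_buffer, *args, **kwargs)


# A buffer has no name, so the extension must be given explicitly.
df = read_sketch(io.StringIO("country,year\nA,2020\n"), file_extension="csv")
print(df)
```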


### `read_csv(filepath_or_buffer: 'str | Path | IO[AnyStr]', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, *args: 'Any', **kwargs: 'Any') -> 'Table'`


### `read_feather(filepath: 'str | Path | IO[AnyStr]', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, *args: 'Any', **kwargs: 'Any') -> 'Table'`


### `read_excel(io: 'str | Path | IO[AnyStr]', *args: 'Any', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, **kwargs: 'Any') -> 'Table'`


### `read_from_df(data: 'pd.DataFrame', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False) -> 'Table'`


### `read_from_dict(data: 'dict[Any, Any]', *args: 'Any', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, **kwargs: 'Any') -> 'Table'`


### `read_from_records(data: 'Any', *args: 'Any', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, **kwargs: 'Any')`


### `read_json(path_or_buf: 'str | Path | IO[AnyStr]', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, *args: 'Any', **kwargs: 'Any') -> 'Table'`


### `read_fwf(filepath_or_buffer: 'FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str]', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, *args: 'Any', **kwargs: 'Any') -> 'Table'`


### `read_stata(filepath_or_buffer: 'str | Path | IO[AnyStr]', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, *args: 'Any', **kwargs: 'Any') -> 'Table'`


### `read_rda(filepath_or_buffer: 'str | Path | IO[AnyStr]', table_name: 'str | None' = None, metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False) -> 'Table'`


### `read_rda_multiple(filepath_or_buffer: 'str | Path | IO[AnyStr]', table_names: 'list[str] | None' = None, metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False) -> 'dict[str, Table]'`


### `read_rds(filepath_or_buffer: 'str | Path | IO[AnyStr]', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False) -> 'Table'`


### `read_df(df: 'pd.DataFrame', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False) -> 'Table'`

Create a Table (with metadata and an origin) from a DataFrame.

Args:
    df: Input DataFrame.
    metadata: Table metadata (with a title and description).
    origin: Origin of the table.
    underscore: True to ensure all column names are snake case.

Returns:
    Table: Original data as a Table with metadata and an origin.


### `read_custom(read_function: 'Callable', filepath_or_buffer: 'str | Path | IO[AnyStr]', metadata: 'TableMeta', origin: 'Origin | None' = None, underscore: 'bool' = False, *args: 'Any', **kwargs: 'Any') -> 'Table'`

Read data using a custom reader function and return a Table with metadata.

This function allows using any custom data reading function while automatically
attaching metadata and origin information to the resulting Table. Useful when
standard read functions (read_csv, read_excel, etc.) don't meet specific needs.

Args:
    read_function: Custom function to read the data. Must accept filepath_or_buffer as first argument and return a DataFrame or Table.
    filepath_or_buffer: Path to the file or file-like object to read.
    metadata: Table metadata.
    origin: Origin of the table data.
    underscore: True to make all column names snake case.
    *args: Additional positional arguments to pass to read_function.
    **kwargs: Additional keyword arguments to pass to read_function.

Returns:
    Table: Data read by the custom function as a Table with attached metadata and origin.
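The `read_function` contract (file path or buffer as the first argument, a DataFrame as the result) can be illustrated with a hypothetical reader for semicolon-separated files that use decimal commas and end with a footnote row. The reader is plain pandas; the final `read_custom` call is left as a comment because it assumes `pr`, `path`, `metadata` and `origin` already exist.

```python
import io

import pandas as pd


def read_semicolon_csv(filepath_or_buffer, skip_footer_rows: int = 0) -> pd.DataFrame:
    """Hypothetical custom reader: ';'-separated values, decimal commas, footer rows."""
    return pd.read_csv(
        filepath_or_buffer,
        sep=";",
        decimal=",",
        skipfooter=skip_footer_rows,
        engine="python",  # `skipfooter` is only supported by the python parser engine
    )


raw = io.StringIO("country;gdp\nA;1,5\nB;2,5\nSource: example footnote\n")
df = read_semicolon_csv(raw, skip_footer_rows=1)
print(df)

# With owid-catalog, the same function and extra keyword argument would be passed along:
# tb = pr.read_custom(read_semicolon_csv, path, metadata=metadata, origin=origin, skip_footer_rows=1)
```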


### `read_parquet(filepath_or_buffer: 'str | Path | IO[AnyStr]', metadata: 'TableMeta | None' = None, origin: 'Origin | None' = None, underscore: 'bool' = False, *args: 'Any', **kwargs: 'Any') -> 'Table'`