
dataset-viewer

MCP.Pizza Chef: privetin

The dataset-viewer MCP server integrates with the Hugging Face Dataset Viewer API to provide structured, real-time access to datasets hosted on the Hugging Face Hub. It supports the dataset:// URI scheme, allowing clients to browse dataset configurations, splits, and contents with pagination. The server handles authentication for private datasets, enabling secure access. It offers tools to validate dataset existence, retrieve detailed dataset information, fetch dataset rows with filtering and searching capabilities, and obtain dataset statistics and analysis. This server is ideal for developers building AI workflows that require dynamic dataset exploration and analysis within the MCP ecosystem.

Use This MCP Server To

  • Browse Hugging Face datasets with pagination
  • Search and filter dataset contents dynamically
  • Access private datasets securely with authentication
  • Retrieve detailed metadata about datasets
  • Validate dataset availability and access
  • Fetch dataset rows for analysis or training
  • Obtain dataset statistics and summaries

README

Dataset Viewer MCP Server

An MCP server for interacting with the Hugging Face Dataset Viewer API, providing capabilities to browse and analyze datasets hosted on the Hugging Face Hub.

Features

Resources

  • Uses dataset:// URI scheme for accessing Hugging Face datasets
  • Supports dataset configurations and splits
  • Provides paginated access to dataset contents
  • Handles authentication for private datasets
  • Supports searching and filtering dataset contents
  • Provides dataset statistics and analysis
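
To make the paginated access above more concrete, here is a minimal sketch of how a 0-based page number could map onto the public Dataset Viewer API's /rows endpoint, which takes offset and length query parameters. The page size of 100 and the rows_url helper are assumptions for illustration; the server's actual page size and internals are not documented here.

```python
from urllib.parse import urlencode

# Public Hugging Face Dataset Viewer API base URL.
API_BASE = "https://datasets-server.huggingface.co"

# Assumption: 100 rows per page (the /rows endpoint caps length at 100).
PAGE_SIZE = 100


def rows_url(dataset: str, config: str, split: str,
             page: int = 0, page_size: int = PAGE_SIZE) -> str:
    """Build a /rows URL for a 0-based page (hypothetical helper)."""
    if page < 0:
        raise ValueError("page must be 0 or greater")
    params = {
        "dataset": dataset,
        "config": config,
        "split": split,
        "offset": page * page_size,  # index of the first row on this page
        "length": page_size,         # number of rows to return
    }
    return f"{API_BASE}/rows?{urlencode(params)}"


# Third page (page=2) of an example dataset split.
print(rows_url("stanfordnlp/imdb", "plain_text", "train", page=2))
```

The helper only builds the URL, so it can be tried without network access or a token.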

Tools

The server provides the following tools:

  1. validate

    • Check if a dataset exists and is accessible
    • Parameters:
      • dataset: Dataset identifier (e.g. 'stanfordnlp/imdb')
      • auth_token (optional): For private datasets
  2. get_info

    • Get detailed information about a dataset
    • Parameters:
      • dataset: Dataset identifier
      • auth_token (optional): For private datasets
  3. get_rows

    • Get paginated contents of a dataset
    • Parameters:
      • dataset: Dataset identifier
      • config: Configuration name
      • split: Split name
      • page (optional): Page number (0-based)
      • auth_token (optional): For private datasets
  4. get_first_rows

    • Get first rows from a dataset split
    • Parameters:
      • dataset: Dataset identifier
      • config: Configuration name
      • split: Split name
      • auth_token (optional): For private datasets
  5. get_statistics

    • Get statistics about a dataset split
    • Parameters:
      • dataset: Dataset identifier
      • config: Configuration name
      • split: Split name
      • auth_token (optional): For private datasets
  6. search_dataset

    • Search for text within a dataset
    • Parameters:
      • dataset: Dataset identifier
      • config: Configuration name
      • split: Split name
      • query: Text to search for
      • auth_token (optional): For private datasets
  7. filter

    • Filter rows using SQL-like conditions
    • Parameters:
      • dataset: Dataset identifier
      • config: Configuration name
      • split: Split name
      • where: SQL WHERE clause (e.g. "score > 0.5")
      • orderby (optional): SQL ORDER BY clause
      • page (optional): Page number (0-based)
      • auth_token (optional): For private datasets
  8. get_parquet

    • Download entire dataset in Parquet format
    • Parameters:
      • dataset: Dataset identifier
      • auth_token (optional): For private datasets
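
As a rough illustration of how a client might invoke the tools listed above, the sketch below builds an MCP tools/call request (JSON-RPC 2.0) for the filter tool. The tool_call helper and the dataset name my-org/reviews are hypothetical; the argument names come from the parameter lists above. In practice an MCP client library sends this message for you.

```python
import json


def tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP tools/call request as a plain dict (hypothetical helper)."""
    # Drop optional parameters that were left unset so they are not sent as null.
    cleaned = {key: value for key, value in arguments.items() if value is not None}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": cleaned},
    }


# Example: first page of rows whose score exceeds 0.5.
# "my-org/reviews" and its "score" column are placeholders, not a real dataset.
request = tool_call(
    "filter",
    {
        "dataset": "my-org/reviews",
        "config": "default",
        "split": "train",
        "where": "score > 0.5",
        "orderby": None,      # optional, omitted from the request
        "page": 0,
        "auth_token": None,   # optional, only needed for private datasets
    },
)
print(json.dumps(request, indent=2))
```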

Installation

Prerequisites

  • Python 3.12 or higher
  • uv - Fast Python package installer and resolver

Setup

  1. Clone the repository:
git clone https://github.com/privetin/dataset-viewer.git
cd dataset-viewer
  2. Create a virtual environment and install:
# Create virtual environment
uv venv

# Activate virtual environment
# On Unix:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install in development mode
uv add -e .

Configuration

Environment Variables

  • HUGGINGFACE_TOKEN: Your Hugging Face API token for accessing private datasets

Claude Desktop Integration

Add the following to your Claude Desktop config file:

On Windows: %APPDATA%\Claude\claude_desktop_config.json

On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "dataset-viewer": {
      "command": "uv",
      "args": [
        "--directory",
        "parent_to_repo/dataset-viewer",
        "run",
        "dataset-viewer"
      ]
    }
  }
}

License

MIT License - see LICENSE for details

dataset-viewer FAQ

How do I authenticate to access private datasets?
Pass your Hugging Face API token in the optional auth_token parameter of a tool call, or set it in the HUGGINGFACE_TOKEN environment variable.
What URI scheme does the dataset-viewer server use?
It uses the dataset:// URI scheme to identify and access Hugging Face datasets.
Can I retrieve specific parts of a dataset?
Yes, the server supports dataset configurations and splits, allowing you to access specific subsets of data.
How does pagination work in dataset viewing?
The server provides paginated access to dataset contents, enabling efficient browsing of large datasets in chunks.
Is searching supported within datasets?
Yes, the server supports searching and filtering dataset contents to help find relevant data quickly.
What kind of dataset information can I get?
You can retrieve detailed metadata, including dataset statistics, configurations, splits, and analysis summaries.
How do I check if a dataset exists?
Use the validate tool with the dataset identifier to verify if the dataset is accessible on the Hugging Face Hub.
Can this server be used with multiple LLM providers?
Yes. Because it is a standard MCP server, it works with any MCP-compatible client, so it can be used with models from OpenAI, Anthropic (Claude), and Google (Gemini), among others.