ScreenPilot MCP Server

Use This MCP Server To

Automate repetitive GUI tasks via LLM-driven mouse and keyboard control
Capture and analyze screen content for context-aware automation
Create educational demos by scripting GUI interactions
Enable LLMs to navigate and control desktop applications
Develop interactive tutorials with real-time screen feedback
Test software UI by automating user input sequences
Build fun interactive experiences controlled by natural language

ScreenPilot

An MCP server that lets an LLM take full control of your device by providing a screen automation toolkit for controlling and interacting with graphical user interfaces. Good for automation, education, and having fun.

Main Features

📷 Screen capture and analysis
🖱️ Mouse control (clicking, positioning)
⌨️ Keyboard input (typing, key presses, hotkeys)

Demo video: Screen.Pilot.mp4

Installation

Install Python 3.12.

Clone the repository:

git clone https://github.com/Mtehabsim/ScreenPilot.git

Create a virtual environment:

python -m venv venv

Activate the environment (the command below is for Windows):

venv\Scripts\activate

Install the required packages:
```
pip install -r requirements.txt
```
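Before moving on, it can help to confirm that the interpreter you are about to point Claude at really is Python 3.12 running inside the virtual environment. The following is a minimal sketch, not part of ScreenPilot; the function names `in_virtualenv` and `is_python_312` are made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def is_python_312(version=sys.version_info) -> bool:
    # Compare only the major and minor components.
    return tuple(version[:2]) == (3, 12)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("running Python 3.12:", is_python_312())
```

Run it with the same `python` you used for `pip install`; if either line prints `False`, re-check the activation step.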
Open Claude AI Desktop.
Go to File -> Settings -> Developer -> Edit Config.
Open the config file and paste the following:

{
    "mcpServers": {
        "device-controll": {
            "command": "pathToEnv\\venv\\Scripts\\python.exe",
            "args": [
                "pathToProject\\ScreenPilot\\main.py"
            ]
        }
    }
}

Replace the placeholder paths (keep every backslash doubled, as JSON requires):
"pathToEnv\venv\Scripts\python.exe" → the full path to your python.exe
"pathToProject\ScreenPilot\main.py" → the full path to your main.py file
Save the config file.
In Claude AI Desktop, go to File → Exit to close the app completely.
Reopen Claude AI Desktop and enjoy ScreenPilot.
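Hand-editing Windows paths in JSON is error-prone because each backslash has to be escaped. As a rough sketch, the config entry above could be generated instead; the helper name `build_config` and the `C:\path\to\...` directories are placeholders for illustration, while the keys (`mcpServers`, `device-controll`, `command`, `args`) are copied from the example above.

```python
import json
from pathlib import PureWindowsPath


def build_config(env_dir: str, project_dir: str) -> dict:
    # env_dir: folder that contains the "venv" directory (placeholder).
    # project_dir: folder that contains the cloned "ScreenPilot" directory (placeholder).
    python_exe = PureWindowsPath(env_dir) / "venv" / "Scripts" / "python.exe"
    main_py = PureWindowsPath(project_dir) / "ScreenPilot" / "main.py"
    return {
        "mcpServers": {
            "device-controll": {
                "command": str(python_exe),
                "args": [str(main_py)],
            }
        }
    }


if __name__ == "__main__":
    # json.dumps escapes the backslashes, so the output can be pasted as-is.
    config = build_config(r"C:\path\to\env", r"C:\path\to\project")
    print(json.dumps(config, indent=4))
```

If your config file already lists other servers, merge the `device-controll` entry into the existing `mcpServers` object rather than overwriting the file.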

Available Tools

Screen Capture: Take screenshots and get screen information
Mouse Control: Move the mouse and perform clicks
Keyboard Actions: Type text, press keys, and use hotkey combinations
Scrolling: Scroll in different directions and to specific positions
Element Detection: Check if elements exist on screen and wait for them to appear
Action Sequences: Perform multiple actions in sequence
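To make the last two tool categories more concrete, here is a minimal sketch of how an action sequence and a wait-for-element check could work in principle. It is not ScreenPilot's implementation: the step format (`{"type": ..., ...}`), the handler names, and both function names are assumptions chosen for illustration, and the handlers only record calls instead of moving the real mouse.

```python
import time
from typing import Callable


def run_sequence(actions: list[dict], handlers: dict[str, Callable[..., object]]) -> list:
    # Dispatch each step to the handler registered for its "type" (hypothetical format).
    results = []
    for step in actions:
        params = dict(step)  # copy so the caller's dict is left untouched
        kind = params.pop("type")
        if kind not in handlers:
            raise ValueError(f"unknown action type: {kind}")
        results.append(handlers[kind](**params))
    return results


def wait_for(check: Callable[[], bool], timeout: float = 5.0, interval: float = 0.1) -> bool:
    # Poll check() until it returns True or the timeout elapses.
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Recording handlers stand in for real mouse and keyboard calls.
log = []
handlers = {
    "click": lambda x, y: log.append(("click", x, y)),
    "type": lambda text: log.append(("type", text)),
}
run_sequence(
    [{"type": "click", "x": 100, "y": 200}, {"type": "type", "text": "hello"}],
    handlers,
)
```

In a real server the `check` callable would inspect a fresh screenshot, and the handlers would call into whatever input library the server uses.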

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

ScreenPilot FAQ

How do I install ScreenPilot?

Install Python 3.12, clone the repo, create and activate a virtual environment, then install dependencies with pip.

What platforms does ScreenPilot support?

ScreenPilot primarily supports desktop environments where Python 3.12 can run and control the GUI.

How does ScreenPilot enable LLMs to control the device?

It exposes screen capture, mouse, and keyboard control APIs to the MCP client, allowing LLMs to interact with the GUI.
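For a sense of what travels between client and server: MCP messages are JSON-RPC 2.0, and a client invokes a tool with a `tools/call` request. The sketch below builds such a request; the tool name `mouse_click` and its arguments are hypothetical, since the actual tool names are defined by the server.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    # Shape of an MCP "tools/call" request (JSON-RPC 2.0).
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "mouse_click" and its arguments are placeholders, not ScreenPilot's real tool names.
request = make_tool_call(1, "mouse_click", {"x": 100, "y": 200})
print(request)
```

The MCP client normally builds and sends these messages for you; the LLM only chooses the tool and its arguments.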

Can ScreenPilot be used with multiple LLM providers?

Yes, it is provider-agnostic and works with OpenAI, Claude, Gemini, and others via MCP.

Is ScreenPilot safe to use?

It runs locally and requires explicit configuration, ensuring scoped and secure control over your device.

What programming languages are needed to use ScreenPilot?

Basic Python knowledge is helpful for setup; after that, interaction happens through the MCP client.

Can ScreenPilot automate any desktop application?

Yes, as long as the application has a graphical interface accessible via screen capture and input events.

How do I configure ScreenPilot with an MCP client?

Add the server configuration to the MCP client's settings, following the example in the GitHub README.