Agent Visual Checker

VLM-based visual validation tool for automated GUI testing

Overview

Agent Visual Checker automates visual validation of GUI applications using Vision Language Models (VLMs). Instead of requiring a human in the loop, a VLM agent acts as a senior test engineer: it analyzes screen captures and reports validation results.

Key Features

High-Frequency Screen Capture: Supports screenshot capture at 30-60 Hz for detailed recording
Session-Based Recording: Stateful recording sessions that capture complete application workflows
Cross-Platform Support: Windows-first implementation with macOS and Linux support planned
MCP Tools: Model Context Protocol tools for seamless VLM integration
Window Management: Bring applications to foreground, minimize, enumerate windows
VLM Flexibility: Support for local VLMs (Ollama, vLLM) and API-based VLMs (OpenAI, Claude)
Web Dashboard: Real-time monitoring, session replay, and feedback visualization
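
The high-frequency capture feature implies a loop that paces frame grabs at a fixed rate. The following is a minimal sketch of one way to do that, not the project's actual implementation. The names `capture_frames` and `grab` are hypothetical; `grab` stands in for whatever platform call returns one frame.

```python
import time
from typing import Callable, List


def capture_frames(
    grab: Callable[[], bytes],
    fps: float,
    frame_count: int,
    clock: Callable[[], float] = time.perf_counter,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Call grab() frame_count times, paced at roughly `fps` calls per second.

    Each deadline is computed from the start time rather than from the
    previous frame, so one slow grab does not delay every later frame.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    interval = 1.0 / fps
    start = clock()
    frames: List[bytes] = []
    for index in range(frame_count):
        deadline = start + index * interval
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)  # wait until this frame is due
        frames.append(grab())
    return frames
```

The `clock` and `sleep` parameters are injectable so the pacing logic can be unit-tested without real delays.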

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                        AGENT VISUAL CHECKER                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                     CROSS-PLATFORM ABSTRACTION                       │   │
│  │   ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐   │   │
│  │   │ Win32/WinRT │  │    macOS    │  │   Linux (X11/Wayland)   │   │   │
│  │   └─────────────┘  └─────────────┘  └─────────────────────────┘   │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                      │                                      │
│  ┌───────────────────────────────────▼────────────────────────────────┐   │
│  │                      CAPTURE SERVICE LAYER                          │   │
│  │   ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐   │   │
│  │   │ High-Freq    │  │ Session      │  │ Compression          │   │   │
│  │   │ Screenshot   │  │ Manager      │  │ Encoder              │   │   │
│  │   └──────────────┘  └──────────────┘  └──────────────────────┘   │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                      │                                      │
│  ┌───────────────────────────────────▼────────────────────────────────┐   │
│  │                        MCP SERVER LAYER                             │   │
│  │   ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐   │   │
│  │   │ Screen Tools │  │ Window Tools │  │ Session Tools        │   │   │
│  │   └──────────────┘  └──────────────┘  └──────────────────────┘   │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                      │                                      │
│  ┌───────────────────────────────────▼────────────────────────────────┐   │
│  │                      VLM ADAPTER LAYER                              │   │
│  │   ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐   │   │
│  │   │ Local VLM    │  │ API VLM      │  │ Feedback             │   │   │
│  │   └──────────────┘  └──────────────┘  └──────────────────────┘   │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                      │                                      │
│  ┌───────────────────────────────────▼────────────────────────────────┐   │
│  │                        WEB UI LAYER                                 │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
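
The cross-platform abstraction at the top of the diagram could be sketched as a common interface with one implementation per operating system. This is an illustration under assumptions, not the repository's code: `CaptureBackend`, the three backend classes, and `select_backend` are hypothetical names, and the `grab` bodies are deliberately left unimplemented.

```python
import sys
from abc import ABC, abstractmethod


class CaptureBackend(ABC):
    """Hypothetical interface that each platform implementation would satisfy."""

    name: str

    @abstractmethod
    def grab(self) -> bytes:
        """Return one encoded frame."""


class WindowsBackend(CaptureBackend):
    name = "windows"

    def grab(self) -> bytes:
        raise NotImplementedError("Win32/WinRT capture is not sketched here")


class MacOSBackend(CaptureBackend):
    name = "macos"

    def grab(self) -> bytes:
        raise NotImplementedError("macOS capture is planned, not sketched here")


class LinuxBackend(CaptureBackend):
    name = "linux"

    def grab(self) -> bytes:
        raise NotImplementedError("X11/Wayland capture is planned, not sketched here")


def select_backend(platform: str = sys.platform) -> CaptureBackend:
    """Pick a backend from a sys.platform-style string."""
    if platform.startswith("win"):
        return WindowsBackend()
    if platform == "darwin":
        return MacOSBackend()
    if platform.startswith("linux"):
        return LinuxBackend()
    raise RuntimeError(f"unsupported platform: {platform}")
```

Keeping platform detection in a single function means the capture service layer only depends on the interface, never on a specific operating system.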

Quick Start

Prerequisites

Python 3.10+
For Windows: Windows 10/11
For macOS: macOS 11+ (planned)
For Linux: X11 or Wayland (planned)
VLM endpoint (Ollama, vLLM, OpenAI, etc.)

Installation

pip install agent-visual-checker

Or install from source:

git clone https://git.nazimyildiz.com/NAMCHO/agent-visual-checker.git
cd agent-visual-checker
pip install -e ".[dev]"

Configuration

Edit configs/default.yaml:

capture:
  fps: 30
  quality: 85
  format: "png"

storage:
  base_path: "./sessions"
  retention_days: 7

vlm:
  provider: "ollama"
  endpoint: "http://localhost:11434"
  model: "llama3.2-vision"

mcp:
  host: "0.0.0.0"
  port: 8765

webui:
  host: "0.0.0.0"
  port: 8000
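
Once parsed, the `capture` section above could be validated before use. The sketch below assumes the YAML has already been loaded into a plain dictionary by a YAML parser of your choice; `CaptureConfig` and `load_capture_config` are hypothetical names, and the accepted ranges are assumptions rather than documented limits.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class CaptureConfig:
    # Defaults mirror the values shown in configs/default.yaml.
    fps: int = 30
    quality: int = 85
    format: str = "png"


def load_capture_config(raw: Dict[str, Any]) -> CaptureConfig:
    """Build a CaptureConfig from an already-parsed config dictionary."""
    section = raw.get("capture", {})
    config = CaptureConfig(**section)  # unknown keys raise TypeError
    # Assumed bounds: the README mentions capture at up to 60 Hz.
    if not 1 <= config.fps <= 60:
        raise ValueError(f"fps out of range: {config.fps}")
    if not 1 <= config.quality <= 100:
        raise ValueError(f"quality out of range: {config.quality}")
    return config
```

For example, `load_capture_config({"capture": {"fps": 60}})` would keep the default quality and format while overriding the frame rate.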

Running

# Start MCP server
python -m src.mcp.server

# Start Web UI (in another terminal)
python -m src.ui.main

MCP Tools

Screen and Window Tools

screenshot - Capture a single screenshot
list_windows - List all visible windows
get_window_info - Get detailed window information
bring_to_front - Bring a window to foreground
minimize_window - Minimize a window

Session Tools

start_recording_session - Start a new recording session
stop_recording_session - Stop an active recording session
list_sessions - List all recording sessions
get_session_info - Get session metadata
delete_session - Delete a session

Analysis Tools

analyze_screenshot - Analyze a single screenshot
analyze_session - Analyze a complete recording session
get_validation_feedback - Get validation feedback
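
To illustrate how named tools like these might be dispatched, here is a minimal plain-Python registry. It does not use any MCP SDK, and it is not the project's server code: the `tool` decorator, `call_tool`, the `window_id` argument, and the placeholder return values are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, Optional

ToolHandler = Callable[..., Any]
_TOOLS: Dict[str, ToolHandler] = {}


def tool(name: str) -> Callable[[ToolHandler], ToolHandler]:
    """Register a handler under a tool name."""
    def register(handler: ToolHandler) -> ToolHandler:
        if name in _TOOLS:
            raise ValueError(f"tool already registered: {name}")
        _TOOLS[name] = handler
        return handler
    return register


@tool("list_windows")
def list_windows() -> list:
    # Placeholder data; a real handler would query the operating system.
    return [{"id": 1, "title": "Example"}]


@tool("minimize_window")
def minimize_window(window_id: int) -> dict:
    # Placeholder result; a real handler would call a platform API.
    return {"window_id": window_id, "minimized": True}


def call_tool(name: str, arguments: Optional[Dict[str, Any]] = None) -> Any:
    """Look up a tool by name and invoke it with keyword arguments."""
    if name not in _TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return _TOOLS[name](**(arguments or {}))
```

A call such as `call_tool("minimize_window", {"window_id": 7})` routes to the registered handler, which keeps tool names decoupled from the functions that implement them.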

Session Workflow

┌─────────────────────────────────────────────────────────────────────────────┐
│                         RECORDING SESSION FLOW                              │
│                                                                             │
│   Agent                      MCP Server                    Capture Service  │
│    │                            │                               │            │
│    │──start_recording_session──►                               │            │
│    │                            │──start_session───────────────►            │
│    │                            │                               │            │
│    │                            │◄──session_id──────────────────│            │
│    │◄──session_id───────────────│                               │            │
│    │                            │                               │            │
│    │  (do actions in app)      │                               │            │
│    │                            │◄─continuous capture @30-60Hz──│            │
│    │                            │                               │            │
│    │──stop_recording_session───►                               │            │
│    │                            │──stop_session────────────────►            │
│    │                            │                               │            │
│    │                            │◄──session_summary─────────────│            │
│    │◄──session_summary──────────│                               │            │
│    │                            │                               │            │
│    │──analyze_session──────────►                               │            │
│    │                            │──VLM analysis─────────────────►            │
│    │                            │                               │            │
│    │◄──validation_feedback──────│                               │            │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
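
The start, capture, and stop steps in the flow above can be sketched as a small stateful object. This is a hypothetical model for illustration only: `RecordingSession`, `SessionState`, and the summary fields are assumed names, and frames are held in memory rather than written to storage.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SessionState(Enum):
    RECORDING = "recording"
    STOPPED = "stopped"


@dataclass
class RecordingSession:
    """In-memory stand-in for a recording session."""

    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    state: SessionState = SessionState.RECORDING
    frames: List[bytes] = field(default_factory=list)

    def add_frame(self, frame: bytes) -> None:
        """Store one captured frame while the session is recording."""
        if self.state is not SessionState.RECORDING:
            raise RuntimeError("session is not recording")
        self.frames.append(frame)

    def stop(self) -> dict:
        """End the session and return a summary for the caller."""
        if self.state is not SessionState.RECORDING:
            raise RuntimeError("session already stopped")
        self.state = SessionState.STOPPED
        return {"session_id": self.session_id, "frame_count": len(self.frames)}
```

Rejecting frames after `stop()` makes the summary final, so a later analysis step sees exactly the frames that were reported.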

Development

Running Tests

# Unit tests
pytest tests/unit/

# Integration tests
pytest tests/integration/

# With coverage
pytest --cov=src tests/unit/

Code Quality

# Lint
ruff check src/

# Type check
mypy src/

# Format
ruff format src/

Project Structure

agent-visual-checker/
├── src/
│   ├── capture/              # Cross-platform screen capture
│   ├── session/              # Recording session management
│   ├── mcp/                  # MCP server and tools
│   ├── vlm/                  # VLM adapters
│   ├── feedback/             # Validation feedback engine
│   └── ui/                   # Web dashboard
├── tests/
│   ├── unit/
│   ├── integration/
│   └── manual/
├── configs/
├── docs/
├── README.md
├── AGENTS.md
└── pyproject.toml

Milestones

| Milestone | Description | Status |
|-----------|-------------|--------|
| M1 | Project Foundation & Cross-Platform Abstraction | TODO |
| M2 | Windows Capture Implementation | TODO |
| M3 | MCP Server with Core Tools | TODO |
| M4 | Session Management & Storage | TODO |
| M5 | VLM Adapter Layer | TODO |
| M6 | Web UI Dashboard | TODO |
| M7 | Feedback Engine (Senior Tester) | TODO |
| M8 | macOS/Linux Capture Ports | TODO |
| M9 | Integration Testing & Polish | TODO |

License

MIT License

Contributing

Contributions are welcome! Please read AGENTS.md for development guidelines and GIT_WORKFLOW.md for commit conventions.