---
name: local-web-collection-apps
description: "Build local dashboards or agents that collect public web data, rank results, and present them in a browser or desktop UI."
version: 1.0.0
author: Hermes Agent
license: MIT
platforms: [linux, macos, windows]
metadata:
  hermes:
    tags: [web, scraping, dashboard, local-server, search, automation]
    related_skills: [plan, test-driven-development, systematic-debugging, linux-host-administration]
---

# Local Web Collection Apps

Use this skill when building a local app that searches or scrapes the web, filters/ranks results, or wraps LLM/agent backends behind a browser UI or API.

Typical signs:
- user wants a local search/monitoring dashboard
- target sites block direct requests, so discovery must rely on search-engine results or public summaries
- the app needs a browser UI, health endpoint, and simple API
- credentials or tokens must be entered safely and stored encrypted locally

Reference notes and provider quirks live in `references/web-discovery-and-secrets.md`.

## Local media redaction apps

This skill also covers local browser apps that process uploaded images or videos, including automatic visual redaction/censoring workflows. When the user asks for automatic detection or tracking, build that as the primary path rather than defaulting to manual rectangle drawing. A proven pattern is FastAPI + OpenCV + NudeNet for automatic sensitive-region detection, with IoU-based box smoothing across video frames, Gaussian blur on expanded boxes, and ffmpeg audio remuxing for video outputs. For videos, prefer a preview-first workflow: upload once, generate a blurred still from a user-selected frame with adjustable blur/margin/tracking interval, then run the expensive full render only after approval. See `references/media-redaction-apps.md` for implementation details, safety/UX notes, Tailscale Serve/Funnel exposure, and test patterns.

## Agent-agnostic GUI wrappers

This skill also covers local GUI wrappers for external CLI agents (Hermes, Claude Code, OpenClaw, and similar tools).

Recommended pattern:
- Build a tiny adapter layer that converts each backend into the same event model.
- Keep a deterministic `echo` adapter for tests.
- Use argv lists with placeholder substitution instead of shell strings.
- Preserve raw stdout/stderr in events so failures are debuggable.
- Start with one-shot print mode for CLIs that support it, then add streaming later if needed.

Reference notes and an example CLI command shape live in `references/agent-gui-adapters.md`.

## Preferred architecture

- Split into three pieces:
  1. discovery/fetch layer
  2. parser/scorer layer
  3. UI layer
- Write parsers as pure functions so they can be unit tested without network access.
- Treat remote pages as hostile: expect 403s, JS challenges, rate limiting, and inconsistent HTML.
- When direct page scraping is blocked, use search-engine result feeds or public snippets as the discovery layer.

## Workflow

1. Define the search domain and ranking signals.
2. Build a pure parser/scorer first.
3. Add tests for the parser, score logic, and query builder.
4. Implement the fetch layer with a single retry-friendly provider.
5. Add the local UI.
6. Smoke-test the live endpoint against real web data.
7. Document launch steps and known blockers.

## Provider selection

- Prefer the simplest public endpoint that returns structured output.
- If HTML search pages are brittle, prefer RSS/XML or another structured feed.
- If one source blocks requests, do not hard-code a refusal; switch discovery sources or degrade gracefully.
- Keep user-facing copy honest about what is being scraped: search results, previews, or direct pages.

## Safe credential entry and storage

When the app needs usernames/passwords/API tokens:
- Use a GUI prompt available on the machine if present (`zenity` is a good default on Linux desktops).
- Never ask for secrets in the main chat unless the user explicitly chooses that path.
- Store secrets encrypted at rest with `gpg` or an equivalent local vault mechanism.
- Keep vault files under the app config directory, not in Downloads or the project root.
- Permissions should be restrictive (`600` for files, `700` for config directories).
- Prefer a vault passphrase chosen locally by the user.

## Testing and verification

- Add unit tests for:
  - query builder
  - parser(s)
  - price/distance/score extraction
  - fallback behaviors when a source returns nothing
- Run live smoke tests only after unit tests pass.
- Verify the local server by calling `/health` and one real search endpoint.
- If the UI is browser-based, open it and confirm the result list renders and the form submits.

## Pitfalls

- Do not bake in assumptions about one real-estate site or one search provider.
- Do not rely on fragile CSS selectors for every target site.
- Do not store plaintext credentials in shell history, temp files, or repo-tracked files.
- Do not treat a search result title as a verified property price unless the price is clearly visible.
- Do not return empty results without stating whether the source is blocked or the query is too narrow.

## Suggested implementation shape

- `app.py` or `run.py` for server startup
- `scraper.py` for fetch/parse/score logic
- `tests/` for parser and scoring tests
- `README.md` with launch, default filters, and caveats

## Reusable commands

- Unit tests: `python3 -m unittest discover -s tests -v`
- Smoke test: `curl http://127.0.0.1:<port>/health`
- API check: `curl 'http://127.0.0.1:<port>/api/search?...'`

## If the user asks for a secure GUI vault

- Create a tiny launcher or desktop entry.
- Prompt for the secret locally.
- Encrypt immediately with `gpg`.
- Load only into memory for the connection step.
- Never print the secret back to the terminal.
