# Visual Content Declination Notes

Use this reference when building a workflow that removes embedded text, generates localized variants, or proposes crop variants from a source visual.

## Working pattern

- **UI:** a simple Gradio or browser-based front end for previewing boxes and outputs.
- **Backend:** ComfyUI for inpainting/regeneration.
- **Translation:** a local OpenAI-compatible backend such as Ollama for source-copy translation.
- **Fallback behavior:** always return the best available preview even if the clean/inpaint pass fails.

## What worked

### Text detection

A hybrid detector works better than one heuristic:

- blackhat + horizontal gradients
- top-hat + gradients
- direct Canny-based box extraction

Merge and pad nearby boxes before masking. The goal is not perfect OCR, just enough signal to hide or replace the obvious text regions.

### Localization mockups

For quick approval loops:

1. reuse the detected text boxes
2. translate the source copy
3. render the translation inside the original layout
4. add a subtle backing panel if the text needs contrast

Keep the translated text short. Preserve line breaks where possible.

### Crop proposals

Generate multiple crop candidates and score them with a simple saliency + center-bias heuristic.

Useful ratios:

- 1:1
- 4:5
- 9:16
- 16:9
- 3:4

## Output contract

A useful declination tool should emit:

- the original image with detected boxes overlaid
- the cleaned image or a fallback preview
- localized previews per target language
- a contact sheet of crop proposals
- machine-readable metadata for downstream automation

## Pitfalls

- Always allow manual box review.
- Keep the backend abstract so ComfyUI can be swapped later.
- If inpaint fails, still return the preview and the mask.
- Favor a resilient, inspectable workflow over a perfect first-pass result.
