# Local LTX image-to-video in ComfyUI

Use this when the user asks to animate an image with LTX/LTXV locally.

## User preference / workflow pattern

For this user, do not block the chat on multi-GB downloads or long video renders. Create a small self-contained `*_robot.sh` script and start it with `terminal(background=true, notify_on_complete=true)`, then continue the conversation. The robot should print a final `MEDIA:/absolute/path` line on success.
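
A minimal sketch of the tail end of such a robot script, assuming the render step leaves a single output file. The helper name `emit_media` and the example output path are hypothetical, not part of any existing script.

```bash
#!/usr/bin/env bash
# Hypothetical skeleton for a *_robot.sh script; names and paths are illustrative.
set -eu

# Print the final MEDIA line only if the output exists and is non-empty.
emit_media() {
  local path="$1"
  if [ -s "$path" ]; then
    # realpath turns the argument into an absolute path for the MEDIA line.
    printf 'MEDIA:%s\n' "$(realpath "$path")"
  else
    printf 'ERROR: missing or empty output: %s\n' "$path" >&2
    return 1
  fi
}

# Example flow (commented out; the real download and render steps go here):
# emit_media /home/wildlama/comfy/ltx-outputs/result.mp4
```

Keeping the `MEDIA:` line as the very last thing printed, and only on success, lets the completion notification be read without scanning the whole log.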

## Known-good baseline on this machine

- ComfyUI workspace: `/home/wildlama/comfy/ComfyUI`
- Clean comfy invocation pattern:
  ```bash
  env -u VIRTUAL_ENV PATH=/home/wildlama/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin comfy --workspace /home/wildlama/comfy/ComfyUI ...
  ```
- Official local template path:
  ```bash
  /home/wildlama/comfy/ComfyUI/.venv/lib/python3.11/site-packages/comfyui_workflow_templates_media_video/templates/ltxv_image_to_video.json
  ```
- Input images must be copied into the ComfyUI input directory, e.g.:
  ```bash
  cp source.png /home/wildlama/comfy/ComfyUI/input/ltx_input.png
  ```

## Models for the simple LTXV image-to-video template

Template `ltxv_image_to_video.json` references:

- Checkpoint: `models/checkpoints/ltx-video-2b-v0.9.5.safetensors`
  - URL: `https://huggingface.co/Lightricks/LTX-Video/resolve/main/ltx-video-2b-v0.9.5.safetensors`
  - Observed full size is about 5.9 GiB / 6.34 GB (`6340729500` bytes), not 9.5 GB.
  - Use a minimum-size guard around `6300000000` bytes, not 9500000000.
- Text encoder: `models/text_encoders/t5xxl_fp16.safetensors` in the stock template, but a smaller fp8 encoder can be used by patching the `CLIPLoader` node:
  - URL: `https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp8_e4m3fn_scaled.safetensors`
  - Save as `models/text_encoders/t5xxl_fp8_e4m3fn_scaled.safetensors`
  - Patch `CLIPLoader.inputs.clip_name` to `t5xxl_fp8_e4m3fn_scaled.safetensors` and keep `type: ltxv`.

Use resumable downloads:

```bash
curl -L --fail --retry 50 --retry-delay 10 -C - -o "$out" "$url"
```
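
The resumable download and the minimum-size guard can be combined roughly as follows. This is a sketch, not an existing script: `file_size`, `meets_min_size`, and `fetch_with_guard` are hypothetical helper names, and the guard only proves the file is at least the threshold size, not that it is intact.

```bash
# Print the size of a file in bytes.
file_size() {
  wc -c < "$1" | tr -d ' '
}

# Succeed only if the file exists and has at least min_bytes bytes.
meets_min_size() {
  local path="$1" min_bytes="$2"
  [ -f "$path" ] && [ "$(file_size "$path")" -ge "$min_bytes" ]
}

# Download url to out (resuming a partial file), then enforce the guard.
fetch_with_guard() {
  local url="$1" out="$2" min_bytes="$3"
  # Skip the download when the file already passes the guard.
  if meets_min_size "$out" "$min_bytes"; then
    return 0
  fi
  mkdir -p "$(dirname "$out")"
  curl -L --fail --retry 50 --retry-delay 10 -C - -o "$out" "$url"
  meets_min_size "$out" "$min_bytes"
}

# Example call for the checkpoint (commented out; multi-GB download):
# fetch_with_guard \
#   "https://huggingface.co/Lightricks/LTX-Video/resolve/main/ltx-video-2b-v0.9.5.safetensors" \
#   /home/wildlama/comfy/ComfyUI/models/checkpoints/ltx-video-2b-v0.9.5.safetensors \
#   6300000000
```

Checking the size before calling `curl` avoids re-requesting a file that is already present, which is why the threshold should sit just below the observed full size.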

## Conversion/execution recipe

1. Ensure ComfyUI is running:
   ```bash
   comfy --workspace /home/wildlama/comfy/ComfyUI launch --background -- --listen 0.0.0.0 --port 8188
   curl -fsS http://127.0.0.1:8188/system_stats
   ```
2. Convert UI template to API format with the official CLI:
   ```bash
   comfy --workspace /home/wildlama/comfy/ComfyUI run \
     --workflow /path/to/ltxv_image_to_video.json \
     --host 127.0.0.1 --port 8188 --print-prompt > /tmp/ltx_prompt_raw.json
   ```
3. The captured stdout may contain non-JSON text before the prompt, so parse from the first `{`; then patch:
   - `LoadImage.inputs.image` -> copied input filename such as `ltx_input.png`
   - `CLIPLoader.inputs.clip_name` -> installed T5 file
   - `CheckpointLoaderSimple.inputs.ckpt_name` -> `ltx-video-2b-v0.9.5.safetensors`
   - positive `CLIPTextEncode.inputs.text` -> descriptive motion prompt
   - negative prompt -> low-quality/motion-artifact negatives
   - `LTXVImgToVideo.inputs.width/height/length/batch_size/strength` -> local-safe values (e.g. width `768`, height `512`, length `97`, batch_size `1`, strength `0.12`)
   - `SaveVideo.inputs.filename_prefix` -> useful prefix
4. Run with the skill runner and a long timeout:
   ```bash
   python3 /home/wildlama/.hermes/skills/creative/comfyui/scripts/run_workflow.py \
     --workflow /path/to/ltx_i2v_api.json \
     --output-dir /home/wildlama/comfy/ltx-outputs \
     --timeout 1800
   ```
5. Re-encode the MP4 for Telegram/Discord compatibility if `ffmpeg` is installed (`-an` drops any audio track):
   ```bash
   ffmpeg -y -i input.mp4 -c:v libx264 -profile:v main -preset medium -crf 16 -pix_fmt yuv420p -an output_telegram.mp4
   ```
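
The parse-and-patch part of step 3 could look roughly like this. It is a sketch under two assumptions: `jq` is installed, and the converted prompt uses the usual API layout of node ids mapping to objects with `class_type` and `inputs`. `extract_json` and `patch_node_input` are hypothetical helper names.

```bash
# Print the file contents starting at the first "{", dropping any banner
# or log text before it. Text after the JSON is not removed.
extract_json() {
  awk 'found { print; next }
       { i = index($0, "{"); if (i > 0) { found = 1; print substr($0, i) } }' "$1"
}

# Set inputs[KEY] = JSON_VALUE on every node whose class_type matches.
# Reads the prompt on stdin and writes the patched prompt to stdout.
# Note: this patches ALL nodes of that class, so it cannot tell the
# positive and negative CLIPTextEncode nodes apart; patch those by node id.
patch_node_input() {
  jq --arg cls "$1" --arg key "$2" --argjson val "$3" \
    'map_values(if .class_type == $cls then .inputs[$key] = $val else . end)'
}

# Example (commented out; file names follow the recipe above):
# extract_json /tmp/ltx_prompt_raw.json \
#   | patch_node_input LoadImage image '"ltx_input.png"' \
#   | patch_node_input CheckpointLoaderSimple ckpt_name '"ltx-video-2b-v0.9.5.safetensors"' \
#   > /path/to/ltx_i2v_api.json
```

Patching by `class_type` rather than by hard-coded node id keeps the template's topology untouched, at the cost of not distinguishing multiple nodes of the same class.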

## Pitfalls

- Do not hand-build the graph from scratch; convert the official template and patch specific nodes.
- Use the template-integrity rules: preserve topology, keep dynamic/dotted input keys, and patch based on server errors.
- Do not use shell-level `nohup ... &`; Hermes rejects that pattern. Use `terminal(background=true, notify_on_complete=true)` so the process is tracked.
- If a download fails because a min-byte threshold is wrong, verify the actual HuggingFace `Content-Length`/file size and patch the threshold; the model may be complete even if smaller than expected.
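
One way to check the real size before adjusting a threshold is to read `Content-Length` from a HEAD request. The sketch below assumes the server reports that header on the final response after redirects; `last_content_length` is a hypothetical helper name.

```bash
# Print the last Content-Length value found in HTTP headers on stdin.
# When redirects are followed, the last header block describes the final file.
last_content_length() {
  tr -d '\r' | awk 'tolower($1) == "content-length:" { len = $2 }
                    END { if (len != "") print len }'
}

# Example (commented out; needs network access):
# curl -sIL "$url" | last_content_length
```

Compare the printed value with the size of the local file; if they match, the download is complete and the threshold is what needs changing.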
