## Overview

This node sets up image-to-video generation for autoregressive (AR) video models. It encodes a starting image into latent space with a VAE and stores the encoded image in the configuration of a cloned copy of the model. The video sampling process can then use that image as the first frame, which seeds the generation without requiring a separate image-to-video model architecture.
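
The clone-then-store flow described above can be sketched in plain Python. This is only an illustration under stated assumptions: the dictionary standing in for the model configuration, the `prepare_i2v` helper, the `start_latent` key, and the `encode` callable are all hypothetical and are not the node's actual API.

```python
import copy


def prepare_i2v(model_config: dict, encode, start_image):
    """Return a cloned config that carries the encoded start image.

    `model_config`, `encode`, and the "start_latent" key are hypothetical
    stand-ins used only to illustrate the idea.
    """
    # Clone first so the caller's model configuration is left untouched.
    cloned = copy.deepcopy(model_config)
    # Encode the image (a VAE would do this) and store the result.
    cloned["start_latent"] = encode(start_image)
    return cloned


# Stand-in encoder: a real VAE would return a latent tensor.
original = {"name": "ar_video_model"}
patched = prepare_i2v(original, lambda image: ("latent", image), "first_frame")
```

Working on a clone means the same base model can still be used elsewhere in a workflow without the start image attached.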

## Inputs

| Parameter | Description | Data Type | Required | Range |
| --- | --- | --- | --- | --- |
| `model` | The AR video model to be used for generation. | MODEL | Yes | - |
| `vae` | The VAE model used to encode the starting image into latent space. | VAE | Yes | - |
| `start_image` | The initial image that will serve as the first frame of the generated video. | IMAGE | Yes | - |
| `width` | The width of the generated video frames (default: 832). | INT | Yes | 16 to 8192 (step: 16) |
| `height` | The height of the generated video frames (default: 480). | INT | Yes | 16 to 8192 (step: 16) |
| `length` | The total number of frames in the generated video (default: 81). | INT | Yes | 1 to 1024 (step: 4) |
| `batch_size` | The number of video sequences to generate in a single batch (default: 1). | INT | Yes | 1 to 64 |
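
As a rough illustration of how the ranges and steps in the table constrain values, the sketch below checks an integer against a `(min, max, step)` triple. The `RANGES` mapping and `check_param` helper are hypothetical, and counting steps from the minimum is an assumption about how values snap; the defaults in the table (832, 480, 81) are consistent with it.

```python
# (min, max, step) copied from the Inputs table; batch_size lists no step,
# so a step of 1 is assumed here.
RANGES = {
    "width": (16, 8192, 16),
    "height": (16, 8192, 16),
    "length": (1, 1024, 4),
    "batch_size": (1, 64, 1),
}


def check_param(name: str, value: int) -> int:
    """Raise ValueError if `value` is out of range or off the step grid."""
    lo, hi, step = RANGES[name]
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} is outside {lo}..{hi}")
    # Assumption: valid values are min, min + step, min + 2 * step, ...
    if (value - lo) % step != 0:
        raise ValueError(f"{name}={value} is not on a step of {step} from {lo}")
    return value


check_param("width", 832)
check_param("height", 480)
check_param("length", 81)  # 81 = 1 + 4 * 20
```

Under this assumption a `length` of 80 would be rejected, while 77, 81, and 85 would pass.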

## Outputs

| Output Name | Description | Data Type |
| --- | --- | --- |
| `MODEL` | The cloned model with the encoded start image stored in its configuration for video generation. | MODEL |
| `LATENT` | An empty latent tensor with the correct dimensions for the video generation process. | LATENT |
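
To give a sense of what "correct dimensions" could mean for the empty latent, the sketch below computes a latent shape from the node inputs. The compression factors (8x spatial, 4x temporal) and the 16 latent channels are assumptions chosen to match the step sizes in the Inputs table; they are not confirmed by this page, and the `empty_latent_shape` helper is hypothetical.

```python
def empty_latent_shape(width, height, length, batch_size=1,
                       channels=16, spatial=8, temporal=4):
    """Return [batch, channels, frames, height, width] for an empty latent.

    channels, spatial, and temporal are assumed values, not taken from
    the node's source.
    """
    # Assumption: the first frame maps to one latent frame and every
    # further group of `temporal` frames adds one more.
    frames = (length - 1) // temporal + 1
    return [batch_size, channels, frames, height // spatial, width // spatial]


print(empty_latent_shape(832, 480, 81))  # [1, 16, 21, 60, 104]
```

With the default inputs, 81 video frames would map to 21 latent frames under these assumed factors.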


