The CosmosImageToVideoLatent node creates a video latent representation, optionally seeded from input images. It starts from a blank video latent and, if a start image, an end image, or both are provided, encodes them into the beginning and end frames of the video sequence respectively. When at least one image is provided, it also creates a noise mask that indicates which parts of the latent should be preserved during generation.

## Inputs

| Parameter | Description | Data Type | Required | Range |
| --- | --- | --- | --- | --- |
| `vae` | The VAE model used for encoding images into latent space | VAE | Yes | - |
| `width` | The width of the output video in pixels (default: 1280) | INT | Yes | 16 to MAX_RESOLUTION |
| `height` | The height of the output video in pixels (default: 704) | INT | Yes | 16 to MAX_RESOLUTION |
| `length` | The number of frames in the video sequence (default: 121) | INT | Yes | 1 to MAX_RESOLUTION |
| `batch_size` | The number of latent batches to generate (default: 1) | INT | Yes | 1 to 4096 |
| `start_image` | Optional image to encode at the beginning of the video sequence | IMAGE | No | - |
| `end_image` | Optional image to encode at the end of the video sequence | IMAGE | No | - |

**Note:** When neither `start_image` nor `end_image` is provided, the node returns a blank latent without a noise mask. When either image is provided, it is encoded into the corresponding section of the latent, and that section is masked so that it is preserved during generation.
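
The masking behavior described in the note can be illustrated with a small, dependency-free sketch. This is not the node's actual implementation: the helper names `build_noise_mask` and `build_latent_dict` are hypothetical, the latent is reduced to one value per latent frame, and the convention that `0.0` means "preserve" and `1.0` means "generate" is an assumption. The `samples` and `noise_mask` keys follow the usual layout of a ComfyUI `LATENT` dictionary.

```python
from typing import Optional


def build_noise_mask(latent_frames: int,
                     start_frames: int = 0,
                     end_frames: int = 0) -> Optional[list]:
    """Hypothetical helper: one mask value per latent frame.

    Assumption: 0.0 marks frames taken from an encoded image (preserved),
    1.0 marks frames left for the sampler to generate.
    Returns None when no image is provided, mirroring the "no noise mask" case.
    """
    if start_frames == 0 and end_frames == 0:
        return None

    mask = [1.0] * latent_frames
    # Frames covered by the encoded start image sit at the front.
    for i in range(min(start_frames, latent_frames)):
        mask[i] = 0.0
    # Frames covered by the encoded end image sit at the back.
    for i in range(min(end_frames, latent_frames)):
        mask[latent_frames - 1 - i] = 0.0
    return mask


def build_latent_dict(latent_frames: int,
                      start_frames: int = 0,
                      end_frames: int = 0) -> dict:
    """Hypothetical helper: assemble a simplified LATENT-style dictionary."""
    out = {"samples": [0.0] * latent_frames}  # stand-in for the blank latent
    mask = build_noise_mask(latent_frames, start_frames, end_frames)
    if mask is not None:
        out["noise_mask"] = mask
    return out
```

With five latent frames, one frame from a start image, and two frames from an end image, `build_noise_mask(5, 1, 2)` marks the first frame and the last two frames as preserved, while `build_latent_dict(5)` contains only `samples`.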

## Outputs

| Output Name | Description | Data Type |
| --- | --- | --- |
| `latent` | The generated video latent representation with optional encoded images and corresponding noise masks | LATENT |

