This node prepares data for training by encoding images and text. It takes a list of images and a corresponding list of text captions, uses a VAE model to convert the images into latent representations, and uses a CLIP model to convert the captions into conditioning data. The paired latents and conditioning are output as two lists, ready for use in training workflows.

## Inputs

| Parameter | Description | Data Type | Required | Range |
| --- | --- | --- | --- | --- |
| `images` | List of images to encode. | IMAGE | Yes | N/A |
| `vae` | VAE model for encoding images to latents. | VAE | Yes | N/A |
| `clip` | CLIP model for encoding text to conditioning. | CLIP | Yes | N/A |
| `texts` | List of text captions. Provide one caption per image, a single caption (repeated for all images), or omit it (an empty string is used for all images). | STRING | No | N/A |

**Parameter Constraints:**

* The number of items in the `texts` list must be 0, 1, or exactly match the number of items in the `images` list. If it is 0, an empty string is used for all images. If it is 1, that single text is repeated for all images.
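
The length rule above can be sketched in plain Python. This is only an illustration of the constraint as described here, not the node's actual implementation; the function name `normalize_captions` and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence


def normalize_captions(num_images: int,
                       texts: Optional[Sequence[str]] = None) -> List[str]:
    """Return exactly one caption per image (hypothetical helper)."""
    captions = list(texts) if texts else []
    if len(captions) == 0:
        # No captions supplied: use an empty string for every image.
        return [""] * num_images
    if len(captions) == 1:
        # A single caption is repeated for every image.
        return captions * num_images
    if len(captions) == num_images:
        # One caption per image: use them as given.
        return captions
    raise ValueError(
        f"Expected 0, 1, or {num_images} captions, got {len(captions)}"
    )


if __name__ == "__main__":
    print(normalize_captions(3, ["a photo of a cat"]))
    print(normalize_captions(2))
```

Any other caption count (for example, 2 captions for 3 images) is rejected in this sketch, which mirrors the constraint stated above.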

## Outputs

| Output Name | Description | Data Type |
| --- | --- | --- |
| `latents` | List of latent dicts. | LATENT |
| `conditioning` | List of conditioning lists. | CONDITIONING |
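
Because the two outputs are paired lists, downstream code would typically walk them together. The sketch below assumes that entry `i` of `latents` corresponds to entry `i` of `conditioning`; the helper name `pair_training_samples` and the placeholder values are hypothetical stand-ins, not real encoder output.

```python
from typing import Any, List, Tuple


def pair_training_samples(latents: List[Any],
                          conditioning: List[Any]) -> List[Tuple[Any, Any]]:
    """Pair each latent with its conditioning entry (hypothetical helper)."""
    if len(latents) != len(conditioning):
        raise ValueError(
            f"Mismatched outputs: {len(latents)} latents, "
            f"{len(conditioning)} conditioning entries"
        )
    return list(zip(latents, conditioning))


if __name__ == "__main__":
    # Placeholder values only; real entries would hold encoded data.
    latents = [{"samples": "latent_0"}, {"samples": "latent_1"}]
    conditioning = [["cond_0"], ["cond_1"]]
    for latent, cond in pair_training_samples(latents, conditioning):
        print(latent, cond)
```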

