The Kling Avatar 2.0 node generates a broadcast-style digital human video of a talking avatar from a single reference photo and an audio file. An optional text prompt can define the avatar's actions, emotions, and camera movements.

## Inputs

| Parameter | Description | Data Type | Required | Range |
| --- | --- | --- | --- | --- |
| `image` | Avatar reference image. Width and height must be at least 300px. Aspect ratio must be between 1:2.5 and 2.5:1. | IMAGE | Yes | - |
| `sound_file` | Audio input. Must be between 2 and 300 seconds in duration. | AUDIO | Yes | - |
| `mode` | The generation mode to use. | COMBO | Yes | `"std"`<br>`"pro"` |
| `prompt` | Optional prompt to define avatar actions, emotions, and camera movements. (default: empty string) | STRING | No | - |
| `seed` | Controls whether the node re-runs: changing the value triggers a new generation. Results are non-deterministic regardless of the seed value. (default: 0) | INT | Yes | 0 to 2147483647 |

**Note:** The `image` and `sound_file` inputs have specific validation requirements. The image must be at least 300x300 pixels with an aspect ratio between 1:2.5 and 2.5:1. The audio file must be between 2 and 300 seconds long.
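
The limits in the note above can be checked before a workflow is queued. The following is a minimal sketch, not the node's actual validation code: the function name `check_avatar_inputs` and its parameters are hypothetical, and it assumes the stated bounds are inclusive. It takes plain numbers (pixel dimensions and audio duration in seconds) so it does not depend on any ComfyUI types.

```python
# Hypothetical pre-flight check mirroring the documented limits.
# Assumption: all bounds are inclusive; the node itself may differ.
MIN_SIDE_PX = 300      # minimum width and height in pixels
MIN_AUDIO_S = 2        # minimum audio duration in seconds
MAX_AUDIO_S = 300      # maximum audio duration in seconds


def check_avatar_inputs(width, height, audio_seconds):
    """Return a list of problems; an empty list means the inputs pass."""
    problems = []

    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append(
            f"image is {width}x{height}px; both sides must be at least {MIN_SIDE_PX}px"
        )

    # Aspect ratio (width:height) must lie between 1:2.5 and 2.5:1.
    # Integer arithmetic avoids floating-point edge cases:
    #   width/height >= 1/2.5  <=>  5*width >= 2*height
    #   width/height <= 2.5    <=>  2*width <= 5*height
    if 5 * width < 2 * height or 2 * width > 5 * height:
        problems.append(
            f"aspect ratio {width}:{height} is outside the 1:2.5 to 2.5:1 range"
        )

    if not (MIN_AUDIO_S <= audio_seconds <= MAX_AUDIO_S):
        problems.append(
            f"audio is {audio_seconds}s; it must be between "
            f"{MIN_AUDIO_S} and {MAX_AUDIO_S} seconds"
        )

    return problems
```

For example, `check_avatar_inputs(1024, 1024, 10)` returns an empty list, while a 200x1024 image fails both the size check and the aspect ratio check.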

## Outputs

| Output Name | Description | Data Type |
| --- | --- | --- |
| `output` | The generated digital human video. | VIDEO |
