The WanDancerVideo node prepares conditioning data and an empty latent tensor for video generation with the WanDancer model. It combines positive and negative conditioning with optional inputs like a starting image, mask, CLIP vision embeddings, and audio features to control the generated video.

## Inputs

| Parameter | Description | Data Type | Required | Range |
| --- | --- | --- | --- | --- |
| `positive` | The positive conditioning to guide video generation. | CONDITIONING | Yes |  |
| `negative` | The negative conditioning to guide video generation. | CONDITIONING | Yes |  |
| `vae` | The VAE used to encode the start image into the latent space. | VAE | Yes |  |
| `width` | The width of the generated video in pixels (default: 480). | INT | Yes | 16 to MAX_RESOLUTION (step: 16) |
| `height` | The height of the generated video in pixels (default: 832). | INT | Yes | 16 to MAX_RESOLUTION (step: 16) |
| `length` | The number of frames in the generated video. Leave this at 149 for WanDancer (default: 149). | INT | Yes | 1 to MAX_RESOLUTION (step: 4) |
| `clip_vision_output` | The CLIP vision embeddings for the first frame. | CLIP_VISION_OUTPUT | No |  |
| `clip_vision_output_ref` | The CLIP vision embeddings for the reference image. | CLIP_VISION_OUTPUT | No |  |
| `start_image` | The start image or frames to encode as conditioning. May contain any number of frames, up to the specified `length`. | IMAGE | No |  |
| `mask` | Conditioning mask for the start image(s). White areas are kept and black areas are generated, so you can regenerate only part of the frame (local generation). | MASK | No |  |
| `audio_encoder_output` | Output from an audio encoder. It supplies the audio features, frame rate (fps) and injection scale used for audio-conditioned generation. | AUDIO_ENCODER_OUTPUT | No |  |
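
The Range column implies two constraints. `width` and `height` must be multiples of 16, and `length` takes values of the form 4k + 1 (1, 5, 9, ...), so the default of 149 is valid. The helper below snaps arbitrary values onto those grids. It is a hypothetical sketch and not part of the node. It assumes `MAX_RESOLUTION` is 16384 and rounds down, which may differ from how the ComfyUI widget rounds.

```python
# Assumption: MAX_RESOLUTION is 16384; check your ComfyUI install if this matters.
MAX_RESOLUTION = 16384


def snap_dimension(value: int, step: int = 16, minimum: int = 16) -> int:
    """Clamp a width/height to the allowed range, then round down to a multiple of `step`."""
    value = max(minimum, min(value, MAX_RESOLUTION))
    return value - (value % step)


def snap_length(frames: int, step: int = 4, minimum: int = 1) -> int:
    """Clamp a frame count, then round down to the nearest minimum + k * step (1, 5, 9, ...)."""
    frames = max(minimum, min(frames, MAX_RESOLUTION))
    return minimum + ((frames - minimum) // step) * step


print(snap_dimension(480), snap_dimension(850), snap_length(149), snap_length(150))
# 480 848 149 149
```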

**Note on Parameter Constraints:**
- The `start_image` and `mask` inputs are both optional and can be combined. When `start_image` is provided, it is encoded with the VAE and added to both conditioning outputs as a concat latent. If `mask` is also provided, it controls which parts of the start image are kept (white) and which are regenerated (black). Without a `mask`, the entire start image is used as conditioning guidance.
- The `clip_vision_output` and `clip_vision_output_ref` inputs are optional and can be used together to provide visual context for the first frame and a reference image.
- The `audio_encoder_output` input is optional and provides audio features for audio-conditional generation.
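
The start-image and mask rules above can be modeled per frame as a keep mask, where 1.0 means keep and 0.0 means generate. The following is a hypothetical pure-Python sketch of the documented user-facing behavior only. It does not reproduce the node's internal tensor code. It makes two assumptions: a single mask applies to every start frame, and frames after the end of `start_image` are generated in full.

```python
from typing import List, Optional

# 2-D mask: 1.0 = white (keep start-image content), 0.0 = black (generate).
Mask = List[List[float]]


def keep_masks(num_start_frames: int, length: int, height: int, width: int,
               mask: Optional[Mask] = None) -> List[Mask]:
    """Hypothetical per-frame keep masks for a video of `length` frames."""
    if num_start_frames > length:
        raise ValueError("start_image cannot have more frames than length")
    # Without a mask, the whole start image is used as conditioning guidance.
    start_mask = mask if mask is not None else [[1.0] * width for _ in range(height)]
    generate_all = [[0.0] * width for _ in range(height)]
    frames = []
    for i in range(length):
        # Assumption: frames after the end of start_image are generated in full.
        source = start_mask if i < num_start_frames else generate_all
        frames.append([row[:] for row in source])  # copy so frames are independent
    return frames


# Keep the left column of a 2 x 2 start frame and regenerate the right column.
local = [[1.0, 0.0], [1.0, 0.0]]
masks = keep_masks(num_start_frames=1, length=5, height=2, width=2, mask=local)
print(masks[0])  # [[1.0, 0.0], [1.0, 0.0]]
print(masks[1])  # [[0.0, 0.0], [0.0, 0.0]]
```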

## Outputs

| Output Name | Description | Data Type |
| --- | --- | --- |
| `positive` | The positive conditioning with any additional data (concat latent, CLIP vision, audio) attached. | CONDITIONING |
| `negative` | The negative conditioning with any additional data (concat latent, CLIP vision, audio) attached. | CONDITIONING |
| `latent` | An empty latent tensor with dimensions matching the specified video length, height, and width. | LATENT |
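
The exact shape of the empty `latent` depends on the VAE's compression factors, which this page does not state. The function below is a sketch that assumes a Wan 2.1-style VAE: 16 latent channels, 8x spatial downsampling and 4x temporal downsampling, with the first frame encoded on its own. The function name and defaults are hypothetical, so adjust them if WanDancer uses a different VAE.

```python
from typing import Tuple


def wan_latent_shape(length: int, height: int, width: int, channels: int = 16,
                     spatial: int = 8, temporal: int = 4) -> Tuple[int, int, int, int, int]:
    """[batch, channels, latent_frames, latent_height, latent_width] under the assumed VAE factors."""
    # First frame stands alone; each following group of `temporal` frames maps to one latent frame.
    latent_frames = (length - 1) // temporal + 1
    # Batch is fixed at 1 because the documented inputs have no batch-size parameter.
    return (1, channels, latent_frames, height // spatial, width // spatial)


print(wan_latent_shape(149, 832, 480))  # (1, 16, 38, 104, 60)
```

Under these assumptions, the defaults (149 frames at 832 x 480) give 38 latent frames of 104 x 60.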

> This documentation was AI-generated.

---
**Source fingerprint (SHA-256):** `0a75b24c8e5c164d81b08eb438862d94d4409ece8dc22c126979347e2350c828`
