The TextEncodeAceStepAudio node prepares text conditioning for ACE-Step audio generation. It takes a CLIP model, a tags string describing the audio, and a lyrics string, tokenizes the tags and lyrics together, and encodes the resulting tokens into conditioning data. The `lyrics_strength` parameter sets how strongly the lyrics influence the final output.

## Inputs

| Parameter | Description | Data Type | Required | Range |
| --- | --- | --- | --- | --- |
| `clip` | The CLIP model used for tokenization and encoding | CLIP | Yes | - |
| `tags` | Text tags or descriptions for audio conditioning (supports multiline input and dynamic prompts) | STRING | Yes | - |
| `lyrics` | Lyrics text for audio conditioning (supports multiline input and dynamic prompts) | STRING | Yes | - |
| `lyrics_strength` | Controls the strength of lyrics influence on the conditioning output (default: 1.0, step: 0.01) | FLOAT | No | 0.0 - 10.0 |
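
When the node is queued through an API-format workflow, the inputs above become the `inputs` mapping of a node entry. The sketch below builds such an entry in Python and checks `lyrics_strength` against the documented range first. The helper name `build_text_encode_node`, the upstream node id `"1"`, and the output slot `0` are assumptions made for illustration; they are not defined by this node.

```python
def build_text_encode_node(clip_link, tags, lyrics, lyrics_strength=1.0):
    """Return an API-format node entry for TextEncodeAceStepAudio.

    clip_link is a [node_id, output_index] pair pointing at whichever
    node supplies the CLIP model (hypothetical ids in the usage below).
    """
    # Range taken from the Inputs table: 0.0 to 10.0, default 1.0.
    if not 0.0 <= lyrics_strength <= 10.0:
        raise ValueError("lyrics_strength must be between 0.0 and 10.0")
    return {
        "class_type": "TextEncodeAceStepAudio",
        "inputs": {
            "clip": list(clip_link),
            "tags": tags,
            "lyrics": lyrics,
            "lyrics_strength": float(lyrics_strength),
        },
    }


# Assumed upstream loader with id "1" exposing CLIP on output slot 0.
node = build_text_encode_node(
    ["1", 0],
    tags="lo-fi, mellow piano, 80 bpm",
    lyrics="[verse]\nRain on the window\n[chorus]\nStay a while",
    lyrics_strength=0.8,
)
```

Validating the range before queuing surfaces an out-of-range value as a clear error on the client side.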

## Outputs

| Output Name | Description | Data Type |
| --- | --- | --- |
| `conditioning` | The conditioning data encoded from the tags and lyrics, with the lyrics strength applied | CONDITIONING |
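
The flow this node performs (tokenize tags and lyrics together, encode the tokens, then attach the lyrics strength) can be sketched as follows. This is a minimal illustration, not ComfyUI's implementation: `StubClip`, its `tokenize` and `encode` methods, and the `[embedding, options]` pair layout are hypothetical stand-ins so the example runs without ComfyUI installed.

```python
class StubClip:
    """Hypothetical stand-in for a CLIP object; not the ComfyUI class."""

    def tokenize(self, tags, lyrics):
        # Keep tags and lyrics as separate token streams in one bundle.
        return {"tags": tags.split(), "lyrics": lyrics.split()}

    def encode(self, tokens):
        # Placeholder "embedding": just the token count of each stream.
        embedding = [len(tokens["tags"]), len(tokens["lyrics"])]
        return [[embedding, {}]]


def encode_ace_step_text(clip, tags, lyrics, lyrics_strength=1.0):
    tokens = clip.tokenize(tags, lyrics)
    conditioning = clip.encode(tokens)
    # Copy each options dict and record the strength next to the embedding,
    # leaving the encoder's own output untouched.
    return [
        [embedding, {**options, "lyrics_strength": lyrics_strength}]
        for embedding, options in conditioning
    ]


conditioning = encode_ace_step_text(
    StubClip(), "ambient synth pad", "hold on to the light", lyrics_strength=1.5
)
```

Storing the strength as metadata beside the embedding is a choice made for this sketch; how the value is consumed during audio generation is not specified here.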

