## Overview

Track objects across video frames using SAM3's memory-based tracker. This node processes a sequence of video frames and maintains object identities across frames, using either initial masks or text prompts to define what to track.

## Inputs

| Parameter | Description | Data Type | Required | Range |
| --- | --- | --- | --- | --- |
| `images` | Video frames as batched images | IMAGE | Yes | Batched video frames |
| `model` | The SAM3 model to use for tracking | MODEL | Yes | SAM3 model |
| `initial_mask` | First-frame masks defining the objects to track (one mask per object). Required if `conditioning` is not provided. | MASK | No | One mask per object |
| `conditioning` | Text conditioning for detecting new objects during tracking. Required if `initial_mask` is not provided. | CONDITIONING | No | Text conditioning |
| `detection_threshold` | Score threshold for text-prompted detection | FLOAT | No | 0.0 to 1.0 (default: 0.5) |
| `max_objects` | Max tracked objects. Initial masks count toward this limit. 0 uses the internal cap of 64. | INT | No | 0 to 64 (default: 0) |
| `detect_interval` | Run detection every N frames (1 = every frame). Higher values save compute. | INT | No | 1 or greater (default: 1) |

**Note:** Either `initial_mask` or `conditioning` must be provided. If both are omitted, the node will raise an error.
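
The parameter rules above can be sketched in plain Python. This is an illustrative sketch, not the node's actual implementation: the function names are hypothetical, and the detection schedule assumes detection starts at frame 0 and repeats every `detect_interval` frames, which the table does not confirm.

```python
from typing import List, Optional

INTERNAL_OBJECT_CAP = 64  # from the table: max_objects=0 uses the internal cap of 64


def check_prompts(initial_mask: Optional[object], conditioning: Optional[object]) -> None:
    """Hypothetical check: at least one of the two prompts must be given."""
    if initial_mask is None and conditioning is None:
        raise ValueError("Provide initial_mask, conditioning, or both.")


def effective_max_objects(max_objects: int) -> int:
    """Resolve max_objects, treating 0 as 'use the internal cap'."""
    if not 0 <= max_objects <= INTERNAL_OBJECT_CAP:
        raise ValueError("max_objects must be between 0 and 64.")
    return INTERNAL_OBJECT_CAP if max_objects == 0 else max_objects


def remaining_detection_slots(max_objects: int, num_initial_masks: int) -> int:
    """Initial masks count toward the limit, so they reduce the room for new detections."""
    return max(0, effective_max_objects(max_objects) - num_initial_masks)


def detection_frames(num_frames: int, detect_interval: int) -> List[int]:
    """Assumed schedule: detect on frames 0, N, 2N, ... (the starting frame is an assumption)."""
    if detect_interval < 1:
        raise ValueError("detect_interval must be 1 or greater.")
    return list(range(0, num_frames, detect_interval))
```

For example, under these assumptions a 10-frame clip with `detect_interval` set to 4 would run detection on frames 0, 4, and 8, and `max_objects` set to 4 with three initial masks would leave room for one newly detected object.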

## Outputs

| Output Name | Description | Data Type |
| --- | --- | --- |
| `track_data` | Tracking data containing object masks and metadata across all video frames | SAM3TrackData |


