Media & Content Types
Media and content types are designed for managing rich, unstructured datasets used in computer vision, NLP, audio processing, 3D modeling, and content production. All media types share the File Metadata extension, which unlocks a dedicated File Metadata tab.
Available Types
Video
Video datasets, recordings, and collections.
- Icon: Video · Color: `#e11d48` (rose)
- Use cases: Training data for action recognition, object tracking, autonomous driving, content production
Image
Image datasets and collections.
- Icon: Image · Color: `#0891b2` (cyan)
- Use cases: Computer vision training data, satellite imagery, medical imaging, design assets
Audio
Audio datasets and recordings.
- Icon: Music · Color: `#7c3aed` (violet)
- Use cases: Speech recognition training, music analysis, environmental sound classification
Text Collection
Text corpora and document collections.
- Icon: FileText · Color: `#059669` (emerald)
- Use cases: NLP training data, document archives, knowledge bases, translation corpora
3D Model
3D models, point clouds, and spatial datasets.
- Icon: Box · Color: `#d97706` (amber)
- Use cases: Autonomous driving LiDAR data, architectural models, game assets, medical 3D scans
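If you need the type catalog above in a script or UI (for example, to keep icons and colors consistent), it can be captured as a small registry. This is a minimal sketch: the `MediaType` class, the dictionary keys, and the `rgb()` helper are hypothetical and not part of the product's API; only the labels, icon names, and hex colors come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaType:
    """One entry in a hypothetical media type registry."""
    label: str
    icon: str   # icon name as listed on this page
    color: str  # hex color code, e.g. "#e11d48"

    def rgb(self) -> tuple[int, int, int]:
        """Convert the hex color to an (R, G, B) tuple of 0-255 ints."""
        h = self.color.lstrip("#")
        return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


# Keys are illustrative identifiers, not official type IDs.
MEDIA_TYPES = {
    "video": MediaType("Video", "Video", "#e11d48"),
    "image": MediaType("Image", "Image", "#0891b2"),
    "audio": MediaType("Audio", "Music", "#7c3aed"),
    "text_collection": MediaType("Text Collection", "FileText", "#059669"),
    "model_3d": MediaType("3D Model", "Box", "#d97706"),
}

print(MEDIA_TYPES["audio"].icon, MEDIA_TYPES["video"].rgb())  # Music (225, 29, 72)
```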
File Metadata Tab
All media types unlock the File Metadata tab, which replaces the standard Schema tab. This tab is organized into several sections:
Common Properties
These properties are shared across all media types:
| Property | Description | Example |
|---|---|---|
| Media Format | Primary file format | JPEG, MP4, WAV, USDZ |
| Total Files | Number of files in the dataset | 50,000 |
| Total Size | Combined dataset size | 2.4 TB |
| File Metadata | Technical metadata (codec, bitrate, resolution) | {"codec": "H.264", "resolution": "1920x1080"} |
| EXIF Metadata | Embedded camera/device metadata | {"camera_model": "Canon EOS R5", "iso": 400} |
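When these properties are filled in by hand, a quick local scan of the dataset can supply the numbers. The sketch below is illustrative only: `summarize_dataset` and `human_size` are hypothetical helpers, the most common file extension is used as a rough proxy for Media Format (so `.jpg` files report `JPG`, not `JPEG`), and sizes are formatted in powers of 1024.

```python
import os
from collections import Counter
from pathlib import Path


def summarize_dataset(root: str) -> dict:
    """Walk a local dataset directory and derive the common properties.

    Returns the file count, the combined size in bytes, and the most
    common file extension as a stand-in for the primary media format.
    """
    total_files = 0
    total_bytes = 0
    extensions = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            total_files += 1
            total_bytes += path.stat().st_size
            if path.suffix:
                extensions[path.suffix.lstrip(".").upper()] += 1
    primary = extensions.most_common(1)[0][0] if extensions else None
    return {
        "media_format": primary,
        "total_files": total_files,
        "total_size_bytes": total_bytes,
    }


def human_size(num_bytes: int) -> str:
    """Format a byte count using powers of 1024, e.g. '2.4 TB'."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1024
```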
Folder Structure
Define the dataset's directory layout with descriptions and file patterns:
```
train/
├── *.jpg — Training images
└── labels.json — COCO-format annotations
val/
├── *.jpg — Validation images
└── labels.json — COCO-format annotations
metadata/
└── stats.csv — Dataset statistics
```
Each folder entry supports:
- Path — Directory path within the dataset
- Description — What this folder contains
- Files — Specific files or file patterns (`*.jpg`, `annotations.json`)
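One practical use of the Path and Files fields is checking that every file in the dataset is covered by some documented entry. The sketch below assumes paths are written with a trailing slash (as in the example tree) and that Files holds shell-style glob patterns; `FolderEntry` and `undocumented_files` are hypothetical names, and nested subfolders need their own entries.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class FolderEntry:
    path: str         # directory path within the dataset, e.g. "train/"
    description: str  # what this folder contains
    files: list[str] = field(default_factory=list)  # file names or glob patterns


def undocumented_files(entries: list[FolderEntry], paths: list[str]) -> list[str]:
    """Return dataset file paths that no folder entry's patterns cover."""
    missing = []
    for p in paths:
        folder, _, name = p.rpartition("/")
        folder = folder + "/" if folder else ""
        covered = any(
            e.path == folder and any(fnmatchcase(name, pat) for pat in e.files)
            for e in entries
        )
        if not covered:
            missing.append(p)
    return missing


structure = [
    FolderEntry("train/", "Training images and labels", ["*.jpg", "labels.json"]),
    FolderEntry("val/", "Validation images and labels", ["*.jpg", "labels.json"]),
    FolderEntry("metadata/", "Dataset statistics", ["stats.csv"]),
]
print(undocumented_files(structure, ["train/0001.jpg", "val/labels.json", "train/notes.txt"]))
# ['train/notes.txt']
```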
Annotations
Track what kinds of annotations are available in the dataset:
| Field | Description |
|---|---|
| Annotation Type | Label, bounding box, segmentation mask, keypoint, etc. |
| Count | Number of annotations of this type |
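For a dataset with COCO-format annotations like the example tree, the Annotation Type and Count rows can be derived from the annotation file. This is a minimal sketch under stated assumptions: an annotation counts as a bounding box if it has a non-empty `bbox`, as a segmentation mask if it has a non-empty `segmentation`, and as a keypoint annotation if `num_keypoints` is positive. The function names and output row keys are hypothetical.

```python
import json
from collections import Counter


def count_coco_annotations(coco: dict) -> Counter:
    """Tally annotation types present in a COCO-format annotation dict."""
    counts = Counter()
    for ann in coco.get("annotations", []):
        if ann.get("bbox"):
            counts["bounding box"] += 1
        if ann.get("segmentation"):  # polygon list or RLE dict
            counts["segmentation mask"] += 1
        if ann.get("num_keypoints", 0) > 0:
            counts["keypoint"] += 1
    return counts


def annotation_rows(path: str) -> list[dict]:
    """Load a COCO JSON file and return one row per annotation type."""
    with open(path) as f:
        counts = count_coco_annotations(json.load(f))
    return [{"annotation_type": t, "count": n} for t, n in counts.most_common()]
```

A single object can contribute to several types (most COCO instance annotations carry both a box and a mask), so the counts are not expected to sum to the number of annotation objects.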
Annotation Manifest
Define the schema for annotation manifest files:
- Has Manifest — Whether the dataset includes a structured manifest
- Manifest Schema — JSON schema definition linked from the Schema Registry
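To see what the Manifest Schema buys you, here is a deliberately simplified check of one manifest record against a schema. It handles only the `required` and `properties.*.type` keywords of JSON Schema; a full validator (such as the `jsonschema` Python package) would be the usual choice in practice. The schema and field names in the example are hypothetical.

```python
# JSON Schema type names mapped to Python types for this simplified check.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_manifest_record(record: dict, schema: dict) -> list[str]:
    """Return human-readable errors; an empty list means the record passes."""
    errors = []
    for key in schema.get("required", []):
        if key not in record:
            errors.append(f"missing required field '{key}'")
    for key, spec in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(spec.get("type"))
        if key not in record or expected is None:
            continue  # absent optional field, or a type this sketch ignores
        value = record[key]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and spec["type"] in ("integer", "number")
        if is_bool_as_number or not isinstance(value, expected):
            errors.append(f"field '{key}' should be {spec['type']}, got {type(value).__name__}")
    return errors


# Hypothetical manifest schema for an image classification dataset.
schema = {
    "required": ["file", "label"],
    "properties": {
        "file": {"type": "string"},
        "label": {"type": "string"},
        "bbox": {"type": "array"},
    },
}
print(check_manifest_record({"file": 42}, schema))
```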
Type-Specific Properties
Video Properties
| Property | Description | Values |
|---|---|---|
| Audio Track | Whether video files include audio | Yes / No |
| Closed Captions | CC availability | Yes / No |
| Subtitles | Subtitle track availability | Yes / No |
| Subtitle Languages | Supported subtitle languages | Required/optional per language |
| Audio Languages | Supported audio languages | Required/optional per language |
| Supported Resolutions | Available video resolutions | Required/optional per resolution |
Languages and resolutions use a record-based format where each entry can be marked as required or optional:
```json
[
  {"value": "en", "required": true},
  {"value": "fr", "required": false},
  {"value": "de", "required": false}
]
```
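Consuming these records usually comes down to separating required from optional values and checking what a dataset actually provides. A minimal sketch, assuming a record without a `required` key should be treated as optional; `split_records` and `missing_required` are hypothetical helper names.

```python
import json

# The record list from the example above, as JSON text.
raw = """[
  {"value": "en", "required": true},
  {"value": "fr", "required": false},
  {"value": "de", "required": false}
]"""


def split_records(records: list[dict]) -> tuple[list, list]:
    """Separate record values into (required, optional), keeping order."""
    required = [r["value"] for r in records if r.get("required", False)]
    optional = [r["value"] for r in records if not r.get("required", False)]
    return required, optional


def missing_required(records: list[dict], available: list) -> list:
    """Return required values that are absent from `available`."""
    required, _ = split_records(records)
    have = set(available)
    return [v for v in required if v not in have]


subtitle_languages = json.loads(raw)
print(split_records(subtitle_languages))                   # (['en'], ['fr', 'de'])
print(missing_required(subtitle_languages, ["fr", "de"]))  # ['en']
```

The same helpers work unchanged for resolutions, since only the `value` payload differs.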
Image Properties
| Property | Description | Values |
|---|---|---|
| Color Space | Color model | sRGB, Adobe RGB, ProPhoto RGB, CMYK |
| Bit Depth | Bits per channel | 8, 16, 32 |
| Alpha Channel | Transparency support | Yes / No |
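Because Bit Depth is measured per channel, the storage cost per pixel depends on the color model and on whether an alpha channel is present: 8-bit sRGB with alpha is 4 channels × 8 bits = 32 bits per pixel. The sketch below validates the three properties against the values in the table (assuming that list is exhaustive) and does that arithmetic; `ImageProperties` is a hypothetical class, and treating every RGB variant as 3 color channels and CMYK as 4 is the only channel model it assumes.

```python
from dataclasses import dataclass

COLOR_SPACES = {"sRGB", "Adobe RGB", "ProPhoto RGB", "CMYK"}
BIT_DEPTHS = {8, 16, 32}


@dataclass(frozen=True)
class ImageProperties:
    color_space: str
    bit_depth: int       # bits per channel
    alpha_channel: bool  # transparency support

    def __post_init__(self) -> None:
        if self.color_space not in COLOR_SPACES:
            raise ValueError(f"unsupported color space: {self.color_space}")
        if self.bit_depth not in BIT_DEPTHS:
            raise ValueError(f"unsupported bit depth: {self.bit_depth}")

    def bits_per_pixel(self) -> int:
        """Color channels (plus alpha, if present) times bits per channel."""
        channels = 4 if self.color_space == "CMYK" else 3
        if self.alpha_channel:
            channels += 1
        return channels * self.bit_depth


print(ImageProperties("sRGB", 8, True).bits_per_pixel())  # 32
```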
Audio Properties
| Property | Description | Values |
|---|---|---|
| Sample Rates | Supported sample rates | Required/optional per rate |
| Channels | Audio channel configurations | Required/optional per config |
| Lyrics | Lyrics availability | Yes / No |
| Transcription | Transcription availability | Yes / No |
Sample rates and channels use the same required/optional record format as video languages.
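For WAV files, the Python standard library's `wave` module can read each file's sample rate and channel count, which makes it possible to check a dataset against these records. This sketch assumes one possible interpretation of required/optional: every file must use a listed value, and each required value must appear in at least one file. The channel labels in `CHANNEL_LABELS` and all function names are hypothetical; `make_wav` only exists to build in-memory test files.

```python
import io
import wave

# Hypothetical mapping from channel count to the labels stored in the records.
CHANNEL_LABELS = {1: "mono", 2: "stereo", 6: "5.1"}


def audio_format(f) -> tuple[int, str]:
    """Read (sample_rate_hz, channel_label) from a WAV path or file object."""
    with wave.open(f, "rb") as w:
        n = w.getnchannels()
        return w.getframerate(), CHANNEL_LABELS.get(n, f"{n}ch")


def _check_required(kind: str, records: list[dict], seen: set, problems: list) -> None:
    """Flag required values that no file in the dataset uses."""
    for r in records:
        if r.get("required", False) and r["value"] not in seen:
            problems.append(f"no file uses required {kind} {r['value']}")


def check_audio_files(files, sample_rates: list[dict], channels: list[dict]) -> list[str]:
    """Check (name, file) pairs against sample-rate and channel records."""
    problems = []
    allowed_rates = {r["value"] for r in sample_rates}
    allowed_layouts = {r["value"] for r in channels}
    seen_rates, seen_layouts = set(), set()
    for name, f in files:
        rate, layout = audio_format(f)
        seen_rates.add(rate)
        seen_layouts.add(layout)
        if rate not in allowed_rates:
            problems.append(f"{name}: unlisted sample rate {rate}")
        if layout not in allowed_layouts:
            problems.append(f"{name}: unlisted channel layout {layout}")
    _check_required("sample rate", sample_rates, seen_rates, problems)
    _check_required("channel layout", channels, seen_layouts, problems)
    return problems


def make_wav(rate: int, n_channels: int) -> io.BytesIO:
    """Build a short silent 16-bit WAV in memory for the usage example."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(n_channels)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_channels * 10)  # 10 silent frames
    buf.seek(0)
    return buf


sample_rates = [{"value": 44100, "required": True}, {"value": 48000, "required": False}]
channels = [{"value": "stereo", "required": True}, {"value": "mono", "required": False}]
files = [("a.wav", make_wav(44100, 2)), ("b.wav", make_wav(22050, 1))]
print(check_audio_files(files, sample_rates, channels))  # ['b.wav: unlisted sample rate 22050']
```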
Tab Availability
| Tab | Available for Media Types | Notes |
|---|---|---|
| Overview | ✅ | Standard overview |
| Schema | ❌ | Replaced by File Metadata |
| File Metadata | ✅ | Type-specific properties, folder structure, annotations |
| Data Profiling | ❌ | No columnar data to profile |
| Env Diff | ❌ | Not applicable for unstructured data |
| Quality Health | ✅ | Standard quality checks |
| Lineage | ✅ | Track data pipeline dependencies |
| Governance | ✅ | Full governance suite |
| Versions | ✅ | Dataset version tracking |
The File/Dataset type also supports the File Metadata tab but retains the standard Schema tab — making it ideal for structured file formats (CSV, Parquet) where you need both column definitions and file-level metadata.
Content Tab Group
For media types, the Schema tab group is relabeled to Content in the navigation, reflecting the shift from columnar schema to content-oriented metadata.
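The tab rules on this page (media types swap Schema for File Metadata and drop Data Profiling and Env Diff; File/Dataset keeps Schema and adds File Metadata; media types relabel the group to Content) can be summarized as a small lookup. This is a hypothetical sketch, not the product's implementation: the type keys, the tab ordering, and the assumption that all other types show every tab in the table are illustrative.

```python
# Illustrative type keys; not official identifiers.
MEDIA_TYPE_KEYS = {"video", "image", "audio", "text_collection", "model_3d"}

# Assumed default tab order for a standard (non-media) type.
STANDARD_TABS = [
    "Overview", "Schema", "Data Profiling", "Env Diff",
    "Quality Health", "Lineage", "Governance", "Versions",
]

HIDDEN_FOR_MEDIA = {"Data Profiling", "Env Diff"}


def visible_tabs(entity_type: str) -> list[str]:
    """Return the tabs shown for an entity type under the rules above."""
    if entity_type in MEDIA_TYPE_KEYS:
        tabs = []
        for tab in STANDARD_TABS:
            if tab == "Schema":
                tabs.append("File Metadata")  # replaces Schema in place
            elif tab not in HIDDEN_FOR_MEDIA:
                tabs.append(tab)
        return tabs
    if entity_type == "file_dataset":
        # Keeps Schema and adds File Metadata right after it.
        i = STANDARD_TABS.index("Schema") + 1
        return STANDARD_TABS[:i] + ["File Metadata"] + STANDARD_TABS[i:]
    return list(STANDARD_TABS)


def schema_group_label(entity_type: str) -> str:
    """Media types show the Schema tab group as 'Content' in navigation."""
    return "Content" if entity_type in MEDIA_TYPE_KEYS else "Schema"


print(visible_tabs("video"))
print(schema_group_label("audio"))  # Content
```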