Media & Content Types

Media and content types manage rich, unstructured datasets used in computer vision, NLP, audio processing, 3D modeling, and content production. All media types share the File Metadata extension and unlock the dedicated File Metadata tab.

Available Types

Video

Video datasets, recordings, and collections.

Icon: Video · Color: #e11d48 (rose)
Use cases: Training data for action recognition, object tracking, autonomous driving, content production

Image

Image datasets and collections.

Icon: Image · Color: #0891b2 (cyan)
Use cases: Computer vision training data, satellite imagery, medical imaging, design assets

Audio

Audio datasets and recordings.

Icon: Music · Color: #7c3aed (violet)
Use cases: Speech recognition training, music analysis, environmental sound classification

Text Collection

Text corpora and document collections.

Icon: FileText · Color: #059669 (emerald)
Use cases: NLP training data, document archives, knowledge bases, translation corpora

3D Model

3D models, point clouds, and spatial datasets.

Icon: Box · Color: #d97706 (amber)
Use cases: Autonomous driving LiDAR data, architectural models, game assets, medical 3D scans

File Metadata Tab

All media types unlock the File Metadata tab, which replaces the standard Schema tab. The tab is organized into the sections described below.

Common Properties

These properties are shared across all media types:

Property	Description	Example
Media Format	Primary file format	JPEG, MP4, WAV, USDZ
Total Files	Number of files in the dataset	50,000
Total Size	Combined dataset size	2.4 TB
File Metadata	Technical metadata (codec, bitrate, resolution)	`{"codec": "H.264", "resolution": "1920x1080"}`
EXIF Metadata	Embedded camera/device metadata	`{"camera_model": "Canon EOS R5", "iso": 400}`
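
As a rough illustration of how Total Files, Total Size, and Media Format relate to the files on disk, the sketch below derives them from a local directory. The helper names `format_size` and `summarize_dataset` are hypothetical and not part of the product. Decimal units (1 TB = 10^12 bytes) and inferring the format from the most common file extension are assumptions made for this example.

```python
from collections import Counter
from pathlib import Path


def format_size(num_bytes: int) -> str:
    """Render a byte count with decimal units (assumption: 1 TB = 10**12 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1000 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1000


def summarize_dataset(root: str) -> dict:
    """Hypothetical helper: derive common properties from a directory tree."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    # Assumption: the primary format is the most frequent file extension.
    formats = Counter(p.suffix.lstrip(".").upper() for p in files if p.suffix)
    return {
        "media_format": formats.most_common(1)[0][0] if formats else None,
        "total_files": len(files),
        "total_size": format_size(sum(p.stat().st_size for p in files)),
    }


print(format_size(2_400_000_000_000))  # 2.4 TB
```

Calling `summarize_dataset("path/to/dataset")` returns a dictionary with the three values, which can then be entered into the corresponding fields.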

Folder Structure

Define the dataset's directory layout with descriptions and file patterns:

train/
  ├── *.jpg          — Training images
  └── labels.json    — COCO-format annotations
val/
  ├── *.jpg          — Validation images
  └── labels.json    — COCO-format annotations
metadata/
  └── stats.csv      — Dataset statistics

Each folder entry supports:

Path — Directory path within the dataset
Description — What this folder contains
Files — Specific files or file patterns (*.jpg, annotations.json)
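
The folder entries above can be sketched as a small data structure and used to spot files that no entry describes. `FolderEntry` and `unmatched_files` are hypothetical names for illustration only. Patterns are matched with Python's standard `fnmatch` module, which is an assumption about how file patterns are interpreted.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from pathlib import PurePosixPath


@dataclass
class FolderEntry:
    """Hypothetical model of one folder entry (Path, Description, Files)."""
    path: str
    description: str
    files: list = field(default_factory=list)  # file names or glob patterns

    def matches(self, file_path: str) -> bool:
        """True if file_path sits directly in this folder and matches a pattern."""
        p = PurePosixPath(file_path)
        if str(p.parent) != self.path.rstrip("/"):
            return False
        return any(fnmatch(p.name, pattern) for pattern in self.files)


def unmatched_files(entries, file_paths):
    """List the files that no folder entry describes."""
    return [f for f in file_paths if not any(e.matches(f) for e in entries)]


entries = [
    FolderEntry("train/", "Training images and annotations", ["*.jpg", "labels.json"]),
    FolderEntry("val/", "Validation images and annotations", ["*.jpg", "labels.json"]),
    FolderEntry("metadata/", "Dataset statistics", ["stats.csv"]),
]

listing = ["train/0001.jpg", "train/labels.json", "val/0001.png", "metadata/stats.csv"]
print(unmatched_files(entries, listing))  # ['val/0001.png']
```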

Annotations

Track what kinds of annotations are available in the dataset:

Field	Description
Annotation Type	Label, bounding box, segmentation mask, keypoint, etc.
Count	Number of annotations of this type
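
One way to obtain the Count for each Annotation Type is to tally the annotation records before filling in the table. This is a minimal sketch that assumes each record carries a `type` key; the record layout and the `summarize_annotations` helper are hypothetical.

```python
from collections import Counter


def summarize_annotations(annotations):
    """Return one row per annotation type with its count, most frequent first."""
    counts = Counter(a["type"] for a in annotations)
    return [{"annotation_type": t, "count": n} for t, n in counts.most_common()]


# Hypothetical records; real annotation files will have their own layout.
annotations = [
    {"type": "bounding_box", "image": "train/0001.jpg"},
    {"type": "bounding_box", "image": "train/0002.jpg"},
    {"type": "keypoint", "image": "train/0001.jpg"},
]

for row in summarize_annotations(annotations):
    print(row)
```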

Annotation Manifest

Define the schema for annotation manifest files:

Has Manifest — Whether the dataset includes a structured manifest
Manifest Schema — JSON schema definition linked from the Schema Registry

Type-Specific Properties

Video Properties

Property	Description	Values
Audio Track	Whether video files include audio	Yes / No
Closed Captions	CC availability	Yes / No
Subtitles	Subtitle track availability	Yes / No
Subtitle Languages	Supported subtitle languages	Required/optional per language
Audio Languages	Supported audio languages	Required/optional per language
Supported Resolutions	Available video resolutions	Required/optional per resolution

Languages and resolutions use a record-based format where each entry can be marked as required or optional:

[
  {"value": "en", "required": true},
  {"value": "fr", "required": false},
  {"value": "de", "required": false}
]
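
The record format above is straightforward to consume programmatically. The sketch below splits the entries into required and optional values and reports required values that a dataset does not provide. Treating `required` as "must be present in the dataset" is an assumption made for illustration, and the helper names are hypothetical.

```python
import json


def split_records(records):
    """Separate record values into (required, optional) lists, preserving order."""
    required = [r["value"] for r in records if r["required"]]
    optional = [r["value"] for r in records if not r["required"]]
    return required, optional


def missing_required(records, available):
    """Return required values that are absent from the available set."""
    required, _ = split_records(records)
    return [v for v in required if v not in available]


subtitle_languages = json.loads("""
[
  {"value": "en", "required": true},
  {"value": "fr", "required": false},
  {"value": "de", "required": false}
]
""")

required, optional = split_records(subtitle_languages)
print(required)  # ['en']
print(optional)  # ['fr', 'de']
print(missing_required(subtitle_languages, {"fr", "de"}))  # ['en']
```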

Image Properties

Property	Description	Values
Color Space	Color model	sRGB, Adobe RGB, ProPhoto RGB, CMYK
Bit Depth	Bits per channel	8, 16, 32
Alpha Channel	Transparency support	Yes / No

Audio Properties

Property	Description	Values
Sample Rates	Supported sample rates	Required/optional per rate
Channels	Audio channel configurations	Required/optional per config
Lyrics	Lyrics availability	Yes / No
Transcription	Transcription availability	Yes / No

Sample rates and channels use the same required/optional record format as video languages.

Tab Availability

Tab	Media Types	Notes
Overview	✅	Standard overview
Schema	❌	Replaced by File Metadata
File Metadata	✅	Type-specific properties, folder structure, annotations
Data Profiling	❌	No columnar data to profile
Env Diff	❌	Not applicable for unstructured data
Quality Health	✅	Standard quality checks
Lineage	✅	Track data pipeline dependencies
Governance	✅	Full governance suite
Versions	✅	Dataset version tracking

File/Dataset Hybrid Type

The File/Dataset type also supports the File Metadata tab but retains the standard Schema tab. This makes it well suited to structured file formats (CSV, Parquet) where you need both column definitions and file-level metadata.

Content Tab Group

For media types, the Schema tab group is relabeled to Content in the navigation, reflecting the shift from columnar schema to content-oriented metadata.
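
The tab rules for media types can be summarized as a small lookup. This is a sketch that transcribes the Tab Availability table and the relabeling rule; the constant and function names are hypothetical and do not reflect how the product implements navigation.

```python
MEDIA_TYPES = frozenset({"Video", "Image", "Audio", "Text Collection", "3D Model"})

# Tab availability for media types, transcribed from the Tab Availability table.
MEDIA_TABS = {
    "Overview": True,
    "Schema": False,          # replaced by File Metadata
    "File Metadata": True,
    "Data Profiling": False,  # no columnar data to profile
    "Env Diff": False,        # not applicable for unstructured data
    "Quality Health": True,
    "Lineage": True,
    "Governance": True,
    "Versions": True,
}


def visible_tabs(asset_type: str) -> list:
    """Return the tabs shown for a media type, in table order."""
    if asset_type not in MEDIA_TYPES:
        raise ValueError(f"{asset_type!r} is not a media type covered by this table")
    return [tab for tab, shown in MEDIA_TABS.items() if shown]


def tab_group_label(asset_type: str) -> str:
    """Media types show the Schema tab group under the label Content."""
    return "Content" if asset_type in MEDIA_TYPES else "Schema"


print(visible_tabs("Video"))
print(tab_group_label("Audio"))  # Content
```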
