
Media & Content Types

Media and content types are designed for managing rich, unstructured datasets used in computer vision, NLP, audio processing, 3D modeling, and content production. All media types share the File Metadata extension, which unlocks a dedicated File Metadata tab.

Available Types

Video

Video datasets, recordings, and collections.

  • Icon: Video · Color: #e11d48 (rose)
  • Use cases: Training data for action recognition, object tracking, autonomous driving, content production

Image

Image datasets and collections.

  • Icon: Image · Color: #0891b2 (cyan)
  • Use cases: Computer vision training data, satellite imagery, medical imaging, design assets

Audio

Audio datasets and recordings.

  • Icon: Music · Color: #7c3aed (violet)
  • Use cases: Speech recognition training, music analysis, environmental sound classification

Text Collection

Text corpora and document collections.

  • Icon: FileText · Color: #059669 (emerald)
  • Use cases: NLP training data, document archives, knowledge bases, translation corpora

3D Model

3D models, point clouds, and spatial datasets.

  • Icon: Box · Color: #d97706 (amber)
  • Use cases: Autonomous driving LiDAR data, architectural models, game assets, medical 3D scans
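The five types above can be modeled as a small registry. This is a hypothetical sketch, not the product's API: the `MediaType` class, the registry keys (`"video"`, `"3d_model"`, and so on), and `is_media_type` are assumptions; only the display names, icons, and colors come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaType:
    """Display attributes for one media type (hypothetical structure)."""
    name: str
    icon: str
    color: str  # hex color shown in the UI


# Hypothetical registry keys; names, icons, and colors are from the list above.
MEDIA_TYPES = {
    "video": MediaType("Video", "Video", "#e11d48"),
    "image": MediaType("Image", "Image", "#0891b2"),
    "audio": MediaType("Audio", "Music", "#7c3aed"),
    "text_collection": MediaType("Text Collection", "FileText", "#059669"),
    "3d_model": MediaType("3D Model", "Box", "#d97706"),
}


def is_media_type(type_key: str) -> bool:
    """True for every registered media type, all of which get File Metadata."""
    return type_key in MEDIA_TYPES
```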

File Metadata Tab

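All media types unlock the File Metadata tab, which replaces the standard Schema tab. The tab is organized into the sections described below: common properties, folder structure, annotations, annotation manifest, and type-specific properties.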

Common Properties

These properties are shared across all media types:

| Property | Description | Example |
| --- | --- | --- |
| Media Format | Primary file format | JPEG, MP4, WAV, USDZ |
| Total Files | Number of files in the dataset | 50,000 |
| Total Size | Combined dataset size | 2.4 TB |
| File Metadata | Technical metadata (codec, bitrate, resolution) | {"codec": "H.264", "resolution": "1920x1080"} |
| EXIF Metadata | Embedded camera/device metadata | {"camera_model": "Canon EOS R5", "iso": 400} |
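As a rough illustration of how the first three properties relate to the underlying files, the sketch below derives them from a `{path: size_in_bytes}` listing. The `CommonProperties` class, `summarize`, and the `FORMAT_ALIASES` normalization are hypothetical, and it assumes the primary format is simply the most frequent file extension; the product may determine it differently.

```python
from collections import Counter
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical normalization so ".jpg" and ".jpeg" both report as JPEG.
FORMAT_ALIASES = {"JPG": "JPEG"}


@dataclass
class CommonProperties:
    media_format: str
    total_files: int
    total_size_bytes: int
    file_metadata: dict = field(default_factory=dict)
    exif_metadata: dict = field(default_factory=dict)


def summarize(files: dict[str, int]) -> CommonProperties:
    """Derive common properties from a {path: size_in_bytes} listing.

    Assumption: the primary format is the most frequent file extension.
    """
    formats = Counter()
    for path in files:
        ext = PurePosixPath(path).suffix.lstrip(".").upper()
        formats[FORMAT_ALIASES.get(ext, ext)] += 1
    media_format = formats.most_common(1)[0][0] if formats else ""
    return CommonProperties(
        media_format=media_format,
        total_files=len(files),
        total_size_bytes=sum(files.values()),
    )
```

The size is kept in raw bytes here; turning it into a display value such as "2.4 TB" is left to the presentation layer.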

Folder Structure

Define the dataset's directory layout with descriptions and file patterns:

train/
├── *.jpg — Training images
└── labels.json — COCO-format annotations
val/
├── *.jpg — Validation images
└── labels.json — COCO-format annotations
metadata/
└── stats.csv — Dataset statistics

Each folder entry supports:

  • Path — Directory path within the dataset
  • Description — What this folder contains
  • Files — Specific files or file patterns (*.jpg, annotations.json)
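A folder entry with a path, a description, and file patterns maps naturally onto a small record type. The following sketch is a hypothetical model (the `FolderEntry` class and `match_folder` function are illustrative, not part of the product) showing how a file path could be resolved to the entry that describes it, using the standard library's `fnmatch.fnmatchcase` for patterns such as `*.jpg`.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class FolderEntry:
    """One folder entry: path, description, and file names or glob patterns."""
    path: str
    description: str
    files: list[str]


def match_folder(entries: list[FolderEntry], file_path: str) -> Optional[FolderEntry]:
    """Return the entry whose directory and file patterns cover file_path."""
    p = PurePosixPath(file_path)
    for entry in entries:
        same_dir = str(p.parent) == entry.path.rstrip("/")
        if same_dir and any(fnmatchcase(p.name, pat) for pat in entry.files):
            return entry
    return None
```

Matching here is case-sensitive and limited to files directly inside the entry's directory; recursive patterns would need a different matcher.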

Annotations

Track what kinds of annotations are available in the dataset:

| Field | Description |
| --- | --- |
| Annotation Type | Label, bounding box, segmentation mask, keypoint, etc. |
| Count | Number of annotations of this type |
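For a dataset that ships COCO-format annotations, like the `labels.json` files in the example layout above, the Type and Count fields could be derived by scanning the `annotations` array. This is a rough sketch under stated assumptions: an annotation counts as a bounding box when its `bbox` is non-empty, as a segmentation mask when `segmentation` is non-empty, and as a keypoint annotation when `num_keypoints` is positive. The function name and the output shape are hypothetical.

```python
from collections import Counter


def annotation_counts(coco: dict) -> list[dict]:
    """Count annotations per type in a parsed COCO-format annotation file."""
    counts = Counter()
    for ann in coco.get("annotations", []):
        if ann.get("bbox"):
            counts["bounding box"] += 1
        if ann.get("segmentation"):  # polygon list or RLE dict
            counts["segmentation mask"] += 1
        if ann.get("num_keypoints", 0) > 0:
            counts["keypoint"] += 1
    # One {type, count} record per annotation type, sorted by type name.
    return [{"annotation_type": t, "count": c} for t, c in sorted(counts.items())]
```

A single COCO instance annotation often carries both a box and a mask, so it contributes to both counts.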

Annotation Manifest

Define the schema for annotation manifest files:

  • Has Manifest — Whether the dataset includes a structured manifest
  • Manifest Schema — JSON schema definition linked from the Schema Registry

Type-Specific Properties

Video Properties

| Property | Description | Values |
| --- | --- | --- |
| Audio Track | Whether video files include audio | Yes / No |
| Closed Captions | CC availability | Yes / No |
| Subtitles | Subtitle track availability | Yes / No |
| Subtitle Languages | Supported subtitle languages | Required/optional per language |
| Audio Languages | Supported audio languages | Required/optional per language |
| Supported Resolutions | Available video resolutions | Required/optional per resolution |

Languages and resolutions use a record-based format where each entry can be marked as required or optional:

[
  {"value": "en", "required": true},
  {"value": "fr", "required": false},
  {"value": "de", "required": false}
]
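One way to consume these records is to split them into required and optional sets and check which required values a dataset is missing. The helper names `split_requirements` and `missing_required` are hypothetical, and `available` is assumed to be the set of values actually present in the dataset (for example, the audio languages found in its files).

```python
import json


def split_requirements(records_json: str) -> tuple[set[str], set[str]]:
    """Split [{"value": ..., "required": ...}] records into (required, optional)."""
    records = json.loads(records_json)
    required = {r["value"] for r in records if r["required"]}
    optional = {r["value"] for r in records if not r["required"]}
    return required, optional


def missing_required(records_json: str, available: set[str]) -> set[str]:
    """Required values (languages, resolutions, ...) absent from the dataset."""
    required, _ = split_requirements(records_json)
    return required - available
```

With the example records, a dataset that only has French and German audio would report `{"en"}` as missing, while one with English audio alone would pass.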

Image Properties

| Property | Description | Values |
| --- | --- | --- |
| Color Space | Color model | sRGB, Adobe RGB, ProPhoto RGB, CMYK |
| Bit Depth | Bits per channel | 8, 16, 32 |
| Alpha Channel | Transparency support | Yes / No |
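The allowed values in this table lend themselves to a simple validation step. The sketch below is hypothetical (`validate_image_properties` is not a product function); the allowed sets are transcribed from the table, reading the first color space as sRGB, and Yes / No is assumed to map to a Python boolean.

```python
# Allowed values transcribed from the Image Properties table.
ALLOWED_COLOR_SPACES = {"sRGB", "Adobe RGB", "ProPhoto RGB", "CMYK"}
ALLOWED_BIT_DEPTHS = {8, 16, 32}


def validate_image_properties(
    color_space: str, bit_depth: int, alpha_channel: bool
) -> list[str]:
    """Return a list of problems; an empty list means the values are valid."""
    problems = []
    if color_space not in ALLOWED_COLOR_SPACES:
        problems.append(f"unsupported color space: {color_space!r}")
    if bit_depth not in ALLOWED_BIT_DEPTHS:
        problems.append(f"unsupported bit depth: {bit_depth}")
    if not isinstance(alpha_channel, bool):
        problems.append("alpha channel must be Yes/No (a boolean)")
    return problems
```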

Audio Properties

| Property | Description | Values |
| --- | --- | --- |
| Sample Rates | Supported sample rates | Required/optional per rate |
| Channels | Audio channel configurations | Required/optional per config |
| Lyrics | Lyrics availability | Yes / No |
| Transcription | Transcription availability | Yes / No |

Sample rates and channels use the same required/optional record format as video languages.
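For WAV datasets, the declared sample-rate and channel records can be checked against file headers with Python's standard `wave` module. This is a minimal sketch under two assumptions: record values are integers (Hz for sample rates, channel count for channels), and any listed value is acceptable regardless of its `required` flag. The `check_wav` name and result shape are hypothetical.

```python
import wave


def check_wav(source, sample_rates: list[dict], channels: list[dict]) -> dict:
    """Compare a WAV file's header against sample-rate and channel records.

    `source` is a path or a binary file-like object. Assumption: record
    values are integers (Hz for sample rates, channel count for channels).
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        nchannels = wav.getnchannels()
    allowed_rates = {int(r["value"]) for r in sample_rates}
    allowed_channels = {int(c["value"]) for c in channels}
    return {
        "sample_rate": rate,
        "channels": nchannels,
        "rate_ok": rate in allowed_rates,
        "channels_ok": nchannels in allowed_channels,
    }
```

Only the header is read, so the check is cheap even for long recordings; other formats (FLAC, MP3) would need a different reader.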

Tab Availability

| Feature | Media Types | Notes |
| --- | --- | --- |
| Overview | Yes | Standard overview |
| Schema | No | Replaced by File Metadata |
| File Metadata | Yes | Type-specific properties, folder structure, annotations |
| Data Profiling | No | No columnar data to profile |
| Env Diff | No | Not applicable for unstructured data |
| Quality Health | Yes | Standard quality checks |
| Lineage | Yes | Track data pipeline dependencies |
| Governance | Yes | Full governance suite |
| Versions | Yes | Dataset version tracking |
File/Dataset Hybrid Type

The File/Dataset type also supports the File Metadata tab but retains the standard Schema tab. This makes it a good fit for structured file formats (CSV, Parquet) where you need both column definitions and file-level metadata.

Content Tab Group

For media types, the Schema tab group is relabeled to Content in the navigation, reflecting the shift from columnar schema to content-oriented metadata.
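Taken together, the tab availability table and the relabeling rule describe a small navigation model for media types. The sketch below is hypothetical: the type keys, `media_tabs`, and `schema_group_label` are illustrative names, the hidden tabs are those the Notes column marks as replaced or not applicable, and non-media types are assumed to keep the "Schema" label (this page does not say how the File/Dataset type labels the group).

```python
# Hypothetical keys for the five media types on this page.
MEDIA_TYPE_KEYS = {"video", "image", "audio", "text_collection", "3d_model"}

ALL_TABS = [
    "Overview", "Schema", "File Metadata", "Data Profiling", "Env Diff",
    "Quality Health", "Lineage", "Governance", "Versions",
]
# Tabs the Tab Availability notes mark as replaced or not applicable.
HIDDEN_FOR_MEDIA = {"Schema", "Data Profiling", "Env Diff"}


def media_tabs() -> list[str]:
    """Tabs available to media types, in the order the table lists them."""
    return [tab for tab in ALL_TABS if tab not in HIDDEN_FOR_MEDIA]


def schema_group_label(type_key: str) -> str:
    """Media types show the Schema tab group as "Content"."""
    return "Content" if type_key in MEDIA_TYPE_KEYS else "Schema"
```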