Managing Datasets
This guide covers the day-to-day operations for working with master data datasets: editing schemas, managing versions, importing and exporting data, and configuring governance.
Schema Editing
Each dataset has a typed schema that defines its columns. The sections below cover adding, modifying, and removing columns.
Adding Columns
- Open the dataset and navigate to the Schema tab
- Click Add Column
- Provide a column name and select a type (text, number, boolean, date, etc.)
- Optionally mark as required
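As an illustration of what a typed schema with required columns implies for data, here is a minimal Python sketch. The `Column` class and `missing_required` helper are hypothetical names chosen for this example and are not part of the platform's API. Treating an empty string as "missing" is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Column:
    """Hypothetical model of one schema column (not the platform's own type)."""
    name: str
    type: str            # e.g. "text", "number", "boolean", "date"
    required: bool = False


def missing_required(schema: List[Column], row: Dict[str, Any]) -> List[str]:
    """Return the names of required columns that are absent or empty in a row."""
    return [
        col.name
        for col in schema
        if col.required and row.get(col.name) in (None, "")
    ]


schema = [
    Column("country_code", "text", required=True),
    Column("population", "number"),
]
print(missing_required(schema, {"population": 5}))
```

A row that omits `country_code` is reported, while a row that omits the optional `population` column is not.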
Modifying Columns
- Rename a column by clicking its header
- Change the column type (data is coerced where possible)
- Reorder columns with drag-and-drop
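To make "coerced where possible" concrete, the sketch below shows one plausible way a type change could convert existing cell values. The function name `coerce_value`, the accepted boolean spellings, the ISO date format, and the choice to return `None` for values that cannot be converted are all assumptions for illustration. The platform's actual coercion rules may differ.

```python
from datetime import date
from typing import Any, Optional


def coerce_value(value: Any, target_type: str) -> Optional[Any]:
    """Try to convert one cell to target_type; return None if it cannot be coerced."""
    if value is None:
        return None
    text = str(value).strip()
    try:
        if target_type == "text":
            return str(value)
        if target_type == "number":
            return float(text)
        if target_type == "boolean":
            lowered = text.lower()
            if lowered in ("true", "yes", "1"):
                return True
            if lowered in ("false", "no", "0"):
                return False
            return None
        if target_type == "date":
            # Assumes ISO 8601 dates such as 2024-01-15.
            return date.fromisoformat(text)
    except ValueError:
        # The value does not parse as the requested type.
        return None
    raise ValueError(f"unknown column type: {target_type}")
```

Under these assumptions, `"42"` converts cleanly to a number, while `"abc"` does not and comes back as `None`, so it is worth reviewing the draft after a type change and before publishing.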
Removing Columns
- Click the column menu (⋮) and select Delete Column
- Existing data in that column is removed on next publish
Documentation and Tags
Each dataset supports both a short and long form of documentation:
- Short documentation is the concise summary shown in list views and headers
- Long documentation is the richer, multiline narrative for business context, stewardship notes, and usage guidance
You can also assign tags directly on the dataset detail page to support search, filtering, and policy-driven classification. Tags are shared across the catalog, so reusing existing names keeps the taxonomy consistent.
Data Entry
Edit data directly in the browser with a spreadsheet-style interface:
- Inline editing — Click a cell to edit
- Row operations — Add, duplicate, or delete rows
- Bulk paste — Paste tabular data from a spreadsheet
All edits are saved to the current draft. They do not affect the published version until you explicitly publish.
Versioning
Publishing a Version
- Click Publish when your draft is ready
- Add an optional version note describing the changes
- The draft becomes a new immutable version (e.g., v3)
- A new draft is automatically created for future edits
Comparing Versions
Click Version History to see all previous versions, then select two versions to compare:
- Added rows — highlighted in green
- Modified rows — highlighted in yellow, with per-cell diffs
- Deleted rows — highlighted in red
- Schema changes — column additions, removals, and type changes
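The row-level categories above can be reproduced with a simple keyed comparison. The following is a sketch only: it assumes every row carries a unique key column (called `id` here, a hypothetical name), which this guide does not specify, and it does not cover schema changes.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def diff_versions(old: List[Row], new: List[Row], key: str = "id") -> Dict[str, Any]:
    """Classify rows as added, modified (with per-cell before/after), or deleted."""
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}

    added = [row for k, row in new_by_key.items() if k not in old_by_key]
    deleted = [row for k, row in old_by_key.items() if k not in new_by_key]

    modified: Dict[Any, Dict[str, tuple]] = {}
    for k in old_by_key.keys() & new_by_key.keys():
        before, after = old_by_key[k], new_by_key[k]
        # Record only the cells whose values differ between the two versions.
        cells = {
            col: (before.get(col), after.get(col))
            for col in sorted(before.keys() | after.keys())
            if before.get(col) != after.get(col)
        }
        if cells:
            modified[k] = cells

    return {"added": added, "modified": modified, "deleted": deleted}


v1 = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
v2 = [{"id": 1, "name": "A2"}, {"id": 3, "name": "C"}]
print(diff_versions(v1, v2))
```

In this example row 3 is added, row 2 is deleted, and row 1 is modified with a per-cell diff on `name`.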
Reverting
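To revert to a previous version, open it from the version history and click Restore as Draft. This replaces the current draft with the selected version's data, so any unpublished edits in the draft are overwritten. Published versions are immutable and remain unchanged.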
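The draft, publish, and restore behaviour described in this section can be modelled in a few lines. This is a conceptual sketch, not the platform's implementation: the `Dataset` and `Version` classes are hypothetical, and it assumes the new draft created after publishing starts as a copy of the data that was just published.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Version:
    """An immutable published version (hypothetical model)."""
    number: int
    note: str
    rows: tuple  # row dicts copied at publish time


@dataclass
class Dataset:
    draft: List[Dict[str, Any]] = field(default_factory=list)
    versions: List[Version] = field(default_factory=list)

    def publish(self, note: str = "") -> Version:
        """Freeze the draft as the next version and start a fresh draft."""
        version = Version(
            number=len(self.versions) + 1,
            note=note,
            rows=tuple(copy.deepcopy(self.draft)),
        )
        self.versions.append(version)
        # Assumption: the new draft begins with the data just published.
        self.draft = copy.deepcopy(self.draft)
        return version

    def restore_as_draft(self, number: int) -> None:
        """Replace the current draft with the data of a published version."""
        version = self.versions[number - 1]
        self.draft = [copy.deepcopy(row) for row in version.rows]
```

Because each version holds its own copy of the rows, later draft edits never leak into published data, and restoring simply copies a version back into the draft.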
Import & Export
Importing Data
Supported formats: CSV, XLSX, JSON
- Click Import on the dataset toolbar
- Upload a file or drag-and-drop
- Map columns (auto-matched by header name)
- Preview the import and confirm
Imports replace the current draft data. The published version is unaffected until you publish.
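To illustrate the column-mapping step, here is a sketch of header-based auto-matching. The guide only states that columns are auto-matched by header name. The normalisation used here (ignoring case, spaces, and punctuation) and the function names are assumptions made for the example.

```python
from typing import Dict, List, Optional


def normalise(name: str) -> str:
    """Lower-case a header and drop everything except letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def auto_map_columns(
    file_headers: List[str], schema_columns: List[str]
) -> Dict[str, Optional[str]]:
    """Map each file header to a schema column, or None when nothing matches."""
    by_normalised = {normalise(col): col for col in schema_columns}
    return {header: by_normalised.get(normalise(header)) for header in file_headers}


mapping = auto_map_columns(
    ["Customer ID", "country", "Notes"],
    ["customer_id", "Country"],
)
print(mapping)
```

Headers that map to `None` (here `Notes`) are the ones you would expect to map manually or skip in the preview step.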
Exporting Data
- Click Export on the dataset toolbar
- Choose format (CSV, XLSX, or JSON)
- Select sync mode:
  - Full snapshot — exports all rows
  - Incremental — exports only changes since the last export
- Configure frequency: on-demand, on-change, hourly, or daily
Scheduled exports run automatically via the platform's background job system.
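The difference between the two sync modes can be sketched as follows. This is an illustration under stated assumptions: rows are matched by a hypothetical `id` key column, an incremental export is represented as a set of upserts plus deletes, and the function name `rows_to_export` is invented for the example. The platform's actual export payload may be structured differently.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def rows_to_export(
    current: List[Row],
    last_exported: Optional[List[Row]],
    mode: str,
    key: str = "id",
) -> Dict[str, List[Row]]:
    """Select what to send for a 'full' or 'incremental' export."""
    if mode not in ("full", "incremental"):
        raise ValueError(f"unknown sync mode: {mode}")
    # With no previous export there is nothing to diff against, so send everything.
    if mode == "full" or last_exported is None:
        return {"upserts": list(current), "deletes": []}

    previous = {row[key]: row for row in last_exported}
    current_keys = {row[key] for row in current}
    # New or changed rows since the last export.
    upserts = [row for row in current if previous.get(row[key]) != row]
    # Rows that existed in the last export but are gone now.
    deletes = [row for k, row in previous.items() if k not in current_keys]
    return {"upserts": upserts, "deletes": deletes}
```

A full snapshot is simpler for consumers to load, while an incremental export keeps payloads small when only a few rows change between runs.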
Governance
Governance Mode
Each dataset has a governance mode that controls how changes are handled:
| Mode | Behaviour |
|---|---|
| Direct Edit | Anyone with edit access can modify and publish |
| Approval Required | Changes require approval before publishing |
Switch modes from the Governance tab on the dataset detail page.
Role Assignments
Datasets support the same role-based governance as data products:
- Owner — Full control, can change governance mode
- Steward — Can edit schema and data, approve changes
- Viewer — Read-only access
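The interaction between governance mode and roles can be summarised as a small permission check. The sketch below is a reading of the table and role list above, not the platform's access-control code: the enum and function names are hypothetical, and approval is reduced to a single boolean flag.

```python
from enum import Enum


class Role(Enum):
    OWNER = "owner"
    STEWARD = "steward"
    VIEWER = "viewer"


class GovernanceMode(Enum):
    DIRECT_EDIT = "direct_edit"
    APPROVAL_REQUIRED = "approval_required"


def can_edit(role: Role) -> bool:
    """Owners and stewards can edit schema and data; viewers are read-only."""
    return role in (Role.OWNER, Role.STEWARD)


def can_change_mode(role: Role) -> bool:
    """Only the owner can switch the governance mode."""
    return role is Role.OWNER


def can_publish(role: Role, mode: GovernanceMode, approved: bool) -> bool:
    """Direct Edit lets editors publish freely; otherwise an approval is needed."""
    if not can_edit(role):
        return False
    if mode is GovernanceMode.DIRECT_EDIT:
        return True
    return approved
```

For example, a steward can publish immediately in Direct Edit mode, but in Approval Required mode the same publish is blocked until the change has been approved.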
Chatter
Each dataset has a built-in discussion thread (chatter) for team collaboration:
- Comment on data quality, schema decisions, or upcoming changes
- @mention teammates for notifications
- Full comment history is preserved with the dataset
Audit Trail
All dataset operations are logged in the platform's audit log:
- Schema changes
- Data modifications
- Publish events
- Governance mode changes
- Export operations
Access the audit log from the dataset detail page or from the global Audit Log in admin settings.