Skip to main contentDatasets
Datasets are the foundation of every fine-tune. LLMTune supports conversational, document, and classification data through JSONL, CSV, or text uploads.
Upload Workflow
- Drag-and-drop files or pick from cloud storage (S3, GCS) if connected.
- Choose a dataset name and optional description.
- LLMTune runs schema detection to identify roles, prompts, responses, and metadata.
- Review profiling output:
- Sample rows
- Token estimates
- Schema anomalies
- Each upload becomes a version. You can rollback or compare changes.
- Add tags (e.g.,
priority:high, channel:support) to filter subsets later.
- Use the dataset editor to annotate, redact, or merge records.
Blending Sources
During fine-tuning you can blend multiple datasets by assigning weights. For example, mix customer support conversations with policy documents to enforce tone.
Quality Controls
- Flag samples: Mark problematic records for follow-up.
- Mask fields: Replace sensitive data (emails, account numbers) with placeholders before training.
- Evaluate coverage: Use the analytics to check label distribution and conversation depth.