Overview
VerbalisAI’s transcription API provides high-quality speech-to-text conversion with advanced features such as timestamps, confidence scores, and AI-powered analysis. This guide covers everything you need to get the most out of the transcription service.
Basic Transcription
Simple Upload and Transcribe
The most basic use case is uploading an audio file and getting back the transcribed text.
Advanced Features
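A minimal sketch of the basic upload described above, using only Python's standard library. The language, model, and timestamp options covered in the subsections below are exposed as arguments. The endpoint URL, the bearer-token header, and the form field names (file, language, model, timestamps) are assumptions for illustration, not confirmed VerbalisAI parameters; check the API reference for the real names.

```python
import urllib.request
import uuid

# Hypothetical endpoint; substitute the real VerbalisAI URL from the API reference.
API_URL = "https://api.example.com/v1/transcribe"


def build_transcription_request(
    audio: bytes,
    filename: str,
    api_key: str,
    language: str = "auto",   # hypothetical: "auto" for detection, or a code like "en"
    model: str = "accurate",  # hypothetical values: "fast" or "accurate"
    timestamps: bool = True,  # hypothetical flag for sentence/word timing
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data upload request.

    All field names here are illustrative assumptions, not the documented API.
    """
    if model not in ("fast", "accurate"):
        raise ValueError(f"unknown model: {model!r}")

    boundary = uuid.uuid4().hex
    fields = {
        "language": language,
        "model": model,
        "timestamps": "true" if timestamps else "false",
    }

    parts = []
    # Plain-text form fields.
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio file itself.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending the request with urllib.request.urlopen(req) and decoding the body with json.load would give you the response described under Working with Results, assuming the API replies with JSON.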
Language Detection and Specification
VerbalisAI can detect the spoken language automatically, or you can specify it yourself for better accuracy.
Model Selection
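Choose between accuracy and speed: the accurate model produces more reliable transcripts, while the fast model returns results sooner.
Timestamps and Confidence Scores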
Get detailed timing and confidence information along with the transcript.
Working with Results
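With timestamps enabled, word-level timing makes it possible to find where a phrase was spoken. The sketch below assumes each entry in words is a dict with word, start, and end keys (in seconds); those key names are a guess at the payload shape, so adjust them to match the actual response.

```python
import string

# Hypothetical word shape: {"word": "budget", "start": 0.6, "end": 1.0}


def _normalize(token: str) -> str:
    # Lower-case and drop surrounding punctuation so "Review." matches "review".
    return token.lower().strip(string.punctuation)


def find_phrase(words: list, phrase: str) -> list:
    """Return (start, end) times in seconds for every occurrence of `phrase`."""
    target = [_normalize(t) for t in phrase.split()]
    if not target:
        return []
    tokens = [_normalize(w["word"]) for w in words]
    n = len(target)
    hits = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            # Span from the first matched word's start to the last matched word's end.
            hits.append((words[i]["start"], words[i + n - 1]["end"]))
    return hits
```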
Understanding the Response Structure
A complete transcription response includes the following groups of fields.
Basic Information
- text: Complete transcription
- language: Detected or specified language
- confidence: Overall confidence score
- duration: Audio length in seconds
Detailed Breakdown
- sentences: Sentence-level segments with timestamps
- words: Individual words with precise timing
- summary: AI-generated insights (entities, topics, sentiment)
Metadata
- processing_time: How long the transcription took
- model_used: Which model processed the audio
- credits_used: Cost of the transcription in credits
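The fields listed above map naturally onto typed objects, which makes downstream code easier to read and catches missing fields early. A minimal sketch using dataclasses: the top-level names follow the documented fields, while the per-sentence keys (text, start, end) and the choice to treat metadata as optional are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Sentence:
    text: str
    start: float  # seconds (assumed unit)
    end: float


@dataclass
class Transcript:
    # Basic information
    text: str
    language: str
    confidence: float
    duration: float
    # Detailed breakdown
    sentences: List[Sentence] = field(default_factory=list)
    words: List[Dict[str, Any]] = field(default_factory=list)
    summary: Dict[str, Any] = field(default_factory=dict)
    # Metadata (treated as optional in this sketch)
    processing_time: Optional[float] = None
    model_used: Optional[str] = None
    credits_used: Optional[float] = None


def parse_transcript(data: Dict[str, Any]) -> Transcript:
    """Convert a decoded JSON response into typed objects.

    Per-sentence keys ("text", "start", "end") are hypothetical.
    """
    return Transcript(
        text=data["text"],
        language=data["language"],
        confidence=float(data["confidence"]),
        duration=float(data["duration"]),
        sentences=[
            Sentence(s["text"], float(s["start"]), float(s["end"]))
            for s in data.get("sentences", [])
        ],
        words=list(data.get("words", [])),
        summary=dict(data.get("summary") or {}),
        processing_time=data.get("processing_time"),
        model_used=data.get("model_used"),
        credits_used=data.get("credits_used"),
    )
```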
Sentence-Level Processing
Sentence-level segments with timestamps support applications such as subtitle generation and content analysis.
Real-World Use Cases
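Subtitle generation, mentioned under Sentence-Level Processing, is a typical first use of sentence segments. A minimal sketch that renders sentences as SubRip (SRT) cues, assuming each sentence is a dict with text, start, and end keys in seconds (hypothetical key names):

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def sentences_to_srt(sentences: list) -> str:
    """Render one numbered SRT cue per sentence, separated by blank lines."""
    cues = []
    for index, s in enumerate(sentences, start=1):
        cues.append(
            f"{index}\n"
            f"{format_srt_time(s['start'])} --> {format_srt_time(s['end'])}\n"
            f"{s['text'].strip()}\n"
        )
    return "\n".join(cues)
```

Very long sentences can exceed a comfortable subtitle length; splitting them using word timings is a natural extension.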
Meeting Transcription
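For meetings, a timestamped record is often more useful than a single block of text. The sketch below formats sentences with clock times and, if present, lists topics from the summary. It assumes sentences carry text and start keys and that summary["topics"] is a list of strings; both shapes are hypothetical, since the exact summary format is not documented here.

```python
def format_clock(seconds: float) -> str:
    """Format seconds as HH:MM:SS (fractions of a second are truncated)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def meeting_minutes(transcript: dict) -> str:
    """Build a plain-text, timestamped meeting record from a transcription response."""
    lines = [
        f"Meeting transcript ({format_clock(transcript['duration'])}, "
        f"language: {transcript['language']})"
    ]
    # Hypothetical: assumes summary["topics"] is a list of strings.
    topics = (transcript.get("summary") or {}).get("topics")
    if topics:
        lines.append("Topics: " + ", ".join(topics))
    lines.append("")
    for s in transcript.get("sentences", []):
        lines.append(f"[{format_clock(s['start'])}] {s['text'].strip()}")
    return "\n".join(lines)
```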
Podcast Processing
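Long podcast episodes are easier to navigate with chapters. As a simple stand-in for topic-based chaptering, the sketch below groups sentences into fixed-length chapters by start time. The 10-minute default is arbitrary, and the sentence keys (text, start, end) are hypothetical; sentences are assumed to be sorted by start time.

```python
def podcast_chapters(sentences: list, chapter_length: float = 600.0) -> list:
    """Group sentences into fixed-length chapters based on each sentence's start time."""
    if chapter_length <= 0:
        raise ValueError("chapter_length must be positive")
    groups: dict = {}
    for s in sentences:
        # Chapter index: which chapter_length-sized window the sentence starts in.
        groups.setdefault(int(s["start"] // chapter_length), []).append(s)
    chapters = []
    for key in sorted(groups):
        group = groups[key]
        chapters.append({
            "start": group[0]["start"],
            "end": group[-1]["end"],
            "text": " ".join(s["text"].strip() for s in group),
        })
    return chapters
```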
Best Practices
File Preparation
Audio Quality
- Use clear, high-quality audio (16 kHz sample rate or higher)
- Minimize background noise
- Ensure good microphone placement
- Consider noise reduction preprocessing
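Of the audio-quality points above, only the sample rate can be checked from the file header. A minimal sketch for WAV files using Python's standard wave module; it accepts a path, a file object, or raw bytes. Noise and microphone placement still need a listen.

```python
import io
import wave

MIN_SAMPLE_RATE = 16_000  # from the 16 kHz+ recommendation above


def check_wav_sample_rate(source) -> list:
    """Return warnings for a WAV file given as a path, file object, or raw bytes."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
    if rate < MIN_SAMPLE_RATE:
        return [f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz"]
    return []
```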
File Format
- MP3 or WAV for best compatibility
- FLAC for highest quality
- Avoid heavily compressed formats
- Keep files under 1GB
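A quick pre-upload check can catch format and size problems before a file spends upload quota. This sketch treats extensions other than MP3, WAV, and FLAC as warnings rather than errors, since other formats may still be accepted. It reads "under 1GB" conservatively as 10^9 bytes; the exact server limit is an assumption.

```python
import os

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac"}
# "Under 1GB" read conservatively as 10**9 bytes; the exact limit is an assumption.
MAX_BYTES = 1_000_000_000


def validate_audio_file(filename: str, size_bytes: int) -> list:
    """Return a list of warnings; an empty list means the file looks ready to upload."""
    warnings = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        warnings.append(f"{ext or 'no extension'}: prefer MP3, WAV or FLAC")
    if size_bytes >= MAX_BYTES:
        warnings.append(f"{size_bytes / 1e9:.2f} GB: keep files under 1 GB")
    return warnings
```

For a file on disk, call it as validate_audio_file(path, os.path.getsize(path)).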
Performance Optimization
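When you have several files, uploading them concurrently hides network latency. A minimal sketch using a thread pool, which suits I/O-bound uploads. Here transcribe is a placeholder for whatever function uploads one file and returns its result; it is not a VerbalisAI SDK call. Keep max_workers small, because concurrency does not raise the upload limit of 20 files per 15 minutes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def transcribe_many(
    paths: Iterable[str],
    transcribe: Callable[[str], dict],  # placeholder for your single-file upload function
    max_workers: int = 4,
) -> Tuple[Dict[str, dict], Dict[str, Exception]]:
    """Run `transcribe` over several files concurrently.

    Failures are collected per file instead of aborting the whole batch.
    """
    results: Dict[str, dict] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(transcribe, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going; report per-file errors
                errors[path] = exc
    return results, errors
```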
Error Handling
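Transient failures such as timeouts, rate limiting, or server overload are usually worth retrying, while errors like an unsupported file are not. A minimal retry sketch with exponential backoff and jitter. TransientError is a hypothetical marker class: map whatever errors your HTTP client raises for retryable cases onto it.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limiting, overload)."""


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying TransientError with exponential backoff and jitter.

    Any other exception propagates immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1 s, 2 s, 4 s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads retries out
```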
Troubleshooting
Common Issues
Low Confidence Scores
Causes:
- Poor audio quality
- Background noise
- Multiple speakers
- Strong accents
Solutions:
- Apply noise reduction
- Try a different language setting
- Use the accurate model
- Process the audio in smaller segments
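To apply these fixes selectively, it helps to know where the low-confidence parts are. This sketch merges nearby low-confidence words into time spans that you can review or re-process in smaller segments with the accurate model. It assumes hypothetical per-word start, end, and confidence keys with confidence on a 0 to 1 scale. The 0.6 threshold and 1-second gap are arbitrary starting points, not documented values.

```python
def low_confidence_spans(words: list, threshold: float = 0.6, max_gap: float = 1.0) -> list:
    """Merge nearby low-confidence words into (start, end) spans in seconds."""
    spans = []
    for w in words:
        if w["confidence"] >= threshold:
            continue
        if spans and w["start"] - spans[-1][1] <= max_gap:
            spans[-1][1] = max(spans[-1][1], w["end"])  # extend the current span
        else:
            spans.append([w["start"], w["end"]])  # start a new span
    return [(start, end) for start, end in spans]
```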
Slow Processing
Causes:
- Large file size
- High server load
- Accurate model selected
Solutions:
- Use the fast model for quicker results
- Process during off-peak hours
- Split large files into chunks
- Use appropriate file compression
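Before splitting a large file, it helps to plan the chunk boundaries. This sketch computes time windows with a small overlap so a word cut at one boundary is heard whole in the next chunk. The actual audio cutting needs a separate tool and is not shown. The 10-minute chunk and 5-second overlap are arbitrary choices. Overlap means slightly more audio is processed (credits are roughly 1 per minute), and text repeated at chunk edges has to be de-duplicated when merging.

```python
def plan_chunks(duration: float, chunk_seconds: float = 600.0, overlap: float = 5.0) -> list:
    """Split [0, duration] into (start, end) windows of at most `chunk_seconds`.

    Consecutive windows overlap by `overlap` seconds.
    """
    if chunk_seconds <= overlap:
        raise ValueError("chunk_seconds must be larger than overlap")
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_seconds, duration)
        chunks.append((start, end))
        if end >= duration:
            break
        start = end - overlap  # step back so chunks share `overlap` seconds
    return chunks
```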
Missing Words
Causes:
- Soft speech
- Technical terms
- Poor audio quality
Solutions:
- Increase the input volume
- Use the accurate model
- Specify the correct language
- Add custom vocabulary (coming soon)
Rate Limits and Credits
- Upload limit: 20 files per 15 minutes
- Credit usage: ~1 credit per minute of audio
- Processing time: usually 10-30% of the audio duration
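The figures above can be turned into a client-side guard and a rough estimate. This sketch uses a sliding-window limiter that defaults to 20 uploads per 15 minutes, so a batch job waits instead of hitting the limit. The server remains the authority, so a 429-style rejection can still occur. The estimate simply applies the approximate figures above (about 1 credit per minute, 10-30% processing time), so treat its output as a ballpark, not a quote.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class UploadRateLimiter:
    """Client-side sliding window: at most `limit` uploads per `window` seconds."""

    def __init__(self, limit: int = 20, window: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next upload is allowed (0 if allowed now)."""
        now = self.clock()
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()  # forget uploads that have left the window
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until an upload slot is free, then record the upload."""
        while (delay := self.wait_time()) > 0:
            sleep(delay)
        self._sent.append(self.clock())


def estimate(duration_seconds: float) -> Tuple[float, Tuple[float, float]]:
    """Approximate credits (~1 per minute) and processing-time range (10-30% of duration)."""
    credits = duration_seconds / 60
    return credits, (0.10 * duration_seconds, 0.30 * duration_seconds)
```

Call limiter.acquire() immediately before each upload. For a 30-minute recording, estimate(1800) gives about 30 credits and roughly 3 to 9 minutes of processing.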