Overview
VerbalisAI’s transcription API provides high-quality speech-to-text conversion with advanced features like timestamps, confidence scores, and AI-powered analysis. This guide covers everything you need to know to get the most out of the transcription service.
Basic Transcription
Simple Upload and Transcribe
The most basic use case is uploading an audio file and getting back the transcribed text:
const formData = new FormData();
formData.append('file', audioFile);

const response = await fetch('https://api.verbalisai.com/v1/transcript', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: formData
});

const result = await response.json();
console.log('Transcription:', result.data.text);
Advanced Features
Language Detection and Specification
VerbalisAI can automatically detect the language, or you can specify it for better accuracy:
// Auto-detect language (the default when no language is set)
const formData = new FormData();
formData.append('file', audioFile);

// Or specify a single language for better accuracy:
formData.append('language', 'en'); // English
// formData.append('language', 'es'); // Spanish
// formData.append('language', 'fr'); // French
Model Selection
Choose between accuracy and speed:
// High accuracy (slower)
formData.append('model', 'accurate');

// Fast processing (less accurate)
formData.append('model', 'fast');
Timestamps and Confidence Scores
Get detailed timing and confidence information:
formData.append('include_timestamps', 'true');
formData.append('include_confidence', 'true');

// Response includes detailed word-level data
const result = await response.json();
result.data.words.forEach(word => {
  console.log(`${word.word}: ${word.start}s - ${word.end}s (confidence: ${word.confidence})`);
});
Working with Results
Understanding the Response Structure
A complete transcription response includes:
sentences: Sentence-level segments with timestamps
words: Individual words with precise timing
summary: AI-generated insights (entities, topics, sentiment)
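For reference, a full response might look like the sketch below. The three fields above come from this guide; the surrounding shape (`success`, `id`, `duration`, and the exact sub-fields) is an illustrative assumption, not a guaranteed schema:

```javascript
// Hypothetical sample of a full transcription response; the exact
// field set your account receives may differ.
const sampleResponse = {
  success: true,
  data: {
    id: 'tr_123',          // transcription ID, used in follow-up requests
    text: 'Hello world.',
    duration: 1.5,         // seconds of audio
    sentences: [
      { text: 'Hello world.', start: 0.0, end: 1.5 }
    ],
    words: [
      { word: 'Hello', start: 0.0, end: 0.6, confidence: 0.98 },
      { word: 'world.', start: 0.7, end: 1.5, confidence: 0.95 }
    ],
    summary: {
      topics: ['greeting'],
      entities: [],
      sentiment: 'neutral'
    }
  }
};

// Accessing nested data works the same as with a live response
const firstWord = sampleResponse.data.words[0];
console.log(`${firstWord.word} (${firstWord.confidence})`);
```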
Sentence-Level Processing
For applications like subtitle generation or content analysis:
// Get detailed sentence data
const sentencesResponse = await fetch(
  `https://api.verbalisai.com/v1/transcript/id/${transcriptionId}/sentences`,
  {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
  }
);
const sentences = await sentencesResponse.json();

// Generate subtitles
sentences.data.sentences.forEach((sentence, index) => {
  console.log(`${index + 1}`);
  console.log(`${formatTime(sentence.start)} --> ${formatTime(sentence.end)}`);
  console.log(sentence.text);
  console.log('');
});

function formatTime(seconds) {
  const hours = Math.floor(seconds / 3600);
  const minutes = Math.floor((seconds % 3600) / 60);
  const secs = Math.floor(seconds % 60);
  const ms = Math.floor((seconds % 1) * 1000);
  return `${hours.toString().padStart(2, '0')}:${minutes.toString().padStart(2, '0')}:${secs.toString().padStart(2, '0')},${ms.toString().padStart(3, '0')}`;
}
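Rather than printing each cue, you may want the whole SRT file as a single string, e.g. to save to disk. A minimal self-contained sketch, assuming only the `{ text, start, end }` sentence fields shown above:

```javascript
// Build a complete SRT subtitle string from sentence segments.
// Assumes each sentence has { text, start, end } with times in seconds.
function buildSrt(sentences) {
  const pad = (n, width) => String(n).padStart(width, '0');
  const stamp = (seconds) => {
    const h = Math.floor(seconds / 3600);
    const m = Math.floor((seconds % 3600) / 60);
    const s = Math.floor(seconds % 60);
    const ms = Math.round((seconds % 1) * 1000);
    return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
  };
  return sentences
    .map((sentence, i) =>
      `${i + 1}\n${stamp(sentence.start)} --> ${stamp(sentence.end)}\n${sentence.text}\n`)
    .join('\n');
}

const srt = buildSrt([
  { text: 'Hello world.', start: 0, end: 1.5 },
  { text: 'Goodbye.', start: 2, end: 3.25 }
]);
console.log(srt);
```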
Real-World Use Cases
Meeting Transcription
async function transcribeMeeting(audioFile, meetingInfo) {
  const formData = new FormData();
  formData.append('file', audioFile);
  formData.append('language', 'en');
  formData.append('model', 'accurate');
  formData.append('include_timestamps', 'true');

  const response = await fetch('https://api.verbalisai.com/v1/transcript', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
    body: formData
  });
  const result = await response.json();

  // Process meeting summary
  const summary = {
    title: meetingInfo.title,
    date: new Date().toISOString(),
    duration: result.data.duration,
    participants: meetingInfo.participants,
    transcript: result.data.text,
    keyTopics: result.data.summary.topics,
    actionItems: extractActionItems(result.data.sentences)
  };

  return summary;
}

function extractActionItems(sentences) {
  return sentences
    .filter(s => s.text.toLowerCase().includes('action') ||
                 s.text.toLowerCase().includes('todo') ||
                 s.text.toLowerCase().includes('follow up'))
    .map(s => ({
      text: s.text,
      timestamp: s.start
    }));
}
Podcast Processing
async function processPodcast(audioFile) {
  // First, transcribe the audio
  const transcription = await transcribeAudio(audioFile);

  // Get detailed sentences for chapters
  const sentences = await fetch(
    `https://api.verbalisai.com/v1/transcript/id/${transcription.data.id}/sentences`,
    { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
  ).then(r => r.json());

  // Create chapters based on topic changes
  const chapters = createChapters(sentences.data.sentences);

  // Generate show notes
  const showNotes = {
    title: extractTitle(transcription.data.text),
    summary: transcription.data.summary,
    chapters: chapters,
    transcript: transcription.data.text,
    keywords: transcription.data.summary.entities
  };

  return showNotes;
}

function createChapters(sentences) {
  // Simple chapter detection based on long pauses
  const chapters = [];
  let currentChapter = { start: 0, sentences: [] };

  sentences.forEach((sentence, index) => {
    if (index > 0) {
      const gap = sentence.start - sentences[index - 1].end;
      if (gap > 3) { // 3+ second gap indicates a chapter break
        currentChapter.end = sentences[index - 1].end;
        currentChapter.title = generateChapterTitle(currentChapter.sentences);
        chapters.push(currentChapter);
        currentChapter = { start: sentence.start, sentences: [] };
      }
    }
    currentChapter.sentences.push(sentence);
  });

  // Push the final chapter, which the loop above never closes
  if (currentChapter.sentences.length > 0) {
    currentChapter.end = sentences[sentences.length - 1].end;
    currentChapter.title = generateChapterTitle(currentChapter.sentences);
    chapters.push(currentChapter);
  }

  return chapters;
}
Best Practices
File Preparation
Audio Quality
Use clear, high-quality audio (16kHz+)
Minimize background noise
Ensure good microphone placement
Consider noise reduction preprocessing
File Format
MP3 or WAV for best compatibility
FLAC for highest quality
Avoid heavily compressed formats
Keep files under 1GB
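You can enforce these limits client-side before spending an upload. This is a minimal sketch; the accepted extension list is an assumption based on the formats recommended above, not an official whitelist:

```javascript
// Quick client-side check before uploading; limits mirror the guidance
// above (recommended formats, files under 1GB).
const ACCEPTED_EXTENSIONS = ['mp3', 'wav', 'flac']; // assumed list
const MAX_BYTES = 1024 * 1024 * 1024; // 1GB

function validateAudioFile(name, sizeBytes) {
  const ext = name.split('.').pop().toLowerCase();
  if (!ACCEPTED_EXTENSIONS.includes(ext)) {
    return { ok: false, reason: `Unsupported format: .${ext}` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: 'File exceeds the 1GB limit' };
  }
  return { ok: true };
}

console.log(validateAudioFile('meeting.mp3', 25 * 1024 * 1024)); // passes
console.log(validateAudioFile('meeting.ogv', 1000));             // rejected format
```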
// For large files, consider chunking
async function transcribeLargeFile(audioFile) {
  if (audioFile.size > 100 * 1024 * 1024) { // > 100MB
    console.log('Large file detected, this may take longer...');

    // Consider using the fast model for an initial pass
    const quickResult = await transcribeWithModel(audioFile, 'fast');

    // Then use the accurate model if needed
    if (quickResult.data.confidence < 0.9) {
      return await transcribeWithModel(audioFile, 'accurate');
    }
    return quickResult;
  }
  return await transcribeWithModel(audioFile, 'accurate');
}
Error Handling
async function robustTranscribe(audioFile, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await transcribeAudio(audioFile);
      if (result.success) {
        return result;
      }
      throw new Error(result.message);
    } catch (error) {
      console.log(`Attempt ${attempt} failed:`, error.message);
      if (attempt === maxRetries) {
        throw error;
      }
      // Exponential backoff
      await new Promise(resolve =>
        setTimeout(resolve, Math.pow(2, attempt) * 1000)
      );
    }
  }
}
Troubleshooting
Common Issues
Inaccurate Transcriptions
Causes:
Poor audio quality
Background noise
Multiple speakers
Strong accents
Solutions:
Use noise reduction
Try different language settings
Use accurate model
Process in smaller segments
Slow Processing
Causes:
Large file size
High server load
Accurate model selected
Solutions:
Use fast model for quick results
Process during off-peak hours
Break large files into chunks
Use appropriate file compression
Missing or Incorrect Words
Causes:
Soft speech
Technical terms
Poor audio quality
Solutions:
Increase input volume
Use accurate model
Specify correct language
Add custom vocabulary (coming soon)
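One practical way to catch missed or misheard words is to scan the word-level confidence scores and flag weak spots for manual review or re-transcription. A small sketch, assuming the `confidence` field shown earlier:

```javascript
// Flag words whose confidence falls below a threshold so they can be
// reviewed manually or re-run with the accurate model.
function flagLowConfidence(words, threshold = 0.7) {
  return words
    .filter(w => w.confidence < threshold)
    .map(w => ({ word: w.word, at: w.start, confidence: w.confidence }));
}

const flagged = flagLowConfidence([
  { word: 'quarterly', start: 1.2, confidence: 0.95 },
  { word: 'EBITDA', start: 1.9, confidence: 0.42 },
  { word: 'report', start: 2.6, confidence: 0.88 }
]);
console.log(flagged); // only the 0.42-confidence technical term is flagged
```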
Rate Limits and Credits
Upload limit: 20 files per 15 minutes
Credit usage: ~1 credit per minute of audio
Processing time: usually 10-30% of audio duration
Monitor your usage with the billing endpoints to avoid service interruptions.
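To stay within these limits programmatically, you can track upload timestamps and estimate credit cost locally. The 20-per-15-minutes window and ~1 credit per minute come from the figures above; the helpers themselves are just a sketch (rounding up per started minute is an assumption):

```javascript
// Local bookkeeping against the documented limits:
// 20 uploads per 15 minutes, ~1 credit per minute of audio.
const WINDOW_MS = 15 * 60 * 1000;
const MAX_UPLOADS_PER_WINDOW = 20;

function canUpload(recentUploadTimes, now = Date.now()) {
  const inWindow = recentUploadTimes.filter(t => now - t < WINDOW_MS);
  return inWindow.length < MAX_UPLOADS_PER_WINDOW;
}

function estimateCredits(durationSeconds) {
  return Math.ceil(durationSeconds / 60); // ~1 credit per started minute
}

const now = Date.now();
// 20 uploads, all made one minute ago: the window is full
const uploads = Array.from({ length: 20 }, () => now - 60 * 1000);
console.log(canUpload(uploads, now));             // false
console.log(canUpload(uploads.slice(0, 5), now)); // true
console.log(estimateCredits(330));                // 6 credits for 5.5 minutes
```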
Next Steps