Overview

VerbalisAI’s transcription API provides high-quality speech-to-text conversion with advanced features like timestamps, confidence scores, and AI-powered analysis. This guide covers everything you need to know to get the most out of the transcription service.

Basic Transcription

Simple Upload and Transcribe

The most basic use case is uploading an audio file and getting back the transcribed text:

const formData = new FormData();
formData.append('file', audioFile);

const response = await fetch('https://api.verbalisai.com/v1/transcript', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: formData
});

const result = await response.json();
console.log('Transcription:', result.data.text);

Advanced Features

Language Detection and Specification

VerbalisAI can automatically detect the language, or you can specify one for better accuracy:

// Auto-detect language (simply omit the 'language' field)
const formData = new FormData();
formData.append('file', audioFile);

// Or specify a language code for better accuracy
formData.append('language', 'en'); // 'en' English, 'es' Spanish, 'fr' French, ...

Model Selection

Choose between accuracy and speed:

// High accuracy (slower)
formData.append('model', 'accurate');

// Fast processing (less accurate)
formData.append('model', 'fast');

Timestamps and Confidence Scores

Get detailed timing and confidence information:

formData.append('include_timestamps', 'true');
formData.append('include_confidence', 'true');

// Response includes detailed word-level data
const result = await response.json();
result.data.words.forEach(word => {
  console.log(`${word.word}: ${word.start}s - ${word.end}s (confidence: ${word.confidence})`);
});
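Word-level confidence is useful for quality review. As a sketch (the 0.8 threshold here is an arbitrary assumption, not an API recommendation), you can flag words that may need human correction:

```javascript
// Return words below a confidence threshold for manual review.
function lowConfidenceWords(words, threshold = 0.8) {
  return words.filter(w => w.confidence < threshold);
}

// Example with sample word-level data:
const sampleWords = [
  { word: 'hello', start: 0.0, end: 0.4, confidence: 0.97 },
  { word: 'wrold', start: 0.4, end: 0.9, confidence: 0.52 }
];
console.log(lowConfidenceWords(sampleWords)); // flags only 'wrold'
```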

Working with Results

Understanding the Response Structure

A complete transcription response includes the transcribed text along with metadata such as the transcript ID, audio duration, word-level timings, and any requested summary data.
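The exact schema may vary; based on the fields used throughout this guide, a response with timestamps and confidence enabled resembles the sketch below (values are illustrative):

```javascript
// Illustrative response shape, inferred from the fields used in this guide.
const exampleResponse = {
  success: true,
  data: {
    id: 'tr_abc123',            // transcript ID, used by the /sentences endpoint
    text: 'Welcome to the show.',
    duration: 2.4,              // seconds of audio
    words: [
      { word: 'Welcome', start: 0.0, end: 0.5, confidence: 0.98 },
      { word: 'to', start: 0.5, end: 0.7, confidence: 0.99 }
    ],
    sentences: [
      { text: 'Welcome to the show.', start: 0.0, end: 2.4 }
    ],
    summary: { topics: ['introduction'], entities: [] }
  }
};
```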

Sentence-Level Processing

For applications like subtitle generation or content analysis:

// Get detailed sentence data
const sentencesResponse = await fetch(
  `https://api.verbalisai.com/v1/transcript/id/${transcriptionId}/sentences`,
  {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
  }
);

const sentences = await sentencesResponse.json();

// Generate subtitles
sentences.data.sentences.forEach((sentence, index) => {
  console.log(`${index + 1}`);
  console.log(`${formatTime(sentence.start)} --> ${formatTime(sentence.end)}`);
  console.log(sentence.text);
  console.log('');
});

function formatTime(seconds) {
  const hours = Math.floor(seconds / 3600);
  const minutes = Math.floor((seconds % 3600) / 60);
  const secs = Math.floor(seconds % 60);
  const ms = Math.floor((seconds % 1) * 1000);
  
  return `${hours.toString().padStart(2, '0')}:${minutes.toString().padStart(2, '0')}:${secs.toString().padStart(2, '0')},${ms.toString().padStart(3, '0')}`;
}

Real-World Use Cases

Meeting Transcription

async function transcribeMeeting(audioFile, meetingInfo) {
  const formData = new FormData();
  formData.append('file', audioFile);
  formData.append('language', 'en');
  formData.append('model', 'accurate');
  formData.append('include_timestamps', 'true');
  
  const response = await fetch('https://api.verbalisai.com/v1/transcript', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
    body: formData
  });
  
  const result = await response.json();
  
  // Process meeting summary
  const summary = {
    title: meetingInfo.title,
    date: new Date().toISOString(),
    duration: result.data.duration,
    participants: meetingInfo.participants,
    transcript: result.data.text,
    keyTopics: result.data.summary.topics,
    actionItems: extractActionItems(result.data.sentences)
  };
  
  return summary;
}

function extractActionItems(sentences) {
  const keywords = ['action', 'todo', 'follow up'];
  return sentences
    .filter(s => {
      const text = s.text.toLowerCase();
      return keywords.some(keyword => text.includes(keyword));
    })
    .map(s => ({
      text: s.text,
      timestamp: s.start
    }));
}

Podcast Processing

async function processPodcast(audioFile) {
  // First, transcribe the audio
  const transcription = await transcribeAudio(audioFile);
  
  // Get detailed sentences for chapters
  const sentences = await fetch(
    `https://api.verbalisai.com/v1/transcript/id/${transcription.data.id}/sentences`,
    { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
  ).then(r => r.json());
  
  // Create chapters based on topic changes
  const chapters = createChapters(sentences.data.sentences);
  
  // Generate show notes
  const showNotes = {
    title: extractTitle(transcription.data.text),
    summary: transcription.data.summary,
    chapters: chapters,
    transcript: transcription.data.text,
    keywords: transcription.data.summary.entities
  };
  
  return showNotes;
}

function createChapters(sentences) {
  // Simple chapter detection based on long pauses
  const chapters = [];
  let currentChapter = { start: 0, sentences: [] };
  
  sentences.forEach((sentence, index) => {
    if (index > 0) {
      const gap = sentence.start - sentences[index - 1].end;
      if (gap > 3) { // a 3+ second gap indicates a chapter break
        currentChapter.end = sentences[index - 1].end;
        currentChapter.title = generateChapterTitle(currentChapter.sentences);
        chapters.push(currentChapter);
        currentChapter = { start: sentence.start, sentences: [] };
      }
    }
    currentChapter.sentences.push(sentence);
  });
  
  // Don't forget the final chapter
  if (currentChapter.sentences.length > 0) {
    currentChapter.end = sentences[sentences.length - 1].end;
    currentChapter.title = generateChapterTitle(currentChapter.sentences);
    chapters.push(currentChapter);
  }
  
  return chapters;
}

// Simple placeholder title; replace with your own summarization logic
function generateChapterTitle(sentences) {
  return sentences.length > 0 ? sentences[0].text.slice(0, 50) : 'Untitled';
}

Best Practices

File Preparation

Audio Quality

  • Use clear, high-quality audio (16kHz+)
  • Minimize background noise
  • Ensure good microphone placement
  • Consider noise reduction preprocessing

File Format

  • MP3 or WAV for best compatibility
  • FLAC for highest quality
  • Avoid heavily compressed formats
  • Keep files under 1GB
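The guidance above can be checked client-side before uploading. This sketch assumes the 1GB limit stated above and a few common audio extensions (check the API reference for the authoritative list):

```javascript
const MAX_FILE_SIZE = 1024 * 1024 * 1024; // 1GB, per the guidance above
const SUPPORTED_EXTENSIONS = ['mp3', 'wav', 'flac']; // illustrative, not exhaustive

// Validate a file before uploading; returns a list of problems (empty = OK).
function validateAudioFile(name, sizeBytes) {
  const problems = [];
  const extension = name.split('.').pop().toLowerCase();
  if (!SUPPORTED_EXTENSIONS.includes(extension)) {
    problems.push(`Unsupported extension: .${extension}`);
  }
  if (sizeBytes > MAX_FILE_SIZE) {
    problems.push('File exceeds the 1GB limit');
  }
  return problems;
}
```

Running this check before building the FormData saves a round trip to the API for files that would be rejected anyway.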

Performance Optimization

// For large files, consider chunking
async function transcribeLargeFile(audioFile) {
  if (audioFile.size > 100 * 1024 * 1024) { // > 100MB
    console.log('Large file detected, this may take longer...');
    
    // Consider using fast model for initial pass
    const quickResult = await transcribeWithModel(audioFile, 'fast');
    
    // Then use accurate model if needed
    if (quickResult.data.confidence < 0.9) {
      return await transcribeWithModel(audioFile, 'accurate');
    }
    
    return quickResult;
  }
  
  return await transcribeWithModel(audioFile, 'accurate');
}

Error Handling

async function robustTranscribe(audioFile, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await transcribeAudio(audioFile);
      
      if (result.success) {
        return result;
      }
      
      throw new Error(result.message);
    } catch (error) {
      console.log(`Attempt ${attempt} failed:`, error.message);
      
      if (attempt === maxRetries) {
        throw error;
      }
      
      // Exponential backoff
      await new Promise(resolve => 
        setTimeout(resolve, Math.pow(2, attempt) * 1000)
      );
    }
  }
}

Troubleshooting

Common Issues

Rate Limits and Credits

  • Upload limit: 20 files per 15 minutes
  • Credit usage: ~1 credit per minute of audio
  • Processing time: Usually 10-30% of audio duration

Monitor your usage with the billing endpoints to avoid service interruptions.
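The numbers above translate into a rough pre-upload estimate. A sketch, assuming the ~1 credit per minute and 10-30% processing-time figures quoted above:

```javascript
// Rough cost/time estimate from audio duration, using the figures above.
function estimateJob(durationSeconds) {
  const minutes = durationSeconds / 60;
  return {
    credits: Math.ceil(minutes),          // ~1 credit per minute
    processingSeconds: {                  // usually 10-30% of audio duration
      min: Math.round(durationSeconds * 0.1),
      max: Math.round(durationSeconds * 0.3)
    }
  };
}

console.log(estimateJob(600));
// → { credits: 10, processingSeconds: { min: 60, max: 180 } }
```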

Next Steps