Overview
VerbalisAI’s transcription API provides high-quality speech-to-text conversion with advanced features like timestamps, confidence scores, and AI-powered analysis. This guide covers everything you need to know to get the most out of the transcription service.
Basic Transcription
Simple Upload and Transcribe
The most basic use case is uploading an audio file and getting back the transcribed text:
const formData = new FormData();
formData.append('file', audioFile);

const response = await fetch('https://api.verbalisai.com/v1/transcript', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: formData
});

const result = await response.json();
console.log('Transcription:', result.data.text);
Advanced Features
Language Detection and Specification
VerbalisAI can automatically detect the language, or you can specify it for better accuracy:
// Auto-detect language (the default when no language is set)
const formData = new FormData();
formData.append('file', audioFile);

// Or specify a single language for better accuracy:
formData.append('language', 'en'); // English
// formData.append('language', 'es'); // Spanish
// formData.append('language', 'fr'); // French
Model Selection
Choose between accuracy and speed:
// High accuracy (slower)
formData.append('model', 'accurate');

// Fast processing (less accurate)
formData.append('model', 'fast');
Timestamps and Confidence Scores
Get detailed timing and confidence information:
formData.append('include_timestamps', 'true');
formData.append('include_confidence', 'true');

// Response includes detailed word-level data
const result = await response.json();
result.data.words.forEach(word => {
  console.log(`${word.word}: ${word.start}s - ${word.end}s (confidence: ${word.confidence})`);
});
Working with Results
Understanding the Response Structure
A complete transcription response includes:
sentences: Sentence-level segments with timestamps
words: Individual words with precise timing
summary: AI-generated insights (entities, topics, sentiment)
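For reference, a full response might look like the sketch below. The three fields above come from this guide; the surrounding shape (`success`, `id`, `duration`, and the exact sub-fields) is an illustrative assumption, not a guaranteed schema:

```javascript
// Hypothetical sample of a full transcription response; the exact
// field set your account receives may differ.
const sampleResponse = {
  success: true,
  data: {
    id: 'tr_123',          // transcription ID, used in follow-up requests
    text: 'Hello world.',
    duration: 1.5,         // seconds of audio
    sentences: [
      { text: 'Hello world.', start: 0.0, end: 1.5 }
    ],
    words: [
      { word: 'Hello', start: 0.0, end: 0.6, confidence: 0.98 },
      { word: 'world.', start: 0.7, end: 1.5, confidence: 0.95 }
    ],
    summary: {
      topics: ['greeting'],
      entities: [],
      sentiment: 'neutral'
    }
  }
};

// Accessing nested data works the same as with a live response
const firstWord = sampleResponse.data.words[0];
console.log(`${firstWord.word} (${firstWord.confidence})`);
```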
Sentence-Level Processing
For applications like subtitle generation or content analysis:
// Get detailed sentence data
const sentencesResponse = await fetch(
  `https://api.verbalisai.com/v1/transcript/id/${transcriptionId}/sentences`,
  {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
  }
);
const sentences = await sentencesResponse.json();

// Generate subtitles
sentences.data.sentences.forEach((sentence, index) => {
  console.log(`${index + 1}`);
  console.log(`${formatTime(sentence.start)} --> ${formatTime(sentence.end)}`);
  console.log(sentence.text);
  console.log('');
});

function formatTime(seconds) {
  const hours = Math.floor(seconds / 3600);
  const minutes = Math.floor((seconds % 3600) / 60);
  const secs = Math.floor(seconds % 60);
  const ms = Math.floor((seconds % 1) * 1000);
  return `${hours.toString().padStart(2, '0')}:${minutes.toString().padStart(2, '0')}:${secs.toString().padStart(2, '0')},${ms.toString().padStart(3, '0')}`;
}
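Rather than printing each cue, you may want the whole SRT file as a single string, e.g. to save to disk. A minimal self-contained sketch, assuming only the `{ text, start, end }` sentence fields shown above:

```javascript
// Build a complete SRT subtitle string from sentence segments.
// Assumes each sentence has { text, start, end } with times in seconds.
function buildSrt(sentences) {
  const pad = (n, width) => String(n).padStart(width, '0');
  const stamp = (seconds) => {
    const h = Math.floor(seconds / 3600);
    const m = Math.floor((seconds % 3600) / 60);
    const s = Math.floor(seconds % 60);
    const ms = Math.round((seconds % 1) * 1000);
    return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
  };
  return sentences
    .map((sentence, i) =>
      `${i + 1}\n${stamp(sentence.start)} --> ${stamp(sentence.end)}\n${sentence.text}\n`)
    .join('\n');
}

const srt = buildSrt([
  { text: 'Hello world.', start: 0, end: 1.5 },
  { text: 'Goodbye.', start: 2, end: 3.25 }
]);
console.log(srt);
```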
Real-World Use Cases
Meeting Transcription
async function transcribeMeeting(audioFile, meetingInfo) {
  const formData = new FormData();
  formData.append('file', audioFile);
  formData.append('language', 'en');
  formData.append('model', 'accurate');
  formData.append('include_timestamps', 'true');

  const response = await fetch('https://api.verbalisai.com/v1/transcript', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
    body: formData
  });
  const result = await response.json();

  // Process meeting summary
  const summary = {
    title: meetingInfo.title,
    date: new Date().toISOString(),
    duration: result.data.duration,
    participants: meetingInfo.participants,
    transcript: result.data.text,
    keyTopics: result.data.summary.topics,
    actionItems: extractActionItems(result.data.sentences)
  };

  return summary;
}

function extractActionItems(sentences) {
  return sentences
    .filter(s => s.text.toLowerCase().includes('action') ||
                 s.text.toLowerCase().includes('todo') ||
                 s.text.toLowerCase().includes('follow up'))
    .map(s => ({
      text: s.text,
      timestamp: s.start
    }));
}
Podcast Processing
async function processPodcast(audioFile) {
  // First, transcribe the audio
  const transcription = await transcribeAudio(audioFile);

  // Get detailed sentences for chapters
  const sentences = await fetch(
    `https://api.verbalisai.com/v1/transcript/id/${transcription.data.id}/sentences`,
    { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
  ).then(r => r.json());

  // Create chapters based on topic changes
  const chapters = createChapters(sentences.data.sentences);

  // Generate show notes
  const showNotes = {
    title: extractTitle(transcription.data.text),
    summary: transcription.data.summary,
    chapters: chapters,
    transcript: transcription.data.text,
    keywords: transcription.data.summary.entities
  };

  return showNotes;
}

function createChapters(sentences) {
  // Simple chapter detection based on long pauses
  const chapters = [];
  let currentChapter = { start: 0, sentences: [] };

  sentences.forEach((sentence, index) => {
    if (index > 0) {
      const gap = sentence.start - sentences[index - 1].end;
      if (gap > 3) { // 3+ second gap indicates a chapter break
        currentChapter.end = sentences[index - 1].end;
        currentChapter.title = generateChapterTitle(currentChapter.sentences);
        chapters.push(currentChapter);
        currentChapter = { start: sentence.start, sentences: [] };
      }
    }
    currentChapter.sentences.push(sentence);
  });

  // Push the final chapter, which the loop above never closes
  if (currentChapter.sentences.length > 0) {
    currentChapter.end = sentences[sentences.length - 1].end;
    currentChapter.title = generateChapterTitle(currentChapter.sentences);
    chapters.push(currentChapter);
  }

  return chapters;
}
Best Practices
File Preparation
Audio Quality
Use clear, high-quality audio (16kHz+)
Minimize background noise
Ensure good microphone placement
Consider noise reduction preprocessing
File Format
MP3 or WAV for best compatibility
FLAC for highest quality
Avoid heavily compressed formats
Keep files under 1GB
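You can enforce these limits client-side before spending an upload. This is a minimal sketch; the accepted extension list is an assumption based on the formats recommended above, not an official whitelist:

```javascript
// Quick client-side check before uploading; limits mirror the guidance
// above (recommended formats, files under 1GB).
const ACCEPTED_EXTENSIONS = ['mp3', 'wav', 'flac']; // assumed list
const MAX_BYTES = 1024 * 1024 * 1024; // 1GB

function validateAudioFile(name, sizeBytes) {
  const ext = name.split('.').pop().toLowerCase();
  if (!ACCEPTED_EXTENSIONS.includes(ext)) {
    return { ok: false, reason: `Unsupported format: .${ext}` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: 'File exceeds the 1GB limit' };
  }
  return { ok: true };
}

console.log(validateAudioFile('meeting.mp3', 25 * 1024 * 1024)); // passes
console.log(validateAudioFile('meeting.ogv', 1000));             // rejected format
```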
// For large files, consider chunking
async function transcribeLargeFile(audioFile) {
  if (audioFile.size > 100 * 1024 * 1024) { // > 100MB
    console.log('Large file detected, this may take longer...');

    // Consider using the fast model for an initial pass
    const quickResult = await transcribeWithModel(audioFile, 'fast');

    // Then use the accurate model if needed
    if (quickResult.data.confidence < 0.9) {
      return await transcribeWithModel(audioFile, 'accurate');
    }
    return quickResult;
  }
  return await transcribeWithModel(audioFile, 'accurate');
}
Error Handling
async function robustTranscribe(audioFile, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await transcribeAudio(audioFile);
      if (result.success) {
        return result;
      }
      throw new Error(result.message);
    } catch (error) {
      console.log(`Attempt ${attempt} failed:`, error.message);
      if (attempt === maxRetries) {
        throw error;
      }
      // Exponential backoff
      await new Promise(resolve =>
        setTimeout(resolve, Math.pow(2, attempt) * 1000)
      );
    }
  }
}
Troubleshooting
Common Issues
Inaccurate Transcriptions
Causes:
Poor audio quality
Background noise
Multiple speakers
Strong accents
Solutions:
Use noise reduction
Try different language settings
Use accurate model
Process in smaller segments
Slow Processing
Causes:
Large file size
High server load
Accurate model selected
Solutions:
Use fast model for quick results
Process during off-peak hours
Break large files into chunks
Use appropriate file compression
Missing or Incorrect Words
Causes:
Soft speech
Technical terms
Poor audio quality
Solutions:
Increase input volume
Use accurate model
Specify correct language
Add custom vocabulary (coming soon)
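One practical way to catch missed or misheard words is to scan the word-level confidence scores and flag weak spots for manual review or re-transcription. A small sketch, assuming the `confidence` field shown earlier:

```javascript
// Flag words whose confidence falls below a threshold so they can be
// reviewed manually or re-run with the accurate model.
function flagLowConfidence(words, threshold = 0.7) {
  return words
    .filter(w => w.confidence < threshold)
    .map(w => ({ word: w.word, at: w.start, confidence: w.confidence }));
}

const flagged = flagLowConfidence([
  { word: 'quarterly', start: 1.2, confidence: 0.95 },
  { word: 'EBITDA', start: 1.9, confidence: 0.42 },
  { word: 'report', start: 2.6, confidence: 0.88 }
]);
console.log(flagged); // only the 0.42-confidence technical term is flagged
```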
Rate Limits and Credits
Upload limit: 20 files per 15 minutes
Credit usage: ~1 credit per minute of audio
Processing time: usually 10-30% of audio duration
Monitor your usage with the billing endpoints to avoid service interruptions.
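To stay within these limits programmatically, you can track upload timestamps and estimate credit cost locally. The 20-per-15-minutes window and ~1 credit per minute come from the figures above; the helpers themselves are just a sketch (rounding up per started minute is an assumption):

```javascript
// Local bookkeeping against the documented limits:
// 20 uploads per 15 minutes, ~1 credit per minute of audio.
const WINDOW_MS = 15 * 60 * 1000;
const MAX_UPLOADS_PER_WINDOW = 20;

function canUpload(recentUploadTimes, now = Date.now()) {
  const inWindow = recentUploadTimes.filter(t => now - t < WINDOW_MS);
  return inWindow.length < MAX_UPLOADS_PER_WINDOW;
}

function estimateCredits(durationSeconds) {
  return Math.ceil(durationSeconds / 60); // ~1 credit per started minute
}

const now = Date.now();
// 20 uploads, all made one minute ago: the window is full
const uploads = Array.from({ length: 20 }, () => now - 60 * 1000);
console.log(canUpload(uploads, now));             // false
console.log(canUpload(uploads.slice(0, 5), now)); // true
console.log(estimateCredits(330));                // 6 credits for 5.5 minutes
```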
Next Steps