POST /v1/transcript
curl -X POST https://api.verbalisai.com/v1/transcript \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://example.com/audio.mp3",
    "model": "mini",
    "language": "auto",
    "timestamp_style": "segment",
    "diarize": false,
    "topics": true,
    "summarization": true,
    "summary_type": "bullets"
  }'
{
  "id": "clx1234567890abcdef",
  "audio_url": "https://example.com/audio.mp3",
  "status": "completed",
  "text": "Hello, this is a sample transcription of your audio file. The quality is excellent and the AI processing has detected relevant topics.",
  "topics": ["technology", "audio processing", "artificial intelligence"],
  "summary": {
    "text": "• Discussion about audio transcription technology\n• Positive feedback on transcription quality\n• Reference to AI processing capabilities",
    "type": "bullets",
    "language": "en"
  },
  "entities": [
    {
      "type": "product",
      "text": "audio file",
      "startIndex": 45,
      "endIndex": 55
    },
    {
      "type": "organization",
      "text": "AI processing",
      "startIndex": 89,
      "endIndex": 102
    }
  ],
  "duration": 12.5,
  "segments": [
    {
      "id": 0,
      "text": "Hello, this is a sample transcription of your audio file.",
      "start": 0.0,
      "end": 5.2,
      "speaker_id": null
    },
    {
      "id": 1,
      "text": "The quality is excellent and the AI processing has detected relevant topics.",
      "start": 5.3,
      "end": 12.5,
      "speaker_id": null
    }
  ]
}
Transcribe audio from a URL and receive a complete transcription with timestamps, AI analysis, and optional features like speaker diarization, topic detection, summarization, and PII redaction.
This endpoint requires API key authentication. Send the key as a Bearer token in the Authorization header, as in the curl example.
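
For readers calling the endpoint from Python rather than curl, the request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the URL and headers are copied from the curl example, while `build_transcript_request` and `send` are hypothetical helper names introduced here for illustration.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this page.
API_URL = "https://api.verbalisai.com/v1/transcript"


def build_transcript_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for /v1/transcript."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical usage: only builds the request, no network call is made here.
request = build_transcript_request(
    "YOUR_API_KEY",
    {"audio_url": "https://example.com/audio.mp3", "model": "mini", "topics": True},
)
```

Calling `send(request)` would perform the actual HTTP call; building and sending are kept separate so the request can be inspected or logged first.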

Request Parameters

  • audio_url (string, required) - URL to the audio file to transcribe (MP3, WAV, FLAC, M4A, OGG, WEBM, MP4)
  • model (string) - Transcription model to use (‘mini’, ‘nano’, ‘pro’). Default: ‘mini’
  • language (string) - Language code for transcription (e.g., ‘en’, ‘es’, ‘fr’, ‘auto’). Default: ‘auto’
  • timestamp_style (string) - Timestamp granularity (‘segment’ or ‘word’). Default: ‘segment’
  • diarize (boolean) - Enable speaker diarization to identify different speakers. Default: false
  • audio_start_from (number) - Start transcription from this time in seconds. Default: 0
  • audio_end_at (number) - End transcription at this time in seconds. Default: 0 (full audio)
  • content_safety (boolean) - Enable content safety filtering. Default: false
  • entity_detection (boolean) - Enable entity detection (person, location, organization, etc.). Default: false
  • entity_types (array) - Array of entity types to detect when entity_detection is enabled
  • topics (boolean) - Enable topic detection. Default: false
  • summarization (boolean) - Enable text summarization. Default: false
  • summary_language (string) - Language for summary generation. Default: ‘en’
  • summary_type (string) - Summary format (‘bullets’, ‘paragraphs’, ‘markdown’). Default: ‘bullets’
  • redact_pii (boolean) - Enable PII (Personally Identifiable Information) redaction. Default: false
  • redact_pii_policies (array) - Array of PII types to redact when redact_pii is enabled
  • redact_pii_sub (string) - PII substitution method (‘hash’, ‘mask’, ‘remove’). Default: ‘hash’
  • wait_until_complete (boolean) - Wait for complete processing before returning response. Default: false
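
Several of the parameters above accept only a fixed set of values, and two of them (entity_types, redact_pii_policies) apply only when their companion flag is enabled. The sketch below checks those constraints on the client before a request is sent. It assumes nothing about how the service itself validates input; `build_payload` and `ALLOWED` are hypothetical names used only for illustration.

```python
# Enumerated values as listed in the parameter descriptions.
ALLOWED = {
    "model": {"mini", "nano", "pro"},
    "timestamp_style": {"segment", "word"},
    "summary_type": {"bullets", "paragraphs", "markdown"},
    "redact_pii_sub": {"hash", "mask", "remove"},
}


def build_payload(audio_url: str, **options) -> dict:
    """Return a request body, rejecting values the parameter list does not document."""
    if not audio_url:
        raise ValueError("audio_url is required")

    for name, value in options.items():
        allowed = ALLOWED.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")

    # These arrays are documented as applying only when the matching flag is on.
    if options.get("entity_types") and not options.get("entity_detection"):
        raise ValueError("entity_types requires entity_detection to be true")
    if options.get("redact_pii_policies") and not options.get("redact_pii"):
        raise ValueError("redact_pii_policies requires redact_pii to be true")

    return {"audio_url": audio_url, **options}


payload = build_payload(
    "https://example.com/audio.mp3",
    model="mini",
    summarization=True,
    summary_type="bullets",
)
```

Options that are left out are simply omitted from the body, so the documented server-side defaults apply.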

Response Fields

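  • id (string) - Unique transcription identifier
  • audio_url (string) - Original audio URL that was transcribed
  • status (string) - Transcription status (‘completed’, ‘processing’, ‘failed’)
  • text (string) - Complete transcription text (with PII redacted if enabled)
  • duration (number) - Audio duration in seconds
  • topics (array) - Array of detected topics (if topic detection was enabled)
  • summary (object) - Generated summary; in the example response it contains text, type, and language
  • entities (array) - Detected entities; each entry in the example response has type, text, startIndex, and endIndex
  • segments (array) - Timestamped transcript segments; each entry in the example response has id, text, start, end, and speaker_id
  • redact_pii_audio (boolean) - Indicates if PII redaction was applied
  • redact_pii_policies (array) - Array of PII types that were redacted
  • redact_pii_sub (string) - PII substitution method used (‘hash’, ‘mask’, ‘remove’)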
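
As an illustration of consuming these fields, the sketch below checks status before reading the transcript and renders each segment with its start and end time. It assumes the segment shape shown in the example response (start and end in seconds, speaker_id possibly null); `format_timestamp` and `render_segments` are hypothetical helpers, not part of the API.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def render_segments(response: dict) -> list:
    """Return one printable line per segment of a completed transcript."""
    status = response.get("status")
    if status != "completed":
        # 'processing' and 'failed' carry no usable transcript yet.
        raise RuntimeError(f"transcript not ready: status={status!r}")

    lines = []
    for segment in response.get("segments") or []:
        speaker = segment.get("speaker_id")
        prefix = f"speaker {speaker}: " if speaker is not None else ""
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {prefix}{segment['text']}")
    return lines


sample = {
    "status": "completed",
    "segments": [
        {"id": 0, "text": "Hello.", "start": 0.0, "end": 5.2, "speaker_id": None},
        {"id": 1, "text": "Goodbye.", "start": 5.3, "end": 12.5, "speaker_id": None},
    ],
}
lines = render_segments(sample)
```

Raising on any status other than ‘completed’ keeps callers from silently treating a partial or failed result as a finished transcript.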

Supported Audio Formats

| Format | Notes |
|--------|-------|
| MP3 | Most common format |
| WAV | Uncompressed audio |
| FLAC | Lossless compression |
| M4A | Apple audio format |
| OGG | Open source format |
| WEBM | Web optimized |
| MP4 | Video with audio track |

Models Available

| Model | Description | Use Case |
|-------|-------------|----------|
| nano | Fastest, English-only | Quick transcriptions, real-time applications |
| mini | Balanced speed/accuracy | General purpose transcriptions |
| pro | Highest accuracy | Professional transcriptions, critical applications |
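
The trade-offs in the model table can be turned into a small selection rule. The sketch below is one possible heuristic, not a recommendation from the service: it only encodes that nano is English-only and fastest, pro is the most accurate, and mini is the default. The function name `pick_model` and the `priority` values are assumptions made for this example.

```python
def pick_model(language: str, priority: str = "balanced") -> str:
    """Choose a model name from the documented trade-offs.

    priority is a hypothetical knob: 'speed', 'balanced', or 'accuracy'.
    """
    if priority not in ("speed", "balanced", "accuracy"):
        raise ValueError(f"unknown priority: {priority!r}")
    if priority == "accuracy":
        return "pro"
    # nano is documented as English-only, so only pick it for explicit 'en'.
    if priority == "speed" and language == "en":
        return "nano"
    return "mini"


chosen = pick_model("en", priority="speed")
```

With language set to ‘auto’ the detected language is unknown in advance, so this sketch falls back to mini rather than risk sending non-English audio to nano.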

Entity Types

When entity_detection is enabled, you can detect these entity types:
  • person - Names of people
  • location - Geographic locations
  • organization - Company/organization names
  • event - Events and meetings
  • product - Product names
  • date - Date references
  • phone_number - Phone numbers
  • email - Email addresses
  • url - Web URLs
  • ip_address - IP addresses
  • credit_card - Credit card numbers
  • bank_account - Bank account numbers
  • ssn - Social Security Numbers
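
To show how the entity type list might be used from client code, the sketch below validates a requested subset against the types listed above and groups the entities of a response by type. It relies on each entity's text field rather than startIndex and endIndex, because the indexing convention for those offsets is not specified here. `entity_options` and `entities_by_type` are hypothetical helper names.

```python
# Entity types as listed in this section.
ENTITY_TYPES = {
    "person", "location", "organization", "event", "product", "date",
    "phone_number", "email", "url", "ip_address", "credit_card",
    "bank_account", "ssn",
}


def entity_options(types: list) -> dict:
    """Return the request options that enable detection for the given types."""
    unknown = sorted(set(types) - ENTITY_TYPES)
    if unknown:
        raise ValueError(f"unsupported entity types: {unknown}")
    return {"entity_detection": True, "entity_types": list(types)}


def entities_by_type(response: dict) -> dict:
    """Group the detected entity texts in a response by their type."""
    grouped = {}
    for entity in response.get("entities") or []:
        grouped.setdefault(entity["type"], []).append(entity["text"])
    return grouped


options = entity_options(["person", "email"])
grouped = entities_by_type({
    "entities": [
        {"type": "product", "text": "audio file", "startIndex": 45, "endIndex": 55},
        {"type": "organization", "text": "AI processing", "startIndex": 89, "endIndex": 102},
    ]
})
```

The returned options can be merged into the request body alongside audio_url.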

PII Redaction Policies

When redact_pii is enabled, these PII policies are available for redaction:
  • Personal identifiers (names, SSN, etc.)
  • Contact information (email, phone, address)
  • Financial information (credit cards, bank accounts)
  • Medical information
  • Other PII categories beyond those listed above

Notes

  • The audio_url must point to a publicly accessible file
  • Processing time varies with file length and model choice
  • Credits are charged based on audio duration and model used
  • Automatic language detection supports 50+ languages
  • Advanced features like diarization work best with longer audio segments
  • PII redaction is applied to the entire transcription result
  • Audio slicing (audio_start_from, audio_end_at) processes a specific time range without downloading the full file
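
Since audio_start_from and audio_end_at are given in seconds, a small conversion from clock-style times can make slicing requests easier to write. The sketch below is an illustration under that assumption only; `to_seconds` and `slice_options` are hypothetical helpers, and leaving out audio_end_at relies on the documented default of 0 meaning the full audio.

```python
from typing import Optional


def to_seconds(clock: str) -> float:
    """Convert 'SS', 'MM:SS', or 'HH:MM:SS' (fractions allowed) to seconds."""
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def slice_options(start: str = "0", end: Optional[str] = None) -> dict:
    """Return the request options that restrict transcription to a time range."""
    start_s = to_seconds(start)
    options = {"audio_start_from": start_s}
    if end is not None:
        end_s = to_seconds(end)
        if end_s <= start_s:
            raise ValueError("end must be later than start")
        options["audio_end_at"] = end_s
    return options


window = slice_options("01:30", "02:45")
```

Here `window` holds the two slicing parameters for the range from 1 minute 30 seconds to 2 minutes 45 seconds and can be merged into the request body.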