Transcribe audio from a URL and receive a complete transcription with timestamps, AI analysis, and optional features like speaker diarization, topic detection, summarization, and PII redaction.
This endpoint requires API key authentication.
audio_url: URL to the audio file to transcribe (MP3, WAV, FLAC, M4A, OGG, WEBM, MP4)
model: Transcription model to use (‘mini’, ‘nano’, ‘pro’). Default: ‘mini’
language: Language code for transcription (e.g., ‘en’, ‘es’, ‘fr’, ‘auto’). Default: ‘auto’
timestamp_style: Timestamp granularity (‘segment’ or ‘word’). Default: ‘segment’
diarize: Enable speaker diarization to identify different speakers. Default: false
Start transcription from this time in seconds. Default: 0
End transcription at this time in seconds. Default: 0 (full audio)
Enable content safety filtering. Default: false
entity_detection: Enable entity detection (person, location, organization, etc.). Default: false
Array of entity types to detect when entity_detection is enabled
topics: Enable topic detection. Default: false
summarization: Enable text summarization. Default: false
Language for summary generation. Default: ‘en’
summary_type: Summary format (‘bullets’, ‘paragraphs’, ‘markdown’). Default: ‘bullets’
redact_pii: Enable PII (Personally Identifiable Information) redaction. Default: false
Array of PII types to redact when redact_pii is enabled
PII substitution method (‘hash’, ‘mask’, ‘remove’). Default: ‘hash’
Wait for complete processing before returning response. Default: false
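Several parameters accept only a fixed set of values, so it can help to check them locally before sending a request. The following is a minimal sketch, not part of any official SDK: the field names `audio_url`, `model`, `timestamp_style`, and `summary_type` are the ones used in the request example on this page, while the helper name `validate_options` and the URL-scheme check are assumptions made for illustration.

```python
# Allowed values, copied from the parameter descriptions on this page.
ALLOWED_VALUES = {
    "model": {"mini", "nano", "pro"},
    "timestamp_style": {"segment", "word"},
    "summary_type": {"bullets", "paragraphs", "markdown"},
}


def validate_options(options: dict) -> list:
    """Return a list of problems found; an empty list means no problems were detected."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in options and options[key] not in allowed:
            problems.append(f"{key}={options[key]!r} is not one of {sorted(allowed)}")
    # Assumption: the audio must be reachable over HTTP(S), since it is fetched from a URL.
    if not str(options.get("audio_url", "")).startswith(("http://", "https://")):
        problems.append("audio_url must be an http(s) URL")
    return problems
```

An empty result only means these local checks passed; the API may still reject a request for other reasons.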
curl -X POST https://api.verbalisai.com/v1/transcript \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://example.com/audio.mp3",
    "model": "mini",
    "language": "auto",
    "timestamp_style": "segment",
    "diarize": false,
    "topics": true,
    "summarization": true,
    "summary_type": "bullets"
  }'
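The same request can be sketched in Python using only the standard library. The endpoint URL, headers, and body fields are taken from the curl example; the function names `build_request` and `transcribe` and the 60-second timeout are illustrative assumptions, not an official client.

```python
import json
import urllib.request

API_URL = "https://api.verbalisai.com/v1/transcript"  # endpoint from the curl example


def build_request(api_key: str, audio_url: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON body."""
    payload = {"audio_url": audio_url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def transcribe(api_key: str, audio_url: str, **options) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(api_key, audio_url, **options)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

A call such as `transcribe("YOUR_API_KEY", "https://example.com/audio.mp3", model="mini", topics=True)` would send a body equivalent to a subset of the curl example. Building the request separately from sending it makes the payload easy to inspect without network access.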
Success Response
{
  "id": "clx1234567890abcdef",
  "audio_url": "https://example.com/audio.mp3",
  "status": "completed",
  "text": "Hello, this is a sample transcription of your audio file. The quality is excellent and the AI processing has detected relevant topics.",
  "topics": ["technology", "audio processing", "artificial intelligence"],
  "summary": {
    "text": "• Discussion about audio transcription technology \n • Positive feedback on transcription quality \n • Reference to AI processing capabilities",
    "type": "bullets",
    "language": "en"
  },
  "entities": [
    {
      "type": "product",
      "text": "audio file",
      "startIndex": 45,
      "endIndex": 55
    },
    {
      "type": "organization",
      "text": "AI processing",
      "startIndex": 89,
      "endIndex": 102
    }
  ],
  "duration": 12.5,
  "segments": [
    {
      "id": 0,
      "text": "Hello, this is a sample transcription of your audio file.",
      "start": 0.0,
      "end": 5.2,
      "speaker_id": null
    },
    {
      "id": 1,
      "text": "The quality is excellent and the AI processing has detected relevant topics.",
      "start": 5.3,
      "end": 12.5,
      "speaker_id": null
    }
  ]
}
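A response of this shape can be turned into a readable, timestamped transcript by walking the `segments` array. The sketch below assumes only the segment fields visible in the sample (`start`, `end`, `text`, `speaker_id`); the helper names and the `mm:ss.s` output format are illustrative choices, and the speaker label in the usage note is a made-up value because the sample shows only `null`.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as mm:ss.s, rounding to tenths first to avoid values like 60.0 seconds."""
    total_tenths = round(seconds * 10)
    minutes, tenths = divmod(total_tenths, 600)
    return f"{minutes:02d}:{tenths / 10:04.1f}"


def segment_lines(response: dict) -> list:
    """Return one display line per segment, prefixed with its time range."""
    lines = []
    for segment in response.get("segments", []):
        speaker = segment.get("speaker_id")
        # speaker_id is null in the sample; show a label only when one is present.
        label = f"[{speaker}] " if speaker is not None else ""
        time_range = f"{format_timestamp(segment['start'])}-{format_timestamp(segment['end'])}"
        lines.append(f"{time_range} {label}{segment['text']}")
    return lines
```

For a segment with `start` 5.3, `end` 12.5, and a hypothetical `speaker_id` of `"A"`, the line begins with `00:05.3-00:12.5 [A]`.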
Response Fields
id: Unique transcription identifier
audio_url: Original audio URL that was transcribed
status: Transcription status (‘completed’, ‘processing’, ‘failed’)
text: Complete transcription text (with PII redacted if enabled)
duration: Audio duration in seconds
topics: Array of detected topics (if topic detection was enabled)
summary: Summary object (if summarization was enabled)
summary.type: Summary format (‘bullets’, ‘paragraphs’, ‘markdown’)
entities: Array of entity objects (if entity detection was enabled)
entities[].type: Entity type (person, location, organization, etc.)
entities[].startIndex: Start character index in the text
entities[].endIndex: End character index in the text
segments[].speaker_id: Speaker identifier (if diarization was enabled)
Indicates if PII redaction was applied
Array of PII types that were redacted
PII substitution method used (‘hash’, ‘mask’, ‘remove’)
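Because `status` can be ‘processing’ as well as ‘completed’ or ‘failed’, a client that does not wait for complete processing may need to poll until a final state is reached. This page does not document how an in-progress transcription is retrieved, so the sketch below takes a caller-supplied, hypothetical `fetch` callable that returns the latest response as a dict; the interval and timeout defaults are arbitrary assumptions.

```python
import time


def wait_for_completion(fetch, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch() until the status is 'completed', raising on 'failed' or on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"transcription {result.get('id')} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("transcription still processing after timeout")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.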
Format  Notes
MP3     Most common format
WAV     Uncompressed audio
FLAC    Lossless compression
M4A     Apple audio format
OGG     Open source format
WEBM    Web optimized
MP4     Video with audio track
Models Available
Model  Description              Use Case
nano   Fastest, English-only    Quick transcriptions, real-time applications
mini   Balanced speed/accuracy  General purpose transcriptions
pro    Highest accuracy         Professional transcriptions, critical applications
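The trade-offs in the model table can be expressed as a small selection helper. This is one possible policy, not a recommendation from the API: the function name `choose_model` and the `priority` values (‘speed’, ‘balanced’, ‘accuracy’) are assumptions made for illustration. It falls back to ‘mini’ whenever ‘nano’ cannot be used safely, since ‘nano’ is listed as English-only.

```python
def choose_model(language: str, priority: str = "balanced") -> str:
    """Pick a model name from the table based on language and a coarse priority."""
    if priority == "accuracy":
        return "pro"
    # 'nano' is English-only, so use it only when the language is known to be English;
    # with 'auto' the language is not known in advance.
    if priority == "speed" and language == "en":
        return "nano"
    return "mini"
```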
Entity Types
When entity_detection is enabled, you can detect these entity types:
person - Names of people
location - Geographic locations
organization - Company/organization names
event - Events and meetings
product - Product names
date - Date references
phone_number - Phone numbers
email - Email addresses
url - Web URLs
ip_address - IP addresses
credit_card - Credit card numbers
bank_account - Bank account numbers
ssn - Social Security Numbers
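On the client side, the list of entity types is useful both for checking a requested set of types and for organizing the `entities` array in a response. The sketch below assumes the entity object shape shown in the sample response (`type` and `text` keys); the helper names are illustrative and not part of the API.

```python
from collections import defaultdict

# Entity types listed on this page.
ENTITY_TYPES = {
    "person", "location", "organization", "event", "product", "date",
    "phone_number", "email", "url", "ip_address", "credit_card",
    "bank_account", "ssn",
}


def unknown_entity_types(requested) -> list:
    """Return the requested types that are not in the documented list, sorted."""
    return sorted(set(requested) - ENTITY_TYPES)


def group_entities(response: dict) -> dict:
    """Map each entity type found in the response to the list of matched texts."""
    grouped = defaultdict(list)
    for entity in response.get("entities", []):
        grouped[entity["type"]].append(entity["text"])
    return dict(grouped)
```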
PII Redaction Policies
When redact_pii is enabled, these PII policies are available for redaction:
Personal identifiers (names, SSN, etc.)
Contact information (email, phone, address)
Financial information (credit cards, bank accounts)
Medical information
And more comprehensive PII categories
Notes
Audio is processed from publicly accessible URLs
Processing time varies with file length and model choice
Credits are charged based on audio duration and model used
Automatic language detection supports 50+ languages
Advanced features like diarization work best with longer audio segments
PII redaction is applied to the entire transcription result
Audio slicing allows processing specific time ranges without full download
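Audio slicing takes start and end times in seconds, with an end value of 0 meaning the full audio. When time ranges are written as `mm:ss` or `hh:mm:ss`, a small converter can produce those values. This is a sketch under assumptions: the helper names are hypothetical, and because the request field names for the start and end times are not shown on this page, the function returns a plain tuple rather than a request body.

```python
def to_seconds(timestamp: str) -> int:
    """Convert 'ss', 'mm:ss', or 'hh:mm:ss' into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def slice_bounds(start: str, end: str = "") -> tuple:
    """Return (start_seconds, end_seconds); an end of 0 means 'through the end of the audio'."""
    start_s = to_seconds(start)
    end_s = to_seconds(end) if end else 0
    if end_s and end_s <= start_s:
        raise ValueError("end must be later than start")
    return start_s, end_s
```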