This endpoint requires API key authentication.
URL to the audio file to transcribe (MP3, WAV, FLAC, M4A, OGG, WEBM, MP4)
Transcription model to use (‘mini’, ‘nano’, ‘pro’). Default: ‘mini’
Language code for transcription (e.g., ‘en’, ‘es’, ‘fr’, ‘auto’). Default: ‘auto’
Timestamp granularity (‘segment’ or ‘word’). Default: ‘segment’
Enable speaker diarization to identify different speakers. Default: false
Start transcription from this time in seconds. Default: 0
End transcription at this time in seconds. Default: 0 (full audio)
Enable content safety filtering. Default: false
Enable entity detection (person, location, organization, etc.). Default: false
Array of entity types to detect when entity_detection is enabled
Enable topic detection. Default: false
Enable text summarization. Default: false
Language for summary generation. Default: ‘en’
Summary format (‘bullets’, ‘paragraphs’, ‘markdown’). Default: ‘bullets’
Enable PII (Personally Identifiable Information) redaction. Default: false
Array of PII types to redact when redact_pii is enabled
PII substitution method (‘hash’, ‘mask’, ‘remove’). Default: ‘hash’
Wait for complete processing before returning response. Default: false
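As a rough illustration of how the parameters above fit together, the sketch below assembles a request body and checks the documented value constraints before sending. This page does not show the request field names, so every key in the dictionary (`audio_url`, `model`, `language`, `timestamp_granularity`, `start_time`, `end_time`) is a hypothetical placeholder; only `entity_detection` and `redact_pii` are names that appear elsewhere in this document. No endpoint URL or authentication header is assumed.

```python
import json

# Allowed values taken from the parameter descriptions above.
MODELS = {"mini", "nano", "pro"}
GRANULARITIES = {"segment", "word"}


def build_request_body(audio_url, model="mini", language="auto",
                       granularity="segment", start=0, end=0, **features):
    """Build a JSON-serializable request body.

    All key names below are HYPOTHETICAL; check the real request schema.
    Extra feature flags (e.g. entity_detection=True) are passed through as-is.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown timestamp granularity: {granularity!r}")
    if start < 0 or end < 0:
        raise ValueError("start and end must be non-negative")
    # An end time of 0 means "through the end of the audio" per the description.
    if end != 0 and end <= start:
        raise ValueError("end must be greater than start unless it is 0")

    body = {
        "audio_url": audio_url,                 # hypothetical key
        "model": model,                         # hypothetical key
        "language": language,                   # hypothetical key
        "timestamp_granularity": granularity,   # hypothetical key
        "start_time": start,                    # hypothetical key
        "end_time": end,                        # hypothetical key
    }
    body.update(features)
    return body


if __name__ == "__main__":
    example = build_request_body(
        "https://example.com/meeting.mp3",
        model="pro",
        entity_detection=True,
        redact_pii=True,
    )
    print(json.dumps(example, indent=2))
```

Validating on the client side avoids spending a round trip on a request that the documented constraints already rule out.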
Response Fields
Unique transcription identifier
Original audio URL that was transcribed
Transcription status (‘completed’, ‘processing’, ‘failed’)
Complete transcription text (with PII redacted if enabled)
Audio duration in seconds
Array of detected topics (if topic detection was enabled)
Indicates if PII redaction was applied
Array of PII types that were redacted
PII substitution method used (‘hash’, ‘mask’, ‘remove’)
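The three status values above suggest a simple client-side branch: use the text when the job is completed, keep waiting while it is processing, and surface an error when it has failed. The sketch below assumes the response is already parsed into a dictionary and that the fields are named `status` and `text`; both names are hypothetical, since this page lists only the field descriptions.

```python
from typing import Optional


def transcript_text(response: dict) -> Optional[str]:
    """Return the transcript if ready, None if still processing.

    The keys "status" and "text" are HYPOTHETICAL field names.
    """
    status = response.get("status")
    if status == "completed":
        return response.get("text", "")
    if status == "processing":
        # Caller should check again later.
        return None
    if status == "failed":
        raise RuntimeError("transcription failed")
    raise ValueError(f"unexpected status: {status!r}")
```

Returning `None` for the processing state lets a caller poll in a loop without treating an unfinished job as an error.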
Supported Audio Formats
Format | Notes |
---|---|
MP3 | Most common format |
WAV | Uncompressed audio |
FLAC | Lossless compression |
M4A | Apple audio format |
OGG | Open-source format |
WEBM | Web-optimized |
MP4 | Video with audio track |
Models Available
Model | Description | Use Case |
---|---|---|
nano | Fastest, English-only | Quick transcriptions, real-time applications |
mini | Balanced speed/accuracy | General purpose transcriptions |
pro | Highest accuracy | Professional transcriptions, critical applications |
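The table above can be turned into a small selection helper. This is one possible reading, not a documented rule: the `priority` argument and its values are invented for illustration, and falling back from nano to mini for non-English or auto-detected audio is an assumption based on nano being described as English-only.

```python
def choose_model(language: str = "auto", priority: str = "balanced") -> str:
    """Pick a model name from the table above.

    `priority` ("speed", "balanced", "accuracy") is a HYPOTHETICAL knob,
    not an API parameter.
    """
    if priority == "accuracy":
        return "pro"
    if priority == "speed":
        # nano is English-only, so use it only when the language is known
        # to be English; with "auto" the language is not known in advance.
        return "nano" if language == "en" else "mini"
    if priority == "balanced":
        return "mini"
    raise ValueError(f"unknown priority: {priority!r}")
```

For example, `choose_model("es", "speed")` falls back to mini because nano would not cover Spanish audio.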
Entity Types
When entity_detection is enabled, you can detect these entity types:
- person - Names of people
- location - Geographic locations
- organization - Company/organization names
- event - Events and meetings
- product - Product names
- date - Date references
- phone_number - Phone numbers
- email - Email addresses
- url - Web URLs
- ip_address - IP addresses
- credit_card - Credit card numbers
- bank_account - Bank account numbers
- ssn - Social Security Numbers
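Because the array of entity types accepts only the values listed above, it can be worth checking a requested list before sending it. The sketch below is a client-side check under that assumption; the function name is hypothetical, and how the API itself responds to unknown types is not stated on this page.

```python
# Entity type names copied from the list above.
ENTITY_TYPES = {
    "person", "location", "organization", "event", "product", "date",
    "phone_number", "email", "url", "ip_address", "credit_card",
    "bank_account", "ssn",
}


def validate_entity_types(requested):
    """Return the requested types, de-duplicated in order.

    Raises ValueError if any type is not in the documented list.
    """
    unknown = sorted(set(requested) - ENTITY_TYPES)
    if unknown:
        raise ValueError(f"unsupported entity types: {unknown}")
    seen = set()
    result = []
    for name in requested:
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result
```

The returned list could then be passed as the array of entity types alongside entity_detection.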
PII Redaction Policies
When redact_pii is enabled, these PII policies are available for redaction:
- Personal identifiers (names, SSN, etc.)
- Contact information (email, phone, address)
- Financial information (credit cards, bank accounts)
- Medical information
- Other PII categories
Notes
- Audio is processed from publicly accessible URLs
- Processing time varies with file length and model choice
- Credits are charged based on audio duration and model used
- Automatic language detection supports 50+ languages
- Advanced features like diarization work best with longer audio segments
- PII redaction is applied to the entire transcription result
- Audio slicing allows processing specific time ranges without full download