POST /v1/transcript
curl -X POST https://api.verbalisai.com/v1/transcript \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://example.com/audio.mp3",
    "model": "mini",
    "language": "auto",
    "timestamp_style": "segment",
    "diarize": false,
    "topics": true,
    "summarization": true,
    "summary_type": "bullets"
  }'

{
  "id": "clx1234567890abcdef",
  "audio_url": "https://example.com/audio.mp3",
  "status": "completed",
  "text": "Hello, this is a sample transcription of your audio file. The quality is excellent and the AI processing has detected relevant topics.",
  "topics": ["technology", "audio processing", "artificial intelligence"],
  "summary": {
    "text": "• Discussion about audio transcription technology\n• Positive feedback on transcription quality\n• Reference to AI processing capabilities",
    "type": "bullets",
    "language": "en"
  },
  "entities": [
    {
      "type": "product",
      "text": "audio file",
      "startIndex": 45,
      "endIndex": 55
    },
    {
      "type": "organization",
      "text": "AI processing",
      "startIndex": 89,
      "endIndex": 102
    }
  ],
  "duration": 12.5,
  "segments": [
    {
      "id": 0,
      "text": "Hello, this is a sample transcription of your audio file.",
      "start": 0.0,
      "end": 5.2,
      "speaker_id": null
    },
    {
      "id": 1,
      "text": "The quality is excellent and the AI processing has detected relevant topics.",
      "start": 5.3,
      "end": 12.5,
      "speaker_id": null
    }
  ]
}

Transcribe audio from a URL and receive a complete transcription with timestamps, AI analysis, and optional features like speaker diarization, topic detection, summarization, and PII redaction.

This endpoint requires API key authentication. Send the key as a Bearer token in the Authorization header.

audio_url
string
required

URL to the audio file to transcribe (MP3, WAV, FLAC, M4A, OGG, WEBM, MP4)

model
string

Transcription model to use (‘mini’, ‘nano’, ‘pro’). Default: ‘mini’

language
string

Language code for transcription (e.g., ‘en’, ‘es’, ‘fr’, ‘auto’). Default: ‘auto’

timestamp_style
string

Timestamp granularity (‘segment’ or ‘word’). Default: ‘segment’

diarize
boolean

Enable speaker diarization to identify different speakers. Default: false

audio_start_from
number

Start transcription from this time in seconds. Default: 0

audio_end_at
number

End transcription at this time in seconds. Default: 0 (transcribe through the end of the audio)

content_safety
boolean

Enable content safety filtering. Default: false

entity_detection
boolean

Enable entity detection (person, location, organization, etc.). Default: false

entity_types
array

Array of entity types to detect when entity_detection is enabled

topics
boolean

Enable topic detection. Default: false

summarization
boolean

Enable text summarization. Default: false

summary_language
string

Language for summary generation. Default: ‘en’

summary_type
string

Summary format (‘bullets’, ‘paragraphs’, ‘markdown’). Default: ‘bullets’

redact_pii
boolean

Enable PII (Personally Identifiable Information) redaction. Default: false

redact_pii_policies
array

Array of PII types to redact when redact_pii is enabled

redact_pii_sub
string

PII substitution method (‘hash’, ‘mask’, ‘remove’). Default: ‘hash’

wait_until_complete
boolean

Wait for processing to complete before returning the response. Default: false
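The curl call above can also be assembled from Python. The following is a minimal sketch using only the standard library; the helper name `build_transcript_request` is hypothetical, and the function only constructs the request so the payload and headers can be inspected before anything is sent.

```python
import json
import urllib.request

API_URL = "https://api.verbalisai.com/v1/transcript"


def build_transcript_request(api_key, audio_url, **options):
    """Build (but do not send) a POST request for /v1/transcript.

    `options` holds any of the optional body parameters listed above,
    e.g. model="mini" or topics=True. No validation is done here.
    """
    payload = {"audio_url": audio_url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_transcript_request(
    "YOUR_API_KEY",
    "https://example.com/audio.mp3",
    model="mini",
    topics=True,
    summarization=True,
    summary_type="bullets",
)
```

To send it, pass `req` to `urllib.request.urlopen` and decode the body with `json.load`; error handling and retries are left out of this sketch.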

Response Fields

id
string

Unique transcription identifier

audio_url
string

Original audio URL that was transcribed

status
string

Transcription status (‘completed’, ‘processing’, ‘failed’)

text
string

Complete transcription text (with PII redacted if enabled)

duration
number

Audio duration in seconds

topics
array

Array of detected topics (if topic detection was enabled)

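summary
object

Generated summary (if summarization was enabled), with text, type, and language fields

entities
array

Detected entities (if entity detection was enabled), each with type, text, startIndex, and endIndex

segments
array

Timestamped transcript segments, each with id, text, start, end, and speaker_id
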
redact_pii_audio
boolean

Indicates if PII redaction was applied

redact_pii_policies
array

Array of PII types that were redacted

redact_pii_sub
string

PII substitution method used (‘hash’, ‘mask’, ‘remove’)
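As an illustration of how these fields fit together, here is a small sketch that turns a decoded response into display lines. The function name `format_segments` is hypothetical, and the field names are taken from the example response rather than from a formal schema. How to retrieve a transcription that is still processing is not described in this section, so the sketch simply refuses to format anything that is not completed.

```python
def format_segments(response):
    """Return one display line per segment of a completed transcription."""
    status = response.get("status")
    if status != "completed":
        # 'processing' and 'failed' responses may not carry usable segments.
        raise ValueError(f"Transcription not ready: status={status!r}")

    lines = []
    for seg in response.get("segments", []):
        prefix = f"[{seg['start']:.1f}s - {seg['end']:.1f}s]"
        # speaker_id is null in the example, where diarize was false.
        if seg.get("speaker_id") is not None:
            prefix += f" speaker {seg['speaker_id']}:"
        lines.append(f"{prefix} {seg['text']}")
    return lines


sample = {
    "status": "completed",
    "segments": [
        {"id": 0, "text": "Hello, this is a sample transcription of your audio file.",
         "start": 0.0, "end": 5.2, "speaker_id": None},
        {"id": 1, "text": "The quality is excellent and the AI processing has detected relevant topics.",
         "start": 5.3, "end": 12.5, "speaker_id": None},
    ],
}
lines = format_segments(sample)
```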

Supported Audio Formats

Format  Notes
MP3     Most common format
WAV     Uncompressed audio
FLAC    Lossless compression
M4A     Apple audio format
OGG     Open source format
WEBM    Web optimized
MP4     Video with audio track
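A client can do a cheap pre-check on the URL before submitting it. The sketch below assumes the format can be guessed from the file extension in the URL path; whether the service itself relies on the extension or on the file contents is not stated here, so treat a failed check as a warning rather than a definitive rejection. The helper name is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions corresponding to the formats listed above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg", ".webm", ".mp4"}


def has_supported_extension(audio_url):
    """Return True if the URL path ends in one of the listed extensions."""
    path = urlparse(audio_url).path  # drops query string and fragment
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```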

Models Available

Model  Description              Use Case
nano   Fastest, English-only    Quick transcriptions, real-time applications
mini   Balanced speed/accuracy  General purpose transcriptions
pro    Highest accuracy         Professional transcriptions, critical applications
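One way to encode the table as a selection rule is sketched below. The function name and the `prioritize` values are hypothetical, and the rule makes one assumption beyond the table: since nano is English-only, it is only chosen when the language is explicitly 'en', never for 'auto'.

```python
def choose_model(language="auto", prioritize="balance"):
    """Pick a model name from the table above.

    prioritize: "speed", "balance", or "accuracy" (illustrative values).
    """
    if prioritize == "accuracy":
        return "pro"
    if prioritize == "speed" and language == "en":
        return "nano"  # English-only, so skipped for other or unknown languages
    return "mini"
```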

Entity Types

When entity_detection is enabled, you can detect these entity types:

  • person - Names of people
  • location - Geographic locations
  • organization - Company/organization names
  • event - Events and meetings
  • product - Product names
  • date - Date references
  • phone_number - Phone numbers
  • email - Email addresses
  • url - Web URLs
  • ip_address - IP addresses
  • credit_card - Credit card numbers
  • bank_account - Bank account numbers
  • ssn - Social Security Numbers
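The list above maps directly onto the entity_detection and entity_types request parameters. A minimal sketch of a client-side check follows; the helper name is hypothetical, and it assumes the type identifiers are sent exactly as written in the list.

```python
ENTITY_TYPES = {
    "person", "location", "organization", "event", "product", "date",
    "phone_number", "email", "url", "ip_address", "credit_card",
    "bank_account", "ssn",
}


def entity_detection_options(entity_types):
    """Return request-body fields enabling detection of the given types."""
    unknown = sorted(set(entity_types) - ENTITY_TYPES)
    if unknown:
        raise ValueError(f"Unsupported entity types: {unknown}")
    return {"entity_detection": True, "entity_types": list(entity_types)}
```

The returned dictionary can be merged into the JSON request body alongside audio_url.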

PII Redaction Policies

When redact_pii is enabled, these PII policies are available for redaction:

  • Personal identifiers (names, SSN, etc.)
  • Contact information (email, phone, address)
  • Financial information (credit cards, bank accounts)
  • Medical information
  • Additional PII categories
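The three redaction parameters can be assembled as in the sketch below. The exact policy identifiers accepted by redact_pii_policies are not listed in this section, so none are hard-coded: any policies passed in are forwarded unchanged, and only redact_pii_sub is checked against its documented values. The helper name is hypothetical.

```python
PII_SUBSTITUTIONS = {"hash", "mask", "remove"}


def pii_redaction_options(substitution="hash", policies=None):
    """Return request-body fields enabling PII redaction."""
    if substitution not in PII_SUBSTITUTIONS:
        raise ValueError(f"redact_pii_sub must be one of {sorted(PII_SUBSTITUTIONS)}")
    options = {"redact_pii": True, "redact_pii_sub": substitution}
    if policies:
        # Policy names are passed through as given; they are not validated here.
        options["redact_pii_policies"] = list(policies)
    return options
```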

Notes

  • Audio is processed from publicly accessible URLs
  • Processing time varies with file length and model choice
  • Credits are charged based on audio duration and model used
  • Automatic language detection supports 50+ languages
  • Advanced features like diarization work best with longer audio segments
  • PII redaction is applied to the entire transcription result
  • Audio slicing (audio_start_from and audio_end_at) allows processing specific time ranges without full download
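For the audio slicing note, a small sketch of building the two slicing parameters is shown here. The helper name is hypothetical, and the validation reflects one reading of the parameter descriptions: an audio_end_at of 0 means "through the end of the audio", and any other end time should fall after the start time.

```python
def slice_options(start=0.0, end=0.0):
    """Return request-body fields selecting a time range, in seconds."""
    if start < 0 or end < 0:
        raise ValueError("Times must be non-negative")
    if end != 0 and end <= start:
        raise ValueError("audio_end_at must be greater than audio_start_from")
    return {"audio_start_from": start, "audio_end_at": end}
```

For example, `slice_options(30, 90)` requests the minute of audio starting at the 30-second mark.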