Create Transcription

POST /v1/transcript

Transcribe audio from a URL and receive a transcription with timestamps, plus optional speaker diarization, entity detection, topic detection, summarization, content safety filtering, and PII redaction.

This endpoint requires API key authentication: send the key as a bearer token in the Authorization header.
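As a minimal sketch of the authentication requirement, the Python snippet below builds a POST request with the API key as a bearer token, using only the standard library. The endpoint URL and header names come from the curl example on this page; the function names `build_transcript_request` and `send` are illustrative and not part of any official SDK.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.verbalisai.com/v1/transcript"


def build_transcript_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Return a POST request that carries the API key as a bearer token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from `send` makes it possible to inspect or log the request before any network call is made.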

Request Parameters

audio_url (string, required) - URL of the audio file to transcribe (MP3, WAV, FLAC, M4A, OGG, WEBM, MP4)
model (string) - Transcription model to use ('mini', 'nano', 'pro'). Default: 'mini'
language (string) - Language code for transcription (e.g., 'en', 'es', 'fr', 'auto'). Default: 'auto'
timestamp_style (string) - Timestamp granularity ('segment' or 'word'). Default: 'segment'
diarize (boolean) - Enable speaker diarization to identify different speakers. Default: false
audio_start_from (number) - Start transcription from this time in seconds. Default: 0
audio_end_at (number) - End transcription at this time in seconds. Default: 0 (full audio)
content_safety (boolean) - Enable content safety filtering. Default: false
entity_detection (boolean) - Enable entity detection (person, location, organization, etc.). Default: false
entity_types (array) - Entity types to detect when entity_detection is enabled
topics (boolean) - Enable topic detection. Default: false
summarization (boolean) - Enable text summarization. Default: false
summary_language (string) - Language for summary generation. Default: 'en'
summary_type (string) - Summary format ('bullets', 'paragraphs', 'markdown'). Default: 'bullets'
redact_pii (boolean) - Enable PII (Personally Identifiable Information) redaction. Default: false
redact_pii_policies (array) - PII types to redact when redact_pii is enabled
redact_pii_sub (string) - PII substitution method ('hash', 'mask', 'remove'). Default: 'hash'
wait_until_complete (boolean) - Wait for processing to complete before returning the response. Default: false
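Several of the parameters above accept only a fixed set of string values. The sketch below shows one way a client might check those values before sending a request. The allowed values are copied from the parameter descriptions; the helper name `build_payload` is hypothetical, and how the API itself responds to invalid values is not documented here.

```python
# Allowed values as listed in the parameter descriptions on this page.
ALLOWED_VALUES = {
    "model": {"mini", "nano", "pro"},
    "timestamp_style": {"segment", "word"},
    "summary_type": {"bullets", "paragraphs", "markdown"},
    "redact_pii_sub": {"hash", "mask", "remove"},
}


def build_payload(audio_url: str, **options) -> dict:
    """Build a request body, rejecting values outside the documented sets."""
    if not audio_url:
        raise ValueError("audio_url is required")
    for name, value in options.items():
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(
                f"{name} must be one of {sorted(allowed)}, got {value!r}"
            )
    # Options left out are not sent, so the documented defaults apply.
    return {"audio_url": audio_url, **options}


payload = build_payload(
    "https://example.com/audio.mp3",
    model="mini",
    topics=True,
    summarization=True,
    summary_type="bullets",
)
```

Omitting an option rather than sending its default keeps the request body small and leaves default handling to the service.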

curl -X POST https://api.verbalisai.com/v1/transcript \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://example.com/audio.mp3",
    "model": "mini",
    "language": "auto",
    "timestamp_style": "segment",
    "diarize": false,
    "entity_detection": true,
    "topics": true,
    "summarization": true,
    "summary_type": "bullets",
    "wait_until_complete": true
  }'

{
  "id": "clx1234567890abcdef",
  "audio_url": "https://example.com/audio.mp3",
  "status": "completed",
  "text": "Hello, this is a sample transcription of your audio file. The quality is excellent and the AI processing has detected relevant topics.",
  "topics": ["technology", "audio processing", "artificial intelligence"],
  "summary": {
    "text": "• Discussion about audio transcription technology\n• Positive feedback on transcription quality\n• Reference to AI processing capabilities",
    "type": "bullets",
    "language": "en"
  },
  "entities": [
    {
      "type": "product",
      "text": "audio file",
      "startIndex": 45,
      "endIndex": 55
    },
    {
      "type": "organization",
      "text": "AI processing",
      "startIndex": 89,
      "endIndex": 102
    }
  ],
  "duration": 12.5,
  "segments": [
    {
      "id": 0,
      "text": "Hello, this is a sample transcription of your audio file.",
      "start": 0.0,
      "end": 5.2,
      "speaker_id": null
    },
    {
      "id": 1,
      "text": "The quality is excellent and the AI processing has detected relevant topics.",
      "start": 5.3,
      "end": 12.5,
      "speaker_id": null
    }
  ]
}

Response Fields

id (string) - Unique transcription identifier
audio_url (string) - Original audio URL that was transcribed
status (string) - Transcription status ('completed', 'processing', 'failed')
text (string) - Complete transcription text (with PII redacted if enabled)
duration (number) - Audio duration in seconds
topics (array) - Detected topics (if topic detection was enabled)
summary (object) - Summary object (if summarization was enabled)
  text (string) - Generated summary text
  type (string) - Summary format ('bullets', 'paragraphs', 'markdown')
  language (string) - Summary language code
entities (array) - Entity objects (if entity detection was enabled)
  type (string) - Entity type (person, location, organization, etc.)
  text (string) - Entity text
  startIndex (number) - Start character index in the text
  endIndex (number) - End character index in the text
segments (array) - Segment objects
  id (number) - Segment identifier
  text (string) - Segment text
  start (number) - Start time in seconds
  end (number) - End time in seconds
  speaker_id (string) - Speaker identifier (if diarization was enabled; null when diarize is false, as in the example response)
redact_pii_audio (boolean) - Indicates whether PII redaction was applied
redact_pii_policies (array) - PII types that were redacted
redact_pii_sub (string) - PII substitution method used ('hash', 'mask', 'remove')
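The fields above can be consumed as in the following sketch, which checks `status` and renders each segment as a timestamped line. The field names come from the response description; the output format and the function names `format_timestamp` and `render_segments` are illustrative choices, not something the API prescribes.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS.s, working in tenths to avoid '60.0' seconds."""
    total_tenths = round(seconds * 10)
    minutes, tenths = divmod(total_tenths, 600)
    return f"{minutes:02d}:{tenths / 10:04.1f}"


def render_segments(response: dict) -> list:
    """Return one display line per segment of a completed transcription."""
    status = response.get("status")
    if status != "completed":
        # 'processing' and 'failed' carry no usable segments in this sketch.
        raise RuntimeError(f"transcription not ready (status={status!r})")
    lines = []
    for segment in response.get("segments", []):
        speaker = segment.get("speaker_id")
        prefix = f"{speaker}: " if speaker else ""  # null when diarize is false
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {prefix}{segment['text']}")
    return lines
```

Raising on any status other than 'completed' keeps the caller from silently rendering an empty or partial transcript.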

Supported Audio Formats

Format	Notes
MP3	Most common format
WAV	Uncompressed audio
FLAC	Lossless compression
M4A	Apple audio format
OGG	Open source format
WEBM	Web optimized
MP4	Video with audio track

Models Available

Model	Description	Use Case
`nano`	Fastest, English-only	Quick transcriptions, real-time applications
`mini`	Balanced speed/accuracy	General purpose transcriptions
`pro`	Highest accuracy	Professional transcriptions, critical applications
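Because the table lists `nano` as English-only, a client may want to catch an incompatible model and language pair before sending a request. The guard below is a sketch under that assumption. This page does not document how the service reacts to such a request, and the function name `check_model_language` is hypothetical.

```python
def check_model_language(model: str, language: str = "auto") -> None:
    """Raise if the English-only 'nano' model is paired with another language.

    'auto' is allowed through because the detected language is not known
    client-side; that choice is an assumption of this sketch.
    """
    if model == "nano" and language not in ("en", "auto"):
        raise ValueError(
            f"model 'nano' is English-only; got language {language!r}. "
            "Consider 'mini' or 'pro' instead."
        )


check_model_language("nano", "en")   # passes
check_model_language("mini", "es")   # passes
```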

Entity Types

When entity_detection is enabled, you can detect these entity types:

person - Names of people
location - Geographic locations
organization - Company/organization names
event - Events and meetings
product - Product names
date - Date references
phone_number - Phone numbers
email - Email addresses
url - Web URLs
ip_address - IP addresses
credit_card - Credit card numbers
bank_account - Bank account numbers
ssn - Social Security Numbers
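The entity types listed above can be used both to build the request options and to organize the `entities` array in the response. The following sketch assumes `entity_types` takes these identifiers as plain strings, which matches the `type` values in the example response. The helper names `entity_options` and `group_entities` are illustrative.

```python
# Entity type identifiers copied from the list above.
ENTITY_TYPES = frozenset({
    "person", "location", "organization", "event", "product", "date",
    "phone_number", "email", "url", "ip_address", "credit_card",
    "bank_account", "ssn",
})


def entity_options(entity_types: list) -> dict:
    """Return request options that enable detection for the given types."""
    requested = list(entity_types)
    unknown = [t for t in requested if t not in ENTITY_TYPES]
    if unknown:
        raise ValueError(f"unknown entity types: {unknown}")
    return {"entity_detection": True, "entity_types": requested}


def group_entities(response: dict) -> dict:
    """Group entity texts from a response by their entity type."""
    grouped = {}
    for entity in response.get("entities", []):
        grouped.setdefault(entity["type"], []).append(entity["text"])
    return grouped
```

The returned options can be merged into the request body alongside `audio_url` and any other parameters.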

PII Redaction Policies

When redact_pii is enabled, redact_pii_policies selects which categories of PII to redact:

Personal identifiers (names, SSN, etc.)
Contact information (email, phone, address)
Financial information (credit cards, bank accounts)
Medical information
Additional PII categories

Notes

The audio file must be reachable at a publicly accessible URL
Processing time varies with audio length and model choice
Credits are charged based on audio duration and the model used
Automatic language detection supports 50+ languages
Advanced features like diarization work best with longer audio segments
PII redaction is applied to the entire transcription result
Audio slicing (audio_start_from, audio_end_at) processes a specific time range without downloading the full file
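Audio slicing takes its range in seconds, so a small conversion helper can be convenient when the range is known as clock-style timestamps. This is a sketch only. The parameter names `audio_start_from` and `audio_end_at` come from this page; the helper names `to_seconds` and `slice_options` and the accepted timestamp formats are assumptions.

```python
def to_seconds(timestamp: str) -> float:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' into seconds."""
    seconds = 0.0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def slice_options(start: str, end: str) -> dict:
    """Return request options that restrict transcription to [start, end]."""
    start_s, end_s = to_seconds(start), to_seconds(end)
    if end_s <= start_s:
        # audio_end_at = 0 means "full audio", so an explicit range needs end > start.
        raise ValueError("end must be later than start")
    return {"audio_start_from": start_s, "audio_end_at": end_s}


options = slice_options("00:30", "02:00")
```

To transcribe from a start point through to the end of the file, leave `audio_end_at` out of the request so its default of 0 (full audio) applies.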
