whisper

Model ID: @cf/openai/whisper

Whisper is a general-purpose speech recognition model. Trained on a large, diverse audio dataset, it is a multitask model that can perform multilingual speech recognition, speech translation, and language identification.

Properties

Task Type: Automatic Speech Recognition

Code Examples

Workers - TypeScript

export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
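    // Download a sample WAV file to transcribe.
    const res = await fetch(
      "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav"
    );
    const audioBuffer = await res.arrayBuffer();

    // The model accepts the audio as an array of byte values (0-255).
    const input = {
      audio: [...new Uint8Array(audioBuffer)],
    };

    const response = await env.AI.run(
      "@cf/openai/whisper",
      input
    );

    // Echo an empty audio array instead of the raw bytes to keep the response small.
    return Response.json({ input: { audio: [] }, response });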
  },
} satisfies ExportedHandler<Env>;

curl

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/openai/whisper  \
  -X POST  \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN"  \
  --data-binary "@talking-llama.mp3"
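
The same REST call can be sketched in TypeScript. This is a minimal illustration that only mirrors the curl command above: the helper name buildWhisperRequest and the WhisperRestRequest shape are hypothetical, and the account ID, API token, and audio bytes are assumed to be supplied by the caller.

```typescript
// Hypothetical description of the HTTP request that the curl command sends.
interface WhisperRestRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: Uint8Array; // raw audio bytes, like curl's --data-binary
}

// Hypothetical helper: assembles the request without sending it.
function buildWhisperRequest(
  accountId: string,
  apiToken: string,
  audio: Uint8Array
): WhisperRestRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/openai/whisper`,
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}` },
    body: audio,
  };
}
```

The returned object could then be passed to fetch, for example fetch(req.url, { method: req.method, headers: req.headers, body: req.body }). Keeping the builder separate from the network call makes the request easy to inspect before it is sent.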

Response

Automatic speech recognition responses return a text property (a string containing the audio transcription) and, if the model supports it, an optional words array with start and end timestamps for each word.

{
  "text": "It is a good day",
  "word_count": 5,
  "words": [
    {
      "word": "It",
      "start": 0.5600000023841858,
      "end": 1
    },
    {
      "word": "is",
      "start": 1,
      "end": 1.100000023841858
    },
    {
      "word": "a",
      "start": 1.100000023841858,
      "end": 1.2200000286102295
    },
    {
      "word": "good",
      "start": 1.2200000286102295,
      "end": 1.3200000524520874
    },
    {
      "word": "day",
      "start": 1.3200000524520874,
      "end": 1.4600000381469727
    }
  ]
}
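
As a rough sketch of how a response like the one above might be consumed, the following TypeScript turns the words array into one human-readable line per word. The interface and function names are hypothetical, the timestamps are assumed to be in seconds, and a missing words array is treated as empty because the field is optional.

```typescript
// Shape of the sample response above (names are illustrative).
interface WhisperWord {
  word: string;
  start: number; // assumed to be seconds from the start of the audio
  end: number;   // assumed to be seconds from the start of the audio
}

interface WhisperResponse {
  text: string;
  word_count?: number;
  words?: WhisperWord[];
}

// Hypothetical helper: formats each word as "[start-end] word",
// rounding timestamps to two decimal places.
function formatWordTimings(response: WhisperResponse): string[] {
  const words = response.words ?? []; // words is optional
  return words.map(
    (w) => `[${w.start.toFixed(2)}-${w.end.toFixed(2)}] ${w.word}`
  );
}
```

Rounding with toFixed(2) also hides the floating-point noise visible in the raw values (for example 0.5600000023841858).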

API Schema

The following schemas are based on JSON Schema.

Input JSON Schema

{
  "oneOf": [
    {
      "type": "string",
      "format": "binary"
    },
    {
      "type": "object",
      "properties": {
        "audio": {
          "type": "array",
          "items": {
            "type": "number"
          }
        }
      },
      "required": [
        "audio"
      ]
    }
  ]
}
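
The object form of the input schema can be illustrated with a small TypeScript sketch. The names WhisperArrayInput, toWhisperInput, and isWhisperArrayInput are hypothetical helpers, not part of any Cloudflare API; they only show how raw audio bytes map onto the audio array and how that shape might be checked before sending.

```typescript
// Object variant of the input schema: a required array of numbers.
interface WhisperArrayInput {
  audio: number[];
}

// Hypothetical helper: converts raw audio bytes into the object form.
function toWhisperInput(buffer: ArrayBuffer): WhisperArrayInput {
  return { audio: Array.from(new Uint8Array(buffer)) };
}

// Hypothetical helper: checks that a value matches the object variant.
function isWhisperArrayInput(value: unknown): value is WhisperArrayInput {
  if (typeof value !== "object" || value === null) return false;
  const audio = (value as Record<string, unknown>)["audio"];
  return Array.isArray(audio) && audio.every((n) => typeof n === "number");
}
```

The binary string variant of the schema needs no conversion: the raw bytes are sent as the request body, as in the curl example.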

Output JSON Schema

{
  "type": "object",
  "contentType": "application/json",
  "properties": {
    "text": {
      "type": "string"
    },
    "word_count": {
      "type": "number"
    },
    "words": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "word": {
            "type": "string"
          },
          "start": {
            "type": "number"
          },
          "end": {
            "type": "number"
          }
        }
      }
    },
    "vtt": {
      "type": "string"
    }
  },
  "required": [
    "text"
  ]
}
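
One way to work with this output schema in TypeScript is to mirror it as an interface and guard untyped JSON before use. The sketch below is illustrative only: the names WhisperOutput and isWhisperOutput are hypothetical, and the guard checks only the top-level property types, not every item in words.

```typescript
// Mirrors the output schema: only text is required.
interface WhisperOutputWord {
  word?: string;
  start?: number;
  end?: number;
}

interface WhisperOutput {
  text: string;
  word_count?: number;
  words?: WhisperOutputWord[];
  vtt?: string;
}

// Hypothetical type guard: verifies the required text property and the
// top-level types of the optional properties when they are present.
function isWhisperOutput(value: unknown): value is WhisperOutput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v["text"] !== "string") return false;
  if (v["word_count"] !== undefined && typeof v["word_count"] !== "number") return false;
  if (v["vtt"] !== undefined && typeof v["vtt"] !== "string") return false;
  if (v["words"] !== undefined && !Array.isArray(v["words"])) return false;
  return true;
}
```

A caller might run the guard on the parsed JSON body and fall back to an error path when it returns false, rather than assuming every optional property is present.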
