Hume AI Batch API
0.1.0

The Batch API provides access to Hume models through an asynchronous, job-based interface. You can submit a job to have many different files processed in parallel. The status of a job can then be checked with the job ID. Email notifications are available to alert you when a job completes or fails.

This is the documentation for version 0.1.0 of the API. Last updated on Mar 16, 2023.

Base URL
https://api.hume.ai

List jobs

GET /v0/batch/jobs

List the IDs of all jobs that have been run. If provided, at most max_results job IDs will be included in the response. Use page_token to paginate through the entire set of job IDs.
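In client code, this pagination contract can be sketched as follows (a Python sketch; `fetch_page` is a hypothetical callable standing in for an authenticated GET /v0/batch/jobs with the given query parameters, returning the parsed JSON body):

```python
def list_all_jobs(fetch_page, max_results=50):
    """Collect every job across all pages.

    fetch_page: callable taking a dict of query params and returning the
    parsed JSON body of GET /v0/batch/jobs (a hypothetical wrapper around
    your HTTP client and API key).
    """
    jobs = []
    page_token = None
    while True:
        params = {"max_results": max_results}
        if page_token is not None:
            params["page_token"] = page_token
        body = fetch_page(params)
        jobs.extend(body["jobs"])
        # next_page_token is omitted from the response on the last page.
        page_token = body.get("next_page_token")
        if page_token is None:
            return jobs
```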

Responses

  • 200 object
    • Provide next_page_token as the value of page_token in a subsequent request in order to return the next page of job IDs.
      If this is the last page, then next_page_token will be omitted from the response.

    • jobs array[object] Required

      The list of jobs.

      • request object Required
        • models object Required
          • face object
            • fps_pred number(float)

              Number of frames per second to process. Other frames will be omitted from the response.

              Minimum value is 0.0. Default value is 3.0.

            • prob_threshold number(float)

              Face detection probability threshold. Faces detected with a probability less than this threshold will be omitted
              from the response.

              Minimum value is 0.0, maximum value is 1.0. Default value is 0.9900000095367432.

            • identify_faces boolean

              Whether to return identifiers for faces across frames. If true, unique identifiers will be assigned to face
              bounding boxes to differentiate different faces. If false, all faces will be tagged with an "unknown" ID.

              Default value is false.

            • save_faces boolean

              Whether to extract and save the detected faces to the artifacts directory included in the response.

              Default value is false.

            • min_face_size number(float)

              Minimum bounding box side length in pixels to treat as a face. Faces detected with a bounding box side length in
              pixels less than this threshold will be omitted from the response.

              Minimum value is 0.0. Default value is 60.0.

            • facs object

              Configuration for FACS predictions. If missing or null, no FACS predictions will be generated.

            • descriptions object

              Configuration for Descriptions predictions. If missing or null, no Descriptions predictions will be generated.

          • burst object
          • prosody object
            • identify_speakers boolean

              Whether to return identifiers for speakers over time. If true, unique identifiers will be assigned to spoken
              words to differentiate different speakers. If false, all speakers will be tagged with an "unknown" ID.

              Default value is false.

            • language string

              The BCP-47 tag (see above) of the language spoken in your media samples. If missing or null, it will be automatically detected.

          • language object
            • identify_speakers boolean

              Whether to return identifiers for speakers over time. If true, unique identifiers will be assigned to spoken
              words to differentiate different speakers. If false, all speakers will be tagged with an "unknown" ID.

              Default value is false.

            • sentiment object

              Configuration for sentiment predictions. If missing or null, no sentiment predictions will be generated.

            • toxicity object

              Configuration for toxicity predictions. If missing or null, no toxicity predictions will be generated.

            • language string

              The BCP-47 tag (see above) of the language spoken in your media samples. If missing or null, it will be automatically detected.

            • granularity string

              The granularity at which to generate predictions.

              Values are word, sentence, or passage. Default value is word.

            • use_existing_partition boolean

              Whether to generate predictions for speech utterances (rather than the user-specified granularity) for text created from audio transcripts.

              Default value is true.

          • ner object
            • identify_speakers boolean

              Whether to return identifiers for speakers over time. If true, unique identifiers will be assigned to spoken
              words to differentiate different speakers. If false, all speakers will be tagged with an "unknown" ID.

              Default value is false.

            • language string

              The BCP-47 tag (see above) of the language spoken in your media samples. If missing or null, it will be automatically detected.

        • urls array[string] Required

          URLs to the media files to be processed.
          Each must be a valid public URL to a media file (see recommended input filetypes) or an archive (zip, tar.gz, tar.bz2, tar.xz) of media files.
          To process more than 100 individual files per job, you can include a URL to an archive containing an arbitrary number of files.

          At least 1 but not more than 100 elements.

        • notify boolean

          Whether to send a notification to the user upon job completion/failure.

          Default value is false.

      • status string Required

        Values are QUEUED, IN_PROGRESS, FAILED, or COMPLETED.

      • failed object
      • completed object
      • creation_timestamp integer(int64) Required
      • completion_timestamp integer(int64)
GET /v0/batch/jobs
curl \
 -X GET https://api.hume.ai/v0/batch/jobs
Response example (200)
{
  "next_page_token": "string",
  "jobs": [
    {
      "request": {
        "models": {
          "face": {
            "fps_pred": 3.0,
            "prob_threshold": 0.9900000095367432,
            "identify_faces": false,
            "save_faces": false,
            "min_face_size": 60.0,
            "facs": {},
            "descriptions": {}
          },
          "burst": {},
          "prosody": {
            "identify_speakers": false,
            "language": "string"
          },
          "language": {
            "identify_speakers": false,
            "sentiment": {},
            "toxicity": {},
            "language": "string",
            "granularity": "word",
            "use_existing_partition": true
          },
          "ner": {
            "identify_speakers": false,
            "language": "string"
          }
        },
        "urls": [
          "string"
        ],
        "notify": false
      },
      "status": "QUEUED",
      "failed": {
        "message": "string"
      },
      "completed": {
        "predictions_url": "string",
        "errors_url": "string",
        "artifacts_url": "string",
        "num_predictions": 42,
        "num_errors": 42
      },
      "creation_timestamp": 42,
      "completion_timestamp": 42
    }
  ]
}

Start job

POST /v0/batch/jobs

Start a new batch job.

Facial Expression:
Analyzes human facial expressions in images and videos. Results will be provided per frame for video files.
Recommended input filetypes: png, jpeg, mp4

Vocal Burst:
Vocal bursts, also called non-verbal exclamations, are any sounds you make that express emotion and aren't words.
Recommended input filetypes: wav, mp3, mp4

Speech Prosody:
Speech prosody includes the intonation, stress, and rhythm of the spoken word.
Recommended input filetypes: wav, mp3, mp4

Language:
Analyzes passages of text. This also supports audio and video files by transcribing and then directly analyzing the transcribed text.
Recommended input filetypes: txt, mp3, wav, mp4

NER (Named-entity Recognition):
Identifies real-world objects and concepts in passages of text. This also supports audio and video files by transcribing and then directly analyzing the transcribed text.
Recommended input filetypes: txt, mp3, wav, mp4


By default, we use an automated language detection method for our Speech Prosody, Language, and NER models.
However, if you know what language is being spoken in your media samples, you can specify it via its BCP-47 tag in the optional language field and potentially obtain more accurate results.
You can specify any of the following languages:

  • Chinese: zh
  • Danish: da
  • Dutch: nl
  • English: en
  • English (Australia): en-AU
  • English (India): en-IN
  • English (New Zealand): en-NZ
  • English (United Kingdom): en-GB
  • French: fr
  • French (Canada): fr-CA
  • German: de
  • Hindi: hi
  • Hindi (Roman Script): hi-Latn
  • Indonesian: id
  • Italian: it
  • Japanese: ja
  • Korean: ko
  • Norwegian: no
  • Polish: pl
  • Portuguese: pt
  • Portuguese (Brazil): pt-BR
  • Portuguese (Portugal): pt-PT
  • Russian: ru
  • Spanish: es
  • Spanish (Latin America): es-419
  • Swedish: sv
  • Tamil: ta
  • Turkish: tr
  • Ukrainian: uk
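As an illustration, a request body that runs only the Language model with an explicit tag might be assembled like this (a Python sketch; `build_language_job` is a hypothetical helper, and its checks mirror the constraints in the body schema below):

```python
def build_language_job(urls, language=None, granularity="word", notify=False):
    """Assemble a Start-job request body for the Language model only.

    language: optional BCP-47 tag from the list above, e.g. "en" or "pt-BR";
    when omitted, the API falls back to automatic language detection.
    """
    if not 1 <= len(urls) <= 100:
        raise ValueError("urls must contain between 1 and 100 elements")
    config = {"granularity": granularity}
    if language is not None:
        config["language"] = language
    return {"models": {"language": config}, "urls": urls, "notify": notify}
```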

Body Required

  • models object Required
    • face object
      • fps_pred number(float)

        Number of frames per second to process. Other frames will be omitted from the response.

        Minimum value is 0.0. Default value is 3.0.

      • prob_threshold number(float)

        Face detection probability threshold. Faces detected with a probability less than this threshold will be omitted
        from the response.

        Minimum value is 0.0, maximum value is 1.0. Default value is 0.9900000095367432.

      • identify_faces boolean

        Whether to return identifiers for faces across frames. If true, unique identifiers will be assigned to face
        bounding boxes to differentiate different faces. If false, all faces will be tagged with an "unknown" ID.

        Default value is false.

      • save_faces boolean

        Whether to extract and save the detected faces to the artifacts directory included in the response.

        Default value is false.

      • min_face_size number(float)

        Minimum bounding box side length in pixels to treat as a face. Faces detected with a bounding box side length in
        pixels less than this threshold will be omitted from the response.

        Minimum value is 0.0. Default value is 60.0.

      • facs object

        Configuration for FACS predictions. If missing or null, no FACS predictions will be generated.

      • descriptions object

        Configuration for Descriptions predictions. If missing or null, no Descriptions predictions will be generated.

    • burst object
    • prosody object
      • identify_speakers boolean

        Whether to return identifiers for speakers over time. If true, unique identifiers will be assigned to spoken
        words to differentiate different speakers. If false, all speakers will be tagged with an "unknown" ID.

        Default value is false.

      • language string

        The BCP-47 tag (see above) of the language spoken in your media samples. If missing or null, it will be automatically detected.

    • language object
      • identify_speakers boolean

        Whether to return identifiers for speakers over time. If true, unique identifiers will be assigned to spoken
        words to differentiate different speakers. If false, all speakers will be tagged with an "unknown" ID.

        Default value is false.

      • sentiment object

        Configuration for sentiment predictions. If missing or null, no sentiment predictions will be generated.

      • toxicity object

        Configuration for toxicity predictions. If missing or null, no toxicity predictions will be generated.

      • language string

        The BCP-47 tag (see above) of the language spoken in your media samples. If missing or null, it will be automatically detected.

      • granularity string

        The granularity at which to generate predictions.

        Values are word, sentence, or passage. Default value is word.

      • use_existing_partition boolean

        Whether to generate predictions for speech utterances (rather than the user-specified granularity) for text created from audio transcripts.

        Default value is true.

    • ner object
      • identify_speakers boolean

        Whether to return identifiers for speakers over time. If true, unique identifiers will be assigned to spoken
        words to differentiate different speakers. If false, all speakers will be tagged with an "unknown" ID.

        Default value is false.

      • language string

        The BCP-47 tag (see above) of the language spoken in your media samples. If missing or null, it will be automatically detected.

  • urls array[string] Required

    URLs to the media files to be processed.
    Each must be a valid public URL to a media file (see recommended input filetypes) or an archive (zip, tar.gz, tar.bz2, tar.xz) of media files.
    To process more than 100 individual files per job, you can include a URL to an archive containing an arbitrary number of files.

    At least 1 but not more than 100 elements.

  • notify boolean

    Whether to send a notification to the user upon job completion/failure.

    Default value is false.

Responses

POST /v0/batch/jobs
curl \
 -X POST https://api.hume.ai/v0/batch/jobs \
 -H "Content-Type: application/json" \
 -d '{"models":{"face":{"fps_pred":3.0,"prob_threshold":0.9900000095367432,"identify_faces":false,"save_faces":false,"min_face_size":60.0,"facs":{},"descriptions":{}},"burst":{},"prosody":{"identify_speakers":false,"language":"string"},"language":{"identify_speakers":false,"sentiment":{},"toxicity":{},"language":"string","granularity":"word","use_existing_partition":true},"ner":{"identify_speakers":false,"language":"string"}},"urls":["string"],"notify":false}'
Request example
{
  "models": {
    "face": {
      "fps_pred": 3.0,
      "prob_threshold": 0.9900000095367432,
      "identify_faces": false,
      "save_faces": false,
      "min_face_size": 60.0,
      "facs": {},
      "descriptions": {}
    },
    "burst": {},
    "prosody": {
      "identify_speakers": false,
      "language": "string"
    },
    "language": {
      "identify_speakers": false,
      "sentiment": {},
      "toxicity": {},
      "language": "string",
      "granularity": "word",
      "use_existing_partition": true
    },
    "ner": {
      "identify_speakers": false,
      "language": "string"
    }
  },
  "urls": [
    "string"
  ],
  "notify": false
}
Response example (200)
{
  "job_id": "string"
}
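Because jobs run asynchronously, the returned job_id is typically polled until the status field (see Get job below) reaches a terminal value. A minimal Python sketch, where `get_job` is a hypothetical callable performing the authenticated GET /v0/batch/jobs/{job_id} and returning the parsed JSON body:

```python
import time

# QUEUED and IN_PROGRESS are the non-terminal statuses.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}

def wait_for_job(get_job, poll_interval=5.0, max_wait=600.0):
    """Poll until the job reaches COMPLETED/FAILED or max_wait elapses."""
    waited = 0.0
    while True:
        body = get_job()
        if body["status"] in TERMINAL_STATUSES:
            return body
        if waited >= max_wait:
            raise TimeoutError("job still %s after %.0fs" % (body["status"], waited))
        time.sleep(poll_interval)
        waited += poll_interval
```

Alternatively, set notify to true in the request and rely on the email notification instead of polling.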

Get job

GET /v0/batch/jobs/{job_id}

Get the request details and status of a given batch job.

Responses

  • 200 object

    Job Request Details and Status

    • request object Required
      • models object Required
        • face object
          • fps_pred number(float)

            Number of frames per second to process. Other frames will be omitted from the response.

            Minimum value is 0.0. Default value is 3.0.

          • prob_threshold number(float)

            Face detection probability threshold. Faces detected with a probability less than this threshold will be omitted
            from the response.

            Minimum value is 0.0, maximum value is 1.0. Default value is 0.9900000095367432.

          • identify_faces boolean

            Whether to return identifiers for faces across frames. If true, unique identifiers will be assigned to face
            bounding boxes to differentiate different faces. If false, all faces will be tagged with an "unknown" ID.

            Default value is false.

          • save_faces boolean

            Whether to extract and save the detected faces to the artifacts directory included in the response.

            Default value is false.

          • min_face_size number(float)

            Minimum bounding box side length in pixels to treat as a face. Faces detected with a bounding box side length in
            pixels less than this threshold will be omitted from the response.

            Minimum value is 0.0. Default value is 60.0.

          • facs object

            Configuration for FACS predictions. If missing or null, no FACS predictions will be generated.

          • descriptions object

            Configuration for Descriptions predictions. If missing or null, no Descriptions predictions will be generated.

        • burst object
        • prosody object
          • identify_speakers boolean

            Whether to return identifiers for speakers over time. If true, unique identifiers will be assigned to spoken
            words to differentiate different speakers. If false, all speakers will be tagged with an "unknown" ID.

            Default value is false.

          • language string

            The BCP-47 tag (see above) of the language spoken in your media samples. If missing or null, it will be automatically detected.

        • language object
          • identify_speakers boolean

            Whether to return identifiers for speakers over time. If true, unique identifiers will be assigned to spoken
            words to differentiate different speakers. If false, all speakers will be tagged with an "unknown" ID.

            Default value is false.

          • sentiment object

            Configuration for sentiment predictions. If missing or null, no sentiment predictions will be generated.

          • toxicity object

            Configuration for toxicity predictions. If missing or null, no toxicity predictions will be generated.

          • language string

            The BCP-47 tag (see above) of the language spoken in your media samples. If missing or null, it will be automatically detected.

          • granularity string

            The granularity at which to generate predictions.

            Values are word, sentence, or passage. Default value is word.

          • use_existing_partition boolean

            Whether to generate predictions for speech utterances (rather than the user-specified granularity) for text created from audio transcripts.

            Default value is true.

        • ner object
          • identify_speakers boolean

            Whether to return identifiers for speakers over time. If true, unique identifiers will be assigned to spoken
            words to differentiate different speakers. If false, all speakers will be tagged with an "unknown" ID.

            Default value is false.

          • language string

            The BCP-47 tag (see above) of the language spoken in your media samples. If missing or null, it will be automatically detected.

      • urls array[string] Required

        URLs to the media files to be processed.
        Each must be a valid public URL to a media file (see recommended input filetypes) or an archive (zip, tar.gz, tar.bz2, tar.xz) of media files.
        To process more than 100 individual files per job, you can include a URL to an archive containing an arbitrary number of files.

        At least 1 but not more than 100 elements.

      • notify boolean

        Whether to send a notification to the user upon job completion/failure.

        Default value is false.

    • status string Required

      Values are QUEUED, IN_PROGRESS, FAILED, or COMPLETED.

    • failed object
    • completed object
    • creation_timestamp integer(int64) Required
    • completion_timestamp integer(int64)
  • 404 object

    Job Not Found

GET /v0/batch/jobs/{job_id}
curl \
 -X GET https://api.hume.ai/v0/batch/jobs/{job_id}
Response example (200)
{
  "request": {
    "models": {
      "face": {
        "fps_pred": 3.0,
        "prob_threshold": 0.9900000095367432,
        "identify_faces": false,
        "save_faces": false,
        "min_face_size": 60.0,
        "facs": {},
        "descriptions": {}
      },
      "burst": {},
      "prosody": {
        "identify_speakers": false,
        "language": "string"
      },
      "language": {
        "identify_speakers": false,
        "sentiment": {},
        "toxicity": {},
        "language": "string",
        "granularity": "word",
        "use_existing_partition": true
      },
      "ner": {
        "identify_speakers": false,
        "language": "string"
      }
    },
    "urls": [
      "string"
    ],
    "notify": false
  },
  "status": "QUEUED",
  "failed": {
    "message": "string"
  },
  "completed": {
    "predictions_url": "string",
    "errors_url": "string",
    "artifacts_url": "string",
    "num_predictions": 42,
    "num_errors": 42
  },
  "creation_timestamp": 42,
  "completion_timestamp": 42
}
Response example (404)
{
  "message": "string"
}
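Tying the status values to the failed and completed objects shown above, a Get-job response body can be summarized like this (a Python sketch; `summarize_job` is a hypothetical helper over the parsed JSON):

```python
def summarize_job(body):
    """Produce a one-line summary of a Get-job (200) response body."""
    status = body["status"]
    if status == "FAILED":
        return "failed: " + body["failed"]["message"]
    if status == "COMPLETED":
        done = body["completed"]
        return "completed: %d predictions, %d errors" % (
            done["num_predictions"], done["num_errors"])
    # QUEUED or IN_PROGRESS
    return status.lower().replace("_", " ")
```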