IBM Speech to Text

info

Available only in PAM version R24.0101 (Python 3.7) and earlier versions.

Primary Features

This plugin calls the IBM Watson Speech to Text service.
It runs AI-based speech-to-text (STT) transcription by sending the audio file together with the other required parameters, such as the compression type and the language of choice.
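As a rough illustration of what "sending the audio file and other required parameters" can look like, the sketch below builds (but does not send) an HTTP request for the IBM Speech to Text REST API. It is not the plugin's actual code. The function name `build_recognize_request`, the placeholder service URL, and the mapping of "compression type" to the `Content-Type` header and "language of choice" to the `model` query parameter are assumptions made for this example.

```python
import base64
import urllib.parse
import urllib.request


def build_recognize_request(service_url, api_key, audio_bytes,
                            content_type="audio/flac",
                            model="en-US_BroadbandModel"):
    """Build a POST request for the /v1/recognize endpoint (hypothetical helper).

    Assumptions: the audio format is passed as the Content-Type header,
    the language is selected through the `model` query parameter, and the
    API key is sent with HTTP basic auth using the user name "apikey".
    """
    query = urllib.parse.urlencode({"model": model})
    url = f"{service_url.rstrip('/')}/v1/recognize?{query}"

    # Basic auth token: base64("apikey:<your API key>")
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")

    headers = {
        "Content-Type": content_type,
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(url, data=audio_bytes,
                                  headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder URL and key; nothing is sent over the network here.
    req = build_recognize_request(
        "https://example.invalid/instances/my-instance",
        "MY_API_KEY",
        b"\x00\x01",
    )
    print(req.get_method(), req.full_url)
```

To actually submit the audio, the request object would be passed to `urllib.request.urlopen`, with a real service URL and API key taken from the IBM Cloud service credentials.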

Need help?

For technical support, contact tech@argos-labs.com

You may search all operations.

CAUTION

API keys created under IBM Cloud resource regions other than “Dallas” may cause an authentication error with error code 403.
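One way to catch this before running the plugin is to check the region embedded in the service URL shown with the API key. The sketch below assumes that Dallas corresponds to the IBM Cloud region ID `us-south` and that the region appears as a dot-separated label in the service host name. The helper name `is_dallas_endpoint` is hypothetical.

```python
from urllib.parse import urlparse

# Assumption: "us-south" is the IBM Cloud region ID for Dallas.
DALLAS_REGION = "us-south"


def is_dallas_endpoint(service_url):
    """Return True if the service URL host contains the Dallas region label."""
    host = urlparse(service_url).hostname or ""
    # Compare whole host-name labels to avoid accidental substring matches.
    return DALLAS_REGION in host.split(".")


if __name__ == "__main__":
    url = "https://api.eu-gb.speech-to-text.watson.cloud.ibm.com/instances/abc"
    if not is_dallas_endpoint(url):
        print("Warning: this key is not in the Dallas region; expect error 403.")
```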

IMPORTANT NOTES

1. This is a commercial API, and the end user will be charged by the supplier of this API after a certain amount of free usage.
2. The user license contract must be entered into directly between the supplier of this API and the end user.
3. ARGOS LABS will not be responsible for any consequences, tangible or intangible, that result from the use of this API.

Contents

How to set the parameters

Advanced Settings

When checked, the plugin returns a confidence measure in the range of 0.0 to 1.0 for each word. When unchecked, no word confidence measures are returned.
When checked, the plugin converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations. For US English, the plugin also converts certain keyword strings to punctuation symbols. This applies to US English, Japanese, and Spanish transcription only.
When checked, this setting specifies the duration of the pause interval at which the service ends the processing. Silence indicates a point at which the speaker pauses between spoken words or phrases. Specify a value for the pause interval in the range of 0.0 to 120.0 seconds. The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.
Use this parameter to suppress side conversations or background noise. Specify a value between 0.0 and 1.0:

0.0 (the default) provides no suppression (background audio suppression is disabled).
0.5 provides a reasonable level of audio suppression for general usage.
1.0 suppresses all audio (no audio is transcribed).
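The four advanced settings above can be pictured as a small set of request parameters with range checks. The sketch below is an illustration only, not the plugin's implementation. It assumes the settings correspond to the IBM Speech to Text parameters `word_confidence`, `smart_formatting`, `end_of_phrase_silence_time`, and `background_audio_suppression`; the helper name `build_advanced_params` is hypothetical.

```python
def build_advanced_params(word_confidence=False,
                          smart_formatting=False,
                          end_of_phrase_silence_time=None,
                          background_audio_suppression=0.0):
    """Validate the advanced settings and return them as query parameters.

    Parameter names are assumed to match the IBM Speech to Text API.
    """
    # Pause interval: optional, must lie in 0.0 to 120.0 seconds when given.
    if end_of_phrase_silence_time is not None:
        if not 0.0 <= end_of_phrase_silence_time <= 120.0:
            raise ValueError("end_of_phrase_silence_time must be in 0.0..120.0")

    # Suppression level: 0.0 disables suppression, 1.0 suppresses all audio.
    if not 0.0 <= background_audio_suppression <= 1.0:
        raise ValueError("background_audio_suppression must be in 0.0..1.0")

    params = {
        # Booleans are written as lowercase strings for use in a query string.
        "word_confidence": str(bool(word_confidence)).lower(),
        "smart_formatting": str(bool(smart_formatting)).lower(),
        "background_audio_suppression": background_audio_suppression,
    }
    # Leave the pause interval out so the service default applies
    # (0.8 seconds for most languages, 0.6 seconds for Chinese).
    if end_of_phrase_silence_time is not None:
        params["end_of_phrase_silence_time"] = end_of_phrase_silence_time
    return params


if __name__ == "__main__":
    print(build_advanced_params(word_confidence=True,
                                end_of_phrase_silence_time=1.5,
                                background_audio_suppression=0.5))
```

Omitting the pause interval when it is not set, rather than sending a fixed number, keeps the language-specific defaults in effect.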