Let’s Explore eSpeak-ng

Oct 2, 2021

In this writeup we explore various options of eSpeak NG.

What is eSpeak-ng?

eSpeak NG is a compact, open-source text-to-speech synthesizer for Linux, Windows, Android, and other operating systems. It supports more than 100 languages and accents.

Features:

Includes different voices, whose characteristics can be altered.
Can produce speech output as a WAV file.
Supports SSML (Speech Synthesis Markup Language), though not completely, and also HTML.
Compact size. The program and its data, including many languages, total only a few megabytes.
Can be used as a front-end to MBROLA diphone voices. eSpeak NG converts text to phonemes with pitch and length information.
Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcome.
Written in C.
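As a quick illustration of the WAV-output feature, here is a minimal sketch. It assumes espeak-ng is already installed (installation is covered in the next section). The function name say_to_wav and the file name hello.wav are made up for this example; the -w option itself is the one documented under Other Options.

```shell
# Hypothetical helper: synthesize text into a WAV file instead of playing it.
# Usage: say_to_wav <output.wav> <text...>
say_to_wav() {
    out=$1
    shift
    # -w writes the audio to the named file rather than to the sound device.
    espeak-ng -w "$out" "$@"
}

# Example call (commented out so sourcing the snippet has no side effects):
# say_to_wav hello.wav "Hello from eSpeak NG"
```

The resulting file can then be played with any audio player, or passed on to other tools.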

How to install:

For Debian and Ubuntu based distributions:

sudo apt-get install espeak-ng

For RHEL, Fedora, and CentOS based distributions (on recent releases, dnf can be used in place of yum):

sudo yum install espeak-ng

If it is already installed, you can check its version with this command:

espeak-ng --version
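In a script it can be handy to confirm that the program is present before calling it. The following is a small sketch, not an official recipe: check_espeak is a hypothetical function name, and it relies only on the standard command -v lookup plus the --version option shown above.

```shell
# Print the espeak-ng version if the program is on PATH; otherwise explain
# what is missing and return a non-zero status.
check_espeak() {
    if command -v espeak-ng >/dev/null 2>&1; then
        espeak-ng --version
    else
        echo "espeak-ng not found on PATH; install it first" >&2
        return 1
    fi
}

# Example call (commented out so sourcing the snippet has no side effects):
# check_espeak || exit 1
```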

Command to speak the text using the default English voice:

espeak-ng "any text"

Commands to speak the contents of a text file using the default English voice (either read the file directly or pipe it in):

espeak-ng -f <file_name with Path>

cat <file_name with Path> | espeak-ng
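The two forms above can be folded into one helper that reads a file when one is given and falls back to standard input otherwise. This is only a sketch: speak_source and notes.txt are hypothetical names, while -f and --stdin are the options listed under Other Options.

```shell
# Hypothetical helper: speak a file if one is given, otherwise speak stdin.
# Usage: speak_source [file]
speak_source() {
    if [ $# -ge 1 ]; then
        if [ ! -r "$1" ]; then
            echo "cannot read: $1" >&2
            return 1
        fi
        espeak-ng -f "$1"      # read the text from the named file
    else
        espeak-ng --stdin      # read the text from standard input
    fi
}

# Example calls (commented out so sourcing the snippet has no side effects):
# speak_source notes.txt
# echo "piped text" | speak_source
```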

List all voices supported by eSpeak NG:

espeak-ng --voices
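The full list is long, so it often helps to narrow it down. The sketch below shows two hypothetical helpers: voices_for uses the --voices=<language code> form from the options list, and has_voice scans the table for an exact language code. The second one assumes that the listing starts with a header row and that the language code is the second column, which may differ between versions.

```shell
# Hypothetical helper: list only the voices for one language code.
# Usage: voices_for <language-code>      e.g. voices_for en
voices_for() {
    espeak-ng --voices="$1"
}

# Hypothetical helper: succeed if a voice with exactly this language code exists.
# Assumption: line 1 is a header and column 2 holds the language code.
# Usage: has_voice <language-code>       e.g. has_voice hi
has_voice() {
    espeak-ng --voices |
        awk -v lang="$1" 'NR > 1 && $2 == lang { found = 1 } END { exit !found }'
}

# Example call (commented out so sourcing the snippet has no side effects):
# has_voice hi && echo "Hindi voice is available"
```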

Speak the word “Hello” using the default English voice, and print the phonemes that were spoken:

espeak-ng -x hello
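When only the phonemes are of interest, the audio can be switched off. The sketch below combines -x (or --ipa) with -q, both taken from the options list; the function names phonemes_of and ipa_of are hypothetical.

```shell
# Hypothetical helper: print phoneme mnemonics for a phrase without any audio.
# Usage: phonemes_of <text...>
phonemes_of() {
    # -q suppresses speech, -x writes phoneme mnemonics to stdout.
    espeak-ng -q -x "$@"
}

# Same idea, but using International Phonetic Alphabet symbols.
# Usage: ipa_of <text...>
ipa_of() {
    espeak-ng -q --ipa "$@"
}

# Example calls (commented out so sourcing the snippet has no side effects):
# phonemes_of hello
# ipa_of hello
```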

Command to speak the text using the Hindi voice:

espeak-ng -v hi -s 100 "text"

In this command, -s 100 sets the speaking speed in words per minute. Raise or lower the number to make the speech faster or slower.
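Voice, speed, and pitch can be set together. The following sketch wraps the -v, -s, and -p options from the list below in one hypothetical function, speak_with; the argument order is simply a choice made for this example.

```shell
# Hypothetical helper: speak text with an explicit voice, speed, and pitch.
# Usage: speak_with <voice> <speed-wpm> <pitch-0-99> <text...>
speak_with() {
    if [ $# -lt 4 ]; then
        echo "usage: speak_with <voice> <speed> <pitch> <text...>" >&2
        return 2
    fi
    voice=$1 speed=$2 pitch=$3
    shift 3
    espeak-ng -v "$voice" -s "$speed" -p "$pitch" "$@"
}

# Example calls (commented out so sourcing the snippet has no side effects):
# speak_with hi 100 50 "text"
# speak_with en+m3 140 60 "A variant of the English voice"
```

The second call uses the voice+variant form described under -v in the options list.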

Other Options:

-h, --help
              Show summary of options.

       --version
              Prints the espeak library version and the location of the espeak voice data.

       -f <text file>
              Text file to speak.

       --stdin
              Read text input from stdin instead of a file.

       If  neither  -f  nor --stdin are provided, <words> are spoken, or if no words are provided
       then text is spoken from stdin a line at a time.

       -d <device>
              Use the specified device to speak the audio on. If not specified, the default audio
              device is used.

       -q     Quiet, don't produce any speech (may be useful with -x).

       -a <integer>
              Amplitude, 0 to 200, default is 100.

       -g <integer>
              Word gap. Pause between words, units of 10ms at the default speed.

       -k <integer>
              Indicate  capital  letters  with: 1=sound, 2=the word "capitals", higher values = a
              pitch increase (try -k20).

       -l <integer>
              Line length. If not zero (which is the default),  consider  lines  less  than  this
              length as end-of-clause.

       -p <integer>
              Pitch adjustment, 0 to 99, default is 50.

       -s <integer>
              Speed in words per minute, default is 160.

       -v <voice name>
              Use  voice file of this name from espeak-ng-data/voices. A variant can be specified
              using voice+variant, such as af+m3.

       -w <wave file name>
              Write output to this WAV file, rather than speaking it directly.

       --split=<minutes>
              Used with -w to split the audio output into <minutes> recorded chunks.

       -b <integer>
              Input text encoding, 1=UTF8, 2=8 bit, 4=16 bit.

       -m     Indicates that the text contains SSML (Speech Synthesis Markup  Language)  tags  or
              other  XML  tags.  Those SSML tags which are supported are interpreted. Other tags,
              including HTML, are ignored, except that some HTML tags such as <hr> <h2> and  <li>
              ensure a break in the speech.

       -x     Write phoneme mnemonics to stdout.

       -X     Write phoneme mnemonics and translation trace to stdout. If rules files have been
              built with --compile-debug, line numbers will also be displayed.

       -z     No final sentence pause at the end of the text.

       --stdout
              Write speech output to stdout.

       --compile=voicename
              Compile  the  pronunciation  rules  and  dictionary  in  the   current   directory.
              =<voicename> is optional and specifies which language is compiled.

       --compile-debug=voicename
              Compile  the  pronunciation rules and dictionary in the current directory as above,
              but include line numbers, that get shown when -X is used.

       --ipa  Write phonemes to stdout using International Phonetic Alphabet. --ipa=1  Use  ties,
              --ipa=2 Use ZWJ, --ipa=3 Separate with _.

       --tie=<character>
              The character to use to join multi-letter phonemes in -x and --ipa output.

       --path=<path>
              Specifies the directory containing the espeak-ng-data directory.

       --pho  Write mbrola phoneme data (.pho) to stdout or to the file in --phonout.

       --phonout=<filename>
              Write output from -x -X commands and mbrola phoneme data to this file.

       --punct="<characters>"
              Speak  the  names  of  punctuation  characters during speaking. If =<characters> is
              omitted, all punctuation is spoken.

       --sep=<character>
              The character to separate phonemes from the -x and --ipa output.

       --voices[=<language code>]
              Lists the available voices. If =<language code> is present then only  those  voices
              which are suitable for that language are listed.

       --voices=<directory>
              Lists the voices in the specified subdirectory.
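A few of the options above are easier to grasp with a concrete call. The sketch below wraps -m (SSML input) and --punct (spoken punctuation) in two hypothetical helpers, speak_ssml and speak_punct. The markup in the commented example assumes that the break tag is among the SSML tags espeak-ng supports.

```shell
# Hypothetical helper: speak text that contains SSML markup.
# Usage: speak_ssml <text-with-ssml...>
speak_ssml() {
    # -m tells espeak-ng to interpret the SSML tags it supports.
    espeak-ng -m "$@"
}

# Hypothetical helper: speak text and name every punctuation character.
# Usage: speak_punct <text...>
speak_punct() {
    # --punct without =<characters> speaks all punctuation.
    espeak-ng --punct "$@"
}

# Example calls (commented out so sourcing the snippet has no side effects):
# speak_ssml '<speak>Hello <break time="500ms"/> world</speak>'
# speak_punct "Wait, what?"
```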

Thank you!

Written by Kishan kumar