Let’s Explore eSpeak-ng

Kishan kumar
Oct 2, 2021

In this write-up, we explore the main command-line options of eSpeak NG.

What is eSpeak-ng?

eSpeak NG is a compact, open-source text-to-speech synthesizer for Linux, Windows, Android, and other operating systems. It supports more than 100 languages and accents.

Features:

  • Includes different voices, whose characteristics can be altered.
  • Can produce speech output as a WAV file.
  • Supports SSML (Speech Synthesis Markup Language), though not completely, and also HTML.
  • Compact size. The program and its data, including many languages, total only a few megabytes.
  • Can be used as a front-end to MBROLA diphone voices. eSpeak NG converts text to phonemes with pitch and length information.
  • Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
  • Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcome.
  • Written in C.

How to install:

For Debian and Ubuntu based distributions:

sudo apt-get install espeak-ng

For RHEL, Fedora, and CentOS based distributions:

sudo yum install espeak-ng

If it is already installed, you can check its version with this command:

espeak-ng --version
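
In a script, it can help to confirm the binary is present before calling it. Below is a minimal sketch in POSIX shell; the function name require_espeak is made up for this example and is not part of eSpeak NG.

```shell
#!/bin/sh
# require_espeak: hypothetical helper, not part of eSpeak NG.
# Succeeds (and prints the version) only when espeak-ng is on PATH.
require_espeak() {
    if command -v espeak-ng >/dev/null 2>&1; then
        espeak-ng --version
    else
        echo "espeak-ng is not installed or not on PATH" >&2
        return 1
    fi
}

# Example use: print a hint when the synthesizer is missing.
require_espeak || echo "install espeak-ng first, then re-run"
```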

Command to speak the text using the default English voice:

espeak-ng "any text"

Command to speak the contents of a text file using the default English voice:

espeak-ng -f <file_name with Path>

or

cat <file_name with Path> | espeak-ng
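
The input routes can be tried side by side without a sound card by adding -q (no audio) and -x (print phonemes). This is only a sketch: the sample text and the temporary file are arbitrary choices, and the three outputs are not guaranteed to be formatted identically.

```shell
#!/bin/sh
# Compare the input routes without playing audio:
# -q suppresses speech and -x prints phoneme mnemonics instead.
sample=$(mktemp)            # temporary file; the name is arbitrary
printf 'hello world\n' > "$sample"

if command -v espeak-ng >/dev/null 2>&1; then
    espeak-ng -q -x -f "$sample"          # read from a file
    espeak-ng -q -x --stdin < "$sample"   # read from stdin via --stdin
    cat "$sample" | espeak-ng -q -x       # no -f and no words: stdin, a line at a time
else
    echo "espeak-ng not found; skipping" >&2
fi

rm -f "$sample"             # remove the temporary file
```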

List all voices supported by eSpeak NG:

espeak-ng --voices
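
The full list is long, so it is often easier to filter by language with --voices=<language code>. A small sketch follows; voices_for is a hypothetical wrapper name, and the language codes hi and en are just examples.

```shell
#!/bin/sh
# voices_for: hypothetical wrapper around --voices=<language code>.
voices_for() {
    espeak-ng --voices="$1"
}

if command -v espeak-ng >/dev/null 2>&1; then
    voices_for hi      # only voices suited to Hindi
    voices_for en      # English voices and accents
else
    echo "espeak-ng not found; skipping" >&2
fi
```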

Speak the word “Hello” using the default English voice and print the phonemes that were spoken:

espeak-ng -x hello
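
To get the phonemes without any audio, -x can be combined with -q, and --ipa prints the International Phonetic Alphabet instead of eSpeak NG's own mnemonics. The helper below is a sketch; the name phonemes and its "ipa"/"ascii" mode argument are invented for this example.

```shell
#!/bin/sh
# phonemes: hypothetical helper that prints phonemes for its arguments
# without playing audio (-q). The first argument selects the notation:
# "ipa" for IPA, anything else for eSpeak NG mnemonics.
phonemes() {
    mode=$1
    shift
    case $mode in
        ipa) espeak-ng -q --ipa "$*" ;;
        *)   espeak-ng -q -x "$*" ;;
    esac
}

if command -v espeak-ng >/dev/null 2>&1; then
    phonemes ascii hello   # eSpeak NG mnemonic notation
    phonemes ipa hello     # International Phonetic Alphabet
else
    echo "espeak-ng not found; skipping" >&2
fi
```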

Command to speak the text using the Hindi voice:

espeak-ng -v hi -s 100 "text"

In this command, -s 100 sets the speaking speed in words per minute. Raise the value for faster speech or lower it for slower speech.
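
One way to see the effect of -s without listening is to render the same phrase to WAV files at two speeds and compare their sizes. This is a sketch under assumptions: the Hindi phrase, the speeds, and the file names are arbitrary, and the slower rendering should typically (not necessarily always) be the larger file.

```shell
#!/bin/sh
# Render the same Hindi phrase at two speeds and compare file sizes.
# The phrase and file names are arbitrary choices for this sketch.
text='नमस्ते दुनिया'
workdir=$(mktemp -d)

if command -v espeak-ng >/dev/null 2>&1; then
    espeak-ng -v hi -s 100 -w "$workdir/slow.wav" "$text"
    espeak-ng -v hi -s 200 -w "$workdir/fast.wav" "$text"
    # Print byte counts; the slower rendering should typically be larger.
    wc -c "$workdir/slow.wav" "$workdir/fast.wav"
else
    echo "espeak-ng not found; skipping" >&2
fi

# The files are left in $workdir for inspection; delete the directory when done.
```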

Other Options:

-h, --help
Show summary of options.

--version
Prints the espeak library version and the location of the espeak voice data.

-f <text file>
Text file to speak.

--stdin
Read text input from stdin instead of a file.

If neither -f nor --stdin is provided, <words> are spoken. If no words are provided
either, text is read from stdin and spoken a line at a time.

-d <device>
Use the specified device to speak the audio on. If not specified, the default audio
device is used.

-q Quiet, don't produce any speech (may be useful with -x).

-a <integer>
Amplitude, 0 to 200, default is 100.

-g <integer>
Word gap. Pause between words, units of 10ms at the default speed.

-k <integer>
Indicate capital letters with: 1=sound, 2=the word "capitals", higher values = a
pitch increase (try -k20).

-l <integer>
Line length. If not zero (which is the default), consider lines less than this
length as end-of-clause.

-p <integer>
Pitch adjustment, 0 to 99, default is 50.

-s <integer>
Speed in words per minute, default is 175.

-v <voice name>
Use voice file of this name from espeak-ng-data/voices. A variant can be specified
using voice+variant, such as af+m3.

-w <wave file name>
Write output to this WAV file, rather than speaking it directly.

--split=<minutes>
Used with -w to split the audio output into <minutes> recorded chunks.

-b Input text encoding, 1=UTF8, 2=8 bit, 4=16 bit.

-m Indicates that the text contains SSML (Speech Synthesis Markup Language) tags or
other XML tags. Those SSML tags which are supported are interpreted. Other tags,
including HTML, are ignored, except that some HTML tags such as <hr> <h2> and <li>
ensure a break in the speech.

-x Write phoneme mnemonics to stdout.

-X Write phoneme mnemonics and translation trace to stdout. If rules files have been
built with --compile-debug, line numbers will also be displayed.

-z No final sentence pause at the end of the text.

--stdout
Write speech output to stdout.

--compile=voicename
Compile the pronunciation rules and dictionary in the current directory.
=<voicename> is optional and specifies which language is compiled.

--compile-debug=voicename
Compile the pronunciation rules and dictionary in the current directory as above,
but include line numbers, that get shown when -X is used.

--ipa Write phonemes to stdout using the International Phonetic Alphabet. --ipa=1 Use ties,
--ipa=2 Use ZWJ, --ipa=3 Separate with _.

--tie=<character>
The character to use to join multi-letter phonemes in -x and --ipa output.

--path=<path>
Specifies the directory containing the espeak-ng-data directory.

--pho Write mbrola phoneme data (.pho) to stdout or to the file in --phonout.

--phonout=<filename>
Write output from -x -X commands and mbrola phoneme data to this file.

--punct="<characters>"
Speak the names of punctuation characters during speaking. If =<characters> is
omitted, all punctuation is spoken.

--sep=<character>
The character used to separate phonemes in the -x and --ipa output.

--voices[=<language code>]
Lists the available voices. If =<language code> is present then only those voices
which are suitable for that language are listed.

--voices=<directory>
Lists the voices in the specified subdirectory.
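
Several of the options above can be combined in one call. The sketch below writes a WAV file with adjusted pitch, amplitude, and word gap, then feeds SSML through -m. The output path, sample sentences, and chosen values are arbitrary, and since SSML support is incomplete, only a simple break tag is used.

```shell
#!/bin/sh
# Sketch combining several options from the list above.
# Output path, sample text, and numeric values are arbitrary.
out=$(mktemp -d)/demo.wav

if command -v espeak-ng >/dev/null 2>&1; then
    # -w: write a WAV file; -p, -a, -g: pitch, amplitude, word gap.
    espeak-ng -p 70 -a 150 -g 5 -w "$out" "This goes to a file."
    head -c 4 "$out"; echo     # a WAV file begins with the bytes RIFF

    # -m: interpret SSML tags; -q -x: print phonemes instead of speaking.
    espeak-ng -m -q -x 'Wait <break time="500ms"/> for it.'
else
    echo "espeak-ng not found; skipping" >&2
fi
```

The resulting file can be played with any audio player, or inspected with a tool such as file to confirm the format.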

Thank you!
