Nov 7, 2009

Text to Speech using Java

Introduction

        In this article we will see how to convert text input into speech using Java. This can be done in two ways:

  • JSAPI 1.0 (Java Speech Application Programming Interface)
  • FreeTTS (Free Text To Speech)

JSAPI 1.0

        JSAPI 1.0 was developed by Sun Microsystems. It covers two core technologies: speech synthesis and speech recognition.

       Speech synthesis converts text into speech.

       Speech recognition converts speech into text.

FreeTTS

        FreeTTS is an open source speech synthesis package written entirely in the Java programming language. It can also be used to convert text into speech.

Necessity for the conversion of Text into Speech

      Speech synthesis of text in a word processor is an aid to proofreading, because grammatical and stylistic problems are often easier to detect by ear. Text is also much more compact than audio: the same content saved in an audio format takes up far more space than the text file, so it can make sense to store text and synthesize the speech when it is needed. TTS is also useful on mobile phones, where we can listen to a received SMS instead of reading it.

Methods to convert Text to Speech

    Structure analysis: Processes the input text to determine where paragraphs, sentences and other structures start and end.

    Text pre-processing: Expands abbreviations and other forms of shorthand. E.g. "St. John PG" becomes "Saint John PG".

    Text-to-phoneme conversion: Converts each word to phonemes. A phoneme is a basic unit of sound in a language.

    Prosody analysis: Processes the sentence structure, words and phonemes to determine the appropriate prosody for the sentence.

    Waveform production: Finally, the phonemes and prosody information are used to produce the audio waveform for each sentence.
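
The first two stages can be illustrated with plain Java. The following is only a minimal sketch, not how JSAPI or FreeTTS implement them: the TextFrontEnd class, its method names, and the two-entry abbreviation table are assumptions made for this example. Sentence boundaries are found with the standard java.text.BreakIterator class.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TextFrontEnd {

    // Tiny, hypothetical abbreviation table; a real engine uses far larger, context-aware rules.
    private static final Map<String, String> ABBREVIATIONS = new LinkedHashMap<>();
    static {
        ABBREVIATIONS.put("St.", "Saint");
        ABBREVIATIONS.put("Dr.", "Doctor");
    }

    // Structure analysis: find where each sentence starts and ends.
    public static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.US);
        boundaries.setText(text);
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    // Text pre-processing: replace each known abbreviation with its spoken form.
    public static String expandAbbreviations(String text) {
        StringBuilder expanded = new StringBuilder();
        for (String token : text.split(" ")) {
            if (expanded.length() > 0) {
                expanded.append(' ');
            }
            expanded.append(ABBREVIATIONS.getOrDefault(token, token));
        }
        return expanded.toString();
    }

    public static void main(String[] args) {
        System.out.println(splitSentences("Welcome to the blog. Enjoy the demo!"));
        System.out.println(expandAbbreviations("St. John PG")); // prints "Saint John PG"
    }
}
```

The lookup is deliberately naive: it always reads "St." as "Saint", whereas a real pre-processor has to decide from context whether it means "Saint" or "Street".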

Requirements for Text to Speech:

The following jar files are required:

  • cmu_us_kal.jar
  • cmulex.jar
  • en_us.jar
  • freetts.jar
  • jsapi.jar

These files come with the FreeTTS binary package. Download freetts-1.2.1-bin.zip from http://sourceforge.net/projects/freetts/

Unzip the FreeTTS binary package and check that the lib directory contains all of the above jar files except jsapi.jar. In its place there is a jsapi.exe file.

Run jsapi.exe to extract jsapi.jar.

Copy all the jar files mentioned above to the working folder or to

C:\Program Files\java\jdk1.6.0_03\jre\lib\ext
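
Before going further, it can help to confirm that all five jars really are in the chosen folder. The sketch below is one possible way to do that with plain Java. The JarCheck class and its missingJars method are names invented for this example; the jar names are the ones listed above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class JarCheck {

    // The jar files listed in the requirements above.
    private static final String[] REQUIRED_JARS = {
        "cmu_us_kal.jar", "cmulex.jar", "en_us.jar", "freetts.jar", "jsapi.jar"
    };

    // Returns the names of the required jars that are not present in the given directory.
    public static List<String> missingJars(File directory) {
        List<String> missing = new ArrayList<>();
        for (String jar : REQUIRED_JARS) {
            if (!new File(directory, jar).isFile()) {
                missing.add(jar);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Check the folder given on the command line, or the current folder by default.
        File directory = new File(args.length > 0 ? args[0] : ".");
        List<String> missing = missingJars(directory);
        if (missing.isEmpty()) {
            System.out.println("All required jars found in " + directory.getAbsolutePath());
        } else {
            System.out.println("Missing jars: " + missing);
        }
    }
}
```

If jsapi.jar is reported as missing, the most likely cause is that jsapi.exe has not been run yet.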

Important classes in javax.speech package:

import javax.speech.*;
import javax.speech.synthesis.*;


Engine

  The Engine interface belongs to the javax.speech package. It is the parent interface for all speech engines, including Recognizer and Synthesizer. "Speech engine" is the generic term for a system designed to deal with either speech input or speech output.

It is imported with: import javax.speech.Engine;



The basic steps for using a speech engine in an application are as follows.

1. Identify the application's functional requirements for an engine (e.g. language or dictation capability).

2. Locate and create an engine that meets those functional requirements.

3. Allocate the resources for the engine.

4. Set up the engine.

5. Begin operation of the engine, that is, resume it.

6. Use the engine.

7. Deallocate the resources of the engine.
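
The order of these steps matters: an engine has to be allocated and resumed before it is used, and deallocated afterwards. As a rough illustration only, the sketch below models that ordering with a hypothetical ToyEngine class. It is not part of JSAPI, and its three states are a deliberate simplification of the states a real engine goes through.

```java
public class ToyEngine {

    // Simplified stand-in for the engine states; real JSAPI engines have more of them.
    public enum State { DEALLOCATED, ALLOCATED, RESUMED }

    private State state = State.DEALLOCATED;
    private final StringBuilder spoken = new StringBuilder();

    // Acquire the resources the engine needs.
    public void allocate() {
        state = State.ALLOCATED;
    }

    // Begin operation; only valid once the engine has been allocated.
    public void resume() {
        if (state == State.DEALLOCATED) {
            throw new IllegalStateException("allocate() must be called before resume()");
        }
        state = State.RESUMED;
    }

    // Use the engine; here "speaking" just records the text.
    public void speak(String text) {
        if (state != State.RESUMED) {
            throw new IllegalStateException("engine must be allocated and resumed before use");
        }
        spoken.append(text);
    }

    // Release the engine's resources.
    public void deallocate() {
        state = State.DEALLOCATED;
    }

    public State getState() {
        return state;
    }

    public String getSpoken() {
        return spoken.toString();
    }

    public static void main(String[] args) {
        ToyEngine engine = new ToyEngine();
        engine.allocate();
        engine.resume();
        engine.speak("Hello");
        engine.deallocate();
        System.out.println("Spoken: " + engine.getSpoken() + ", final state: " + engine.getState());
    }
}
```

Calling speak() on a ToyEngine that has not been allocated and resumed throws an IllegalStateException, which is the kind of mistake the step order is meant to prevent.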



Central



   The Central class is the initial access point to all speech input and output capabilities. Central provides the ability to locate, select and create speech recognizers and speech synthesizers.



SynthesizerModeDesc



   SynthesizerModeDesc extends EngineModeDesc with properties that are specific to speech synthesizers. A SynthesizerModeDesc inherits the engine name, mode name, locale and running properties from EngineModeDesc. SynthesizerModeDesc adds two properties:

  • List of voices provided by the synthesizer
  • Voice to be loaded when the synthesizer is started



Synthesizer



   The Synthesizer interface provides primary access to speech synthesis capabilities.



Voice



  A Voice object describes one output voice of a speech synthesizer.

  Voice objects can be used in the selection of synthesis engines (through the SynthesizerModeDesc). The current speaking voice of a Synthesizer can be changed during operation with the setVoice method of the SynthesizerProperties object.



  Creating a simple program using JSAPI speech synthesis

Step 1: Import the following packages.

import javax.speech.*;
import javax.speech.synthesis.*;

Step 2: Create a Synthesizer.



Synthesizer synthesizer = Central.createSynthesizer(null);

   Passing null creates a default Synthesizer for the default locale.

   To select a particular locale and a particular voice instead, describe them in a SynthesizerModeDesc. The following code asks for a German synthesizer with a female voice (Locale is in the java.util package):

SynthesizerModeDesc desc = new SynthesizerModeDesc();
desc.setLocale(new Locale("de", ""));
desc.addVoice(new Voice(null, Voice.GENDER_FEMALE, Voice.AGE_DONT_CARE, null));
Synthesizer synthesizer = Central.createSynthesizer(desc);



Step 3: Allocate and resume the synthesizer.



synthesizer.allocate();
synthesizer.resume();


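Step 4: Get the available voices.

Voice[] voices = desc.getVoices();

The getVoices() method returns all the voices available from the selected Synthesizer. Here desc is the SynthesizerModeDesc of that synthesizer, which can be obtained from synthesizer.getEngineModeDesc().

Step 5: Set the voice.

synthesizer.getSynthesizerProperties().setVoice(voice);

   The setVoice() method makes the given Voice the current speaking voice of the synthesizer.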
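
A voice name that does not match any available voice leaves nothing valid to pass to setVoice(), so it is worth guarding the lookup. The sketch below shows one way to do that. It uses plain strings as stand-ins for the names returned by Voice.getName(); the VoicePicker class and the sample list of names are assumptions made for illustration and are not part of JSAPI.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class VoicePicker {

    // Returns the wanted voice name if it is among the available ones (exact, case-sensitive match).
    public static Optional<String> pick(List<String> availableNames, String wantedName) {
        return availableNames.stream()
                .filter(name -> name.equals(wantedName))
                .findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical list; in a real program these would come from Voice.getName().
        List<String> available = Arrays.asList("kevin", "kevin16");

        Optional<String> voice = pick(available, "kevin16");
        if (voice.isPresent()) {
            System.out.println("Using voice " + voice.get());
        } else {
            System.out.println("Voice not found, available voices: " + available);
        }
    }
}
```

Because the match is case-sensitive, a misspelled or differently capitalised name is reported as not found instead of failing later.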



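Step 6: Speak the text.

synthesizer.speakPlainText(speaktext, null);
synthesizer.waitEngineState(Synthesizer.QUEUE_EMPTY);

  The speakPlainText(speaktext, null) method places the given text on the synthesizer's output queue. The waitEngineState(Synthesizer.QUEUE_EMPTY) call then blocks until the queue is empty, that is, until all the text has been spoken.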



Step 7: Deallocate the Synthesizer.



 synthesizer.deallocate(); 



Demo Programs:



 Demo 1:



    Our first demo program uses jsapi.jar. We will call it Demojsapi.java.



import javax.speech.*;
import java.util.*;
import javax.speech.synthesis.*;

public class Demojsapi
{
    public void doSpeak(String speaktext, String voiceName)
    {
        try
        {
            // Ask for a general-purpose US English synthesizer
            SynthesizerModeDesc desc = new SynthesizerModeDesc(null, "general", Locale.US, null, null);
            Synthesizer synthesizer = Central.createSynthesizer(desc);
            if (synthesizer == null)
            {
                // createSynthesizer returns null when no matching engine is found
                System.out.println("No synthesizer found. Check for missing speech.properties in "
                        + System.getProperty("user.home"));
                return;
            }
            synthesizer.allocate();
            synthesizer.resume();

            // Look up the requested voice among those the engine provides
            desc = (SynthesizerModeDesc) synthesizer.getEngineModeDesc();
            Voice[] voices = desc.getVoices();
            Voice voice = null;
            for (int i = 0; i < voices.length; i++)
            {
                if (voices[i].getName().equals(voiceName))
                {
                    voice = voices[i];
                    break;
                }
            }
            if (voice == null)
            {
                System.out.println("No voice named " + voiceName);
                synthesizer.deallocate();
                return;
            }
            synthesizer.getSynthesizerProperties().setVoice(voice);

            // Queue the text and wait until it has been spoken
            synthesizer.speakPlainText(speaktext, null);
            synthesizer.waitEngineState(Synthesizer.QUEUE_EMPTY);
            synthesizer.deallocate();
        }
        catch (Exception e)
        {
            System.out.println(e);
        }
    }

    public static void main(String[] args)
    {
        Demojsapi obj = new Demojsapi();
        // The text to speak is the first command-line argument
        obj.doSpeak(args[0], "kevin16");
    }
}


Demo 2:

     



Our second program uses freetts.jar. We will call it Demofreetts.java.



import com.sun.speech.freetts.*;

public class Demofreetts
{
    public void doSpeak(String speaktext, String voiceName)
    {
        try
        {
            // Look up the voice by name; getVoice returns null if it is not found
            VoiceManager voiceManager = VoiceManager.getInstance();
            Voice voice = voiceManager.getVoice(voiceName);
            if (voice == null)
            {
                System.out.println("No Voice Available");
                return;
            }
            //==================================================
            voice.allocate();
            voice.speak(speaktext);
            voice.deallocate();
            //==================================================
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    public static void main(String[] args)
    {
        Demofreetts obj = new Demofreetts();
        // The text to speak is the first command-line argument
        obj.doSpeak(args[0], "kevin16");
    }
}


Running Procedure:



The running procedure is the same for both programs. Here we will go through it for the first program.

1. Create a folder named texttospeech.

2. Copy all the jars (jsapi.jar, freetts.jar, cmu_time_awb.jar, cmu_us_kal.jar, etc.) to that folder or to C:\Program Files\java\jdk1.6.0_03\jre\lib\ext

3. Create the program named Demojsapi.java

4. If all the jar files are in your working folder, set the class path as

set classpath=.;cmu_us_kal.jar;en_us.jar;freetts.jar;cmulex.jar;jsapi.jar

   Do not put spaces around the = sign, and include the . entry so that Java can find the compiled Demojsapi class in the current folder.

5. If the jar files are not in your working folder, give their full paths instead, for example

set classpath=.;C:\Program Files\java\jdk1.6.0\jre\lib\cmu_us_kal.jar;C:\Program Files\java\jdk1.6.0\jre\lib\en_us.jar;C:\Program Files\java\jdk1.6.0\jre\lib\freetts.jar;C:\Program Files\java\jdk1.6.0\jre\lib\cmulex.jar;C:\Program Files\java\jdk1.6.0\jre\lib\jsapi.jar

6. Set the java path

           E.g. set path=C:\windows\system32;C:\jdk1.6\bin

7. Compile the program

           javac Demojsapi.java

8. Run the program

           java Demojsapi "Web Technology I/O Blog Welcomes You"

   Pass the class name to java, not the file name. Running "java Demojsapi.java" fails with a "Could not find or load main class" error.



      If you get the error "missing speech.properties in user home: C:\Documents and Settings\User", copy the speech.properties file from the freetts-1.2.1 folder into your user home directory. On Windows XP, the user home is C:\Documents and Settings\User.



You will get the required result.



The second program can be run by following the same 8 steps.
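
The missing speech.properties error mentioned above can be checked for before running the demo. The following is a small sketch of such a check using only the standard library. The SpeechPropertiesCheck class and its method name are invented for this example; the file name and the use of the user.home system property come from the error message itself.

```java
import java.io.File;

public class SpeechPropertiesCheck {

    // Returns true if a speech.properties file exists in the given directory.
    public static boolean hasSpeechProperties(File directory) {
        return new File(directory, "speech.properties").isFile();
    }

    public static void main(String[] args) {
        // user.home is the directory named in the error message.
        File userHome = new File(System.getProperty("user.home"));
        if (hasSpeechProperties(userHome)) {
            System.out.println("speech.properties found in " + userHome);
        } else {
            System.out.println("speech.properties is missing from " + userHome
                    + "; copy it there from the FreeTTS folder.");
        }
    }
}
```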



Summary



In this article we saw a simple program that converts text input into speech using Java. An effective speech application, however, is one that uses speech to enhance a user's performance of a task or to enable an activity that cannot be done without it. Designing an application with speech in mind from the outset is a key success factor.



For More Information





The following sources provide additional information on speech user interface design.

  • http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/index.html

  • Fraser, N.M. and G.N. Gilbert, "Simulating Speech Systems," Computer Speech and Language, Vol. 5, Academic Press Limited, 1991.

  • Raman, T.V. Auditory User Interfaces: Towards the Speaking Computer. Kluwer Academic Publishers, Boston, MA, 1997.

  • Roe, D.B. and N.M. Wilpon, editors. Voice Communication Between Humans and Machines. National Academy Press, Washington D.C., 1994.

  • Schmandt, C. Voice Communication with Computers: Conversational Systems. Van Nostrand Reinhold, New York, 1994.


9 comments:

fell0206 said...

Hi, I learn you comment, and do my self, but when I run it, I got the error message:
System property "mbrola.base" is undefined.
I search in google, but I still not correct it, can you tell me what happen? Thank you!

ganesh said...

I like ur way of approach...I have used ur guide that how to execute speech recognition program.I was worked perfectly.My suggestion is ,u can include some visual demo also.

madhulatha said...

hii
i have tried this procedure but i'm getting the error as
"Error: Could not find or load main class demojsapi.java"

Priyanka said...

hello,
i tried to run your programe but getting error as package javax.speech does not exist... so what i need to do for that?...
if possible pls send me that package...asap
Thanks..

Anonymous said...

just change the name Kelvin16 to kevin16 (k is lower case and no l )

Anonymous said...

just change voicename Kelvin16 to kevin16

Anonymous said...

I am getting the error in compiling the first program Demojsapi.java. Error is at the line SysthesizerModeDesc desc = new SysthesizerModeDesc(null,"general","Locale",null,null)
plz help as soon as possible

Anonymous said...

hai sir,i am tried to execute these, but it shows NullPointerException.i didn't know how could these solution to be solved.please give me suggestion for solveing these problem.

Copyright © Vinay's Blog | Powered by Blogger
