With continuous advances in information technology, information, web services and applications can now be accessed almost instantly, at any time and from nearly anywhere, over a wireless connection. Tablets and smartphones are the devices most commonly used to access the web. However, content on these devices is still accessed mainly through web browsers operated via traditional graphical user interfaces (GUIs). More advanced paradigms for human-machine interaction, such as those proposed by Smart Environments and Ambient Intelligence, are more user-friendly, empower users, support human interaction and offer more efficient service support.

What are spoken dialogue systems?

Spoken dialogue systems (SDS) are computer programs developed to interact with users through speech in order to provide them with specific automated services. The interaction is carried out by means of dialogue turns, with the aim of making it as similar as possible to interaction between humans in terms of intelligence, naturalness and affective content. Achieving such intelligent and natural interaction requires that the interaction between the system and the user be safe, transparent, easy and effective.

Over the past few years, there have been increasing efforts to ease and enhance human-computer and human-human interaction through spoken dialogue systems. A major goal of these systems is to make speech-based technologies more usable and accessible to both elderly and disabled users. Initially, SDS were used to simplify interaction in simple tasks such as providing air travel information, but today they are used in more complex scenarios such as in-car applications, personal assistants and Intelligent Environments.

A range of technologies and designs have been applied to develop SDS. Building an SDS is complex because the underlying technologies must process human language, which is itself a very complex task. These technologies are deployed as a number of system modules, which can be characterized by factors such as the goal of the module, whether its behaviour can be defined manually, and whether it can be obtained automatically from training samples. Below are some of the technologies used in building SDS.

1. Automatic speech recognition (ASR)

ASR is implemented by a speech recognizer, a module whose main goal is to take the user’s speech and generate a recognition hypothesis as output, usually the sequence of words that corresponds to what the user said. The recognition hypothesis often contains errors in the form of deleted, inserted or substituted words. Most SDS tend to use the N-best recognition method, in which the recognizer generates a list of N recognition hypotheses.
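
The three error types mentioned above can be counted by aligning a hypothesis against what the user actually said. The following is a minimal sketch using a standard word-level edit distance. The function name count_errors, the example sentences and the N-best list are illustrative assumptions and do not come from any particular recognizer.

```python
def count_errors(reference, hypothesis):
    """Count substituted, deleted and inserted words via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # cost[i][j] = (total, substitutions, deletions, insertions) for ref[:i] vs hyp[:j]
    cost = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)          # every reference word deleted
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)          # every hypothesis word inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            total, subs, dels, ins = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (total, subs, dels, ins)          # words match
            else:
                diagonal = (total + 1, subs + 1, dels, ins)  # substitution
            total, subs, dels, ins = cost[i - 1][j]
            deletion = (total + 1, subs, dels + 1, ins)
            total, subs, dels, ins = cost[i][j - 1]
            insertion = (total + 1, subs, dels, ins + 1)
            cost[i][j] = min(diagonal, deletion, insertion)
    _, subs, dels, ins = cost[-1][-1]
    return {"substitutions": subs, "deletions": dels, "insertions": ins}


# Hypothetical N-best list (N = 2), ordered from most to least likely.
spoken = "show me flights to boston"
n_best = ["show flights to austin please", "show me flights to boston"]
for hypothesis in n_best:
    print(hypothesis, count_errors(spoken, hypothesis))
```

In this made-up example the second hypothesis is the correct one, which illustrates why keeping N hypotheses instead of only the top one can help later modules recover from recognition errors.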

2. Spoken language understanding (SLU)

The output of the speech recognizer is the input to the spoken language understanding module. The main goal of this module is to obtain a semantic representation of the input, which is stored in the form of one or more frames. Because of difficulties inherent in natural language processing, such as ellipsis and ambiguity, the task of the SLU module is very challenging.
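
To make the idea of a frame concrete, here is a small sketch of a keyword-and-pattern SLU module for an air travel task. It is only one simple way to fill a frame. The slot names, the patterns and the function understand are assumptions made for illustration, and real SLU modules are usually far more elaborate.

```python
import re

# Hypothetical slot patterns for an air-travel domain (assumed, not from a real system).
SLOT_PATTERNS = {
    "origin": re.compile(r"\bfrom (\w+)"),
    "destination": re.compile(r"\bto (\w+)"),
    "day": re.compile(r"\bon (monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b"),
}


def understand(utterance):
    """Turn a recognized word sequence into a frame: an intent plus filled slots."""
    text = utterance.lower()
    frame = {"intent": "find_flight" if "flight" in text else "unknown", "slots": {}}
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            frame["slots"][name] = match.group(1)
    return frame


print(understand("I want a flight from Boston to Denver on Friday"))
print(understand("and to Chicago"))  # elliptical follow-up: the intent is not stated
```

The second call shows the ellipsis problem in miniature: the user leaves out everything except the new destination, so the frame has no intent unless the system also takes the dialogue history into account.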

3. Dialogue management (DM)

The output of the SLU module is the input to the dialogue manager. The main goal of this module is to decide what to do next in response to the user’s input, for example providing information to the user, prompting the user to rephrase a sentence, or asking for confirmation of words the system is uncertain about.
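
The decisions described above can be sketched as a small rule-based dialogue manager. The sketch assumes that the frame comes with a confidence score between 0 and 1. The two thresholds, the required slots and the extra "request" action for missing information are illustrative assumptions, not part of any specific system.

```python
# Hypothetical settings for an air-travel task.
REQUIRED_SLOTS = ("origin", "destination")
CONFIRM_THRESHOLD = 0.6   # below this, ask the user to confirm
REJECT_THRESHOLD = 0.3    # below this, ask the user to rephrase


def next_action(frame, confidence):
    """Decide what the system should do next for one user turn."""
    slots = frame.get("slots", {})
    if confidence < REJECT_THRESHOLD:
        return {"act": "ask_rephrase"}
    if confidence < CONFIRM_THRESHOLD:
        return {"act": "confirm", "slots": dict(slots)}
    for slot in REQUIRED_SLOTS:
        if slot not in slots:
            # Assumed extra action: ask for a piece of information that is still missing.
            return {"act": "request", "slot": slot}
    return {"act": "inform", "slots": dict(slots)}


frame = {"intent": "find_flight", "slots": {"origin": "boston", "destination": "denver"}}
print(next_action(frame, 0.9))  # confident and complete: provide information
print(next_action(frame, 0.5))  # uncertain: confirm
print(next_action(frame, 0.1))  # very uncertain: ask to rephrase
```

Hand-written rules like these are easy to inspect and adjust, which is one reason a module's behaviour is sometimes defined manually instead of being learned from training samples.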

4. Natural language generation (NLG)

The dialogue manager’s decision about what to do next is the input to the natural language generation module. This decision is represented abstractly, so the goal of the NLG module is to transform it into one or more sentences in text form that are semantically and grammatically correct and coherent with the current state of the dialogue.
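
One simple way to turn an abstract decision into text is to use templates. The sketch below assumes that the decision arrives as a dictionary with an "act" key and optional "slots". The act names and the template wording are invented for illustration.

```python
# Hypothetical templates, one per dialogue act.
TEMPLATES = {
    "inform": "I found flights from {origin} to {destination}.",
    "confirm": "Did you say you want to fly to {destination}?",
    "ask_rephrase": "Sorry, I did not understand. Could you rephrase that?",
}


def generate(action):
    """Fill the template for the chosen act with the slot values it carries."""
    template = TEMPLATES[action["act"]]
    return template.format(**action.get("slots", {}))


print(generate({"act": "inform", "slots": {"origin": "Boston", "destination": "Denver"}}))
print(generate({"act": "ask_rephrase"}))
```

Templates keep the sentences grammatical by construction, at the price of repetitive wording. Keeping the output coherent with the dialogue state depends on the dialogue manager supplying the right slots.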

5. Text to speech synthesis (TTS)

In the final stage of the SDS, the output of the NLG module is the input to the text-to-speech module, which transforms the sentences into the dialogue system’s speech. Unlike other speech synthesis methods, TTS can transform arbitrary text into speech, which avoids the need to pre-record the words of the sentences.

In summary, spoken dialogue systems are computer programs designed to accept speech as input and generate speech as output, engaging the user in a conversation about a given task.
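
The way the five modules feed into one another during a single dialogue turn can be sketched as a simple pipeline. Everything in the sketch is a stand-in: the class name SpokenDialogueSystem is hypothetical, the audio is faked with byte strings, and each module is a one-line placeholder for a real component.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpokenDialogueSystem:
    recognize: Callable[[bytes], str]    # ASR: audio -> word sequence
    understand: Callable[[str], dict]    # SLU: word sequence -> frame
    decide: Callable[[dict], dict]       # DM: frame -> abstract decision
    generate: Callable[[dict], str]      # NLG: decision -> sentence
    synthesize: Callable[[str], bytes]   # TTS: sentence -> audio

    def turn(self, audio_in: bytes) -> bytes:
        """Run one dialogue turn: each module's output is the next module's input."""
        text = self.recognize(audio_in)
        frame = self.understand(text)
        action = self.decide(frame)
        sentence = self.generate(action)
        return self.synthesize(sentence)


# Placeholder modules; byte strings stand in for real audio.
sds = SpokenDialogueSystem(
    recognize=lambda audio: audio.decode("utf-8"),
    understand=lambda text: {"destination": text.split()[-1]},
    decide=lambda frame: {"act": "confirm", **frame},
    generate=lambda action: f"Did you say {action['destination']}?",
    synthesize=lambda sentence: sentence.encode("utf-8"),
)
reply = sds.turn(b"a flight to denver")
print(reply)
```

Because every stage only depends on the output of the previous one, any placeholder can be swapped for a real component without changing the others.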