Short and mid-term databases for applications


Database requirements for teleservices (LRE-63314-D1.1)

The objective of this report is twofold:
To analyse the market demands in the range of teleservices and their impact on the design of the speech database.
To specify the impact from current and near-future speech recognition technology on the design of the speech database.

Existing data multimedia catalogue (LRE-63314-D1.2.1)

This catalogue describes speech databases in four languages that are to be delivered to ELRA for distribution. For some of them, speech samples are embedded in the document.

Procedure for exchanging existing data (LRE-63314-D1.2.2)

This document describes two procedures that can be followed to exchange the six speech databases described in D1.2.1. The central approach via ELRA is recommended, but a draft contract for bilateral exchanges is also supplied.

Specification of short/mid-term databases (LRE-63314-D1.3.1&1.3.2)

This deliverable report documents the specification of an agreed and prioritised set of databases needed in the short to mid-term within Europe.

Reduced needs and specifications for the database (LRE-63314-D1.4.1)

This document provides a specification of the telephone speech databases to be collected by the industrial partners in the SpeechDat project in eight languages: Danish, English, French, Swiss French, German, Italian, Portuguese and Spanish.


Working standards, distribution and production of SLR


Working standards for speech databases directed towards short and medium term applications (LRE-63314-D3.1.1.1)

Topics addressed: physical recordings, physical conditions, linguistic contents, database and storage issues, transcription and validation, assessment.

General Working Standards: The EAGLES Handbook of Standards and Resources for Spoken Language Systems. (LRE-63314-D3.1.1.2)

As a statement of general working standards, this report presents the EAGLES handbook of standards and resources for spoken language systems which is approaching its draft form for wide dissemination to the European spoken language R&D community.
The report describes the background to the general EAGLES activities, explaining the organisational structure and outlining the workplan for the project. In particular, attention is also drawn to the newly created European Language Resources Association.

Computer-coding the IPA

What follows is a proposed keyboard-compatible coding for the entire set of IPA symbols. It covers everything on the 1993 IPA Chart, including diacritics and tone marks, and is put forward as a proposed standard way to transmit IPA-transcribed material by e-mail and for similar purposes. These proposals are fully set out with a reasoned explanation in a 7000-word draft article "Computer-coding the IPA: a proposed extension of SAMPA".
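As a rough illustration of what a keyboard-compatible coding means in practice, the sketch below maps a handful of widely used SAMPA characters to their IPA symbols and back. The table is a small illustrative subset and is not taken from the proposal itself; the function names `sampa_to_ipa` and `ipa_to_sampa` are hypothetical, and multi-character codes and diacritics are not handled.

```python
# Illustrative subset only (an assumption for this sketch): a few
# single-character SAMPA codes and their IPA equivalents.
SAMPA_TO_IPA = {
    "S": "\u0283",  # voiceless postalveolar fricative
    "Z": "\u0292",  # voiced postalveolar fricative
    "T": "\u03b8",  # voiceless dental fricative
    "D": "\u00f0",  # voiced dental fricative
    "N": "\u014b",  # velar nasal
    "I": "\u026a",  # near-close near-front unrounded vowel
    "@": "\u0259",  # schwa
}

# Reverse table for decoding IPA back to the ASCII coding.
IPA_TO_SAMPA = {ipa: ascii_code for ascii_code, ipa in SAMPA_TO_IPA.items()}


def sampa_to_ipa(text: str) -> str:
    """Replace each mapped ASCII character with its IPA symbol; keep the rest."""
    return "".join(SAMPA_TO_IPA.get(ch, ch) for ch in text)


def ipa_to_sampa(text: str) -> str:
    """Replace each mapped IPA symbol with its ASCII code; keep the rest."""
    return "".join(IPA_TO_SAMPA.get(ch, ch) for ch in text)


if __name__ == "__main__":
    ascii_form = "TIN"  # ASCII-coded transcription of English "thing"
    ipa_form = sampa_to_ipa(ascii_form)
    print(ascii_form, "->", ipa_form, "->", ipa_to_sampa(ipa_form))
```

Because every code is plain ASCII, a transcription written this way survives e-mail systems that cannot carry phonetic characters, and the receiver can restore the IPA symbols with the reverse table.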

The future of the Speech Assessment Methodology tools (LRE-63314-D3.1.2.2)

A review of the current state of the SAM multi-lingual speech input/output assessment tools and ways of supporting them in the future. The SpeechDat corpora and the SAM tools are fundamental resources for those working to keep European speech and language technology ahead of its competitors. The future of the SAM tools is vital to the successful use of the SpeechDat corpora and to the promotion of European standards of assessment.

Feasibility of automatic annotation and building pronunciation lexica from corpus material (LRE-63314-D3.1.2.3)

This report discusses the feasibility of automatic annotation and presents the PHONYP and PHONSEG applications as an example of an automatic segmentation and labelling system.

Validation of databases (LRE-63314-D3.1.3)

This document presents a list of guidelines for validation procedures to be carried out in order to ensure a defined quality standard for spoken language resources to be distributed by ELRA. The proposed methods are chosen to strike a good balance between achievable quality standards and the associated costs of the validation procedure.

Advanced distribution means for spoken language corpora (LRE-63314-D3.1.4)


This report outlines the distribution of Spoken Language Corpora (SLC) on traditional CD-ROM media and a newer approach via network access. High-capacity CD-ROMs are being introduced, but this is only a marginal improvement with respect to the distribution of SLC. Network access, however, offers many opportunities: customised SLC, on-line access, and a high degree of protection. For network access to be feasible, the bandwidth of existing networks will have to be increased.
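The bandwidth constraint can be made concrete with a back-of-the-envelope calculation. The figures below are assumptions chosen for illustration and do not come from the report: a corpus filling one 650 MB CD-ROM, sent over a 64 kbit/s link and over a 2 Mbit/s link, ignoring protocol overhead.

```python
def transfer_hours(size_megabytes: float, link_kbit_per_s: float) -> float:
    """Ideal transfer time in hours, ignoring protocol overhead and retries."""
    bits = size_megabytes * 1_000_000 * 8          # decimal megabytes to bits
    seconds = bits / (link_kbit_per_s * 1_000)     # kbit/s to bit/s
    return seconds / 3600


if __name__ == "__main__":
    corpus_mb = 650  # assumed size: one full CD-ROM
    for label, rate in [("64 kbit/s", 64), ("2 Mbit/s", 2_000)]:
        print(f"{label}: {transfer_hours(corpus_mb, rate):.2f} hours")
```

Under these assumptions the slower link needs roughly 22.6 hours for a single disc's worth of data, while the faster one needs well under an hour, which is the kind of gap that makes network distribution depend on increased bandwidth.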

Tasks of a European Center for Spoken Language Resources (LRE-63314-D3.2.1.1)



Organisational Form and Launching of ELRA (LRE-63314-D3.2.1.2&D3.2.1.3)


Relations of a European Center for Spoken Language Resources (ECSLR) with on-going projects. (LRE-63314-D3.2.2)