NewsBizkoot.com

BUSINESS News for MILLENIALAIRES

IIIT Hyderabad’s Language Tech Centre Celebrates Silver Jubilee


9 December 2024: International Institute of Information Technology Hyderabad’s (IIITH) Language Technologies Research Centre (LTRC) celebrated the 25th year of its inception with a two-day event over the weekend that included several talks and panel discussions on the vision, mission, evolution and future of language technologies. The event highlighted the importance of academic research connected with real-life problems, the need for collaboration in technology development, and the role of language technology across regional languages for social good.

To mark the occasion, LTRC launched the free-to-download BhashaVerse language resources and machine translation models. The multitask encoder-decoder model can translate from any of the 36 languages into any other, including Tulu, Bodo, Bhojpuri, Magahi and Santhali. It helps with machine translation evaluation, error identification and automatic post-editing of 10 billion sentence pairs.

The Centre also announced the development of the BhashaVerse LLM decoder model for 36 Indian languages, which can be used for summarisation, Q&A and similar tasks with some fine-tuning. In addition to these models, LTRC released synthetically generated and curated 10 billion Bhashik datasets for Indian language to Indian language pairs: a generic dataset, one in the Education domain that works across 17 different fields in English and 5 Indian languages, and one for the Health domain in English and 8 Indian languages. For the first time in Indian languages, a dataset for automatic post-editing of text and machine translation evaluation has also been made available.

Established in October 1999 in response to the preeminent scientific challenge of enabling machines to read, understand and derive meaning from human languages, especially in the Indian context and Indian languages, LTRC was perhaps the first such research-theme-focused centre in the country. Today, LTRC is the largest academic centre of speech and language technology in South Asia.

Initially focused on areas like Machine Translation, Semantic Parsing, and Information Extraction, the centre has expanded its research portfolio to include Speech Recognition, Text Generation, Sentiment Analysis, Dialogue Systems, and more. Over the years, LTRC has built a thriving ecosystem of researchers, students, and collaborators who have carried forward the centre’s pioneering work in various directions, many of which had their origins at LTRC itself.

The goodwill that the Centre has garnered over the years was evident from the active participation at the event of several notable figures in the field of language technologies from academia and industry, as well as faculty, students, alumni and collaborators.

Commenting on LTRC and its impact, Prof Vasudeva Varma, Head, LTRC said, “As the first natural language processing centre in the country we have pioneered several aspects of research and education. We have trained brilliant minds who are leading advances worldwide. Our lasting contribution in research, open datasets, tools and technologies have made huge impact. Our successful technology transfers have brought industry and academic closer. We look forward to continuing to push the boundaries and our legacy of innovation”.

A brief overview of LTRC’s trajectory

1999 – India’s pioneering NLP team, Akshar Bharati moves to IIITH
2001-2004 – Machine Translation Pipeline took shape under the Shakti (Analysis-Transfer-Generate-based) MT project and the Shiva (Example-based) MT project
2002 – NLP Association of India was founded with active participation from LTRC
– First International Conference on Natural Language Processing (ICON) was organised; in its 21st year now
– First academic offering of PhD in Computational Linguistics was introduced.
2004 – MTech in Computational Linguistics funded by MeitY introduced
– MS by Research in CL introduced
2005-2008 – IIITH led a consortium of 12 institutes on the Sampark project (Indian language-Indian language Machine Translation)
– Only non-US institute to participate in a Multi-layer Hindi-Urdu Treebank project by the US National Science Foundation
– IREL participated in a multi-institute project called CLIA to create India’s first cross-lingual search engine (Web Khoj); the technology was later transferred to Rediff.com
– Multi-document summarization technology developed; ranked world’s best during 2006-10
– First IIITH startup, SETU, was founded by Prof. Vasudeva Varma and Prasad Pingali, the first PhD student of LTRC
– Speech database and speech recognizer technology transferred to HP Labs, India
– High quality Text to Speech developed and transferred to Nokia, China
– Reading aid for visually impaired – a screen reader in Indian languages developed under a Govt. of India initiative
2005 – MPhil in CL for Humanities students; 1 year PG Diploma in CL to train translators and language resource developers
2006 – Summer School for NLP was launched

2009-2014 – Phase 2 of SAMPARK project got underway

– Indian language Treebank multi-institute project was launched with IIITH as lead

– Basic tools and corpora for 9 Indian languages were launched

– Personalised search engine technology was developed and transferred to Nokia Labs, Finland

– LTRC Akshar Speech startup founded

– Text to Speech and Automatic Speech Recognition in Telugu was developed

2009 – First academic program exposing +2 students to Computational Linguistics launched; a dual degree program of BTech in Computer Science and MS by Research in Computational Linguistics

– Panini Linguistics Olympiad (PLO) was held in collaboration with Microsoft India, exposing high school students to computational linguistics

2014 – IIITH began hosting the PLO training camp for qualification to the International Linguistics Olympiad

2015-2018 – MT/NLP research resulted in 18 SAMPARK systems

– Indian language Treebanks were developed

– IREL launched a technology suite, ‘Fake-o-Meter’, to tackle fake news, identify hate speech and repurpose content in social media

– Speech Technologies developed a recognition algorithm for airborne systems (used by HAL), language identification in practical environments, and a noise reduction algorithm for speech recognition

2019-2024 – Solutions for social needs in the areas of healthcare, education and the judiciary were developed

– 80 Swayam courses were transcribed, translated and subtitled into 8 Indian languages

– Speech-to-Speech Machine Translation Project of Indian languages, HiMangY, led by IIITH, was kicked off; a pipeline comprising automatic speech recognition, machine translation and text-to-speech was created as part of the MeitY-funded National Language Translation Mission, titled Bhashini.

– Project Angel launched to understand toxicity in social media content and to create awareness among girls on cyber bullying, sexism and body shaming

– IndicWiki project launched to create encyclopaedic content in various Indian languages

– Startup SubtlAI founded for Q&A of unstructured data

– Stutter detection platform was created for All India Institute of Speech and Hearing, Mysore

– Speech-to-Speech translation and performance measurement platform was created for broadcast speeches and talks