Shahmukhi to Gurmukhi Transliteration

Punjabi University Patiala, India, Website http://www.universitypunjabi.org

http://www.universitypunjabi.org/sangam/

http://www.advancedcentrepunjabi.org/intro1.asp

http://www.apdip.net/projects/ictrnd/2005/185/index_html/view



	Project Progress

Starting Date of the Project: March 2006

Month of the Year	Progress

First Year - 2006

March-06	A detailed study of the various standards and formats, such as InPage, Unicode, and Nastaliq-based fonts, has been performed. Language text and structure analysis of both the Shahmukhi and Gurmukhi scripts has been performed. System design completed.
April-06	InPage to Unicode Converter: To develop the Shahmukhi corpus, a converter from InPage to Unicode format is necessary, as the majority of the Shahmukhi source text is available only in InPage. A utility for converting InPage text to Unicode format has been developed. Selection of the 25,000 most frequently used Shahmukhi-Gurmukhi terms completed, based on frequency analysis of the Shahmukhi corpus. Design of the lexical entry interface completed and 5,000 Shahmukhi-Gurmukhi entries digitized.
May-06	Phonetic-based mapping table for transliteration from Shahmukhi to Gurmukhi text finalised. Knowledge base of Shahmukhi-Gurmukhi transliteration rules created. 5,000 more Shahmukhi-Gurmukhi entries digitized.
June-06	Shahmukhi Corpus: A Shahmukhi corpus of 5 lakh total words has been created. Tools for Corpus Analysis: Corpus analysis tools have been developed to perform analyses such as word frequency, bi-gram, and tri-gram counts on the corpus. In total, 15,000 Shahmukhi-Gurmukhi entries digitized.
July-06	The size of the Shahmukhi corpus has been increased from 5 lakh to 10 lakh total words. 70% of the work on a primitive rule-based version of the transliteration software has been completed. In total, 20,000 Shahmukhi-Gurmukhi entries digitized.
August-06	Shahmukhi corpus of 11 lakh total words ready. A primitive rule-based version of the transliteration software has been produced. 5,000 more Shahmukhi-Gurmukhi entries digitized.
September-06	The size of the Shahmukhi corpus increased to 12 lakh total words. Testing of the primitive rule-based version completed. Integration of the Shahmukhi-Gurmukhi dictionary with the primitive version in progress.
October-06	Manual data entry for the Shahmukhi corpus has been started due to the lack of available soft data. Unigram, bi-gram, and tri-gram results generated from the existing corpus.
November-06	Integration of the Shahmukhi-Gurmukhi dictionary with the primitive version completed. Testing of this integrated primitive version completed.
December-06	Web-based version of the InPage to Unicode Converter created. Shahmukhi and Gurmukhi corpus analysis in progress. The initial design of the Morphological Analyzer is being developed with linguistic experts.
January-07	Work on the Web-based version of Shahmukhi to Gurmukhi Transliteration started. 70% of the Shahmukhi and Gurmukhi corpus analysis completed.
February-07	Front-end and back-end design of the Web-based version completed. Shahmukhi and Gurmukhi corpus analysis completed. 10,000 Shahmukhi words for the Morphological Analyzer digitised with linguistic experts.
March-07	60% of the integration of the Web-based version completed. In total, 20,000 Shahmukhi words for the Morphological Analyzer digitised with linguistic experts.
April-07	Integration of the Web-based version completed; the first beta version is being prepared. 10,000 more Shahmukhi words for the Morphological Analyzer digitised.
May-07	Beta version of the Shahmukhi to Gurmukhi Transliteration Solution for Networking launched.
June-07	The following enhancements were made to the beta version: Front-end visualization improved. Roman Pad added alongside the Gurmukhi and Shahmukhi Pads. Back-end dictionary support improved. Online Shahmukhi Web Page Transliteration (beta).
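
The June-06 and October-06 entries mention corpus analysis tools that produce word frequency (unigram), bi-gram, and tri-gram results. The project's own tools are not described on this page, so the following is only a minimal Python sketch of how such counts could be computed. The function name ngram_counts, the whitespace tokenisation, and the placeholder tokens (w1, w2, w3) are all assumptions made for illustration; a real run would read the Unicode-converted Shahmukhi corpus instead.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens in the token list."""
    # Each n-gram is stored as a tuple so it can serve as a Counter key.
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Hypothetical toy corpus: placeholder tokens stand in for Shahmukhi words.
sample_text = "w1 w2 w1 w2 w3"

# Assumption: words are separated by whitespace. Real Shahmukhi text may
# need more careful tokenisation (punctuation, joined words, and so on).
tokens = sample_text.split()

unigrams = ngram_counts(tokens, 1)  # word frequency
bigrams = ngram_counts(tokens, 2)   # pairs of adjacent words
trigrams = ngram_counts(tokens, 3)  # triples of adjacent words

# List the n-grams from most to least frequent.
for name, counts in (("unigram", unigrams),
                     ("bi-gram", bigrams),
                     ("tri-gram", trigrams)):
    print(name, counts.most_common())
```

Counts of this kind would support steps such as the April-06 selection of the most frequently used terms, for example by taking unigrams.most_common(25000) over the full corpus.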