Punjabi Grammar Checker

Punjabi University, Patiala, India. Website: http://www.universitypunjabi.org

http://www.universitypunjabi.org/sangam/

http://www.advancedcentrepunjabi.org/intro1.asp




	Project Background

	Objectives

	Applications

	Project Time-line

	Team Members

The Project

The project aims to develop grammar checking software for Punjabi that detects various grammatical errors and, where possible, suggests corrections.
A grammar checker for a language is a system that detects grammatical errors in a given text, based on the grammar of that language, and reports them to the user along with a list of suggestions for correcting them.
The input text will first be given to a preprocessor, which will break it into sentences and words. The tokenized text will then be passed to a morphological analyzer, which will provide grammatical information for each word. Next, a POS tagger will perform part-of-speech tagging, and the POS-tagged text will be passed to a phrase chunker, which will mark phrase and clause boundaries. In the last stage, syntax and agreement checks will be performed using the POS tag information, first at the phrase level and then at the clause level. Any discrepancy found will be reported to the user along with suggested corrections and detailed error information.
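The stages described above can be sketched as a small Python pipeline. This is only an illustrative toy, not the project's implementation: the lexicon, the romanized word forms, the tag names, and all function names (`preprocess`, `analyze_and_tag`, `chunk`, `check_agreement`, `check_text`) are assumptions made for the example, and a table lookup stands in for real morphological analysis and POS tagging.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon: romanized word -> (POS tag, gender, number).
# A real morphological analyzer would derive these features, not look them up.
LEXICON = {
    "munda": ("NOUN", "masc", "sg"),  # boy
    "kuri": ("NOUN", "fem", "sg"),    # girl
    "janda": ("VERB", "masc", "sg"),  # goes (masculine form)
    "jandi": ("VERB", "fem", "sg"),   # goes (feminine form)
    "hai": ("AUX", None, "sg"),       # is
}


@dataclass
class Token:
    text: str
    pos: str
    gender: Optional[str]
    number: Optional[str]


def preprocess(text):
    """Stage 1: split text into sentences, then each sentence into words."""
    # Split on common terminators, including the danda (U+0964).
    sentences = [s.strip() for s in re.split(r"[.?!\u0964]", text) if s.strip()]
    return [s.split() for s in sentences]


def analyze_and_tag(words):
    """Stages 2-3: attach grammatical features and a POS tag to each word."""
    tokens = []
    for word in words:
        pos, gender, number = LEXICON.get(word.lower(), ("UNK", None, None))
        tokens.append(Token(word, pos, gender, number))
    return tokens


def chunk(tokens):
    """Stage 4: group tokens into a noun phrase and a verb phrase (very crude)."""
    chunks = {"NP": [], "VP": []}
    for token in tokens:
        if token.pos == "NOUN":
            chunks["NP"].append(token)
        elif token.pos in ("VERB", "AUX"):
            chunks["VP"].append(token)
    return chunks


def check_agreement(chunks):
    """Stage 5: report verbs whose gender disagrees with the first noun."""
    errors = []
    if not chunks["NP"]:
        return errors
    subject = chunks["NP"][0]  # assumption: the first noun is the subject
    for verb in chunks["VP"]:
        if verb.gender and subject.gender and verb.gender != subject.gender:
            # Suggest lexicon verbs whose features match the subject.
            suggestions = [
                word for word, (pos, gender, number) in LEXICON.items()
                if pos == "VERB" and gender == subject.gender
                and number == subject.number
            ]
            errors.append({
                "word": verb.text,
                "message": f"verb '{verb.text}' is {verb.gender} but "
                           f"subject '{subject.text}' is {subject.gender}",
                "suggestions": suggestions,
            })
    return errors


def check_text(text):
    """Run all stages over the text and collect the error reports."""
    report = []
    for words in preprocess(text):
        report.extend(check_agreement(chunk(analyze_and_tag(words))))
    return report


if __name__ == "__main__":
    for error in check_text("munda janda hai. kuri janda hai."):
        print(error["message"], "| suggestions:", error["suggestions"])
```

In this toy input the first sentence passes, while the second is flagged because a masculine verb form follows a feminine noun. A real checker would also need number, person, and case agreement, and a far richer treatment of phrases and clauses.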

Project Background

Grammar checking is one of the most widely used tools within language engineering. For the past few years, commonly used word processors have provided grammar checkers for most foreign languages. However, no such system is available for any of the Indian languages. The use of computers is gaining popularity in day-to-day tasks such as word processing, writing reports, and printing official documents, and all these tasks demand grammatically correct text. A grammar checking system is therefore an obvious requirement. Recently, Microsoft released a Hindi version of its popular office suite, Microsoft Office. It is a commercial product, and details of the grammar checker in it (if any) will not be made public. Therefore, to the best of our knowledge, this work will be the first of its kind for Indian languages in general and Punjabi in particular. Indian languages have many things in common, so the present work could well be extended to other Indian languages.

Objectives

The objectives of this research work include the following:
To adapt and enhance the existing morphological analysis and generation, part-of-speech tagging, and phrase chunking systems.
To develop tools for parsing and error detection for compound and complex sentences.
To assemble these tools into a complete grammar checking system for the Punjabi language that will detect as many errors as possible and will suggest corrections wherever possible.

Applications

This system could be used with other information processing systems where the input needs to be grammatically corrected before processing.
Components of this system, such as the morphological analyzer, morphological generator, part-of-speech tagger, and phrase chunker, could be used at various stages of machine translation systems after slight modifications.
This system could be used for checking essays, formal reports, letters, and similar documents written in Punjabi.
Second language learners could use this system as a language aid to learn how grammatical categories function in Punjabi sentences, along with the grammatical structure of Punjabi.
This system as a whole could also be used as a post-editor for other systems, such as machine translation and optical character recognition systems for Punjabi.
The technology of this system could be used to develop grammar checking systems for other languages that share grammatical features with Punjabi.

Project Time-line

After 6 months

Creating a corpus of ten thousand sentences for training and testing
Adapting the existing morphological analyzer and part-of-speech tagger (developed by other NLP groups)

After 12 months

Enhancing the phrasal or multiword expression database
Enhancing the performance of the phrase chunker or shallow parser
Digitizing Punjabi grammar rules
Developing a parser for Punjabi

After 18 months

System integration
Continuing the development of the parser for Punjabi
Digitizing new error rules to improve accuracy for simple sentences, including detection of some style errors

After 21 months

Beta version of the system
Extending the error coverage to compound and complex sentences
Final testing and evaluation

Team Members

The project would be implemented by the Ministry of Communication & Information Technology (MC&IT), New Delhi.
The project members at Punjabi University are:

Chief Investigator

Dr. Gurpreet Singh Lehal

Project Linguist

Dr. Harvinder Pal Kaur

System Analyst

Mr. C.P. Kamboj (Chhinder Paul)

Lexical Entry Operators

Miss Mandeep Kaur
Mr. Sandeep Malhotra