Assignment - Haskell Program for Regular Expression Matching
Your assignment is to modify the slowgrep.hs Haskell program presented in class and in the online notes, according to the instructions below. You may carry out this assignment in groups of at most 3 students; the number of features you must complete depends on group size.
Glushkov Algorithm
You are to modify the matching logic of slowgrep.hs to use the Glushkov NFA method of regular expression search rather than the simpleton search method presented initially. Your modified program should be called hgrep.hs and implement precisely the same command line interface as slowgrep.hs.
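As a rough illustration of the Glushkov construction, the sketch below numbers each character occurrence in the expression, computes the nullable, first, last, and follow information, and simulates the resulting NFA on a whole string. The RE type shown is only an assumption about what slowgrep.hs defines, and names such as PRE, label, and matchGlushkov are hypothetical.

```haskell
import Data.List (nub)

-- Assumed RE type; the real definition in slowgrep.hs may differ.
data RE = Epsilon | Ch Char | Seq RE RE | Alt RE RE | Star RE

-- The same tree with each character occurrence tagged by a unique position.
data PRE = PEps | PCh Int Char | PSeq PRE PRE | PAlt PRE PRE | PStar PRE

-- Number the character occurrences left to right, starting from n.
label :: Int -> RE -> (PRE, Int)
label n Epsilon = (PEps, n)
label n (Ch c) = (PCh n c, n + 1)
label n (Seq a b) = let (a', n1) = label n a; (b', n2) = label n1 b in (PSeq a' b', n2)
label n (Alt a b) = let (a', n1) = label n a; (b', n2) = label n1 b in (PAlt a' b', n2)
label n (Star a) = let (a', n1) = label n a in (PStar a', n1)

-- Can the expression match the empty string?
nullable :: PRE -> Bool
nullable PEps = True
nullable (PCh _ _) = False
nullable (PSeq a b) = nullable a && nullable b
nullable (PAlt a b) = nullable a || nullable b
nullable (PStar _) = True

-- Positions that can begin a match (firstP) or end one (lastP).
firstP, lastP :: PRE -> [Int]
firstP PEps = []
firstP (PCh i _) = [i]
firstP (PSeq a b) = firstP a ++ (if nullable a then firstP b else [])
firstP (PAlt a b) = firstP a ++ firstP b
firstP (PStar a) = firstP a
lastP PEps = []
lastP (PCh i _) = [i]
lastP (PSeq a b) = lastP b ++ (if nullable b then lastP a else [])
lastP (PAlt a b) = lastP a ++ lastP b
lastP (PStar a) = lastP a

-- Pairs (p, q) such that position q may directly follow position p.
follow :: PRE -> [(Int, Int)]
follow (PSeq a b) = follow a ++ follow b ++ [(p, q) | p <- lastP a, q <- firstP b]
follow (PAlt a b) = follow a ++ follow b
follow (PStar a) = follow a ++ [(p, q) | p <- lastP a, q <- firstP a]
follow _ = []

-- The character expected at each position.
chars :: PRE -> [(Int, Char)]
chars (PCh i c) = [(i, c)]
chars (PSeq a b) = chars a ++ chars b
chars (PAlt a b) = chars a ++ chars b
chars (PStar a) = chars a
chars PEps = []

-- Whole-string match by simulating the NFA; state 0 is the start state.
matchGlushkov :: RE -> String -> Bool
matchGlushkov re input = any accepting (foldl step [0] input)
  where
    (pre, _) = label 1 re
    sym = chars pre
    fol = follow pre
    next 0 = firstP pre
    next p = [q | (p', q) <- fol, p' == p]
    step states c = nub [q | p <- states, q <- next p, lookup q sym == Just c]
    accepting 0 = nullable pre
    accepting p = p `elem` lastP pre
```

With this sketch, the expression (a|b)*abb, written as Seq (Star (Alt (Ch 'a') (Ch 'b'))) (Seq (Ch 'a') (Seq (Ch 'b') (Ch 'b'))), accepts "abb" and rejects "ab". Lists are used for the state sets only to keep the example short; a set or bit-vector representation would be a reasonable alternative.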
Search vs. Match
You are to modify your program to search within lines rather than just find lines that match the regular expression. Thus a line should be selected if there is a match to the regular expression anywhere on the line.
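One simple way to see the difference between matching and searching is to lift a whole-string matcher into a search over every substring of the line. The sketch below does exactly that; substrings and searchLine are hypothetical names, and fullMatch stands in for whatever whole-line matching function your program provides.

```haskell
import Data.List (inits, tails)

-- Every contiguous substring of a line, including the empty string.
substrings :: String -> [String]
substrings line = [sub | suffix <- tails line, sub <- inits suffix]

-- Lift a whole-string matcher into a search anywhere within the line.
searchLine :: (String -> Bool) -> String -> Bool
searchLine fullMatch line = any fullMatch (substrings line)
```

This version tries a quadratic number of substrings, so it is useful mainly as a specification to test against. With an NFA simulation, a more efficient option is likely to keep the start state active at every input position and to select the line as soon as any accepting state is reached.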
Additional Regular Expression Features
For each of the components identified below, you are to perform the following tasks:
Extend the definition of the RE data type (as necessary) to incorporate the new feature.
Extend the regular expression parser to correctly parse instances of the new feature into the correct AST structures.
Extend the matching logic to correctly deal with the feature.
The features you must complete depend on your group size:
Students working alone or in groups of 2 complete Features 1 through 4.
Students working in groups of 3 complete Features 1 through 5.
Feature 1: Escaped Metacharacters
The initial definition of Haskell regular expressions did not permit matches to metacharacters such as *, (, ), or |. Instead these characters were reserved for their role as metacharacters to indicate particular types of regular expression formation.
In this part, you are to extend the RE system to use the backslash (\) as an escape metacharacter. When the backslash is used, the following character loses its meaning as a metacharacter and is instead interpreted as a literal character. Thus the regular expression "ab\*d" matches the 4-character string "ab*d". Backslashes themselves can be escaped; the regular expression \\\\\\ matches the 3-character string \\\.
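One possible way to handle escapes is to classify characters before parsing, so that the parser never confuses an escaped metacharacter with a real one. The following is a minimal sketch under that assumption; the Token type, metachars, and tokenize are hypothetical names, and rejecting a trailing backslash is a choice made here, not something the assignment specifies.

```haskell
-- Hypothetical token type: a metacharacter or a literal character.
data Token = Meta Char | Lit Char
  deriving (Eq, Show)

-- Metacharacters of the initial RE syntax (later features would add more).
metachars :: String
metachars = "*()|"

-- Split a pattern into tokens, turning each escaped character into a literal.
-- A pattern ending in a lone backslash is rejected (an assumption).
tokenize :: String -> Maybe [Token]
tokenize [] = Just []
tokenize ['\\'] = Nothing
tokenize ('\\' : c : rest) = (Lit c :) <$> tokenize rest
tokenize (c : rest)
  | c `elem` metachars = (Meta c :) <$> tokenize rest
  | otherwise = (Lit c :) <$> tokenize rest
```

For the pattern ab\*d (written "ab\\*d" as a Haskell string literal), tokenize yields four Lit tokens, while the unescaped pattern a* yields Lit 'a' followed by Meta '*'.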
Feature 2: Any Metacharacter
In this part, you are to extend the RE system to use the "." (period or dot) character as a metacharacter that means "any character". For example, the regular expression a.e matches any 3-character string that begins with a and ends with e, with any character in between.
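In an NFA-based matcher, the dot can be treated as a position that accepts every input character, while an ordinary position accepts only its own character. The sketch below illustrates the idea on a fixed sequence of symbols; Sym, accepts, and matchSeq are hypothetical names that are not taken from slowgrep.hs.

```haskell
-- What a single position is allowed to consume (hypothetical type).
data Sym = Lit Char | AnyChar
  deriving (Eq, Show)

-- Does this symbol accept the given input character?
accepts :: Sym -> Char -> Bool
accepts (Lit c) x = c == x
accepts AnyChar _ = True

-- Match a fixed sequence of symbols against a whole string.
matchSeq :: [Sym] -> String -> Bool
matchSeq syms s = length syms == length s && and (zipWith accepts syms s)
```

Here matchSeq [Lit 'a', AnyChar, Lit 'e'] accepts "axe" and "a.e" but rejects "ae" and "axf".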
Feature 3: Option Metacharacter
In this part, you are to extend the RE system to use "?" (question mark) as a metacharacter that means zero or one occurrences of the previous item. For example, the regular expression "ab?c" matches both the strings "abc" and "ac". The option metacharacter should have the same precedence and associativity as the "*" metacharacter.
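To make the intended semantics concrete, the sketch below adds an Option constructor to an assumed RE type and checks it with a small continuation-passing matcher. This matcher is not the Glushkov method and is not the code from slowgrep.hs; it serves only as a compact reference for what "zero or one occurrences" should accept. The names matchK and matches are hypothetical.

```haskell
-- Assumed RE type, extended with a hypothetical Option constructor.
data RE = Epsilon | Ch Char | Seq RE RE | Alt RE RE | Star RE | Option RE

-- matchK r s k: can r consume a prefix of s such that k accepts the rest?
matchK :: RE -> String -> (String -> Bool) -> Bool
matchK Epsilon s k = k s
matchK (Ch c) (x : xs) k = c == x && k xs
matchK (Ch _) [] _ = False
matchK (Seq a b) s k = matchK a s (\rest -> matchK b rest k)
matchK (Alt a b) s k = matchK a s k || matchK b s k
-- The rest /= s check forces progress, so a nullable body cannot loop forever.
matchK (Star a) s k = k s || matchK a s (\rest -> rest /= s && matchK (Star a) rest k)
-- Zero occurrences (skip the item) or exactly one occurrence.
matchK (Option a) s k = k s || matchK a s k

-- Whole-string match: the continuation requires that no input remains.
matches :: RE -> String -> Bool
matches r s = matchK r s null
```

With ab?c written as Seq (Ch 'a') (Seq (Option (Ch 'b')) (Ch 'c')), matches accepts "abc" and "ac" and rejects "abbc". In a Glushkov-style implementation, the corresponding change would presumably be that an optional item is always nullable while keeping the first and last positions of its body.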
Feature 4: Plus Metacharacter
In this part, you are to extend the RE system to use "+" (plus sign) as a metacharacter that means one or more occurrences of the previous item. For example, the regular expression "ab+c" matches the strings "abc", "abbc", "abbbc" and so on. The plus metacharacter should have the same precedence and associativity as the "*" metacharacter.
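One way to give the postfix metacharacters the same precedence and associativity is to handle them in a single loop that runs after an item has been parsed. The sketch below assumes an RE type like the one shown and rewrites the new operators in terms of existing constructors; postfix is a hypothetical helper, and adding dedicated constructors instead of rewriting would be equally reasonable.

```haskell
-- Assumed RE type; the real definition in slowgrep.hs may differ.
data RE = Epsilon | Ch Char | Seq RE RE | Alt RE RE | Star RE
  deriving (Eq, Show)

-- Apply any run of postfix operators to an already-parsed item.
-- Returns the resulting RE together with the unconsumed input.
postfix :: RE -> String -> (RE, String)
postfix r ('*' : rest) = postfix (Star r) rest
postfix r ('?' : rest) = postfix (Alt r Epsilon) rest   -- zero or one
postfix r ('+' : rest) = postfix (Seq r (Star r)) rest  -- one or more
postfix r rest = (r, rest)
```

For example, postfix (Ch 'b') "+c" returns (Seq (Ch 'b') (Star (Ch 'b')), "c"), leaving the c for the rest of the parser to handle.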
Feature 5: Character Classes
In this part, you are to extend the RE system to include character classes with the following syntax:
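<class> ::= ("[" | "[^") <class-items> "]"
<class-items> ::= { <class-char> |
                    <class-char> "-" <class-char>
                  }
<class-char> ::= <ordinary-char> | "\" <any-char>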
A character class matches any single character in the class. A range is a set of consecutive characters according to their Unicode codepoint values. If the opening delimiter is "[^", the class is negated, that is, it consists of all characters not explicitly listed.
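The matching rule described above can be sketched as a membership test over inclusive ranges, with a flag for negation. The CharClass representation and the inClass function below are hypothetical; a single listed character c is stored as the range (c, c), and the comparison of Char values follows Unicode codepoint order.

```haskell
-- Hypothetical representation: a negation flag plus inclusive ranges.
-- A single listed character c is stored as the range (c, c).
data CharClass = CharClass Bool [(Char, Char)]
  deriving (Eq, Show)

-- Does the class accept this character?
-- For a negated class the result is the opposite of "listed".
inClass :: CharClass -> Char -> Bool
inClass (CharClass negated ranges) x = negated /= any inRange ranges
  where
    inRange (lo, hi) = lo <= x && x <= hi
```

Under this representation, a class such as [a-z_] would be CharClass False [('a', 'z'), ('_', '_')], which accepts 'q' and '_' but not 'Q', and [^0-9] would be CharClass True [('0', '9')].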