|
Hi, I am trying to create a language for parsing a text file that has regularly structured text lines interleaved with text strings that may have arbitrary structure. I need, of course, to parse only the regular information and ignore all the rest. Here is a sample file:

...............
INFO: 43
INFO: 56
blah-blah
INFO: 90
some other arbitrary text
...............

The bottom line is that I know how to create a grammar for strings like "INFO: XX". What I don't know is how to tell the M parser to skip the arbitrary text in between. Is it possible?

Thank you,
-Greg
| | khgreg | Hi,
It is surely possible. You may find different approaches, but here is the first one that came to mind:
1) Create a rule to capture the structure of that arbitrary text.
2) Add this rule as one of the possible derivations in an "interleave" rule.
For 1), you may have different scenarios apart from the one you posted here. I assumed that every line that does not start with "INFO:" is arbitrary text. This would be one possible grammar:
    module M {
        language L {
            syntax Main = line+;
            syntax line = infoLine | arbitraryText;
            syntax infoLine = info number;
            token info = "INFO:";
            token number = '0'..'9'+;
            //syntax arbitraryText = "blah-blah" | "some other arbitrary text";
            syntax arbitraryText = any;
            interleave skippable = arbitraryText | "\r" | "\n" | " ";
        }
    }
Note that you can specify a fixed set of strings to be skipped, a more precisely defined set of patterns (numbers, random text, etc.), or a wildcard such as "any".
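As a cross-check of what the grammar above is meant to extract, here is a minimal sketch in Python rather than M. It is not part of the Oslo tooling; the function name `info_numbers` and the sample string are assumptions made for illustration, and it follows the same assumption as the grammar: any line that does not start with "INFO:" is arbitrary text to be skipped.

```python
import re

# Hypothetical helper, for illustration only (not an M or Oslo API).
INFO_LINE = re.compile(r"INFO:\s*([0-9]+)")

def info_numbers(text):
    """Return the numbers from lines shaped like 'INFO: <digits>', skipping everything else."""
    numbers = []
    for line in text.splitlines():
        match = INFO_LINE.fullmatch(line.strip())
        if match is None:
            continue  # arbitrary text (or blank line): skipped
        numbers.append(int(match.group(1)))
    return numbers

sample = "INFO: 43\nINFO: 56\nblah-blah\nINFO: 90\nsome other arbitrary text\n"
print(info_numbers(sample))  # prints [43, 56, 90]
```

Comparing the parser's output for the sample file against a list like this is one quick way to see whether the interleave rule is skipping too much or too little.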
Hope this helps, M.
- Proposed As Answer by Michael Wolbert Tuesday, June 16, 2009 8:58 AM
- Marked As Answer by khgreg Tuesday, June 16, 2009 3:02 PM
| | Miguel.Llopis | Hi,
I'm not at my Oslo dev pc at the moment, so this is a wild guess without any testing. ;-)
I think you should use the interleave statement in a way like this:
    interleave = ^("INFO:" Digits); // ignore anything BUT INFO..
Hope this helps you,
Michael | | Michael Wolbert | Hi Michael,
That would be an elegant approach; the only problem is that, at this moment, the "^" operator cannot be applied to text expressions longer than one character.
There is a way to represent ^("INFO:") character by character, and it could be something like this:
token arbitraryValue = (^'I' | 'I' ^'N' | 'I' 'N' ^'F' | 'I' 'N' 'F' ^'O' | 'I' 'N' 'F' 'O' ^':')+;
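To make the shape of that token easier to see, here is a rough analogue as a Python regular expression (Python, not M; the helper name `not_literal_pattern` is an assumption for illustration). Each alternative is a proper prefix of the literal followed by one character that differs from the literal's next character, which is exactly how the M token above is laid out. Note that this is only an approximation of "anything but INFO:": for example, the text "IINFO:" still matches in full, so treat it as a sketch of the technique rather than an exact negation.

```python
import re

def not_literal_pattern(literal):
    """Build the prefix-by-prefix alternation for `literal`, repeated one or more times."""
    alternatives = []
    for i, ch in enumerate(literal):
        prefix = re.escape(literal[:i])      # the characters that did match so far
        alternatives.append(prefix + "[^" + re.escape(ch) + "]")  # then one that does not
    return "(?:" + "|".join(alternatives) + ")+"

pattern = re.compile(not_literal_pattern("INFO:"))

print(bool(pattern.fullmatch("blah-blah")))  # prints True
print(bool(pattern.fullmatch("INFO:")))      # prints False
```

Generating the alternation from the literal avoids writing it out by hand, which is the readability problem with the hand-written token.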
IMHO, this issue complicates the solution using ^ a little bit. Nice observation, btw. :)
Thanks, M. | | Miguel.Llopis | Hi Miguel,
Ah, I missed that one. Like you said it really does complicate the solution: pretty unreadable.
Is this one on the feature wishlist of mg? :-)
Cheers,
Michael | | Michael Wolbert | Miguel, Thanks a lot! That's exactly what I need. I've been playing with the "any" keyword but didn't realize that I have to place arbitraryText both in the interleave and in the syntax for "line". -Greg
| | khgreg | Note that this has the same result (and demonstrates something about the non-greediness of "syntax" vs. the greediness of "token"):
    module MSW_Test239 {
        language MSW_Test239 {
            syntax Main = (any* "INFO: " res:Number any* => res)+;
            token Number = '0'..'9'+;
        }
    }
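For readers more used to regular expressions, the greedy vs. non-greedy difference can be sketched in Python (again Python, not M). This is only an analogy: how M resolves "syntax" and "token" rules is not regex backtracking, and the sample string is an assumption based on the file in the question.

```python
import re

sample = "INFO: 43\nINFO: 56\nblah-blah\nINFO: 90\nsome other arbitrary text\n"

# Non-greedy wildcard: stops at the nearest "INFO: ", so every entry is found.
lazy = re.findall(r"(?s).*?INFO: ([0-9]+)", sample)

# Greedy wildcard: runs to the end, then backs up to the last "INFO: " only.
greedy = re.findall(r"(?s).*INFO: ([0-9]+)", sample)

print(lazy)    # prints ['43', '56', '90']
print(greedy)  # prints ['90']
```

The digit group plays the role of the greedy Number token, and the lazy wildcard plays the role of the any* in the syntax rule.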
blog: http://diakopter.blogspot.com/ JSMeta: http://jsmeta.org/
- Edited by Matthew Wilson _diakopter_ Saturday, July 11, 2009 4:54 PM grammaro
| | Matthew Wilson _diakopter_ |
|