Skip lines

Hi,

I am trying to create a language for parsing a text file in which regularly structured lines are interleaved with lines of arbitrary structure. I need, of course, to parse only the regular information and ignore everything else. Here is a sample file:
...............
INFO: 43
INFO: 56
blah-blah
INFO: 90
some other arbitrary text
...............
The bottom line is that I know how to create a grammar for strings like "INFO: XX". What I don't know is how to tell the M parser to skip the arbitrary text in between.
Is it possible?

Thank you,
-Greg


khgreg

Hi,

It is certainly possible. There are different approaches, but here is the first one that came to mind:

1) Create a rule to capture the structure of that arbitrary text.
2) Add this rule as one of the possible derivations in an "interleave" rule.

For 1), you may have scenarios other than the one you posted here. I assumed that every line that does not start with "INFO:" is arbitrary text. This would be one possible grammar:

module M
{
    language L
    {
        syntax Main = line+;
        syntax line = infoLine | arbitraryText;
        syntax infoLine = info number;
        token info = "INFO:";
        token number = '0'..'9'+;
        //syntax arbitraryText = "blah-blah" | "some other arbitrary text";
        syntax arbitraryText = any;
        interleave skippable = arbitraryText | "\r" | "\n" | " ";
    }
}

Note that you can specify either a fixed set of strings to be skipped, a more defined set of patterns (numbers, random text, etc.), or a wildcard such as "any".
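For anyone following along without the Oslo tools, the rule "every line that does not start with INFO: is arbitrary text" can be sketched in Python. This is only an analogue of the idea, not MGrammar; the function name parse_info_lines is made up for illustration.

```python
import re

def parse_info_lines(text):
    """Collect the numbers from lines of the form 'INFO: <digits>'."""
    numbers = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("INFO:"):
            continue  # arbitrary text: skipped, much like the interleave rule
        value = line[len("INFO:"):].strip()
        # Accept only plain ASCII digits, mirroring '0'..'9'+ in the grammar.
        if re.fullmatch(r"[0-9]+", value):
            numbers.append(int(value))
    return numbers
```

Run on the sample file from the question, this sketch keeps the three INFO values and drops the two arbitrary lines.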

Hope this helps,
M.

  • Proposed As Answer by Michael Wolbert Tuesday, June 16, 2009 8:58 AM
  • Marked As Answer by khgreg Tuesday, June 16, 2009 3:02 PM
Miguel.Llopis
Hi,

I'm not at my Oslo dev pc at the moment, so this is a wild guess without any testing. ;-)

I think you should use the interleave statement in a way like this: interleave = ^("INFO:" Digits); // ignore anything BUT INFO..

Hope this helps you,
Michael
Michael Wolbert

Hi Michael,

That would be an elegant approach. The only problem is that, at the moment, the "^" operator cannot be applied to text expressions longer than one character.

There is a way to represent ^("INFO:") character by character. It could look something like this:

token arbitraryValue = (^'I' | 'I' ^'N' | 'I' 'N' ^'F' | 'I' 'N' 'F' ^'O' | 'I' 'N' 'F' 'O' ^':')+;

IMHO, this issue complicates the solution using ^ a little. Nice observation, btw. :)
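To get a feel for what a character-by-character negation of this shape accepts, here is a minimal sketch as a Python regular expression. It is only an analogue, not MGrammar: the names ARBITRARY and is_arbitrary are made up for illustration, and it assumes the M token behaves like an ordinary regular expression.

```python
import re

# Hypothetical Python analogue of the arbitraryValue token: each alternative
# consumes a short chunk that cannot be the complete text "INFO:".
ARBITRARY = re.compile(r"(?:[^I]|I[^N]|IN[^F]|INF[^O]|INFO[^:])+")

def is_arbitrary(text):
    """Return True when the whole string is matched by the pattern."""
    return ARBITRARY.fullmatch(text) is not None
```

With this sketch, is_arbitrary("blah-blah") is true and is_arbitrary("INFO: 43") is false. It also shows a wrinkle: in the Python version, "IINFO:" is accepted, because the I[^N] branch consumes "II" and each remaining character is simply "not I". So the pattern only approximates "anything but INFO:".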

Thanks,
M.
Miguel.Llopis

Hi Miguel,

Ah, I missed that one. Like you said it really does complicate the solution: pretty unreadable.

Is this one on the feature wishlist of mg? :-)

Cheers,
Michael
Michael Wolbert
Miguel,
Thanks a lot! That's exactly what I need. I had been playing with the "any" keyword but didn't realize that I have to place arbitraryText in both the interleave rule and the syntax rule for "line".

-Greg
khgreg
Note that this has the same result (and demonstrates something about the non-greediness of "syntax" vs. the greediness of "token"):

module MSW_Test239 {
    language MSW_Test239 {
        syntax Main = (any* "INFO: " res:Number any* => res)+;
        token Number = '0'..'9'+;
    }
}
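As a rough point of comparison, the "scan for the marker and project out the number" idea can be sketched in Python. This is not MGrammar and says nothing about how mg resolves greediness; the function name extract_info_numbers is made up for illustration.

```python
import re

def extract_info_numbers(text):
    """Return every number that directly follows an 'INFO: ' marker."""
    # findall scans left to right and ignores whatever lies between matches,
    # which plays the role of the surrounding any* in the grammar.
    return [int(n) for n in re.findall(r"INFO: ([0-9]+)", text)]
```

Unlike a line-by-line check, this version does not care where on a line the marker appears.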
'blog: http://diakopter.blogspot.com/ JSMeta: http://jsmeta.org/
Matthew Wilson _diakopter_
