Is there any Antlr ---> MGrammar translator around?

...in any shape or form?
  • Changed Type: Kraig Brockschmidt (MSFT, Moderator), Thursday, July 09, 2009 3:40 PM. The thread itself is progressing into a discussion, given that the initial answer to the question is a basic "no".
  • Edited by Ceyhun Ciper, Monday, July 06, 2009 7:15 PM
Ceyhun Ciper
Not that I'm aware of but that sounds interesting. I'd be interested in seeing your ANTLR grammar to see how comparable the two are.
justncase80
bengillis has already done it manually @ http://bengillis.wordpress.com/2009/04/01/converting-an-antlr-grammar-to-mgrammar/.

But the real point of interest here is to get to other souped-up grammars via an automated Antlr--->MGrammar translator and thus leverage the ones found @ http://antlr.org/grammar/list.

Ceyhun Ciper
Sure, it would be straightforward to create such a tool. Either ANTLR or M could host the translator. Which would you prefer? M, I assume?
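
For what it's worth, here is a minimal sketch of what the easy part of such a translator might look like, written in Python rather than hosted in ANTLR or M. It assumes only the simplest ANTLR v3 rule shape (`name : alternatives ;`, with no actions, options, predicates, or rewrite rules, and no `;` inside literals). The function name `antlr_to_mgrammar`, and the convention of emitting lexer rules as `token` and parser rules as `syntax`, are my own assumptions, not an existing tool.

```python
import re

# Assumption: rules look like "name : body ;" and the body contains
# no embedded actions and no ';' characters inside literals.
RULE = re.compile(r"(\w+)\s*:\s*(.*?)\s*;", re.S)


def antlr_to_mgrammar(antlr_source: str, language: str = "Translated") -> str:
    """Translate trivial ANTLR rules into an MGrammar-style language block."""
    lines = []
    for name, body in RULE.findall(antlr_source):
        # ANTLR lexer rules start with an upper-case letter; parser rules do not.
        keyword = "token" if name[0].isupper() else "syntax"
        # ANTLR quotes literals with '...'; MGrammar uses "...".
        body = re.sub(r"'([^']*)'", r'"\1"', body)
        # Collapse line breaks and runs of spaces inside the rule body.
        body = " ".join(body.split())
        lines.append(f"    {keyword} {name} = {body};")
    return "language " + language + " {\n" + "\n".join(lines) + "\n}"


if __name__ == "__main__":
    sample = "expr : term ('+' term)* ;\nterm : INT ;\nINT : '0'..'9'+ ;"
    print(antlr_to_mgrammar(sample, "Expr"))
```

Everything beyond this shape (embedded actions in particular) is where the real work would be.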
Matthew Wilson _diakopter_
M, of course.

But the real challenge here is the inline AST construction in Antlr via code blocks (Java, C#, C...) à la Yacc.

Example (Java):

expr returns [int value] : ID
          {
               // look up the identifier's text and return the integer stored for it
               Integer v = (Integer)myDictionary.get($ID.text);
               $value = v.intValue();
          }
     ;

Maybe one could store those tree construction segments as expression trees (provided we use C# in Antlr) and interpret them on second pass?
Ceyhun Ciper
Okay. :) I just read http://jnb.ociweb.com/jnb/jnbJun2008.html and I got a good overview of the system.

I bet it's similar to OMetaSharp's implementation of (and OMeta's creator's original JavaScript implementation of) OMeta, wherein they just include the tree/projection code in the generated code output.

OMeta calls them "host expressions", and they can also be used in the rhs of the rule itself as predicates (as seems in antlr also). Sorry, I'm just a lot more familiar with OMeta than antlr.

My first thought on how to implement this on top of M is to create a postprocessor step (in a second pass as you mentioned), with the nodes marked in the MGraph as to how they should be postprocessed (which original code block to use), and run a program created from the user ast code that processes those nodes. They could be C# lambda expressions, if you wanted the optional implied "return" statement (instead of assigning to $value or something, though I bet that's b/c there are other side-effects it can effect).
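
To make the second-pass idea concrete, here is a rough sketch in Python (standing in for C# lambdas). It is not how M's runtime works; the `Node` class, its `action` marker, and the `postprocess` function are all hypothetical names for illustration. The parse is assumed to leave a tree whose nodes carry the name of the code block to run, and the second pass walks the tree bottom-up and runs them.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Node:
    """A parse-graph node; `action` names the code block to run in pass two."""
    label: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    action: Optional[str] = None  # hypothetical marker left by the parser
    value: Any = None             # filled in by the second pass


def postprocess(node: Node, actions: Dict[str, Callable[[Node], Any]]) -> Any:
    """Second pass: evaluate the children first, then this node's action, if any."""
    for child in node.children:
        postprocess(child, actions)
    if node.action is not None:
        node.value = actions[node.action](node)
    return node.value


# The `expr : ID { ... }` action quoted earlier in the thread, restated as a
# lambda with an implied "return" instead of an assignment to $value.
my_dictionary = {"x": 42}
actions = {"expr_lookup": lambda n: my_dictionary[n.children[0].text]}

tree = Node("expr", children=[Node("ID", text="x")], action="expr_lookup")
result = postprocess(tree, actions)
```

The lambda form only covers actions that compute a value; blocks that rely on side effects would need a richer contract than this.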

My understanding is that currently, when M's parser runtime detects a final/commit point, it rolls up the projections staged to perform for that branch.

It would be a nice feature if that were able to be hooked with arbitrary user C# code at parse time though.

Presumably the M module {} (or just the language {} member?) would need some "using " statements so the code generator (or just template processor) could resolve symbols properly.

I recommend posting such "host expressions in projections" as a suggestion on the M feedback site. I'll rate it 5 if/when you do...

Also, another (related) nice feature that would help make M grammars as expressive as antlr and/or ometa grammars is (as I mentioned above) the ability to use the results of the ast generation (or even inline predicates) to determine how the parse proceeds. But I'm biased, since afaik this is much more straightforward in a top-down approach. :) But again, I recommend posting "inline host expression predicates" as a suggestion on Connect.... I'll do it myself if you like... :)
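
As a toy illustration of what I mean by predicates steering the parse (top-down, in Python, and nothing to do with how M actually parses): in the sketch below, the result of an earlier statement decides which alternative a later statement takes. The two-token statement shape and the names `parse_program` and `known_types` are made up for the example.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def parse_program(statements: Iterable[Sequence[str]],
                  known_types: Set[str]) -> List[Tuple[str, ...]]:
    """Parse two-token statements, letting earlier results steer later ones.

    `typedef T` registers T as a type. `A B` is a declaration when A is a
    known type at that point in the input, and an application otherwise.
    """
    types = set(known_types)  # copy so the caller's set is not modified
    results: List[Tuple[str, ...]] = []
    for tokens in statements:
        if len(tokens) != 2:
            raise SyntaxError(f"expected two tokens, got {list(tokens)!r}")
        head, tail = tokens
        if head == "typedef":
            types.add(tail)  # a result of the parse so far...
            results.append(("typedef", tail))
        elif head in types:  # ...consulted as a predicate later on
            results.append(("declare", head, tail))
        else:
            results.append(("apply", head, tail))
    return results


program = [["f", "x"], ["typedef", "f"], ["f", "x"]]
parsed = parse_program(program, {"int"})
```

The same text `f x` parses two different ways depending on what came before it, which is exactly the kind of thing the syntax alone cannot express.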
Matthew Wilson _diakopter_
Thank you very much for the deep interpretation that you bring on the subject.

I agree with you that "host expressions" would give us, lay developers, an order of magnitude more power than presently available.

That's also why they should not be provided, or better yet, be avoided; because they are dangerous in the hands of developers like me.

Even as it is, MGrammar is already very powerful; maybe even too much so...

For me, MGrammar today is passably declarative (except for parameters); if it is kept declarative, it can take advantage of the new concurrency opportunities, otherwise it will confuse ordinary DSL developers with obscure ambiguities and errors. (By concurrency I don't mean parallel compilation but stuff like editor tasks such as statement completion, intellisense, etc.)

Without any irreverence to your work, I would like to state that Microsoft should keep it simple for would-be DSL developers (like yours truly) to provide a foolproof way of developing simple language processors; remember, not every developer has your expertise in debugging one of those. So the more restrictive it is, the more discipline it imposes on thinking about the grammar up-front.

As the first task in designing a DSL is getting the grammar right, if a lay developer generates a quick & dirty grammar of about 500 lines, but all intractable, what will happen to the rest of the project? After all, grammar development is not the final goal.

Curious to hear your thoughts,
Ceyhun

ps: Sounds like I did the 500 lines, doesn't it?
pps: In Intellipad we see the AST, in Antlr (I mean AntlrWorks) we don't.
Ceyhun Ciper
Sure, but hopefully the q&d 500-line grammar can be prevented with blatant/obtrusive warnings about the complexity of the generated grammar. There's another (doable, imho!) feature request for Intellipad: "show me this grammar's Big-O and the Big-Omega expression [for this particular input], and whether there's a Big-Theta expression". A simpler version would be "Show ambiguities" except, "Highlight pathological productions" (pattern-patterns that could *probably* be rewritten far more efficiently if the user had in mind what we think they had in mind).
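
To give one concrete (and deliberately tiny) idea of what "Highlight pathological productions" could mean, here is a sketch in Python. It flags only one pattern, a single repeated symbol wrapped in another repetition such as `(A*)*`, which can be matched in many different ways. Treating that as "pathological" is my own assumption, and the regex works on rule bodies as plain text rather than on a real grammar model.

```python
import re
from typing import Dict, List

# Assumption: a group holding one symbol with *, + or ? that is itself
# repeated with * or +, e.g. (A*)* or ( Item+ )+, is worth a warning.
NESTED_REPEAT = re.compile(r"\(\s*\w+\s*[*+?]\s*\)\s*[*+]")


def pathological_productions(rules: Dict[str, str]) -> List[str]:
    """Return the names of rules whose body contains a nested repetition."""
    return [name for name, body in rules.items() if NESTED_REPEAT.search(body)]


rules = {
    "List": "(Item B)*",
    "Blob": "(A*)*",
    "Run": "( Item+ )+",
    "Pairs": "(A B+)+",
}
flagged = pathological_productions(rules)
```

A real version would obviously need to look at the grammar as a graph, not as strings, but even a check this crude could back an obtrusive warning.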

Anyway, enough dreaming. :)

Here's how I break down Intellipad DSL view users in my mind:
  1. Users writing a grammar for an existing "input format". In other words, samples of the language already exist, and the format is already defined in some form, written or otherwise, degenerate or otherwise, ambiguous or otherwise. This covers users who would currently use a handwritten parser in a language designed for such things (e.g. Perl), the BizTalk mapper, the BizTalk Flat-file Schema Wizard (other than Perl, my favorite as of 2006 R2), or some other parser generator. The use cases here are for *data* input, where the purpose of the parsing is to *store* the data (and perhaps also analyze/correlate/warehouse it). These users need an easy way to store their existing input instances as test cases for the grammar they are writing, and a way to persist the MGraph output of those parses as "the intended/successful output" of the grammar, for a given revision/checkpoint, so that they can have a baseline against which to test later revisions to the grammar.
  2. Users writing a grammar for a data "input format" they are designing. This is the same as #1, except the user probably needs more feedback about possible ambiguities in their grammar... and perhaps analysis as to whether (and which) information in the input can ever be "lost" in the parse, such as discarded tokens, in case the user did not intend such a thing. Whether the user is designing "yet another .ini format" or complex markup, the fact that it's merely a data input (or serialization/marshaling) format is determined by the language not containing internal (named or relative) references and the language not containing any form of evaluation or computation expression. That is, it's just *data*, meant to be stored.
  3. Users writing a grammar for a language (DSL or general purpose programming language) they are *designing* with expressions and/or control flow. Samples of the input language may or may not already exist, and the purpose of the language may or may not already be defined. The language's instances may need symbol resolution or type checking on top of recognition in order to be considered valid, but the difference is that instances of the language denote computations/evaluations, or in other words, express logic. Use cases here are the typical use cases demoed for DSLs, where the grammar developer wants to provide the user a constrained subset of a general purpose programming language (such as the Excel formula box as a subset of VBA, or embedding a mini IronPython script editor in an ERP configuration form, except constraining the input to particular classes and syntactic forms). Yes, the tree/graph output is still going to be "just data" (especially if it's ending up in a Repository), but the ideal design-time parsing/grammar UI features are quite different from a user describing a data serialization format, since (while designing the language) the grammar specifier also has to think about how particular input instances will be interpreted semantically as well as just parsing to a graph. An interesting example of this category is a X0,000-line .mg file a fellow customer showed me at Lang.NET this year, a complete grammar for a (nearly-)general purpose programming language his employer maintains.
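
The baseline idea from category 1 can be sketched roughly as follows, in Python and independent of Intellipad or the MGraph format. The function name `check_against_baselines`, the JSON baseline file, and the assumption that the parse output is JSON-serializable are all mine; `parse` stands for whatever function runs the current revision of the grammar.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, List


def check_against_baselines(parse: Callable[[str], Any],
                            samples: Dict[str, str],
                            baseline_file: Path) -> List[str]:
    """Return the names of samples whose parse output differs from the baseline.

    On the first run (no baseline file yet) the current outputs are recorded
    as "the intended output" and nothing is reported.
    """
    outputs = {name: parse(text) for name, text in samples.items()}
    # Round-trip through JSON so tuples, lists, etc. compare consistently.
    current = json.loads(json.dumps(outputs))
    if not baseline_file.exists():
        baseline_file.write_text(json.dumps(current, indent=2, sort_keys=True))
        return []
    baseline = json.loads(baseline_file.read_text())
    return sorted(n for n in current if baseline.get(n) != current[n])
```

A later grammar revision is then checked by calling the same function with the new `parse`; any sample names it returns are the regressions (or intended changes) to review.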

My point is that the MGrammar user needs to understand clearly into which category he/she falls, and it would be really helpful for them to have a handbook explaining particular token/syntax pattern-patterns that are typically used in the various use cases, because they are very different.

If users fall into category 1 or 2, they are likely to want/need to use (something like) Oslo's Repository, since the input format they are implementing is probably for point-in-time (or aggregate) data about some business event. The grammar patterns they will use will be extremely tied to their MSchema formats, of course. But if they are in category 3, their use case/story probably doesn't include persisting the parse result in an RDBMS (or OODBMS or whatever); they are probably writing a front-end for some code-generator or compiler, so their end result doesn't necessarily need the MGraph, since it was just a "throwaway" intermediate representation.

However, here's an example of crossover between the categories (code as data): If, say, a customer is planning to implement an M repository as the backend for an internal source code management/analysis application, they might want their source code to be stored not only in revisioned text form, but also in revisioned AST form as instances of M models (where the M models are modeling source code and the software the source code produces). The application could also integrate with M's database of the CLR assemblies, and analyze dependencies and such.

  • "For me, MGrammar today is passably declarative (except for parameters); if it is kept declarative, it can take advantage of the new concurrency opportunities, otherwise it will confuse ordinary DSL developers with obscure ambiguities and errors. (By concurrency I don't mean parallel compilation but stuff like editor tasks such as statement completion, intellisense, etc.)"
You're describing asynchronous responses to mostly synchronous event streams, not-quite-real-time responsiveness to dynamic user input events. I think I see what you're saying; I agree that it keeps it plainer and simpler to leave out host expressions. My point is that it makes it a lot less elegant (cleanly readable and organized, imho) to denote the grammar (because much more of the semantics are hidden out-of-sight (and therefore also possibly out-of-mind) in another source file). I'll agree with your point if restated as "users who don't need certain features should be able to disable their availability" (such as users designing a data input format not necessarily needing functionality to support complex intra-instance analysis/recognition). But I disagree that keeping it more declarative makes it less complex to implement decent synchronous/interactive language services.

"As the first task in designing a DSL is getting the grammar right"... I disagree. In my opinion, language design and grammar (parser generator input) "design" (really, "implementation") are very different tasks, primarily because it's very easy to write a very badly implemented grammar for a given language (or set of input instances). Yes, a language services design surface such as Intellipad enables one to perform the tasks simultaneously, but this can easily become a detriment, because much of the grammar can be written before the rest of the grammar is thought through. You didn't say whether the (hypothetical?) 500-line grammar writer was designing the input language simultaneous to writing the grammar for it, but it kinda sounds like it. Designing a DSL should start with example input instances, and then, depending on the eventual destination of the parse result (interpretation, code-generation, compilation, re-serialization, or RDBMS persistence), the output schema should be designed. Only then should one begin to think about how to implement the grammar.

Yes, such EBNF-derived grammar grammars do provide a way to visually (in text) model the patterns of and abstractions in the language being specified, but for anything other than the most trivial languages (for which such tools probably shouldn't be used anyway), these patterns can easily obscure the actual recognition and transformation that needs to occur, because it's too easy to fall into the trap of over-engineering by creating needless abstractions and distinctions. The bigger the grammar gets before the language has been fully thought through, the more difficult it will be to refactor when the language designer realizes he/she has (probably) overcomplicated the task at hand (designing a denotational format for some data or a simple script language).
  • "Without any irreverence to your work, I would like to state that Microsoft should keep it simple for would-be DSL developers (like yours truly) to provide a foolproof way of developing simple language processors; remember, not every developer has your expertise in debugging one of those. So the more restrictive it is, the more discipline it imposes on thinking about the grammar up-front."
I'll go ahead and assert that the proper way to do that ("provide a foolproof way of...") is to provide lots of templates/patterns for various categories of languages (and parse output destinations). It would be much easier for a user to modify an existing .ini (or even nested markup) (or especially a logic or expression) language format than it would be to create one from scratch. Designing a language with a syntax or semantics more complicated than the most trivial languages from scratch... in my opinion.... shouldn't be attempted for a production project by someone who hasn't already worked with parsers for quite a while. That's just begging for trouble. Or a learning experience. Or whatever you want to call it. Note that I'm not claiming to be in that category.

"So the more restrictive it is, the more discipline it imposes on thinking about the grammar up-front." - Yes, which is why it's not the right scenario for users needing a foolproof ("handheld"?) experience, since it imposes too high of a cost in mental effort for designing the language (and then implementing the grammar). It's my opinion that a "lay developer" will deign to step through a language design wizard if he/she has a reasonable level of confidence that at the Finish of the wizard he/she will have a sane starting point from which to build his/her own private (either constrained-edition-of or mangled-edition-of, choose one) SGML or XAML or JSON or TCP packets or PGP packets or VBA, or even C#.

So, I guess I'm taking your point of "they ought to restrict the expressiveness of MGrammar to prevent complexity and undecipherable errors" and saying instead "they ought to increase the expressive power of MGrammar, but provide lots more language templates and wizards for all of the various categories of languages. Maybe on the order of 25-50. And warnings should be issued when users deviate very far from the provided templates/patterns. Yes, one can define arbitrarily complex parsers using antlr, ometa, or mgrammar, but one will need to click the mouse several more times to do it." :)

<Rant> Software code is itself a model of [a model of [a model of [a model of ...]]] the [operation of the] software it's intended to produce... which is why I bristle when I read the term Model-Driven-Development (because all software creation/design is already modeling something at every level, in my view). Labeling a particular programming style more "model-driven" than another detracts from this fact, which in my opinion should always be kept in mind. Here, I'll quote Martin Fowler: "There's often an attitude amongst modelers that generating code is a trivial implementation issue - once the modeling is done, all the hard work is done. Yet getting the generators sorted out is key to making language oriented programming work, because generators effectively define the semantics of DSLs. The tendency to play down generators is a major reason why so many programmers don't take modelers seriously. The UML community's disinterest in providing any kind of mapping between UML and common target language in any form other than hand waving is a good example of this gap." I agree with that; without good software engineers involved, model-generated software is awfully easy to implement poorly/wrongly. I don't see it as a good idea to encourage software developers to ignore mentally the chains of intermediate representations, because (of course) the software (I mean models) they are going to produce will be embarrassingly inefficient and/or needlessly complex/overengineered and/or wrong. I'm not saying that hiding and abstractions (and delegations to users by capability constraining) are categorically bad, but it's definitely very easy to make bad ones. <Ha; you thought I was going to close the Rant tag. Never.>

Well, that's kind of a can(s)-of-worms post.... sorry for the length/verbosity... I'm sure I glossed over too many points and left out too many transitional explanations...
Matthew Wilson _diakopter_
I consider you had the last good words.

As Kraig points out, "the answer to the original question is a simple 'no'".

Thanks & regards,
Ceyhun
Ceyhun Ciper
