Let's Make a Language: C♭ - Part 2

Tuesday, December 13th 2011

The content of this article is sorely out of date. An update may come along at any time, but it won't be today, most likely.

I don’t know how far we’ll get in this part. But, let’s you and I find out. Since we’re working with language design and “compilers,” I’d probably consider this a more advanced topic with which to deal. That doesn’t mean you can’t do it. You just may have to learn a little something along the way.

Setting up a project to follow along

You have a couple of choices: download a ready-made solution or set it up manually. Select the former if you’re impatient, don’t care about learning some MSBuild/Monodevelop, or would just like to get started.

The easy way out

If you want to download a solution file to poke through, head on over to the cflat tags on GitHub and download the one for this post: 20111213.zip.

The long and boring manual way

I know most of you who will follow along with this will use Visual Studio 2010. For little projects like this that involve no GUI, I like to use Monodevelop because it runs on my MBP. Because I took the time to set it up for both environments, I’ll make sure to point out differences along the way.

Create a “C# Console Application” named “cflat” with your IDE.
Download gppg. Unzip it into some directory somewhere. I put it in a “libs” directory in the solution directory created by my IDE.
Add a reference to <wherever_you_put_gppg>/binaries/QUT.ShiftReduceParser.dll to your project.
Create an empty text file named “lexer.l” in the root of the project. This will contain our lexer definition.
Create an empty text file named “parser.y” in the root of the project. This will contain our parser definition.
Create a new directory in your project named “Generated”.
Create a new “C# Class” in the “Generated” directory named “lexer.cs”. Delete its contents.
Create a new “C# Class” in the “Generated” directory named “parser.cs”. Delete its contents.
Modify your project to generate the “lexer.cs” and “parser.cs” files.

For Monodevelop: Add two “Before Build” actions to the project, both with the ProjectDir as the working directory.
```
 mono ../libs/gppg/binaries/Gplex.exe /out:Generated/lexer.cs lexer.l
 mono ../libs/gppg/binaries/gppg.exe /gplex /out:Generated/parser.cs parser.y
```
For Visual Studio 2010: First, follow the instructions here in Jeff_Evans’ post at the bottom. Then, follow the instructions here to get it to build the files beforehand. Or, download the ZIP file and look at the stuff at the end of the cflat.csproj file.
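If you’d rather see the general shape of it, here is a rough sketch of a pre-build step for the .csproj file. It is not copied from the ZIP file; it simply runs the same two commands as the Monodevelop actions above, and it assumes gppg was unzipped into a “libs” directory next to the project. It goes near the end of the file, after the line that imports Microsoft.CSharp.targets, so that it replaces the empty default BeforeBuild target.

```xml
<!-- Sketch only: the ..\libs\gppg path is an assumption about where you unzipped gppg. -->
<Target Name="BeforeBuild">
  <!-- Generate the scanner from lexer.l -->
  <Exec WorkingDirectory="$(ProjectDir)"
        Command="..\libs\gppg\binaries\Gplex.exe /out:Generated\lexer.cs lexer.l" />
  <!-- Generate the parser from parser.y -->
  <Exec WorkingDirectory="$(ProjectDir)"
        Command="..\libs\gppg\binaries\gppg.exe /gplex /out:Generated\parser.cs parser.y" />
</Target>
```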

If you build it now, you should get errors from gplex and gppg because the lexer.l and parser.y files don’t have any content, and that’s bad. So, let’s make these build.

gplex and gppg

gplex is a .NET port of lex/flex that adds some features specifically for the .NET platform. gppg is the same but for yacc/bison.

A lexer (the thing gplex generates) matches strings in the input and returns a token identifying each one, setting some state for the parser along the way. A parser generator (gppg) generates a parser that matches patterns of tokens and lets you use that state to do something interesting: for example, build an abstract syntax tree to translate a language into an executable format.
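To make that split a little more concrete, here is a rough hand-written sketch of the two jobs in plain C#. It is not what gplex or gppg generate, and every name in it (TinyKind, TinyToken, TinyLexer, TinyParser) is made up for illustration. The lexer turns characters into tokens; the parser checks that the tokens arrive in a pattern it knows, here a number followed by any count of “+ number” pairs, and adds the values up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names throughout; this is not gplex or gppg output.
public enum TinyKind { Number, Plus, End }

public struct TinyToken
{
  public TinyKind Kind;
  public int Value;   // the "state" the lexer hands to the parser

  public TinyToken (TinyKind kind, int value)
  {
    Kind = kind;
    Value = value;
  }
}

public static class TinyLexer
{
  // The lexer's job: turn raw characters into tokens.
  public static List<TinyToken> Tokenize (string text)
  {
    List<TinyToken> tokens = new List<TinyToken> ();
    int i = 0;
    while (i < text.Length) {
      char c = text [i];
      if (char.IsWhiteSpace (c)) {
        i += 1;
      } else if (c == '+') {
        tokens.Add (new TinyToken (TinyKind.Plus, 0));
        i += 1;
      } else if (char.IsDigit (c)) {
        int value = 0;
        while (i < text.Length && char.IsDigit (text [i])) {
          value = value * 10 + (text [i] - '0');
          i += 1;
        }
        tokens.Add (new TinyToken (TinyKind.Number, value));
      } else {
        throw new FormatException ("Unexpected character: " + c);
      }
    }
    tokens.Add (new TinyToken (TinyKind.End, 0));
    return tokens;
  }
}

public static class TinyParser
{
  // The parser's job: match a pattern of tokens,
  //   sum : Number (Plus Number)* End
  // and do something with the values (here, add them up).
  public static int ParseSum (List<TinyToken> tokens)
  {
    int pos = 0;
    int total = Expect (tokens, ref pos, TinyKind.Number).Value;
    while (tokens [pos].Kind == TinyKind.Plus) {
      pos += 1;
      total += Expect (tokens, ref pos, TinyKind.Number).Value;
    }
    Expect (tokens, ref pos, TinyKind.End);
    return total;
  }

  // Returns the token at pos if it has the expected kind, else throws.
  private static TinyToken Expect (List<TinyToken> tokens, ref int pos, TinyKind kind)
  {
    TinyToken token = tokens [pos];
    if (token.Kind != kind)
      throw new FormatException ("Expected " + kind + " but got " + token.Kind);
    pos += 1;
    return token;
  }
}
```

Calling TinyParser.ParseSum (TinyLexer.Tokenize ("1 + 2 + 39")) adds the three numbers, while input such as "1 +" makes the parser throw because the tokens don’t fit the pattern. gplex and gppg write this kind of code for us from the rules in lexer.l and parser.y.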

The compiler fails to compile a file that contains fewer than 300 lines

I don’t have much more time, today, so let’s knock out an easy requirement.

Basically, we want to count the number of new-line characters in the file. If it is less than 300, then we need to make sure the parser fails and the program reports the error.
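Before wiring this into the generated scanner, here is the same check as a plain C# sketch, just to pin down what we’re after. LineCheck and its members are hypothetical names and are not part of the cflat project.

```csharp
using System;
using System.IO;

// Hypothetical helper; the real work will happen in the lexer rules.
public static class LineCheck
{
  public const int MinimumLines = 300;

  // Counts '\n' characters, which is what the lexer rule will count too.
  public static int CountNewLines (TextReader reader)
  {
    int total = 0;
    int c;
    while ((c = reader.Read ()) != -1) {
      if (c == '\n') total += 1;
    }
    return total;
  }

  // The requirement: anything under 300 new-lines should be rejected.
  public static bool IsLongEnough (int totalLines)
  {
    return totalLines >= MinimumLines;
  }
}
```

For example, LineCheck.CountNewLines (new StringReader ("a\nb\n")) counts two new-lines. Note that counting new-line characters means a last line without a trailing new-line is not counted.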

Counting the number of new-line characters

When you use lex and yacc clones like gplex and gppg and want to count characters, the lexer is a natural fit for that responsibility because a lexer matches characters.

We need some bookkeeping. We’ll declare a line count variable to help us keep track. (Line 4, below.)

We need to increment it every time we get a new-line character. (Line 9, below.)

Then, at the end of the file, we report to the parser if the file was long enough or not. (Lines 11 - 13, below.)

%namespace cflat

%{
private int totalLines;
%}

%%

\n        { totalLines += 1; }
<<EOF>>   {
            yylval.integer = totalLines;
            if(totalLines < 300) return (int) Tokens.NOT_LONG_ENOUGH;
            else return (int) Tokens.EOF;
          }

Make the parser fail

We introduced the NOT_LONG_ENOUGH token in the lexer, so we need to declare it in our parser. (Line 7, below.)

We’re also storing the totalLines value in the integer slot of the value type that the lexer shares with the parser. (Line 4, below.)

Finally, we want to save an error message and error code when the file’s not long enough and abort the processing. (Lines 13 - 15 and lines 22 - 23, below.)

%namespace cflat

%union {
  public int integer;
}

%token<integer> NOT_LONG_ENOUGH

%%

program : EOF               { /* Nothing, yet, but all is good. */ }
        | NOT_LONG_ENOUGH   {
                              ErrorMessage = "File only " + $1 + " lines long.";
                              ErrorCode = 1;
                              YYABORT;
                            }
        ;

%%

internal Parser(Scanner lex) : base(lex) {}
public string ErrorMessage { get; private set; }
public int ErrorCode { get; private set; }

Have the program respond to the file

Finally, we want the program to take a file name as a command-line parameter. (Lines 10 - 14, below.)

Then, we want to pass its contents to the parser. (Lines 15 - 16, below.)

If the parser fails, we want it to report the error message and return the error code. (Lines 17 - 19, below.)

using System;
using System.IO;

namespace cflat
{
  class MainClass
  {
    public static int Main (string[] args)
    {
      if (args.Length == 0) {
        Console.Error.WriteLine ("Specify a file, dude.");
        return 1;
      }
      using (Stream input = File.Open(args[0], FileMode.Open)) {
        Scanner scanner = new Scanner (input);
        Parser parser = new Parser (scanner);
        if (!parser.Parse ()) {
          Console.Error.WriteLine (parser.ErrorMessage);
          return parser.ErrorCode;
        }
      }
      return 0;
    }
  }
}

More later

Now that we’ve got the first requirement out of the way, I can feel the momentum building! Soon, we’ll have C♭ to use in all of our business-facing software!