Let's Make a Language: C♭ - Part 2
I don’t know how far we’ll get in this part. But, let’s you and I find out. Since we’re working with language design and “compilers,” I’d probably consider this a more advanced topic with which to deal. That doesn’t mean you can’t do it. You just may have to learn a little something along the way.
Setting up a project to follow along
You have a couple of choices. Download or set it up manually. Select the former if you’re impatient, don’t care about learning some MSBuild/Monodevelop, or would just like to get started.
The easy way out
If you want to download a solution file to poke through, head on over to cflat tags on github and download the one for this post: 20111213.zip.
The long and boring manual way
I know most of you who will follow along with this will use Visual Studio 2010. For little projects like this that involve no GUI, I like to use Monodevelop because it runs on my MBP. Because I took the time to set it up for both environments, I’ll make sure to point out differences along the way.
Create a “C# Console Application” named “cflat” with your IDE.
Download gppg. Unzip it into some directory somewhere. I put it in a “libs” directory in the solution directory created by my IDE.
Add a reference to <wherever_you_put_gppg>/binaries/QUT.ShiftReduceParser.dll to your project.
Create an empty text file named “lexer.l” in the root of the project. This will contain our lexer definition.
Create an empty text file named “parser.y” in the root of the project. This will contain our parser definition.
Create an new directory in your project named “Generated”.
Create a new “C# Class” in the “Generated” directory named “lexer.cs”. Delete its contents.
Create a new “C# Class” in the “Generated” directory named “parser.cs”. Delete its contents.
Modify your project to generate the “lexer.cs” and “parser.cs” files.
For Monodevelop : Add two “Before Build” actions to the project, all with the ProjectDir as the working directory.
mono ../libs/gppg/binaries/Gplex.exe /out:Generated/lexer.cs lexer.l mono ../libs/gppg/binaries/gppg.exe /gplex /out:Generated/parser.cs parser.y
For Visual Studio 2010 : First, follow the instructions here in Jeff_Evans’ post at the botom. Then, follow the instructions here to get it to build the files before hand. Or, download the ZIP file and look at the stuff at the end of the cflat.csproj file.
If you build it now, you should get some error from gplex and gppg because the lexer.l and parser.y files don’t have any content and that’s bad. So, let’s make these build.
gplex and gppg
gplex is a .NET port of lex/flex that adds some features specifically for the .NET platform. gppg is the same but for yacc/bison.
A lexer (gplex) identifies strings and returns an identifier for that string while setting some state for the parser. A parser generator (gppg) generates a parser that matches patterns of tokens and allows you to use that state to do something interesting. Like, for example, build an abstract syntax tree to translate a language into an executable format.
The compiler fails to compile a file that contains less than 300 lines
I don’t have much more time, today, so let’s knock out an easy requirement.
Basically, we want to count the number of new-line characters in the file. If it is less than 300, then we need to make sure the parser fails and the program reports the error.
Counting the number of new-line characters
When using a lex and yacc clone like we have with gplex and gppg and we want to count characters, the lexer has a natural fit for that kind of responsibility. Because, a lexer matches characters.
We need some bookkeeping. We’ll declare a line count variable to help us keep track. (Line 4, below.)
We need to increment it every time we get a new-line character. (Line 9, below.)
Then, at the end of the file, we report to the parser if the file was long enough or not. (Lines 11 - 13, below.)
1 | %namespace cflat |
Make the parser fail
We introduced the NOT_LONG_ENOUGH
token in the lexer, so we need to declare
it in our parser. (Line 150, below.)
We also are setting the totalLines
value to the integer
slot on the value
type passed from the parser. (Line 147, below.)
Finally, we want to save an error message and error code when the file’s not long enough and abort the processing. (Lines 156 - 158 and lines 165 - 166 below.)
1 | %namespace cflat |
Have the program respond to the file
Finally, we want the program to take a file name as a command-line parameter. (Lines 190 - 194, below.)
Then, we want to pass its contents to the parser. (Lines 195 - 196, below.)
If the parser fails, we want it to report the error message and return the error code. (Lines 197 - 199, below.)
1 | using System; |
More later
Now that we’ve got the first requirement out of the way, I can feel the momentum building! Soon, we’ll have C♭ to use in all of our business-facing software!