分享

JSBasic - A BASIC to JavaScript Compiler

 quasiceo 2017-10-11

JSBasic - A BASIC to JavaScript Compiler

, 20 Jan 2013
   4.93 (51 votes)
In this C# project, BASIC source code is compiled to JavaScript and run in a browser.
Is your email address OK? You are signed up for our newsletters but your email address is either unconfirmed, or has not been reconfirmed in a long time. Please click here to have a confirmation email sent so we can confirm your email address and start sending you newsletters again. Alternatively, you can update your subscriptions.

Introduction

In this article, a translator is written which takes a program written in BASIC and converts it to JavaScript. The purpose of this project is not just for the nostalgia of being able to write programs in BASIC again, but also to demonstrate how one can go about writing a compiler in C#.

Being able to define a language and write a compiler for it will not only give you insight into how the programming languages you use work, but it's useful in a number of real-world scenarios, such as writing a custom rules engine for a business application, allowing things such as levels in a game to be describable in a simple language, writing IDE add-ins, or writing a domain-specific language.

Demonstration

There are two demonstrations available for this project. In the first one, you can play a little game I wrote, and in the second, you can write your own BASIC program, have it converted to JavaScript, and then run it in your browser.

JSBasic/spacewar.jpg

Background

Many programmers who grew up in the 80s and 90s learnt programming in a language called BASIC, whether it was using a Commodore 64/VIC-20/etc. plugged into their TV, or programming in GW-BASIC on their 286. For my part, I was given a Dick Smith VZ200 at the age of 8, which came with a stunning 8KB of RAM and loaded programs from cassette tapes. The games it came with very quickly/instantly became horribly boring, but the ability to program the machine made it infinitely interesting. The first BASIC program I and most people learnt generally looked something like this:

Hide   Copy Code
10 print "Hello world"
20 goto 10

This would generate something like the following output (and would never stop):

JSBasic/helloworld.jpg

Side note: "Hello world" would often be replaced by something a little more whimsical, such as "You suck", especially when the 10-year-old programmer was faced with a bank of Commodore 64s and Amigas at the local electronics shop.

The main points to note from such a simple basic program is that each line is numbered, flow is controlled using harmful "goto" statements, and there is a print statement for printing to the console. There are no classes, and no type declarations. A variable is appended with the '$' character, and there are other statements such as input to get user input, and if-then-else statements, which cannot span more than one line. Comments are created by starting the line with "REM" (short for "remark", not a nod to a certain musical group, nor a phenomenon which we all regularly experience several times a night). Using this information, we can write the following simple program:

Hide   Copy Code
10 print "Hello"
20 rem Get the user to enter their name, and hold the result in the variable a$
30 input "Please enter your name" a$
40 rem Quit if the user entered "quit"
50 if a$ = "quit" then goto 80
60 print "Hello " a$
70 goto 30
80 print "Goodbye"

What this program does should be pretty obvious, but feel free to see it in action here. Using the constructs here, and a few other statements such as for and while loops, and the gosub/return statements, reasonably sophisticated text-based games could be made. The turning point for me, though, was when I found out about the LOCATE statement, which allows you to place the cursor anywhere on the screen (as opposed to the next line, when using the print statement), and inkey$, which allows you to get the last key the user pressed:

Hide   Copy Code
10 locate 10,4
20 print "Down here"
30 locate 2, 10
40 print "Up here"

JSBasic/locatedemo.jpg

Run it

Hide   Copy Code
10 a$ = inkey$
20 if a$ <> "" then print "You pressed " a$
30 goto 10

JSBasic/inkeydemo.jpg

Run it

With this knowledge, I was able to draw a spaceship (>=-) at a certain position, and when the 'A' button was pressed, remove the ship from its current location by printing three spaces over the top of where it was, LOCATEing the cursor one row above the last position, and then redrawing the ship. Same-deal-different-direction for the 'Z' key. Animation was created!

I drew a second ship, and using for loops, allowed them to fire bullets (full-stops: '.') at each other. I wrote this game, calling it "Space War", when I was nine, and spent hours on it. Unfortunately, the cassette player on my computer could only load programs; not save them. So, at the end of the day, I had to eventually turn off my computer, and it was lost forever. I later remade more elaborate versions using QBASIC when I had a PC, and fairly recently - for reasons I'm sure only a psychologist versed in computer science can answer - I wanted to recreate it by writing it in old-fashioned BASIC and converting the code to JavaScript. When I read about the Irony .NET Compiler Construction Kit, it was like seeing a vision: fate could not be fought, and the JSBasic compiler project was born.

The implementation of this project can be nicely split into two halves: compiling a BASIC program, and generating JavaScript.

Compiling a BASIC Program

In order to read BASIC source code and generate script from that, the source code must be parsed to pick up line numbers, keywords, variable names, strings, loops, if statements, and everything else in the language. Once parsed, an abstract syntax tree is constructed, which is to say the source code is converted into a tree, with the root node representing the whole program. You can think of each line as a child of the root node, where each line has more child nodes (for example, an if statement might be a child node of a line, and the condition of the if statement and the then part of the if statement are then children of the if statement node).

Once the abstract syntax tree is built, the tree can be traversed and code can be generated.

There are a lot of tools available for this, but to me, it has always seemed a little bit too much work to get the tools together in a .NET project and generate a parser and do whatever else is needed. And then, one day I read about Irony on The Code Project, which allows you to write the grammar of the language in C#, and then it takes care of parsing and building the tree. I immediately saw the benefit in this, and realised that this is the sort of project which could enable the implementation of the Interpreter pattern to become much more widespread, what with its barrier-removing-goodness.

The key to language implementation is to understand the aforementioned "grammar" of the language. This requires you to describe how the language you are writing the compiler for works, and requires quite a bit of recursive thinking. In terms of BASIC though, we can say the following:

  • A PROGRAM is made up of LINES
  • Each LINE starts with a number, followed by a STATEMENT
  • A STATEMENT can be any of the following:
    • a PRINT statement, which is followed by something to print
    • an INPUT statement, which is followed by something to print and then a VARIABLE
    • an IF statement, which looks like this: IF something THEN a statement ELSE a statement (where the ELSE part is optional)
    • a GOTO statement: go to a line number
    • an ASSIGNMENT statement: variable = something
  • An EXPRESSION. An expression is what each of the "something"s are in the examples above, for example, PRINT EXPRESSION. An EXPRESSION can be things such as:
    • A string or number
    • A binary comparison, e.g., something < somethingElse
    • A variable, such as PRINT a$

Using the descriptions above, the following program...

Hide   Copy Code
10 print "Hello world"
20 goto 10

...can be represented as a tree as follows:

JSBasic/simpleast.jpg

Terminology-time: The orange nodes above are "Terminals" as they come at the end of each branch in the tree; the blue nodes are Non-Terminals.

This is somewhat of a simplification, as a line can actually be a statement list, where each statement is separated by a colon, and goto and print have their own non-terminals (GotoStmt and PrintStmt), but you get the idea.

The problem that Irony solves is how to convert a source code into a tree, such as that above. The first step is to describe the grammar formally. A common way to describe a grammar is using the Backus-Naur form (BNF), which is not a required step for using Irony, but it is helpful to do this first. The above descriptions in BNF would be as follows:

Hide   Copy Code
<program> ::= <lines>
<lines> ::= <lines> <line> | <line>
<line> ::= <number> <statement>
<statement> ::= "print" <expression>
| "input" <expression> <variable>
| "if" <expression> "then" <statement>
| "if" <expression> "then" <statement> "else" <statement>
| "goto" <number>
| <variable> "=" <expression>
<expression> ::= <string> 
| <number>
| <variable>
| <expression> <binary operator> <expression>
<binary operator> ::= "=" | "<>" | "<" | ">" | "<=" | ">="

This next section explains the background to the <lines> node above. This is a recursive definition which essentially states that a program can have one or more lines. This section is not necessary when using Irony however, as a much more natural construct exists which allows you to bypass this step. However, the following explanation has been left to give further background to those who are interested.

When converting from English to BNF, the biggest difference is in the way a line like "A PROGRAM is made up of LINES" is written. In the first line, we say that a <Program> is made up of <Lines>. <Lines> is then followed by a <Lines>, or a <Line>: this is where the recursive thinking is required. If you think of a program with one <Line>, then you can see that one <Line> can be considered as a <Lines> node. If we now add a second line to this imaginary program, then we can say that the first <Line> matches (and gets represented as) a <Lines> node, and then that <Lines> node with the second <Line> matches the "<lines> <line>" rule, and that can get reduced to a <Lines> node.

This, in fact, means that when defined this way, the <Program> node will have one child node, <Lines>, which in turn will have only two child nodes for any program over one line: the first child will be a <Lines> node, the second child will be a <Line> node. The above diagram was therefore a simplification; the "Hello World" program would actually look like the following:

JSBasic/fullast.jpg

This may seem a roundabout way to represent the idea that a node can have one-or-more of a certain type of child nodes, but that's the way it's traditionally been done. The good news is that you do not need to use this type of construct when using Irony; instead, you can use the much more intuitive concept that a program has one or more line nodes. Therefore, the first tree shown above is in fact the correct representation when using Irony.

Once you have defined the grammar of your language, you need to write this as a C# class. Using Irony.NET, you create a new class which extends the abstract Grammar class, and in the constructor, build the equivalent of the above Backus-Nauf Form description of the language. You need to declare each part of the language, such as:

Hide   Copy Code
NonTerminal PROGRAM = new NonTerminal("PROGRAM");
NonTerminal LINE = new NonTerminal("LINE");
// etc

And also declare the terminals:

Hide   Copy Code
Terminal number = new NumberTerminal("NUMBER");
Terminal stringLiteral = new StringLiteral("STRING", "\"", ScanFlags.None);
Terminal variable = new VariableIdentifierTerminal();

In this case, NumberTerminal and StringLiteral are classes supplied with Irony to identify numbers and quoted strings "like this" in the source code, and VariableIdentifierTerminal is a class written by myself which extends Terminal and is used to match strings of alphanumeric characters ending with the '$' character (i.e., BASIC variables).

Once we have those formalities out of the way, we can get down to business. The following is a subset and simplification of what ended up in the final grammar of JSBasic:

Hide   Copy Code
PROGRAM.Rule = MakePlusRule(PROGRAM, null, LINE);
LINE.Rule = number + STATEMENT + NewLine;

STATEMENT.Rule = EXPR | ASSIGN_STMT | PRINT_STMT | INPUT_STMT | IF_STMT |
    | IF_ELSE_STMT | BRANCH_STMT;

ASSIGN_STMT.Rule = variable + "=" + EXPR;
PRINT_STMT.Rule = "print" + EXPR_LIST;
INPUT_STMT.Rule = "input" + EXPR_LIST + variable;
IF_STMT.Rule = "if" + CONDITION + "then" + STATEMENT;
IF_ELSE_STMT.Rule = "if" + CONDITION + "then" + 
                    STATEMENT + "else" + STATEMENT;

EXPR.Rule = number | variable | EXPR + BINARY_OP + EXPR | stringLiteral;

Remember, this is all compilable C#, with operator overloading provided by Irony to allow a pretty close translation from BNF to C#. It's really a nice way to define a language's grammar.

MakePlusRule and MakeStarRule

Note the first line in the code above, which is how you specify that a NonTerminal (in this case, PROGRAM) has one or more LINE nodes, which is one of my favourite features of Irony, as it simplifies the generated tree. There is also a MakeStarRule for zero or more nodes. When compiled, the Program node in the abstract syntax tree will have a collection of Line nodes.

Important note: when using MakePlusRule or MakeStarRule, you cannot have anything else in the rule.

Case Insensitivity

Making your language case-insensitive couldn't be much easier. In the constructor of your grammar, you just set a property, and then all keywords such as "if" will work nO maTTer WHAT the CaSiNg.

Hide   Copy Code
this.CaseSensitive = false;

Line-breaks

BASIC lines are ended with line breaks. Languages, like JavaScript, which are ended with characters such as the semi-colon are generally easier to handle; simply add ";" to the end of the rule. However, if the language you are implementing has lines which end with line-breaks, then the following steps are required:

1: End the rule with "NewLine", which is defined in the base Grammar class:

Hide   Copy Code
LINE.Rule = number + STATEMENT + NewLine;

2: Because Irony ignores whitespace (including line breaks) when scanning the source code, you need a way to resurrect the line breaks when the Abstract Syntax Tree is being created. Fortunately, this can be done with the following line, which can go anywhere in your grammar's constructor:

Hide   Copy Code
this.TokenFilters.Add(new CodeOutlineFilter(false));

The CodeOutlineFilter can also help if you need to know the indentation of the source code, and I believe it can (or will in a future release) allow you to handle languages like VB which use characters to join lines together (e.g., by using an underscore). See the parsers included in Irony for examples.

Comments

Irony provides a terminal which matches comments. To match BASIC comments (i.e., statements starting with "REM" and ending with a line break), the following declaration was used:

Hide   Copy Code
Terminal comment = new CommentTerminal("Comment", "REM", "\n");

And, this was then used in the statement rule:

Hide   Copy Code
COMMENT_STMT.Rule = comment;

This was required by JSBasic as I wanted to re-print the BASIC comments as JavaScript comments in the generated JavaScript. Normally, you just want to ignore comments, as defining the fact that, for example, in C# the /* */ comments can appear anywhere would really bloat your grammar definition if you didn't just ignore it! So, if you want to ignore comments, define the comment terminal, and rather than putting it in one of your rules, just add it to the non-terminals list:

Hide   Copy Code
base.NonGrammarTerminals.Add(comment);

Compiling Source Code

There are a few little details I've skipped above (see the source code for the full story), but once the Grammar is defined, you can get a string containing the source code and compile it into an abstract syntax tree like so:

Hide   Copy Code
string sourceCode = "10 print \"hello world\"";
BasicGrammar basicGrammer = new BasicGrammar();
LanguageCompiler compiler = new LanguageCompiler(basicGrammer);
AstNode rootNode = compiler.Parse(sourceCode);

OK, so now we've got a tree in memory. Next step: traversing the tree and generating the code.

Generating JavaScript from an Abstract Syntax Tree

There is more than one way to do this, but a rather elegant way to generate code is to create a class extending the Irony-defined AstNode. This means there are classes such as ProgramNode, LineNode, StatementNode, PrintStmtNode, etc. Earlier, we saw that the <Program> node will have a collection of <Line> node objects, which in C# means ProgramNode has the following property:

Hide   Copy Code
public IEnumerable<LineNode> Lines { get; }

As another example, the IfElseStmtNode would have three properties:

Hide   Copy Code
public ExpressionNode Condition { get; private set};
public StatementNode ThenExpression { get; private set};
public StatementNode ElseExpression { get; private set};

In general, these properties should be set in the constructor of the class. This gets called by Irony, and so you need to copy-and-paste basically the same thing into each node class, with a few different name-changes depending on the number of child nodes:

Hide   Copy Code
public class IfElseStmtNode : AstNode
    public IfElseStmtNode(AstNodeArgs args)
        : base(args)
    {
        // We ignore the "if", "then" and "else" keywords.
        // args.ChildNodes[0] is terminal containing "if"
        Condition = (ExpressionNode)args.ChildNodes[1];
        // args.ChildNodes[2] is terminal containing "then"
        ThenExpression = (StatementNode)args.ChildNodes[3];
        // args.ChildNodes[4] is terminal containing "else"
        ElseExpression = (StatementNode)args.ChildNodes[5];
        // This class assumes that the "else" part is mandatory
    }
}

The base constructor for AstNode has a property ChildNodes which it sets to args.ChildNodes, which is useful for traversing the tree of nodes later.

Note that when you are in a node created using MakePlusRule or MakeStarRule, you cannot set the property in the constructor, as at that point args.ChildNodes is empty (it gets populated during the creation of the tree). Therefore, it is recommended to cast the nodes in the property. For the ProgramNode class, its Lines property looks like this:

Hide   Copy Code
public IEnumerable<LineNode> Lines {
    get { // Note: this uses the .NET 3.5 Cast<T> method:
        return base.ChildNodes.Cast<LineNode>();
    }
}

When Irony is creating the abstract syntax tree, it needs to know that, for example, <LineNode> nodes should be instantiated as instances of LineNode classes. This is achieved by using a different constructor for the LineNode non-terminal above:

Hide   Copy Code
NonTerminal LINE = new NonTerminal("LINE", typeof(LineNode));

This is done for all the other types, and now when the code is compiled, the abstract syntax tree will consist of the AstNode classes that you have defined. Generating script is now relatively easy: you just need to get each class to write itself as the target code. To achieve this, I created the following interface which all my AstNode classes implemented:

Hide   Copy Code
public interface IJSBasicNode
{
    void GenerateJavaScript(JSContext context, TextWriter textWriter);
}

The JSContext was just to recreate the code indentation, to make the JavaScript look pretty, but isn't really necessary. Now, each node can print the bits it needs to, and call GenerateJavaScript on all its child nodes. Here's what the IfStmtNode's implementation of this method could look like:

Hide   Copy Code
public override void GenerateJavaScript(JSContext context,
        TextWriter textWriter)
{
    textWriter.Write("if (");
    Condition.GenerateJavaScript(context, textWriter);
    textWriter.WriteLine(") {");
    ThenStatement.GenerateJavaScript(context, textWriter);
    textWriter.WriteLine("} else {");
    ElseStatement.GenerateJavaScript(context, textWriter);
    textWriter.WriteLine("}");
}

The Condition, ThenStatement and ElseStatement nodes can each be complicated, nested, recursive nodes, but because each node just writes the bits that are particular to itself, each class' GenerateJavaScript method is pretty straight-forward. Once every node has its GenerateJavaScript method defined, everything just automagically works. Just compile some source code, create a text writer, and ask the ProgramNode to generate the JavaScript.

I must admit that I've left quite a few details out here, but hopefully, it's given you a general understanding of how to write a compiler and generate code using Irony. Unfortunately, converting BASIC to JavaScript wasn't so straight-forward for every node type, as the next section describes.

Jamming a Square Peg into a Round Hole

A JavaScript program written to run in a browser and a BASIC program written for a console are two completely different beasts. This section describes the problems which needed to be overcome to complete this project.

The Console

In order to handle print, input, and locate statements in the same way as an old computer console, I wrote JSConsole. See that article for more information, but in a nutshell, it allows you to treat an HTML element such as a DIV in the same way as a console, for example:

Hide   Copy Code
console.cls(); // clear the screen
var userName = console.input('What is your name?');
console.println('Hello ' + userName);

Replicating the "goto" Statement

JavaScript does not have a "goto" branch statement, which made things very difficult. The idea is somewhat similar to calling functions, and so my first thought was to wrap each line into its own function, and at the end of each function, the next function to call is called. We will use the following program again as an example:

Hide   Copy Code
10 print "Hello world"
20 goto 10

This would look something like this, then:

Hide   Copy Code
function line10() {
    console.println("Hello world");
    line20();
}
function line20() {
    line10();
}

If you run this though, the function stack will very quickly overflow, and the program will crash (e.g., after "Hello world" has printed 1000 times, 1000 line10 and 1000 line20 function calls will be on the stack, and the program will quickly die). To solve this problem, instead of having each line call the next function directly, it returns the next-function-to-call to a base function (or null to end). This now looks something like this:

Hide   Copy Code
function line10() {
    console.println("Hello world");
    return line20;
}
function line20() {
    return line10;
}
function run() {
    var currentLine = line10;
    while (currentLine != null) {
        currentLine = currentLine();
    }
}

This will now successfully run without blowing the call-stack, because the stack will always consist of either only run, run and line10, or run and line20. This seems like a good solution, until you run it and the browser stops responding. Welcome to the next problem...

Simulating Thread.Sleep in JavaScript

The problem is that the little program above will run using 100% of the CPU, and will not release any CPU time to the browser. The browser will become stuck (or the browser will think the JavaScript has got into an infinite loop, and let you stop it). Ideally, we need the run() function to look something like the following:

Hide   Copy Code
function run() {
    var currentLine = line10;
    while (currentLine != null) {
        currentLine = currentLine();
        // give the browser some CPU
        Thread.Sleep(10);
    }
}

Of course, there is no Thread.Sleep in JavaScript. The closest is the setTimeout function, which lets you execute code after a certain number of milliseconds. The problem with setTimeout is that it essentially acts as an asynchronous call, and so you cannot return anything from it, and we need to know the next function to call.

The trick is to move everything outside of the while loop to become global variables, and turn the while loop into a function, with the condition becoming an if statement, like so:

Hide   Copy Code
var curLine;
function run() {
    curLine = line10;
    mainLoop();
}
function mainLoop() {
    if (curLine == null) {
        return;
    }
    curLine = curLine();
    setTimeout('mainLoop()', 10);
}

This will now run the deceptively simple two-line basic program forever, while keeping the browser responsive. BASIC also supports gosub and return statements, which are essentially function calls:

Hide   Copy Code
10 gosub 40
20 print "Hello from line 20"
30 end
40 print "Hello from line 40"
50 return

Are you ready for the output?

Hide   Copy Code
Hello from line 40
Hello from line 20

This was a lot more simple to implement than the goto statement, as it just gets converted to simple function calls. Note that there is a limitation to this technique of returning function pointers to the main loop: you cannot have goto statements within a sub-routine.

The main-loop pattern is very useful for any situation where you need to execute a loop in JavaScript which will last more than a few seconds (such as animations). At a more general level, whenever you have a JavaScript program such as:

Hide   Copy Code
function animatePage() {
    // initialisation
    while (someCondition) {
        // ...
        // code to execute
        // ...
    }
    // cleanup code
}

You can convert it to this:

Hide   Copy Code
function initialse() {
    // initialisation
    mainLoop();
}
function mainLoop() {
    if (!condition) {
        cleanUp();
        return;
    }
    // ...
    // code to execute
    // ...
    setTimeout(mainLoop, 10);
}
function cleanUp() {
    // cleanup code
}

The mainLoop() code can be found in JSBasic.js, and it actually has a little more code for error handling, and a couple of global functions for strings, and the implementation of inkey$, which simply listens for key-press events on the window and saves the last-pressed key.

Conclusion

My goal in this project was both to create a BASIC compiler and evaluate the usefulness of Irony. Using Irony was a real pleasure, and I wouldn't hesitate to drop the Irony DLL into a project and define a little grammar for a domain-specific language, should the need ever arise.

I hope that this project may be useful to others as an example of how to create a compiler in .NET. Thanks for reading!

History and Acknowledgements

  • 9 April, 2008: Initial article released
  • 22 April, 2008: Updated to include newer version of Irony, better handling of new-lines, line-sensitivity, and the MakePlusRule
  • 23 April, 2009: Updated solution
  • 20 January, 2013: Put code on GitHub and changed hosting to AppHarbor

A big thank you to the creator of Irony, Roman Ivantsov, who made major improvements to my initial implementation of JSBasic and answered my many queries. His help has improved the usefulness of this article a great deal. Have a read of the CodeProject Irony article and get the latest release of Irony from CodePlex.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

    本站是提供个人知识管理的网络存储空间,所有内容均由用户发布,不代表本站观点。请注意甄别内容中的联系方式、诱导购买等信息,谨防诈骗。如发现有害或侵权内容,请点击一键举报。
    转藏 分享 献花(0

    0条评论

    发表

    请遵守用户 评论公约

    类似文章 更多