JSBasic - A BASIC to JavaScript CompilerIn this C# project, BASIC source code is compiled to JavaScript and run in a browser.
Is your email address OK? You are signed up for our
newsletters but your email address is either unconfirmed, or has not been
reconfirmed in a long time. Please click here to have a confirmation email
sent so we can confirm your email address and start sending you
newsletters again. Alternatively, you can
update your subscriptions.
IntroductionIn this article, a translator is written which takes a program written in BASIC and converts it to JavaScript. The purpose of this project is not just for the nostalgia of being able to write programs in BASIC again, but also to demonstrate how one can go about writing a compiler in C#. Being able to define a language and write a compiler for it will not only give you insight into how the programming languages you use work, but it's useful in a number of real-world scenarios, such as writing a custom rules engine for a business application, allowing things such as levels in a game to be describable in a simple language, writing IDE add-ins, or writing a domain-specific language. DemonstrationThere are two demonstrations available for this project. In the first one, you can play a little game I wrote, and in the second, you can write your own BASIC program, have it converted to JavaScript, and then run it in your browser. BackgroundMany programmers who grew up in the 80s and 90s learnt programming in a language called BASIC, whether it was using a Commodore 64/VIC-20/etc. plugged into their TV, or programming in GW-BASIC on their 286. For my part, I was given a Dick Smith VZ200 at the age of 8, which came with a stunning 8KB of RAM and loaded programs from cassette tapes. The games it came with very quickly/instantly became horribly boring, but the ability to program the machine made it infinitely interesting. The first BASIC program I and most people learnt generally looked something like this: Hide Copy Code 10 print "Hello world"
20 goto 10
This would generate something like the following output (and would never stop): Side note: "Hello world" would often be replaced by something a little more whimsical, such as "You suck", especially when the 10-year-old programmer was faced with a bank of Commodore 64s and Amigas at the local electronics shop. The main points to note from such a simple basic program is that each line is numbered, flow is controlled using harmful " Hide Copy Code 10 print "Hello"
20 rem Get the user to enter their name, and hold the result in the variable a$
30 input "Please enter your name" a$
40 rem Quit if the user entered "quit"
50 if a$ = "quit" then goto 80
60 print "Hello " a$
70 goto 30
80 print "Goodbye"
What this program does should be pretty obvious, but feel free to see it in action here. Using the constructs here, and a few other statements such as Hide Copy Code 10 locate 10,4
20 print "Down here"
30 locate 2, 10
40 print "Up here"
Hide Copy Code 10 a$ = inkey$
20 if a$ <> "" then print "You pressed " a$
30 goto 10
With this knowledge, I was able to draw a spaceship (>=-) at a certain position, and when the 'A' button was pressed, remove the ship from its current location by printing three spaces over the top of where it was, I drew a second ship, and using The implementation of this project can be nicely split into two halves: compiling a BASIC program, and generating JavaScript. Compiling a BASIC ProgramIn order to read BASIC source code and generate script from that, the source code must be parsed to pick up line numbers, keywords, variable names, strings, loops, Once the abstract syntax tree is built, the tree can be traversed and code can be generated. There are a lot of tools available for this, but to me, it has always seemed a little bit too much work to get the tools together in a .NET project and generate a parser and do whatever else is needed. And then, one day I read about Irony on The Code Project, which allows you to write the grammar of the language in C#, and then it takes care of parsing and building the tree. I immediately saw the benefit in this, and realised that this is the sort of project which could enable the implementation of the Interpreter pattern to become much more widespread, what with its barrier-removing-goodness. The key to language implementation is to understand the aforementioned "grammar" of the language. This requires you to describe how the language you are writing the compiler for works, and requires quite a bit of recursive thinking. In terms of BASIC though, we can say the following:
Using the descriptions above, the following program... Hide Copy Code 10 print "Hello world"
20 goto 10
...can be represented as a tree as follows: Terminology-time: The orange nodes above are "Terminals" as they come at the end of each branch in the tree; the blue nodes are Non-Terminals. This is somewhat of a simplification, as a line can actually be a statement list, where each statement is separated by a colon, and goto and print have their own non-terminals ( The problem that Irony solves is how to convert a source code into a tree, such as that above. The first step is to describe the grammar formally. A common way to describe a grammar is using the Backus-Naur form (BNF), which is not a required step for using Irony, but it is helpful to do this first. The above descriptions in BNF would be as follows: Hide Copy Code <program> ::= <lines>
<lines> ::= <lines> <line> | <line>
<line> ::= <number> <statement>
<statement> ::= "print" <expression>
| "input" <expression> <variable>
| "if" <expression> "then" <statement>
| "if" <expression> "then" <statement> "else" <statement>
| "goto" <number>
| <variable> "=" <expression>
<expression> ::= <string>
| <number>
| <variable>
| <expression> <binary operator> <expression>
<binary operator> ::= "=" | "<>" | "<" | ">" | "<=" | ">="
Once you have defined the grammar of your language, you need to write this as a C# class. Using Irony.NET, you create a new class which extends the abstract Hide Copy Code NonTerminal PROGRAM = new NonTerminal("PROGRAM");
NonTerminal LINE = new NonTerminal("LINE");
// etc
And also declare the terminals: Hide Copy Code Terminal number = new NumberTerminal("NUMBER");
Terminal stringLiteral = new StringLiteral("STRING", "\"", ScanFlags.None);
Terminal variable = new VariableIdentifierTerminal();
In this case, Once we have those formalities out of the way, we can get down to business. The following is a subset and simplification of what ended up in the final grammar of JSBasic: Hide Copy Code PROGRAM.Rule = MakePlusRule(PROGRAM, null, LINE);
LINE.Rule = number + STATEMENT + NewLine;
STATEMENT.Rule = EXPR | ASSIGN_STMT | PRINT_STMT | INPUT_STMT | IF_STMT |
| IF_ELSE_STMT | BRANCH_STMT;
ASSIGN_STMT.Rule = variable + "=" + EXPR;
PRINT_STMT.Rule = "print" + EXPR_LIST;
INPUT_STMT.Rule = "input" + EXPR_LIST + variable;
IF_STMT.Rule = "if" + CONDITION + "then" + STATEMENT;
IF_ELSE_STMT.Rule = "if" + CONDITION + "then" +
STATEMENT + "else" + STATEMENT;
EXPR.Rule = number | variable | EXPR + BINARY_OP + EXPR | stringLiteral;
Remember, this is all compilable C#, with operator overloading provided by Irony to allow a pretty close translation from BNF to C#. It's really a nice way to define a language's grammar. MakePlusRule and MakeStarRuleNote the first line in the code above, which is how you specify that a NonTerminal (in this case, Important note: when using Case InsensitivityMaking your language case-insensitive couldn't be much easier. In the constructor of your grammar, you just set a property, and then all keywords such as " Hide Copy Code this.CaseSensitive = false;
Line-breaksBASIC lines are ended with line breaks. Languages, like JavaScript, which are ended with characters such as the semi-colon are generally easier to handle; simply add ";" to the end of the rule. However, if the language you are implementing has lines which end with line-breaks, then the following steps are required: 1: End the rule with " Hide Copy Code LINE.Rule = number + STATEMENT + NewLine;
2: Because Irony ignores whitespace (including line breaks) when scanning the source code, you need a way to resurrect the line breaks when the Abstract Syntax Tree is being created. Fortunately, this can be done with the following line, which can go anywhere in your grammar's constructor: Hide Copy Code this.TokenFilters.Add(new CodeOutlineFilter(false));
The CommentsIrony provides a terminal which matches comments. To match BASIC comments (i.e., statements starting with " Hide Copy Code Terminal comment = new CommentTerminal("Comment", "REM", "\n");
And, this was then used in the statement rule: Hide Copy Code COMMENT_STMT.Rule = comment;
This was required by JSBasic as I wanted to re-print the BASIC comments as JavaScript comments in the generated JavaScript. Normally, you just want to ignore comments, as defining the fact that, for example, in C# the /* */ comments can appear anywhere would really bloat your grammar definition if you didn't just ignore it! So, if you want to ignore comments, define the comment terminal, and rather than putting it in one of your rules, just add it to the non-terminals list: Hide Copy Code base.NonGrammarTerminals.Add(comment);
Compiling Source CodeThere are a few little details I've skipped above (see the source code for the full story), but once the Grammar is defined, you can get a Hide Copy Code string sourceCode = "10 print \"hello world\"";
BasicGrammar basicGrammer = new BasicGrammar();
LanguageCompiler compiler = new LanguageCompiler(basicGrammer);
AstNode rootNode = compiler.Parse(sourceCode);
OK, so now we've got a tree in memory. Next step: traversing the tree and generating the code. Generating JavaScript from an Abstract Syntax TreeThere is more than one way to do this, but a rather elegant way to generate code is to create a class extending the Irony-defined Hide Copy Code public IEnumerable<LineNode> Lines { get; }
As another example, the Hide Copy Code public ExpressionNode Condition { get; private set};
public StatementNode ThenExpression { get; private set};
public StatementNode ElseExpression { get; private set};
In general, these properties should be set in the constructor of the class. This gets called by Irony, and so you need to copy-and-paste basically the same thing into each node class, with a few different name-changes depending on the number of child nodes: Hide Copy Code public class IfElseStmtNode : AstNode
public IfElseStmtNode(AstNodeArgs args)
: base(args)
{
// We ignore the "if", "then" and "else" keywords.
// args.ChildNodes[0] is terminal containing "if"
Condition = (ExpressionNode)args.ChildNodes[1];
// args.ChildNodes[2] is terminal containing "then"
ThenExpression = (StatementNode)args.ChildNodes[3];
// args.ChildNodes[4] is terminal containing "else"
ElseExpression = (StatementNode)args.ChildNodes[5];
// This class assumes that the "else" part is mandatory
}
}
The base constructor for Note that when you are in a node created using Hide Copy Code public IEnumerable<LineNode> Lines {
get { // Note: this uses the .NET 3.5 Cast<T> method:
return base.ChildNodes.Cast<LineNode>();
}
}
When Irony is creating the abstract syntax tree, it needs to know that, for example, Hide Copy Code NonTerminal LINE = new NonTerminal("LINE", typeof(LineNode));
This is done for all the other types, and now when the code is compiled, the abstract syntax tree will consist of the Hide Copy Code public interface IJSBasicNode
{
void GenerateJavaScript(JSContext context, TextWriter textWriter);
}
The Hide Copy Code public override void GenerateJavaScript(JSContext context,
TextWriter textWriter)
{
textWriter.Write("if (");
Condition.GenerateJavaScript(context, textWriter);
textWriter.WriteLine(") {");
ThenStatement.GenerateJavaScript(context, textWriter);
textWriter.WriteLine("} else {");
ElseStatement.GenerateJavaScript(context, textWriter);
textWriter.WriteLine("}");
}
The I must admit that I've left quite a few details out here, but hopefully, it's given you a general understanding of how to write a compiler and generate code using Irony. Unfortunately, converting BASIC to JavaScript wasn't so straight-forward for every node type, as the next section describes. Jamming a Square Peg into a Round HoleA JavaScript program written to run in a browser and a BASIC program written for a console are two completely different beasts. This section describes the problems which needed to be overcome to complete this project. The ConsoleIn order to handle Hide Copy Code console.cls(); // clear the screen
var userName = console.input('What is your name?');
console.println('Hello ' + userName);
Replicating the "goto" StatementJavaScript does not have a " Hide Copy Code 10 print "Hello world"
20 goto 10
This would look something like this, then: Hide Copy Code function line10() {
console.println("Hello world");
line20();
}
function line20() {
line10();
}
If you run this though, the function stack will very quickly overflow, and the program will crash (e.g., after "Hello world" has printed 1000 times, 1000 Hide Copy Code function line10() {
console.println("Hello world");
return line20;
}
function line20() {
return line10;
}
function run() {
var currentLine = line10;
while (currentLine != null) {
currentLine = currentLine();
}
}
This will now successfully run without blowing the call-stack, because the stack will always consist of either only Simulating Thread.Sleep in JavaScriptThe problem is that the little program above will run using 100% of the CPU, and will not release any CPU time to the browser. The browser will become stuck (or the browser will think the JavaScript has got into an infinite loop, and let you stop it). Ideally, we need the Hide Copy Code function run() {
var currentLine = line10;
while (currentLine != null) {
currentLine = currentLine();
// give the browser some CPU
Thread.Sleep(10);
}
}
Of course, there is no The trick is to move everything outside of the Hide Copy Code var curLine;
function run() {
curLine = line10;
mainLoop();
}
function mainLoop() {
if (curLine == null) {
return;
}
curLine = curLine();
setTimeout('mainLoop()', 10);
}
This will now run the deceptively simple two-line basic program forever, while keeping the browser responsive. BASIC also supports Hide Copy Code 10 gosub 40
20 print "Hello from line 20"
30 end
40 print "Hello from line 40"
50 return
Are you ready for the output? Hide Copy Code Hello from line 40
Hello from line 20
This was a lot more simple to implement than the The main-loop pattern is very useful for any situation where you need to execute a loop in JavaScript which will last more than a few seconds (such as animations). At a more general level, whenever you have a JavaScript program such as: Hide Copy Code function animatePage() {
// initialisation
while (someCondition) {
// ...
// code to execute
// ...
}
// cleanup code
}
You can convert it to this: Hide Copy Code function initialse() {
// initialisation
mainLoop();
}
function mainLoop() {
if (!condition) {
cleanUp();
return;
}
// ...
// code to execute
// ...
setTimeout(mainLoop, 10);
}
function cleanUp() {
// cleanup code
}
The ConclusionMy goal in this project was both to create a BASIC compiler and evaluate the usefulness of Irony. Using Irony was a real pleasure, and I wouldn't hesitate to drop the Irony DLL into a project and define a little grammar for a domain-specific language, should the need ever arise. I hope that this project may be useful to others as an example of how to create a compiler in .NET. Thanks for reading! History and Acknowledgements
A big thank you to the creator of Irony, Roman Ivantsov, who made major improvements to my initial implementation of JSBasic and answered my many queries. His help has improved the usefulness of this article a great deal. Have a read of the CodeProject Irony article and get the latest release of Irony from CodePlex. LicenseThis article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL) |
|