- JFlex (lexical analyzer generator, aka scanner generator)
- CUP (parser generator)
- LLVM (code generator). We will generate intermediate code, and LLVM will either execute it directly (acting as a just-in-time compiler, similar to a JVM) or generate assembly language for a particular target machine (e.g., Intel x86 or SPARC) that we can then assemble into binary object code and execute.
A couple of you have said you would like to build the compiler project using ML (OCaml?) rather than Java. JFlex and CUP are Java tools, so you'll be on your own to find equivalent ML tools. LLVM has an API with an OCaml binding, so you may actually have an advantage there, because as far as I know there is no Java binding. With Java, we will simply print LLVM intermediate assembly code into a text file and then run the LLVM assembler on that file. I'll probably ask the MLaniacs to also produce a textual IR file, so I have something I can look at for grading.
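To make the "print IR into a text file" approach concrete, here is a minimal sketch in Java. The class name `IrEmitter`, the method `emitMain`, and the default file name `out.ll` are made up for illustration and are not part of the project; a real code generator would walk a syntax tree rather than emit one fixed function.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical example class, not part of the course project.
public class IrEmitter {

    // Builds textual LLVM IR for a program whose main() returns a constant.
    static String emitMain(int exitCode) {
        StringBuilder ir = new StringBuilder();
        ir.append("define i32 @main() {\n");
        ir.append("entry:\n");
        ir.append("  ret i32 ").append(exitCode).append("\n");
        ir.append("}\n");
        return ir.toString();
    }

    public static void main(String[] args) throws IOException {
        // The output file name is an assumption; pass another name as the first argument.
        Path out = Paths.get(args.length > 0 ? args[0] : "out.ll");
        Files.write(out, emitMain(42).getBytes(StandardCharsets.UTF_8));
    }
}
```

Assuming the standard LLVM command-line tools are installed, the resulting file can be run directly with `lli out.ll`, assembled to bitcode with `llvm-as out.ll`, or compiled to target assembly with `llc out.ll`.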
Know what's missing? The whole middle piece: static semantic analysis, including type checking. For that we will probably just grind out code by hand. Maybe someday we'll have good tools for that, too, but at present I don't know of any. The only way to keep that effort reasonable is to keep the language we compile as simple and regular as possible. I'm considering a couple of candidates and have not made a final choice yet, but it's certain to be a small object-oriented language that is something like a tiny subset of Java (though perhaps with a couple of non-Java-ish features).
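As a rough idea of what hand-written type checking looks like, here is a minimal sketch in Java for a toy expression language with only `int` and `boolean`. Every name in it (`TinyTypeChecker`, `Expr`, `Add`, `Less`, and so on) is invented for illustration; the project language has not been chosen, and its checker will need to handle classes, methods, and statements as well.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy checker, not the project's design.
public class TinyTypeChecker {
    enum Type { INT, BOOL }

    // A tiny abstract syntax tree for expressions.
    interface Expr {}
    static final class IntLit implements Expr {
        final int value;
        IntLit(int value) { this.value = value; }
    }
    static final class BoolLit implements Expr {
        final boolean value;
        BoolLit(boolean value) { this.value = value; }
    }
    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
    }
    static final class Add implements Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }
    static final class Less implements Expr {
        final Expr left, right;
        Less(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // Returns the type of e, or throws if e is ill-typed.
    static Type check(Expr e, Map<String, Type> env) {
        if (e instanceof IntLit) return Type.INT;
        if (e instanceof BoolLit) return Type.BOOL;
        if (e instanceof Var) {
            String name = ((Var) e).name;
            Type t = env.get(name);
            if (t == null) throw new IllegalStateException("undeclared variable: " + name);
            return t;
        }
        if (e instanceof Add) {
            Add a = (Add) e;
            expect(a.left, Type.INT, env);
            expect(a.right, Type.INT, env);
            return Type.INT;
        }
        if (e instanceof Less) {
            Less l = (Less) e;
            expect(l.left, Type.INT, env);
            expect(l.right, Type.INT, env);
            return Type.BOOL;
        }
        throw new IllegalArgumentException("unknown expression node");
    }

    // Checks that e has the wanted type and reports a type error otherwise.
    static void expect(Expr e, Type wanted, Map<String, Type> env) {
        Type actual = check(e, env);
        if (actual != wanted) {
            throw new IllegalStateException("expected " + wanted + " but found " + actual);
        }
    }

    public static void main(String[] args) {
        Map<String, Type> env = new HashMap<>();
        env.put("x", Type.INT);
        // Checks the expression (1 + x) < 10 and prints its type.
        Expr e = new Less(new Add(new IntLit(1), new Var("x")), new IntLit(10));
        System.out.println(check(e, env));
    }
}
```

Even in this toy, each new operator needs its own case, which is why a small and regular source language keeps the hand-written part manageable.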
Can we choose a language other than Java or ML to build the compiler?