Friday, December 18, 2009

Tools

If you want to get a head start looking at tools, here are some that we are (almost certainly) going to use in winter 2010:
  • JFlex  (lexical analyzer generator, aka scanner generator)
  • CUP  (parser generator)
  • LLVM (code generator). We will generate intermediate code, and LLVM will either execute it directly (acting as a just-in-time compiler, similar to a JVM) or generate assembly language for a particular target machine (e.g., Intel x86 or SPARC) that we can then assemble into binary object code and execute.
Of these, I've used JFlex and CUP many times, including in previous offerings of CIS 461/561. LLVM is new to me, but my initial experience has been pretty good. A downside is that LLVM papers over a few of the nuts and bolts of code generation (register allocation, register save/restore in procedure call and return, ...), and it is good to understand those low-level details. But in a 10-week academic term, we would certainly not manage any sophisticated handling of those details anyway ... we would just be generating dumb boilerplate code from templates. With LLVM we can generate dumb intermediate code and then watch LLVM turn it into either dumb or somewhat smarter assembly code, depending on what analyses and optimizations we tell it to perform. In principle this is no different from what we'll do with JFlex and CUP ... you'll learn the algorithms you would need to build your own tool like JFlex or CUP, but rather than actually build the generators, we'll use them (and your knowledge of how they work will help you debug your grammars).

A couple of you have said you would like to build the compiler project using ML (OCaml?) rather than Java. JFlex and CUP are Java tools, and you'll be on your own to find equivalent ML tools. LLVM has an API with an OCaml binding, so you may actually have an advantage there, because as far as I know there is no Java binding. With Java, we will simply be printing LLVM intermediate assembly code into a text file, which we will then run the LLVM assembler on. I'll probably ask the MLaniacs to also produce a textual IR file, so I have something I can look at for grading.
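To make the "print IR into a text file" idea concrete, here is a minimal sketch of what the Java side might look like. The class name IrPrinter, the method emitAddMain, and the output file name add.ll are all made up for illustration; the real project will walk a syntax tree rather than hard-code one function.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical example class: not part of any course-provided code.
class IrPrinter {
    // Builds the text of a tiny LLVM IR module whose main() returns a + b.
    static String emitAddMain(int a, int b) {
        StringBuilder ir = new StringBuilder();
        ir.append("define i32 @main() {\n");
        ir.append("entry:\n");
        ir.append("  %sum = add i32 ").append(a).append(", ").append(b).append("\n");
        ir.append("  ret i32 %sum\n");
        ir.append("}\n");
        return ir.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write the module to a text file (file name is an assumption).
        String ir = emitAddMain(40, 2);
        Files.write(Paths.get("add.ll"), ir.getBytes(StandardCharsets.UTF_8));
    }
}
```

Assuming the LLVM tools are installed, the resulting file can be assembled with llvm-as (producing add.bc) and run with lli; the value returned by main should show up as the process exit status.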

Know what's missing? The whole middle piece, static semantic analysis, including type checking. For that we will probably just be grinding out code by hand. Maybe someday we'll have good tools for that, too, but at present I don't know of any. The only way to keep that effort reasonable is to keep the language we compile as simple and regular as possible. I'm considering a couple of candidates, and have not made a final choice yet, but it's certain to be a small object-oriented language that is something like a tiny subset of Java (though perhaps with a couple of non-Java-ish features).
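For a rough feel of what "grinding out code by hand" means, here is a sketch of a hand-written type checker for a toy expression language. Everything in it is an assumption for illustration (the two types, the node classes, the error handling); the language we actually compile has not been chosen, and its checker will be larger.

```java
import java.util.Map;

// Hypothetical toy checker: integers, booleans, variables, + and < only.
class TinyTypeChecker {
    enum Type { INT, BOOL }

    interface Expr {}

    static final class IntLit implements Expr {
        final int value;
        IntLit(int value) { this.value = value; }
    }

    static final class BoolLit implements Expr {
        final boolean value;
        BoolLit(boolean value) { this.value = value; }
    }

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
    }

    static final class Add implements Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    static final class Less implements Expr {
        final Expr left, right;
        Less(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // Returns the type of e, or throws IllegalStateException on a type error.
    // env maps declared variable names to their types.
    static Type check(Expr e, Map<String, Type> env) {
        if (e instanceof IntLit) return Type.INT;
        if (e instanceof BoolLit) return Type.BOOL;
        if (e instanceof Var) {
            String name = ((Var) e).name;
            Type t = env.get(name);
            if (t == null) throw new IllegalStateException("undeclared variable: " + name);
            return t;
        }
        if (e instanceof Add) {
            Add add = (Add) e;
            requireInt(check(add.left, env), "+");
            requireInt(check(add.right, env), "+");
            return Type.INT;
        }
        if (e instanceof Less) {
            Less less = (Less) e;
            requireInt(check(less.left, env), "<");
            requireInt(check(less.right, env), "<");
            return Type.BOOL;
        }
        throw new IllegalArgumentException("unknown expression node");
    }

    private static void requireInt(Type actual, String op) {
        if (actual != Type.INT) {
            throw new IllegalStateException("operand of " + op + " must be INT, found " + actual);
        }
    }
}
```

Each new language construct adds another case like these, which is why keeping the source language small and regular matters so much.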

1 comment:

  1. Can we choose a language other than Java or ML to build the compiler?
