Thursday, February 25, 2010
Reminder - LALR(1) quiz tomorrow (Friday)
Remember, tomorrow (26 Feb) I will give a quiz (short midterm) so you can show that you've mastered the LR(0) and LALR(1) parsing material that gave you trouble on the first midterm.
Object code lecture notes
Here are my lecture slides on LLVM IR code generation (up to data flow analysis, which I posted yesterday). Some of these have been posted before, but I think the first of those below, on walking the AST to generate LLVM IR, has not.
- From AST to IR (expressions, control flow, short-circuit conditions)
- Addressing in LLVM, Where things live and More on LLVM addressing of objects
Wednesday, February 24, 2010
Data flow analysis intro slides
Here are the slides I used today, with a couple of corrections to the mistakes you caught. (There might be more mistakes ... please let me know if you spot one.)
Clang and C++
I experimented a bit with how Clang compiles C++, but I'm not sure it provides much guidance. With the classes:
class Rect { ... };
class Square : public Rect { ... };
I end up with the llvm type declarations:
%class.Rect = type { i32, i32, i32, i32 }
%class.Square = type { [16 x i8] }
It does use bitcast to call inherited methods:
%0 = bitcast %class.Square* %s to %class.Rect* ; <%class.Rect*> [#uses=1]
call void @_ZN4Rect9setBoundsEiiii(%class.Rect* %0, i32 0, i32 0, i32 10, i32 10)
%call = call i32 @_ZN6Square4areaEv(%class.Square* %s) ; [#uses=1]
The strange names like "_ZN4Rect9setBoundsEiiii" are from "name mangling" in C++, which is how C++ provides overloading while maintaining compatibility with C which does not. The source code name is "setBounds", and the random junk is added to make it unique.
My C++ mojo has not been sufficient so far to really figure out how the dynamic dispatch is working, but it at least appears that the Clang front end (which is not as complete as the gcc front end) is using casts instead of using nested structures. That doesn't mean nested structures is a bad idea ... I think either way should work.
class Rect { ... };
class Square : public Rect { ... };
I end up with the llvm type declarations:
%class.Rect = type { i32, i32, i32, i32 }
%class.Square = type { [16 x i8] }
It does use bitcast to call inherited methods:
%0 = bitcast %class.Square* %s to %class.Rect* ; <%class.Rect*> [#uses=1]
call void @_ZN4Rect9setBoundsEiiii(%class.Rect* %0, i32 0, i32 0, i32 10, i32 10)
%call = call i32 @_ZN6Square4areaEv(%class.Square* %s) ;
The strange names like "_ZN4Rect9setBoundsEiiii" are from "name mangling" in C++, which is how C++ provides overloading while maintaining compatibility with C which does not. The source code name is "setBounds", and the random junk is added to make it unique.
My C++ mojo has not been sufficient so far to really figure out how the dynamic dispatch is working, but it at least appears that the Clang front end (which is not as complete as the gcc front end) is using casts instead of using nested structures. That doesn't mean nested structures is a bad idea ... I think either way should work.
Thursday, February 18, 2010
Lecture notes for Friday (more fun with addresses)
Tomorrow's slides (Friday 19 Feb) are here
It's pretty much a continuation of Wednesday's look at how we address things in LLVM, starting with a repeat of the last slide of LLVM code from Wednesday. Normally I try not to stuff quite so much onto each slide, but in this case I really wanted stuff side-by-side, so we can compare between C and LLVM code. There is also a slide that shows how the LLVM code becomes assembly code (with x86 assembly; of course it will look a little different if you produce assembly code for a different machine architecture).
It's pretty much a continuation of Wednesday's look at how we address things in LLVM, starting with a repeat of the last slide of LLVM code from Wednesday. Normally I try not to stuff quite so much onto each slide, but in this case I really wanted stuff side-by-side, so we can compare between C and LLVM code. There is also a slide that shows how the LLVM code becomes assembly code (with x86 assembly; of course it will look a little different if you produce assembly code for a different machine architecture).
Wednesday, February 17, 2010
LLVM web interface
Try the online LLVM demo to compile bits of C code into LLVM intermediate representation. Also you can meet Bitter Melon, one of the smartest cats I have been introduced to. Most cats are not very good programmers, because they are too easily distracted.
Thanks Kristy for the tip.
Thanks Kristy for the tip.
LLVM addressing slides
My slides for today's lecture (Wednesday 17 Feb) are full of example code ... it might be useful to have a printed copy, or an online copy to refer to. A PDF version is here.
Tuesday, February 16, 2010
llvm travails
I am now zero for three on compiles of llvm + Clang on ix. It seems that the nearness to a new release is indeed causing difficulties ... every time I do "svn update" I get several newly updated files. The first compile took 3+ hours; subsequent compiles have been a little faster. I'll give it a couple more tries before I admit defeat and either pull an older version from SVN or try put together a binary-only set.
Wednesday, February 10, 2010
Grammar booboo
Suzanne asked about a weird production in the starter grammar I provided ... which seems to allow an expression on the left-hand-side of an assignment. Oops.
In many programming languages, there are "lexprs" (left-hand expressions) and "rexprs" (right-hand expressions), and there can be lexprs inside rexprs and vice versa. For example, you could have something like this in Java:
myArray[ j + 5 ].itsField = (foo.bar()).baz();
where j + 5 is an rexpr (denoting a value), but myArray[j + 5] is a lexpr (denoting a location from which we can get a value or into which we can store a value).
But not in Cool. All fields are private. There are no arrays. I don't think you can have anything more complicated than a simple identifier on the left side of an assignment.
My bad. You could just check for this in the type-checker, but a better solution is to fix the grammar so that the left side of an assignment is an identifier and not an expression. Good catch Suzanne.
In many programming languages, there are "lexprs" (left-hand expressions) and "rexprs" (right-hand expressions), and there can be lexprs inside rexprs and vice versa. For example, you could have something like this in Java:
myArray[ j + 5 ].itsField = (foo.bar()).baz();
where j + 5 is an rexpr (denoting a value), but myArray[j + 5] is a lexpr (denoting a location from which we can get a value or into which we can store a value).
But not in Cool. All fields are private. There are no arrays. I don't think you can have anything more complicated than a simple identifier on the left side of an assignment.
My bad. You could just check for this in the type-checker, but a better solution is to fix the grammar so that the left side of an assignment is an identifier and not an expression. Good catch Suzanne.
A small request: README files
I get many assignments turned in as archives that expand to "Cool" or "Compilers". I look in the README.txt file to remind me of which assignment was turned in by whom ... and usually what I see is the original README.txt from the scanner assignment. Please replace the contents of the README file with some identifying information (e.g., "This is the type checker assignment, turned in by I.M. Smart and U.B.A. Mazed"). Thanks.
(I can deal with what I've got so far ... but please start doing this for the type checker and code generator assignments.)
(I can deal with what I've got so far ... but please start doing this for the type checker and code generator assignments.)
Tuesday, February 9, 2010
List of static semantic checks
Thanks Alex for sifting through the Cool manual and making a list of all the static semantic checks that the "type checker" is supposed to check. (The scare quotes are because, strictly speaking, a lot of these are not type rules, so really this is a static semantic analyzer which includes type checking and other checks).
1. [No undefined classes, pg. 7] If class C inherits B, then B must be
defined somewhere. This goes for any type specified by the programmer (ex:
let x:Foo := new Foo in 5, where Foo is never defined).
2. [No cycles in the class hierarchy, pg. 7] Class C must not inherit C,
nor should it inherit any other class that eventually inherits C.
3. [No class redefinitions, pg. 5] No two class definitions should use the
same name, i.e. "class C { ... }; ... class C { ... };" is invalid.
4. [No stdlib redefinitions, pg. 18] No class definition may use the names
Object, IO, Int, String, or Bool.
5. [Limited stdlib inheritance, pg. 18] No classes may inherit from Int,
Bool, or String. Classes may inherit from Object and IO.
6. [No attribute/method redefinitions, pg. 5] In the same class, no two
attributes/methods can have the same identifier. A method and an attribute
may share the same name, but not two methods or two attributes.
7. [No inherited attribute redefinitions, pg. 7] If class C inherits B,
and B defines an attribute "foo:Int", C must not contain an attribute
called foo.
8. [Limited inherited method redefinitions, pg. 10] If class C inherits B,
and B defins a method "foo(a:Int, b:Int):Int {}", then C can only
change the, it must not redefine the type signature of foo.
9. [No "self" redefinitions, pg. 12] It is an error to assign a value to
self, bind self in a let or case expression, or use self as the name of a
parameter to a method.
10. [Need a Main class / main method, pg. 19] Every program must have a
class called Main. In Main, there must be a method called main with no
parameters.
11. [All attributes/methods must typecheck, pg. 9] All methods and
attributes must pass the typechecker.
1. [No undefined classes, pg. 7] If class C inherits B, then B must be
defined somewhere. This goes for any type specified by the programmer (ex:
let x:Foo := new Foo in 5, where Foo is never defined).
2. [No cycles in the class hierarchy, pg. 7] Class C must not inherit C,
nor should it inherit any other class that eventually inherits C.
3. [No class redefinitions, pg. 5] No two class definitions should use the
same name, i.e. "class C { ... }; ... class C { ... };" is invalid.
4. [No stdlib redefinitions, pg. 18] No class definition may use the names
Object, IO, Int, String, or Bool.
5. [Limited stdlib inheritance, pg. 18] No classes may inherit from Int,
Bool, or String. Classes may inherit from Object and IO.
6. [No attribute/method redefinitions, pg. 5] In the same class, no two
attributes/methods can have the same identifier. A method and an attribute
may share the same name, but not two methods or two attributes.
7. [No inherited attribute redefinitions, pg. 7] If class C inherits B,
and B defines an attribute "foo:Int", C must not contain an attribute
called foo.
8. [Limited inherited method redefinitions, pg. 10] If class C inherits B,
and B defins a method "foo(a:Int, b:Int):Int {
change the
9. [No "self" redefinitions, pg. 12] It is an error to assign a value to
self, bind self in a let or case expression, or use self as the name of a
parameter to a method.
10. [Need a Main class / main method, pg. 19] Every program must have a
class called Main. In Main, there must be a method called main with no
parameters.
11. [All attributes/methods must typecheck, pg. 9] All methods and
attributes must pass the typechecker.
Thursday, February 4, 2010
LALR(1) Howto draft
You asked for a brief, step-by-step guide for creating LALR(1) context-free state machines (CFSMs). I have created a draft: Constructing the LALR(1) CFSM
I was shooting for 2 pages or maybe 3, but ended up with 7 ... and that's with no pictures and no worked examples. I am willing to work on this some more to make it better, but I could use some feedback on the current version. Is this clearer than the books you are using? Less clear? Is it too verbose, or too concise? Too formal or too hand-wavy? Can you see how the inductive rules for First and Nullable are the underlying logic for the pseudocode in the books?
There are small discrepancies in the construction from book to book. I think the Dragon Book mixes together the Nullable and First set algorithms, placing an epsilon in the first set to indicate that a terminal symbol is nullable, while Crafting a Compiler treats these as two different analyses (as I do in this guide). There may also be bugs in my description. If you find something confusing, or think you have spotted an error in my instructions, please let me know.
I was shooting for 2 pages or maybe 3, but ended up with 7 ... and that's with no pictures and no worked examples. I am willing to work on this some more to make it better, but I could use some feedback on the current version. Is this clearer than the books you are using? Less clear? Is it too verbose, or too concise? Too formal or too hand-wavy? Can you see how the inductive rules for First and Nullable are the underlying logic for the pseudocode in the books?
There are small discrepancies in the construction from book to book. I think the Dragon Book mixes together the Nullable and First set algorithms, placing an epsilon in the first set to indicate that a terminal symbol is nullable, while Crafting a Compiler treats these as two different analyses (as I do in this guide). There may also be bugs in my description. If you find something confusing, or think you have spotted an error in my instructions, please let me know.
Wednesday, February 3, 2010
AST assignment due Monday
As we decided in class, the AST-building assignment will be due Monday.
The requirement is very simple: Parse Cool programs, and print the tree in any way you like. Dot (GraphViz) is one very nice way to look at trees, but if you use it I suggest you find a way to look at just parts of the overall AST, because graphic depictions of large trees get unreadable fast. While a nice indented listing is also good, I wouldn't invest a huge amount of time building a tree pretty-printer unless you think you will have other uses for it later.
The requirement is very simple: Parse Cool programs, and print the tree in any way you like. Dot (GraphViz) is one very nice way to look at trees, but if you use it I suggest you find a way to look at just parts of the overall AST, because graphic depictions of large trees get unreadable fast. While a nice indented listing is also good, I wouldn't invest a huge amount of time building a tree pretty-printer unless you think you will have other uses for it later.
Monday, February 1, 2010
AST example: Itsy bitsy teeny weeny compiler
Here is a very very simple example of building (and then using) an AST.
tiny.cup parses a very simple language of assignment statements with expressions as right-hand sides. It builds a simple AST,which is passed back to the driver.
The driver (Tiny.java) obtains the AST from the parser, and hands it off to an evaluator. The evaluator emits simple stack code (in the form of pseudocode ... but you can see how you could modify this to be either real code in a language like C or Java or instructions to a stack-machine interpreter).
tiny.cup parses a very simple language of assignment statements with expressions as right-hand sides. It builds a simple AST,which is passed back to the driver.
The driver (Tiny.java) obtains the AST from the parser, and hands it off to an evaluator. The evaluator emits simple stack code (in the form of pseudocode ... but you can see how you could modify this to be either real code in a language like C or Java or instructions to a stack-machine interpreter).
Subscribe to:
Posts (Atom)