Introduction
Syntax specifies structure. Semantics is "meaning".
How can meaning be specified?
-
Language reference manual
-
Defining translator
-
Formal definition - category theory, etc.
Binding Names
There are many names in a typical program.
There are names for variables, types, procedures and functions, classes,
constants, libraries, monitors, etc.
Names are for human readers of programs: good names help understanding,
bad names hinder it.
Can you guess what the following function does?
int factorial(int x, int sum) {
return x + sum * previous;
}
How about if we use different names?
int next_height(int height, int velocity) {
return height + velocity * DT;
}
Some languages impose lexical restrictions on names.
-
C - A name must start with a letter or an underscore
-
Fortran 77 - Names must be six characters or fewer
-
C89 - Only the first 31 characters of an internal name are guaranteed to be significant
-
C - A name cannot have an embedded space. The following
is not a name.
foo bar
In Fortran the space would be removed yielding just foobar.
In C it used to be common to add an underscore,
foo_bar
but now it is more conventional to use capitalization.
fooBar
-
C++ - Case matters, so fooBar and FooBar would be different names.
-
C - Alphanumeric characters (and underscore) only in a
name, e.g., foo,Bar is not one name. The reason for this restriction
is that, in languages that do not require whitespace between tokens, the
lexical analyzer would otherwise have a difficult time determining where a name ends.
-
C - A name cannot be a reserved word, so we couldn't have a
variable named 'if'. A keyword, in contrast, has a special meaning
only in certain contexts, and can be used as a name in a different
context. For example, in Fortran we could declare the following variable,
named Integer, to be of type Real.
Real Integer
Attributes
A name has (potentially) several attributes, depending on what
kind of thing it names: a function, a type, or a variable.
Variable
-
Address - Location in memory of the value of the variable. Sometimes
referred to as the l-value (the value used in the left-hand side
of an assignment).
-
Type - The type specifies the interpretation of bits at the storage
location and a set of possible operations permitted for that type.
Types are described in more detail below.
-
Value - The sequence of bits at the location, also called the r-value.
-
Lifetime - How long does this var live?
-
Scope - Where is it visible?
Example
int x;
x = 5;
Function
-
Return type
-
Type and number of parameters
-
Code pointer
let f x = x;;
Type/Class
typedef struct point {int x; int y;} point_type;
Binding
Binding - associates a name and a property (e.g., the Type).
Static binding
-
binding occurs before run-time
-
often remains unchanged during execution
The Type binding in C is an example of a static binding.
Dynamic binding
-
first happens at run-time
-
can change during execution
The following PostScript code dynamically binds a name.
/x exch def
Refinement of binding times
-
language definition time
-
language implementation time
-
translation-time (compile-time)
-
link-time
-
load-time
-
run-time
For example, consider the bindings in the following fragment of a
C program.
int fooBar;
fooBar = 1;
The Type, int,
is (statically) bound to the name, fooBar, at compile-time.
The compiler generates intermediate code to interpret fooBar's value
as an integer (stored in two's complement, with a sign bit).
The compiler also checks that only operations on integers are applied to
fooBar.
The Value of fooBar is bound (dynamically) at run-time.
The Address of fooBar might be bound at run-time or
load-time. If we take the earliest possible time, load-time, then the
address binding is static.
Declarations/Definitions
A declaration is elaborated to produce a
binding.
Pascal example
program main;
(* global variables declared *)
var x: integer;
  procedure p;
  (* local vars to p *)
  var x: boolean;
    procedure foo;
    (* vars local to foo *)
    var x: integer;
    begin
      (* no more local defs allowed *)
      x := 2;
    end;
  begin
    x := true;
  end;
begin
  x := 2;
end.
The binding could be explicit or implicit.
-
Explicit - The following C example explicitly binds the type.
int fooBar;
Implicit - In the above example, value is implicit (zero or undefined).
-
Implicit declaration - Fortran implicitly declares variables whose names start
with I through N to be integers rather than reals
x = 3 ! Implicit declaration of x as a real
i = x + 3 ! Implicit declaration of i as an integer
Perl has a different convention for implicit declarations.
$x = 3; # Declare a scalar variable, x
@x = (1, 3); # Declare a list variable, x
%x = (1 => 3); # Declare an associative array variable, x
Definition - Declaration that binds all possible attributes
C example
int foo(int x); /* function declaration (forward), code attribute not bound */
...
int foo(int y) { return y; } /* function definition */
Other terms are used, e.g., prototype for function declaration
In dynamic type binding, the type binding can change during
run-time. Another way to think of this is that the values have types,
rather than the variables.
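As a minimal sketch of dynamic type binding, the following Python fragment rebinds one name to values of different types. Python is used only for illustration, and the variable names are arbitrary. The type is found by inspecting the value currently bound to the name, not the name itself.

```python
# The name x carries no type of its own; each value does.
x = 3
first_type = type(x).__name__    # type of the value currently bound to x

x = "three"                      # rebinding x changes the type seen through x
second_type = type(x).__name__

x = [3]
third_type = type(x).__name__

print(first_type, second_type, third_type)
```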
Name Spaces and Scopes
The scope is the region of the program in which a binding is
maintained
- a name can be visible or hidden; a name is visible if it can be referenced
(inner name spaces can hide names in outer name spaces)
- non-local names - the name is visible in the name space, but not
declared in the name space
Name spaces
- flat name space - in early computer languages such as COBOL,
there is only one name space in a program so all names are global
- modules, blocks, functions, or procedures create a
"hierarchical" name space in which name spaces can be nested.
- scope rules govern to which name space a name belongs
- the lifetime and scope are not always the same (consider a
local static variable in C).
- C has a block-structured namespace
+---------------------+----global namespace
|int x; |
| |
|main (...) { |
| +----------------+------main namespace
| |int x; | |
| | { | |
| | +-----------+--------block inside main namespace
| | |int x; | | |
| | | | | |
| | +-----------+ | |
| | } | |
| | ... | |
| +----------------+ |
| } |
| |
|foo (...) { |
| +----------------+------foo namespace
| |int x; | |
| | ... | |
| +----------------+ |
| } |
+---------------------+
-
environment - The bindings at a given point in the code.
The environment in the body of foo consists of
x (in foo)
main
bar
this (built-in, but a keyword)
foo
all other public classes, methods, instance vars
The x in the bar class namespace is hidden
by the x in foo's namespace.
How can a "hidden" variable be referenced?
Ada example
B1: declare
  a: integer;
begin ...
  B2: declare
    a: integer; -- local a now hides a in B1
  begin
    if a > B1.a then ... -- use scope.name to resolve
  end B2;
end B1;
C++ example (uses ::)
Generally a name must be declared before it is used.
Declaration order plays an important role in determining the
referencing environment. Options include
-
scope is from where declared to end of "block"
/* C89 example */
int x;
void foo() {
    int y = x; /* this x is a global x */
    int x;
}
-
relax declare before use, scope is entire block
/* C# example */
class A {
    const int N = 10;
    void foo() {
        const int M = N; /* refers to N defined below */
        const int N = 20;
    }
}
-
provide constructs to control
; Scheme example
(define x 3)
(let ((x 4) (y x)) (+ y 3)) ; returns 6, y is bound to the outer x (3)
(let* ((x 4) (y x)) (+ y 3)) ; returns 7, y is bound to the new x (4)
-
make all variables local by default, scope is entire block
# Python example
n = 1
def foo():
x = n
n = 3
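The Python rule can be checked directly. In the sketch below (the helper name `try_foo` is hypothetical), the assignment to `n` anywhere in `foo` makes `n` local for the whole body, so reading it before the assignment fails when `foo` is called rather than falling back to the global `n`.

```python
n = 1

def foo():
    x = n      # n is local to foo (because of the assignment below) but not yet bound
    n = 3
    return x

def try_foo():
    """Call foo and report the name of the exception raised, if any."""
    try:
        foo()
    except UnboundLocalError as err:
        return type(err).__name__
    return None

result = try_foo()
print(result)
```

The global `n` is untouched by the failed call.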
Static vs. dynamic scoping
In static scoping, the scope can be determined at compile-time.
C has static scoping. Does Scheme have static scoping?
; The following program determines if Scheme has static scoping
; for parameters
(define (foo) (define y 1)) ; when foo is called, it will bind y
(define (bar y) (begin ; does y have static scope?
(foo)
y ; bar will return the value of y
))
; if local names are statically scoped, then 3 will be displayed,
; otherwise y is dynamically scoped so 1 will be displayed
(display (bar 3))
Scheme has static scoping for function parameters, but dynamic
scoping in the global name space when interpreted.
In dynamic scoping, the scope is determined at run-time.
PostScript has dynamic scoping. Let's do in PostScript
the same thing that we did above in Scheme.
/foo {/y 1 def} def
/bar {
1 dict begin
/y exch def
foo
y
end
} def
3 bar
If 3 is left on the stack, then PostScript is statically scoped, but
if 1 is left on the stack, then PostScript is dynamically scoped
(the binding of /y in foo rebinds the
/y in bar).
PostScript has dynamic scoping, so 1 will be left on the stack.
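The PostScript behavior can be imitated with an explicit dictionary stack. The Python sketch below is only a model, not how any PostScript interpreter is actually written, and the names `dict_stack`, `define`, and `lookup` are hypothetical.

```python
# A toy model of PostScript's dictionary stack (an illustrative assumption).
dict_stack = [{}]            # bottom entry plays the role of the global dictionary

def define(name, value):
    """Like `def`: bind name in the dictionary on top of the stack."""
    dict_stack[-1][name] = value

def lookup(name):
    """Search from the top of the dictionary stack downward."""
    for d in reversed(dict_stack):
        if name in d:
            return d[name]
    raise NameError(name)

def foo():
    define("y", 1)           # /y 1 def

def bar(arg):
    dict_stack.append({})    # 1 dict begin
    define("y", arg)         # /y exch def
    foo()                    # foo's def lands in bar's dictionary
    result = lookup("y")     # y
    dict_stack.pop()         # end
    return result

print(bar(3))
```

Because `foo` writes into whichever dictionary is on top of the stack when it runs, the binding it affects depends on the caller, which is the essence of dynamic scoping.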
Symbol Table
Bindings are maintained in a symbol table.
A symbol table is a table, internal to a translator/compiler, that records the
information about each identifier.
It can be modeled as a function
names --> static attributes
The binding of names to locations is sometimes referred to as the environment.
names --> locations
The binding of storage to values is known as the state or memory.
locations --> values
In an interpreter, the environment maps names to (all) attributes.
names --> attributes
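The two mappings above compose: the value of a name is found by going from name to location and then from location to value. A minimal Python sketch follows. The dictionaries, the integer locations, and the helper names `r_value` and `assign` are all hypothetical.

```python
# environment: names --> locations; state: locations --> values
environment = {"x": 0, "y": 1}      # hypothetical locations 0 and 1
state = {0: 5, 1: 7}

def r_value(name):
    """Value of a name: look up its location, then the contents of that location."""
    return state[environment[name]]

def assign(name, value):
    """Assignment changes the state, not the environment."""
    state[environment[name]] = value

assign("x", 10)
environment["alias"] = environment["x"]   # two names bound to one location
print(r_value("x"), r_value("alias"), r_value("y"))
```

Keeping the two maps separate makes aliasing easy to express: two names can share a location, so an assignment through one is seen through the other.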
In compiled languages, the symbol table is only present during compilation (i.e.,
all names are resolved to addresses when the code is produced). How
is the table constructed and maintained by the compiler? Consider a
block-structured program.
int x;
char y;
void q(char x) {
    int y;
    ...
}
void p() {
    double x;
    ...
    { int y[10];
      ::x = 3;      /* the global x */
      p::x = 1.2;   /* the x local to p (pseudo-syntax, not valid C++) */
      ...
    }
}
int main() {
    ...
}
Single-pass compiler
- on-entry to a block (that has a declaration) - bind new symbol table
to function/procedure name
- declaration - add binding to "local" symbol table
- name lookup - resolve name by looking in proper symbol table (one
method, prefix name with scope, i.e., fully resolve name)
Two-pass compiler
- Build local symbol table for each scope/block in first pass
- declaration - do nothing
- name lookup - resolve name by looking in proper symbol table (which
should be fully populated)
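The single-pass scheme can be sketched as a stack of per-scope tables. The Python below is an illustration under assumptions, not a description of any particular compiler: the class `SymbolTable`, its method names, and the scope label `p.block` are hypothetical. The declarations mirror the block-structured program above.

```python
class SymbolTable:
    """A stack of per-scope tables; the innermost scope is last."""

    def __init__(self):
        self.scopes = [("global", {})]

    def enter_scope(self, name):
        """On entry to a block: push a new, empty local table."""
        self.scopes.append((name, {}))

    def exit_scope(self):
        """On exit from a block: discard its local table."""
        self.scopes.pop()

    def declare(self, name, type_name):
        """Add a binding to the local (innermost) table."""
        self.scopes[-1][1][name] = type_name

    def lookup(self, name):
        """Return (scope name, type) for the innermost visible declaration."""
        for scope_name, table in reversed(self.scopes):
            if name in table:
                return scope_name, table[name]
        raise KeyError(name)

table = SymbolTable()
table.declare("x", "int")
table.declare("y", "char")
table.enter_scope("p")
table.declare("x", "double")
table.enter_scope("p.block")
table.declare("y", "int[10]")
inner_x = table.lookup("x")      # resolves to p's x, hiding the global x
inner_y = table.lookup("y")      # resolves to the block's y
table.exit_scope()
table.exit_scope()
outer_x = table.lookup("x")      # back in the global scope
print(inner_x, inner_y, outer_x)
```

Returning the scope name along with the type corresponds to the "prefix name with scope" method of fully resolving a name.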
Overloading
Names and operators can be overloaded with several meanings.
C++ example
int max(int x, int y) { return x > y ? x : y; }
double max(double x, double y) { return x > y ? x : y; }
int max(int x, int y, int z) { ... }
Use types to determine which to call
y = max(1.0, 2.0); // call second max
x = max(1, 2); // call first max
x = max(1, 2.0); // which to call, depends on conversion rules...
C++ and Java allow overloading based on the types and number
of parameters. Ada also allows overloading on the return type, so the
Ada equivalent of the following would be legal.
int max(int x, int y) { ... }
double max(int x, int y) { ... }
Why is this not allowed in C++?
Some languages limit the overloading of names for different kinds of
things, e.g., if x is a variable, can it be a type? It can become
confusing if you allow the same name to be used.
A Java example
class X {
...
X X(X X) { X Y; ... return X; }
}
Storage Binding
-
Allocation - process of binding storage to a name
-
Deallocation - process of breaking the binding between storage and a name
-
Lifetime - allocation to deallocation time, bind to unbind time
Storage Model
There is a storage model that supports all of the storage bindings.
- visual depiction of run-time storage
- Calls to malloc (C) or new (Java and C++) allocate space in the heap.
- Static storage is for statically allocated variables
- The stack is for local variables and parameters.
- function call - push stack frame
- function exit - pop stack frame
- visual depiction of stack calls
Kinds of Storage Bindings
-
static
-
lifetime is entire run-time
-
allocated, bound when program starts
-
deallocated, unbound when program finishes
-
stack-dynamic
-
Elaboration or evaluation of declaration produces a binding
-
Allocation - when block is "entered"
-
Deallocation - when block is "exited"
-
lifetime - During block
-
local variables and parameters are stack-dynamic variables
-
explicit heap-dynamic
-
implicit heap-dynamic
-
In some language implementations, the run-time system manages the heap
and implicitly allocates heap memory when needed.
For example in Scheme
(list 1 2 3 4)
the list '(1 2 3 4) is allocated implicitly in the heap,
and deallocated implicitly as well!
Garbage collection - Garbage is the set of allocated memory cells that the
program can no longer reach. Garbage collection is the process of reclaiming
this memory and making it available for allocation.
-
Free-list - A list of unallocated memory locations. Memory is allocated
by searching this list using either a "first-fit" strategy (i.e.,
use the first available block that is large enough) or a "best-fit" strategy
(i.e., find the block that most closely fits the requested size).
-
Reference counters - A garbage collection strategy that keeps track of
how many pointers point to a particular block and puts memory back on
the free-list when the count reaches zero.
-
each memory cell is split into a counter and a data value. The
counter records the number of pointers that point to the cell.
Initially the count is zero. When a pointer to the cell is
allocated the count is increased, when a pointer to the
cell is deallocated the count is decreased.
/* Reference count for allocated block is 0 */
int *p = malloc(4);
/* The assignment increases the reference count to 1 */
void foo(int *x) {
}
/* When the function is called, the reference count increases to 2
since now both x and p point to the allocated memory */
foo(p); /* call foo */
/* When foo exits, x is deallocated and the count decreases
to 1 */
p = malloc(4);
/* Reference count of the original block is now 0, and 1 for the newly allocated block */
-
eager strategy - it happens right away
-
incremental - reference counts are adjusted every time a pointer
is allocated/deallocated
-
has additional space and time cost
-
Mark and sweep
-
This strategy first finds each pointer in
stack, static, and heap memory. For each pointer that it finds, it
marks the heap memory that it points to. Finally, it
sweeps through the heap and re-creates the free-list
from the unmarked memory.
-
lazy strategy - only called when needed
-
usually invoked when out of memory, but most programs don't
run out of memory
-
one-time, high cost - it is expensive to mark and sweep
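The mark and sweep phases can be sketched on a toy heap. In the Python below, the heap layout (a dictionary from block id to the ids it points to), the `roots` list, and the function names are all assumptions made for illustration. A real collector works on raw memory rather than dictionaries.

```python
# Toy heap: block id -> list of block ids it points to (hypothetical layout).
heap = {
    "a": ["b"],
    "b": [],
    "c": ["d"],   # c and d point at each other but nothing reaches them
    "d": ["c"],
    "e": [],
}
roots = ["a"]     # pointers found in stack and static memory

def mark(roots, heap):
    """Mark phase: follow pointers from the roots and collect reachable blocks."""
    marked = set()
    pending = list(roots)
    while pending:
        block = pending.pop()
        if block not in marked:
            marked.add(block)
            pending.extend(heap[block])
    return marked

def sweep(heap, marked):
    """Sweep phase: unmarked blocks leave the heap and form the new free-list."""
    free_list = sorted(b for b in heap if b not in marked)
    for block in free_list:
        del heap[block]
    return free_list

marked = mark(roots, heap)
free_list = sweep(heap, marked)
print(sorted(marked), free_list)
```

Blocks `c` and `d` refer to each other, so their reference counts would never reach zero. Mark and sweep reclaims them because neither is reachable from a root.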
Overloading and Polymorphism
Operator overloading refers to an operator's meaning changing
depending upon the types of its operands.
C++ example
if (a < b) ...
The types of the "a" and "b" variables determine which
kind of comparison is called, e.g., int, double, string, etc.
The compiler figures out which kind to call. Often
this is combined with type coercion.
Languages that support the definition of abstract data types should
also allow operator overloading so that new types can be used
with existing operators.
Parametric polymorphism explicitly matches a function with
the same name and number of arguments to the type.
boolean lt(int x, int y) ...
boolean lt(int x, double y) ...
boolean lt(double x, double y) ...
Depending on the types of the arguments, the appropriate
routine is called.
In the above, the polymorphism is explicit.
Implicit parametric polymorphism also occurs in languages.
let foo g f y = (g y) + 2; f y;;
Inferring the types yields
"(alpha -> int) -> (alpha -> beta) -> alpha -> beta"
as the type.
This type of polymorphism is referred to as parametric polymorphism,
which is different from overloading polymorphism and
subclass polymorphism. A language that does not have
(parametric) polymorphism is said to be monomorphic.
Parametric polymorphism can be implemented by
-
Expansion - analyze calls (convert to overloading polymorphism in essence)
-
Boxing and tagging
Generics are a way to reuse code in explicit polymorphism.
/* C++ template */
/* A node in the stack */
template <typename T>
struct StackNode {
    T data;
    StackNode<T> * next;
};
/* The stack is just a pointer to the first node */
template <typename T>
struct Stack {
    StackNode<T> * theStack;
};
...
Stack<int> S;
...
Template is instantiated to produce code tied to a particular type.
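For contrast, here is a sketch of the same idea in Python, using the standard `typing` module. The class `Stack` and its method names are hypothetical. Unlike a C++ template, no per-type code is produced: one class body serves every element type, and the type parameter is checked only by external tools, not at run-time.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")

class Stack(Generic[T]):
    """One definition reused for every element type T."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

    def is_empty(self) -> bool:
        return not self._items

ints: Stack[int] = Stack()
ints.push(1)
ints.push(2)
names: Stack[str] = Stack()
names.push("fooBar")
top_int = ints.pop()
top_name = names.pop()
print(top_int, top_name)
```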
Bindings and Closure
Consider the following function.
(define (foo x) (+ x y))
Should the referencing environment when the function is defined include
"y" or not? And to which "y", the most recently defined "y" or some
other "y"?
In shallow binding (often combined with dynamic scoping so
sometimes referred to as late binding) the binding of "y" occurs
when the function is called and "y" resolves to whatever "y" is
currently bound to.
In contrast, deep binding binds "y" in the defining environment.
Example.
(define y 4)
(define (foo x) (+ x y))
(define y 2)
(foo 3)
If the call to "(foo 3)" returns 7 then deep binding is used (also
important for static scoping!).
If the call to "(foo 3)" returns 5 then shallow binding is used (also
looks like dynamic scoping).
The deep binding of the function is often called its closure.
The idea is to package up the function and all the referred to
names in the referencing environment at the time the function was
defined.
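A closure can be observed directly in a language that exposes it. In the Python sketch below, the factory `make_foo` is a hypothetical helper introduced so that "y" has a distinct defining environment. The inner function keeps its own "y" even though a different "y" is defined later at the call site.

```python
def make_foo(y):
    """Return a function whose free name y is bound in the defining environment."""
    def foo(x):
        return x + y
    return foo

foo = make_foo(4)   # the closure packages foo together with y = 4
y = 2               # a different y, defined later at the call site
result = foo(3)
captured = foo.__closure__[0].cell_contents   # the y stored in the closure
print(result, captured)
```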
Sources of Information
These lecture notes are based on Chapter 3 in Scott's book and
Chapter 5 in Louden's book.