Basic Semantics
CS 4700 - Programming Languages
Utah State University

Introduction

Syntax specifies structure. Semantics is "meaning". How can meaning be specified?
  • Language reference manual
  • Defining translator
  • Formal definition - category theory, etc.

Binding Names

There are many names in a typical program. There are names for variables, types, procedures and functions, classes, constants, libraries, monitors, etc.

Names are for human readers of programs. Good names help understanding; bad names hinder it. Can you guess what the following function does?

  int factorial(int x, int sum) {
    return x + sum * previous;
    }
How about if we use different names?
  int next_height(int height, int velocity) {
    return height + velocity * DT;
    }

Some languages impose lexical restrictions on names.

  • C - A name must start with a letter or an underscore
  • Fortran 77 - Names must be six characters or less
  • C89 - Only the first 31 characters of an internal name are guaranteed to be significant
  • C - A name cannot have an embedded space. The following is not a name.
      foo bar 
    
    In Fortran the space would be removed yielding just foobar. In C it used to be common to add an underscore,
      foo_bar 
    
    but now it is more conventional to use capitalization.
      fooBar 
    
  • C++ - Case matters, so fooBar and FooBar would be different names.
  • C - Alphanumeric characters (and underscore) only in a name, e.g., foo,Bar is not one name. The reason for this restriction is that, in languages that are whitespace-free, the lexical analyzer would have a difficult time determining where a name ends.
  • C - Name cannot be a reserved word, so we couldn't have a variable named 'if'. Reserved words are a special case of keywords. In some languages keywords are not reserved: they have context-sensitive meanings, and a name can be spelled like a keyword when it is used in a different context. For example, in Fortran we could define the following variable, named Integer, to be of type Real.
      Real Integer;  
    

Attributes

A name has (potentially) several attributes, depending on what kind of thing it names: a function, a type, or a variable.
Variable
  • Address - Location in memory of the value of the variable. Sometimes referred to as the l-value (the value used in the left-hand side of an assignment).
  • Type - The type specifies the interpretation of bits at the storage location and a set of possible operations permitted for that type. Types are described in more detail below.
  • Value - The sequence of bits at the location, also called the r-value.
  • Lifetime - How long does this var live?
  • Scope - Where is it visible?
Example
int x;

x = 5;

Function

  • Return type
  • Type and number of parameters
  • Code pointer
let f x = x;;

Type/Class

  • typedef
  • interface
typedef struct point {int x; int y;} point_type;

Binding

Binding - associates a name and a property (e.g., the Type).

Static binding

  • binding occurs before run-time
  • often remains unchanged during execution
The Type binding in C is an example of a static binding.

Dynamic binding

  • first happens at run-time
  • can change during execution
The following PostScript code dynamically binds a name.
  /x exch def

Refinement of binding times

  • language definition time
  • language implementation time
  • translation-time (compile-time)
  • link-time
  • load-time
  • run-time

For example, consider the bindings in the following fragment of a C program.

   int fooBar;
   fooBar = 1;
The Type, int, is (statically) bound to the name, fooBar, at compile-time. The compiler generates intermediate code to interpret fooBar's value as an integer (stored in two's complement, with a sign bit). The compiler also checks that only operations on integers are applied to fooBar. The Value of fooBar is bound (dynamically) at run-time. The Address of fooBar might be bound at run-time or load-time, but let's use the earliest possible time, load-time, which is static binding.

Declarations/Definitions

A declaration is elaborated to produce a binding.

Pascal example

program main;
 (* global variables declared *)
 var x: integer;

 procedure p;
   (* local vars to p *)
   var x: boolean;

   procedure foo;
     (* vars local to foo *)
     var x: integer;
   begin
     (* no more local defs allowed *)
     x := 2;
   end;

 begin
   x := true;
 end;

begin
  x := 2;
end.

The binding could be explicit or implicit.

  • Explicit - The following C example explicitly binds the type.
           int fooBar;
       
    Implicit - In the above example, value is implicit (zero or undefined).
  • Implicit declaration - Fortran implicitly declares variables whose names start with I-N to be integers rather than reals
           x = 3      ! Implicit declaration of x as a real
           i = x + 3  ! Implicit declaration of i as an integer
       
    Perl has a different convention for implicit declarations.
           $x = 3;           # Declare a scalar variable, x 
           @x = (1, 3);      # Declare a list variable, x 
           %x = (1 => 3);    # Declare an associative array variable, x 
       
Definition - Declaration that binds all possible attributes

C example

int foo(int x);  /* function declaration (forward), code attribute not bound */
...
int foo(int y) { return y; } /* function definition */

Other terms are used, e.g., prototype for function declaration

In dynamic type binding, the type binding can change during run-time. Another way to think of this is that the values have types, rather than the variables.

  • Languages that have dynamic type binding are sometimes referred to as "typeless" languages, but this is not very descriptive since they still do type checking. Perl has dynamic typing.
           $x = 3;           # Declare a scalar variable, x, bind to an integer
           $x = "hello";     # Bind x to a string type!
       
  • With dynamic type binding, some type errors cannot be detected until run-time.
  • Convenient for user (don't have to specify types, can be lazy)
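The same behavior can be sketched in Python, which also binds types dynamically. This is only an illustration: the helper names describe and add_one are made up for the sketch and are not part of the notes.

```python
def describe(value):
    """Return the name of the type of the value a name is currently bound to."""
    return type(value).__name__

x = 3                      # x is bound to an int value
first = describe(x)        # "int"
x = "hello"                # the same name is rebound to a str value
second = describe(x)       # "str"

def add_one(v):
    # No declared parameter type: the check happens when + is evaluated.
    return v + 1

try:
    add_one("hello")       # str + int is a type error, found only at run-time
    error_seen = False
except TypeError:
    error_seen = True
```

The type error in add_one("hello") is not reported until the call is actually executed, which is the cost that comes with the convenience.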

Name Spaces and Scopes

The scope refers to the region of the program where a binding is maintained.
  • a name can be visible or hidden, a name is visible if it can be referenced (inner name spaces can hide names in outer name spaces)
  • non-local names - the name is visible in the name space, but not declared in the name space

Name spaces

  • flat name space - in early computer languages such as COBOL, there is only one name space in a program so all names are global
  • modules, blocks, functions, or procedures create a "hierarchical" name space in which name spaces can be nested.
  • scope rules govern to which name space a name belongs
  • the lifetime and scope are not always the same (consider a local static variable in C).
  • C has a block-structured namespace
       +---------------------+----global namespace
       |int x;               |
       |                     |
       |main (...) {         |
       |  +----------------+------main namespace
       |  |int x;          | |
       |  |  {             | |
       |  |  +-----------+--------block inside main namespace
       |  |  |int x;     | | |
       |  |  |           | | |
       |  |  +-----------+ | |
       |  |  }             | |
       |  |  ...           | |
       |  +----------------+ |
       |  }                  |
       |                     |
       |foo  (...) {         |
       |  +----------------+------foo namespace
       |  |int x;          | |
       |  |  ...           | |
       |  +----------------+ |
       |  }                  |
       +---------------------+
  • environment - The bindings at a given point in the code. The environment in the body of foo consists of
       x (in foo)
       main
       bar
       this (built-in, but a keyword)
       foo
       all other public classes, methods, instance vars
    
    The x in the bar class namespace is hidden by the x in foo's namespace.

    How to refer to "hidden" variable?

    Ada example

    B1: declare
      a: integer;
      begin ...
        B2: declare
          a: integer; -- local a now hides a in B1
        begin
          if a > B1.a then ...  -- use scope.name to resolve
        end B2;
    end B1;
    

    C++ example (uses ::)

Generally a name must be declared before it is used. Declaration order plays an important role in determining the referencing environment. Options include

  • scope is from where declared to end of "block"
    /* C89 example */
    int x;

    void foo() {
      int y = x; /* this x is a global x */
      int x;
    }

  • relax declare before use, scope is entire block
    /* C# example */
    class A {
      const int N = 10;
      void foo() {
        const int M = N; /* refers to the N declared below, which C# reports as an error */
        const int N = 20;
      }
    }

  • provide constructs to control
    ; Scheme example
    (define x 3)

    (let ((x 4) (y x)) (+ y 3))  ; returns 6, y is bound to the outer x, 3
    (let* ((x 4) (y x)) (+ y 3)) ; returns 7, y is bound to the new x, 4

  • make all variables local by default, scope is entire block
    # Python example
    n = 1

    def foo():
        x = n
        n = 3
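To see what the last option means in practice, the Python fragment can be made runnable. The sketch below wraps the first statement in a try block (an addition for illustration only) so that the effect is observable instead of aborting the program.

```python
n = 1

def foo():
    try:
        x = n      # n is local to all of foo (it is assigned below), but not yet bound
    except UnboundLocalError:
        x = None   # reached: the global n = 1 is hidden inside the entire body of foo
    n = 3
    return x, n

result = foo()
```

Because n is assigned somewhere in foo, the name n is local throughout foo, so the first statement does not see the global n.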

Static vs. dynamic scoping

In static scoping, the scope can be determined at compile-time. C has static scoping. Does Scheme have static scoping?
   ; The following program determines if Scheme has static scoping
   ; for parameters

   (define (foo)
     (define y 1)) ; when foo is called, it will bind y

   (define (bar y)
     (begin
       ; does y have static scope?
       (foo)
       y ; bar will return the value of y
     ))

   ; if local names are statically scoped, then 3 will be displayed,
   ; otherwise y is dynamically scoped so 1 will be displayed
   (display (bar 3))

Scheme has static scoping for function parameters, but dynamic scoping in the global name space when interpreted.

In dynamic scoping, the scope is determined at run-time. PostScript has dynamic scoping. Let's do in PostScript the same thing that we did above in Scheme.

   /foo {/y 1 def} def

   /bar {
     1 dict begin
     /y exch def
     foo
     y
     end
   } def

   3 bar

If 3 is left on the stack, then PostScript is statically scoped, but if 1 is left on the stack, then PostScript is dynamically scoped (the binding of /y in foo rebinds the /y in bar). PostScript has dynamic scoping, so 1 will be left on the stack.
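The PostScript behavior can be imitated with an explicit stack of dictionaries. The following Python sketch is a simplified model, not PostScript itself: the names dict_stack, define, and lookup are assumptions made for the illustration.

```python
dict_stack = [{}]            # the bottom entry plays the role of the global dictionary

def define(name, value):
    """Like PostScript def: bind name in the dictionary on top of the stack."""
    dict_stack[-1][name] = value

def lookup(name):
    """Search from the most recently pushed dictionary down to the oldest."""
    for d in reversed(dict_stack):
        if name in d:
            return d[name]
    raise NameError(name)

def foo():
    define("y", 1)           # binds y in whatever dictionary is current at call time

def bar(arg):
    dict_stack.append({})    # 1 dict begin
    try:
        define("y", arg)     # /y exch def
        foo()                # foo rebinds the y that bar just defined
        return lookup("y")
    finally:
        dict_stack.pop()     # end

result = bar(3)
```

In this model result is 1, because foo binds y in the dictionary that bar pushed. Which y foo affects depends on who called it, not on where foo was written.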

Symbol Table

Bindings are maintained in a symbol table. A symbol table is a table, internal to a translator/compiler, that records the information about each identifier.

Can be modeled as a function

names --> static attributes

The binding of names to locations is sometimes referred to as the environment.

names --> locations

Binding of storage to values known as state or memory.

locations --> values
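A minimal Python sketch of these two mappings follows. The dictionaries and the helper names (declare, assign, value_of) are hypothetical and chosen only to mirror the arrows above; the alias at the end is an extra illustration of why the two levels are kept separate.

```python
environment = {}   # names --> locations
store = {}         # locations --> values (the state)
next_location = 0

def declare(name):
    """Bind a name to a fresh location."""
    global next_location
    environment[name] = next_location
    next_location += 1

def assign(name, value):
    """Change the state: store a value at the location bound to the name."""
    store[environment[name]] = value

def value_of(name):
    """Follow name --> location --> value."""
    return store[environment[name]]

declare("x")
assign("x", 5)
# Binding a second name to the same location makes it an alias of x.
environment["alias"] = environment["x"]
assign("alias", 7)
```

After the last assignment, value_of("x") is 7: the environment did not change, only the state did.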

In an interpreter, the environment maps names to (all) attributes.

names --> attributes

In compiled languages, the symbol table is only present during compilation (i.e., all names are resolved to addresses when the code is produced). How is the table constructed and maintained by the compiler? Consider a block-structured program.

int x;
char y;

void q(char x) {
  int y;
  ...
}

void p() {
  double x;
  ...
  {
    int y[10];
    ::x = 3;
    p::x = 1.2;
    ...
  }
}

int main() {
  ...
}

Single-pass compiler

  • on-entry to a block (that has a declaration) - bind new symbol table to function/procedure name
  • declaration - add binding to "local" symbol table
  • name lookup - resolve name by looking in proper symbol table (one method, prefix name with scope, i.e., fully resolve name)

Two-pass compiler

  • Build local symbol table for each scope/block in first pass
  • declaration - do nothing
  • name lookup - resolve name by looking in proper symbol table (which should be fully populated)
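One possible way to organize such tables is a chain of per-block tables, each linked to the table of the enclosing block. The Python sketch below is an assumption about how this could look, not a description of any particular compiler; the class SymbolTable and the scope name "p.block" are invented for the example.

```python
class SymbolTable:
    """One table per block; parent links to the enclosing block's table."""

    def __init__(self, scope_name, parent=None):
        self.scope_name = scope_name
        self.parent = parent
        self.bindings = {}          # name --> static attributes (here: just the type)

    def declare(self, name, type_name):
        """Declaration: add a binding to the local table."""
        self.bindings[name] = type_name

    def lookup(self, name):
        """Return (fully resolved name, type) from the innermost table that has it."""
        table = self
        while table is not None:
            if name in table.bindings:
                return f"{table.scope_name}::{name}", table.bindings[name]
            table = table.parent
        raise KeyError(f"undeclared name: {name}")

global_scope = SymbolTable("")
global_scope.declare("x", "int")
global_scope.declare("y", "char")

p_scope = SymbolTable("p", parent=global_scope)    # on entry to p
p_scope.declare("x", "double")

block = SymbolTable("p.block", parent=p_scope)     # on entry to the inner block
block.declare("y", "int[10]")
```

Here block.lookup("x") resolves to ("p::x", "double") because the x in p hides the global x, while global_scope.lookup("x") resolves to ("::x", "int").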

Overloading

Names and operators can be overloaded with several meanings. C++ example
int max(int x, int y) { return x > y ? x : y; }
double max(double x, double y) { return x > y ? x : y; }
int max(int x, int y, int z) { ... }
Use types to determine which to call
  y = max(1.0, 2.0); // call second max
  x = max(1, 2); // call first max
  x = max(1, 2.0); // which to call, depends on conversion rules...
C++ and Java allow overloading based on disambiguating type and number of params. Ada also allows return type overloading, so the Ada equivalent of the following would be legal.
int max(int x, int y) { ... }
double max(int x, int y) { ... }
Why is this not allowed in C++?
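Resolution by type and number of parameters can be sketched as a table lookup. The Python below is a toy model with hypothetical names (overloads, overload, call); it matches argument types exactly and has no conversion rules. The return type is not part of the key in this sketch, since a call such as max(1, 2) carries no return type information by itself.

```python
overloads = {}   # (name, parameter types) --> function

def overload(name, *param_types):
    """Register a function under its name plus the types of its parameters."""
    def register(fn):
        overloads[(name, param_types)] = fn
        return fn
    return register

@overload("max", int, int)
def max_int(x, y):
    return x if x > y else y

@overload("max", float, float)
def max_double(x, y):
    return x if x > y else y

@overload("max", int, int, int)
def max_int3(x, y, z):
    return max_int(max_int(x, y), z)

def call(name, *args):
    """Pick the overload whose parameter types match the argument types exactly."""
    key = (name, tuple(type(a) for a in args))
    if key not in overloads:
        raise TypeError(f"no overload of {name} for {key[1]}")
    return overloads[key](*args)
```

With this table, call("max", 1.0, 2.0) selects the second function and call("max", 1, 2) the first, while call("max", 1, 2.0) fails because the sketch performs no conversions.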

Some languages limit the overloading of names for different kinds of things, e.g., if x is a variable, can it be a type? It can become confusing if you allow the same name to be used. A Java example

class X {
  ...
  X X(X X) { X Y; ... return X; }
}

Storage Binding

  • Allocation - process of binding storage to a name
  • Deallocation - process of breaking the binding between storage and a name
  • Lifetime - allocation to deallocation time, bind to unbind time

Storage Model

There is a storage model that supports all of the storage bindings.
  • visual depiction of run-time storage

  • Calls to malloc (C) or new (Java and C++) allocate space in the heap.
  • Static storage is for statically allocated variables
  • The stack is for local variables and parameters.
  • function call - push stack frame
  • function exit - pop stack frame
  • visual depiction of stack calls

Kinds of Storage Bindings

  • static
    • lifetime is entire run-time
    • allocated, bound when program starts
    • deallocated, unbound when program finishes
  • stack-dynamic
    • Elaboration or evaluation of declaration produces a binding
    • Allocation - when block is "entered"
    • Deallocation - when block is "exited"
    • lifetime - During block
    • local variables and parameters are stack-dynamic variables
  • explicit heap-dynamic
    • In C++, when we create a new object with new (which calls a constructor), an explicit heap-dynamic binding is done
             Person *p = new Person(...);  /* p is explicitly bound to storage
                                              dynamically allocated in the heap */
         
    • Allocation - must be explicit, e.g., constructor call or malloc() in C
    • Deallocation - can be implicit or explicit, e.g., free() in C
    • Lifetime - from allocation to deallocation
  • implicit heap-dynamic
    • In some language implementations, the run-time system manages the heap and implicitly allocates heap memory when needed. For example in Scheme
           (list 1 2 3 4)
         
      the list '(1 2 3 4) is allocated implicitly in the heap, and deallocated implicitly as well!

      Garbage collection - Garbage is the set of memory cells that are still allocated but can no longer be reached by the program. Garbage collection is the process of reclaiming this memory, making it available for allocation.

      • Free-list - A list of unallocated memory locations. Memory is allocated by searching this list using either a "first-fit" strategy (i.e., use the first available block that is large enough) or a "best-fit" strategy (i.e., find the block that most closely fits the requested size).
      • Reference counters - A garbage collection strategy that keeps track of how many pointers point to a particular block and puts memory back on the free-list when the count reaches zero.
        • each memory cell is split into a counter and a data value. The counter records the number of pointers that point to the cell. Initially the count is zero. When a pointer to the cell is allocated the count is increased, when a pointer to the cell is deallocated the count is decreased.

          /* Reference count for allocated block is 0 */
          int *p = malloc(4);
          /* The assignment increases the reference count to 1 */

          void foo(int *x) { }

          /* When function is called, reference count increases to 2 since
             now both x and p point to the allocated memory */
          foo(p); /* call foo */
          /* When foo exits, x is deallocated and the count decreases to 1 */
          p = malloc(4);
          /* Reference count is now 0, but 1 for newly allocated block */

        • eager strategy - it happens right away
        • incremental - reference counts are adjusted every time a pointer is allocated/deallocated
        • has additional space and time cost
      • Mark and sweep
        • This strategy first finds each pointer in stack, static, and heap memory. For each pointer that it finds, it marks the heap memory that it points to. Finally, it sweeps through the heap and re-creates the free-list from the unmarked memory.
        • lazy strategy - only called when needed
        • usually invoked when out of memory, but most programs don't run out of memory
        • one-time, high cost - it is expensive to mark and sweep
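The mark and sweep strategy described above can be sketched on a toy heap. In the Python below the heap is modeled as a dictionary from cell addresses to the addresses each cell points to; this representation and the function name mark_and_sweep are assumptions made for the illustration, not how a real run-time system stores memory.

```python
def mark_and_sweep(heap, roots):
    """heap: cell address --> list of addresses that cell points to.
    roots: addresses held by pointers in stack and static memory.
    Returns (marked cells, rebuilt free-list)."""
    marked = set()
    worklist = list(roots)
    # Mark phase: follow every pointer reachable from the roots.
    while worklist:
        addr = worklist.pop()
        if addr in marked:
            continue
        marked.add(addr)
        worklist.extend(heap[addr])
    # Sweep phase: every unmarked cell goes back on the free-list.
    free_list = sorted(addr for addr in heap if addr not in marked)
    return marked, free_list

heap = {
    0: [1],     # cell 0 points to cell 1
    1: [],
    2: [3],     # cells 2 and 3 are not reachable from the roots
    3: [],
    4: [0],
}
roots = [4]     # e.g. a single pointer variable on the stack
marked, free_list = mark_and_sweep(heap, roots)
```

Cells 4, 0, and 1 are marked, so the rebuilt free-list contains cells 2 and 3. The whole heap is visited in one go, which is where the one-time, high cost comes from.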

Overloading and Polymorphism

Operator overloading refers to the meaning of an operator changing depending upon the types of the operands.

C++ example

if (a < b) ...
The types of the "a" and "b" variables determine which kind of comparison is called, e.g., int, double, string, etc. The compiler figures out which kind to call; often this is combined with type coercion. Languages that support the definition of abstract data types should also allow operator overloading so that new types can be used with existing operators.

In explicit polymorphism, the programmer writes a version of the function, with the same name and number of arguments, for each combination of argument types.

boolean lt(int x, int y) ...
boolean lt(int x, double y) ...
boolean lt(double x, double y) ...
Depending on the type of the arguments, the appropriate routine is called.

In the above, the polymorphism is explicit. Implicit parametric polymorphism also occurs in languages.

   let foo g f y = (g y) + 2; f y;;
Inferring the types yields "(alpha -> int) -> (alpha -> beta) -> alpha -> beta" as the type. This type of polymorphism is referred to as parametric polymorphism, which is different from overloading polymorphism and subclass polymorphism. A language that does not have (parametric) polymorphism is said to be monomorphic. Parametric polymorphism can be implemented by
  • Expansion - analyze calls (convert to overloading polymorphism in essence)
  • Boxing and tagging

Generics are a way to reuse code in explicit polymorphism.

  /* C++ template */
  template <typename T>
  /* A node in the stack */
  struct StackNode {
    T data;
    StackNode<T> * next;
  };
  /* The stack is just a pointer to the first node */
  template <typename T>
  struct Stack {
    StackNode<T> * theStack;
  };

  ...
  Stack<int> S;
  ...

Template is instantiated to produce code tied to a particular type.

Bindings and Closure

Consider the following function.

(define (foo x)  (+ x y))
Should the referencing environment when the function is defined include "y" or not? And which "y" should it be: the most recently defined "y" or some other "y"?

In shallow binding (often combined with dynamic scoping so sometimes referred to as late binding) the binding of "y" occurs when the function is called and "y" resolves to whatever "y" is currently bound to.

In contrast, deep binding binds "y" to the defining environment.

Example.

(define y 4)
(define (foo x)  (+ x y))
(define y 2)
(foo 3)
If the call to "(foo 3)" returns 7 then deep binding is used (also important for static scoping!). If the call to "(foo 3)" returns 5 then shallow binding is used (also looks like dynamic scoping).

The deep binding of the function is often called its closure. The idea is to package up the function and all the referred to names in the referencing environment at the time the function was defined.
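Python functions are closures in this sense, so the idea can be sketched there. The function make_foo is a hypothetical wrapper introduced only to give foo a defining environment with its own y; it is not part of the Scheme example.

```python
def make_foo(y):
    """Return a function that adds the y from its defining environment."""
    def foo(x):
        return x + y      # y is resolved in the environment where foo was defined
    return foo

foo = make_foo(4)

def caller():
    y = 2                 # a different y, visible only at the call site
    return foo(3)

result = caller()
```

Here result is 7: the y packaged with foo is the one from the defining environment (4), not the y that happens to be in scope where foo is called (2).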

Sources of Information

These lecture notes are based on Chapter 3 in Scott's book and Chapter 5 in Louden's book.
                                                                                                                                                                                                                                                                                                                                             
  Copyright © 2011 by Curtis Dyreson. All rights reserved.