Cloud Computing Lecture Notes - Spring Semester, 2011
CS 6800 - Advanced Database Management Systems
Utah State University
A cloud, for our purposes, is a shared-nothing, networked group of computers that we can use to run some computation in parallel on a massive dataset.

A typical application is search engine log analysis; see www.google.com/trends

An example log file (such logs grow by terabytes per day):

  query?Volcno Bat
  query?Island palm tree
  images?volcano
  ...

Workflow to analyze the log:

*) Convert to lower case
*) Map to correct spelling
*) Aggregate and count

We need a dataflow language to manage each step. Assume the words of each query form a list.

1) Map to lower case

volcno bat

2) Map to correct spelling

volcano bat

3) Aggregate and count

[1 volcano, 1 bat]

In functional programming, steps 1 and 2 are "map" operations and step 3 is a "reduce."
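
The three steps can be sketched in plain Python with map and reduce. This is only an illustration of the idea, not how Pig or Hadoop is implemented; the sample log lines, the CORRECTIONS table, and the helper names are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample of logged queries, in the format shown above.
log_lines = ["query?Volcno Bat", "query?Island palm tree", "images?volcano"]

# Hypothetical spelling-correction table (assumption: a real one is far larger).
CORRECTIONS = {"volcno": "volcano"}


def to_words(line):
    """Drop the 'query?' or 'images?' prefix and split the rest into words."""
    return line.split("?", 1)[1].split()


def lower_case(words):
    """Step 1: map every word to lower case."""
    return [w.lower() for w in words]


def correct_spelling(words):
    """Step 2: map every word to its corrected spelling, if one is known."""
    return [CORRECTIONS.get(w, w) for w in words]


def count_words(totals, words):
    """Step 3: fold one list of words into the running totals."""
    totals.update(words)
    return totals


# The map steps transform each list independently of the others.
lowered = map(lower_case, map(to_words, log_lines))
corrected = map(correct_spelling, lowered)

# The reduce step combines all lists into a single result.
counts = reduce(count_words, corrected, Counter())

print(counts["volcano"], counts["bat"])  # 2 1
```

Because each map step looks at one list at a time, the lists can be processed on different machines; only the reduce step has to bring results together.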

Pig is a dataflow language, built on top of a map/reduce architecture (Hadoop).

Kinds of objects

a relation is a bag
a bag is a collection of tuples (duplicates are allowed)
a tuple is an ordered list of fields
a field is a piece of data

An alias is a name bound to an object.
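
As a rough analogy, the object kinds above can be modeled with built-in Python types. This is a sketch for intuition only, not Pig's internal representation, and the values are made up.

```python
# A field is a single piece of data; a tuple is an ordered list of fields.
row1 = (1, "Alice")  # hypothetical (id, name) values
row2 = (2, "Bob")

# A bag is a collection of tuples. A Python list is used here rather than
# a set because the same tuple may appear in a bag more than once.
actors = [row1, row2, row1]

# An alias is just a name bound to an object: `actors` names the relation.
first_id, first_name = actors[0]

print(len(actors), first_name)  # 3 Alice
```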

The following loads some data from a comma-separated file and gives each field a name and a type:

  A = LOAD 'actor.csv' USING PigStorage(',') AS (id:int, name:chararray);

To look at the data:

  DUMP A;

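To store the data, name an output location (the directory name 'actor_out' is only a placeholder):

  STORE A INTO 'actor_out' USING PigStorage(',');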

Projection: iterate over the tuples and generate only the chosen column.

  B = FOREACH A GENERATE name;

Selection: use a filter to keep only the tuples that satisfy a condition.

  C = FILTER A BY id < 20;
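
To make the difference between projection and selection concrete, here is a small Python sketch of what the two statements compute. The list A is a hypothetical stand-in for the loaded relation; Pig itself evaluates these operations in parallel over files, not over in-memory lists.

```python
# Hypothetical stand-in for relation A with schema (id:int, name:chararray).
A = [(5, "Alice"), (20, "Bob"), (31, "Carol")]

# Projection (FOREACH A GENERATE name): every tuple is kept, but only its
# name field survives. The result is still a bag of (one-field) tuples.
B = [(name,) for (_id, name) in A]

# Selection (FILTER A BY id < 20): whole tuples are kept or dropped,
# depending on whether the condition holds.
C = [(id_, name) for (id_, name) in A if id_ < 20]

print(B)  # [('Alice',), ('Bob',), ('Carol',)]
print(C)  # [(5, 'Alice')]
```

Projection changes the shape of each tuple but not the number of tuples; selection does the opposite.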

Join: match tuples from two relations on a common field.

  E = LOAD 'address.csv' USING PigStorage(',') AS (name:chararray, address:chararray);
  D = JOIN A BY name, E BY name;
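
The join above is an inner equi-join on name. The following Python sketch shows the shape of the result using a simple hash join; the data is hypothetical, and Pig's actual join strategy on Hadoop is different.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two relations.
A = [(1, "Alice"), (2, "Bob"), (3, "Carol")]                # (id, name)
E = [("Alice", "12 Elm St"), ("Carol", "9 Oak Ave"),
     ("Dave", "4 Pine Rd")]                                 # (name, address)

# Build an index from name to the matching tuples of E.
by_name = defaultdict(list)
for e in E:
    by_name[e[0]].append(e)

# Each output tuple is a tuple of A followed by a matching tuple of E,
# so the name field appears twice (Pig calls the copies A::name and E::name).
D = [a + e for a in A for e in by_name.get(a[1], [])]

print(D)
# [(1, 'Alice', 'Alice', '12 Elm St'), (3, 'Carol', 'Carol', '9 Oak Ave')]
```

Bob has no address and Dave has no id, so neither appears in the result.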

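Grouping creates one tuple per group-by value. Each tuple holds the value (in a field named group) and a bag of the tuples that share it.

  M = FOREACH A GENERATE id % 3 AS mod, name;
  N = GROUP M BY mod;
  -- N has two fields: group (the mod value) and M (the bag of matching tuples)
  X = FOREACH N GENERATE group AS mod, COUNT(M);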
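
The structure that GROUP produces is easier to see with concrete values. This Python sketch mimics it with a dictionary; the data is hypothetical, and the sort is there only to make the printed order predictable, since Pig does not guarantee an output order.

```python
from collections import defaultdict

# Hypothetical stand-in for relation A with schema (id:int, name:chararray).
A = [(1, "Alice"), (2, "Bob"), (3, "Carol"), (4, "Dave")]

# M: add the computed key id % 3 to each tuple.
M = [(id_ % 3, name) for (id_, name) in A]

# N: one entry per key, pairing the key with the bag of tuples that share it.
groups = defaultdict(list)
for t in M:
    groups[t[0]].append(t)
N = sorted(groups.items())

# X: for each group, emit the key and the size of its bag.
X = [(key, len(bag)) for key, bag in N]

print(N[1])  # (1, [(1, 'Alice'), (1, 'Dave')])
print(X)     # [(0, 1), (1, 2), (2, 1)]
```

Pig names the key field group and names the bag after the grouped relation, which is why an aggregate such as COUNT is applied to the bag rather than to a field inside it.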


                                                                                                                                                                                                                                                                                                                                             

  E-mail questions or comments to Curtis.Dyreson at usu dot edu