Data Science

From TedYunWiki
Revision as of 22:44, 2 September 2013

The Three V's of Big Data

  • Volume: number of rows/objects/bytes
  • Variety: number of columns/dimensions/sources
  • Velocity: number of rows/bytes per unit time

(A fourth V is sometimes added, Veracity: can we trust this data?)
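The three V's can be made concrete for a small in-memory dataset. The sketch below is illustrative only. It assumes rows are Python dicts with a hypothetical `ts` timestamp field and a hypothetical sample of three records, counts variety as the number of distinct column names seen, and measures velocity as rows per second over the observed time span. Veracity is left out because it is a judgement about the data rather than a count.

```python
from datetime import datetime

# Hypothetical sample data: each row is a dict; "ts" is an assumed timestamp field.
records = [
    {"ts": datetime(2020, 1, 1, 12, 0, 0), "user": "a", "clicks": 3},
    {"ts": datetime(2020, 1, 1, 12, 0, 30), "user": "b", "clicks": 1, "country": "US"},
    {"ts": datetime(2020, 1, 1, 12, 1, 0), "user": "a", "clicks": 5},
]


def volume(rows):
    """Volume: number of rows."""
    return len(rows)


def variety(rows):
    """Variety: number of distinct columns seen across all rows."""
    return len({column for row in rows for column in row})


def velocity(rows, ts_field="ts"):
    """Velocity: rows per second over the time span the rows cover."""
    times = sorted(row[ts_field] for row in rows)
    span = (times[-1] - times[0]).total_seconds()
    return len(rows) / span if span > 0 else float("inf")


print(volume(records))    # 3
print(variety(records))   # 4 (ts, user, clicks, country)
print(velocity(records))  # 0.05 (3 rows over 60 seconds)
```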

Data Model

Three components:

  • Structures
  • Constraints
  • Operations
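The three components can be illustrated with a toy relational data model. This is a minimal sketch, not a database: the `Employee` schema, its sample rows, and the helpers `satisfies_key` and `in_dept` are hypothetical names chosen for the example. The structure is a set of tuples sharing one schema, the constraint is a key on `emp_id`, and the operation is a query that maps one relation to another.

```python
from typing import NamedTuple


# Structure: a relation is a set of tuples that share one schema.
class Employee(NamedTuple):
    emp_id: int
    name: str
    dept: str


employees = {
    Employee(1, "Ada", "Eng"),
    Employee(2, "Grace", "Eng"),
    Employee(3, "Edgar", "Research"),
}


# Constraint: `key` must uniquely identify each tuple in the relation.
def satisfies_key(relation, key):
    values = [getattr(t, key) for t in relation]
    return len(values) == len(set(values))


# Operation: a query takes a relation and returns a new relation.
def in_dept(relation, dept):
    return {t for t in relation if t.dept == dept}


print(satisfies_key(employees, "emp_id"))                             # True
print(satisfies_key(employees | {Employee(1, "Alan", "Eng")}, "emp_id"))  # False
print(sorted(t.name for t in in_dept(employees, "Eng")))              # ['Ada', 'Grace']
```

A real database would reject the second insert of `emp_id` 1 when the key is declared; here the check is only run on demand.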

What is a database? A collection of information organized to afford efficient retrieval.

Why do we need a database?

  • Sharing
  • Data model enforcement
  • Scale
  • Flexibility

Relational Algebra

Operations

  • Union $\cup$, intersection $\cap$, difference $-$
  • Selection $\sigma$
  • Projection $\Pi$
  • Join $\bowtie$
  • Duplicate elimination $\delta$ (Extended RA)
  • Grouping and aggregation $\gamma$ (Extended RA)
  • Sorting $\tau$ (Extended RA)