Difference between revisions of "Data Science"
Latest revision as of 21:50, 2 September 2013
The Three V's of Big Data
- Volume: number of rows/objects/bytes
- Variety: number of columns/dimensions/sources
- Velocity: number of rows/bytes per unit time
(Veracity: Can we trust this data?)
Data Model
Three components:
- Structures
- Constraints
- Operations
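As a minimal sketch of how the three components might look in practice, the example below uses Python's built-in `sqlite3` module. The `employee` table, its columns, and the sample values are hypothetical and chosen only for illustration: the table definition is the structure, the `CHECK` and `NOT NULL` clauses are constraints, and `INSERT`/`SELECT` are operations.

```python
import sqlite3

# In-memory database; the table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# Structure: a table with named, typed columns.
# Constraints: primary key, NOT NULL, and a CHECK on salary.
conn.execute(
    "CREATE TABLE employee ("
    "  name TEXT PRIMARY KEY,"
    "  salary INTEGER NOT NULL CHECK (salary >= 0)"
    ")"
)

# Operation: insert a row that satisfies the constraints.
conn.execute("INSERT INTO employee VALUES (?, ?)", ("Ann", 100))

# Operation that violates the CHECK constraint; the database rejects it.
try:
    conn.execute("INSERT INTO employee VALUES (?, ?)", ("Bob", -5))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Operation: retrieve the rows that were accepted.
rows = conn.execute("SELECT name, salary FROM employee").fetchall()
```

Here the database, not the application code, enforces the constraint, so only the valid row is stored.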
What is a database? A collection of information organized to afford efficient retrieval.
Why do we need a database?
- Sharing
- Data model enforcement
- Scale
- Flexibility
Relational Algebra
http://en.wikipedia.org/wiki/Relational_algebra
Operations
- Union $\cup$, intersection $\cap$, difference $-$
- Selection $\sigma$
- Projection $\Pi$
- Join $\bowtie$
- (Extended RA) Duplicate elimination $\delta$
- (Extended RA) Grouping and aggregation $\gamma$
- (Extended RA) Sorting $\tau$
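The operations above can be sketched in plain Python, assuming a relation is modelled as a set of named tuples. The `Emp` schema, the sample rows, and the helper names `select` and `project` are hypothetical, introduced only for this illustration.

```python
from collections import namedtuple

# Hypothetical schema and sample relations (same schema, so set operations apply).
Emp = namedtuple("Emp", ["name", "dept", "salary"])
r = {Emp("Ann", "CS", 100), Emp("Bob", "EE", 80)}
s = {Emp("Bob", "EE", 80), Emp("Cy", "CS", 90)}

# Union, intersection, difference map directly onto Python set operators.
union = r | s
intersection = r & s
difference = r - s

def select(relation, predicate):
    """Selection: keep the rows for which the predicate holds."""
    return {row for row in relation if predicate(row)}

def project(relation, attrs):
    """Projection: keep only the named attributes (duplicates collapse in a set)."""
    return {tuple(getattr(row, a) for a in attrs) for row in relation}

cs_rows = select(union, lambda row: row.dept == "CS")
depts = project(union, ["dept"])

# Extended RA: sorting by salary, and grouping by dept with a SUM aggregate.
by_salary = sorted(union, key=lambda row: row.salary)
total_by_dept = {}
for row in union:
    total_by_dept[row.dept] = total_by_dept.get(row.dept, 0) + row.salary

# Extended RA: duplicate elimination only matters for bags (lists), not sets.
bag = ["CS", "EE", "CS"]
distinct = list(dict.fromkeys(bag))  # keeps first occurrences, in order
```

Because sets have no duplicates, projection here follows set semantics; a bag-based model would keep repeated rows until duplicate elimination is applied explicitly.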
Join
- Equi-join $\bowtie_{A=B}$
- $\theta$-join $\bowtie_\theta$
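The two join variants can be sketched as a filtered Cartesian product, with the equi-join as the special case where $\theta$ is an equality test. This is a naive nested-loop sketch, assuming rows are dictionaries and the two relations have disjoint attribute names; the function names and sample data are hypothetical.

```python
def theta_join(r, s, theta):
    """Theta-join: pair every row of r with every row of s, keep pairs where theta holds."""
    return [{**a, **b} for a in r for b in s if theta(a, b)]

def equi_join(r, s, attr_r, attr_s):
    """Equi-join: a theta-join whose condition is equality of two attributes."""
    return theta_join(r, s, lambda a, b: a[attr_r] == b[attr_s])

# Hypothetical sample relations with disjoint attribute names.
employees = [{"name": "Ann", "dept_id": 1}, {"name": "Bob", "dept_id": 2}]
departments = [{"id": 1, "dept": "CS"}, {"id": 3, "dept": "EE"}]

# Equi-join on dept_id = id.
matched = equi_join(employees, departments, "dept_id", "id")

# Theta-join with a non-equality condition, dept_id < id.
below = theta_join(employees, departments, lambda a, b: a["dept_id"] < b["id"])
```

With this data, `matched` contains only Ann's row combined with the CS department, while `below` pairs both employees with the department whose `id` is 3.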