Difference between revisions of "Data Science"
Revision as of 21:44, 2 September 2013
The Three V's of Big Data
- Volume: number of rows/objects/bytes
- Variety: number of columns/dimensions/sources
- Velocity: number of rows/bytes per unit time
(Veracity: Can we trust this data?)
Data Model
Three components:
- Structures
- Constraints
- Operations
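As a rough illustration of the three components, here is a minimal sketch using Python's built-in `sqlite3` module. The `reading` table and its columns are hypothetical and chosen only for this example: the table definition is the structure, the `NOT NULL`, `CHECK`, and `PRIMARY KEY` clauses are constraints, and the `INSERT` and `SELECT` statements are operations.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Structure: one relation with named, typed attributes.
# Constraints: NOT NULL, a CHECK on value, and a composite primary key.
# (Table and column names are hypothetical, for illustration only.)
conn.execute("""
    CREATE TABLE reading (
        sensor_id INTEGER NOT NULL,
        taken_at  TEXT    NOT NULL,
        value     REAL    CHECK (value >= 0),
        PRIMARY KEY (sensor_id, taken_at)
    )
""")

# Operation: insert a row that satisfies every constraint.
conn.execute("INSERT INTO reading VALUES (1, '2013-01-01T00:00', 3.5)")

# Operation that violates the CHECK constraint: the database rejects it.
try:
    conn.execute("INSERT INTO reading VALUES (1, '2013-01-01T00:01', -1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Operation: query the relation.
rows = conn.execute("SELECT sensor_id, value FROM reading").fetchall()
```

After this runs, `rejected` is `True` and `rows` holds only the valid reading, which is what "data model enforcement" looks like in practice.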
What is a database? A collection of information organized to afford efficient retrieval.
Why do we need a database?
- Sharing
- Data model enforcement
- Scale
- Flexibility
Relational Algebra
Operations
- Union $\cup$, intersection $\cap$, difference $-$
- Selection $\sigma$
- Projection $\Pi$
- Join $\bowtie$
- Duplicate elimination $\delta$ (Extended RA)
- Grouping and aggregation $\gamma$ (Extended RA)
- Sorting $\tau$ (Extended RA)
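The operations listed above can be sketched in plain Python, assuming a relation is represented as a list of dicts (one dict per tuple, so duplicates are allowed as in extended RA). The function names and the `emp`/`dept` sample relations are hypothetical; this is an illustration of the semantics, not how a database engine implements them.

```python
from collections import defaultdict

def select(rel, pred):
    """Selection: keep the rows for which pred(row) is true."""
    return [row for row in rel if pred(row)]

def project(rel, attrs):
    """Projection: keep only the listed attributes (duplicates may remain)."""
    return [{a: row[a] for a in attrs} for row in rel]

def natural_join(r, s):
    """Natural join: pair rows that agree on all shared attribute names."""
    out = []
    for x in r:
        for y in s:
            if all(x[a] == y[a] for a in x.keys() & y.keys()):
                out.append({**x, **y})
    return out

def distinct(rel):
    """Duplicate elimination: keep the first occurrence of each row."""
    seen, out = set(), []
    for row in rel:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def group_aggregate(rel, by, attr, agg):
    """Grouping and aggregation: apply agg to attr within each group."""
    groups = defaultdict(list)
    for row in rel:
        groups[tuple(row[a] for a in by)].append(row[attr])
    return [{**dict(zip(by, key)), attr: agg(vals)} for key, vals in groups.items()]

def sort(rel, attrs):
    """Sorting: order rows by the listed attributes, ascending."""
    return sorted(rel, key=lambda row: tuple(row[a] for a in attrs))

# Hypothetical sample relations.
emp = [
    {"name": "Ann", "dept": "eng", "salary": 120},
    {"name": "Bob", "dept": "eng", "salary": 100},
    {"name": "Cy", "dept": "ops", "salary": 90},
]
dept = [{"dept": "eng", "floor": 2}, {"dept": "ops", "floor": 1}]

# Union, intersection, and difference under set semantics map directly
# onto Python's set operators when rows are hashable tuples.
a, b = {(1, "x"), (2, "y")}, {(2, "y"), (3, "z")}
union, intersection, difference = a | b, a & b, a - b
```

For example, `distinct(project(emp, ["dept"]))` lists each department once, and `group_aggregate(emp, ["dept"], "salary", sum)` totals salaries per department.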