Business Intelligence
Business Intelligence combines multiple technologies including Data Warehousing (DW), On-Line Analytical Processing (OLAP), Data Mining (DM), Data Visualisation (VIS), Decision Analysis (what-if), and Customer Relationship Management (CRM).
Operational vs Analytical Data
Operational data involves transaction-based information from day-to-day operations, maintained with frequent updates and predictable usage patterns.
Analytical data supports decision-making through read-only access, summary information, and unpredictable query patterns requiring access to large datasets.
Data Warehouse — definition
"A Data Warehouse is an enterprise-wide collection of" data characterised by four properties:
- Subject oriented
- Integrated
- Time variant
- Non-volatile
The ETL process includes extraction, cleansing, transformation, aggregation and loading of data.
Dimensional Modelling
Dimension tables
Dimension tables provide descriptive (STRING) attributes (COLUMNS) of a business with rich columns, heavy indexing and hierarchical organisation.
Dimension types:
- Normal Dimension
- Confirmed Dimension
- Junk Dimension
- Degenerated Dimension
- Role-Playing Dimension
Fact tables
"The centralised table in a star schema is called the FACT table" — containing numerical measures and foreign keys to dimension tables, typically holding millions of records.
Measure types:
- Additive — summarisable across all dimensions
- Non-Additive — cannot be summarised
- Semi-Additive — summarisable across some dimensions
Database Schemas
Star schema
The simplest design where "the entity-relationship diagram between dimensions and fact tables resembles a star where one fact table is connected to multiple dimensions."
Advantages: easy definition, fewer physical joins, simple metadata.
Drawbacks: poor performance for summary levels; huge dimension tables.
Snowflake schema
"A snowflake schema is a term that describes a star schema structure normalised through the use of outrigger tables."
Advantages: better performance for aggregation queries.
Disadvantages: complex maintenance, more tables, increased joins needed.
Slowly Changing Dimensions (SCD)
Three approaches handle dimension changes over time:
- SCD Type 1 — overwrites old values when history isn't required.
- SCD Type 2 — creates new records maintaining full history (most preferred).
- SCD Type 3 — adds new fields for previous-level history when changes are minimal.
Data Warehouse Architecture Components
ETL process: "Extraction, transformation and loading" involves accessing, manipulating source data, and loading into target databases.
Data Staging Area: "A storage area that cleans, transforms, combines, deduplicates and prepares source data for use in the data warehouse."
Metadata: information summarising essential facts about data, enabling efficient searching and data-instance management.
Design Approaches
Top-Down
Creates data marts from the enterprise data warehouse, providing consistent dimensional views. Preferred by large organisations but costly and time-intensive.
Bottom-Up
Creates data marts first for immediate reporting, then integrates into the warehouse. Lower cost and faster initial results but less consistent dimensional views.
Additional Concepts
- Surrogate Keys — numeric, sequential identifiers enabling faster retrieval and easier index maintenance.
- Granularity — determines detail levels stored in fact tables, either transactional (more detailed) or periodic snapshot (summarised).
- Degenerated Dimension — operational control numbers like invoice or order numbers that resemble dimension keys but don't join to actual dimension tables.
- Operational Data Store (ODS) — "A central archive that provides a snapshot of the most recent data from multiple transactional processes for operational monitoring."

