Back to blog
DW · MATERIALS

Data Warehouse — Study Materials

Foundational reference for data warehousing — BI components, operational vs analytical data, dimensional modelling, star vs snowflake schemas, SCD types, ETL architecture, and design approaches.

18 November 2023· 2 min read·by Nataraj Virupaksham (Raj)
Note
Migrated from the Data Warehouse Concepts Materials post on blogs.rajcloudtech.com.

Business Intelligence

Business Intelligence combines multiple technologies including Data Warehousing (DW), On-Line Analytical Processing (OLAP), Data Mining (DM), Data Visualisation (VIS), Decision Analysis (what-if), and Customer Relationship Management (CRM).

Operational vs Analytical Data

Operational data involves transaction-based information from day-to-day operations, maintained with frequent updates and predictable usage patterns.

Analytical data supports decision-making through read-only access, summary information, and unpredictable query patterns requiring access to large datasets.

Data Warehouse — definition

"A Data Warehouse is an enterprise-wide collection of" data characterised by four properties:

  • Subject oriented
  • Integrated
  • Time variant
  • Non-volatile

The ETL process includes extraction, cleansing, transformation, aggregation and loading of data.


Dimensional Modelling

Dimension tables

Dimension tables provide descriptive (STRING) attributes (COLUMNS) of a business with rich columns, heavy indexing and hierarchical organisation.

Dimension types:

  • Normal Dimension
  • Confirmed Dimension
  • Junk Dimension
  • Degenerated Dimension
  • Role-Playing Dimension

Fact tables

"The centralised table in a star schema is called the FACT table" — containing numerical measures and foreign keys to dimension tables, typically holding millions of records.

Measure types:

  • Additive — summarisable across all dimensions
  • Non-Additive — cannot be summarised
  • Semi-Additive — summarisable across some dimensions

Database Schemas

Star schema

The simplest design where "the entity-relationship diagram between dimensions and fact tables resembles a star where one fact table is connected to multiple dimensions."

Advantages: easy definition, fewer physical joins, simple metadata.

Drawbacks: poor performance for summary levels; huge dimension tables.

Snowflake schema

"A snowflake schema is a term that describes a star schema structure normalised through the use of outrigger tables."

Advantages: better performance for aggregation queries.

Disadvantages: complex maintenance, more tables, increased joins needed.


Slowly Changing Dimensions (SCD)

Three approaches handle dimension changes over time:

  • SCD Type 1 — overwrites old values when history isn't required.
  • SCD Type 2 — creates new records maintaining full history (most preferred).
  • SCD Type 3 — adds new fields for previous-level history when changes are minimal.

Data Warehouse Architecture Components

ETL process: "Extraction, transformation and loading" involves accessing, manipulating source data, and loading into target databases.

Data Staging Area: "A storage area that cleans, transforms, combines, deduplicates and prepares source data for use in the data warehouse."

Metadata: information summarising essential facts about data, enabling efficient searching and data-instance management.

Design Approaches

Top-Down

Creates data marts from the enterprise data warehouse, providing consistent dimensional views. Preferred by large organisations but costly and time-intensive.

Bottom-Up

Creates data marts first for immediate reporting, then integrates into the warehouse. Lower cost and faster initial results but less consistent dimensional views.

Additional Concepts

  • Surrogate Keys — numeric, sequential identifiers enabling faster retrieval and easier index maintenance.
  • Granularity — determines detail levels stored in fact tables, either transactional (more detailed) or periodic snapshot (summarised).
  • Degenerated Dimension — operational control numbers like invoice or order numbers that resemble dimension keys but don't join to actual dimension tables.
  • Operational Data Store (ODS) — "A central archive that provides a snapshot of the most recent data from multiple transactional processes for operational monitoring."

Related posts

Materials

Oracle Database — Study Materials

Foundational Oracle reference — SQL command categories, index types with examples, views (simple, multi-table, inline), and the everyday filter operators (DISTINCT, LIKE, AND/OR/NOT, WHERE, IS NULL, ORDER BY).

27 May 2026
Materials

Informatica PowerCenter — Study Materials

A combined reference covering PowerCenter performance tuning, pushdown optimisation and partitioning, version differences (8 → 9 → 10), and how to update a target table without Update Strategy.

27 May 2026·PowerCenter
Interview Questions

Snowflake — Interview Questions

A 200+ question bank covering Snowflake architecture, micro-partitions, Time Travel, caching, masking, streams, tasks, dynamic tables, data sharing and dbt-on-Snowflake — the depth panels probe in 2026.

27 May 2026·Snowflake