Data Warehousing and Integration Part 1

Instructor: Venkat Krishnamurthy

7 modules

Gain insight into a topic and learn the fundamentals.

2 weeks to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

Details to know

Shareable certificate

Add to your LinkedIn profile

There are 7 modules in this course

This course will cover various topics in data engineering in support of decision support systems, data analytics, data mining, machine learning, and artificial intelligence. You will study on-premises data warehouse architecture, dimensional modeling of data warehouses, Extract-Transform-Load (ETL) integration from source systems to data warehouse, On-line Analytical Processing (OLAP) systems, and the evolving world of data quality and data governance. It offers you an opportunity to design, develop and maintain cloud-based data pipelines. Both on-premises and cloud-based platforms will be used to illustrate and implement data engineering techniques using operational and analytical data warehouses.

This module introduces data warehousing and business intelligence, emphasizing their role in enhancing organizational decision-making. Data warehouses transform raw data into actionable insights using processes like ETL (Extract, Transform, and Load), supported by tools such as OLAP for querying and data mining. While operational databases (OLTP) are suited for daily transactions, OLAP databases are optimized for complex analytics.
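As a rough illustration of the ETL idea described above, the sketch below extracts a few raw order rows, cleans and enriches them, and loads them into a per-product summary of the kind an analytical query would read. The data, field names, and function names are all hypothetical and are not taken from the course materials.

```python
from collections import defaultdict

# Hypothetical rows as they might arrive from an operational (OLTP) system:
# inconsistent product spelling and numbers stored as text.
SOURCE_ORDERS = [
    {"order_id": 1, "product": " widget ", "qty": "2", "unit_price": "9.50"},
    {"order_id": 2, "product": "Gadget", "qty": "1", "unit_price": "20.00"},
    {"order_id": 3, "product": "WIDGET", "qty": "3", "unit_price": "9.50"},
]


def extract():
    """Extract: read raw rows from the source (here, an in-memory list)."""
    return list(SOURCE_ORDERS)


def transform(rows):
    """Transform: normalize product names, cast types, derive a sales amount."""
    cleaned = []
    for row in rows:
        qty = int(row["qty"])
        price = float(row["unit_price"])
        cleaned.append({
            "product": row["product"].strip().lower(),
            "qty": qty,
            "amount": qty * price,
        })
    return cleaned


def load(rows):
    """Load: accumulate cleaned rows into a summary keyed by product."""
    warehouse = defaultdict(lambda: {"qty": 0, "amount": 0.0})
    for row in rows:
        warehouse[row["product"]]["qty"] += row["qty"]
        warehouse[row["product"]]["amount"] += row["amount"]
    return dict(warehouse)


warehouse = load(transform(extract()))
print(warehouse)
```

A production pipeline would read from real source systems and write to warehouse tables; the three-step shape is the point here.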

What's included

3 videos, 6 readings, 1 assignment

3 videos (total 7 minutes)

Course Overview (1 minute)
Meet Your Instructor: Venkat Krishnamurthy (2 minutes)
Introduction to Data Warehouses (3 minutes)

6 readings (total 178 minutes)

Welcome to Data Warehousing & Integration Part 1 (2 minutes)
Syllabus - Data Warehousing & Integration Part 1 (10 minutes)
Academic Integrity (1 minute)
Module 1 Overview (5 minutes)
Introduction to Data Warehouses (5 minutes)
Conceptual Database Design (155 minutes)

1 assignment (total 15 minutes)

Assess Your Learning: Conceptual Database Modeling (15 minutes)

This module builds on the foundations of database design from the previous module, focusing on relational database modeling, normalization, and SQL. The readings will guide you in translating a conceptual EER diagram into a relational model, ensuring adherence to normalization principles and aiming for Third Normal Form (3NF). We’ll also emphasize understanding primary keys and foreign keys for maintaining data integrity and establishing table relationships. You will also have the opportunity to create and critique relational models. We’ll then explore SQL basics, covering syntax (SELECT, INSERT, UPDATE, DELETE), querying techniques (WHERE, ORDER BY, JOIN), and operations involving functions and aggregates (COUNT, SUM, AVG, MIN, MAX), which are fundamental in database querying and management.
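The SQL statements named above can be tried out with a minimal sketch like the following, which uses Python's built-in sqlite3 module and an in-memory database. The customer and purchase tables and their contents are invented for illustration; they are not the course's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );
""")

# INSERT, UPDATE, DELETE on hypothetical data.
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [(10, 1, 30.0), (11, 1, 12.5), (12, 2, 50.0)],
)
conn.execute("UPDATE purchase SET amount = 15.0 WHERE purchase_id = 11")
conn.execute("DELETE FROM purchase WHERE purchase_id = 12")

# The foreign key rejects a purchase that points at a customer that does not exist.
fk_rejected = False
try:
    conn.execute("INSERT INTO purchase VALUES (13, 99, 5.0)")
except sqlite3.IntegrityError:
    fk_rejected = True

# SELECT with JOIN, aggregates, GROUP BY, and ORDER BY.
rows = conn.execute("""
    SELECT c.name, COUNT(p.purchase_id), COALESCE(SUM(p.amount), 0)
    FROM customer AS c
    LEFT JOIN purchase AS p ON p.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.name
""").fetchall()
print(rows, fk_rejected)
```

The LEFT JOIN keeps customers with no remaining purchases in the result, which is why COALESCE is used to turn their empty sum into zero.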

What's included

3 readings, 2 assignments, 1 app item

This module provides an introduction to data warehouse concepts. Data warehouses are based on a multidimensional model. We will look closely into the multidimensional model and its representation as data cubes (also known as hypercubes). We’ll examine how different aspects of data are categorized into facts, measures, and dimensions. Dimensions such as Product, Time, and Customer are organized hierarchically within a cube, allowing data to be analyzed at various levels of detail. Measures such as Quantity and Sales Amount are stored within these cubes, and analysts can navigate through different levels of detail using "rolling up" and "drilling down" techniques. We will also explore key concepts such as granularity, dimension schema, and member hierarchies, which are essential in understanding how data is structured and analyzed in multidimensional models. Finally, we will learn how the conditions of disjointness, completeness, and correctness, collectively known as summarizability, keep aggregated values in a data cube accurate.
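To make rolling up and drilling down concrete, here is a small sketch that aggregates one measure (quantity) at two levels of the Product and Time hierarchies. The fact rows, level names, and the aggregate helper are hypothetical. Because every product in the sample belongs to exactly one category and every month to exactly one quarter, the grand total is the same at both levels, which is the property summarizability is meant to protect.

```python
from collections import defaultdict

COLUMNS = ("product", "category", "month", "quarter", "quantity")

# Hypothetical fact rows at the finest granularity (product by month).
FACTS = [
    ("laptop", "computers", "2024-01", "2024-Q1", 3),
    ("laptop", "computers", "2024-02", "2024-Q1", 2),
    ("mouse", "accessories", "2024-01", "2024-Q1", 10),
    ("mouse", "accessories", "2024-04", "2024-Q2", 4),
]


def aggregate(facts, levels):
    """Sum the quantity measure grouped by the chosen dimension levels."""
    totals = defaultdict(int)
    for row in facts:
        record = dict(zip(COLUMNS, row))
        key = tuple(record[level] for level in levels)
        totals[key] += record["quantity"]
    return dict(totals)


detail = aggregate(FACTS, ("product", "month"))        # drilled-down view
rolled_up = aggregate(FACTS, ("category", "quarter"))  # rolled-up view

print(detail)
print(rolled_up)
```

If a product were assigned to two categories at once, the rolled-up total would exceed the detail total, which is the kind of double counting the disjointness condition rules out.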

What's included

2 videos, 5 readings, 2 assignments, 1 app item

2 videos (total 5 minutes)

Mental Image of Multidimensional Cube (3 minutes)
Summarizability (2 minutes)

5 readings (total 93 minutes)

Module 3 Overview (5 minutes)
Multidimensional Model (12 minutes)
Measures and Summarizability (46 minutes)
OLAP Operations on a Multidimensional Model (10 minutes)
Data Warehouse and Architecture (20 minutes)

2 assignments (total 50 minutes)

Assess Your Learning: Measures & Summarizability (25 minutes)
Assess Your Learning: OLAP Operations (25 minutes)

1 app item (total 15 minutes)

The Multidimensional Model (15 minutes)

In this module, we’ll explore conceptual modeling with multidimensional models, visualized using MultiDim. This approach helps us organize data into facts and dimensions and understand the relationships between them, which is essential for designing data warehouses. We’ll explore topics such as dimensions (e.g., date, customer) and measures (e.g., quantity, total sales) in more detail. We’ll also explore the difference between primary events and secondary events and learn how they are used. Finally, we will look at another categorization of measures into flow, level, and unit measures.
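The flow, level, and unit categories matter because they determine which aggregations make sense. The sketch below follows the usual reading in the dimensional-modeling literature (an assumption here, since the course's exact definitions are not shown on this page): a flow measure such as quantity sold can be summed over time, a level measure such as stock on hand cannot, and a unit measure such as unit price is averaged rather than summed. The data is hypothetical.

```python
from statistics import mean

# Hypothetical daily records for one product in one store.
DAILY = [
    {"day": "Mon", "qty_sold": 4, "stock_on_hand": 20, "unit_price": 9.0},
    {"day": "Tue", "qty_sold": 6, "stock_on_hand": 14, "unit_price": 9.0},
    {"day": "Wed", "qty_sold": 2, "stock_on_hand": 12, "unit_price": 10.5},
]

# Flow measure: accumulates over a period, so SUM over time is meaningful.
total_qty_sold = sum(r["qty_sold"] for r in DAILY)

# Level measure: a snapshot at a point in time. Summing 20 + 14 + 12 would be
# meaningless, so report the closing value or an average instead.
closing_stock = DAILY[-1]["stock_on_hand"]
average_stock = mean(r["stock_on_hand"] for r in DAILY)

# Unit measure: a relative value, aggregated with AVG, MIN, or MAX, not SUM.
average_price = mean(r["unit_price"] for r in DAILY)

print(total_qty_sold, closing_stock, round(average_stock, 2), average_price)
```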

What's included

2 videos, 4 readings, 3 assignments

2 videos (total 8 minutes)

Primary and Secondary Events (3 minutes)
Additivity of Measures (5 minutes)

4 readings (total 56 minutes)

Module 4 Overview (5 minutes)
Design Conceptual Multidimensional Models (36 minutes)
Primary and Secondary Events (5 minutes)
Additivity of Measures (10 minutes)

3 assignments (total 31 minutes)

Assess Your Learning: Conceptual Modeling 1 (15 minutes)
Assess Your Learning: Primary and Secondary Events (8 minutes)
Assess Your Learning: Additivity of Measures (8 minutes)

In this module, we’ll dive into conceptual modeling of hierarchies within data warehouses, exploring their definitions, characteristics, and significance. Balanced hierarchies have a uniform structure where each child has one parent and all branches are of the same length, making data analysis consistent and efficient. In contrast, unbalanced hierarchies have varying branch lengths and missing aggregation levels, offering flexibility to model real-world scenarios like product categories and geographical hierarchies. You’ll also be introduced to generalized hierarchies, which involve "is-a" relationships between supertypes and subtypes, allowing for detailed data representation but requiring careful management of aggregation and specialization. We’ll also explore alternative hierarchies, showcasing different ways to organize the same dimension, such as calendar vs. fiscal views of time. Finally, we’ll look at parallel hierarchies, both independent and dependent, as tools for analyzing data from multiple perspectives, representing complex organizational structures. Understanding these hierarchy types is crucial for effective data management and analysis in data warehousing.
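The difference between balanced and unbalanced hierarchies can be checked mechanically. In this sketch a hierarchy is stored as a child-to-parent mapping, which guarantees one parent per child, and it counts as balanced when every leaf is the same number of steps from the root. The member names and helper functions are hypothetical examples, not course material.

```python
# Hypothetical hierarchies stored as child -> parent mappings.
BALANCED = {
    "Boston": "Massachusetts",
    "Austin": "Texas",
    "Massachusetts": "USA",
    "Texas": "USA",
}
UNBALANCED = {
    "ATM 1": "Agency A",
    "Agency A": "Branch 1",
    "Branch 1": "Bank",
    "Branch 2": "Bank",  # this branch has no agencies or ATMs beneath it
}


def leaf_depths(parent_of):
    """Return {leaf: number of steps from that leaf up to the root}."""
    parents = set(parent_of.values())
    depths = {}
    for member in parent_of:
        if member in parents:
            continue  # not a leaf
        depth, current = 0, member
        while current in parent_of:
            current = parent_of[current]
            depth += 1
        depths[member] = depth
    return depths


def is_balanced(parent_of):
    """A hierarchy is balanced when all leaves sit at the same depth."""
    return len(set(leaf_depths(parent_of).values())) == 1


print(is_balanced(BALANCED), leaf_depths(BALANCED))
print(is_balanced(UNBALANCED), leaf_depths(UNBALANCED))
```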

What's included

4 videos, 3 readings, 2 assignments

4 videos (total 13 minutes)

Balanced and Unbalanced Hierarchies (5 minutes)
Generalized Hierarchies (4 minutes)
Alternative Hierarchies (2 minutes)
Parallel Hierarchies (1 minute)

3 readings (total 140 minutes)

Module 5 Overview (5 minutes)
Balanced and Unbalanced Hierarchies (60 minutes)
Advanced Modeling Concepts (75 minutes)

2 assignments (total 23 minutes)

Assess Your Learning: Conceptual Modeling of Hierarchies (15 minutes)
Assess Your Learning: Advanced Modeling Concepts (8 minutes)

In this module, you’ll explore logical modeling in data warehousing, which is the process of designing a structured, abstract representation of data to be stored, focusing on how data is organized, related, and optimized for efficient querying and analysis. Building on what you learned in the previous modules, you'll take the next step in data warehouse design: translating a conceptual model into a logical model for implementation. The module will focus on the relational representation of data warehouses, including the study of various schema implementations: star, snowflake, starflake, and constellation. You'll also examine the rules for mapping a multidimensional conceptual model to a relational model, highlighting the role and importance of different types of keys in this process. We'll also discuss strategies for maintaining consistency in a data warehouse. Finally, you'll explore how to pre-populate certain dimensions, like time, to streamline operations and improve query performance.
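A star schema with surrogate keys and a pre-populated time dimension might look like the sketch below, again using Python's built-in sqlite3 module. The table names, column names, the YYYYMMDD date key convention, and the prepopulate_dates helper are all assumptions chosen for illustration.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_key  INTEGER PRIMARY KEY,  -- integer key in YYYYMMDD form (assumed convention)
        full_date TEXT NOT NULL,
        year      INTEGER NOT NULL,
        quarter   INTEGER NOT NULL,
        month     INTEGER NOT NULL
    );
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        product_code TEXT NOT NULL UNIQUE,               -- business key from the source
        name         TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
        product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
        quantity     INTEGER NOT NULL,
        sales_amount REAL NOT NULL
    );
""")


def prepopulate_dates(conn, start, end):
    """Insert one dim_date row per calendar day before any facts arrive."""
    rows, day = [], start
    while day <= end:
        rows.append((
            int(day.strftime("%Y%m%d")),
            day.isoformat(),
            day.year,
            (day.month - 1) // 3 + 1,
            day.month,
        ))
        day += timedelta(days=1)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?, ?)", rows)


prepopulate_dates(conn, date(2024, 1, 1), date(2024, 12, 31))

# Load one product and one fact row; the fact refers to the surrogate key only.
conn.execute(
    "INSERT INTO dim_product (product_code, name) VALUES (?, ?)", ("SKU-1", "Laptop")
)
product_key = conn.execute(
    "SELECT product_key FROM dim_product WHERE product_code = ?", ("SKU-1",)
).fetchone()[0]
conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", (20240215, product_key, 2, 1800.0))

date_rows = conn.execute("SELECT COUNT(*) FROM dim_date").fetchone()[0]
by_quarter = conn.execute("""
    SELECT d.quarter, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY d.quarter
""").fetchall()
print(date_rows, by_quarter)
```

Because the date rows already exist, loading a fact only requires looking up its keys, and queries can group by quarter or month without computing them from a raw date.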

What's included

6 videos, 11 readings, 2 assignments, 1 app item

6 videos (total 8 minutes)

Introduction to Logical Modeling in Data Warehousing (1 minute)
Different ROLAP Schemas Conclusion (1 minute)
Surrogate Keys (1 minute)
Importance of Data Consistency (0 minutes)
Consistency in a Data Warehouse Example (1 minute)
Prepopulating Dimensional Data Example (1 minute)

11 readings (total 122 minutes)

Module 6 Overview (5 minutes)
Logical Modeling of Data Warehouse (32 minutes)
Introduction to Surrogate Keys (10 minutes)
Benefits of Surrogate Keys (10 minutes)
Implementation of Surrogate Keys in a Data Warehouse (10 minutes)
Importance of Data Consistency (5 minutes)
Challenges & Best Practices for Maintaining and Ensuring Data Consistency (10 minutes)
Understanding Prepopulating Dimensions (5 minutes)
The Process of Prepopulating Time and Geography Dimensions (5 minutes)
Benefits of Prepopulating Time and Geography Dimensions (5 minutes)
Prepopulating Dimensions (25 minutes)

2 assignments (total 35 minutes)

Assess Your Learning: Logical Modeling (20 minutes)
Assess Your Learning: Keys, Consistency and Prepopulating Dimensions (15 minutes)

1 app item (total 20 minutes)

Types of ROLAP Schemas (20 minutes)

Designing a data warehouse is a complex process that requires transitioning from high-level conceptual models to detailed logical models. This transition is critical because it bridges the gap between understanding business needs and translating them into a technical framework that effectively supports those needs. In this module, you’ll expand on the logical modeling process covered in the previous module, with a particular focus on dimensional model design and the intricacies of hierarchy modeling. As you delve deeper, you’ll encounter logical modeling for advanced concepts such as many-to-many dimensions, links between facts, and facts with multiple granularities. We’ll also explore the concept of Slowly Changing Dimensions (SCDs), which are essential for managing historical data in your warehouse. You’ll learn how to implement different SCD types to accurately track and manage changes in dimension data over time. Finally, we’ll touch on SQL for OLAP, focusing on advanced concepts like aggregation and window functions, and you’ll learn how to use SQL to query and analyze data warehouses.
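One widely used way to track dimension changes over time is the Type 2 approach, in which a changed record is not overwritten: the current row is closed and a new row with a new surrogate key is added. The sketch below shows that idea in plain Python. The column names (valid_from, valid_to, is_current) and the apply_scd2 function are hypothetical, and the course may present the SCD types differently.

```python
from datetime import date


def apply_scd2(dimension, customer_id, new_attrs, change_date):
    """Type 2 update: close the current row and append a new version."""
    current = next(
        (r for r in dimension if r["customer_id"] == customer_id and r["is_current"]),
        None,
    )
    if current is not None:
        if all(current[key] == value for key, value in new_attrs.items()):
            return dimension  # nothing changed, so keep history as it is
        current["valid_to"] = change_date
        current["is_current"] = False

    # Assign the next surrogate key; the business key (customer_id) repeats.
    next_key = max((r["customer_key"] for r in dimension), default=0) + 1
    dimension.append({
        "customer_key": next_key,
        "customer_id": customer_id,
        **new_attrs,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })
    return dimension


dim_customer = []
apply_scd2(dim_customer, "C001", {"city": "Boston"}, date(2023, 1, 1))
apply_scd2(dim_customer, "C001", {"city": "Seattle"}, date(2024, 6, 1))
apply_scd2(dim_customer, "C001", {"city": "Seattle"}, date(2024, 7, 1))  # no change

for row in dim_customer:
    print(row)
```

Facts recorded before the move keep pointing at the Boston row's surrogate key, so historical reports still attribute those sales to Boston.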

What's included

5 videos, 11 readings, 1 assignment

5 videos (total 12 minutes)

Modeling Various Types of Hierarchies (4 minutes)
SCD Best Practices (2 minutes)
Translating between SCDs (3 minutes)
Examples of Translating Between SCD Types (2 minutes)
Conclusion (0 minutes)

11 readings (total 137 minutes)

Module 7 Overview (5 minutes)
Introduction to Conceptual & Logical Models (15 minutes)
Mapping Process (10 minutes)
Conclusion (1 minute)
Advanced Modeling Concepts (36 minutes)
Understanding Slowly Changing Dimensions (5 minutes)
Types of Slowly Changing Dimensions (10 minutes)
Benefits of Managing Slowly Changing Dimensions (5 minutes)
Steps for Translating Between SCD Types (10 minutes)
Performing OLAP queries with SQL (38 minutes)
Congratulations! (2 minutes)

1 assignment (total 25 minutes)

Assess Your Learning: Logical Representation of Hierarchies and Advanced concepts (25 minutes)

Instructor

Venkat Krishnamurthy

Northeastern University

3 courses, 479 learners

Offered by

Northeastern University

Frequently asked questions

To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.