Data Engineering

Course

In Data Engineering for Data Scientists, you will practice building ETL, NLP, and machine learning pipelines. This practice prepares you for the course project with our industry partner Figure 8.

Built in collaboration with

IBM

Advanced

1 month

Real-world Projects

Completion Certificate

Last Updated January 16, 2024

Skills you'll learn:
scikit-learn • Data cleaning • Machine learning pipeline creation • Part of speech tagging
Prerequisites:
Basic SQL • Python for data science • JSON

Course Lessons

Lesson 1

Introduction to Data Engineering

This lesson introduces the Data Engineering for Data Scientists course and its project. The lessons that follow cover ETL pipelines, natural language processing pipelines, and machine learning pipelines.

Lesson 2

ETL Pipelines

ETL stands for extract, transform, and load. This is the most common type of data pipeline, and you will practice each step in this lesson.
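As a rough illustration of the three steps, here is a minimal sketch using only the Python standard library. The sample CSV, the `messages` table, and the function names are assumptions made up for this example, not course material.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: a tiny CSV with a duplicate row, an empty
# message, and stray whitespace.
RAW_CSV = """id,message,genre
1,  We need water ,direct
2,Road is blocked,news
2,Road is blocked,news
3,,direct
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, drop empty messages and duplicate ids."""
    seen, clean = set(), []
    for row in rows:
        message = row["message"].strip()
        row_id = int(row["id"])
        if not message or row_id in seen:
            continue
        seen.add(row_id)
        clean.append((row_id, message, row["genre"].strip()))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(id INTEGER PRIMARY KEY, message TEXT, genre TEXT)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the demo
load(transform(extract(RAW_CSV)), conn)
stored = conn.execute(
    "SELECT id, message, genre FROM messages ORDER BY id"
).fetchall()
print(stored)
```

Keeping each step in its own function makes it possible to test the cleaning rules separately from file and database access.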

Lesson 3

NLP Pipelines

In order to complete the project at the end of the course, you will need some natural language processing skills. Here you will practice engineering machine learning features from text data.
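One simple way to turn text into machine learning features is a bag-of-words count vector. The sketch below uses only the standard library. The stop-word list, the sample messages, and the function names are illustrative assumptions, and a real pipeline would more likely rely on an NLP library for tokenization.

```python
import re
from collections import Counter

# Illustrative stop-word list only; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "we", "and", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return a sorted vocabulary and one count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors


docs = ["We need water and food!", "The road is blocked; need help."]
vocab, vectors = bag_of_words(docs)
print(vocab)
print(vectors)
```

Each position in a vector corresponds to one vocabulary word, so every message becomes a fixed-length numeric row that a model can consume.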

Lesson 4

Machine Learning Pipelines

You'll use the scikit-learn package to code a machine learning pipeline. With these skills, you can ingest data, create features, and train a machine learning model in a single step.
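A minimal sketch of such a pipeline follows, assuming scikit-learn is installed. The toy messages, the labels, and the choice of `LogisticRegression` are assumptions for illustration and are not taken from the course.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data: short messages and one category label each.
messages = [
    "we need drinking water",
    "no clean water here",
    "water supply is gone",
    "the road is blocked",
    "bridge collapsed on the road",
    "road closed by landslide",
]
labels = ["water", "water", "water", "roads", "roads", "roads"]

# Chain feature extraction and the classifier into one estimator.
pipeline = Pipeline([
    ("vect", CountVectorizer()),     # text -> token counts
    ("tfidf", TfidfTransformer()),   # counts -> TF-IDF weights
    ("clf", LogisticRegression()),   # TF-IDF features -> category
])

# A single fit call runs every step in order on the raw text.
pipeline.fit(messages, labels)

predictions = pipeline.predict(["please send water", "the road is flooded"])
print(list(predictions))
```

Because the pipeline behaves like a single estimator, the same object can be passed to tools such as `GridSearchCV` so that feature extraction is refit inside each cross-validation fold.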

Lesson 5 • Project

Project: Disaster Response Pipeline

You’ll build a machine learning pipeline to categorize emergency messages based on the needs communicated by the sender.

Taught By The Best

Andrew Paster

Instructor

Andrew has an engineering degree from Yale and has used his data science skills to build a jewelry business from the ground up. He has also created courses for Udacity's Self-Driving Car Engineer Nanodegree program.

Juno Lee

Curriculum Lead at Udacity

Juno is the curriculum lead for the School of Data Science. She shares her passion for data and teaching by building courses at Udacity, and has built several so far. As a data scientist, she built recommendation engines, computer vision and NLP models, and tools to analyze user behavior.

Arpan Chakraborty

Instructor

Arpan is a computer scientist with a PhD from North Carolina State University. He teaches at Georgia Tech (within the Master's in Computer Science program) and is a coauthor of the book Practical Graph Mining with R.

The Udacity Difference

Combine technology training for employees with industry experts, mentors, and projects to build the critical thinking that drives innovation. Our proven upskilling system pursues success relentlessly.

Demonstrate proficiency with practical projects

Projects are based on real-world scenarios and challenges, allowing you to apply the skills you learn to practical situations, while giving you real hands-on experience.

  • Gain proven experience

  • Retain knowledge longer

  • Apply new skills immediately

Top-tier services to ensure learner success

Reviewers provide timely and constructive feedback on your project submissions, highlighting areas of improvement and offering practical tips to enhance your work.

  • Get help from subject matter experts

  • Learn industry best practices

  • Gain valuable insights and improve your skills