Azure Databricks Deep Dive
Have you looked at Azure Databricks yet? No? Then you need to. Knowing how to use Apache Spark will earn you more money. It is that simple. Data Engineers and Data Scientists who know Apache Spark are in demand! This workshop is designed to introduce you to the skills required for both roles.
In the morning we will introduce Azure Databricks, then discuss how to develop in-memory, elastic-scale data engineering pipelines. We will talk about shaping and cleaning data, the languages, notebooks, ways of working, design patterns and how to get the best performance. You will build an engineering pipeline with Python and explore the other options available to you. The Engineering element will be delivered by UK MVP Simon Whiteley. Simon has been deploying engineering projects with Azure Databricks since it was announced. He has real-world experience in multiple environments.
Then we will shift gears: we will take the data we moved and cleansed and apply distributed machine learning at scale. We will train a model and productionise it. We will then enrich our data with our newly predicted values. The Data Science element will be led by UK MVP Terry McCann. Terry holds an MSc in Data Science and has been working with Apache Spark for the last 5 years. He is dedicated to applying engineering practices to data science to make model development, training and scoring as easy and as automated as possible.
By the end of the day, you will understand how Azure Databricks supports both data engineering and data science, leveraging Apache Spark to deliver blisteringly fast data pipelines and distributed machine learning models. Bring your laptop, as this will be hands-on.
Prerequisites
An understanding of data processing, either ETL or ELT, either on-premises or in a big data environment. A basic understanding of Machine Learning would also be beneficial, but is not critical.
Laptop Required: Yes
- Software: In the session we will be using Azure Databricks. We will have labs and demos that you can follow if you want to. If you do, you will need the following:
  – An Azure Subscription
  – Credit on the Azure Subscription
  – Enough access on the subscription to create service principals
  – Azure Storage Explorer
  – PowerShell
- Subscriptions: Azure
Biography Simon
Director of Engineering for Advancing Analytics Ltd and Microsoft Data Platform MVP. Simon is a seasoned solution architect & technical lead with well over a decade of Microsoft Analytics experience. A deep techie with a focus on emerging cloud technologies and applying “big data” thinking to traditional analytics problems, Simon also has a passion for bringing it back to the high level and making sense of the bigger picture. When not tinkering with tech, Simon is a death-dodging London cyclist, a sampler of craft beers, an avid chef and a generally nerdy person.
Simon is one of the organisers for Microsoft Data London and can commonly be found around Europe sharing his experience on all things Azure, analytics & data engineering.
Biography Terry
Microsoft MVP. Principal Consultant and Owner of Advancing Analytics Limited, an Advanced Analytics consultancy in the UK. Terry helps businesses advance their analytical capabilities, drawing upon a deep expertise in Data Science, Data Engineering, DataOps and applied AI. Terry holds a Master’s degree in Data Science – with a focus on DataOps for Machine Learning. Organiser of the Data Science Exeter user group, frequent speaker at conferences across the world and the host of the Data Science in Production Podcast.