Machine learning is an exciting area for data scientists, so we are running an education series to help build the relevant skills. In this third lecture we will discuss the key assumptions behind linear regression models and how to validate them. We will look at fitting high-order polynomial curves to capture curvilinear relationships, and we will introduce the holdout method and k-fold cross-validation for estimating how well a model generalises to unseen data.
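As a taste of what the lecture covers, here is a minimal sketch of fitting polynomial curves of different degrees and scoring them with k-fold cross-validation. It uses scikit-learn and synthetic quadratic data purely for illustration; the lecture's own examples and libraries may differ.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic curvilinear data: y = x^2 plus noise (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(scale=0.5, size=100)

# Compare polynomial degrees using 5-fold cross-validation.
# Degree 1 underfits the curve; degree 2 matches the true model;
# a very high degree risks overfitting the training folds.
for degree in (1, 2, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, x, y, cv=5, scoring="r2")
    print(f"degree {degree}: mean R^2 across folds = {scores.mean():.3f}")
```

Cross-validation scores each candidate on data it was not trained on, so the degree-2 model wins here even though higher degrees fit the training data more closely.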
Please note: If you haven't attended the first two lectures in the series, recordings are available on Skills Matter and YouTube. Please take some time to familiarise yourself with the topics from the previous lectures so you can get the most value out of Lecture 3.
Just a reminder: The language of choice for the series is Python, so if you are not familiar with Python, or if you need to brush up your skills, I suggest you spend some time with the videos from Google's freely available Python Class.
Nikolay has over 10 years of database experience and has been involved in large-scale migration, consolidation, and data warehouse deployment projects in the UK and abroad. He is a speaker, blogger, and author of numerous articles and a book on advanced database topics. For the last three years Nikolay has been working exclusively in the big data (Hadoop) space, with a focus on Spark and machine learning. He has an M.Sc. in Software Technologies and is working towards an M.Sc. in Data Science.