The tidymodels ecosystem already offers a lot of cool features that make data science and predictive modelling easy and effective, but at times I missed one: automated, supervised discretization of numeric variables as a preprocessing step. In this blog post I’d like to present a new step that I implemented with Max Kuhn in the embed package, which recently became officially available on CRAN!


Have you always wanted to seamlessly account for missing data patterns when doing data modelling in R? In the following blog post I will provide you with a ready-to-use, custom recipes step that will allow you to incorporate this technique easily and quickly in all your machine learning projects.


Have you ever found yourself dealing with an imbalanced classification problem without being quite sure how much upsampling to apply? Or what exactly the impact of correcting the imbalance is on model performance? In this post I will explore the relationship between the upsampling ratio and model performance, using the brand new tidymodels tune package.


In this post I will compare caret, the most popular ML framework available for R to date (by number of monthly downloads from GitHub), with its successor packages, which are being written by the same author (Max Kuhn) and are wrapped together in the so-called tidymodels framework.



Konrad Semsch

A practitioner's view on predictive modelling

Senior Data Scientist @ innogy SE

Dortmund, Germany