Machine Learning in Production - Concepts you should know

Adi Polak — Thu, 04 Mar 2021 00:00:00 +0000

Are you interested in learning about the machine learning side of data? Hooray 🎉, you have come to the right place to start.

Here is a list of concepts for you to get started:

ML Algorithm

An ML algorithm is a procedure that runs on data and produces a machine learning model. Popular examples include decision trees, Naive Bayes, and linear regression.

ML Model

An ML model is the outcome of running an ML algorithm on data. It often contains a statistical representation of the data ingested into the algorithm. The model’s input is data, and its output is a prediction, decision, or classification.
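
To make the difference between an algorithm and a model concrete, here is a minimal sketch. It assumes ordinary least squares on a single feature as the algorithm; the function names and the toy numbers are made up for illustration.

```python
from statistics import mean

def fit_linear_regression(xs, ys):
    """The algorithm: ordinary least squares on one feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    # The model: a small statistical summary of the data it was fitted on.
    return {"slope": slope, "intercept": intercept}

def predict(model, x):
    """Model input is data, model output is a prediction."""
    return model["slope"] * x + model["intercept"]

# Toy data that follows y = 2x + 1 (hypothetical values).
training_xs = [1.0, 2.0, 3.0, 4.0]
training_ys = [3.0, 5.0, 7.0, 9.0]

model = fit_linear_regression(training_xs, training_ys)
prediction = predict(model, 5.0)
print(model, prediction)
```

The algorithm is the fitting procedure; the model is just the two numbers it produces, which can then be reused on new data.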

Training set

The training set is the data ingested into the machine learning algorithm; it trains the ML model.

Testing set

The testing set is the dataset we test the ML model with, kept separate from the training set. To test the ML model’s accuracy, we feed the testing data into the model and measure how accurate its output is. This helps us reason about the quality of the machine learning model.
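
As a rough sketch of how a training set and a testing set work together, the example below splits a toy dataset 70/30, trains a very simple threshold classifier on the training part, and measures accuracy on the testing part. The data, the 70/30 ratio, and the classifier itself are assumptions chosen for illustration.

```python
import random
from statistics import mean

# Toy labelled data as (feature, label) pairs; values are hypothetical.
data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 0),
        (7.0, 1), (7.5, 1), (8.0, 1), (8.5, 1), (9.0, 1)]

# Shuffle with a fixed seed so the split is reproducible, then cut 70/30.
shuffled = data[:]
random.Random(42).shuffle(shuffled)
split = int(len(shuffled) * 0.7)
training_set, testing_set = shuffled[:split], shuffled[split:]

def train(examples):
    """Place a threshold halfway between the mean feature of each class."""
    mean_0 = mean(x for x, y in examples if y == 0)
    mean_1 = mean(x for x, y in examples if y == 1)
    return (mean_0 + mean_1) / 2

def predict(threshold, x):
    return 1 if x > threshold else 0

threshold = train(training_set)

# Accuracy: the share of testing examples the model labels correctly.
correct = sum(predict(threshold, x) == y for x, y in testing_set)
accuracy = correct / len(testing_set)
print(f"threshold={threshold:.2f} accuracy={accuracy:.2f}")
```

The model never sees the testing set during training, which is what makes the measured accuracy a fair signal of quality.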

Machine Learning pipeline

The machine learning pipeline automates the machine learning workflow. It includes transforming and correlating data to fit the ML algorithm, running the algorithm to produce a model, and testing the model with a test set.
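
Here is one way such a pipeline could look when written as plain functions chained together. This is a sketch under simple assumptions: the steps (dropping missing values, min-max scaling, a threshold classifier) and all names are hypothetical, and production pipelines usually rely on a dedicated framework.

```python
from statistics import mean

def drop_missing(rows):
    """Transformation step: remove rows without a feature value."""
    return [(x, y) for x, y in rows if x is not None]

def fit_scaler(rows):
    """Learn the min and max of the feature from the training data only."""
    xs = [x for x, _ in rows]
    return min(xs), max(xs)

def scale(rows, scaler):
    lo, hi = scaler
    return [((x - lo) / (hi - lo), y) for x, y in rows]

def fit_threshold(rows):
    """Algorithm step: threshold halfway between the two class means."""
    mean_0 = mean(x for x, y in rows if y == 0)
    mean_1 = mean(x for x, y in rows if y == 1)
    return (mean_0 + mean_1) / 2

def evaluate(threshold, rows):
    """Testing step: share of rows classified correctly."""
    hits = sum((1 if x > threshold else 0) == y for x, y in rows)
    return hits / len(rows)

def run_pipeline(training_rows, testing_rows):
    train = drop_missing(training_rows)
    test = drop_missing(testing_rows)
    scaler = fit_scaler(train)
    model = fit_threshold(scale(train, scaler))
    return model, evaluate(model, scale(test, scaler))

training_rows = [(10.0, 0), (None, 0), (20.0, 0), (80.0, 1), (90.0, 1)]
testing_rows = [(15.0, 0), (85.0, 1), (None, 1)]
model, accuracy = run_pipeline(training_rows, testing_rows)
print(model, accuracy)
```

Because every step lives behind `run_pipeline`, the whole workflow can be rerun unchanged whenever new data arrives.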

Model interpretability

ML model interpretability is the degree to which a human can reason about the machine learning model’s output. The higher the degree, the easier it is for a human to understand the model’s decision or prediction.
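
Linear models are a common example of highly interpretable models, because each feature’s contribution to the output can be read off directly. The sketch below assumes a made-up linear model; the weights, bias, and feature names are hypothetical.

```python
# Hypothetical linear model: weights and feature names are invented for illustration.
weights = {"income": 0.5, "debt": -0.75, "years_employed": 0.25}
bias = 1.0

def explain(features):
    """Return the score plus how much each feature pushed it up or down."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    score = bias + sum(contributions.values())
    return score, contributions

score, contributions = explain({"income": 4.0, "debt": 2.0, "years_employed": 8.0})

print(f"score = {score}")
for name, value in contributions.items():
    print(f"  {name}: {value:+.2f}")
```

A human can see at a glance that `debt` lowered the score while `income` and `years_employed` raised it, which is much harder to do with a model that has millions of parameters.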

Data Quality

Data quality measures the data’s condition based on accuracy, precision, legitimacy, validity, reliability, consistency, completeness, and more. In machine learning, data quality is important for producing high-quality, unbiased machine learning models.
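
A few of these dimensions can be checked with very little code. The sketch below covers completeness, validity, and consistency on a toy dataset; the records, field names, and the valid age range of 0 to 120 are assumptions for illustration.

```python
# Toy records with deliberate problems (hypothetical data).
records = [
    {"id": 1, "age": 34, "country": "US"},
    {"id": 2, "age": None, "country": "DE"},   # missing age
    {"id": 3, "age": 230, "country": "FR"},    # implausible age
    {"id": 3, "age": 41, "country": "FR"},     # duplicate id
]

def completeness(rows, field):
    """Share of rows where the field is present."""
    return sum(r[field] is not None for r in rows) / len(rows)

def invalid_ages(rows, lo=0, hi=120):
    """Ids of rows whose age falls outside the assumed valid range."""
    return [r["id"] for r in rows
            if r["age"] is not None and not lo <= r["age"] <= hi]

def duplicate_ids(rows):
    """Ids that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        if r["id"] in seen:
            dupes.add(r["id"])
        seen.add(r["id"])
    return sorted(dupes)

report = {
    "age_completeness": completeness(records, "age"),
    "invalid_ages": invalid_ages(records),
    "duplicate_ids": duplicate_ids(records),
}
print(report)
```

Running checks like these before training makes it easier to catch problems that would otherwise end up baked into the model.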

Data drift

Data drift is an unexpected and undocumented change to the data’s structure or semantics. It can result in corrupted, low-quality data. Lack of awareness of data drift can lower the quality of ML models.
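
One simple way to become aware of structural drift is to compare incoming records against the schema you expect. The sketch below is a minimal version of that idea; the schema, field names, and records are hypothetical.

```python
# The structure we expect incoming records to have (an assumption for this example).
expected_schema = {"user_id": int, "amount": float, "currency": str}

def detect_schema_drift(record, schema):
    """List the ways a record deviates from the expected schema."""
    issues = []
    for field, expected_type in schema.items():
        if field not in record:
            issues.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            issues.append(
                f"type changed: {field} is {actual}, expected {expected_type.__name__}"
            )
    for field in record:
        if field not in schema:
            issues.append(f"unexpected field: {field}")
    return issues

old_record = {"user_id": 7, "amount": 12.5, "currency": "USD"}
# An upstream change turned amount into a string and renamed currency to ccy.
new_record = {"user_id": 7, "amount": "12.50", "ccy": "USD"}

old_issues = detect_schema_drift(old_record, expected_schema)
new_issues = detect_schema_drift(new_record, expected_schema)
print(old_issues)
print(new_issues)
```

This only catches structural changes. Semantic drift, such as a field silently switching from dollars to cents, needs checks on the values themselves.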

Concept drift

Concept drift refers to changes in the target variable, the outcome a machine learning model tries to predict. You can detect concept drift by measuring the statistical properties of the target variable over time. The actual target variable can change over time in unforeseen ways, which presents a challenge: the model’s predictions become less accurate as time passes.
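
As a minimal sketch of measuring a statistical property of the target variable, the example below compares the share of positive labels in a reference window against a recent window. The labels and the tolerance of 0.1 are assumptions for illustration; real monitoring typically uses larger windows and proper statistical tests.

```python
from statistics import mean

def positive_rate(labels):
    """Share of positive outcomes in a window of binary target values."""
    return mean(labels)

def target_drifted(reference, recent, tolerance=0.1):
    """Flag drift when the positive rate moves by more than the tolerance."""
    return abs(positive_rate(recent) - positive_rate(reference)) > tolerance

# Hypothetical target values observed at training time and more recently.
reference_labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]  # 20% positive
recent_labels = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]     # 60% positive

drifted = target_drifted(reference_labels, recent_labels)
print(positive_rate(reference_labels), positive_rate(recent_labels), drifted)
```

When a check like this fires, it is usually a sign that the model should be re-evaluated and possibly retrained on more recent data.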

I hope this was helpful and gave you more clarity about these concepts.

💡 Curious to learn more?

Read here about how to create machine learning models with Python.
