IN THIS ARTICLE
- 5 key limitations of machine learning
- 2 instances when you should (definitely) not use machine learning
- How do you know when to use machine learning, and when not to?
Machine learning has made significant progress over the last decade, with applications in domains such as banking (fraud detection), e-commerce (recommendation systems), and medicine (tumor detection). Even so, a machine learning-based approach is not always the right way to solve the problem at hand.
In this article, we will discuss the limitations of machine learning and when it is best to avoid using it.
5 key limitations of machine learning
There are a number of limitations and concerns in using machine learning to solve a variety of problems. We have summarized the top five below:
- Ethics. We are slowly moving into a stage called “dataism,” in which people trust data and algorithms more than their own judgment. This raises questions about the ethics of machine learning algorithms and systems. Some questions remain unanswered: how should a self-driving car react in an unavoidable fatal collision, and who is to blame if a self-driving car kills someone? Data scientists, ML engineers, and deep learning researchers are working to address these challenges. In the future, we might be able to choose the ethical framework a self-driving car follows.
- Data. Machine learning systems, especially deep learning models, are data-hungry (although some recent progress has been made toward training deep learning models with only a few examples). They need a large amount of labeled data to train and provide useful insights and predictions. As the size of the architecture increases, so does the data requirement. For example, GPT-3’s training corpus totaled roughly 499 billion tokens, and a widely cited estimate put the cost of training it at approximately $4.6 million. Another problem lies with the quality of the data. Poor-quality data can significantly reduce a model’s accuracy or lead to risky predictions. For example, if a breast cancer detection dataset consists of X-rays from mostly white women, a model trained on it can be biased when reading X-rays of Black women. In such cases, the more accurate and representative the collected data, the more reliable the outcome will be.
- Interpretability. Interpretability is a major issue with machine learning, especially deep learning algorithms. For example, suppose you work at a finance firm and your boss asks you to build a model to detect fraudulent transactions and to report metrics such as recall and accuracy. In that setting, the model should also be able to justify its classifications. A deep learning algorithm can achieve good accuracy and recall for this task but might be unable to explain its decisions.
- Deterministic systems. Machine learning algorithms are also applied in deterministic domains such as computational fluid dynamics (CFD). The traditional approach of numerically solving the governing equations of physics can take a long time, sometimes months, to produce final results. A deep learning system applied to this domain might produce results much faster, but it doesn’t understand the laws of physics. For example, it might give correct final answers while intermediate fields like density take negative values, which is physically impossible.
- Reproducibility. Reproducibility is a growing issue in the machine learning field because the code and testing methodology behind newly developed models are often not shared. New models developed in research labs are being deployed in real-world applications at a fast pace. However, these models can fail to perform in the real world despite their state-of-the-art performance in research papers. Reproducible results let different industries and practitioners implement the same model and find hidden problems sooner. A lack of reproducibility can prevent models from being assessed for bias, safety, and robustness.
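The data-quality concern in the list above can be checked with very little code. The following is a minimal sketch, not a complete fairness audit: it computes recall separately for each subgroup so that a gap between groups becomes visible. The function name `recall_by_group`, the group labels "A" and "B", and the toy labels are all made up for illustration.

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups):
    """Return recall (true positives / actual positives) for each group."""
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            actual_pos[group] += 1
            if pred == 1:
                true_pos[group] += 1
    # Groups with no positive examples are omitted: recall is undefined there.
    return {g: true_pos[g] / actual_pos[g] for g in actual_pos}


# Toy, made-up labels purely for illustration (not real medical data).
y_true = [1, 1, 1, 1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B"]

per_group = recall_by_group(y_true, y_pred, groups)
print(per_group)  # {'A': 0.75, 'B': 0.5}
```

A single overall recall would hide the fact that the model misses more positive cases in group B than in group A, which is the kind of bias an unrepresentative dataset can cause.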
2 instances when you should (definitely) not use machine learning
Below are two situations where machine learning is not the right choice.
- Solving less complex problems. Machine learning algorithms, and deep learning algorithms in particular, are useful for finding complex relationships and hidden patterns in data consisting of many interdependent variables. For less complicated problems, if a rule-based system performs comparably to a machine learning system, it is advisable to avoid the machine learning system.
- Lack of labeled data and in-house expertise. Most deep learning models require labeled data and an expert team to train the models and put them into production. It is advisable not to use deep learning algorithms to deliver projects if you don’t have enough labeled data and a dedicated team. For example, let’s say that you are developing a model that detects illegal listings on an e-commerce company’s website, and the operations team has identified some keywords that help find such listings. Given these constraints, you might start with a rule-based approach that uses the keywords to detect illegal listings. Later, once the required resources are available, you can build a second version that combines an image classification system with text models.
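The rule-based first version described in the listing example could look something like the sketch below. It is only an illustration: the keyword list, the function names, and the sample listings are hypothetical, and a real system would use the keywords chosen by the operations team.

```python
import re

# Hypothetical keyword list; in practice the operations team would supply it.
FLAGGED_KEYWORDS = {"counterfeit", "replica", "stolen"}


def find_flagged_keywords(listing_text, keywords=FLAGGED_KEYWORDS):
    """Return the sorted flagged keywords that appear as whole words in the text."""
    words = set(re.findall(r"[a-z0-9]+", listing_text.lower()))
    return sorted(words & set(keywords))


def is_suspicious(listing_text, keywords=FLAGGED_KEYWORDS):
    """Flag a listing if it contains at least one flagged keyword."""
    return bool(find_flagged_keywords(listing_text, keywords))


# Made-up listings for illustration.
listings = [
    "Brand new leather wallet, ships today",
    "Replica designer watch, looks like the real thing",
]
for text in listings:
    print(is_suspicious(text), find_flagged_keywords(text))
```

A rule like this needs no labeled data, and every decision can be traced to the keyword that triggered it. The trade-off is that it misses listings that avoid the exact words, which is where the later machine learning version could help.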
How do you know when to use machine learning, and when not to?
The easiest way to answer this question is to follow a simple rule: don’t build a machine learning model where a simpler approach might succeed just as well.
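One way to apply this rule is to evaluate a simple baseline and the machine learning model on the same held-out data, and keep the baseline unless the model is clearly better. The sketch below assumes both scores come from the same metric; the function name `choose_approach` and the 0.02 threshold are arbitrary choices for illustration, not recommended values.

```python
def choose_approach(baseline_score, ml_score, min_gain=0.02):
    """Pick the simpler baseline unless the ML model beats it by at least min_gain."""
    if ml_score - baseline_score >= min_gain:
        return "machine learning model"
    return "simple baseline"


# Made-up scores for illustration.
print(choose_approach(baseline_score=0.90, ml_score=0.91))  # simple baseline
print(choose_approach(baseline_score=0.90, ml_score=0.95))  # machine learning model
```

The right threshold depends on what an extra point of performance is worth compared with the cost of building and maintaining the model.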
Sometimes, a company might prefer an interpretable model over a more accurate one that is harder to interpret (e.g., deep learning), although a lot of research is attempting to address this issue in deep learning. Before you start, ask yourself: does the problem you’re trying to solve require that your model be interpretable? If it does, favor an interpretable model; if not, a less interpretable but more accurate model is an option.