{"id":6346,"date":"2021-12-02T11:00:00","date_gmt":"2021-12-02T19:00:00","guid":{"rendered":"https:\/\/www.springboard.com\/?p=6346"},"modified":"2023-07-13T02:04:35","modified_gmt":"2023-07-13T09:04:35","slug":"machine-learning-projects","status":"publish","type":"post","link":"https:\/\/www.springboard.com\/blog\/data-science\/machine-learning-projects\/","title":{"rendered":"21 Machine Learning Projects [Beginner to Advanced Guide]"},"content":{"rendered":"\n<p>While theoretical machine learning knowledge is important, hiring managers value production engineering skills above all when looking to fill a machine learning role. To become job-ready, aspiring machine learning engineers must build applied skills through project-based learning. Machine learning projects can help reinforce different technical concepts and can be used to showcase a dynamic skill set as part of your professional portfolio. <\/p>\n\n\n\n<p>No matter your skill level, you\u2019ll be able to find machine learning project ideas that excite and challenge you. For inspiration, we\u2019ve gathered examples of real-world ML projects geared towards beginner, intermediate, and advanced skill levels. Using these projects as templates, we\u2019ll explore what a completed project should look like and discuss actionable tips for building your own impressive machine learning project. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Machine Learning Projects<\/h2>\n\n\n\n<p>First, we\u2019ll examine basic machine learning projects geared toward learners who are proficient with R or Python (the most widely used programming language in <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/data-science-definition\/\" target=\"_blank\" rel=\"noreferrer noopener\">data science<\/a> and machine learning) and want to experiment with machine learning fundamentals. 
Next, we\u2019ll review ML project ideas suited for those with <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/machine-learning-skills\/\" target=\"_blank\" data-type=\"URL\" data-id=\"https:\/\/www.springboard.com\/blog\/data-science\/machine-learning-skills\/\" rel=\"noreferrer noopener\">intermediate and advanced machine learning skills<\/a>. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Machine Learning Project Ideas for Beginners<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Titanic Survival Project<\/h4>\n\n\n\n<p>After striking an iceberg, the so-called \u201cunsinkable\u201d RMS Titanic disappeared into the icy waters of the North Atlantic on April 15th, 1912. Over half of the ship\u2019s 2224 passengers and crew members perished, and demographic data shows that some people aboard were more likely to survive than others. <\/p>\n\n\n\n<p>This <a href=\"https:\/\/www.kaggle.com\/c\/titanic\" target=\"_blank\" rel=\"noreferrer noopener\">Kaggle project<\/a> asks participants to build a model that predicts passenger survival based on passenger information like ticket class, gender, age, port of embarkation, and more. Kaggle offers a training data set that participants can use to build their own machine learning models, which can be constructed locally or on Kaggle Kernels (a no-setup, customizable Jupyter Notebooks environment with free GPUs). <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Identifying Twits on Twitter Using Natural Language Processing<\/h4>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/shutterstock_1353897677-scaled.jpg\" alt=\"Identifying Twits on Twitter Using Natural Language Processing\" class=\"wp-image-13743\"\/><\/figure>\n\n\n\n<p>Try your hand at determining the probability that a given tweet originated from a particular user with sentiment analysis. 
This natural language processing technique can scan thousands of text documents for specific filters in a matter of seconds. This technique is how Twitter, for example, can scan and separate out tweets that contain racist or misogynistic content. <\/p>\n\n\n\n<p>For inspiration, check out Eugene Aiken\u2019s application of natural language processing to determine the probability that certain tweets were published by either Donald Trump or Hillary Clinton. To conduct a similar project, you\u2019ll need to pick two users, scrape their tweets, run your Twitter data through a natural language processor, classify it with a machine learning algorithm, and use the classifier\u2019s predict_proba method to determine probabilities. Learn more about the original project <a href=\"https:\/\/towardsdatascience.com\/twitter-api-and-nlp-7a386758eb31\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a> and download the data set <a href=\"https:\/\/github.com\/elaiken3\/twitter_api-nlp-project1\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Housing Prices Prediction<\/h4>\n\n\n\n<p>This <a href=\"https:\/\/www.kaggle.com\/c\/house-prices-advanced-regression-techniques\/data?select=data_description.txt\" target=\"_blank\" rel=\"noreferrer noopener\">Kaggle competition<\/a> will help you practice creative feature engineering as well as regression techniques like random forest and gradient boosting. The goal of the project is to predict the final sales price of a home based on the Ames Housing Dataset. <\/p>\n\n\n\n<p>The data includes 79 explanatory variables that describe vital attributes of homes in the city of Ames, Iowa. These data points range from zoning classification to lot size, remodel date, proximity to a railroad, and even masonry veneer type. 
The effect of each characteristic on house prices might surprise you!<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Google Search Analysis With Python<\/h4>\n\n\n\n<p>Google users perform roughly 3.5 billion search engine queries per day. If you\u2019re wondering what people are Googling, try using Google Trends to analyze a keyword of your choice. Google Trends data can be accessed through pytrends, an unofficial Python API, which Aman Kharwal <a href=\"https:\/\/thecleverprogrammer.com\/2021\/04\/27\/google-search-analysis-with-python\/\" target=\"_blank\" rel=\"noreferrer noopener\">used to analyze<\/a> the performance of the keyword \u201cmachine learning.\u201d<\/p>\n\n\n\n<p>Aman used this tool to pinpoint the 10 countries with the highest number of searches for \u201cmachine learning,\u201d and also determined how the number of \u201cmachine learning\u201d search queries changed over time. After conducting his analysis, Aman used data visualizations to communicate his findings. Try building your own data visualization and consider what story your results might tell, and how that information could be important in a business context. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Chatbot Using Python<\/h4>\n\n\n\n<p>Chatbots are AI-powered applications that simulate human conversation, and are often implemented to field simple customer queries online. If you\u2019re interested in natural language processing, try creating a chatbot with Python\u2019s NLTK library. <\/p>\n\n\n\n<p>First, you\u2019ll need to compile a list of queries and their corresponding responses for the chatbot. Next, you\u2019ll run the program and try out your queries with the chatbot. Once you\u2019re satisfied with your baseline chatbot, you can use additional Python NLP packages or add more queries. To get started, take a look at Aman Kharwal\u2019s <a href=\"https:\/\/thecleverprogrammer.com\/2021\/03\/25\/chatbot-using-python\/\" target=\"_blank\" rel=\"noreferrer noopener\">guide to creating a chatbot with Python<\/a>. 
<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Image Recognition<\/h4>\n\n\n\n<p>If you\u2019re curious about computer vision, check out <a href=\"https:\/\/www.kaggle.com\/c\/digit-recognizer\/overview\/description\" target=\"_blank\" rel=\"noreferrer noopener\">this Kaggle competition<\/a>, which invites participants to build a digit recognizer using the classic MNIST dataset of handwritten numbers. The MNIST dataset\u2014commonly referred to as the &#8220;Hello World&#8221; of machine learning\u2014comes equipped with pre-extracted features, which will streamline your data processing. Overall, this competition is an excellent introduction to simple <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/rnn-vs-cnn\/\" target=\"_blank\" data-type=\"URL\" data-id=\"https:\/\/www.springboard.com\/blog\/data-science\/rnn-vs-cnn\/\" rel=\"noreferrer noopener\">neural networks<\/a>, computer vision fundamentals, and classification methods like SVMs (Support Vector Machines) and K-nearest neighbors. <\/p>\n\n\n\n<p>The competition also includes links to Python tutorials, as well as information about the details of the dataset (including previously applied algorithms and their levels of success). <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Python Recommendation Engine<\/h4>\n\n\n\n<p>Building a recommendation engine sounds like a difficult task for a beginner, but your code can be as simple or as complex as you\u2019d like. To create a basic content-based recommendation system, you\u2019ll just need to maintain a log of items a user has seen and liked and calculate the top-N most similar products that user has not yet seen. A simple collaborative filtering recommendation engine can be powered by a user-user similarity matrix that recommends items that similar users like. 
<\/p>\n\n\n\n<p>To learn more about building a Python recommendation engine, check out this <a href=\"https:\/\/www.kaggle.com\/gspmoreira\/recommender-systems-in-python-101\/notebook\" target=\"_blank\" rel=\"noreferrer noopener\">Kaggle notebook<\/a>, which explains how to implement collaborative filtering and content-based filtering in Python to generate personalized recommendations. The notebook explores these concepts using a <a href=\"https:\/\/www.kaggle.com\/gspmoreira\/articles-sharing-reading-from-cit-deskdrop\" target=\"_blank\" rel=\"noreferrer noopener\">rich, rare dataset<\/a> about article sharing.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Intermediate Machine Learning Projects<\/h3>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/11\/Intermediate-Machine-Learning-Projects.jpg\" alt=\"machine learning projects: Intermediate Machine Learning Projects\" class=\"wp-image-13589\"\/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Finding Frauds While Tracking Imbalanced Data<\/h4>\n\n\n\n<p>From banking via smartphones to AI-fueled stock price prediction, the financial sector embraces a cloud-based future. Thanks to a rising financial <a href=\"https:\/\/www.forbes.com\/sites\/steveculp\/2020\/08\/26\/why-banks-need-to-sharpen-their-focus-on-financial-crime\/?sh=6f1019fe5038\" target=\"_blank\" rel=\"noreferrer noopener\">crime rate<\/a>, the importance of AI-powered fraud detection is greater than ever. But because fraudulent financial interactions comprise only a small portion of the total number of financial transactions that occur daily, analysts must figure out how to reliably detect fraud with imbalanced data.  <\/p>\n\n\n\n<p>Fraud detection is a classification problem that deals with imbalanced data, meaning the issue to be predicted (fraud) is in the minority. 
As such, predictive models often struggle to generate real business value from imbalanced data, and sometimes results may be incorrect. <\/p>\n\n\n\n<p>To address the issue, you can implement three different strategies: <\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Oversampling <\/li>\n\n\n\n<li>Undersampling<\/li>\n\n\n\n<li>A combined approach <\/li>\n<\/ul>\n\n\n\n<p>A combined approach can strike a balance between precision and recall, but you may choose to prioritize one over the other depending on the demands of your project and your desired business outcomes. You can learn more about conducting fraud detection with imbalanced data <a href=\"https:\/\/mlopshowto.com\/detecting-financial-fraud-using-machine-learning-three-ways-of-winning-the-war-against-imbalanced-a03f8815cce9\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Market Basket Analysis<\/h4>\n\n\n\n<p>With this Kaggle dataset, you can deploy the Apriori algorithm to analyze and predict customer purchasing behaviors, otherwise known as Market Basket Analysis. Retailers often use this modeling technique to determine associations between items based on rules of conditional probability. <\/p>\n\n\n\n<p>As per the theory of Market Basket Analysis, if a customer buys a certain group of items, that customer is likely to buy related items as well. For example, purchasing baby formula often carries a correlation with buying diapers. This particular Kaggle dataset contains information about customers&#8217; grocery purchases. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Text Summary<\/h4>\n\n\n\n<p>Text summarization condenses a piece of text while preserving its meaning. Increasingly, text summarization is being automated with Natural Language Processing. Extractive text summarization uses a scoring function to identify and lift key pieces of text from a document and assemble them into an edited version of the original. 
Abstractive text summarization, however, uses advanced natural language processing techniques to generate a new, shorter version that conveys the same information. <\/p>\n\n\n\n<p>To create a text summarization system with machine learning, you\u2019ll need familiarity with Pandas, NumPy, and NLTK. You\u2019ll also need to use unsupervised learning algorithms like GloVe (developed at Stanford) for word representation. 
Find a step-by-step guide to text summarization system building <a href=\"https:\/\/thecleverprogrammer.com\/2020\/08\/24\/summarize-text-with-machine-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Black Friday Sales Prediction<\/h4>\n\n\n\n<p>Want to work on a <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/regression-vs-classification\/\" target=\"_blank\" data-type=\"URL\" data-id=\"https:\/\/www.springboard.com\/blog\/data-science\/regression-vs-classification\/\" rel=\"noreferrer noopener\">regression model<\/a> and expand your feature engineering skills? With this <a href=\"https:\/\/datahack.analyticsvidhya.com\/contest\/black-friday\/\" target=\"_blank\" rel=\"noreferrer noopener\">practice problem from Analytics Vidhya<\/a>, you can use retail sales data to make predictions about Black Friday sales. 
<\/p>\n\n\n\n<p>The dataset contains demographic information for customers (including age, gender, marital status, location, and more) as well as product details and total purchase amounts. A training data set and a testing data set are available. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Text Mining<\/h4>\n\n\n\n<p>From emails to social media posts, <a href=\"https:\/\/www.datamation.com\/big-data\/structured-vs-unstructured-data\/\" target=\"_blank\" rel=\"noreferrer noopener\">80% of extant text data<\/a> is unstructured. Text mining is a way to extract valuable insights from this type of raw data. The process of text mining transforms unstructured text data into a structured format, facilitating the identification of key patterns and relationships within data sets. <\/p>\n\n\n\n<p>To try your hand at text mining, experiment with <a href=\"https:\/\/www.csie.ntu.edu.tw\/~cjlin\/libsvmtools\/datasets\/multilabel.html#siam-competition2007\" target=\"_blank\" rel=\"noreferrer noopener\">these publicly available text data sets<\/a>, which are geared towards multi-label classification, an important aspect of natural language processing. The <a href=\"http:\/\/manikvarma.org\/downloads\/XC\/XMLRepository.html\" target=\"_blank\" rel=\"noreferrer noopener\">Extreme Classification Repository<\/a> that contains these data sets also provides resources that can be used to evaluate the performance of multi-label algorithms. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Million Song Analysis<\/h4>\n\n\n\n<p>Use this subset of the <a href=\"http:\/\/archive.ics.uci.edu\/ml\/datasets\/YearPredictionMSD\" target=\"_blank\" rel=\"noreferrer noopener\">Million Song Dataset<\/a> to predict the release year of a song from its audio features. The songs are primarily commercial Western tracks dating from 1922 to 2011, although the dataset does not include any audio\u2014it consists of derived features only. 
<\/p>\n\n\n\n<p>The core of the dataset is feature analysis and metadata related to each track. Song descriptions include values expressing danceability, loudness, duration of the track in seconds, and much more. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Movie Recommendation Engine<\/h4>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"667\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/shutterstock_1187976172.jpg\" alt=\"Movie Recommendation Engine\" class=\"wp-image-13750\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/shutterstock_1187976172.jpg 1000w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/shutterstock_1187976172-400x267.jpg 400w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/shutterstock_1187976172-768x512.jpg 768w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/shutterstock_1187976172-380x253.jpg 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/shutterstock_1187976172-700x467.jpg 700w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/shutterstock_1187976172-380x253.jpg 420w\" sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><\/figure>\n\n\n\n<p>Netflix uses collaborative filtering as part of its complex recommendation system, and with the <a href=\"https:\/\/grouplens.org\/datasets\/movielens\/1m\/\" target=\"_blank\" rel=\"noreferrer noopener\">MovieLens Dataset<\/a>, you can too! Collaborative filtering recommendation engines analyze user behavior\/preferences and similarities between users to predict what users will like.&nbsp;<\/p>\n\n\n\n<p>For example, if User A rated Spiderman, Batman Returns, and X-Men highly and User B gave high ratings to Batman Returns, X-Men, and Wonder Woman, a collaborative filtering algorithm would identify that both users enjoy superhero movies. 
Based on their shared behaviors, the system would recommend Wonder Woman to User A and Spiderman to User B. <\/p>\n\n\n\n<p>The MovieLens 1M Data Set contains 1,000,209 ratings of roughly 3,900 movies from 6,040 MovieLens users who joined MovieLens in 2000. The data set notes the genre of each film as well as the gender, occupation, age, and zip code of each user. You can learn more about building a movie recommendation engine with this data set <a href=\"https:\/\/medium.com\/analytics-vidhya\/implementation-of-a-movies-recommender-from-implicit-feedback-6a810de173ac\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Advanced Machine Learning Projects<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Catching Crooks on the Hook Using Geo-Mapping and Cloud Computing<\/h4>\n\n\n\n<p><a href=\"https:\/\/globalfishingwatch.org\/press-release\/technology-collaboration\/\" target=\"_blank\" rel=\"noreferrer noopener\">Global Fishing Watch<\/a> is a website launched by Google in partnership with environmental nonprofits to monitor commercial fishing activities worldwide, with the goal of reducing overfishing, illegal fishing, and marine habitat destruction. <\/p>\n\n\n\n<p>Global Fishing Watch identifies and tracks illegal fishing activity by harvesting GPS data from ships and processing it, along with other important information, with neural networks. Each day, 60 million data points are harvested from 300,000+ vessels, and the website\u2019s algorithm has learned to classify these ships by type (sail, cargo, or fishing), fishing gear (trawl, longline, purse seine), and fishing behaviors (where and when a vessel is active). <\/p>\n\n\n\n<p>Global Fishing Watch shares vessel tracking information publicly, meaning anyone can download the website\u2019s data and even track commercial fishing activity in real time. To warm up, see if you can use supervised classification to determine whether a vessel is fishing. 
<\/p>\n\n\n\n<p>Download Global Fishing Watch datasets and find links to GitHub documentation and details <a href=\"https:\/\/globalfishingwatch.org\/datasets-and-code\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Uber Helpful Customer Support Using Deep Learning<\/h4>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"329\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/Uber-COTA.png\" alt=\"\" class=\"wp-image-13745\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/Uber-COTA.png 768w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/Uber-COTA-400x171.png 400w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/Uber-COTA-380x163.png 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/Uber-COTA-700x300.png 700w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/Uber-COTA-380x163.png 420w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><figcaption class=\"wp-element-caption\">Source: <a href=\"https:\/\/eng.uber.com\/cota\/\" target=\"_blank\" rel=\"noreferrer noopener\">Uber Engineering<\/a><\/figcaption><\/figure>\n\n\n\n<p>To resolve customer issues with efficiency and ease, Uber developed a machine learning tool called COTA (Customer Obsession Ticket Assistant) to process customer support tickets using \u201chuman-in-the-loop\u201d model architecture. Essentially, COTA uses machine learning and natural language processing techniques to rank tickets, identify ticket issues, and suggest solutions.<\/p>\n\n\n\n<p>This project is great inspiration for anyone interested in applied machine learning and actual implementation. Uber also used A\/B testing to evaluate two versions of their COTA model to assess impacts on ticket handling time, customer satisfaction, and revenue. 
Consider <a href=\"https:\/\/eng.uber.com\/cota-v2\/\" target=\"_blank\" rel=\"noreferrer noopener\">learning more about COTA<\/a> if you\u2019re interested in deep learning projects that combine clever technical architecture with human input. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Barbie With Brains Using Deep Learning Algorithms<\/h4>\n\n\n\n<p>Talking dolls that regurgitate pre-recorded phrases are nothing new\u2014but what if dolls could actually listen and respond to kids? Enter <a href=\"http:\/\/hellobarbiefaq.mattel.com\/wp-content\/uploads\/2015\/12\/hellobarbie-faq-v3.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Hello Barbie<\/a>. <\/p>\n\n\n\n<p>To create Hello Barbie, Mattel used natural language processing and advanced audio analytics that enabled the doll to interact logically in conversation. With the push of a button cleverly integrated into her outfit, Hello Barbie was able to record conversations and upload them to servers operated by ToyTalk, where the data was analyzed. <\/p>\n\n\n\n<p>While some were excited that the doll could learn about users over time, Hello Barbie was met with public backlash around <a href=\"https:\/\/www.cnbc.com\/2021\/07\/11\/future-ai-toys-may-be-smarter-than-parents-and-less-protective.html\" target=\"_blank\" rel=\"noreferrer noopener\">privacy concerns<\/a> and eventually was discontinued. While this application of natural language processing proved contentious, those interested in complex deep learning architectures might find inspiration in the mechanics of the project. 
<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Netflix Artwork Personalization Using Artificial Intelligence<\/h4>\n\n\n\n<p>Netflix marshals sophisticated AI solutions to <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/machine-learning-personalization-netflix\/\" target=\"_blank\" data-type=\"URL\" data-id=\"https:\/\/www.springboard.com\/blog\/data-science\/machine-learning-personalization-netflix\/\" rel=\"noreferrer noopener\">personalize title recommendations for users<\/a>. But personalization at Netflix doesn\u2019t stop there\u2014the streaming behemoth also personalizes the artwork and imagery used to convey those titles to users.&nbsp;<\/p>\n\n\n\n<p>The goal is to show you what you like, so if you\u2019ve watched several movies starring Uma Thurman, you\u2019d be likely to see Pulp Fiction art featuring her instead of co-stars John Travolta or Samuel L. Jackson.<\/p>\n\n\n\n<p>To do so, Netflix uses a convolutional neural network that analyzes visual imagery. The company explains that they also rely on \u201ccontextual bandits,\u201d which continually work to determine which artwork gets better engagement. Find out more about how to harness machine learning for artwork personalization <a href=\"https:\/\/hackernoon.com\/how-you-can-use-the-same-powerful-machine-learning-netflix-uses-4079715a5ff8\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Myers-Briggs Personality Prediction<\/h4>\n\n\n\n<p>The Myers-Briggs Type Indicator is a popular personality test that divides people into 16 different personality types across 4 axes. 
With <a href=\"https:\/\/www.kaggle.com\/datasnaek\/mbti-type\" target=\"_blank\" rel=\"noreferrer noopener\">this Kaggle dataset<\/a>, you can evaluate the efficacy of the test and attempt to identify patterns related to personality type and writing style. Each row in this dataset contains a person\u2019s Myers-Briggs personality type along with examples of their writing. <\/p>\n\n\n\n<p>The dataset could potentially be used to evaluate the validity of the Myers-Briggs test as it relates to the analysis, prediction, or categorization of human behavior. For example, you could apply machine learning techniques to examine the test\u2019s ability to predict linguistic style and online behavior. Alternatively, try creating an algorithm that determines a subject\u2019s personality type based on their writing. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">YouTube Comment Analysis<\/h4>\n\n\n\n<p>If you want to analyze YouTube comments with natural language processing techniques, start by scraping your text data by leveraging a library like <a href=\"https:\/\/pypi.org\/project\/youtube-comment-scraper-python\/\" target=\"_blank\" rel=\"noreferrer noopener\">Youtube-Comment-Scraper-Python<\/a>, which fetches YouTube video comments using browser automation. <\/p>\n\n\n\n<p>With automated scraping, you\u2019ll be able to focus your energy on exploratory data analysis, feature engineering, and other more advanced steps in the standard natural language processing workflow. Consider using your data to explore sentiment analysis, topic modeling, or word clouds.&nbsp;<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Hate Speech Detection<\/h4>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/12\/shutterstock_1818129140-scaled.jpg\" alt=\"\" class=\"wp-image-13747\"\/><\/figure>\n\n\n\n<p>Are you concerned about the growing prevalence of hate speech online? 
Try training a hate speech detection model using Python. <\/p>\n\n\n\n<p>The United States does not have laws that prohibit hate speech, as the U.S. Supreme Court has ruled that criminalizing hate speech violates the constitutional right to free speech. However, the United Nations defines hate speech as communications that attack or use discriminatory language related to a person\u2019s religion, ethnicity, race, gender, and other identity markers. <\/p>\n\n\n\n<p>Using this definition of hate speech, you can develop a hate speech detection model using a dataset originally collected from Twitter. This interesting machine learning problem will revolve around sentiment classification. Learn more about creating a hate speech detection model with Python <a href=\"https:\/\/thecleverprogrammer.com\/2021\/07\/25\/hate-speech-detection-with-machine-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Tips To Generate Your Own Machine Learning Project Ideas<\/h2>\n\n\n\n<p>If you need to jumpstart your project ideation, here are a few tips to put you on the right track. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Pick an Idea That Excites You<\/h3>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/11\/Pick-an-Idea-That-Excites-You-scaled.jpg\" alt=\"machine learning projects: Pick an Idea That Excites You\" class=\"wp-image-13603\"\/><\/figure>\n\n\n\n<p>Create high-level concepts around your interests. If you\u2019re passionate about fair housing, for example, learn more about how housing authorities in California are using AI to analyze and plan <a href=\"https:\/\/brownschool.wustl.edu\/News\/Pages\/Brown-School-Connects-Artificial-Intelligence-to-Social-Work,-Public-Health-Through-Open-Classroom-and-New-Courses.aspx\" target=\"_blank\" rel=\"noreferrer noopener\">affordable housing strategies<\/a>. 
Then consider building your own model using <a href=\"https:\/\/www.huduser.gov\/portal\/pdrdatas_landing.html\" target=\"_blank\" rel=\"noreferrer noopener\">HUD<\/a> or <a href=\"https:\/\/www.census.gov\/topics\/housing\/data.html\" target=\"_blank\" rel=\"noreferrer noopener\">U.S. Census<\/a> datasets. <\/p>\n\n\n\n<p>If you\u2019re a movie buff and aspire to work in the streaming space, peruse the <a href=\"https:\/\/netflixtechblog.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Netflix Tech Blog<\/a> for inspiration and try building different types of recommendation systems powered by collaborative filtering, content filtering, or a hybrid model. <\/p>\n\n\n\n<p>Whatever topic you choose, solidify your most viable idea with a written proposal, which will serve as a blueprint to refer back to throughout the project.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Avoid Going Out of Scope<\/h3>\n\n\n\n<p><a href=\"https:\/\/medium.com\/codex\/how-to-scope-a-machine-learning-project-d74d4025e04c\" target=\"_blank\" rel=\"noreferrer noopener\">Scoping<\/a> is the first stage of machine learning project planning, and offers an opportunity to settle on a data question, identify your objective, and select the machine learning solutions you will harness to solve your problem. <\/p>\n\n\n\n<p>If you\u2019re new to machine learning, focus on simple projects. Pick a small, succinctly defined problem and research a large, relevant data set to increase the odds that your project will generate a positive return on investment. 
<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Test Your Hypothesis<\/h3>\n\n\n\n<p>In a machine learning context, <a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2021\/09\/hypothesis-testing-in-machine-learning-everything-you-need-to-know\/\" target=\"_blank\" rel=\"noreferrer noopener\">hypothesis testing<\/a> is conducted to confirm initial observations catalogued during data exploration and validate these assumptions for a desired significance level. First, you\u2019ll model your hypothesis, and then you\u2019ll select your hypothesis test type based on your predictor variable type (quantitative or categorical). Python is a beginner-friendly language for conducting hypothesis testing. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Implement the Results<\/h3>\n\n\n\n<p>Once you\u2019ve reached all the desired outcomes, you\u2019ll be ready to implement your project. This stage consists of several steps:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create an API (application programming interface). This allows you to integrate your machine learning insights into the product.<\/li>\n\n\n\n<li>Record results in a single database. Collating your results will allow you to build upon them more easily.<\/li>\n\n\n\n<li>Embed the code. If you\u2019re short on time, embedding the code is faster than building an API.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Revise and Learn<\/h3>\n\n\n\n<p>When you\u2019ve finished the project, evaluate your findings. Think about what happened, and why. What could you have done differently? 
As you gain experience, you will learn from your mistakes.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">ML Project Tips<\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/11\/ML-Project-Tips-scaled.jpg\" alt=\"ML Project Tips\" class=\"wp-image-13598\"\/><\/figure>\n\n\n\n<p>Both simple and complex machine learning projects should be well-organized, properly documented, and presented in an impactful way. No matter your level of expertise, here are some concrete steps you can take to make your project shine. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How To Organize a Machine Learning Project<\/h3>\n\n\n\n<p>Properly organizing your machine learning project will boost productivity, ensure reproducibility, and make your project more accessible to other machine learning engineers and <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/what-does-a-data-scientist-do\/\" target=\"_blank\" data-type=\"post\" data-id=\"24427\" rel=\"noreferrer noopener\">data scientists<\/a>. When organizing your project, be sure to: <\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Streamline your file structure. In your main project folder, create subfolders for notes, input files and data, src, models, and notebooks. If you\u2019re working on GitHub, don\u2019t forget to create a README.md file to introduce newcomers to your project. <\/li>\n\n\n\n<li>Manage data effectively. Use a consistent directory structure and do not directly modify raw data. Be sure to check the consistency of your data, and consider a build tool such as GNU Make to automate your data-processing steps. <\/li>\n\n\n\n<li>Keep your code clean. Be sure to provide thorough documentation and organize your code into functional, annotated units. <\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How To Start a Machine Learning Project<\/h3>\n\n\n\n<p>Ready to get started on your project? 
Here\u2019s how to begin: <\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Identify your problem. What problem are you looking to solve, and why? Figure out which AI solution you will use to address the problem. <\/li>\n\n\n\n<li>Acquire your data. Download open-source data or try your hand at web scraping. <\/li>\n\n\n\n<li>Prepare your data. You may need to clean your data to eliminate unnecessary data or features. You may also need to transform your data, particularly if it is unstructured. Finally, you may also choose to conduct exploratory data analysis to look for patterns that might inform your project.<\/li>\n<\/ul>\n\n\n\n<p>Once your data is prepared, you\u2019ll be ready to develop your machine learning model. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How To Measure, Review, and Document an ML Project<\/h3>\n\n\n\n<p>To evaluate your machine learning project, you\u2019ll need to use metrics to measure the performance of your model. The metrics you use will depend on your problem type. The performance of a classification model can be measured using metrics like accuracy, precision, recall, and F1 score. Specific evaluation metrics exist for regression, natural language processing, computer vision, deep learning, and other types of problems. <\/p>\n\n\n\n<p>Before signing off on your project, you\u2019ll need to review your work for quality assurance and reproducibility. To review your project, explain how you framed your question as a machine learning task and how you prepared your data. You should also compare your training, validation, and test metrics, and explain how you validated your model. Finally, you should note potential improvements and consider how you would deploy your model. <\/p>\n\n\n\n<p>To share your work with others, you\u2019ll also need to document your machine learning project. Your documentation should offer the information necessary to reproduce your work. 
It should clearly and succinctly outline the problem you addressed, your proposed machine learning solution, and evidence of the solution\u2019s success. Your project documentation should include: <\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An executive summary of the project<\/li>\n\n\n\n<li>Context and background information about the problem <\/li>\n\n\n\n<li>A list of data sources <\/li>\n\n\n\n<li>Model documentation<\/li>\n\n\n\n<li>Validation performance results <\/li>\n\n\n\n<li>An appendix with source code <\/li>\n<\/ul>\n\n\n\n<p>Once you\u2019ve evaluated, reviewed, and documented your project, you\u2019ll be able to show it to hiring managers. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How To Include a Machine Learning Project on Your Resume<\/h3>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/11\/How-To-Include-a-Machine-Learning-Project-on-Your-Resume-scaled.jpg\" alt=\"How To Include a Machine Learning Project on Your Resume\" class=\"wp-image-13602\"\/><\/figure>\n\n\n\n<p>Your machine learning resume should highlight what you can do for your employer. When adding a project to your resume, <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/machine-learning-resume\/\" target=\"_blank\" data-type=\"URL\" data-id=\"https:\/\/www.springboard.com\/blog\/data-science\/machine-learning-resume\/\" rel=\"noreferrer noopener\">use it to demonstrate<\/a> how you would create business value in your new role.<\/p>\n\n\n\n<p>Frame your contributions to a project as accomplishments while using numbers and key metrics to convey your successes. Be sure to include the project title and a link to the project itself. 
After briefly describing the project, note the tools, programs, and skills used\u2014and emphasize any that overlap with those detailed in the description of your desired role.<\/p>\n\n\n\n<p class=\"rm has-background\" style=\"background-color:#efeff6\"><strong>Since you\u2019re here\u2026<br><\/strong>Curious about a career in data science? Experiment with our <a rel=\"noreferrer noopener\" href=\"https:\/\/www.springboard.com\/resources\/guides\/data-science-process\/\" target=\"_blank\">free data science learning path<\/a>, or join our <a rel=\"noreferrer noopener\" href=\"https:\/\/www.springboard.com\/courses\/data-science-career-track\/\" target=\"_blank\">Data Science Bootcamp<\/a>, where you\u2019ll get your tuition back if you don&#8217;t land a job after graduating. We\u2019re confident because our courses work \u2013 check out our <a rel=\"noreferrer noopener\" href=\"https:\/\/www.springboard.com\/success\/\" target=\"_blank\">student success stories<\/a> to get inspired.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>While theoretical machine learning knowledge is important, hiring managers value production engineering skills above all when looking to fill a machine learning role. To become job-ready, aspiring machine learning engineers must build applied skills through project-based learning. 
Machine learning projects can help reinforce different technical concepts and can be used to showcase a dynamic skill [&hellip;]<\/p>\n","protected":false},"author":100,"featured_media":13749,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_eb_attr":"","_eb_data_table":"","footnotes":""},"categories":[67],"tags":[],"marketing_tags":[],"class_list":{"0":"post-6346","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science"},"acf":[],"_links":{"self":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/6346"}],"collection":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/users\/100"}],"replies":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/comments?post=6346"}],"version-history":[{"count":3,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/6346\/revisions"}],"predecessor-version":[{"id":48317,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/6346\/revisions\/48317"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/media\/13749"}],"wp:attachment":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/media?parent=6346"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/categories?post=6346"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/tags?post=6346"},{"taxonomy":"marketing_tags","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/marketing_tags?post=6346"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":t
rue}]}}