Springboard Tutorial: Intro to AWS

8 minute read | January 24, 2021
Leah Davidson



When will demand for a certain product spike during the holiday season? Can we detect fraudulent ordering patterns? How do we reduce wait time in the call center?

These are all questions machine learning (ML) engineers can answer with Amazon Web Services (AWS), a comprehensive cloud computing platform whose ML services build scalable prediction models from data stored in Amazon S3, Redshift, and the Relational Database Service (RDS), which hosts MySQL among other database engines.

Want to learn more about becoming a machine learning engineer? Check out Springboard’s comprehensive career guide here.

Getting started with AWS

Amazon Machine Learning (ML) helps you to create and train models using easy-to-learn APIs. When building an ML algorithm, engineers usually follow a few standard steps: collect and transform data, split the data into training and validation sets, find a relevant model, and train the model on the data to retrieve results.
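To make the split step concrete, here is a minimal sketch in plain Python. The 70/30 ratio, the function name `split_records`, and the sample rows are assumptions chosen for illustration; they are not settings taken from Amazon ML.

```python
import random


def split_records(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records, then cut it into training and validation sets."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical rows of (feature value, label)
rows = [(i, i % 2) for i in range(10)]
train, validation = split_records(rows)
print(len(train), "training rows,", len(validation), "validation rows")
```

Shuffling before the cut matters when the source file is sorted (for example by date or by label), since an unshuffled split could leave the validation set unrepresentative.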

Here are some quick steps from Knowledge Hut to begin using AWS Machine Learning:

  1. Sign in to AWS and select “Machine Learning.” Launch with Standard Setup.
  2. Choose and format a data source. Data sources do not actually store the data, but they provide a reference to the Amazon S3 location holding the input data.
  3. Create a machine learning model. In “Training and Evaluation” settings, choose the default mode to use Amazon ML’s recommended recipe, training parameters, and evaluation settings.
  4. Build a prediction. Under “ML Model Report,” the option “Try real-time predictions” gives you the opportunity to quickly create prediction results.
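The console steps above can also be scripted. The sketch below only builds the keyword arguments you would pass to the boto3 `machinelearning` client's `create_data_source_from_s3` and `create_ml_model` calls; it does not contact AWS. The bucket paths and IDs are hypothetical placeholders, and the helper names are made up for this example.

```python
def datasource_request(datasource_id, s3_data_uri, s3_schema_uri):
    """Arguments for create_data_source_from_s3 (step 2 above)."""
    return {
        "DataSourceId": datasource_id,
        "DataSpec": {
            # The datasource only references the S3 location; it does not copy the data.
            "DataLocationS3": s3_data_uri,
            "DataSchemaLocationS3": s3_schema_uri,
        },
        # Statistics must be computed if the datasource will be used for training.
        "ComputeStatistics": True,
    }


def model_request(model_id, datasource_id, model_type="BINARY"):
    """Arguments for create_ml_model (step 3 above)."""
    if model_type not in {"BINARY", "MULTICLASS", "REGRESSION"}:
        raise ValueError("unknown model type: " + model_type)
    return {
        "MLModelId": model_id,
        "MLModelType": model_type,
        "TrainingDataSourceId": datasource_id,
    }


# Hypothetical identifiers and paths, for illustration only.
ds_kwargs = datasource_request(
    "ds-example", "s3://example-bucket/data.csv", "s3://example-bucket/data.csv.schema"
)
model_kwargs = model_request("ml-example", "ds-example")

# With credentials configured, the calls would look like this:
# import boto3
# client = boto3.client("machinelearning")
# client.create_data_source_from_s3(**ds_kwargs)
# client.create_ml_model(**model_kwargs)
print(model_kwargs)
```

Leaving out the recipe and training parameters mirrors the "default mode" in step 3, where Amazon ML picks its recommended settings.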

AWS tools and services

AWS Machine Learning includes foundational tools like SageMaker, frameworks (including TensorFlow, PyTorch, and Apache MXNet), infrastructure (EC2, Elastic Inference, and AWS Inferentia), and learning tools (AWS DeepLens, DeepComposer, and DeepRacer).

Here are some top services and tools to round out your skillset and apply your learnings in real-life settings:

SageMaker Studio is a state-of-the-art platform that allows you to wrangle, label, and process data; build your own models with Jupyter notebooks; refine them through training and tuning procedures; and deploy to the cloud in a single click. This fully integrated development environment (IDE) detects biases and ensures security through features like encryption, authorization, and authentication. This Medium post provides a deep-dive into SageMaker and its pros and cons.

Amazon Lex provides automatic speech recognition (ASR) and helps to add conversational interfaces to applications. You can use the Lex chatbot to add automated options to a call center or mobile application, so customers don’t need to wait to speak to a human agent to change their password or cancel a flight.

Amazon Textract extracts data and text from scanned documents and works on inputs ranging from handwriting to PDFs, forms, and tables. Features include optical character recognition (OCR) and an optional built-in human review workflow, which selects random documents for manual review.

Amazon Comprehend is used for natural language processing (NLP) to identify references to a given topic in a series of text files, which Amazon often stores in an S3 data lake. Comprehend can synthesize customer feedback from product reviews, recommend news content to readers based on past history, classify customer support tickets for more efficient processing, and recruit participants to the right medical trials based on cohort analysis.

Amazon Rekognition detects objects and faces in images and develops neural network models to label items in a visual search. Through Custom Labels, you can input objects or scenes relevant to your business. Rekognition then builds the appropriate models to accomplish defined tasks, like identifying household items in pictures, finding a product logo on a store shelf, flagging inappropriate content for families with children, deciphering human emotion based on facial expressions, interpreting hard-to-read text (billboards or handwritten signs), and recognizing celebrities or brand influencers.

AWS Inferentia is a machine learning inference chip designed to deliver high throughput and low latency, making it easier to integrate ML into applications.

Amazon Translate is a neural machine translation service that uses NLP to convert text from one language to another. Ranked as the top machine translation provider of 2020, Amazon Translate can help create websites that appear automatically in a given user’s native language, understand brand sentiment through social media analytics, and scale customer support by translating helpdesk queries into a common language.

Working with SageMaker, EC2, and ECS, Amazon Elastic Inference allows you to attach GPU-powered acceleration to reduce inference costs. It can operate with TensorFlow, Apache MXNet, PyTorch, and ONNX models.

AWS DeepLens is a deep learning-enabled video camera that can analyze on-camera action and sync with SageMaker for training models, Polly for speech enablement, and Rekognition for image analysis. You can perform tasks like recognizing different types of daily activities or detecting head and facial movements.

AWS DeepComposer is a virtual keyboard that can generate music from pre-programmed genres like country, pop, and jazz, which you can then share via SoundCloud.

AWS DeepRacer is a 3D racing simulator that allows you to compete in the AWS DeepRacer League, the world’s first global autonomous racing league with real prizes. The tiny racing car comes with cameras and sensors, creating an environment conducive to experimenting with reinforcement learning and neural network configurations.

Advantages of AWS for machine learning

There are many benefits to using AWS for machine learning needs. AWS is cost-effective, with API integrations and support for TensorFlow (a very popular open source library among data scientists), Caffe2, and Apache MXNet. With AWS, you only pay for what you use and most organizations are billed an hourly rate for the compute time and then for the number of predictions generated. There’s also an AWS Free Tier, which provides access to various machine learning resources for a limited time (usually up to a year).

Although Amazon ML is an easy solution for companies already within the AWS ecosystem because of its interoperability, it does have more limited offerings than players like Microsoft and targets software developers instead of corporate data scientists.

Here’s a solid overview of the pros and cons of AWS vs Azure vs Google Cloud Platform. AWS is known to be the most reliable of the cloud options, with S3 offering eleven nines of durability (99.999999999%) and EC2 guaranteeing 99.99% uptime. Depending on the server types and whether discounted pricing is used, the main cloud-based providers vary in cost.
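To put the durability and uptime figures into perspective, the short calculation below converts a count of "nines" into everyday numbers. It is back-of-the-envelope arithmetic only; the ten million stored objects is an assumed workload, and the function names are made up for this sketch.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def failure_fraction(nines):
    """Convert a count of nines (4 means 99.99%) into the fraction that may fail."""
    return 10.0 ** -nines


def downtime_minutes_per_year(nines):
    """Downtime per year that an uptime figure with this many nines still permits."""
    return failure_fraction(nines) * MINUTES_PER_YEAR


def expected_objects_lost_per_year(object_count, nines):
    """Average number of objects lost per year at the given durability."""
    return object_count * failure_fraction(nines)


# 99.99% uptime is four nines.
print(round(downtime_minutes_per_year(4), 2), "minutes of downtime per year")

# 99.999999999% durability is eleven nines; 10 million objects is an assumed workload.
print(expected_objects_lost_per_year(10_000_000, 11), "objects lost per year on average")
```

Under these assumptions, four nines of uptime allows a little under an hour of downtime per year, while eleven nines of durability on ten million objects works out to roughly one lost object every ten thousand years.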

Practice with these hands-on examples

Practice makes perfect and there are many free resources to give you hands-on experience with AWS for ML. Here are some examples of problems you can solve!

Convert the input data with Python (a widely used programming language in data science), and use the boto3 library to generate online predictions. The dataset contains 10,000 records from smartphone sensors, with each activity encoded as a number (1 = walking, 2 = walking upstairs, 3 = walking downstairs, 4 = sitting, 5 = standing, 6 = lying down). The model’s evaluation matrix reports an F-score (a metric from 0 to 1 that combines precision and recall), and the diagonal of the matrix shows how often each activity is classified correctly.
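If you want to see how the numbers in such a matrix turn into precision, recall, and F-score, the sketch below works through a small example. The three activities and their counts are invented for illustration and are not results from the dataset described above.

```python
def per_class_scores(confusion, labels):
    """Compute (precision, recall, F1) per class.

    confusion[i][j] is the number of records whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    scores = {}
    for k, label in enumerate(labels):
        true_positive = confusion[k][k]              # diagonal entry
        predicted_as_k = sum(row[k] for row in confusion)  # column total
        actually_k = sum(confusion[k])               # row total
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Made-up counts: rows are true activities, columns are predicted activities.
labels = ["walking", "sitting", "standing"]
confusion = [
    [9, 1, 0],
    [0, 6, 4],
    [0, 2, 8],
]

for label, (precision, recall, f1) in per_class_scores(confusion, labels).items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this invented matrix, most of the off-diagonal counts sit between sitting and standing, which is the kind of confusion a strong diagonal helps you rule out at a glance.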

  • Apply Amazon ML to predict how customers will respond to a marketing offer

You can look at historical customers who have bought products similar to the bank term deposit and build a model that targets the top 3% of customers most likely to take up the offer.
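Once a model has scored each customer, picking the top 3% is a ranking step. Here is a minimal sketch, assuming the model returns one score per customer where higher means more likely to accept; the customer IDs and scores are generated purely for illustration.

```python
def top_percent(scored_customers, percent=3):
    """Return the highest-scoring customers, covering `percent` of the list (rounded up)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    count = (len(scored_customers) * percent + 99) // 100
    ranked = sorted(scored_customers, key=lambda pair: pair[1], reverse=True)
    return ranked[:count]


# Hypothetical (customer id, predicted score) pairs
customers = [(f"customer-{i}", i / 200) for i in range(200)]
targets = top_percent(customers)
print(len(targets), "customers selected; best score:", targets[0][1])
```

Ranking by score instead of applying a fixed cut-off keeps the campaign size predictable even if the model's scores shift between training runs.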

  • Take a free deep-dive Coursera course on how to use Amazon SageMaker and Jupyter Notebooks

This also covers Amazon Comprehend, Translate, and DeepLens, with practice exercises.

  • Create a custom bot for booking car reservations with Amazon Lex

You can train the chatbot on some sample utterances like “Book a Car” or “Make a Reservation” that trigger the booking workflow and specify different response cards.


You can upload a sample image to Amazon Rekognition and detect emotions like happy, confused, and calm, each with its own confidence score. You can also check to see if the people in other images correspond with a given reference image.
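When you call Rekognition from code, the emotions come back as a list per detected face. The sketch below picks the strongest emotion for each face from a response shaped like the output of the `detect_faces` call; the sample response is hand-written for illustration, and the 50% threshold is an arbitrary assumption.

```python
def dominant_emotions(response, min_confidence=50.0):
    """Return (emotion, confidence) for each face whose top emotion clears the threshold."""
    results = []
    for face in response.get("FaceDetails", []):
        emotions = face.get("Emotions", [])
        if not emotions:
            continue
        best = max(emotions, key=lambda emotion: emotion["Confidence"])
        if best["Confidence"] >= min_confidence:
            results.append((best["Type"], best["Confidence"]))
    return results


# Hand-written sample in the shape of a detect_faces response (not real output).
sample_response = {
    "FaceDetails": [
        {"Emotions": [
            {"Type": "HAPPY", "Confidence": 92.5},
            {"Type": "CALM", "Confidence": 5.0},
        ]},
        {"Emotions": [
            {"Type": "CONFUSED", "Confidence": 40.0},
            {"Type": "CALM", "Confidence": 35.0},
        ]},
    ]
}

print(dominant_emotions(sample_response))
```

The second face is dropped here because none of its emotions reaches the threshold, which is a simple way to avoid acting on low-confidence guesses.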

  • Extract text from a Base64 image, S3 bucket image, and S3 bucket document with AWS Textract

Do this using AWS Lambda and Python. This will eventually allow you to generate a new txt file based on text extracted from a given PDF.
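As a rough sketch of the Python side, the helpers below prepare the `Document` argument for the Textract `detect_document_text` call (from either a Base64 string or an S3 object) and pull the detected lines out of a response. The bucket, file name, and sample response are hypothetical, and the actual AWS call is left commented out so the example runs without credentials.

```python
import base64


def document_from_base64(encoded_image):
    """Build the Document argument from an image supplied as a Base64 string."""
    return {"Bytes": base64.b64decode(encoded_image)}


def document_from_s3(bucket, key):
    """Build the Document argument for a file stored in an S3 bucket."""
    return {"S3Object": {"Bucket": bucket, "Name": key}}


def extract_lines(response):
    """Collect the text of every LINE block, in the order Textract returned them."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


# Inside a Lambda handler, the call would look roughly like this:
# import boto3
# response = boto3.client("textract").detect_document_text(
#     Document=document_from_s3("example-bucket", "scan.png")
# )

# Hand-written sample in the shape of a Textract response (not real output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice 2021"},
        {"BlockType": "WORD", "Text": "Invoice"},
        {"BlockType": "WORD", "Text": "2021"},
        {"BlockType": "LINE", "Text": "Total due: 40"},
    ]
}

print("\n".join(extract_lines(sample_response)))
```

Joining the lines with newlines gives you the contents for the txt file mentioned above; writing it back to S3 would be a separate step.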

Create a new scene and a main entity. You can then change the “Script” to “Speech,” edit in Text Editor, and write code to associate the host voice ID with the right language code.

Build a model with Amazon Comprehend that extracts information on required skills, degrees, and majors from job descriptions by employing Named Entity Recognition (NER). Afterward, you can employ the UBIAI annotation tool to annotate job descriptions and train a model using the Custom Entity Recognizer (CER), so that Comprehend can automatically extract skills, diploma, and diploma major from an inputted job description.

Glossary: key AWS concepts

Here is some key terminology to help ML engineers better leverage the AWS tools.

Datasource: A datasource holds metadata about your input data rather than the data itself. When you point Amazon ML at your input, it records the attributes (unique, named properties such as cost, distance, color, and size, which often correspond to the column headings of a CSV file or spreadsheet) and computes descriptive statistics for them. You can then use the datasource to build models that find patterns in the data.

Models: Let’s take a look at the different types of models that AWS can build for machine learning:

  • A binary classification model leading to one of two results (yes or no)
  • A multi-class classification model that can predict different conditions (purple, green, red, blue, yellow)
  • A regression model that results in a numeric value (e.g. how many burgers will the average customer order at a restaurant on a single visit?).

You will need to decide on the model size (more patterns = larger model), number of passes (determines how many times Amazon ML can run the same data records), and regularization (getting the model complexity right to avoid overfitting).
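To get a feel for what passes and regularization do, here is a toy one-feature logistic regression trained with stochastic gradient descent. It is a from-scratch sketch, not Amazon ML's own training algorithm, and the data points, learning rate, and penalty strength are assumptions chosen for illustration.

```python
import math


def train_logistic(records, passes=10, l2=0.0, learning_rate=0.1):
    """Fit weight and bias for a one-feature logistic model.

    records: list of (x, y) pairs with y equal to 0 or 1.
    passes: number of full sweeps over the records.
    l2: strength of the L2 penalty, which pulls the weight toward zero.
    """
    weight, bias = 0.0, 0.0
    for _ in range(passes):
        for x, y in records:
            prediction = 1.0 / (1.0 + math.exp(-(weight * x + bias)))
            error = prediction - y
            # Gradient of the log loss plus the gradient of the L2 penalty.
            weight -= learning_rate * (error * x + l2 * weight)
            bias -= learning_rate * error
    return weight, bias


# Made-up data: negative x values belong to class 0, positive ones to class 1.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

few_passes, _ = train_logistic(data, passes=5)
many_passes, _ = train_logistic(data, passes=50)
regularized, _ = train_logistic(data, passes=50, l2=0.5)

print("5 passes:", round(few_passes, 3))
print("50 passes:", round(many_passes, 3))
print("50 passes with L2:", round(regularized, 3))
```

More passes let the weight keep growing on this cleanly separable data, while the L2 penalty holds it back, which is the trade-off behind choosing regularization to avoid overfitting.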

Evaluations: To understand how well your model is performing, you’ll need to familiarize yourself with terms like AUC (area under the ROC curve), macro-averaged F1-score, Root Mean Square Error (RMSE), cut-off, accuracy, precision, and recall. These metrics assess the quality of your model and how accurately it will predict outcomes.

Batch Predictions: Batch predictions generate predictions for a whole set of observations at once, rather than one at a time.
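Two of these metrics are simple enough to compute by hand. The sketch below shows RMSE for a regression model and AUC for a binary model, with AUC computed as the share of positive/negative pairs that the scores rank in the right order. The sample values are invented for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted numeric values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking; tied scores count as half."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented regression example
print("RMSE:", round(rmse([3, 5, 7], [2, 5, 9]), 3))

# Invented binary example: one positive is scored below a negative
print("AUC:", auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

An AUC of 0.5 means the scores rank positives no better than chance, and 1.0 means every positive outranks every negative.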

Real-Time Predictions: With real-time predictions, you can send a request and ask for an immediate response. The Real-Time Prediction API may be well suited for web, mobile, or desktop applications with low latency requirements.
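The difference between the two modes shows up in what you send. This sketch builds the keyword arguments for the boto3 `machinelearning` client's `predict` call (one record, immediate answer) and `create_batch_prediction` call (a whole datasource, results written to S3). All IDs, URLs, and field names below are hypothetical, and nothing here contacts AWS.

```python
def realtime_request(model_id, endpoint_url, record):
    """Arguments for predict: one observation in, one prediction back."""
    return {
        "MLModelId": model_id,
        # Record values are sent as strings, so convert numbers first.
        "Record": {name: str(value) for name, value in record.items()},
        "PredictEndpoint": endpoint_url,
    }


def batch_request(batch_id, model_id, datasource_id, output_s3_uri):
    """Arguments for create_batch_prediction: score every row in a datasource."""
    return {
        "BatchPredictionId": batch_id,
        "MLModelId": model_id,
        "BatchPredictionDataSourceId": datasource_id,
        "OutputUri": output_s3_uri,
    }


# Hypothetical values, for illustration only.
single = realtime_request(
    "ml-example",
    "https://realtime.example.invalid",
    {"age": 42, "job": "technician"},
)
batch = batch_request("bp-example", "ml-example", "ds-new-customers", "s3://example-bucket/output/")

# With credentials configured, the calls would look like this:
# import boto3
# client = boto3.client("machinelearning")
# client.predict(**single)
# client.create_batch_prediction(**batch)
print(single["Record"])
```

A real-time request needs an endpoint created for the model beforehand, whereas a batch request only needs a datasource pointing at the rows to score.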

Companies are no longer just collecting data. They’re seeking to use it to outpace competitors, especially with the rise of AI and advanced analytics techniques. Between organizations and these techniques are the data scientists – the experts who crunch numbers and translate them into actionable strategies. The future, it seems, belongs to those who can decipher the story hidden within the data, making the role of data scientists more important than ever.

In this article, we’ll look at 10 careers in data science, analyzing the roles and responsibilities and how best to land each job. Whether you’re more drawn to the creative side or interested in the strategic planning side of data architecture, there’s a niche for you.

Is Data Science A Good Career?

Yes. Besides being a field that comes with competitive salaries, the demand for data scientists continues to increase as they have an enormous impact on their organizations. It’s an interdisciplinary field that keeps the work varied and interesting.

10 Data Science Careers To Consider

Whether you want to change careers or land your first job in the field, here are 10 of the most lucrative data science careers to consider.

Data Scientist

Data scientists represent the foundation of the data science department. At the core of their role is the ability to analyze and interpret complex digital data, such as usage statistics, sales figures, logistics, or market research – all depending on the field they operate in.

They combine their computer science, statistics, and mathematics expertise to process and model data, then interpret the outcomes to create actionable plans for companies. 

General Requirements

A data scientist’s career starts with a solid mathematical foundation, whether it’s interpreting the results of an A/B test or optimizing a marketing campaign. Data scientists should have programming expertise (primarily in Python and R) and strong data manipulation skills. 

Although a university degree is not always required, data scientists typically need on-the-job experience along with data science courses and certifications that demonstrate their expertise and willingness to learn.

Average Salary

The average salary of a data scientist in the US is $156,363 per year.

Data Analyst

A data analyst explores the nitty-gritty of data to uncover patterns, trends, and insights that are not always immediately apparent. They collect, process, and perform statistical analysis on large datasets and translate numbers and data to inform business decisions.

A typical day in their life can involve using tools like Excel or SQL and more advanced reporting tools like Power BI or Tableau to create dashboards and reports or visualize data for stakeholders. With that in mind, they have a unique skill set that allows them to act as a bridge between an organization’s technical and business sides.

General Requirements

To become a data analyst, you should have basic programming skills and proficiency in several data analysis tools. A lot of data analysts turn to specialized courses or data science bootcamps to acquire these skills. 

For example, Coursera offers courses like Google’s Data Analytics Professional Certificate or IBM’s Data Analyst Professional Certificate, which are well-regarded in the industry. A bachelor’s degree in fields like computer science, statistics, or economics is standard, but many data analysts also come from diverse backgrounds like business, finance, or even social sciences.

Average Salary

The average base salary of a data analyst is $76,892 per year.

Business Analyst

Business analysts often have an essential role in an organization, driving change and improvement. That’s because their main role is to understand business challenges and needs and translate them into solutions through data analysis, process improvement, or resource allocation. 

A typical day as a business analyst involves conducting market analysis, assessing business processes, or developing strategies to address areas of improvement. They use a variety of tools and methodologies, like SWOT analysis, to evaluate business models and their integration with technology.

General Requirements

Business analysts often have related degrees, such as bachelor’s degrees in Business Administration, Computer Science, or IT. Some roles might require or favor a master’s degree, especially in more complex industries or corporate environments.

Employers also value a business analyst’s knowledge of project management principles like Agile or Scrum and the ability to think critically and make well-informed decisions.

Average Salary

A business analyst can earn an average of $84,435 per year.

Database Administrator

The role of a database administrator is multifaceted. Their responsibilities include managing an organization’s database servers and application tools. 

A DBA manages, backs up, and secures the data, making sure the database is available to all the necessary users and is performing correctly. They are also responsible for setting up user accounts and regulating access to the database. DBAs need to stay updated with the latest trends in database management and seek ways to improve database performance and capacity. As such, they collaborate closely with IT and database programmers.

General Requirements

Becoming a database administrator typically requires a solid educational foundation, such as a bachelor’s degree in a data science-related field. Nonetheless, it’s not all about the degree because real-world skills matter a lot. Aspiring database administrators should learn database languages, with SQL being the key player. They should also get their hands dirty with popular database systems like Oracle and Microsoft SQL Server.

Average Salary

Database administrators earn an average salary of $77,391 annually.

Data Engineer

Successful data engineers construct and maintain the infrastructure that allows data to flow seamlessly. Besides understanding data ecosystems day to day, they build and oversee the pipelines that gather data from various sources, making it more accessible for those who need to analyze it (e.g., data analysts).

General Requirements

Data engineering is a role that demands not just technical expertise in tools like SQL, Python, and Hadoop but also a creative problem-solving approach to tackle the complex challenges of managing massive amounts of data efficiently. 

Usually, employers look for credentials like university degrees or advanced data science courses and bootcamps.

Average Salary

Data engineers earn a whopping average salary of $125,180 per year.

Database Architect

A database architect’s main responsibility involves designing the entire blueprint of a data management system, much like an architect who sketches the plan for a building. They lay down the groundwork for an efficient and scalable data infrastructure. 

Their day-to-day work is a fascinating mix of big-picture thinking and intricate detail management. They decide how data is stored, consumed, integrated, and managed by different business systems.

General Requirements

If you’re aiming to excel as a database architect but don’t necessarily want to pursue a degree, you could start honing your technical skills. Become proficient in database systems like MySQL or Oracle, and learn data modeling tools like ERwin. Don’t forget programming languages – SQL, Python, or Java. 

If you want to take it one step further, pursue a credential like the Certified Data Management Professional (CDMP) or the Data Science Bootcamp by Springboard.

Average Salary

Data architecture is a very lucrative career. A database architect can earn an average of $165,383 per year.

Machine Learning Engineer

A machine learning engineer experiments with various machine learning models and algorithms, fine-tuning them for specific tasks like image recognition, natural language processing, or predictive analytics. Machine learning engineers also collaborate closely with data scientists and analysts to understand the requirements and limitations of data and translate these insights into solutions. 

General Requirements

As a rule of thumb, machine learning engineers must be proficient in programming languages like Python or Java, and be familiar with machine learning frameworks like TensorFlow or PyTorch. To successfully pursue this career, you can either pursue a degree or enroll in courses and follow a self-study approach.

Average Salary

Depending heavily on the company’s size, machine learning engineers can earn between $125K and $187K per year, making this one of the highest-paying AI careers.

Quantitative Analyst

Quantitative analysts are essential for financial institutions, where they apply mathematical and statistical methods to analyze financial markets and assess risks. They are the brains behind complex models that predict market trends, evaluate investment strategies, and assist in making informed financial decisions.

They often deal with derivatives pricing, algorithmic trading, and risk management strategies, requiring a deep understanding of both finance and mathematics.

General Requirements

This data science role demands strong analytical skills, proficiency in mathematics and statistics, and a good grasp of financial theory. It always helps if you come from a finance-related background. 

Average Salary

A quantitative analyst earns an average of $173,307 per year.

Data Mining Specialist

A data mining specialist uses their statistics and machine learning expertise to reveal patterns and insights that can solve problems. They sift through huge amounts of data, applying algorithms and data mining techniques to identify correlations and anomalies. Data mining specialists also help organizations predict future trends and behaviors.

General Requirements

If you want to land a career in data mining, you should possess a degree or have a solid background in computer science, statistics, or a related field. 

Average Salary

Data mining specialists earn $109,023 per year.

Data Visualization Engineer

Data visualization engineers specialize in transforming data into visually appealing graphical representations, much like a data storyteller. A big part of their day involves working with data analysts and business teams to understand the data’s context.

General Requirements

Data visualization engineers need a strong foundation in data analysis and proficiency in programming languages often used in data visualization, such as JavaScript, Python, or R. Some expertise in design principles is a valuable addition to their existing experience, helping them create clear visualizations.

Average Salary

The average annual pay of a data visualization engineer is $103,031.

Resources To Find Data Science Jobs

The key to finding a good data science job is knowing where to look without procrastinating. To make sure you leverage the right platforms, read on.

Job Boards

When hunting for data science jobs, both niche job boards and general ones can be treasure troves of opportunity. 

Niche boards are created specifically for data science and related fields, offering listings that cut through the noise of broader job markets. Meanwhile, general job boards can have hidden gems and opportunities.

Online Communities

Spend time on platforms like Slack, Discord, GitHub, or IndieHackers, as they are a space to share knowledge, collaborate on projects, and find job openings posted by community members.

Network And LinkedIn

Don’t forget about socials like LinkedIn or Twitter. The LinkedIn Jobs section, in particular, is a useful resource, offering a wide range of opportunities and the ability to directly reach out to hiring managers or apply for positions. Just make sure not to apply through the “Easy Apply” options, as you’ll be competing with thousands of applicants who bring nothing unique to the table.

FAQs about Data Science Careers

We answer your most frequently asked questions.

Do I Need A Degree For Data Science?

A degree is not a set-in-stone requirement to become a data scientist. It’s true many data scientists hold a bachelor’s or master’s degree, but these just provide foundational knowledge. It’s up to you to pursue further education through courses or bootcamps or work on projects that enhance your expertise. What matters most is your ability to demonstrate proficiency in data science concepts and tools.

Does Data Science Need Coding?

Yes. Coding is essential for data manipulation and analysis, especially knowledge of programming languages like Python and R.

Is Data Science A Lot Of Math?

It depends on the career you want to pursue. Data science involves quite a lot of math, particularly in areas like statistics, probability, and linear algebra.

What Skills Do You Need To Land an Entry-Level Data Science Position?

To land an entry-level job in data science, you should be proficient in several areas. As mentioned above, knowledge of programming languages is essential, and you should also have a good understanding of statistical analysis and machine learning. Soft skills are equally valuable, so make sure you’re acing problem-solving, critical thinking, and effective communication.

Since you’re here…Are you interested in this career track? Investigate with our free guide to what a data professional actually does. When you’re ready to build a CV that will make hiring managers melt, join our Data Science Bootcamp which will help you land a job or your tuition back!

About Leah Davidson

A graduate of the Wharton School of Business, Leah is a social entrepreneur and strategist working at fast-growing technology companies. Her work focuses on innovative, technology-driven solutions to climate change, education, and economic development.