For any job in any industry, the interview process can induce major anxiety. That’s especially true for a data analyst interview, when your communication skills and overall fit will be judged by people whose jobs literally are to analyze. Yeesh.

The best way to combat the pre-interview jitters is to prepare yourself. That’s why we’ve curated a list of some common data analyst interview questions—with answers.

(Many of these questions have been gathered directly from candidates for positions at specific companies, as listed on Glassdoor. We’ve called out those companies in parentheses.)

Let’s roll.

### Entry-Level Data Analyst

**1. Why do you want to be a data analyst?**

For the most part, this sort of question can serve as an icebreaker. However, sometimes, even if the interviewers don’t explicitly say it, they expect you to answer a more specific question: “Why do you want to be a data analyst for *us*?”

With these self-reflective questions, there’s not really a right answer I can offer you. There *are* wrong answers, though—red flags for which the employer is searching.

Answers that show you misunderstand the role are the main “wrong” answers here. Equally, an answer that makes you sound wishy-washy about data analysis can raise red flags.

A few things you probably want to get across include:

- You love data.
- You’ve researched the company and understand why your role as a data analyst will help it succeed.
- You more or less understand what’s expected of your role.
- You’re confident in your decision.

*Sample answer: I want to be a data analyst because data has an inherent storytelling ability that I find fascinating. I read a blog post from one of your data analysts that showed how the sale of your products has demonstrated a positive correlation with your customers’ standards of living. I want to get involved with the team that makes those insights a possibility, and share those sorts of stories. *

**2. Where do you see yourself in five years?**

This question can be a bit tricky. There are land mines all over the place. For example, you might be tempted to say you see yourself running the whole joint, but that’s obviously unwise. It demonstrates ambition and enthusiasm, but you’re all but saying you’re going to mutiny the leaders currently in charge.

You also don’t want to be baited into personalizing this question too much. It can get you off-topic very easily. They’re not interested in whether you want to get married in five years but rather in your career, and more specifically your career with the company.

And, of course, avoid suggesting that the company you’re applying to is just a pit stop or a stepping stone. In other words, don’t come off as indecisive or unreliable. Avoid saying things such as, “Well, if my band takes off, I’m hoping to tour,” or, “I’m hoping to have my own cooking show.”

Unlike with most questions, you’re going to want to keep the answer here pretty general, albeit as truthful and candid as you can without foregoing tact.

*Sample answer: Within five years, I hope to have grown with the company and to have advanced professionally toward my ultimate goal of becoming an impactful data analyst, and, eventually, **data scientist**. And, of course, I’d like to have a comfortable work-life balance and pay down my debts from college. *

**3. Describe a time when you had to persuade others. How did you get buy-in? **

The trick to this question is to demonstrate that you not only persuaded others of a decision, but that it was the *right* decision.

*Sample answer*: *As a **data analyst intern** at my last company, we didn’t really have a modern means of transferring files between coworkers. We used flash drives. It took some work, but eventually I convinced my manager to let me research file-sharing services that would work best for our team. We tried Google Drive and Dropbox, but eventually we settled on using Sharepoint drives because it integrated well with some of the software we were already using on a daily basis, especially Excel. It definitely improved productivity and minimized the wasted time searching for who had what files at what times.*

**4. How do you feel about data? (Swedish)**

This question is a measure of your enthusiasm and passion for the field; it serves as a pretty good ice breaker or an *en passant* between questions. Really about the only thing you don’t want to say is that you don’t have any sort of feeling for data.

*Sample answer: I feel that data is king. If you just think about it at a sensory level, data propels everything we do. We take sensory input such as sight, taste, sound, smell, or touch, and we convert that data into actionable insights: only we do it so fast we don’t even realize. But that’s exactly what we do. I’m just the weird type of person who stops to think about the sources of that data and wants to learn what more I can glean from data and how I can use it both more efficiently and effectively. *

### Intermediate Data Analyst

**5. Can you add 1-100 together right now? (Dealer.com)**

This question is straightforward enough. You could, theoretically, compute the solution simply by adding the numbers in sequence, like so: 1+2+3… But this is impractical and probably not what the interviewer is looking for. Fortunately, there’s a formula called a series sum. It’s the number multiplied by itself plus 1, and the resulting solution divided by 2.

*Sample answer: Thankfully, there’s a formula that can help with this: 100(100 + 1) = 10,100; 10,100 / 2 = 5,050. *

**6. What are clustered and non-clustered indexes in SQL? Explain the difference between the two. (Microsoft) **

Just like in textbooks, with digital data, indexes speed up the process of searching through a database. Here you just need to explain the difference between two different types of SQL indexes: clustered and non-clustered.

(Note: Information for the table from Jaipal Reddy.)

*Sample answer: Whereas a clustered index is physically stored on the table and is, therefore, faster to read, nonclustered are stored separately, which slows reading down. However, nonclustered indexes can be updated quicker and, unlike clustered indexes for which there can only be one per table, there can be many nonclustered indexes. *

**7. What is the difference between data mining and data profiling? (Maestro Technologies)**

Data mining is a process in which you identify patterns, anomalies, and correlations in large data sets to predict outcomes. On the other hand, data profiling lets analysts monitor and cleanse data.

*Sample answer: Whereas data mining is concerned with collecting knowledge from data, data profiling is concerned primarily with evaluating the quality of data. *

**8. How have you dealt with messy data in the past? (Two Sigma)**

Up to 80% of a data analyst’s time can be spent on cleaning data. That makes this a very important concept to understand. Even more important when you consider that, if your data is unclean and produces inaccurate insights, it could lead to costly company actions based on false information. Yikes. That could mean trouble for you.

You need to demonstrate not only that you understand the difference between messy data and clean data but also that you used that knowledge to cleanse the data. This article shows the sort of workflow you might be looking for in your response, as well as some methods for identifying inconsistent data and cleaning it.

Just as with any other question where you’re asked to describe a situation you’ve encountered in the past, it’s a good time to employ the STAR method: situation, task, action, result.

*Sample answer: A client of ours was unhappy with our staffing reports, so I needed to pore over one to see what was causing their chagrin. I was looking at some data in a spreadsheet that contained information about when our call center employees went to break, took lunch, etc., and I noticed that the time stamps were inconsistent: some had a.m., some had p.m., some didn’t have any specifications for morning or night, and worst of all, many of these employees were located in different time zones, so this needed to be made more consistent as well. *

*To solve the a.m./p.m. dilemma, I made sure all times were specified in military. This had two benefits: first, it eliminated the strings in the data and made the whole column numeric; second, it removed any need to specify morning or night as military time does this inherently. Next, I converted all times to UTC, this way all of the data was on the same time zone. This was important for the report I was working on because otherwise the data would be presented out of order and it could cause confusion for our client. Reorganizing the report’s data this way helped improve our relationship with the client, who, due to the time discrepancies, previously believed we were understaffed at specific times of day. *

### Senior Data Analyst

**9. How many X are in Y place? **

This question takes many forms, but the premise of it is quite simple. It’s asking you to work through a mathematical problem, usually figuring out the number of an item in a certain place, or figuring out how much of something could potentially be sold somewhere. Here are some real examples from Glassdoor:

- “How many piano tuners are in the city of Chicago?” (Quicken Loans)
- “How many windows are in New York City, by you estimation?” (Petco)
- “How many gas stations are there in the United States?” (Progressive)

The idea here is to put you in a situation where you can’t possibly know something off the top of your head, but to see you work through it anyway. That’s the trap, though. You don’t want to just give up and say, well, gee, I don’t know. As James Patounas, associate director and senior data analyst at Source One, puts it, “I have been asked something similar as well as asked something similar. I personally would not accept ‘you can’t really know’ as an answer; or, at least, I would not hire someone that thought this was a sufficient answer.”

He went on: “Mathematical modeling is typically an approximation of the real world. It is rarely an exact representation.”

Basically, you want to pull the data you do have, or at least can approximate, and work yourself through a solution. Let’s take the number of windows in New York City as an example for the sample answer below.

Note: Figures in this answer do not necessarily realistically reflect facts; they are approximations (there are actually 8.6 million people in NYC, according to 2017 data, for example).

*Sample answer: I believe there are about 10 million people in New York, give or take a couple million. Assuming each of them lives in a residential building, with three rooms or more, if there were one window per room, that would make approximately 30 million windows. I’m making a few different assumptions that are probably inaccurate. For instance, that everyone lives alone and that the average size of their residences is just three rooms with one window per room. Obviously, there will be a lot of variations in reality. But I think, in terms of residences, 30 million windows could be close. *

*Then you’d have to take windows for businesses, subway rail cars, and personal vehicles. If the average subway car seats 1,000 people, with 1 window per 2 seats, that’s 500 windows per car. A little more math: I’d guess there are at least enough subway cars to support the whole population of New York: so 10 million divided by 1,000 comes out to 10,000. So there are another 5 million windows for subway cars. If half of all people own their own vehicle, that’s another six windows per person, so 30 million more windows. I’d guess there are at least 100,000 businesses with windows in NYC. Let’s just say for the sake of argument there’s an average of 10 windows each. That’s another million. I’m sure there’s way more than that. *

*Overall, we’re at 66 million windows (30,000,000 x 2 + 5,000,000 + 1,000,000). All of this pretty much hinges on how close I am to the actual population of New York City. Also, there are other places to find windows, such as busses or boats. But that’s a start. *

**10. You have 10 bags of marbles with 10 marbles in each bag. All but one bag has marbles which weigh 10g each. The exception’s marbles weigh 11g each. How would you determine which bag has 11g marbles using a scale only once? (Google)**

This question would be really difficult to figure out on the spot. Fortunately, it’s a puzzle with answers all over the place online.

The identifying factor for each of these bags of marbles is weight; fortunately, we have only one different bag. Unfortunately, we only have one chance to weigh, so we couldn’t just weigh each bag individually.

Instead, we can solve the problem if we put a different number of marbles from each bag into a new bag to weigh it and reverse engineer the identity of the heavier bag.

Let’s take 1 marble from the first bag, 2 from the second bag, 3 from the third bag, and so on. This way each bag we’ve drawn from is uniquely identifiable by the number of marbles missing. I’ve used my kindergarten-level illustration skills to draw this process.

The total number of marbles in the bag can be calculated now using the series sum formula alluded to in question 5: n(n+1)/2. If we plug the numbers in, we should get 55. Now we have to multiply it by the weight of each marble, which is 10g. That means the total weight of the marbles should be 550g, in a perfect world.

But we’re not in a perfect world. One of these bags is different. Let’s say, for argument’s sake, the third bag is the one that has the heavier 11g marbles. The weights would look like this: 10, 20, 33, 40, 50, 60, 70, 80, 90, 100. If you weighed this, in total, it would add up to 553. Clearly, one of these bags has botched things up. To find out which one, we can subtract 550 from 553, getting 3. In other words, the third bag is the odd one out. The formula, then, would look like this: W – w(n(n+1)/2), where W = total weight and w = weight of each marble (except the odd ones).

Note that we’ve labeled the bags 1-10 based on the number of marbles taken from it. The difference won’t necessarily be this number, however. If the bag were more than 1g heavier or lighter, we’d have to do more math. Say, for example, the odd marbles weighed 12g instead; the difference would have been 6. This still points to the third bag because we know that the odd marbles are 2g heavier than the other marbles. If we divide 6 by 2, we get 3.

*Sample answer: You can find the heavier bag of marbles by taking a different number of marbles, up to 10, from each bag, placing them in a new bag, and weighing the result. For example, you take 1 from the first bag, 2 from the second, all the way up to the final bag, from which you’ll take all 10 marbles and place them in the new bag. If you use a series sum to find the number of marbles (or you’ve counted them as you placed them in the bag), and multiply the total number by the majority weight (10 in this instance), you can then use this number to find out where the weight “problem” is. Weigh the marbles you’ve placed into the new bag and subtract this number from the projected weight. The difference will be the bag from which you took that many marbles. This is the heavier bag. *

### Bonus Q&A With Source One’s Senior Data Analyst, James Patounas

**What would be your top interview question for prospective data analysts? How would you answer this question? **

Suppose that you were provided a flat file (Excel, CSV, etc.) to manipulate and load into a database. It contains millions of rows. Upon loading the data into the database, you are to perform an analysis, perhaps building some type of mathematical model. While you can’t ever be 100% confident that everything was processed and loaded correctly, you can do some things in order to ensure that you are reasonably confident. Describe for me what you would do.

Global assessment: Perform comparative analysis of the raw file and the loaded data by [completing the following]…

- Count the number of rows
- Count the number of columns
- Sum the numeric columns
- Check the data types (i.e., if I thought that a column was entirely filled with dates then that should persist)

Localized assessment:

- Randomly pick a few rows and manually compare
- Check the distinct elements in textual fields (i.e., if categories A, B, and C exist before, then that’s all I should see after)
- Check common transcription issues (i.e., data encoding could be different, dates are typically stored as integers past a certain date so those may be converted incorrectly, etc.)
- Check conversions if applicable (i.e., if NA is used for non-responses for numerical values then the database won’t accept it if we’re storing the data in a numerical field)

**What’s a question you were asked during your interview, and how did you answer? **

[I was asked] “What is your greatest weakness?” I struggle to walk away from an interesting problem.

**A career in data analytics is fast-paced, impactful, and constantly changing, and ****now**** is the perfect time to grow your skill set. Learn more about Springboard’s Data Analytics Career Track now.**