Bayes Theorem and Probability
When we think of probability we often think of flipping coins, rolling dice and roulette wheels. Maybe it’s this association with gambling that leads us to think statistics has mystical powers. As soon as statisticians start talking, they mumble strange things about “Bayes Theorem”, “p-values” and “chi-squared”. Uttering these words, they get excited or dismayed as though these were signs from beyond.
A more useful understanding of probability is probability as an extension of logic. Classical logic forces us to constrain our beliefs to absolutes. Something can either be true or false, but we have no way of expressing uncertainty. Probability allows us to extend the rules of logic to reason about an uncertain world.
Probability as Logic
The mapping from Boolean logic to probability is remarkably simple. In Boolean logic we have true and false values. In probability we have a continuous range of values from 0 to 1. The extremes of 1 and 0 map to true and false in Boolean logic. All the other possible values express different beliefs between absolute truth and falsity.
Boolean logic has three operations that are used to combine true and false values: and, or and not. We can quite easily map these logical rules to probabilistic rules.
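“A or B” is the sum of the two probabilities minus their overlap, P(A)+P(B)-P(A and B). When A and B can never both be true, this is simply P(A)+P(B).
“A and B” is the product of the two probabilities, P(A)⋅P(B), provided A and B are independent.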
“not A” is just (1-P(A)).
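The three rules can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the "and" and "or" versions assume A and B are independent. The loop checks that at the extremes of 0 and 1 the rules give back the ordinary Boolean truth tables.

```python
def p_not(p_a: float) -> float:
    """Probability that A is false."""
    return 1.0 - p_a


def p_and(p_a: float, p_b: float) -> float:
    """Probability that A and B are both true (assumes independence)."""
    return p_a * p_b


def p_or(p_a: float, p_b: float) -> float:
    """Probability that at least one of A, B is true (assumes independence).

    The overlap is subtracted so the result never exceeds 1.
    """
    return p_a + p_b - p_and(p_a, p_b)


# At the extremes 0.0 and 1.0 the rules reproduce Boolean logic.
for a in (False, True):
    assert p_not(float(a)) == float(not a)
    for b in (False, True):
        assert p_and(float(a), float(b)) == float(a and b)
        assert p_or(float(a), float(b)) == float(a or b)
```

Anything strictly between 0 and 1 then expresses a degree of belief, for example `p_or(0.5, 0.5)` gives 0.75.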
Given these simple rules we can use probability just like we do traditional logic. We all know that classical logic often fails to explain everyday common sense. This is because the world is full of uncertainties. It turns out that probabilistic logic is just the formalization of how we often reason about everyday problems. As an example let’s take a look at a scenario that reflects everyday reasoning. We’ll see that probability allows us to model the way we think quite well.
Reasoning about Bacon
It’s Friday evening and you think to yourself “It’s been a long week, I’ll make myself a nice breakfast in the morning”. You know you have eggs, because you bought some yesterday. But you don’t remember if you have bacon. It’s 9pm and you also don’t remember if the grocery store is still open on Friday that late.
The question is “should I go to the grocery store?” We can model the answer to this question with the variable Store. There are two major facts we’d like to know to decide what to do: “Do I have bacon?” (we’ll name this fact Bacon) and “is the grocery store open?” (this fact will be Open).
Traditional Logic at Home
Suppose you’re at home and you think to yourself “Hmmm… I wonder if we need more bacon?”. You can look in the fridge and see that “no, sadly there is no bacon”. You can also easily look up the grocery store’s phone number and call them. You find out that the grocery store is open!
Now we have a nice traditional logic problem:
Bacon = False
Open = True
(not Bacon) AND (Open) => Store
(not False) AND (True) => True
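The same puzzle can be written directly as Boolean code. The variable names below are just illustrative stand-ins for the facts Bacon, Open and Store.

```python
# Facts we can check directly when we are at home.
bacon = False       # looked in the fridge: no bacon
store_open = True   # called the store: it is open

# Go to the store only if we lack bacon AND the store is open.
store = (not bacon) and store_open
print(store)  # True
```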
All the facts we need to solve our logic puzzle are right before us. What an easy and simple world! But it’s Friday night, you’re out with your friend getting drinks to celebrate a busy week! You can’t look in your fridge and it’s too loud to call the grocery store. Now we have a probabilistic logic problem!
Logic of Uncertainty
To reason about bacon in a bar we need to use probabilistic logic and Bayes Theorem. You obviously can’t check your refrigerator. But you do remember shopping earlier in the week to get eggs. Sadly, you also remember thinking: “I probably don’t need bacon this week”. Also you’re pretty sure that you went grocery shopping last Monday at 9:30pm so it seems likely that the grocery store is still open.
You can simply assign some values to your logic based on your certainty in these facts. Since you remember that you foolishly decided not to get bacon, it’s very unlikely that you have any.
You don’t assign P(Bacon)=0 because you really don’t remember if you did change your mind later. It is possible you got bacon, but very unlikely.
You definitely remember it was late when you went shopping, probably around 9:30. It could have been 8:30, and so you are not completely sure. We can model this uncertainty as well:
The probabilistic equivalent of our logical statement from before is:
P(Store) = (1-P(Bacon))⋅P(Open) ≈ 0.95
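As a rough sketch, the same calculation in Python might look like the following. The two input values are assumptions made up for illustration, chosen only so that the result lands near 0.95. The calculation also assumes that having bacon and the store being open are independent.

```python
# Hypothetical beliefs (illustrative values, not taken from the text).
p_bacon = 0.01  # very unlikely that there is bacon at home
p_open = 0.96   # fairly sure the store is still open

# Probabilistic version of: (not Bacon) AND (Open) => Store
p_store = (1.0 - p_bacon) * p_open
print(f"P(Store) = {p_store:.2f}")  # P(Store) = 0.95
```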
Wow 0.95 certainty! That’s like having a p-value of < 0.05! Surely we’ll go to the store!
Except it’s 9pm on a Friday and your friend says “I’ll buy the next round”. It would be really sad to go to the store to get bacon, only to find either it was closed or you already had bacon at home. You decide that even though the probability is reasonably high that you should go to the store, the little uncertainty makes it not worth your time.
One of the benefits of living in an uncertain world is that we can assign a value to our knowledge. Sometimes being just a little more sure than not makes something worth it. Other times taking a huge risk requires near-absolute certainty. Why risk certain beer for uncertain bacon?
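One way to make this trade-off concrete is a small expected-value sketch. Everything here is hypothetical: the function name, the benefit of a bacon breakfast and the cost of missing a round are invented numbers in arbitrary units, used only to show how a high probability can still lose to a high cost.

```python
def expected_gain(p_worthwhile: float, benefit: float, trip_cost: float) -> float:
    """Expected net value of going to the store.

    The benefit is only collected when the trip turns out to be worthwhile
    (no bacon at home and the store is open); the cost is always paid.
    """
    return p_worthwhile * benefit - trip_cost


# Hypothetical values in arbitrary units.
P_STORE = 0.95            # belief that the trip is worthwhile
BACON_BENEFIT = 5.0       # how much a bacon breakfast is worth to you
COST_MISSING_ROUND = 5.0  # how much leaving the bar right now costs you

gain = expected_gain(P_STORE, BACON_BENEFIT, COST_MISSING_ROUND)
decision = "go to the store" if gain > 0 else "stay at the bar"
print(decision)  # stay at the bar
```

With these made-up numbers, even 0.95 certainty is not enough, because the free round is worth as much as the breakfast. Lowering the cost of the trip changes the decision without changing the probability at all.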
Probabilistic Logic is Common Sense
Ignoring the mathematics, the probabilistic example is much closer to how we think every day. We have some data, often in the form of beliefs and memories, then we use that to make decisions. Decisions often have a cost. We weigh the potential risks and rewards of actions with our certainty in the results.
Even when we make “stupid” decisions we are typically thinking probabilistically with a modified version of Bayes Theorem. The trouble is we usually don’t do the math. Even with fuzzy guesses there are many cases where we would be surprised by how uncertain we really are.
This is the key feature that makes probabilistic thinking in data science different from using Bayes Theorem in everyday life. In data science we do have numbers, often backed by data. Even in the cases when we don’t have data we can use quantitative values to model what we know. If our lead designer is certain a new page design will improve the number of signups we can model that opinion. Because everything has a value we can see how different information and different beliefs affect our conclusions.
Bayesian thinking is a better way to model how to make decisions on a day-to-day basis. It can help you win poker hands, and help you become a better thinker. We just have to learn this powerful new tool and how it works with existing data science methods to apply it.
Back to the Bar
Your friend notices the look on your face and asks “You seem really stressed! What’s wrong?”. You reply that you’ve been thinking about this probability problem. You explain the bacon and all your facts. Finally you tell your friend that you’re not going because why risk it when you’re both having fun. Your friend laughs and exclaims “Yeah, but that grocery store is open until midnight on Friday, I’m nearly positive.” Now we have a different model of P(Store). P(Bacon) remains the same but P(Open) has gone way up:
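A small sketch of how the new information changes the result. The numbers are assumptions for illustration only: the "before" and "after" values of P(Open) are invented, and the helper `p_store` is a hypothetical name for the probabilistic version of (not Bacon) AND (Open), assuming the two facts are independent.

```python
def p_store(p_bacon: float, p_open: float) -> float:
    """P(Store) = (1 - P(Bacon)) * P(Open), assuming independence."""
    return (1.0 - p_bacon) * p_open


# Hypothetical beliefs (illustrative values, not taken from the text).
P_BACON = 0.01        # unchanged by what your friend said
p_open_before = 0.96  # your own fuzzy memory of the store hours
p_open_after = 0.99   # after your friend's "open until midnight"

before = p_store(P_BACON, p_open_before)
after = p_store(P_BACON, p_open_after)
print(f"before: {before:.2f}, after: {after:.2f}")  # before: 0.95, after: 0.98
```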
Not only are you more certain, but the cost of going has decreased! You’re definitely going to get bacon and have that beer! Hooray for Bayes Theorem!
New information from our friend has convinced us that we should go to the store. In the first example we saw how probability can model common sense. In this case we get some insight into how common sense can influence how we should think when using probability to make choices.
Probability as Reasoning
While probability can help us understand reasoning, reasoning should inform how we think about probability. The heart of statistics, especially Bayesian statistics, is the formalization of natural reasoning to understand the data we have.
This new trend of thinking impacts a debate nearly every expert in statistics is having. Each one has an opinion about the pros and cons of p-values. Andrew Gelman, the 538 blog and even John Oliver have all weighed in on the issue. But at its heart the issue is not about p-values. The important thing is that no single test or statistical value is a substitute for reasoning, not even Bayes Theorem.
When we test a hypothesis what we are doing is making an argument. In our bar scenario we are arguing about whether or not to go to the store. It’s easy to see how by modifying our risk and increasing our certainty our friend can convince us to stay for a beer and get bacon. Far too often, we don’t hold scientific publications, or our own hypothesis tests in our data science work, to the same standard.
A p-value should never convince you that something is true. Changing a p-value to a Bayes factor or a posterior probability shouldn’t either. The entire argument presented by a combination of probabilistic reasoning and facts should argue the case to you. If you aren’t convinced then more data is needed. Whether at work or in a paper, statistics is always about making an argument about the way the world works.