ITNEXT

ITNEXT is a platform for IT developers & software engineers to share knowledge, connect, collaborate, learn and experience next-gen technologies.

Using Matplotlib to Plot a Live Graph of Benford’s Law in Python

Introduction

After publishing my previous article on calculating the 1,000,000th Fibonacci number, I thought about how I could use the sequence to show Benford’s Law, and how I could create a graph in Python (as opposed to Excel). Before we get started: I had never used matplotlib before writing this article, so this tutorial is aimed at total beginners like me. But even if you know what you’re doing, please do stick around; you might learn something new.

Here’s a quick list of sections we will cover:

  1. What is Benford’s Law?
  2. Plotting a static frequency graph
  3. Plotting a live graph

I also highly recommend reading my previous article, since we will be using some Fibonacci-generating functions today.

What is Benford’s Law?

According to Brilliant, Benford’s law is an:

observation about the leading digits of the numbers found in real-world data sets. Intuitively, one might expect that the leading digits of these numbers would be uniformly distributed so that each of the digits from 1 to 9 is equally likely to appear. In fact, it is often the case that 1 occurs more frequently than 2, 2 more frequently than 3, and so on.

This distribution shows up in a wide range of real-world data sets. As a result, Benford’s law has several practical uses, one being the detection of potential fraud: people find it hard to fabricate data that satisfies the law, since we tend to spread the leading digits of made-up numbers evenly.

Surprisingly, it is very easy to calculate the expected distribution for the digits 1 to 9. For this we can use the following probability formula:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d \in \{1, 2, \dots, 9\}$$

The formula to calculate the probability of some digit d, between 1 and 9, occurring at the start of a number

This equation shows us that, for a digit d, the probability of d being the leading digit of a number in a given dataset is equal to the log to the base 10 of 1 + 1/d. We can also prove that these probabilities, where d ranges from 1 to 9, sum up to 1:

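$$\sum_{d=1}^{9} \log_{10}\left(1 + \frac{1}{d}\right) = \sum_{d=1}^{9} \log_{10}\left(\frac{d+1}{d}\right) = \log_{10}\left(\frac{2}{1} \cdot \frac{3}{2} \cdots \frac{10}{9}\right) = \log_{10}(10) = 1$$

The proof that the probabilities of the digits 1 to 9 sum up to 1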

We can also calculate the probability we expect to see for each leading digit:

1 : 0.3010
2 : 0.1761
3 : 0.1249
4 : 0.0969
5 : 0.0792
6 : 0.0669
7 : 0.0580
8 : 0.0512
9 : 0.0458
(* Multiply these values by 100 to get the probabilities as percentages.)
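
As a quick check, the table above can be reproduced in a few lines of Python. This is only a minimal sketch using the standard library; the function name `benford_probability` is an illustrative choice, not something from matplotlib or any other package.

```python
import math


def benford_probability(d):
    """Expected probability that d (1 to 9) is the leading digit."""
    return math.log10(1 + 1 / d)


# Print the expected probability for each leading digit, to 4 decimal places.
for d in range(1, 10):
    print(f"{d} : {benford_probability(d):.4f}")
```

Summing `benford_probability(d)` over the digits 1 to 9 gives 1, up to floating-point rounding.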

Now that we know what Benford’s law is, let’s start using Python to plot some graphs!

Plotting a static frequency graph

To plot a graph using Python, we first need to install an external library called matplotlib. Fortunately, this is very simple: all we need to do is open up a command line and type in pip install matplotlib

Then you can import the pyplot module from matplotlib and create two variables which will be used later: one to colour each bar and another to keep count of the frequency of the first digits in our Fibonacci numbers:
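
The original snippet is not reproduced here, so the following is a minimal sketch of what this step could look like. The variable names `colours` and `letters` follow the text, but the specific colours are an assumption; any nine valid matplotlib colour names would work.

```python
import matplotlib.pyplot as plt

# One colour per bar (digits 1 to 9). These particular colours are an
# assumption; swap in any nine valid matplotlib colour names.
colours = [
    "tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple",
    "tab:brown", "tab:pink", "tab:gray", "tab:olive",
]

# Running count of how often each digit appears as a leading digit.
letters = {str(d): 0 for d in range(1, 10)}
```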

The next thing is to create a function which returns the first n numbers in the Fibonacci sequence:
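
A minimal sketch of such a function is shown below. The name `fibonacci_list` is hypothetical, and the sequence is assumed to start 1, 1 rather than 0, 1, since 0 has no leading digit between 1 and 9.

```python
def fibonacci_list(n):
    """Return a list of the first n Fibonacci numbers (1, 1, 2, 3, ...)."""
    numbers = []
    a, b = 1, 1
    for _ in range(n):
        numbers.append(a)
        a, b = b, a + b  # step to the next pair of neighbours
    return numbers


print(fibonacci_list(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```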

Here I use an iterative solution to append each value to a list, which is then returned at the end of the function. If you want more information as to why I am using an iterative solution over Binet’s Formula, then check out my other Medium post here.

Then, in the main part of our program, we call the function and pass in the integer 50,000, which returns a list of the first 50,000 Fibonacci numbers. We iterate through that list and, on each iteration, convert the number into a string so that we can access its first character by indexing. The count for that character in the letters dictionary is then incremented. Your letters dictionary should now look like this (you can see it by printing it out):

letters = {
    '1': 15052,
    '2': 8804,
    '3': 6248,
    '4': 4844,
    '5': 3959,
    '6': 3349,
    '7': 2898,
    '8': 2558,
    '9': 2288
}
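
The counting step could be sketched roughly as follows. Two things here are assumptions rather than the original code: the helper name `count_leading_digits`, and generating the numbers inside the loop instead of iterating over a pre-built list. Note also that Python 3.11 and later refuse by default to convert integers with more than 4,300 digits to strings, which the larger Fibonacci numbers exceed, so the sketch lifts that limit where the setting exists.

```python
import sys

# Python 3.11+ limits int-to-str conversion to 4300 digits by default;
# passing 0 removes the limit. Older versions have no such setting.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)


def count_leading_digits(n):
    """Count the leading digits of the first n Fibonacci numbers."""
    letters = {str(d): 0 for d in range(1, 10)}
    a, b = 1, 1
    for _ in range(n):
        letters[str(a)[0]] += 1  # first character of the decimal string
        a, b = b, a + b
    return letters


# Use count_leading_digits(50000) to match the article; expect a wait.
print(count_leading_digits(1000))
```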

Make sure you store this somewhere, as you don’t want to wait for these counts to be generated again and again each time you run your program.

Next, to plot our bar chart, we need the values for the x and y axes. These will be the digits 1 to 9 and the relative frequency of each digit, respectively:
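
A sketch of this step, using the stored counts from above, might look like this. The `total` variable is an addition for illustration; dividing by the literal 50,000 gives the same result.

```python
# Counts of leading digits for the first 50,000 Fibonacci numbers.
letters = {
    "1": 15052, "2": 8804, "3": 6248, "4": 4844, "5": 3959,
    "6": 3349, "7": 2898, "8": 2558, "9": 2288,
}

total = sum(letters.values())  # 50,000 numbers in total

# x axis: the digits '1' to '9'.
x = list(letters.keys())

# y axis: each count as a fraction of the total, rounded to 4 decimal places.
y = [round(value / total, 4) for value in letters.values()]

print(x)
print(y)
```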

For the x axis I used the keys method of the letters dictionary, which gives us the different digits. The y axis might look a bit different, but do not worry, it is not too tricky to understand; it’s just a list comprehension. The values of the letters dictionary are iterated through, and during each iteration the value is divided by 50,000 (because we have 50,000 Fibonacci numbers) to get the relative frequency. This value is then rounded to 4 decimal places and added to a list, all inside one statement.

Then we can set the title for the graph, as well as the labels for the x and y axes. I also set the range for the y axis, from 0 to 0.35, which is perfect for us:
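
As a rough sketch, these settings map onto four pyplot calls. The title and label wording below is an assumption; only the 0 to 0.35 range comes from the text.

```python
import matplotlib.pyplot as plt

# Title and axis labels (the exact wording here is an assumption).
plt.title("Benford's Law: leading digits of Fibonacci numbers")
plt.xlabel("Leading digit")
plt.ylabel("Frequency")

# Fix the y axis to 0 to 0.35, just above the tallest expected bar (about 0.30).
plt.ylim(0, 0.35)
```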

Here I create the bar chart by supplying the x and y axis values. I also pass the list of colours I created earlier as the color argument. This is entirely optional, but I think it looks quite nice:
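
One way to sketch the bar chart and its labels is shown below. The helper name `plot_bars`, the colour choices and the label offset are assumptions; the y values are the rounded frequencies derived from the counts shown earlier.

```python
import matplotlib.pyplot as plt


def plot_bars(x, y, colours):
    """Draw the bar chart and write each bar's height just above it."""
    bars = plt.bar(x, y, color=colours)
    for bar in bars:
        height = bar.get_height()
        plt.annotate(
            f"{height}",
            xy=(bar.get_x() + bar.get_width() / 2, height),  # top centre of the bar
            xytext=(0, 3),               # nudge the label 3 points upwards
            textcoords="offset points",
            ha="center",
            va="bottom",
        )
    return bars


x = [str(d) for d in range(1, 10)]
y = [0.301, 0.1761, 0.125, 0.0969, 0.0792, 0.067, 0.058, 0.0512, 0.0458]
colours = [
    "tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple",
    "tab:brown", "tab:pink", "tab:gray", "tab:olive",
]
bars = plot_bars(x, y, colours)
```

Calling `plt.show()` afterwards opens the window with the finished chart.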

There is also a quick for loop which iterates through the 9 bars, fetching the height of each bar and annotating it just above the bar itself.

Finally, we get a graph in which we can clearly see Benford’s law.

A static graph to show Benford’s law’s distribution visually

We can see that the values on the graph correspond very closely to our predicted values, agreeing to within about 0.0001. This shows that the Fibonacci sequence really does follow Benford’s law, just as we expected, since it is a naturally occurring sequence.

Plotting a live graph

Now, if we want to see how this graph changes as the Fibonacci numbers increase, we need to plot a graph which updates in real time.

First of all, we import the required libraries. The count function from the itertools module will keep track of which number we are currently on, and the FuncAnimation class from matplotlib allows us to create animations on our graph and update it. We set the count to start at 1 and also create the dictionary of letters and the list of colours as before.
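
A minimal sketch of this setup follows. The name `index` for the counter and the colour choices are assumptions; `count` and `FuncAnimation` are the real itertools and matplotlib names.

```python
from itertools import count

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Counter that yields 1, 2, 3, ... on successive next() calls.
index = count(1)

# Same two variables as in the static version.
letters = {str(d): 0 for d in range(1, 10)}
colours = [
    "tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple",
    "tab:brown", "tab:pink", "tab:gray", "tab:olive",
]
```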

We can then use this iterative function to return the nth Fibonacci number. Instead of calculating them all at once, we will calculate them as we go along.
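
Such a function could be sketched like this, assuming the sequence is indexed so that the first and second Fibonacci numbers are both 1. The function name `fibonacci` is a hypothetical choice.

```python
def fibonacci(n):
    """Return the nth Fibonacci number, where fibonacci(1) == fibonacci(2) == 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


print([fibonacci(n) for n in range(1, 11)])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```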

Next, we need a function which can be called to update our graph every time:
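
The update function might be sketched as below, with the setup repeated so that the block runs on its own. Dividing by the running count `n`, the label wording and the label offset are assumptions; the 0 to 0.35 range is carried over from the static graph, so the tall bars of the first few frames are clipped until the frequencies settle.

```python
import sys
from itertools import count

import matplotlib.pyplot as plt

if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)  # allow str() on very large ints (Python 3.11+)

index = count(1)
letters = {str(d): 0 for d in range(1, 10)}
colours = [
    "tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple",
    "tab:brown", "tab:pink", "tab:gray", "tab:olive",
]


def fibonacci(n):
    """Return the nth Fibonacci number (1, 1, 2, 3, ...)."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def animate(frame):
    """Add one more Fibonacci number to the counts and redraw the chart."""
    n = next(index)
    letters[str(fibonacci(n))[0]] += 1

    x = list(letters.keys())
    y = [round(value / n, 4) for value in letters.values()]

    plt.cla()  # clear the previous frame's bars and labels
    plt.title(f"Leading digits of the first {n} Fibonacci numbers")
    plt.xlabel("Leading digit")
    plt.ylabel("Frequency")
    plt.ylim(0, 0.35)

    bars = plt.bar(x, y, color=colours)
    for bar in bars:
        height = bar.get_height()
        plt.annotate(f"{height}",
                     xy=(bar.get_x() + bar.get_width() / 2, height),
                     xytext=(0, 3), textcoords="offset points",
                     ha="center", va="bottom")
```

Because `fibonacci(n)` starts from scratch on every call, each frame takes slightly longer than the last; that keeps the sketch simple at the cost of speed.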

In this function the next value of the counter is fetched and the nth Fibonacci number is calculated. After that is done, the first digit is used to update the letters dictionary. The rest of the code is almost identical, but there is one extra line of code which clears the previous graph. This is because a new graph would otherwise be drawn over the previous one, which is something we do not want. Notice that the function requires one argument (the current frame) in order to work, but this is passed in automatically by matplotlib, so we do not need to think about it at all.

Finally, we can create the animation using the FuncAnimation class which we imported. It takes a few arguments, the first being the current ‘figure’ that we are using. Do not worry too much about this: we can use pyplot’s gcf function (short for ‘get current figure’) to, well, return the current figure. Next, we supply our animate function along with an interval of 5. The interval is the delay between frames in milliseconds, so do not set it too high, or the graph will update slowly; setting it too low is not ideal either, as the graph might become quite sluggish.
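
The wiring could be sketched as follows. The helper `start_animation` and the stand-in `placeholder_update` are hypothetical names used to keep the block self-contained; in the real program the update function described above would be passed in instead. The `cache_frame_data=False` argument is an optional extra, not something the text mentions.

```python
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation


def start_animation(update, interval_ms=5):
    """Call update(frame) on the current figure every interval_ms milliseconds."""
    return FuncAnimation(
        plt.gcf(),               # the current figure
        update,
        interval=interval_ms,
        cache_frame_data=False,  # the frames never end, so do not cache them
    )


def placeholder_update(frame):
    """Stand-in for the real update function; just redraws a title."""
    plt.cla()
    plt.title(f"Frame {frame}")


# Keep a reference to the animation, otherwise it can be garbage-collected
# and the graph stops updating.
ani = start_animation(placeholder_update)
```

Calling `plt.show()` after this opens the window and starts the animation.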

Hopefully, if everything went to plan, executing the code will create a graph which shows how the frequencies change as more and more Fibonacci numbers are calculated. Here’s an example:

A live graph to show how the frequency graph changes as more Fibonacci numbers are used in the calculations

Final thoughts

As you watch the live graph, it becomes much more ‘accurate’ and settles down as the amount of data we can use increases. This is the same in the real scientific world, and it is why your teachers at school might ask you to repeat an experiment multiple times: to obtain better data and remove anomalies.

I hope you enjoyed this article. If you have any questions or feedback, please comment below and I will respond as soon as I can.

Thank you for reading! 💖

Written by Kush

20 year old self-taught Python dev | Been programming for 7 years | Love making weekend projects | U.K.
