KnowledgeCity

Data Visualization and Expression/Narrative

In these lessons, we’ll focus on the relationship between plots and the underlying data sets used to create them. Not all data is equal, and some plots work better than others for certain situations. We’ll explore several data sets conducive to histograms, scatter plots, and heat maps. For histograms, we use data sets that can be considered a distribution, such as the results of a thousand dice rolls, which let us see the shape of that distribution.

We’ll show you how to combine random numbers, NumPy, and matplotlib.pyplot to create data sets with a distribution. We’ll also draw data from a NASA Mars mission to plot a scatter diagram: NASA has a satellite that orbits Mars and takes pictures of the Martian surface, and in these lessons we’ll explore that data set to plot out areas of the Martian surface. You’ll also explore how heat maps work, including the relationship between the different variables of a heat map, when they make sense, and when they don’t.

Learning Objectives:

  • Understand the relationship between the underlying data and its appropriate plot
  • Assess practices with NumPy, Pandas, and large data sets
  • Describe the relationship between data sets and typical ways to express data sets

Author: Bill Hood

Duration: 19m · 3 lessons
Level: Intermediate
Language: English

Skills you’ll gain

  • Data Visualization
  • Information Visualization
  • Scatter Plots
  • Statistical Graphics
  • Visual Simulations
  • Visualization

What You'll Learn

  • Understand the relationship between an underlying data set and the plot type best suited to express it
  • Create distribution data sets by combining random numbers, NumPy, and matplotlib.pyplot
  • Plot histograms for data sets that represent a distribution, such as the results of a thousand dice rolls
  • Build scatter diagrams from a NASA Mars mission data set to plot areas of the Martian surface
  • Construct heat map diagrams and assess the relationships between their variables
  • Apply practices with NumPy, Pandas, and large data sets

Key Takeaways

  • Not all data is equal, and some plots work better than others for certain situations.
  • Histograms suit data sets that can be considered a distribution, such as a thousand dice rolls and their results.
  • Scatter diagrams can be plotted from real data, such as a NASA satellite's pictures of the Martian surface.
  • Heat maps depend on the relationship between their variables, which determines when they make sense and when they don't.
  • Random numbers, NumPy, and matplotlib.pyplot can be combined to create data sets with a distribution.
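
As a rough illustration of the last two takeaways, the sketch below uses NumPy to generate two simulated dice and counts each (first, second) pair in a 6 x 6 grid that can be drawn as a heat map. It is not taken from the course: the names `first`, `second`, `grid`, and `plot_heat_map`, the fixed seed, and the choice of dice data are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded (an assumption) so reruns match
n_rolls = 10_000

# Two independent dice; the upper bound of integers() is exclusive
first = rng.integers(1, 7, size=n_rolls)
second = rng.integers(1, 7, size=n_rolls)

# grid[i, j] counts how often first == i + 1 and second == j + 1
grid = np.zeros((6, 6), dtype=int)
np.add.at(grid, (first - 1, second - 1), 1)


def plot_heat_map(counts):
    """Draw the 6 x 6 count grid as a heat map (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported here so the counting runs without it

    # Rows map to the y axis and columns to the x axis
    plt.imshow(counts, origin="lower", extent=(0.5, 6.5, 0.5, 6.5))
    plt.colorbar(label="Count")
    plt.xlabel("Second die")
    plt.ylabel("First die")
    plt.show()


print(grid)
```

Calling `plot_heat_map(grid)` draws the chart. Because the two dice are independent, every cell should hold a similar count and the map should look nearly flat, which is one case where a heat map has little to reveal about the relationship between its variables.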

Frequently Asked Questions

What does this course cover?

It focuses on the relationship between plots and the underlying data sets used to create them, exploring histograms, scatter plots, and heat maps and when each is appropriate.

What plot types will I learn to create?

You'll learn to plot histograms, scatter diagrams, and heat maps.

What tools and libraries are used in this course?

The lessons use random numbers, NumPy, Pandas, and matplotlib.pyplot to work with data sets and create plots.

What data sets are used as examples?

Examples include a distribution such as a thousand dice rolls and their results, and a NASA Mars mission data set from a satellite that photographs the Martian surface.

What skills will I gain?

You'll build skills in data visualization, information visualization, scatter plots, statistical graphics, visual simulations, and visualization.

Transcript

Transcript of the free preview lesson. Remaining lessons unlock with the full course.

In this lesson we're going to talk about histograms: what they are and when you use them. A histogram shows a frequency distribution, and what I mean by that is how often something happens in relation to other events. For example, let's use a dice roll. There are only six possible answers, one through six. We roll the dice, we're going to get one through six. Say we roll the dice 1,000 times. What would we expect in terms of those dice rolls? Let's just go ahead and simulate that. Let's ask that question. That's when a histogram is actually the right answer: when you're trying to see what the data looks like. And in this case we want to see what a dice roll looks like, so we're going to use a histogram.
We're going to jump right in. These three imports I think we're all familiar with at this point, so we won't spend too much time talking about them. We're going to create a list called rand, with a lowercase r, and set it to blank. Then we're going to create six bins, and we'll call them bins for lack of a better name. Then, in a range from 0 to 100, we are going to create x, which is going to be a random number between 1 and 6. This is the simulated dice roll, right here. Then we're going to append whatever those results are to our list, and then we're just going to plot that out and see what it looks like.
Let's start off with just 100 records and run that. What we'll notice is, in this case, here are our dice rolls: 1, 2, 3, 4, 5, and 6. And then this is the frequency. We rolled a 1 twenty times. We rolled a 2 fifteen times. We rolled a 3, it looks like, a little more than 17 times, probably. We rolled a 4 fifteen times. If we run it again, we'll get a different answer. But what we'll begin to notice is that there's definitely a trend, and that trend is that this is a uniform distribution.
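
The single-die simulation described above might look like the following sketch. The on-screen code is not part of the transcript, so this is only an approximation: the names `rand`, `bins`, and `x` follow the narration, while the `roll_die` and `plot_histogram` helpers, the fixed seed, and the half-integer bin edges are assumptions made for this example.

```python
import random
from collections import Counter


def roll_die(n_rolls, seed=None):
    """Simulate n_rolls rolls of one six-sided die (hypothetical helper)."""
    rng = random.Random(seed)
    rand = []  # start with a blank list, as in the lesson
    for _ in range(n_rolls):
        x = rng.randint(1, 6)  # randint includes both 1 and 6
        rand.append(x)
    return rand


def plot_histogram(rolls, low, high):
    """Plot one bar per possible value from low to high (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported here so the simulation runs without it

    # Edges at the half-integers give one bin centered on each value
    bins = [v - 0.5 for v in range(low, high + 2)]
    plt.hist(rolls, bins=bins, rwidth=0.9)
    plt.xlabel("Dice roll")
    plt.ylabel("Frequency")
    plt.show()


rand = roll_die(100, seed=0)  # seed is an assumption; drop it for fresh rolls
counts = Counter(rand)
print(sorted(counts.items()))  # (value, frequency) pairs
```

Calling `plot_histogram(rand, 1, 6)` draws the chart. Raising the count passed to `roll_die` from 100 to 10,000 should make the bars look increasingly level.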
What I mean by a uniform distribution is that when you're rolling a dice, the chances of you getting a 1 are exactly the same as you getting a 6, which are exactly the same as you getting a 4, and so on. So what we would expect this data to look like as we drive up the count is for it to become an obviously uniform distribution. You can start to see it now, with 175 or so as the maximum, maybe 180, and it certainly looks uniform, i.e., equal probability for any answer. We're going to run it up now to 10,000 and take one last look, and you can really see we're driving it to this uniform look. All right, so we can see this is a uniform distribution. This is why you would use a histogram; this is the type of information you're trying to find out when you're using a histogram.
Now let's step this up one notch and look at what happens when you have two dice rolls. We're going to continue with the same approach, except we're going to roll the dice once and then a second time, add those dice together, and append that to our random number list. Now why are we doing that? Because that's more like a dice roll. Now, is a dice roll with two dice a uniform distribution? No, it should not be. The reason is that with two dice, there are a number of ways you can roll four and five and six and seven and eight, hence some of the games that are played around that. With two dice, the options are from 2 to 12, and what we'll find is that the vast majority of these are going to come out in the middle of that distribution, in the area of 6, 7, and 8. We can already start to see that form, and we can certainly tell this is not a uniform distribution.
So let's run it up one more time, and we can see that this has definitely got shape to it. What I mean by shape is that here's a 2, the lowest possibility; it happened a hundred times. Here's an 8, which has a much higher probability; it happens 300 times. So the 8 is clearly more likely to happen than a 2. The reason is that there's only one way to get a 2, a 1 and a 1. There are lots of ways to get an 8. The next lesson is scatter diagrams.
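
The two-dice step in the transcript can be sketched the same way. This is a minimal illustration rather than the lesson's actual code: the `roll_two_dice` helper, the `ways` table, and the fixed seed are assumptions. The `ways` table counts how many of the 36 equally likely (first, second) pairs produce each total, which is why the histogram has a peak in the middle.

```python
import random
from collections import Counter
from itertools import product


def roll_two_dice(n_rolls, seed=None):
    """Roll two six-sided dice n_rolls times and return the totals (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)]


# Number of (first, second) pairs, out of 36, that add up to each total
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

n_rolls = 10_000
rand = roll_two_dice(n_rolls, seed=0)  # seed is an assumption; drop it for fresh rolls
observed = Counter(rand)

print("total  ways  observed  expected")
for total in range(2, 13):
    expected = n_rolls * ways[total] / 36
    print(f"{total:5d}  {ways[total]:4d}  {observed[total]:8d}  {expected:8.0f}")
```

There is one way to roll a 2 (1 and 1) but five ways to roll an 8 (2 and 6, 3 and 5, 4 and 4, 5 and 3, 6 and 2), so over many rolls an 8 should turn up roughly five times as often as a 2.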
