In general, it is good practice to look at a display of the data before you do any statistical analysis of it. This is especially true for bivariate data. In the univariate case the main concerns are skewness and outliers. In the bivariate case we need to identify outliers and the shape of the distributions, as well as the relationship between the two variables.
The graph most suitable for such an analysis is the same type of graph that was used in introductory algebra to plot points on the plane. Recall that given an ordered pair of numbers (i.e., two numbers whose order is important), the location of the point is obtained by locating the first value on the horizontal axis and the second value on the vertical axis. The graph of all the data points is called a scatter plot. In this lesson we will learn how to create a scatter plot of a data set using Excel, and how to interpret that graph.
After completing this lesson you should be able to:
The independent variable in a bivariate data set is the variable whose value may have been assigned by the researcher. Alternative terms used for this variable are predictor, explanatory variable, or factor.
The response variable in a bivariate data set is the one whose value is strictly observed and which may have been directly affected by the independent variable. The focus of the analysis in a bivariate study is usually on this variable. An alternative term used for the response is the dependent variable.
The scatter plot is a two-dimensional plot using the Cartesian plane (xy-plane). Each variable is assigned to one axis, and every observation (having two values associated with it) is represented as a point on the plane. Typically the horizontal axis (x-axis) is reserved for the predictor (or factor) and the vertical axis (y-axis) is used for the response.
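As an illustration outside of Excel, the following minimal Python sketch shows how observations become ordered pairs with the predictor first and the response second. The car weights and mileages are made-up values, the function name draw_scatter is hypothetical, and the plotting step assumes the matplotlib library is installed.

```python
# Hypothetical bivariate data: each observation has two values.
car_weight_lb = [2200, 2800, 3100, 3600, 4100]   # predictor (x-axis)
gas_mileage_mpg = [38, 31, 28, 24, 19]           # response (y-axis)

# Pair the values so that each observation becomes one ordered pair (x, y).
points = list(zip(car_weight_lb, gas_mileage_mpg))


def draw_scatter(points, x_label, y_label):
    """Draw the ordered pairs as a scatter plot (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt  # imported lazily so the pairing step runs without it

    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    plt.scatter(xs, ys)
    plt.xlabel(x_label)   # horizontal axis: predictor
    plt.ylabel(y_label)   # vertical axis: response
    plt.show()
```

Calling draw_scatter(points, "Weight of car (lb)", "Gas mileage (mpg)") would display the plot, with one point per observation.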
Deciding which Variable is the Response
Given a bivariate data set, it is not always easy to determine which variable is the response and which is the independent one. In general it is easier to identify the latter, especially in an experimental study. Any variable whose values are chosen by the researcher is automatically the independent one. If you have an observational study, try to determine which factor affects the other. The response variable is the one that is affected by the predictor.
If your variables are "weight of car" and "gas mileage", then "gas mileage" is your response variable, since it is affected by the weight of the car.
If your variables are "amount of daily exercise" and "body mass index", your response variable is probably "body mass index", since exercise can be used to control weight.
On the other hand, if you observe the relationship between the "unemployment rate" and the "crime rate" of different communities, then you may argue either way, and there is no one correct answer.
Interpreting a Scatter Plot
Once you have a scatter plot there are certain characteristics of which you should take note. This information will be important later in this unit when you will analyze the data. Use the following five questions as a guide to make note of the important features of the graph.
The type of analysis you do on the data depends on the answers to these questions. In the lessons that follow, we will fit a line through the points. As long as you have at least two points and their x values are not all the same, it is mathematically possible to find the line of best fit through those points. However, to get meaningful information from the analysis you must first examine the scatter plot.
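To see why a fitted line alone says little, here is a minimal Python sketch of the usual least-squares formulas for slope and intercept. The function name least_squares_line and the sample points are illustrative assumptions, not part of the lesson's Excel workflow.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of squared deviations of x; zero means every x value is identical.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are the same, so no line y = a + b*x can be fitted")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Two made-up points are enough for the arithmetic to succeed.
slope, intercept = least_squares_line([(1, 2), (3, 6)])
```

With only two points the line passes through both of them exactly, whatever the data look like. The calculation succeeds, but it cannot tell you whether a line is a sensible description of the data. That judgment comes from the scatter plot.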
The scatter plot shown has the unemployment rate in Buncombe County as the independent variable and the divorce rate as the response. Each point represents the average for one year, with data collected from 1980 to 2005 (NC Office of State Budget and Management).
This particular scatter plot shows that there is not a strong relationship between the two variables. The questions above can all be answered in the negative.
Comparing Means with a Scatter Plot
Although Excel is not set up to have the independent variable be a category, it is still possible to do this. Assign a number to each category, and treat the column of 1's, 2's, etc. as your independent variable.
Assume you wish to compare the unemployment rates of urban vs. rural counties. You can assign the value of 1 to any rural county (a county with fewer than 150 people per square mile) and 2 to any county with an urban area. Then proceed using your column of 1's and 2's as a numeric variable. The results are given in the image below (NC Office of State Budget and Management).
According to the graph, unemployment rates in urban counties tend to be lower and less varied than those in rural counties.
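The coding step described above can be sketched in Python as follows. The county figures are invented for illustration and are not the NC data behind the image. The sketch also assumes, for simplicity, that every county at or above the density cutoff counts as urban. The names category_code and RURAL_DENSITY_LIMIT are hypothetical.

```python
from statistics import mean

# Hypothetical counties: (people per square mile, unemployment rate in %).
counties = [
    (45, 7.9),
    (120, 6.1),
    (80, 9.4),
    (410, 5.2),
    (950, 4.8),
    (300, 5.5),
]

RURAL_DENSITY_LIMIT = 150  # rural: fewer than 150 people per square mile


def category_code(density):
    """Return 1 for a rural county and 2 for an urban one (assumed density rule)."""
    return 1 if density < RURAL_DENSITY_LIMIT else 2


# The column of 1's and 2's becomes the numeric independent variable.
points = [(category_code(density), rate) for density, rate in counties]

# Summarize each category: mean and range (max minus min) of the response.
summary = {}
for code in (1, 2):
    rates = [rate for c, rate in points if c == code]
    summary[code] = (mean(rates), max(rates) - min(rates))
```

Plotting points as a scatter plot gives two vertical strips, one at x = 1 and one at x = 2. Comparing the height and spread of the strips corresponds to comparing the means and ranges stored in summary.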
Plotting bivariate data is tedious unless you use technology, and Excel offers a quick and convenient way to get a scatter plot. Assuming you have the data stored in two columns, you can get a scatter plot in five simple steps.
If the two columns of data are not conveniently next to each other, you can enter the two variables separately. To do so, after you have chosen the scatter plot sub-type, go to the "Series" tab and enter the column information in the appropriate boxes.
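The same idea of picking two non-adjacent columns can be sketched outside of Excel. This minimal Python example assumes the data sit in a CSV table with a header row. The column names, the values, and the helper column_pairs are all hypothetical.

```python
import csv
import io

# Hypothetical table with an unrelated column between the two variables of interest.
raw = io.StringIO(
    "year,unemployment_rate,population,divorce_rate\n"
    "2001,4.1,210000,4.6\n"
    "2002,5.0,212000,4.4\n"
    "2003,4.7,215000,4.5\n"
)


def column_pairs(source, x_name, y_name):
    """Read a CSV source and return (x, y) pairs taken from the two named columns."""
    reader = csv.DictReader(source)
    return [(float(row[x_name]), float(row[y_name])) for row in reader]


# Predictor first, response second, regardless of where the columns sit in the table.
points = column_pairs(raw, "unemployment_rate", "divorce_rate")
```

Selecting the columns by name plays the same role as entering each series separately in Excel: the order in which you name them decides which variable goes on which axis.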
For each of the following pairs of variables, decide which variable should be the response variable and which one the predictor.