Course 1, Unit 2 - Patterns in Data
Overview
Patterns in Data is an introduction to the analysis of univariate
(one variable) data. Throughout this unit students will be developing
tools and strategies that will help them make sense of data and communicate
their conclusions. The focus is on displaying data (to observe shape,
center, and variability/spread) and then computing and interpreting summary
statistics such as measures of center (mean, median, and mode) and measures
of variability (range, interquartile range, and standard deviation).
Key Ideas from Course 1, Unit 2
-
Dot plot (or number line plot): A way of organizing one-variable
data. Dot plots are particularly useful when the data set is small
and/or spread out. Shown below is the dot plot for the lengths of
100 male bears. (See page 76.)

-
Histogram: A way of organizing one-variable data. For example,
in the histogram below of test scores, 3 students have a score
of at least 10 but less than 20, 7 students have a score of
at least 20 but less than 30, and so on. (See page 78.)

-
Relative frequency histogram: This type of histogram has
the proportion or percentage that fall into each bar on the vertical
axis rather than the frequency or count. This plot is particularly
useful if the sample is very large. (See page 80.)

-
Shape of the distribution: Distributions of one-variable
data can be symmetric or skewed. (See page 77.)

-
Center: We can use the mean, median, or mode as the measure of
center, depending on which is most appropriate. Mean = (sum
of the data values) / (number of data values). Median = middle
data value in the ordered list (or the mean of the two middle values
when the number of data values is even). Mode = most frequently
occurring data value. (See pages 84 and 94.)
-
Percentiles: Percentiles are often used to measure the position
of a data value in the distribution. Percentiles are typically used
only when there is a very large or infinite number of possible values,
such as with heights. For example, look at the growth chart for
girls on page 105. On this chart, you can see that a 15-year-old
girl who weighs about 105 lbs would be at the 25th percentile.
This means that about 25% of the girls her age weigh 105 lbs or less
and about 75% weigh more than 105 lbs. (See pages 103-105.)
-
Five-number summary (minimum, 1st quartile, median, 3rd quartile,
maximum): Using our example, we can determine the five-number
summary as follows. Put the values in order and count to the middle;
this is the median. The median is 31. Count to the middle of the
first (lower) 50% of the data; this data value is the first quartile,
Q1. Q1 is 23. Count to the middle of the second (upper) 50% of
the data; this data value is the third quartile, Q3. Q3 is 40. (See
pages 106-108.)
-
Box plot: Use the five-number summary to make a box plot.
You need a scale on the horizontal axis to make sense of the graph.
The box contains the middle 50% of the data values, starting at the
first quartile and ending at the third quartile. Interquartile
range = Q3 - Q1 = 40 - 23 = 17. (See
pages 108-112.)

-
Outliers: Data values that are far from, and separated from,
the rest of the distribution. If the data are represented by a box
plot, then any value that is more than 1.5 times the interquartile range
(see above) above Q3 or below Q1 will be represented as a dot, separated
from the other data. (See pages 113-116.)
-
Spread of a distribution: Spread (or variability) can be
measured by the range, by the interquartile range (IQR), or by the
standard deviation. (See pages 116-123.)
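
The Histogram and Relative frequency histogram entries above can be sketched in Python. This is only an illustration, not the textbook's procedure: the scores are invented (chosen so that the first two bars match the counts of 3 and 7 quoted above), and the function names histogram_counts and relative_frequencies are hypothetical.

```python
from collections import Counter

def histogram_counts(values, width=10):
    """Count how many values fall in each bar [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

def relative_frequencies(counts):
    """Convert bar counts to proportions of the whole data set."""
    total = sum(counts.values())
    return {start: count / total for start, count in counts.items()}

# Invented test scores, for illustration only.
scores = [12, 15, 18, 21, 22, 24, 25, 27, 28, 29]

counts = histogram_counts(scores)
print(counts)                        # {10: 3, 20: 7}
print(relative_frequencies(counts))  # {10: 0.3, 20: 0.7}
```

A score of exactly 20 lands in the bar that starts at 20, which matches the "at least 20 but less than 30" reading in the Histogram entry.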
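
The three measures in the Center entry above can be computed with Python's standard statistics module. The data below are invented for illustration and do not come from the textbook.

```python
from statistics import mean, median, multimode

# Invented data set, for illustration only.
scores = [12, 25, 25, 31, 38, 40, 47]

print(mean(scores))       # (sum of the data values) / (number of data values)
print(median(scores))     # middle value of the ordered list: 31
print(multimode(scores))  # most frequent value(s), as a list: [25]
```

multimode returns a list because a data set can have more than one most frequent value.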
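
The counting procedure in the Five-number summary entry above might be sketched as follows. Two things here are assumptions rather than facts from the source: the data set is invented (picked only so that it gives the median of 31, Q1 of 23, and Q3 of 40 quoted above), and the middle value is left out of both halves when the number of values is odd, which is one common convention among several. The name five_number_summary is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumed convention: with an odd number of values, the middle value
    belongs to neither half. Needs at least two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # lower 50% of the data
    upper = ordered[half + n % 2:]  # upper 50% of the data
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])

# Invented data, for illustration only.
data = [36, 18, 40, 23, 45, 27, 31]

minimum, q1, med, q3, maximum = five_number_summary(data)
print(minimum, q1, med, q3, maximum)  # 18 23 31 40 45
print(q3 - q1)                        # interquartile range: 17
```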
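
The 1.5 x IQR rule in the Outliers entry above can be checked with a few lines of Python. Q1 = 23 and Q3 = 40 are taken from the Five-number summary entry; the values being tested are invented, and the names outlier_fences and find_outliers are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs: 1.5 * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return (q1 - 1.5 * iqr, q3 + 1.5 * iqr)

def find_outliers(values, q1, q3):
    """Return the values that lie beyond either cutoff."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Quartiles quoted in the Five-number summary entry.
print(outlier_fences(23, 40))                      # (-2.5, 65.5)

# Invented values, for illustration only.
print(find_outliers([5, 30, 64, 66, 70], 23, 40))  # [66, 70]
```

With an interquartile range of 17, the cutoffs sit 25.5 units beyond each quartile, so only values below -2.5 or above 65.5 would be drawn as separate dots.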
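
For the Spread of a distribution entry above, the range and standard deviation can be computed as sketched below. The data are invented. The entry does not say whether the standard deviation divides by the number of values or by one less than that number, so both versions are shown.

```python
from statistics import pstdev, stdev

# Invented data set, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)
print(data_range)      # 7

# Divides the sum of squared deviations by n (population form).
print(pstdev(values))  # 2.0

# Divides by n - 1 (sample form), so it is slightly larger.
print(stdev(values))
```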