In many cases, groups can gain an advantage by misusing statistics to mislead people.
Common techniques include:
Three-dimensional graphs.
Axes that do not start at zero.
Axes without scales.
Graphic images that convey a negative or positive mood.
Assuming that a correlation necessarily shows causation.
Using statistics that are not truly representative of the entire population.
Exploiting misconceptions about mathematical concepts.
For example, the following pairs of graphs show identical information but look very different. Explain why.
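One device worth checking in any such pair is where the value axis starts. The Python sketch below (the function name and the values 100 and 110 are hypothetical, chosen only for illustration) compares the true ratio of two values with the ratio of the bar heights as drawn when the axis starts above zero.

```python
def apparent_ratio(a: float, b: float, axis_start: float = 0.0) -> float:
    """Ratio of bar heights as drawn when the value axis starts at axis_start.

    With axis_start = 0 this is the true ratio b / a; a larger starting
    value makes the gap between the bars look bigger than it really is.
    """
    if not (axis_start < a and axis_start < b):
        raise ValueError("both values must lie above the axis start")
    return (b - axis_start) / (a - axis_start)


# Hypothetical values for illustration only (not taken from any graph in the text).
true_ratio = apparent_ratio(100, 110)                  # axis starts at 0
drawn_ratio = apparent_ratio(100, 110, axis_start=95)  # axis starts at 95
print(f"true ratio: {true_ratio:.2f}, as drawn: {drawn_ratio:.2f}")
```

Here a 10% increase is drawn with one bar three times as tall as the other.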
Exercises - misuse of statistics
A company has tried to give a visual representation of the increase in their earnings from one year to the next. Does the graph below convince you? Critically analyse the graph.
In a study conducted on a busy highway, data were collected about drivers breaking the speed limit and the colour of the car they were driving. The data were collected during a 20-minute time interval in the middle of the day, and are presented in a table and pie chart below.
Conclusions made by a novice based on the data are summarised as follows:
“People driving white cars are more likely to break the speed limit.”
“Drivers in blue and red cars are more likely to stick to the speed limit.”
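Counts of speeders by colour cannot, on their own, show which drivers are more likely to speed. A fairer comparison, if the total number of cars of each colour had also been recorded, is the proportion of cars of each colour that were speeding. The sketch below assumes such totals exist; the function name and every count in it are hypothetical, invented only to show how one colour can account for most of the speeders while having the lowest speeding rate.

```python
def speeding_rates(speeders: dict[str, int], totals: dict[str, int]) -> dict[str, float]:
    """Fraction of all cars of each colour that were speeding."""
    return {colour: speeders[colour] / totals[colour] for colour in speeders}


# Hypothetical counts, invented for illustration: white cars make up most of
# the speeders only because most of the cars on the road are white.
speeders = {"white": 12, "blue": 3, "red": 3}
totals = {"white": 120, "blue": 20, "red": 15}

for colour, rate in speeding_rates(speeders, totals).items():
    print(f"{colour}: {speeders[colour]} speeders, rate {rate:.0%}")
```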
A record label produces a graphic showing its advantage in sales over its competitors. Identify at least three devices it has used to influence the reader's impression and mislead them.
In an effort to discredit their competition, a tour bus company prints the graph shown below. Their claim is that the competitor is losing business. Can you think of a better explanation?
To test a theory, 8 different offices were monitored for noise levels and productivity of the employees in the office. The results are graphed below.
The following statement was then made:
“If an office environment is noisy, this leads to poor productivity.”
Explain the flaws in this thinking.
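A correlation coefficient measures how strongly two variables move together, but not why. The Python sketch below computes Pearson's r from its definition; the function name and the four data points are hypothetical, not the office data from the graph. Even a perfect r of -1 would only show an association: it cannot say whether noise lowers productivity, whether something else about an office (for example, how busy it is) affects both, or whether 8 offices are enough to generalise from.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical noise levels and productivity scores, for illustration only.
noise = [40, 50, 60, 70]
productivity = [90, 80, 70, 60]
print(pearson_r(noise, productivity))
```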
Summary of definitions
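The mean of a data set $x = \{x_1, x_2, \ldots, x_n\}$, denoted by $\bar{x}$, is the average of the data values, and is calculated as:
$$\bar{x} = \frac{\text{sum of all data values}}{\text{number of data values}} = \frac{1}{n}\sum_{i=1}^{n} x_i$$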
The median is the centre data value in a data set that has been ordered from lowest to highest. If the data set has an even number of values, the median is the mean of the two middle values.
The mode is the data value that occurs most often in a data set.
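The three definitions above translate directly into code. The Python sketch below implements each one from scratch on a small made-up data set; the function names are our own. `modes` returns a list because a data set can have more than one value that occurs most often.

```python
from collections import Counter


def mean(values: list[float]) -> float:
    """Sum of the values divided by how many values there are."""
    return sum(values) / len(values)


def median(values: list[float]) -> float:
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values: list[float]) -> list[float]:
    """All values that occur most often."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


data = [3, 7, 2, 7, 5, 9]  # small made-up data set
print(mean(data), median(data), modes(data))
```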
The following presentation summarises what you have learnt in this chapter. Ignore the chapter number and any exercise numbers in the presentation.
Summary
Data types
Collecting data
Samples and populations
Grouping data
  Tally
  Frequency
  Bins
Graphing data
  Bar and compound bar graphs
  Histograms and frequency polygons
  Pie charts
  Line and broken line graphs
Summarising data
  Central tendency
    Mean
    Median
    Mode
  Dispersion
    Range
    Quartiles
    Inter-quartile range
    Percentiles
An engineering company has designed two different types of engines for motorbikes. The two different motorbikes are tested for the time it takes (in seconds) for them to accelerate from 0 km/h to 60 km/h.
Test      Bike 1 (s)   Bike 2 (s)
1         1.55         0.9
2         1.00         1.0
3         0.92         1.1
4         0.80         1.0
5         1.49         1.0
6         0.71         0.9
7         1.06         0.9
8         0.68         1.0
9         0.87         0.9
10        1.09         1.1
Average
What measure of central tendency should be used for this information?
Calculate the average you chose in the previous question for each motorbike.
Which motorbike would you choose based on this information? Take note of the accuracy of the numbers in each set of tests: Bike 1's times are recorded to two decimal places, Bike 2's to one.
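Once you have chosen a measure, the arithmetic can be checked with Python's standard `statistics` module. The sketch below computes the mean and the median for each bike, using the times from the table; it also prints the range, which is our own addition to make the difference in spread between the two bikes visible. It does not by itself decide which bike to choose.

```python
import statistics

# Acceleration times (seconds) from the table above.
bike_1 = [1.55, 1.00, 0.92, 0.80, 1.49, 0.71, 1.06, 0.68, 0.87, 1.09]
bike_2 = [0.9, 1.0, 1.1, 1.0, 1.0, 0.9, 0.9, 1.0, 0.9, 1.1]

for name, times in [("Bike 1", bike_1), ("Bike 2", bike_2)]:
    print(
        f"{name}: mean = {statistics.mean(times):.3f} s, "
        f"median = {statistics.median(times):.3f} s, "
        f"range = {max(times) - min(times):.2f} s"
    )
```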
A company wanted to evaluate the training programme in its factory. They gave the same task to trained and untrained employees and timed each one in seconds.
Trained     121  137  131  135  130  128  130  126  132  127  129  120  118  125  134
Untrained   135  142  126  148  145  156  152  153  149  145  144  134  139  140  142
Find the medians and quartiles for both sets of data.
Find the interquartile range for both sets of data.
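Conventions for quartiles differ slightly between textbooks. The Python sketch below assumes the median-of-halves rule: order the data, find the median, then take the medians of the lower and upper halves, leaving the overall median out of both halves when there is an odd number of values. The helper name `quartiles` is our own. For these two 15-value data sets the rule picks the 4th, 8th and 12th ordered values, which is also what `statistics.quantiles(data, n=4)` returns with its default "exclusive" method.

```python
import statistics

# Task times in seconds, from the table above.
trained = [121, 137, 131, 135, 130, 128, 130, 126, 132, 127, 129, 120, 118, 125, 134]
untrained = [135, 142, 126, 148, 145, 156, 152, 153, 149, 145, 144, 134, 139, 140, 142]


def quartiles(values: list[float]) -> tuple[float, float, float]:
    """Return (Q1, median, Q3) using the median-of-halves rule.

    For an odd number of values, the overall median is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2 :]
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)


for name, times in [("Trained", trained), ("Untrained", untrained)]:
    q1, q2, q3 = quartiles(times)
    print(f"{name}: Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {q3 - q1}")
```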