Victorian Crash Statistics Redux
Since publishing my analysis of Victorian crash data, I have been made aware of a more comprehensive crash data set that includes vehicle information. It always struck me as useless without a means to ‘baseline’ this information. What didn’t occur to me was that I could just email VicRoads and ask for anonymised vehicle registration information. Less than 24 hours later, they emailed me an Excel spreadsheet covering all but the most obscure makes and models on Victorian roads. So, I present Victorian Crash Statistics: Redux.
Introduction
I ride bikes, and for the most part like it. The major downside is interacting with cars and their ignorant drivers. Over my riding history, I have noticed that some cars seem riskier to be near than others. This post explores whether automobile profiling really is a viable harm-reduction strategy, or whether all cars are essentially created equal. NB: I look mainly at the incidence of incidents, not their severity. Severity is catalogued in grades 1–5, but it falls into a classic government data publication trap: there is no key for these grades.
About the Incidents
This is a large data set, spanning all accidents reported to or attended by police from 2006-01-01 to 2017-06-30. In all there are 157,285 unique incidents here, of which 16,183 involved a bicycle. 337 involved more than one bike, and 1,106 involved only bikes.
Highest Incidence, by Model and Year
The table below gives the weighted top 10 for all vehicles in the data set, where WEIGHTED_OCCURRENCE is the number of crashes (COUNT) divided by the registration volume for that model and year. Given that older models have had more time to cause crashes, I am very surprised to see a 2012 and a 2009 model of the same car rank so highly.
VEHICLE_YEAR_MANUF | VEHICLE_MODEL | COUNT | WEIGHTED_OCCURRENCE |
2000 | BERLINA | 107 | 4.4583 |
2001 | BERLINA | 129 | 3.1463 |
2003 | BERLINA | 94 | 3.1333 |
2012 | RANGER | 67 | 2.913 |
2009 | RANGER | 63 | 2.7391 |
2002 | BERLINA | 98 | 2.5128 |
1996 | CHEROKEE | 50 | 2.2727 |
2008 | RANGER | 115 | 2.2115 |
2014 | RANGER | 53 | 2.12 |
2013 | RANGER | 59 | 2.1071 |
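The table does not list registration volumes, but they can be backed out from its two numeric columns. The sketch below is not the code used for the original analysis; it assumes the weighting is a plain COUNT / volume ratio, and the function name `implied_volume` is made up for illustration.

```python
# (year, model, count, weighted_occurrence), copied from the table above.
ROWS = [
    (2000, "BERLINA", 107, 4.4583),
    (2001, "BERLINA", 129, 3.1463),
    (2003, "BERLINA", 94, 3.1333),
    (2012, "RANGER", 67, 2.913),
    (2009, "RANGER", 63, 2.7391),
    (2002, "BERLINA", 98, 2.5128),
    (1996, "CHEROKEE", 50, 2.2727),
    (2008, "RANGER", 115, 2.2115),
    (2014, "RANGER", 53, 2.12),
    (2013, "RANGER", 59, 2.1071),
]


def implied_volume(count, weighted):
    """Back out the registration volume, ASSUMING weighted = count / volume."""
    return count / weighted


for year, model, count, weighted in ROWS:
    volume = implied_volume(count, weighted)
    print(f"{year} {model:<9} count={count:>3}  implied volume ~ {volume:.2f}")
```

If the assumption holds, every implied volume should come out very close to a whole number, which is a quick sanity check on how the ratio was built.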
Risks for Cyclists
This table describes the highest-risk vehicles for cyclists, based on our crash data. Trucks feature much more heavily here, which is of particular concern, given that car-truck incidents are much safer than cyclist-truck ones. Note that the rating counts cyclists hit rather than separate incidents: if a truck driver cleans up four cyclists in a single fit of lane-change-rule ignorance, that earns a higher risk rating than a regularly negligent driver who has hit three bikes in three separate incidents.
Year | Make | Model | total_cyclists | volume | risk_rating |
1997 | HYNDAI | FX | 4 | 21 | 0.1905 |
2003 | SCANIA | L94UB | 4 | 26 | 0.1538 |
2006 | SCANIA | L94UB | 3 | 24 | 0.125 |
2010 | MAZDA | 3B | 4 | 32 | 0.125 |
2005 | MITSUB | BE649J | 3 | 25 | 0.12 |
2003 | STERLI | LT7500 | 2 | 21 | 0.0952 |
1996 | SSANGY | MUSSO | 2 | 21 | 0.0952 |
1999 | MERC B | 0405NH | 3 | 32 | 0.0938 |
1994 | MAZDA | MPV | 2 | 22 | 0.0909 |
1991 | HOLDEN | EXEC | 2 | 22 | 0.0909 |
The following table removes the years of vehicles, for an ‘all time’ result. Here risk_factor is the same cyclists-per-registered-vehicle ratio, expressed as a percentage.
rank | Make | Model | total_cyclists | volume | risk_factor |
1 | MERCEDES BENZ | 0405NH | 3 | 32 | 9.375 |
2 | MERCEDES BENZ | A160A | 2 | 22 | 9.0909 |
3 | ISUZU | FSD700 | 2 | 24 | 8.3333 |
4 | FORD | EA | 2 | 34 | 5.8824 |
5 | SCANIA | L94UB | 11 | 202 | 5.4455 |
6 | MITSUBISHI | BE649J | 6 | 115 | 5.2174 |
7 | MERCEDES BENZ | VARIO | 1 | 20 | 5 |
8 | HOLDEN | 8VK | 1 | 20 | 5 |
9 | BMW | 3SER92 | 1 | 21 | 4.7619 |
10 | PEUG | 20601B | 1 | 21 | 4.7619 |
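The two risk columns above appear to be the same ratio at different scales. Here is a minimal Python sketch of that arithmetic, assuming risk_rating = total_cyclists / volume and risk_factor = 100 × total_cyclists / volume; the function names are illustrative, not taken from the original analysis.

```python
def risk_rating(total_cyclists, volume):
    """Cyclists hit per registered vehicle (the scale of the by-year table)."""
    return total_cyclists / volume


def risk_factor(total_cyclists, volume):
    """The same ratio as a percentage (the scale of the 'all time' table)."""
    return 100.0 * total_cyclists / volume


# Rows taken from the tables above.
print(round(risk_rating(4, 21), 4))    # 1997 HYNDAI FX
print(round(risk_factor(3, 32), 4))    # MERCEDES BENZ 0405NH
print(round(risk_factor(11, 202), 4))  # SCANIA L94UB

# Cyclists are counted, not incidents: for the same registration volume,
# one crash involving four cyclists outranks three single-cyclist crashes.
print(risk_rating(4, 21) > risk_rating(3, 21))
```

Because both columns are simple ratios, models with only 20 to 30 registrations can reach the top of the list with just two or three cyclists hit.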
No Trucks Allowed
OK, so trucks are pretty dangerous for bikes. But which makes should you look out for?
rank | Make | total_cyclists | volume | risk_factor |
1 | DAEWOO | 70 | 5591 | 1.252 |
2 | SMART | 3 | 291 | 1.0309 |
3 | STERLING | 4 | 491 | 0.8147 |
4 | DAIHATSU | 24 | 4479 | 0.5358 |
5 | HINO | 43 | 10297 | 0.4176 |
6 | CITROËN | 21 | 5083 | 0.4131 |
7 | ALFA ROMEO | 24 | 6051 | 0.3966 |
8 | VOLVO | 106 | 28226 | 0.3755 |
9 | FORD | 2032 | 553006 | 0.3674 |
10 | CHRYSLER | 39 | 10645 | 0.3664 |
The results appear to be heavily skewed towards unusual/less common cars. In particular, I am surprised to see the Smart car rated so poorly!
Conclusion
It looks like my ‘gut feeling’ of avoiding every single twin-cab I see on the road isn’t borne out by the data (unless every cyclist acts like me, and that’s the reason nobody is getting hit).
Bonus: Holden vs Ford
Holdens have accumulated 1,948 cyclist-related incidents for 656,930 registered cars; Fords, 2,042 for 553,006. But is that difference significant, or should we accept the null hypothesis that both proportions come from the same population? A two-proportion z-test in MATLAB:
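% Cyclist-related incidents (x) and registered vehicles (n)
x1 = 1948; n1 = 656930;    % Holden
x2 = 2042; n2 = 553006;    % Ford
p1 = x1/n1; p2 = x2/n2;    % observed proportions
delta = p1 - p2;           % renamed so it does not shadow the built-in diff()
p = (x1+x2)/(n1+n2);       % pooled proportion under the null hypothesis
se = sqrt( p * (1-p) * (1/n1 + 1/n2));
z = delta/se;
p_val = 2*normcdf(-abs(z)) % two-sided p-value, valid for either sign of z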
p_val = 3.6335e-12
That is, p << 0.05, and we can reject the idea that there’s no difference between the rates at which Holdens and Fords crash with cyclists, per registered vehicle. Just as I suspected, Ford drivers.
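For anyone without MATLAB, the same pooled two-proportion z-test can be sketched in Python using only the standard library. The function name `two_proportion_z_test` is illustrative. The two-sided p-value uses the identity 2·(1 − Φ(|z|)) = erfc(|z|/√2), so no statistics package is needed.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area
    return z, p_value


# Holden first, Ford second, using the counts quoted above.
z, p_value = two_proportion_z_test(1948, 656930, 2042, 553006)
print(f"z = {z:.3f}, p = {p_value:.3e}")
```

The sign of z only reflects which make is listed first; a negative value here means the Holden rate is the lower of the two. The p-value should land close to the MATLAB figure above.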