Since publishing my analysis of Victorian crash data, I have been made aware of a more comprehensive crash data set that includes vehicle information. However, this always struck me as useless without a means to ‘baseline’ the information. What didn’t occur to me was that I could just email VicRoads and ask for anonymised vehicle registration information. Less than 24 hours later, they emailed me an Excel spreadsheet covering all but the most obscure makes and models on Victorian roads. So, I present Victorian Crash Statistics: Redux.

Introduction

I ride bikes, and for the most part like it. The major downside is interacting with cars and their ignorant drivers. Over my riding history, I have noticed that some cars seem riskier to be near than others. This post explores whether automobile profiling really is a viable harm-reduction strategy, or whether all cars are essentially created equal. NB: I look mainly at the incidence of incidents, not their severity. Severity is catalogued in grades 1–5, but it falls into a classic government data publication trap: there is no key for those grades.

About the Incidents

This is a large data set, spanning all accidents reported to or attended by police from 2006-01-01 to 2017-06-30. In all there are 157,285 unique incidents, of which 16,183 involved a bicycle. Of those, 337 involved more than one bike, and 1,106 involved only bikes.

Highest Incidence, by Model and Year

The table below gives the top 10 vehicles in the dataset, ranked by weighted occurrence (crash count divided by registration volume). Given that older models have had more time to cause crashes, I am very surprised to see a 2012 and a 2009 model of the same car rank so highly.

VEHICLE_YEAR_MANUF  VEHICLE_MODEL  COUNT  WEIGHTED_OCCURRENCE
2000                BERLINA        107    4.4583
2001                BERLINA        129    3.1463
2003                BERLINA        94     3.1333
2012                RANGER         67     2.913
2009                RANGER         63     2.7391
2002                BERLINA        98     2.5128
1996                CHEROKEE       50     2.2727
2008                RANGER         115    2.2115
2014                RANGER         53     2.12
2013                RANGER         59     2.1071

Risks for Cyclists

This table describes the highest-risk vehicles for cyclists, based on our crash data. Trucks feature much more heavily here, which is of particular concern, given that car-truck incidents are much safer than cyclist-truck ones. Note that the rating counts cyclists hit, not separate incidents: if a truck driver cleans up four cyclists in a fit of lane-change-rule ignorance, that’s given a higher risk rating than a regularly negligent driver who has hit three bikes in three separate incidents.

Year  Make    Model   total_cyclists  volume  risk_rating
1997  HYNDAI  FX      4               21      0.1905
2003  SCANIA  L94UB   4               26      0.1538
2006  SCANIA  L94UB   3               24      0.125
2010  MAZDA   3B      4               32      0.125
2005  MITSUB  BE649J  3               25      0.12
2003  STERLI  LT7500  2               21      0.0952
1996  SSANGY  MUSSO   2               21      0.0952
1999  MERC B  0405NH  3               32      0.0938
1994  MAZDA   MPV     2               22      0.0909
1991  HOLDEN  EXEC    2               22      0.0909
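
The risk_rating column above is consistent with total_cyclists divided by volume. A minimal Python sketch of that calculation, using a few rows copied from the table (the function name `risk_rating` and the tuple layout are illustrative choices, not the code that produced the table):

```python
# Rows copied from the table above: (year, make, model, total_cyclists, volume).
rows = [
    (1997, "HYNDAI", "FX", 4, 21),
    (2003, "SCANIA", "L94UB", 4, 26),
    (2006, "SCANIA", "L94UB", 3, 24),
    (2010, "MAZDA", "3B", 4, 32),
]


def risk_rating(total_cyclists, volume):
    """Cyclists hit per registered vehicle (assumed definition of the column)."""
    return total_cyclists / volume


# Rank from highest to lowest risk; sorted() is stable, so ties keep table order.
ranked = sorted(rows, key=lambda r: risk_rating(r[3], r[4]), reverse=True)

for year, make, model, cyclists, volume in ranked:
    print(year, make, model, round(risk_rating(cyclists, volume), 4))
```

Because the numerator counts cyclists rather than incidents, one crash involving four riders scores higher than three separate single-rider crashes.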

The following table drops the model year, for an ‘all time’ result:

rank  Make           Model   total_cyclists  volume  risk_factor (%)
1     MERCEDES BENZ  0405NH  3               32      9.375
2     MERCEDES BENZ  A160A   2               22      9.0909
3     ISUZU          FSD700  2               24      8.3333
4     FORD           EA      2               34      5.8824
5     SCANIA         L94UB   11              202     5.4455
6     MITSUBISHI     BE649J  6               115     5.2174
7     MERCEDES BENZ  VARIO   1               20      5
8     HOLDEN         8VK     1               20      5
9     BMW            3SER92  1               21      4.7619
10    PEUG           20601B  1               21      4.7619
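
The all-time figures are consistent with summing cyclists and registration volume over every model year, then expressing the ratio as a percentage. A hedged Python sketch of that roll-up follows. It assumes volumes can simply be added across years, and the two input rows are just the Scania entries from the per-year table, so the result will not match the 11/202 row, which covers more model years. The names `all_time` and `risk_factor_pct` are hypothetical.

```python
from collections import defaultdict

# (year, make, model, total_cyclists, volume): two rows from the per-year table,
# used only to illustrate the roll-up. The full data set has more model years.
per_year = [
    (2003, "SCANIA", "L94UB", 4, 26),
    (2006, "SCANIA", "L94UB", 3, 24),
]


def risk_factor_pct(total_cyclists, volume):
    """Cyclists hit per 100 registered vehicles, rounded to 4 decimal places."""
    return round(100 * total_cyclists / volume, 4)


def all_time(records):
    """Sum cyclists and volume per (make, model), ignoring the model year."""
    totals = defaultdict(lambda: [0, 0])
    for _year, make, model, cyclists, volume in records:
        totals[(make, model)][0] += cyclists
        totals[(make, model)][1] += volume
    return {key: (c, v, risk_factor_pct(c, v)) for key, (c, v) in totals.items()}


print(all_time(per_year))
# Check against a row of the all-time table: 11 cyclists over 202 vehicles.
print(risk_factor_pct(11, 202))
```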

No Trucks Allowed

OK, so trucks are pretty dangerous for bikes. But what kinds of cars should we look out for?

rank  Make        total_cyclists  volume  risk_factor (%)
1     DAEWOO      70              5591    1.252
2     SMART       3               291     1.0309
3     STERLING    4               491     0.8147
4     DAIHATSU    24              4479    0.5358
5     HINO        43              10297   0.4176
6     CITROËN     21              5083    0.4131
7     ALFA ROMEO  24              6051    0.3966
8     VOLVO       106             28226   0.3755
9     FORD        2032            553006  0.3674
10    CHRYSLER    39              10645   0.3664

The results appear to be heavily skewed towards unusual/less common cars. In particular, I am surprised to see the Smart car rated so poorly!
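
One likely reason rare makes dominate is that a rate built on a few hundred registrations is very noisy. A rough Python sketch of this, using a 95% Wilson score interval around two rows of the table, is below. The Wilson interval is an assumption added here for illustration and is not part of the analysis above.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion hits / n."""
    p = hits / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# (make, total_cyclists, volume) copied from the table above.
for make, hits, n in [("SMART", 3, 291), ("FORD", 2032, 553006)]:
    lo, hi = wilson_interval(hits, n)
    print(f"{make}: {100 * hits / n:.4f}% (95% interval {100 * lo:.2f}% to {100 * hi:.2f}%)")
```

The Smart interval comes out far wider than the Ford one, so its high ranking rests on much thinner evidence.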

Conclusion

It looks like my ‘gut feeling’ of avoiding every single twin-cab I see on the road isn’t borne out by the data (unless every cyclist acts like me, and that’s the reason nobody is getting hit).

Bonus: Holden vs Ford

Holdens account for 1,948 cyclist-related incidents across 656,930 registered cars; Fords, 2,042 across 553,006. Can we reject the null hypothesis that both makes share the same underlying incident rate? A two-proportion z-test in MATLAB:

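x1 = 1948; n1 = 656930;   % Holden: cyclist incidents, registered cars
x2 = 2042; n2 = 553006;   % Ford: cyclist incidents, registered cars
p1 = x1/n1; p2 = x2/n2;   % observed proportions
d = p1 - p2;              % named d to avoid shadowing the built-in diff
p = (x1+x2)/(n1+n2);      % pooled proportion under the null hypothesis

se = sqrt( p * (1-p) * (1/n1 + 1/n2));   % standard error of the difference
z = d/se;
p_val = 2*normcdf(-abs(z))   % two-sided p-value, valid for either sign of z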

p_val = 3.6335e-12
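
For readers without MATLAB, the same pooled two-proportion z-test can be cross-checked in Python using only the standard library. This is a sketch under the same assumptions as the snippet above. It uses the identity 2·Φ(−|z|) = erfc(|z|/√2) in place of `normcdf`.

```python
import math

x1, n1 = 1948, 656930   # Holden: cyclist incidents, registered cars
x2, n2 = 2042, 553006   # Ford: cyclist incidents, registered cars

p1, p2 = x1 / n1, x2 / n2

# Pooled proportion under the null hypothesis that both rates are equal.
pooled = (x1 + x2) / (n1 + n2)
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Two-sided p-value: 2 * Phi(-|z|) equals erfc(|z| / sqrt(2)).
p_val = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.3f}, p = {p_val:.4e}")
```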

With p << 0.05, we can reject the idea that Holdens and Fords are involved in crashes with cyclists at the same rate per registered car. Just as I suspected, Ford drivers.