Victorian Crash Statistics Redux

Since publishing my analysis of Victorian crash data, I have been made aware of a more comprehensive crash data set that includes vehicle information. On its own this struck me as useless, because I had no means to ‘baseline’ the information. What hadn’t occurred to me was that I could just email VicRoads and ask them for anonymised vehicle registration information. Less than 24 hours later, they emailed me an Excel spreadsheet covering all but the most obscure makes and models on Victorian roads. So, I present Victorian Crash Statistics: Redux.

Introduction

I ride bikes, and for the most part like it. The major downside is interacting with cars and their ignorant drivers. Over my riding history, I have noticed that some cars seem riskier to be near than others. This post explores whether an automobile-profiling approach really is a viable harm-reduction strategy, or whether all cars are essentially created equal. NB: I look mainly at the incidence of incidents, not their severity. Severity is catalogued in grades 1–5, but the data falls into a classic government publication trap: there is no key explaining what the grades mean.

About the Incidents

This is a large data set, spanning all accidents reported to or attended by police from 2006-01-01 to 2017-06-30. In all there are 157,285 unique incidents, of which 16,183 involved a bicycle. Of those, 337 involved more than one bike, and 1,106 involved only bikes.

Highest Incidence, by Model and Year

The table below ranks the top 10 vehicles in the dataset by weighted occurrence, that is, crash count divided by registration volume. Given that older models have had more time to cause crashes, I am very surprised to see the 2012 and 2009 models of the same car rank so highly.

VEHICLE_YEAR_MANUF	VEHICLE_MODEL	COUNT	WEIGHTED_OCCURRENCE
2000	BERLINA	107	4.4583
2001	BERLINA	129	3.1463
2003	BERLINA	94	3.1333
2012	RANGER	67	2.913
2009	RANGER	63	2.7391
2002	BERLINA	98	2.5128
1996	CHEROKEE	50	2.2727
2008	RANGER	115	2.2115
2014	RANGER	53	2.12
2013	RANGER	59	2.1071
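
As a rough sanity check on that weighting, the sketch below inverts it: assuming WEIGHTED_OCCURRENCE really is COUNT divided by registration volume, each row implies a volume. The variable names are mine, and the numbers are copied from the table above.

```matlab
% Rows copied from the table above: crash COUNT and WEIGHTED_OCCURRENCE.
count    = [107 129 94 67 63 98 50 115 53 59];
weighted = [4.4583 3.1463 3.1333 2.913 2.7391 2.5128 2.2727 2.2115 2.12 2.1071];

% Assumption: weighted = count ./ volume, so volume = count ./ weighted.
implied_volume = round(count ./ weighted);

% Recompute the weighting from the implied volumes as a consistency check.
recomputed    = count ./ implied_volume;
max_abs_error = max(abs(recomputed - weighted));

disp(implied_volume)
disp(max_abs_error)
```

Under that assumption, every implied volume lands between 22 and 52 registrations, so these rankings rest on very small denominators. That may be why a ratio above 1 (more crashes than registered vehicles) is possible at all.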

Risks for Cyclists

This table lists the highest-risk vehicles for cyclists, based on the crash data; risk_rating is total_cyclists divided by volume (the number of registered vehicles). Trucks feature much more heavily here, which is of particular concern, given that car-truck incidents are much safer than cyclist-truck incidents. Note that the rating counts cyclists, not incidents: if a truck driver cleans up four cyclists in a single fit of lane-change-rule ignorance, that’s given a higher risk rating than a regularly negligent driver who has hit three bikes in three separate incidents.

Year	Make	Model	total_cyclists	volume	risk_rating
1997	HYNDAI	FX	4	21	0.1905
2003	SCANIA	L94UB	4	26	0.1538
2006	SCANIA	L94UB	3	24	0.125
2010	MAZDA	3B	4	32	0.125
2005	MITSUB	BE649J	3	25	0.12
2003	STERLI	LT7500	2	21	0.0952
1996	SSANGY	MUSSO	2	21	0.0952
1999	MERC B	0405NH	3	32	0.0938
1994	MAZDA	MPV	2	22	0.0909
1991	HOLDEN	EXEC	2	22	0.0909
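
A minimal sketch of how the risk_rating column can be reproduced, assuming it is simply total_cyclists divided by volume and rounded to four decimal places. The last few lines use two hypothetical drivers (not from the data) to show how counting cyclists rather than incidents changes the ordering.

```matlab
% Rows copied from the table above.
total_cyclists = [4 4 3 4 3 2 2 3 2 2];
volume         = [21 26 24 32 25 21 21 32 22 22];

% Assumed definition: cyclists hit per registered vehicle, to 4 decimals.
risk_rating = round(1e4 * total_cyclists ./ volume) / 1e4;
disp(risk_rating)

% Hypothetical illustration: driver A hits 4 cyclists in 1 incident,
% driver B hits 3 cyclists in 3 separate incidents.
cyclists_hit = [4 3];
incidents    = [1 3];
[~, worst_by_cyclists]  = max(cyclists_hit);   % index 1, i.e. driver A
[~, worst_by_incidents] = max(incidents);      % index 2, i.e. driver B
```

Counting cyclists puts driver A on top; counting incidents would put driver B there instead.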

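The following table pools all model years of each vehicle for an ‘all time’ result. Note that risk_factor here is expressed as a percentage (for example, 3/32 = 9.375%), whereas risk_rating in the previous table is a plain ratio: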

rank	Make	Model	total_cyclists	volume	risk_factor (%)
1	MERCEDES BENZ	0405NH	3	32	9.375
2	MERCEDES BENZ	A160A	2	22	9.0909
3	ISUZU	FSD700	2	24	8.3333
4	FORD	EA	2	34	5.8824
5	SCANIA	L94UB	11	202	5.4455
6	MITSUBISHI	BE649J	6	115	5.2174
7	MERCEDES BENZ	VARIO	1	20	5
8	HOLDEN	8VK	1	20	5
9	BMW	3SER92	1	21	4.7619
10	PEUG	20601B	1	21	4.7619
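
One way the years could be pooled is sketched below, under the assumption that cyclist counts and registration volumes are summed per model before dividing (rather than averaging the per-year ratios). It uses only the first five rows of the per-year table, so the totals are illustrative and will not match the all-time table, which draws on every model year.

```matlab
% First five rows of the per-year table (model, cyclists, volume).
model    = {'FX', 'L94UB', 'L94UB', '3B', 'BE649J'};
cyclists = [4 4 3 4 3];
volume   = [21 26 24 32 25];

% Group rows by model; unique() returns the names in sorted order.
[names, ~, idx] = unique(model);

% Sum cyclists and volumes within each model, then divide.
total_cyclists = accumarray(idx(:), cyclists(:));
total_volume   = accumarray(idx(:), volume(:));
risk_factor    = 100 * total_cyclists ./ total_volume;   % percentage

for k = 1:numel(names)
    fprintf('%-8s %3d %4d %8.4f\n', names{k}, ...
            total_cyclists(k), total_volume(k), risk_factor(k));
end
```

Here the two visible L94UB years pool to 7 cyclists over 50 registrations; the published all-time row (11 over 202) evidently includes further model years that did not make the per-year top 10.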

No Trucks Allowed

OK, so trucks are pretty dangerous for bikes. But which makes of car should you look out for?

rank	Make	total_cyclists	volume	risk_factor (%)
1	DAEWOO	70	5591	1.252
2	SMART	3	291	1.0309
3	STERLING	4	491	0.8147
4	DAIHATSU	24	4479	0.5358
5	HINO	43	10297	0.4176
6	CITROËN	21	5083	0.4131
7	ALFA ROMEO	24	6051	0.3966
8	VOLVO	106	28226	0.3755
9	FORD	2032	553006	0.3674
10	CHRYSLER	39	10645	0.3664

The results appear to be heavily skewed towards unusual/less common cars. In particular, I am surprised to see the Smart car rated so poorly!

Conclusion

It looks like my ‘gut feeling’ of avoiding every single twin-cab I see on the road isn’t held up by the data (unless every cyclist acts like me, and that’s the reason nobody is getting hit).

Bonus: Holden vs Ford

Holdens have accumulated 1,948 cyclist-related incidents for 656,930 registered cars; Fords, 2,042 for 553,006. But is the difference between these proportions statistically significant, or could both be draws from the same population? A two-proportion z-test in MATLAB:

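x1 = 1948; n1 = 656930;   % Holden: cyclist incidents, registered cars
x2 = 2042; n2 = 553006;   % Ford: cyclist incidents, registered cars
p1 = x1/n1; p2 = x2/n2;   % incident proportion for each make
d = p1 - p2;              % difference in proportions (avoids shadowing the built-in diff)
p = (x1+x2)/(n1+n2);      % pooled proportion under the null hypothesis

se = sqrt( p * (1-p) * (1/n1 + 1/n2));   % standard error of the difference
z = d/se;
p_val = 2*normcdf(-abs(z))   % two-sided p-value, valid for either sign of z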

p_val = 3.6335e-12

That is, p << 0.05, so we can reject the idea that there’s no difference between the rates of Holden and Ford crashes with cyclists. Just as I suspected, Ford drivers.
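
For anyone without the Statistics and Machine Learning Toolbox (which provides normcdf), here is a sketch of the same test in base MATLAB. It relies on the identity 2·Φ(−|z|) = erfc(|z|/√2), and also prints the two rates per 1,000 registered cars to give a sense of the size of the gap, not just its significance. The variable names beyond those in the test above are my own.

```matlab
x1 = 1948; n1 = 656930;   % Holden
x2 = 2042; n2 = 553006;   % Ford

% Pooled two-proportion z statistic, as in the test above.
p_pool = (x1 + x2) / (n1 + n2);
se     = sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2));
z      = (x1/n1 - x2/n2) / se;

% Two-sided p-value without normcdf: 2*Phi(-|z|) = erfc(|z|/sqrt(2)).
p_val = erfc(abs(z) / sqrt(2));

% Effect size: incidents per 1,000 registered cars, and their ratio.
rate_holden = 1000 * x1 / n1;
rate_ford   = 1000 * x2 / n2;
rate_ratio  = rate_ford / rate_holden;

fprintf('z = %.4f, p = %.4g\n', z, p_val);
fprintf('Holden %.2f vs Ford %.2f per 1,000 (ratio %.3f)\n', ...
        rate_holden, rate_ford, rate_ratio);
```

The gap works out to roughly 3.0 versus 3.7 incidents per 1,000 registered cars, so the difference is real but modest.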