IEEE ICDM 2010 Contest

by Kyriakos Chatzidimitriou | Sep 8, 2010 09:25 | tutorials

competition, data mining, machine learning

Just for fun, I participated in the IEEE ICDM 2010 Contest (Traffic track) with a couple of R scripts, first using linear regression and later neural networks. Mainly because summer vacation limited the time available, the approach was nothing fancy, and it ended up in 17th place out of 101 active participants.

The task was to predict the traffic on 10 road segments, in both directions, for 1000 windows of 60 minutes each. For every window, the traffic between the 41st and the 50th minute had to be predicted knowing only the first 30 minutes. Historical data were provided as 100 windows of 10 hours each (60000 rows, one per minute), with 20 values per row, each being the traffic observed in that minute on one of the 10 road segments x 2 directions.
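To make the data layout concrete, here is a small Python sketch (the original work used R) of how the rows described above can be sliced, assuming each row is one minute and row 0 of a window is its first minute. All names are hypothetical, and `training_windows` in particular is an assumption: the post does not say how training examples were cut from the 10-hour historical windows.

```python
# Data layout as described in the post; names are hypothetical.
N_SEGMENTS, N_DIRECTIONS = 10, 2
N_SERIES = N_SEGMENTS * N_DIRECTIONS   # 20 values per row, one per segment/direction
N_HIST_WINDOWS = 100
HIST_WINDOW_MINUTES = 10 * 60          # each historical window spans 10 hours
assert N_HIST_WINDOWS * HIST_WINDOW_MINUTES == 60000  # one row per minute


def split_history(rows):
    """Cut the 60000 one-minute rows into the 100 ten-hour windows."""
    return [rows[i:i + HIST_WINDOW_MINUTES]
            for i in range(0, len(rows), HIST_WINDOW_MINUTES)]


def split_test_window(window):
    """For a 60-minute window (row 0 = minute 1), return the known part
    (minutes 1-30) and the part to predict (minutes 41-50)."""
    return window[:30], window[40:50]


def training_windows(history, step=10):
    """Assumption, not from the post: slide a 60-minute frame over one
    10-hour window every `step` minutes to get windows shaped like the test ones."""
    return [history[s:s + 60] for s in range(0, len(history) - 59, step)]
```

With `step=10`, each 600-minute historical window yields 55 overlapping 60-minute training windows; a smaller step gives more (and more correlated) examples.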

My best result in the competition was obtained using the following procedure:

a. Preprocessing: Transform the training and test datasets so that each row corresponds to a 10-minute interval rather than a 1-minute interval. Normalize all values to [0,1].
b. Modelling: Cast the task as a supervised learning problem. I used 60 attributes, 20 for times t+1 to t+10, 20 for times t+11 to t+20 and 20 for times t+21 to t+30, to predict one of the 20 traffic values at times t+41 to t+50. Thus 20 such datasets were created, one for each road segment and direction.
c. Training: For each of the above 20 datasets, 20 Feed-Forward Neural Nets (FFNNs) were trained, and 20 more were trained the same way on a reduced dataset with 15 attributes instead of 60. The reduced dataset was obtained by running the ReliefF feature selection algorithm in WEKA and keeping the top 15 attributes. Each one of the 40 FFNNs had its weights randomly initialized. The former 20 FFNNs had 15 hidden units, while the latter 20 had 30. A weight decay term was also used.
d. Predicting: For each of the 20 target values, predictions were made with all 40 NNs trained for it. The final prediction was the mean of these 40 predictions.
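Steps a, b and d above can be sketched in Python as follows; the original R scripts are not reproduced here. This is a minimal sketch under stated assumptions: the ten 1-minute values are summed into one 10-minute value, the min-max bounds are computed per column on the training data, and each trained network is treated as a plain function from a feature list to a number. Step c (network training and ReliefF) is left out, and `selected` stands for whatever 15 attribute indices ReliefF returned. All function names are hypothetical.

```python
from statistics import mean


def aggregate(rows, width=10):
    """Step a: merge consecutive 1-minute rows into `width`-minute rows by
    summing each column (assumption: the post does not say sum vs. mean)."""
    n_cols = len(rows[0])
    return [
        [sum(r[c] for r in rows[i:i + width]) for c in range(n_cols)]
        for i in range(0, len(rows) - width + 1, width)
    ]


def fit_bounds(rows):
    """Per-column (min, max) over the aggregated training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def normalize(rows, bounds):
    """Step a: min-max scale each column to [0, 1]; constant columns become 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


def make_example(agg_window, target_col):
    """Step b: agg_window holds the normalized 10-minute rows of one 60-minute
    window (minutes 1-10, 11-20, ..., 51-60). The first three rows give the
    3 x 20 = 60 attributes; the target is column `target_col` of the row
    covering minutes 41-50 (index 4). Test windows only have three rows,
    so their target is None."""
    features = [v for row in agg_window[:3] for v in row]
    target = agg_window[4][target_col] if len(agg_window) > 4 else None
    return features, target


def ensemble_predict(full_models, reduced_models, selected, features):
    """Step d: average the outputs of every network trained for one target.
    `full_models` see all 60 attributes, `reduced_models` only the attribute
    indices listed in `selected` (e.g. the top 15 by ReliefF)."""
    reduced = [features[i] for i in selected]
    preds = [m(features) for m in full_models] + [m(reduced) for m in reduced_models]
    return mean(preds)


# Toy usage: a constant value of 1 per minute on every series.
window = [[1] * 20 for _ in range(60)]
agg = aggregate(window)                 # 6 rows, each [10] * 20
norm = normalize(agg, fit_bounds(agg))  # constant columns -> 0.0
x, y = make_example(norm, target_col=0)
print(len(x), y)                        # prints: 60 0.0
```

Fitting the bounds once on the training rows and reusing them on the test windows keeps both on the same scale. Averaging many randomly initialized networks, as in step d, mainly smooths out the variance caused by the random initial weights.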