This article describes how to identify and monitor the physical activity of smartphone users. We share methods for cleaning training data, choosing features, selecting a classification algorithm, and validating the model, and we walk through the development of an activity recognition system for mobile devices.

Introduction

One of the most impressive things a smartphone app can do is sense the user's current physical activity, such as walking, driving, or staying still. Activity recognition has numerous applications, from fitness and health monitoring to context-based marketing and staff tracking. Context-aware apps can adapt their behavior to the current activity. For example, when the user searches for nearby businesses, the app could use a larger search radius when the user is driving and a smaller one when walking.

One app created for collecting training data is Sensor Logger by IBM. It samples the accelerometer approximately 50 times per second and writes the readings to a local file. It also records the current speed reported by GPS. The application was given to 20 volunteers, who installed it on their phones and used it to record data while exercising.

Feature Extraction

The files recorded by Sensor Logger were processed by a feature extraction program. The program divided the logged data into 3-second segments and computed a set of representative features for each segment. Most of these features came from frequency analysis of the accelerometer data, computed with a fast Fourier transform (FFT). The FFT features were obtained by splitting the sequence of FFT coefficients into frequency subranges and summing the coefficients within each subrange. For example, one arrangement used the sums of coefficients in ranges of 1 Hz, 2 Hz, and so on. Several alternative feature sets were computed, differing in their minimum and maximum frequencies.
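The segmentation and band-sum steps can be sketched roughly as follows. This is only an illustration under stated assumptions: the 1 Hz band width, the 10 Hz upper limit, and all function names are hypothetical, and the naive transform is used only to keep the sketch dependency-free (in practice a library FFT such as numpy.fft.rfft would be the usual choice).

```python
import cmath
import math

SAMPLE_RATE_HZ = 50   # approximate sampling rate mentioned in the article
WINDOW_SECONDS = 3    # segment length mentioned in the article


def split_windows(samples, size=SAMPLE_RATE_HZ * WINDOW_SECONDS):
    """Cut a sample list into consecutive full windows, dropping any remainder."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def dft_magnitudes(window):
    """Magnitudes of DFT coefficients 0..n//2 (naive O(n^2) transform)."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window)))
        for k in range(n // 2 + 1)
    ]


def band_sums(window, band_hz=1.0, max_hz=10.0, rate=SAMPLE_RATE_HZ):
    """Sum coefficient magnitudes in consecutive bands of width band_hz.

    band_hz and max_hz are assumed values, not the article's settings.
    """
    sums = [0.0] * int(max_hz / band_hz)
    for k, mag in enumerate(dft_magnitudes(window)):
        freq = k * rate / len(window)      # frequency of coefficient k in Hz
        band = int(freq // band_hz)        # band 0 covers [0, band_hz), etc.
        if band < len(sums):
            sums[band] += mag
    return sums
```

With these assumed settings, a 3-second window of a pure 2 Hz signal puts almost all of its weight into the third band, which covers 2 Hz up to (but not including) 3 Hz.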

Other features used alongside the FFT were the mean, the variance, and the energy of the signal, as well as the velocity reported by GPS. The results were saved in a database: one record per 3-second segment, holding the extracted features and the activity during which the data was recorded. Metadata about the log file was also saved: the phone model, the OS, the username, and the track name.
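As a rough sketch, the time-domain features for one segment could be computed like this. The function name and dictionary keys are hypothetical, and energy is taken here as the mean of the squared samples, which is one common definition and not necessarily the one used in the original program.

```python
import statistics


def window_stats(window, speed=None):
    """Return simple time-domain features for one segment of accelerometer values."""
    mean = statistics.fmean(window)
    return {
        "mean": mean,
        "variance": statistics.pvariance(window, mu=mean),
        # Assumption: energy is the mean of squared samples.
        "energy": sum(x * x for x in window) / len(window),
        # GPS speed for this segment, if one was logged.
        "speed": speed,
    }
```

A record for the database would then combine these values with the FFT features, the activity label, and the log file metadata.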

Cleaning Data

Before a new log file is accepted into the data set, it should be checked by classifying it with a decision tree built from already validated data. If the classification accuracy for the new log file is lower than the accuracy obtained with random partitioning, the file needs to be audited. To fix the problem, the log file should be edited and all data found to be wrong on inspection should be deleted.
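The acceptance check can be expressed as a small helper. This is a minimal sketch: the record layout (pairs of features and label), the classify callable, and the function names are assumptions made for illustration.

```python
def file_accuracy(records, classify):
    """Fraction of (features, label) records that classify() labels correctly."""
    if not records:
        return 0.0
    hits = sum(1 for features, label in records if classify(features) == label)
    return hits / len(records)


def needs_audit(records, classify, baseline_accuracy):
    """True when a new log file scores below the random-partitioning baseline."""
    return file_accuracy(records, classify) < baseline_accuracy
```

Here classify stands for the decision tree built from validated data, and baseline_accuracy is the accuracy that tree achieved under random partitioning.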

Model Validation

With random partitioning, the records of every track are split at random between the training and testing sets. As a result, the two sets are related, because they contain records from the same files. Records from the same track, logged by the same user with the same phone in the same situation, tend to be close to each other across many features. For that reason, validation with random partitioning may not be very indicative and may fail to reveal overfitting. A better way to estimate accuracy is to split by users: remove several users from the training data and use the data recorded by those users for testing. In at least one case, a feature set that showed good results under random partitioning did noticeably worse when split by users, possibly because the overfitting went undetected under random partitioning.
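Splitting by users can be sketched as below. The "user" key, the 20% test fraction, and the function name are assumptions for illustration. The point is only that every record of a given user ends up on the same side of the split.

```python
import random


def split_by_users(records, test_fraction=0.2, seed=0):
    """Hold out whole users so no user appears in both training and testing."""
    users = sorted({r["user"] for r in records})
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    rng.shuffle(users)
    n_test = max(1, round(len(users) * test_fraction))
    test_users = set(users[:n_test])
    train = [r for r in records if r["user"] not in test_users]
    test = [r for r in records if r["user"] in test_users]
    return train, test
```

The same idea applies to any grouping key: splitting on the OS field instead of the user field gives a check of whether a model trained on one platform carries over to another.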

OS Independence Validation

If you collect data from Android, iOS, and Windows Phone, it is important to check whether the data can be combined to build one common decision tree. We ran tests that split the data by operating system, for example building a decision tree from Android data and applying it to iOS data. The results showed that the accelerometer behaves the same way on every OS, with no significant difference between the data obtained from these operating systems. This confirmed that the data could be combined into a single decision tree.

The tree can be exported as text, HTML, or a PMML file. The text file is a basic textual representation of the tree in which each internal node is shown by the predicate associated with it, and line indentation conveys the branching structure.
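To give a feel for such an indented text form, here is a small sketch that renders a tree as text. The tuple layout, the feature names, the thresholds, and the exact line format are all made up for illustration and are not the actual export format of the tool used.

```python
def render_tree(node, depth=0):
    """Return text lines: one predicate per branch, indented by tree depth."""
    pad = "  " * depth
    if isinstance(node, str):              # leaf: an activity label
        return [f"{pad}-> {node}"]
    feature, threshold, below, above = node
    lines = [f"{pad}{feature} <= {threshold}"]
    lines += render_tree(below, depth + 1)
    lines.append(f"{pad}{feature} > {threshold}")
    lines += render_tree(above, depth + 1)
    return lines


# Hypothetical tree: (feature, threshold, subtree_if_below, subtree_if_above).
TREE = ("speed", 5.0, ("variance", 0.5, "staying", "walking"), "driving")
print("\n".join(render_tree(TREE)))
```

Each level of indentation corresponds to one step down the tree, so a branch can be read by following the predicates from the left margin inward.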

The recognition model begins by collecting data from the accelerometer and GPS. Every 3 seconds, features are extracted from the collected data using the FFT. Finally, the decision tree code is applied to obtain the classification result: the recognized activity.
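The recognition loop can be outlined as follows. This is a simplified sketch, not the production model: the tree, its thresholds, the window layout, and the stand-in feature extractor (which uses only variance and GPS speed, with no FFT features) are all hypothetical.

```python
import statistics

# Hypothetical tree as (feature, threshold, below, above); a string is a leaf.
TREE = ("speed", 5.0, ("variance", 0.5, "staying", "walking"), "driving")


def classify(tree, features):
    """Walk the tree from the root down to a leaf label."""
    node = tree
    while not isinstance(node, str):
        feature, threshold, below, above = node
        node = below if features[feature] <= threshold else above
    return node


def extract(window):
    """Stand-in extractor; a real one would also compute the FFT features."""
    return {
        "speed": window["speed"],
        "variance": statistics.pvariance(window["accel"]),
    }


def recognize(windows):
    """Return one activity label per window of sensor readings."""
    return [classify(TREE, extract(w)) for w in windows]


windows = [
    {"accel": [9.8, 9.8, 9.8, 9.8], "speed": 0.0},
    {"accel": [8.0, 11.5, 8.5, 11.0], "speed": 1.4},
    {"accel": [9.7, 9.9, 9.8, 9.8], "speed": 15.0},
]
print(recognize(windows))
```

On a device, each window would hold the readings gathered during the latest 3-second interval, and the label would be refreshed once per interval.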