One of the project goals is to examine whether Swedish bird species can be identified from birdsong. This is planned to be done through signal processing. The audio file of a birdsong recording is first cut into smaller samples, and Fourier transforms are then used to analyze the frequencies in each sample. From there, all the data is analyzed to obtain three key features: frequency, pattern and length.
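The cut-then-transform step can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the synthetic test tone, and the use of a naive discrete Fourier transform (which gives the same spectrum as an FFT, only more slowly) are assumptions made to keep the example dependency-free. The 0.2 second frame length is taken from the fixed limit mentioned later in the text.

```python
import cmath
import math


def split_into_frames(signal, sample_rate, frame_seconds=0.2):
    """Cut a list of samples into consecutive, non-overlapping frames."""
    frame_len = round(sample_rate * frame_seconds)
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def dft_magnitudes(frame):
    """Magnitude spectrum for bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def dominant_frequency(frame, sample_rate):
    """Frequency in Hz of the strongest non-DC bin in the frame."""
    mags = dft_magnitudes(frame)
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    return peak_bin * sample_rate / len(frame)


# Synthetic stand-in for a recording (assumption): 0.4 s of a 50 Hz tone
# sampled at 1000 Hz, which yields two 0.2 s frames.
sample_rate = 1000
signal = [math.sin(2 * math.pi * 50 * t / sample_rate) for t in range(400)]
frames = split_into_frames(signal, sample_rate)
frequencies = [dominant_frequency(f, sample_rate) for f in frames]
```

A real recording would use a much higher sampling rate, where a library FFT is the practical choice. The sequence of per-frame dominant frequencies is one plausible starting point for the frequency and pattern features.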
Another goal, time permitting, was to plan a smartphone application for common smartphones. The application was intended to let the user identify live bird songs out in the field. Market research was conducted to find out which functions users would want.
With the methods used, it proved possible to identify species with a success rate of about 90 % in a noise-free environment. The success rate was lower for recordings from a more natural environment: 50 % to 75 % with only a few disturbing noises. In cases where the user was able to catch only parts of the birdsong, the success rate dropped to between 20 % and 40 %.
METHODS AND THEORY
Previous Research About Bird Recognition by Sound:
In the earliest research on pattern recognition of bird songs, ornithologists extracted features from frequency-time plots called sonograms. A song was described in terms of elements, where an element is a burst of sound separated from the next by a distinct pause termed an inter-element interval.
Interviews and Market Research:
Significant effort was made to survey the interests of ornithologists and other interested parties. A long interview was conducted with hobbyist ornithologist Bill Karlström and with former doctor and hunter Gunnar Stenström. Shorter interviews were also conducted with a random selection of six people to cover different categories of individuals.
The idea of the smartphone application was to port the identification algorithm to a handheld device in order to make it practically useful in real-life situations. While a finished, functioning app was not in the scope of the project, much time and thought was put into how an eventual app would function and look.
The Android platform is based on the Linux kernel. The kernel controls the internals, while the Android libraries control the external layer. The external layer consists of telephony, video, graphics, and user interface, and is the part engineered by the programmer.
An Android app is composed of four main components: activities, services, content providers and broadcast receivers.
- An activity is written as a single Java class; the main activity extends Android’s Activity class and forms the base of the app.
- The service runs in the background and does not interact with the user.
- The content provider saves data between startups and changes.
- The broadcast receiver reacts to actions taken by the phone or new system conditions.
In order to evaluate the effectiveness of the classification system, a test database containing 47 different species of birds was created, together with a corresponding training database. The test database consisted of 220 bird recordings, which were compared against the recordings in the training database.
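As an illustration of how such an evaluation can be scored, the sketch below computes a success rate as the fraction of test recordings whose predicted species matches the known species. The function name and the example labels are hypothetical; the text does not describe how the project's own scoring was implemented.

```python
def success_rate(true_species, predicted_species):
    """Fraction of recordings for which the predicted species is correct."""
    if len(true_species) != len(predicted_species):
        raise ValueError("label lists must have the same length")
    if not true_species:
        raise ValueError("at least one recording is required")
    correct = sum(t == p for t, p in zip(true_species, predicted_species))
    return correct / len(true_species)


# Hypothetical labels for four test recordings (not project data)
true_species = ["great tit", "blackbird", "chaffinch", "robin"]
predicted_species = ["great tit", "blackbird", "robin", "robin"]
rate = success_rate(true_species, predicted_species)
```

With the actual test database, the two lists would each hold 220 entries, one per recording.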
Threshold and Overload:
In order to distinguish which samples were just noise and which contained a signal, a threshold value and an overload value were chosen. The threshold was set relative to the normalized FFT amplitudes. By varying the threshold value, it was observed that a threshold of circa 40 % of the peak value gave the best results.
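A minimal sketch of the thresholding idea follows. It assumes one reading of the text: each sample is represented by its peak FFT amplitude, the amplitudes are normalized by the largest peak in the recording, and samples below 40 % of that peak are treated as noise. The function name and the example amplitudes are hypothetical, and the overload value is left out because the text does not specify how it was applied.

```python
def select_signal_frames(frame_peaks, threshold=0.4):
    """Return indices of frames whose normalized peak amplitude reaches the threshold."""
    if not frame_peaks:
        return []
    loudest = max(frame_peaks)
    if loudest == 0:
        return []  # a completely silent recording contains no signal frames
    return [i for i, peak in enumerate(frame_peaks)
            if peak / loudest >= threshold]


# Hypothetical peak FFT amplitudes, one per sample of the recording
frame_peaks = [0.5, 8.0, 10.0, 3.0, 4.5, 0.2]
signal_frames = select_signal_frames(frame_peaks)
```

Here the frames at indices 1, 2 and 4 are kept, since their normalized peaks (0.8, 1.0 and 0.45) reach the 0.4 threshold, while the others are discarded as noise.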
After the midterm meeting with the examiner, it was made clear that the smartphone application should not be considered a priority. A basic interface was created, but the main goal of making the classification system work on Android was dropped.
THE FUTURE OF THE PROJECT
The market surveys showed a clear interest in an app of this kind. One of the most obvious uses for the app is educational: providing both children and adults with a way to identify the birds around them, combining an interest in nature and the outdoors with technology that is increasingly present in homes and classrooms.
The performance could be improved by a different segmentation, such as distinguishing individual tones instead of using the fixed 0.2-second limit. As in human speech recognition, there is much important information in the higher frequencies, even though that is not where the bulk of the energy lies. This project did not examine any weighting methods to extract that information, but doing so is possible.
Source: Chalmers University of Technology
Authors: Anthon Liljeroth | Alexander Tholin | Mårten Hernebring