How to Easily Develop Your Own iOS Sound Classification App with Create ML

Recently I tried Create ML; my purpose was simply to develop a sample app that can distinguish snoring from other types of sounds.

Create ML saved me a lot of time by providing the right tools to create and train the classifier model that I later integrated into my iOS project.

In this tutorial, I will share my work with you: we will train an on-device machine learning model to classify sounds, and then use it to develop a small iOS app that analyzes the microphone input in real time to detect snoring 😉

Apple describes Create ML as the tool that takes the complexity out of model training while producing powerful Core ML models.

Launch the Create ML app (available on macOS 10.15+). Select File/New Project and choose the Sound Classifier template.

Create ML provides a wide range of classifier templates, and Sound Classifier is one of them.

Give your model a name (SnoringClassifier seems to be a good one) and save.

Finish setup of new machine learning model classifier

Once finished, your project dashboard will look like this:

There are several sections in there; let’s explore them:

1/ Create ML allows us to create several models in one project, which is useful because you can easily compare model results. You can create a new model source by right-clicking the current model and choosing Duplicate.

2/ Create ML tells you that there is no data yet to train your model; you will fix this in a moment 🙂

3/ A model is fed with data: you will provide a data set and Create ML will split it into three samples. The largest one is used to train the model, the second is used to validate it, and the third is used to test it before you get a ready-to-use trained model.

Time to feed the monster 😉

Download the data here and unzip the folder. It contains two classes for our data; a class is a labeled data set that Create ML uses to train the model. The two classes you just downloaded are labeled Snoring and Background.

Why a Background class?

Snoring sounds are required to train the model to recognize snoring, but it is just as important to train it to differentiate snoring from everything else it might hear. For this you provide a data set of Background samples.

Select the Training Data section under Data Inputs and select the two folders (Snoring and Background) you just downloaded.

There are now 19 audio samples to train the model, and you may notice that the Train button at the top is no longer disabled; we can now launch the training process 😉

Note: It is important to mention that the names of the folders (classes) should be expressive and reflect the content of the data; the class names are actually the labels that our machine learning model will use to identify the processed sound.
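For reference, the unzipped data ends up organized as one folder per class with the audio files inside, roughly like this (the parent folder and file names here are just examples):

SnoringData/
    Snoring/
        snoring_01.wav
        snoring_02.wav
        ...
    Background/
        background_01.wav
        background_02.wav
        ...

Create ML simply uses the folder names as the class labels.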

Click the Train button to begin the training process. This should take a few moments; at the end of the process you should get a 100% trained model like the one below.

Create ML will use the provided data to train, validate and test a model. At the end, a 100% trained model is produced and ready to be exported.

A Create ML training session is composed of two main parts:

  • Feature extraction phase: in this phase Create ML reduces the set of raw data to more manageable groups for processing. This is also useful to reduce the amount of redundant data.
  • Training phase: Create ML effectively starts the training process by splitting the data for training, validation and testing.

You can learn more about feature extraction here.
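As a side note, everything the Create ML app does here can also be scripted with the CreateML framework (for example in a macOS Playground) using MLSoundClassifier. Below is a minimal sketch, assuming the Snoring and Background folders sit together in a parent folder; the paths are placeholders:

import CreateML
import Foundation

// Parent folder that contains one sub-folder per class (Snoring, Background).
// Adjust this path to wherever you unzipped the training data.
let trainingDirectory = URL(fileURLWithPath: "/Users/you/Downloads/SnoringData")
let trainingData = MLSoundClassifier.DataSource.labeledDirectories(at: trainingDirectory)

do {
    // Create ML performs the feature extraction and training phases described above.
    let classifier = try MLSoundClassifier(trainingData: trainingData)

    // Export the trained model so it can be added to the Xcode project later.
    let outputURL = URL(fileURLWithPath: "/Users/you/Desktop/SnoringClassifier.mlmodel")
    try classifier.write(to: outputURL)
} catch {
    print("Training failed: \(error.localizedDescription)")
}

The Create ML app remains the more comfortable option for this tutorial, but the programmatic route is handy when you want to retrain the model regularly with fresh data.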

Now that the model is ready, you can export it with a simple drag and drop to the place of your choice.

Once Create ML finishes the training session, a ready to use model (*.mlmodel) is available for export.

Testing your model from within Create ML

Download the sample audio files here; we will use them to test the accuracy of our model before shipping it in the iOS app.

Switch to the Testing tab and select the folder you just downloaded.

The Testing view in Create ML allows you to test the trained model with a sample dataset to assess its accuracy

Here you loaded 6 items (3 for background and 3 for snoring sounds). Click the Test Model button to launch the testing process.

Feel free to improve the testing results by providing more items to test (more background and snoring audio files).

Precision and recall

Metrics here are very important to get an idea of how your model is performing. To fully evaluate the effectiveness of a model, we must examine both precision and recall.

We provided 3 snoring and 3 background audio files. The model succeeded in identifying 3 out of the 3 snoring samples (100% precision), while it identified 89% of the background samples. Recall, on the other hand, refers to the percentage of total relevant results correctly classified by the model; in this case it is 84% for the snoring sounds and 100% for the background sounds.
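If the two terms feel abstract, here is how they are computed for a single class like Snoring. The counts below are made up for illustration and are not the results of the test above:

// Hypothetical counts for the Snoring class
let truePositives = 8.0   // snoring segments correctly labeled as Snoring
let falsePositives = 0.0  // background segments wrongly labeled as Snoring
let falseNegatives = 2.0  // snoring segments wrongly labeled as Background

// Precision: of everything the model labeled Snoring, how much really was snoring?
let precision = truePositives / (truePositives + falsePositives)   // 1.0 → 100%

// Recall: of all the real snoring, how much did the model catch?
let recall = truePositives / (truePositives + falseNegatives)      // 0.8 → 80%

print("Precision: \(precision * 100)%  Recall: \(recall * 100)%")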

Now that we have tested the model directly in Create ML, let’s integrate it into an iOS project and test it out.

Note: Many tech enthusiasts say that more data means better models, so feel free to start over with a new model, train it with a larger set of data and compare the results 😉

Open up Xcode and create a new Single View App, with Swift and Storyboard.

Create a new Xcode project

Select Main.storyboard from the Project navigator, then add a label and a button to the screen and center them horizontally.

You are going to use the native SoundAnalysis framework to stream the audio data and analyze it.

Select the ViewController.swift file and change its content to the following:

import UIKit
import SoundAnalysis
import AVFoundation // for AVAudioEngine and AVAudioFormat

class ViewController: UIViewController, SNResultsObserving {
    // 1
    @IBOutlet weak var activityLabel: UILabel!
    var snoringClassifier: SnoringClassifier?
    var model: MLModel?
    var audioEngine: AVAudioEngine?
    var inputBus: AVAudioNodeBus?
    var inputFormat: AVAudioFormat?
    var streamAnalyzer: SNAudioStreamAnalyzer?
    // 2
    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view.
        snoringClassifier = SnoringClassifier()
        model = snoringClassifier?.model
        audioEngine = AVAudioEngine()
        inputBus = AVAudioNodeBus(0)
        inputFormat = audioEngine?.inputNode.inputFormat(forBus: inputBus!)
        // Create a new stream analyzer.
        streamAnalyzer = SNAudioStreamAnalyzer(format: inputFormat!)
        do {
            // Start the stream of audio data.
            try audioEngine?.start()
        } catch {
            print("Unable to start AVAudioEngine: \(error.localizedDescription)")
        }
    }

    @IBAction func startRecording(_ sender: Any) {
        // 3
        // Create a new observer that will be notified of analysis results.
        do {
            // Prepare a new request for the trained model.
            let request = try SNClassifySoundRequest(mlModel: model!)
            try streamAnalyzer?.add(request, withObserver: self)
            // Serial dispatch queue used to analyze incoming audio buffers.
            let analysisQueue = DispatchQueue(label: "com.apple.AnalysisQueue")
            // Install an audio tap on the audio engine's input node.
            audioEngine!.inputNode.installTap(onBus: inputBus!,
                                              bufferSize: 8192, // 8k buffer
            format: inputFormat) { buffer, time in
                
                // Analyze the current audio buffer.
                analysisQueue.async {
                    self.streamAnalyzer!.analyze(buffer, atAudioFramePosition: time.sampleTime)
                }
            }
        } catch {
            print("Unable to prepare request: \(error.localizedDescription)")
            return
        }
    }
    // 4
    // MARK: - SNResultsObserving
    func request(_ request: SNRequest, didProduce result: SNResult) {
        
        // Get the top classification.
        guard let result = result as? SNClassificationResult,
            let classification = result.classifications.first else { return }
        
        // Determine the time of this result.
        let formattedTime = String(format: "%.2f", result.timeRange.start.seconds)
        print("Analysis result for audio at time: \(formattedTime)")
        
        let confidence = classification.confidence * 100.0
        let percent = String(format: "%.2f%%", confidence)

        // Print the result as Instrument: percentage confidence.
        print("\(classification.identifier): \(percent) confidence.\n")
        DispatchQueue.main.async {
            self.activityLabel.text = classification.identifier
        }
    }
    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("The the analysis failed: \(error.localizedDescription)")
    }
    
    func requestDidComplete(_ request: SNRequest) {
        print("The request completed successfully!")
    }
}

Let’s clarify what we just did:

1// Among the stored properties you declared, snoringClassifier is a reference to the class generated from the ML model file you just crafted in Create ML (you will import it in a few seconds). The model property is a reference to the underlying MLModel associated with that file.

2// The setup work goes here: you extracted the model from the ML file and set up the audio engine required to start streaming the audio data from the device microphone.

3// Here you create a new sound classification request (SNClassifySoundRequest). This request object needs to know which model you are going to use to classify your sound data, which is why you pass the model object as a dependency when initializing the request. You also set the view controller as the observer for any results the stream analyzer emits.

4// Since our view controller conforms to the SNResultsObserving protocol, here you implement its methods to respond to the stream analyzer’s notifications. In this case, you display the top classification in the label on screen and print it to the console.
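One thing the sample deliberately keeps simple is that it never stops listening. If you want to tear the analysis down, for example from a second button, a minimal sketch could look like this (stopRecording is a hypothetical action, not part of the project above):

@IBAction func stopRecording(_ sender: Any) {
    // Stop delivering audio buffers to the analyzer.
    audioEngine?.inputNode.removeTap(onBus: inputBus ?? 0)
    audioEngine?.stop()

    // Tell the analyzer the stream has ended so it can finish pending work
    // and notify the observer through requestDidComplete(_:).
    streamAnalyzer?.completeAnalysis()
}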

Don’t forget to bind the button and the label to the code.

Build and run; you should get a compiler error (Use of undeclared type ‘SnoringClassifier’) because the compiler couldn’t find the ML model file we created earlier. To fix this, just drag and drop the SnoringClassifier.mlmodel file from Finder into Xcode.

Before we run the project, we need to add the NSMicrophoneUsageDescription key to the Info.plist file to ask for the user’s permission to use the microphone.

Open the Info.plist file and add the NSMicrophoneUsageDescription key with a description:
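If you prefer editing Info.plist as source code, the entry looks something like this (the description text is just an example; write whatever explains the feature to your users):

<key>NSMicrophoneUsageDescription</key>
<string>This app listens to the microphone to detect snoring.</string>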

Cool, build and run the app. The system should prompt you to grant the app access to the microphone. Then tap the “Start analyzing” button and watch your model’s results in the label as well as in the console; try to snore in front of the microphone and see whether your model recognizes it haha 😉

Where to go from here?

You can download the final project here.

You can train another model to classify other types of sounds, like laughter, crying, applause, etc. Feel free to explore this and change your dataset accordingly.

This was my experience using Create ML. Sound classification is just one of the areas Apple’s machine learning technology supports; you can also explore Speech Recognition, Computer Vision and Natural Language.

Malek
iOS developer with over 11 years of extensive experience working on several projects with startups and corporates of different sizes.