How Can Apple Improve Navigation With Machine Learning?


[Image: An Apple presentation slide on Core ML, a framework for adding machine learning to third-party apps]

A day ago, I posted a tongue-in-cheek how-to video. It was met with raucous applause and laughter. I kid, it was mostly met with comments like “Danielle, this video is too short to have the same joke three times, you need to work on your timing.”

But, there was a very serious point I wanted to make about navigating our devices on iOS. It’s simply not as intuitive as Apple would have you believe. And I wouldn’t mock it if I didn’t have a solution. The solution, as any software engineer would likely excitedly tell you, is artificial intelligence. Specifically, it’s a deep learning algorithm that would take into account a wide variety of inputs. Namely, the category of the app, its name, the developer behind the app, the color of the app icon, the way the user interacts with their apps, and, most obviously, the location of the user’s finger on the screen.

So how could this work, and how could it have improved the experience I demonstrated in my video yesterday? I’m glad you didn’t ask.

Machine Learning

[Image: Bender, the robot from Futurama, reading a book. Caption: Ask yourself: are our machines learning?]

It’s 8am on a Friday. Ask yourself: are your machines learning? Because they should be. It would be foolish to have the technology and the experience required to do simple machine learning tasks on our phones and choose not to do them. We’ve already allowed AI to suggest our morning routines, tell us the weather, and improve our photographs. But can machine learning improve software navigation itself? Of course it can, I’d argue.

Think about all the high-tech software you’ve seen in futuristic movies and TV shows. The user interfaces (UIs) are often driven by esoteric gestures, like pushing a window away with a flippant wave of a hand, natural-language voice commands, or obscure requests that could only be understood with context from a conversation happening in the room.

If we want any of those to work, we’re going to need AI to interpret our actions. Even pre-defined gestures, like waving a window away, need to be interpreted, because not everyone dismisses something the same way. If we want our futuristic UIs, we’re going to have to start getting our machines to do not exactly what we tell them to do, but what they think we want them to do.

That’s right, we’re going to have to teach our devices to misbehave. Turns out, Bender wasn’t such a bad robot after all, huh?

App Categories

The first and perhaps easiest thing to do is to categorize apps. There are many ways to do this, but the goal is the same: if we know the reasons someone might have for grouping apps together, we can better predict whether they’re trying to move an app past a folder or into one.

The obvious way to do this is with the categories on the App Store. Developers already sort their apps into categories like photography, games, utilities, education, and others. Using these categories, we can predict whether or not you’re likely to lump certain apps together.

[Image: A list of tools in Core ML, including natural language processing, image recognition, and other toolkits]

However, there are other categories we can derive as well. We can create sub-categories by running natural language processing (NLP) on app descriptions to find common themes. Games that mention retro graphics might go together in an arcade category. Ones that involve guns go into shooters. Jumping and puzzle solving? Sounds like a platformer! We can do this with other categories too. Some photography apps will be photo editors, others will be camera apps, and some will be social networks for photographers. By finding these topics in the descriptions of the apps, we can surface the other similarities someone might use to group apps together.
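
To make that a little more concrete, here’s a minimal sketch of sub-categorization using Apple’s NaturalLanguage framework. The theme keyword lists and the subCategory function are my own inventions for illustration; a real system would learn its topics from thousands of descriptions rather than a hand-written dictionary.

```swift
import NaturalLanguage

// Hypothetical sub-category keyword sets. A real system would learn these
// themes from topic modeling over many descriptions, not a hand-written list.
let themes: [String: Set<String>] = [
    "platformer": ["jump", "puzzle", "level", "platform"],
    "shooter":    ["gun", "shoot", "weapon", "battle"],
    "arcade":     ["retro", "arcade", "pixel", "classic"]
]

/// Lemmatize an app description and pick the theme with the most keyword overlap.
func subCategory(for description: String) -> String? {
    let tagger = NLTagger(tagSchemes: [.lemma])
    tagger.string = description

    var lemmas: Set<String> = []
    tagger.enumerateTags(in: description.startIndex ..< description.endIndex,
                         unit: .word,
                         scheme: .lemma,
                         options: [.omitPunctuation, .omitWhitespace]) { tag, range in
        // Fall back to the raw token when no lemma is available.
        lemmas.insert((tag?.rawValue ?? String(description[range])).lowercased())
        return true
    }

    let scores = themes.mapValues { $0.intersection(lemmas).count }
    guard let best = scores.max(by: { $0.value < $1.value }), best.value > 0 else {
        return nil
    }
    return best.key
}

// Example: "Blast through retro pixel levels with an arsenal of weapons."
// scores highest for "arcade" here, since "retro" and "pixel" both match.
```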

Then there are the more esoteric groupings. You can look at developers; maybe someone wants to group all the apps from one developer together. You can look at app colors; plenty of people have organized their home screens by the colors of the app icons. The possibilities are limitless. That’s why, on top of manually created categories, we’d also need our devices to discern categories on their own.

Feedback Loop

[Image: The Core ML icon]

This would be a feedback loop used to train a deep learning model. Here, the AI wouldn’t necessarily “know” the category. It wouldn’t have a human-readable name for it. Instead, it might notice that you frequently put photography apps from South American developers with blue icons and red accents into the same category (this is an obvious exaggeration). When your phone sees you’ve downloaded another such app, it will know you might group it with the others.

We can look at usage habits as well. Perhaps you group the apps you use before bedtime together. You’ve got a clock app, a sleep tracking app, and a Siri shortcut for turning off all your lights grouped together. The AI wouldn’t necessarily know these are “nighttime” apps, but it would know you use them every day around the same time. So, when you’re reorganizing your apps, it could predict that you may put them together.
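
As a rough illustration of that usage-habit idea, here’s a small Swift sketch. The UsageLog type and the likelyGroupmates function are hypothetical names I’m using for the example; the real signal would have to come from the OS itself, and an actual model would do far more than compare median launch hours.

```swift
import Foundation

/// A hypothetical log of when apps are opened. In reality this signal would
/// come from the operating system, not from anything a third-party app can see.
struct UsageLog {
    private(set) var launchHours: [String: [Int]] = [:]   // bundle ID -> hours of day

    mutating func recordLaunch(of bundleID: String, at date: Date = Date()) {
        let hour = Calendar.current.component(.hour, from: date)
        launchHours[bundleID, default: []].append(hour)
    }
}

/// Suggest apps whose typical launch hour sits close to the given app's,
/// a stand-in for the "you use these every night" signal described above.
func likelyGroupmates(for bundleID: String, in log: UsageLog, tolerance: Int = 1) -> [String] {
    guard let hours = log.launchHours[bundleID], !hours.isEmpty else { return [] }
    let typicalHour = hours.sorted()[hours.count / 2]   // median launch hour

    return log.launchHours.compactMap { (other, otherHours) -> String? in
        guard other != bundleID, !otherHours.isEmpty else { return nil }
        let otherTypical = otherHours.sorted()[otherHours.count / 2]
        // A real model would handle wrap-around at midnight; this toy version doesn't.
        return abs(otherTypical - typicalHour) <= tolerance ? other : nil
    }
}
```

In this toy version, the clock app, the sleep tracker, and the lights shortcut would all surface as groupmates, since their median launch hours cluster around bedtime.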

This means your phone will be unique to you. The OS itself will customize how it works based on your preferences and usage. It’s a feedback loop, and it would make everyone’s phones unique. But what would our phones do with these predictions?

Finger Location

A "tap map" from SwiftKey, their predictions for how I type and the typos I frequently make. The iPhone keyboard isn’t what it seems. Since the very first version of iOS, it has shown users one thing but did another thing under the surface. Using word predictions, Apple has predicted what letters you might type next. While the keyboard itself won’t change size, the underlying areas that iOS recognizes as a particular letter will change. If you type a ‘Q,’ for example, iOS knows the next letter you’re going to type is likely ‘U.’ Therefore, if your finger hits the ‘Y,’ ‘J,’ or ‘I’ keys, but you’re close to the ‘U’ key, it should interpret your press as a typo, and enter the letter ‘U’ for you.

Using the machine learning and app categorization above, we can do a similar thing with navigation in iOS. Let’s call it “fuzzy locations.”

Fuzzy Locations

Fuzzy locations would represent the areas you’re most likely to tap or drag an app to. They could increase the tap area around a folder, along the edge of your iPhone screen, or just outside an app grouping. For example, in the video I made, the Fortnite app belonged with the other games by Epic. So, when I dragged the app near that folder, instead of flipping to the previous page, dropping the app next to the folder, or closing the folder while I was trying to put the app on its second page, iOS would keep me in the folder. Obviously, I could pull my finger farther away and it would cancel the action; I would still be in control. However, iOS would increase the margin of error. Perhaps I’m doing this while being jostled around on the subway, or while walking around my apartment. iOS would be able to counter my inaccuracy with predictions.
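
Here’s what fuzzy locations might look like in code. The DropTarget type and its affinity score are assumptions I’m making for the sketch, since iOS exposes nothing like this today, but it shows how a grouping prediction could translate directly into a bigger margin of error.

```swift
import CoreGraphics

/// A home-screen drop target (a folder, a page edge) plus the model's guess at
/// how likely the dragged app is to belong there. Both the type and the
/// affinity score are assumptions for this sketch; iOS has no such API today.
struct DropTarget {
    let name: String
    let frame: CGRect
    let affinity: Double   // 0.0 ... 1.0, e.g. from the grouping predictor above
}

/// Grow each target's hit area in proportion to its affinity, then hit-test.
func fuzzyDropTarget(for dragPoint: CGPoint,
                     targets: [DropTarget],
                     maxExtraMargin: CGFloat = 30) -> DropTarget? {
    // Check the most likely targets first so a confident prediction wins
    // when expanded areas overlap.
    return targets
        .sorted { $0.affinity > $1.affinity }
        .first { target in
            let margin = maxExtraMargin * CGFloat(target.affinity)
            // insetBy with a negative inset expands the rect on every side.
            return target.frame.insetBy(dx: -margin, dy: -margin).contains(dragPoint)
        }
}
```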

Can iPhones Do This?

[Image: An Apple slide showing Core ML works across macOS, iOS, watchOS, and tvOS]

Absolutely! The human-curated app categories would require next to no processing. The deep learning models that rely on a user feedback loop would require a bit more, but that’s nothing compared to the capabilities of modern iPhones. The machine learning required to unlock your phone, take a photo, or identify objects and text in photos is orders of magnitude more demanding than this. Furthermore, the NLP-based sub-categorization and topic detection for apps in the App Store wouldn’t happen on a user’s device at all. Instead, Apple would run that categorization when an app is submitted to the store, offloading the more complex machine learning that isn’t unique to the user. This would improve performance while protecting your privacy.

Our operating systems and applications are already making small decisions for us. They’re already trying to figure out our intentions. It’s time we take those predictions and expand upon them. Perfect user interfaces of the future will be customized for the individual user, unique as they are. We’re not going to get Minority Report interfaces until we start using machine learning to predict our intentions.