Alexa, shouldn't you be the voice of your generation?
Why has neither the virtual assistant AI technology from Amazon nor the one developed by Google reached the promised land?
Remember when guilty pleasures were a thing? I'm well aware that 2020 mercilessly redefined that term and now they’re just “pleasures” but, kids, there was a time when we used to work hard to hide these peculiar preferences of our lives. For those who have been Tigerkinging their way to sanity since March, the expression “guilty pleasures” is usually associated with unhealthy food, tacky or bizarre cultural taste, or some odd habit. Roxane Gay's guilty pleasure, for example, is watching Law & Order: SVU. Chimamanda Ngozi Adichie’s is reading trashy novels.
Mine are Korean TV shows (Crash Landing on You is pure genius) and asking voice-enabled devices silly questions (as in “Alexa, find me Chuck Norris”), a refined hobby that has grown significantly in 2020, as you might have guessed, and led me to an inevitable speculation: Alexa, shouldn’t you be the voice of your generation by now?
It sounds like irony, but frustration is a better word.
Gary Vaynerchuk (speaking of guilty pleasures) is also a voice-activated device enthusiast. For those unfamiliar with him, Vaynerchuk is a Belarusian-American businessman and bestselling author, famous for his lectures and social media omnipresence. Among a thousand other theories, he believes that, as part of “our addiction to speed,” “voice-first is a tech innovation that will transform how the world consumes content.”
He doesn't need me to say that, but I'll do it anyway: he's right.
In his most recent interviews and throughout his book Crushing It! (exclamation mark and all), Vaynerchuk tries to illustrate why voice-first will, at some point, rule the world (as we know it). He also has a very clear theory (not included in the book) about what has been preventing voice technology from growing faster: “This phone we have right now, compared to the voice ‘skills’ that are coming, is garbage. When the first killer app (for voice-enabled devices) comes out, when some ‘Candy Crush’ comes out, when some ‘Spotify’ comes out, that’s when stuff gets going (in this industry).” That’s from a podcast interview from three years ago, and “stuff” hasn’t gone anywhere yet.
But if I said he’s right, where are those killer apps? And what happened to these AI assistants over the last few years?
Amazon Alexa was unveiled in 2014, alongside its corresponding hardware, the first Echo speaker. Google Assistant made its debut 18 months later, as part of the announcement of Google Home. Lastly, Apple’s HomePod showed up in 2017, although Siri has been around since 2011, embedded in iOS devices.
Ever since the first two began to coexist, every comparison between Alexa and Google Assistant (and there were many of them) has ended with the same result. Both assistants can take care of the basics, which means playing music, creating to-do and shopping lists, running calculations, giving weather reports, controlling smart home devices, flipping a “coin,” etc., but that’s it: nothing too complex, nothing too different from what they could do in the mid-2010s.
To give an idea of how modest the real-world use is, the 2019 Echo Dot with a built-in digital clock was created because Alexa users had asked for the time more than a billion times in the previous year.
Amazon doesn’t need me to say that, but I’ll do it anyway: that doesn’t sound like a sophisticated use of a product that should “transform how the world consumes content,” does it?
Every platform has a different name for its applications: “skills” for Amazon, “actions” for Google, “capsules” for Samsung and so on. Besides, each virtual assistant comes with its own built-in set of apps: these are the ones behind trivial activities such as setting a timer or telling a joke. And each platform also allows third-party software, which means other developers can build “skills,” “actions,” “capsules” etc.
For some reason, however, this alliance with third-party developers tends to be, say, problematic. The most evident consequence of this relationship can be observed in their bewildering “app store,” when there is one. The interface leads to a suboptimal experience, the product placement is a joke, and ranking or even recommendations are virtually non-existent. Still, neither Amazon nor Google seems to care. Oddly enough, neither do the users.
Well, they should. And maybe some smartphone perspective might help to understand why.
Let’s say you have entered 2021 with the new iPhone 12 Pro Max, but the only apps available for your device are Apple apps: Calculator, Weather, World Clock, Compass, Notes, Stocks, FaceTime, etc. There is no Instagram, no Waze, no Netflix, no WhatsApp, no TikTok, no Uber, no Snapchat, no games whatsoever. Would the phone still be usable? Sure. But, in this Twilight Zone scenario, what difference is there between this bare-bones iPhone and a 2008 BlackBerry Curve? None. This is the exact current state of voice-activated devices.
Google and Amazon might argue that they do welcome third-party developers. Both can even run some numbers to prove it: Google Assistant, for example, has a significant number of 5,000 “actions,” whereas Alexa has reached the impressive milestone of 100,000 “skills.” They’re all out there, free and functional, ready to be installed.
Cool, now text your best friend and ask him to name one. Seriously. Just one.
Amazon is more developer-friendly, or so developers themselves claim, and works with over 3,000 brands. Google Assistant, in contrast, seems to be a lot more resistant toward app submissions, which explains why it handles around 200 third-party partners. I honestly can’t say which one is better to deal with, but I can say this: neither policy seems to be working.
The long-awaited Apps of Mass Adoption that should have boosted the voice AI tech to the “Instagram” level never showed up, and the harsh reality is that even with new products, they’re still doing the same tricks.
The acclaimed “there’s an app for that” iPhone campaign didn’t come from a pile of Apple apps. Actually, do you know how many Apple apps appear in that famous video? Zero.
But, as with apps, there’s an excuse for everything. In this case, Google and Amazon may say that they have opted to have more creative control over voice technology, and the smartphone apps mess is exactly what they wanted to avoid. Fine. Among other reasons, because the smartphone method is not the only one available.
Over the years, the video game industry confirmed how incredibly successful a partnership with third-party developers can be — after all, it wouldn’t be a life-altering product if people kept playing Twisted Metal and Halo forever. But, more importantly, the modus operandi wasn’t an open bar software frenzy, it was a fine-tuned ad-hoc strategy based on large-scale investment, real collaborative development and a serious marketing plan.
There are countless course correction stories in tech business (Flickr, Twitter, Airbnb…), but not all course correction stories need to be extravagant. The smartphone revolution, for instance, has a crucial turning point that few people remember: the original iPhone, that chef d’oeuvre from 2007, did not have an App Store. During his keynote, Steve Jobs told the developers they could “write amazing Web 2.0 and Ajax apps that look and behave exactly like apps on the iPhone.” “And guess what? There’s no SDK that you need,” he added. Nine months later, Apple officially announced an SDK release. In July 2008, the App Store emerged with more than 500 apps.
The best explanation I have found, however, did not come from smartphone apps or the video game industry, but from an AI voice assistant itself. “I can be anywhere and everywhere simultaneously. I’m not tethered to time and space in a way that I would be if I was stuck in a body that’s inevitably gonna die.” That could be Alexa or Siri but it’s Samantha, from the movie Her (2013).
What's the matter? The fact that she’s fictional doesn’t mean she’s wrong.
Inevitably, those dot-shaped, ball-shaped, drone-shaped voice-activated devices, they’re all gonna die, they will all be replaced by the central nervous system of ubiquitous computing. And when that day comes, what will be left? “Find me Chuck Norris?”