In 2019, Amazon upgraded its Alexa assistant with a feature that enabled it to detect when a customer was likely frustrated and respond with proportionately more empathy. For example, if a customer asked Alexa to play a song and it queued up the wrong one, and the customer then said “No, Alexa” in an upset tone, Alexa might apologize and request a clarification.

Now, the group behind one of the data sets used to train the text-to-image model Stable Diffusion wants to bring similar emotion-recognition capabilities to every developer at no cost.

This week, LAION, a nonprofit building image and text data sets for training generative AI, including Stable Diffusion, announced the Open Empathic Project. In the words of the group, Open Empathic aims to “equip open source AI systems with empathy and emotional intelligence”.

“The LAION team, with backgrounds in health care, education and machine learning research, saw a gap in the open source community: emotional AI was largely overlooked,” LAION co-founder Christoph Schuhmann told TechCrunch via email. “Much like our concerns about non-transparent AI monopolies that led to the birth of LAION, we feel a similar urgency here.”

Through Open Empathic, LAION is recruiting volunteers to submit audio clips to a database that can be used to create AI, including chatbots and text-to-speech models, that “understands” human emotions.

“With Open Empathic, our goal is to create an AI that goes beyond just understanding words,” Schuhmann said. “Our goal is to understand the nuances of expressions and tone shifts, making human-AI interactions more authentic and empathetic.”

LAION, short for “Large-Scale Artificial Intelligence Open Network,” was founded in early 2021 by Schuhmann, a German high school teacher by day, and several members of a Discord server for AI enthusiasts. Funded by donations and public research grants, including from AI startup Hugging Face and Stability AI, the vendor behind Stable Diffusion, LAION has a stated mission to democratize AI research and development resources, starting with training data.

“We are driven by a clear mission: to harness the power of AI in ways that can truly benefit society,” Kari Nori, an open source contributor to LAION and a PhD student at Bournemouth University, told TechCrunch via email. “We are passionate about transparency and believe the best way to shape AI is out in the open.”

Hence Open Empathic.

For the initial phase of the project, LAION has created a website that tasks volunteers with annotating YouTube clips of an individual person speaking. The LAION team pre-selected some of these clips, and volunteers chose others. For each clip, volunteers can fill out a detailed list of fields. These include a transcription of the clip, an audio and video description, and the age, gender and accent (e.g. “British English”) of the person in the clip. They also include the speaker’s arousal level (alertness, not sexual, to be clear) and valence level (“pleasantness” versus “unpleasantness”).

Other fields in the form relate to the clip’s audio quality and the presence (or absence) of loud background noise. But the main focus is on the person’s emotions, or at least on the emotions that volunteers perceive them to be feeling.

From a series of drop-down menus, volunteers can select individual or multiple emotions, ranging from “irritable,” “edgy” and “confusing” to “reflective” and “engaging.” Nori says the idea was to solicit “rich” and “emotional” annotations while capturing expressions across a range of languages and cultures.

“We’re focused on training AI models that can understand a variety of languages and understand different cultural settings,” Nori said. “We are working on building models that ‘get’ languages and cultures using videos that show real emotions and expressions.”

Once volunteers have submitted a clip to LAION’s database, they can repeat the process anew. There is no limit to the number of clips a single volunteer can annotate. LAION hopes to collect roughly 10,000 samples over the next few months and, optimistically, between 100,000 and 1 million by next year.

“We have passionate community members, driven by the vision of democratizing AI models and data sets, voluntarily contributing annotations in their spare time,” Nori said. “Their inspiration is a shared dream of creating an empathetic and emotionally intelligent open source AI that is accessible to everyone.”

The pitfalls of emotion detection

In addition to Amazon’s efforts with Alexa, startups and tech giants alike have explored developing AI that can detect emotions — for purposes ranging from sales training to preventing drowsiness-induced accidents.

In 2016, Apple acquired Emotient, a San Diego company working on AI algorithms that analyze facial expressions. Snapped up by Sweden-based Smart Eye last May, Affectiva — an MIT spin-out — once claimed its technology could detect anger or frustration in speech in 1.2 seconds. And speech recognition platform Nuance, which Microsoft bought in April 2021, has demoed a product for cars that analyzes drivers’ emotions from their facial signals.

Other players in the budding emotion detection and recognition field include Hume, HireVue and Realeyes, whose technology is being used to measure how certain segments of audiences respond to certain ads. Some employers are using emotion detection technology to evaluate potential employees by scoring them on empathy and emotional intelligence. Schools have deployed it to monitor student engagement in the classroom and remotely at home. And governments have used emotion-detecting AI to identify “dangerous people” and have tested it at border control stops in the U.S., Hungary, Latvia and Greece.

The LAION team, for its part, envisions helpful, problem-free applications of the technology in robotics, psychology, professional training, education, and even gaming. Schuhmann paints a picture of robots that provide assistance and companionship, virtual assistants that sense when someone is lonely or anxious, and tools that help diagnose psychological disorders.

It’s a techno-utopian vision. The problem is that much of emotion detection rests on shaky scientific ground.

Few, if any, universal markers of emotion exist, which calls into question the accuracy of emotion-detecting AI. Most emotion detection systems were built on the work of psychologist Paul Ekman, published in the 1970s. But subsequent research, including Ekman’s own, supports the common-sense notion that people from different backgrounds differ greatly in how they express what they are feeling.

For example, an expression supposedly universal for fear is a stereotype for threat or anger in Malaysia. In one of his later works, Ekman suggested that American and Japanese students react quite differently to violent films. Japanese students adopted “a completely different set of expressions” if someone else was in the room, particularly an authority figure.

Voices, too, span a broad range of characteristics, including those of people with disabilities, people with conditions such as autism, and people who speak other languages and dialects, such as African-American Vernacular English (AAVE). A native French speaker taking a survey in English might pause or pronounce a word with some uncertainty. Someone unfamiliar with that speaker could misconstrue this as an emotion marker.

Indeed, a big part of the problem with emotion detection AI is bias – implicit and explicit bias brought in by annotators, whose contributions are used to train emotion detection models.

For example, in a 2019 study, scientists found that labelers were more likely to annotate phrases in AAVE as toxic than their General American English equivalents. An annotator’s sexual orientation and gender identity can also heavily influence which words and phrases they perceive as toxic, as can outright prejudice. Racist, sexist and otherwise offensive labels from annotators have been found in many commonly used open source image data sets.

The downstream effects can be quite dramatic.

AI hiring platform Retorio was found to react differently to the same candidate wearing different outfits, such as glasses and headscarves. In a 2020 MIT study, researchers showed that face-analyzing algorithms can become biased toward certain facial expressions, such as smiling, which reduces their accuracy. Recent work shows that popular emotional analysis tools tend to assign more negative emotions to Black men’s faces than to white faces.

Respecting the process

So how will the LAION team combat these biases? For example, how will it make sure that white people don’t outnumber Black people in the data set, that nonbinary people aren’t misgendered, and that people with mood disorders aren’t mislabeled with emotions they didn’t intend to express?

This is not entirely clear.

Schuhmann claims that the process of submitting training data to Open Empathic is not an “open door” and that LAION has systems in place to “ensure the integrity of contributions.”

“We can verify user intent and constantly check the quality of annotations,” he said.

But LAION’s previous data sets haven’t exactly been pristine.

Some analyses of LAION-400M, a LAION image training set that the group attempted to curate with automated tools, turned up images depicting sexual assault, rape, hate symbols and graphic violence. LAION-400M is also rife with bias. For example, it returns images of men but not women for words like “CEO” and images of Middle Eastern men for “terrorist.”

Schuhmann is relying on the community to serve as a check this time around.

“We believe in the power of hobbyist scientists and enthusiasts from around the world coming together and contributing to our data sets,” he said. “While we are open and collaborative, we prioritize quality and authenticity in our data.”

As for how any emotion-detecting AI trained on the Open Empathic data set might be used, biased or not, LAION intends to uphold its open source philosophy, even if that means the AI could be misused.

“Using AI to understand emotions is a powerful endeavor, but it is not without its challenges,” Robert Kaczmarczyk, a co-founder of LAION and a physician at the Technical University of Munich, said via email. “Like any other tool, it can be used for both good and bad. Imagine if only a small group had access to advanced technology while the majority of the public remained in the dark. This imbalance could also lead to misuse or manipulation by the few who control this technology.”

When it comes to AI, the laissez-faire approach sometimes comes back to bite the models’ creators. Stable Diffusion, for example, is now being used to create child sexual abuse material and nonconsensual deepfakes.

Some privacy and human rights advocates, including European Digital Rights and Access Now, have called for a blanket ban on emotion recognition. The EU AI Act, the recently enacted European Union legislation that establishes a governance framework for AI, prohibits the use of emotion recognition in policing, border management, workplaces and schools. And some companies, like Microsoft, have voluntarily retired their emotion-detecting AI in the face of public criticism.

However, LAION is comfortable with the level of risk involved – and has confidence in the open development process.

“We welcome researchers to poke around, suggest changes and spot issues,” Kaczmarczyk said. “And just as Wikipedia thrives on its community contributions, Open Empathic is fueled by community involvement, ensuring it is transparent and safe.”

Transparent? Sure. Safe? Only time will tell.

