This week in AI: Generative AI and the problem of creator compensation

Keeping up with an industry as fast-moving as AI is a tall order. So until an AI can do it for you, here’s a handy roundup of recent stories in the world of machine learning, along with notable research and experiments we didn’t cover on our own.

By the way – TechCrunch plans to publish an AI newsletter soon. Stay tuned.

This week in AI, eight prominent U.S. newspapers owned by investment giant Alden Global Capital, including the New York Daily News, Chicago Tribune and Orlando Sentinel, sued OpenAI and Microsoft for copyright infringement relating to the companies’ use of generative AI tech. The publications, like The New York Times in its ongoing lawsuit against OpenAI, accuse OpenAI and Microsoft of scraping their intellectual property without permission or compensation to build and commercialize generative models such as GPT-4.

“We’ve spent billions of dollars gathering information and reporting news at our publications, and we can’t allow OpenAI and Microsoft to expand the big tech playbook of stealing our work to build their own businesses at our expense,” Frank Pine, the editor in chief overseeing Alden’s newspapers, said in a statement.

Given OpenAI’s existing licensing partnerships with publishers, and its reluctance to stake its entire business model on the fair use argument, the lawsuit seems likely to end with a settlement and a licensing agreement. But what about the rest of the content creators whose works are swept into model training without payment?

Apparently OpenAI is thinking about it.

A recently published research paper co-authored by Boaz Barak, a scientist on OpenAI’s Superalignment team, proposes a framework to compensate copyright holders “proportionally to their contributions to the creation of AI-generated content.” How? Through cooperative game theory.

The framework evaluates the degree to which content in a training data set – e.g. text, images or other data – influences what a model generates, using a game theory concept known as the Shapley value. Then, based on that evaluation, it determines the content owners’ “rightful share” (i.e. compensation).

Let’s say you have an image-generating model trained on artwork from four artists: John, Jacob, Jack and Jebediah. You ask it to draw a flower in Jack’s style. The framework lets you determine how much influence each artist’s works had on the art the model generates and, therefore, the compensation each should receive.
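For a handful of contributors, the Shapley value can be computed exactly by averaging each contributor’s marginal effect across every possible coalition of the others. Here’s a minimal Python sketch of that idea; note that the `style_quality` scoring function is entirely hypothetical – a stand-in for however the paper measures a training subset’s influence on the output – not something taken from the paper itself.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: each player's weighted average marginal
    contribution over all coalitions of the remaining players."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            for coalition in combinations(others, size):
                s = set(coalition)
                # Standard Shapley weight: |S|! * (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result

# Hypothetical influence score: how well a model trained only on these
# artists reproduces "a flower in Jack's style." Pure illustration.
def style_quality(artists):
    base = 0.05 * len(artists)              # every artist helps a little
    return base + (0.70 if "Jack" in artists else 0.0)

payouts = shapley_values(["John", "Jacob", "Jack", "Jebediah"], style_quality)
# Jack's share dominates, since the prompt asked for his style.
```

A useful sanity check is the Shapley efficiency property: the values always sum to the value of the full coalition, which is exactly what makes them attractive for splitting a fixed pool of compensation.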

The framework has a downside, however: it’s computationally expensive. The researchers’ workarounds rely on estimates of compensation rather than exact calculations. Would that satisfy content creators? I’m not so sure. If OpenAI ever puts it into practice, we’ll certainly find out.
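The expense comes from the fact that exact Shapley computation enumerates all 2^n subsets, which is hopeless when the “players” are millions of training documents. A standard estimation trick – my illustration here, not necessarily the paper’s method – is Monte Carlo sampling over random join orders:

```python
import random

def shapley_monte_carlo(players, value, n_samples=1000, seed=0):
    """Approximate Shapley values by averaging each player's marginal
    contribution over randomly sampled orderings, instead of
    enumerating all 2^n coalitions."""
    rng = random.Random(seed)
    players = list(players)
    totals = {p: 0.0 for p in players}
    for _ in range(n_samples):
        order = players[:]
        rng.shuffle(order)
        coalition = set()
        prev = value(coalition)
        for p in order:                  # players join one at a time
            coalition.add(p)
            current = value(coalition)
            totals[p] += current - prev  # marginal contribution
            prev = current
    return {p: t / n_samples for p, t in totals.items()}

# Same toy influence score as before (hypothetical).
def style_quality(artists):
    return 0.05 * len(artists) + (0.70 if "Jack" in artists else 0.0)

estimates = shapley_monte_carlo(["John", "Jacob", "Jack", "Jebediah"],
                                style_quality, n_samples=500)
```

Each sample costs n evaluations of the value function rather than 2^n, and the error shrinks roughly as one over the square root of the sample count – so the payouts are only statistically correct, which is precisely the sticking point for creators.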

Here are some other notable AI stories from recent days:

  • Microsoft reiterates ban on facial recognition: Language has been added to the terms of service for Azure OpenAI Service, Microsoft’s fully managed wrapper around OpenAI tech, that more clearly prohibits integrations from being used “by or for” police departments for facial recognition in the U.S.
  • The Nature of AI Native Startups: AI startups face different challenges than a typical software-as-a-service company. That was the message from Rudina Seseri, founder and managing partner of Glasswing Ventures, last week at the TechCrunch Early Stage Event in Boston; Ron has the whole story.
  • Anthropic presents a business plan: AI startup Anthropic is launching a new paid plan for businesses, along with a new iOS app. The business plan – called Team – gives customers higher-priority access to Anthropic’s Claude 3 family of generative AI models, plus additional admin and user management controls.
  • CodeWhisperer no longer: Amazon CodeWhisperer is now Q Developer, a part of Amazon’s Q family of business-oriented generative AI chatbots. Available through AWS, Q Developer helps with some of the tasks developers do in their day-to-day work, like debugging and upgrading apps, much as CodeWhisperer did.
  • Just leave Sam’s Club: Walmart-owned Sam’s Club says it’s turning to AI to help speed up its “exit technology.” Instead of requiring store staff to check members’ purchases against their receipts when they leave a store, Sam’s Club customers who pay either at a register or through the Scan & Go mobile app can now leave certain store locations without having their purchases double-checked.
  • Fish harvesting, automated: Harvesting fish is an inherently messy business. Shinkei is working to improve it with an automated system that dispatches fish more humanely and reliably, which could lead to a completely different seafood economy, Devin reports.
  • Yelp’s AI assistant: Yelp this week announced a new AI-powered chatbot for consumers — based on OpenAI models, according to the company — that will help them connect with relevant businesses for their tasks (e.g. installing lighting fixtures, upgrading outdoor areas, etc.). The company is rolling out the AI assistant in its iOS app under the Projects tab and plans to expand it to Android later this year.

More machine learning

Photo credit: US Department of Energy

Sounds like there was quite a party at Argonne National Lab this winter, when they brought together a hundred experts from the AI and energy sectors to talk about how the rapidly evolving tech could help the country’s infrastructure and R&D in that area. The resulting report is more or less what you’d expect from that crowd: a lot of futuristic stuff, but informative nonetheless.

Looking at nuclear power, the grid, carbon management, energy storage and materials, the themes that emerged from the meeting were: first, that researchers need access to high-powered compute tools and resources; second, that they must learn to spot the weak points of simulations and predictions (including those enabled by the first thing); and third, that AI tools are needed that can integrate and make accessible data from multiple sources and in many formats. We’ve seen all these things happening across the industry in various ways, so it’s no big surprise, but nothing gets done at the federal level without a few geniuses putting out a paper, so it’s good to have it on the record.

Georgia Tech and Meta are working on part of that with a big new database called OpenDAC, a stack of reactions, materials and calculations intended to help scientists design carbon capture processes more easily. It focuses on metal-organic frameworks, a promising and popular material type for carbon capture that nevertheless has thousands of variations, which haven’t been exhaustively tested.

The Georgia Tech team partnered with Oak Ridge National Lab and Meta’s FAIR to simulate quantum chemistry interactions on these materials, using some 400 million compute hours – far more than a university can easily muster. Hopefully it will be helpful to the climate scientists working in this field. It’s all documented here.

We hear a lot about AI applications in the medical field, though most are in a kind of advisory role, helping experts notice things they might not otherwise have seen, or spotting patterns that would have taken a technician hours to find. That’s partly because these machine learning models just find connections between statistics without understanding what caused what. Researchers from Cambridge and Ludwig Maximilian University of Munich are working on that, because getting past basic correlative relationships could be hugely helpful in creating treatment plans.

The work, led by Professor Stefan Feuerriegel of LMU, aims to create models that can identify causal mechanisms, not just correlations: “We give the machine rules for recognizing the causal structure and correctly formalizing the problem. Then the machine has to learn to recognize the effects of interventions and understand, so to speak, how real-life consequences are mirrored in the data that has been fed into the computers,” he said. It’s early days for them, and they know it, but they believe their work is part of an important decade-scale development period.

Over at the University of Pennsylvania, grad student Ro Encarnación is working on a new angle in the “algorithmic justice” field we’ve seen pioneered (primarily by women and people of color) over the last seven or eight years. Her work focuses more on the users than the platforms, documenting what she calls “emergent auditing.”

What do users do when Tiktok or Instagram puts out a slightly racist filter, or an image generator that does something sensational? They may complain, but they also keep using it, and learn to work around or even amplify the problems it encodes. It may not be a “solution” the way we usually imagine one, but it demonstrates the diversity and resilience of the user side of the equation – they’re not as fragile or passive as you might think.
