AI

Instacart's AI Recipes Look Literally Impossible (404media.co) 36

An anonymous reader shares a report: I hate cookbooks without pictures. We eat with our eyes first, as chefs love to say, but what's more important to me is that if I'm making a dish for the first time, I want to see what the final product should look like to know I did it right. It's not so much about presentation as it is about knowing that I browned the chicken skin enough. An image of a recipe won't be this useful, I think, if it was AI-generated, especially if the recipe doesn't disclose that the image was AI-generated. That, to my surprise, is exactly the case with thousands of recipes the grocery delivery service Instacart is suggesting to its users. Some of the recipes include unheard-of measurements and ingredients that don't appear to exist.

[...] As I was browsing, I noticed that Instacart was offering me recipes that appeared to complement the ingredients I was looking at. The concept doesn't make a ton of sense to me -- I'm going to Instacart for the ingredients I know I need for the food I know I'm going to make, not for food inspo -- but I had to click on a recipe for "Watermelon Popsicle with Chocolate Chips" because it looked weird in the thumbnail. Since I have eyeballs with optic nerves that are connected to a semi-functioning brain, I can tell that the image was generated by AI. To be more specific, I can see that the top corner of the plate doesn't match its square shape, that the table-ish looking thing it's resting on is made up of jumbled slats (AI is particularly bad at rendering these series of long, straight lines), and then there are the titular watermelon popsicles, which defy physical reality. They clip into each other like bad 3D models in a video game, the one to the left appears hollow, and for some reason they are skewered by what appear to be asparagus spears on the bottom end and capped by impossibly small watermelon rinds at the top.

Windows

Windows 11 Users Herded Toward 23H2 Via Automatic Upgrade (theregister.com) 87

Windows 11 users still clinging to the past are to be dragged into a bright, 23H2-shaped future by Microsoft, whether they want to or not. From a report: Microsoft has added a notification to its Release Health dashboard warning Windows 11 users that it is time for the automatic upgrades to begin. "We are starting to update eligible Windows 11 devices automatically to version 23H2."

As for what "eligible" means, according to Microsoft, this is "Windows 11 devices that have reached or are approaching end of servicing." Support for Windows 11 21H2 came to an end on October 10, 2023, and version 22H2 is due to end on October 8, 2024. Windows 11 23H2 itself will endure until November 11, 2025, or just after the plug gets pulled on Windows 10. The update comes shortly after Microsoft quashed the last of its compatibility holds in Windows 11 23H2, which affected customers attempting to use the Copilot preview with multiple monitors; icons tended to move unexpectedly between monitors.

Security

US Health Tech Giant Change Healthcare Hit by Cyberattack (techcrunch.com) 17

U.S. healthcare technology giant Change Healthcare has confirmed a cyberattack on its systems. In a brief statement, the company said it was "experiencing a network interruption related to a cyber security issue." From a report: "Once we became aware of the outside threat, in the interest of protecting our partners and patients, we took immediate action to disconnect our systems to prevent further impact," Change Healthcare wrote on its status page. "The disruption is expected to last at least through the day."

The incident began early on Tuesday morning on the U.S. East Coast, according to the incident tracker. The specific nature of the cybersecurity incident was not disclosed. Most of the login pages for Change Healthcare were inaccessible or offline when TechCrunch checked at the time of writing. Michigan local newspaper the Huron Daily Tribune is reporting that local pharmacies are experiencing outages due to the Change Healthcare cyberattack.

AI

Google Pauses AI Image-generation of People After Diversity Backlash (ft.com) 198

Google has temporarily stopped its latest AI model, Gemini, from generating images of people (non-paywalled link), as a backlash erupted over the model's depiction of people from diverse backgrounds. From a report: Gemini creates realistic images based on users' descriptions in a similar manner to OpenAI's ChatGPT. Like other models, it is trained not to respond to dangerous or hateful prompts, and to introduce diversity into its outputs. However, some users have complained that it has overcorrected towards generating images of women and people of colour, such that they are featured in historically inaccurate contexts, for instance in depictions of Viking kings.

Google said in a statement: "We're working to improve these kinds of depictions immediately. Gemini's image generation does generate a wide range of people. And that's generally a good thing because people around the world use it. But it's missing the mark here." It added that it would "pause the image-generation of people and will re-release an improved version soon."

AI

Google Admits Gemini Is 'Missing the Mark' With Image Generation of Historical People 67

Google's Gemini AI chatbot is under fire for generating historically inaccurate images, particularly when depicting people from different eras and nationalities. Google acknowledges the issue and is actively working to refine Gemini's accuracy, emphasizing that while diversity in image generation is valued, adjustments are necessary to meet historical accuracy standards. 9to5Google reports: The Twitter/X post in particular that brought this issue to light showed prompts to Gemini asking for the AI to generate images of Australian, American, British, and German women. All four prompts resulted in images of women with darker skin tones, which, as Google's Jack Krawczyk pointed out, is not incorrect, but may not be what is expected.

But a bigger issue that was noticed in the wake of that post was that Gemini also struggles to accurately depict human beings in a historical context, with those being depicted often having darker skin tones or being of particular nationalities that are not historically accurate. Google, in a statement posted to Twitter/X, admits that Gemini AI image generation is "missing the mark" on historical depictions and that the company is working to improve it. Google also does say that the diversity represented in images generated by Gemini is "generally a good thing," but it's clear some fine-tuning needs to happen.

Further reading: Why Google's new AI Gemini is accused of refusing to acknowledge the existence of white people (The Daily Dot)

Transportation

Waymo's Application To Expand California Robotaxi Operations Paused By Regulators (techcrunch.com) 15

The California Public Utilities Commission's Consumer Protection and Enforcement Division (CPED) has suspended Waymo's application to expand its robotaxi service in Los Angeles and San Mateo counties, putting "an abrupt halt to the company's aspirations to expand where it can operate -- at least until June 2024," reports TechCrunch. It does not, however, change the autonomous car company's ability to commercially operate its fleet in San Francisco. From the report: The CPED said on its website that the application has been suspended for further staff review. The "suspension" of an advice letter is a procedural part of the CPUC's standard and robust review process, according to Waymo. San Mateo County Board of Supervisors vice president David J. Canepa took a different stance, however.

"Since Waymo has stalled any meaningful discussions on its expansion plans into Silicon Valley, the CPUC has put the brakes on its application to test robotaxi service virtually unfettered both in San Mateo and Los Angeles counties," Canepa said. "This will provide the opportunity to fully engage the autonomous vehicle maker on our very real public safety concerns that have caused all kinds of dangerous situations for firefighters and police in neighboring San Francisco."

Waymo noted that it has reached out to two dozen government and business organizations as part of its outreach effort, including officials in cities throughout San Mateo County such as Burlingame, Daly City and Foster City, the San Mateo County Sheriff's Office and local chambers of commerce. [...] The city of South San Francisco, Los Angeles County Department of Transportation, San Francisco County Transportation Authority, San Mateo County Office of the County Attorney and the San Francisco Taxi Workers Alliance have sent letters opposing the expansion.

Businesses

Reddit To Offer Shares In IPO To 75,000 of Its Most Active Users (marketwatch.com) 38

According to the Wall Street Journal (paywalled), Reddit plans to sell a chunk of its IPO shares to 75,000 of its most loyal users. Reuters reports: It aims to reserve an as-yet-undetermined number of shares for 75,000 of its most prolific so-called redditors when it goes public next month, the report said, citing people familiar with the matter. The users will have the opportunity to buy Reddit shares at its initial public offering (IPO) price before the stock starts trading, a privilege normally reserved only for big investors, the report said. Reddit's IPO, which has been in the works for more than three years now, would be the first from a major social media company since Pinterest's debut in 2019.

Bug

Firefly Software Snafu Sends Lockheed Satellite on Short-Lived Space Safari (theregister.com) 25

A software error on the part of Firefly Aerospace doomed Lockheed Martin's Electronic Steerable Antenna (ESA) demonstrator to a shorter-than-expected orbital life following a botched Alpha launch. From a report: According to Firefly's mission update, the error was in the Guidance, Navigation, and Control (GNC) software algorithm, preventing the system from sending the necessary pulse commands to the Reaction Control System (RCS) thrusters before the relight of the second stage. The result was that Lockheed's payload was left in the wrong orbit, and Firefly's engineers were left scratching their heads.

The launch on December 22, 2023 -- dubbed "Fly the Lightning" -- seemed to go well at first. It was the fourth for the Alpha, and after Firefly finally registered a successful launch a few months earlier in September, initial indications looked good. However, a burn of the second stage to circularize the orbit did not go to plan, and Lockheed's satellite was left in the wrong orbit, with only weeks remaining until it re-entered the atmosphere.

As it turned out, the Lockheed team completed their primary mission objectives. The payload was, after all, designed to demonstrate faster on-orbit sensor calibration. Just perhaps not quite that fast. Software issues aboard spacecraft are becoming depressingly commonplace. A recent example was the near-disastrous first launch of Boeing's CST-100 Starliner, where iffy code could have led, in NASA parlance, to "spacecraft loss." In a recent interview with The Register, former Voyager scientist Garry Hunt questioned whether the commercial spaceflight sector of today would take the same approach to quality as the boffins of the past.

AI

Google Launches Two New Open LLMs (techcrunch.com) 15

Barely a week after launching the latest iteration of its Gemini models, Google today announced the launch of Gemma, a new family of lightweight open-weight models. From a report: Starting with Gemma 2B and Gemma 7B, these new models were "inspired by Gemini" and are available for commercial and research usage. Google did not provide us with a detailed paper on how these models perform against similar models from Meta and Mistral, for example, and only noted that they are "state-of-the-art."

The company did note that these are dense decoder-only models, though, which is the same architecture it used for its Gemini models (and its earlier PaLM models) and that we will see the benchmarks later today on Hugging Face's leaderboard. To get started with Gemma, developers can get access to ready-to-use Colab and Kaggle notebooks, as well as integrations with Hugging Face, MaxText and Nvidia's NeMo. Once pre-trained and tuned, these models can then run everywhere. While Google highlights that these are open models, it's worth noting that they are not open-source. Indeed, in a press briefing ahead of today's announcement, Google's Janine Banks stressed the company's commitment to open source but also noted that Google is very intentional about how it refers to the Gemma models.

AI

Google DeepMind Alumni Unveil Bioptimus: Aiming To Build First Universal Biology AI Model (venturebeat.com) 5

An anonymous reader quotes a report from VentureBeat: As the French startup ecosystem continues to boom -- think Mistral, Poolside, and Adaptive -- today the Paris-based Bioptimus, with a mission to build the first universal AI foundation model for biology, emerged from stealth following a seed funding round of $35 million. The new open science model will connect the different scales of biology with generative AI -- from molecules to cells, tissues and whole organisms. Bioptimus unites a team of Google DeepMind alumni and Owkin scientists (AI biotech startup Owkin is itself a French unicorn) who will take advantage of AWS compute and Owkin's data generation capabilities and access to multimodal patient data sourced from leading academic hospitals worldwide. According to a press release, "this all gives the power to create computational representations that establish a strong differentiation against models trained solely on public datasets and a single data modality that are not able to capture the full diversity of biology."

In an interview with VentureBeat, Jean-Philippe Vert, co-founder and CEO of Bioptimus, chief R&D Officer of Owkin and former research lead at Google Brain, said as a smaller, independent company, Bioptimus can move faster than Google DeepMind to gain direct access to the data needed to train biology models. "We have the advantage of being able to more easily and securely collaborate with partners, and have established a level of trust in our work by sharing our AI expertise and making models available to them for research," he said. "This can be hard for big tech to do. Bioptimus will also leverage some of the strongest sovereignty controls in the market today."

Rodolphe Jenatton, a former research scientist at Google DeepMind, has also joined the Bioptimus team, telling VentureBeat the Bioptimus work will be released as open source/open science, at a similar level to Mistral's model releases. "Transparency and sharing and community will be key elements for us," he said. Currently, AI models are limited to specific aspects of biology, Vert explained. "For example, several companies are starting to build language models for protein sequences," he said, adding that there are also initiatives to build a foundation model for images of cells.

However, there is no holistic view of the totality of biology: "The good news is that the AI technology is converging very quickly, with some architectures that allow all the data to contribute together to a unified model," he explained. "So this is what we want to do. As far as I know, it does not exist yet. But I'm certain that if we didn't do it, someone else would do it in the near future." The biggest bottleneck, he said, is access to data. "It's very different from training an LLM on text on the web," he said. And that access, he pointed out, is what Bioptimus has in spades, through its Owkin partnership.

EU

EU Opens Formal Investigation Into TikTok Over Possible Online Content Breaches (reuters.com) 18

An anonymous reader quotes a report from Reuters: The European Union will investigate whether ByteDance's TikTok breached online content rules aimed at protecting children and ensuring transparent advertising, an official said on Monday, putting the social media platform at risk of a hefty fine. EU industry chief Thierry Breton said he took the decision after analyzing the short video app's risk assessment report and its replies to requests for information, confirming a Reuters story. "Today we open an investigation into TikTok over suspected breach of transparency & obligations to protect minors: addictive design & screen time limits, rabbit hole effect, age verification, default privacy settings," Breton said on X.

The European Union's Digital Services Act (DSA), which applies to all online platforms since Feb. 17, requires in particular very large online platforms and search engines to do more to tackle illegal online content and risks to public security. TikTok's owner, China-based ByteDance, could face fines of up to 6% of its global turnover if TikTok is found guilty of breaching DSA rules. TikTok said it would continue to work with experts and the industry to keep young people on its platform safe and that it looked forward to explaining this work in detail to the European Commission.

The European Commission said the investigation will focus on the design of TikTok's system, including algorithmic systems which may stimulate behavioral addictions and/or create so-called 'rabbit hole effects'. It will also probe whether TikTok has put in place appropriate and proportionate measures to ensure a high level of privacy, safety and security for minors. As well as the issue of protecting minors, the Commission is looking at whether TikTok provides a reliable database on advertisements on its platform so that researchers can scrutinize potential online risks.

Microsoft

Microsoft Develops AI Server Gear To Lessen Reliance on Nvidia (reuters.com) 3

Microsoft is developing a new network card that could improve the performance of its Maia AI server chip and potentially reduce the company's reliance on chip designer Nvidia, The Information reported on Tuesday. Reuters: Microsoft CEO Satya Nadella has tapped Pradeep Sindhu, who co-founded networking gear developer Juniper Networks, to spearhead the network card effort, the report said, citing a person with knowledge of the matter. Microsoft acquired Sindhu's server chip startup, Fungible, last year. The new network card is similar to Nvidia's ConnectX-7 card, which the chip developer sells alongside its graphics processing units (GPUs), the report added. The equipment could take more than a year to develop and, if successful, could lessen the time it takes for OpenAI to train its models on Microsoft servers as well as make the process less expensive, according to the report.

Google

This Tiny Website Is Google's First Line of Defense in the Patent Wars (wired.com) 45

A trio of Google engineers recently came up with a futuristic way to help anyone who stumbles through presentations on video calls. They propose that when algorithms detect a speaker's pulse racing or "umms" lengthening, a generative AI bot that mimics their voice could simply take over. That cutting-edge idea wasn't revealed at a big company event or in an academic journal. Instead, it appeared in a 1,500-word post on a little-known, free website called TDCommons.org that Google has quietly owned and funded for nine years. WIRED: Until WIRED received a link to an idea on TDCommons last year and got curious, Google had never spoken with the media about its website. Scrolling through TDCommons, you can read Google's latest ideas for coordinating smart home gadgets for better sleep, preserving privacy in mobile search results, and using AI to summarize a person's activities from their photo archives. And the submissions aren't exclusive to Google; about 150 organizations, including HP, Cisco, and Visa, also have posted inventions to the website.

The website is a home for ideas that seem potentially valuable but not worth spending tens of thousands of dollars seeking a patent for. By publishing the technical details and establishing "prior art," Google and other companies can head off future disputes by blocking others from filing patents for similar concepts. Google gives employees a $1,000 bonus for each invention they post to TDCommons -- a tenth of what it awards its patent seekers -- but they also get an immediately shareable link to gloat about otherwise secretive work.

Businesses

International Nest Aware Subscriptions Jump in Price, as Much As 100% (arstechnica.com) 43

Google's "Nest Aware" camera subscription is going through another round of price increases. From a report: This time it's for international users. There's no big announcement or anything, just a smattering of email screenshots from various countries on the Nest subreddit. 9to5Google was nice enough to hunt down a pile of the announcements. Nest Aware is a monthly subscription fee for Google's Nest cameras. Nest cameras exclusively store all their video in the cloud, and without the subscription, you aren't allowed to record video 24/7.

There are two sets of subscriptions to keep track of: the current generation subscription for modern cameras and the "first generation Nest Aware" subscription for older cameras. To give you an idea of what we're dealing with, in the US, the current free tier only gets you three hours of "event" video -- meaning video triggered by motion detection. Even the basic $8-a-month subscription doesn't get you 24/7 recording -- that's still only 30 days of event video. The "Nest Aware Plus" subscription, at $15 a month in the US, gets you 10 days of 24/7 video recording. The "first-generation" Nest Aware subscription, which is tied to earlier cameras and isn't available for new customers anymore, is doubling in price in Canada. The basic tier of five days of 24/7 video is going from a yearly fee of CA$50 to CA$110 (the first-generation sub has 24/7 video on every tier). Ten days of video is jumping from CA$80 to CA$160, and 30 days is going from CA$110 to CA$220. These are the prices for a single camera; the first-generation subscription will have additional charges for additional cameras. The current Nest Aware subscription for modern cameras is getting jumps similar to those in the US, with Nest Aware Plus, the mid-tier, going from CA$16 to CA$20 per month, and presumably similar increases across the board.

IT

Adobe Acrobat Adds Generative AI To 'Easily Chat With Documents' (theverge.com) 31

Adobe is adding a new generative AI experience to its Acrobat PDF management software, which aims to "completely transform the digital document experience" by making information in long documents easier to find and understand. From a report: Announced in Adobe's press release as "AI Assistant in Acrobat," the new tool is described as a "conversational engine" that can summarize files, answer questions, and recommend more based on the content, allowing users to "easily chat with documents" to get the information they need. It's available in beta starting today for paying Acrobat users.

The idea is that the chatbot will reduce the time-consuming tasks related to working with massive text documents -- such as helping students quickly find information for research projects or summarizing large reports into snappy highlights for emails, meetings, and presentations. AI Assistant in Acrobat can be used with all document formats supported by the app, including Word and PowerPoint. The chatbot abides by Adobe's data security protocols, so it won't store data from customer documents or use it to train AI Assistant.

The new AI Assistant experience is available for Acrobat customers on Standard ($12.99 per month) and Pro ($19.99 per month) plans.

Transportation

Biden Administration Is Said To Slow Early Stage of Shift To Electric Cars 343

An anonymous reader shares a report: In a concession to automakers and labor unions, the Biden administration intends to relax elements of one of its most ambitious strategies to combat climate change, limits on tailpipe emissions that are designed to get Americans to switch from gas-powered cars to electric vehicles, according to three people familiar with the plan. Instead of essentially requiring automakers to rapidly ramp up sales of electric vehicles over the next few years, the administration would give car manufacturers more time [non-paywalled source], with a sharp increase in sales not required until after 2030, these people said. They asked to remain anonymous because the regulation has not been finalized. The administration plans to publish the final rule by early spring.

The change comes as President Biden faces intense crosswinds as he runs for re-election while trying to confront climate change. He is aiming to cut carbon dioxide emissions from gasoline-powered vehicles, which make up the largest single source of greenhouse gases emitted by the United States. At the same time, Mr. Biden needs cooperation from the auto industry and political support from the unionized auto workers who backed him in 2020 but now worry that an abrupt transition to electric vehicles would cost jobs. Meanwhile, consumer demand has not been what automakers hoped, with potential buyers put off by sticker prices and the relative scarcity of charging stations.

The EPA last year proposed the toughest-ever limits on tailpipe emissions. The rules would be so strict, the only way car makers could comply would be to sell a tremendous number of zero-emissions vehicles in a relatively short time frame. The EPA designed the proposed regulations so that 67% of sales of new cars and light-duty trucks would be all-electric by 2032, up from 7.6% in 2023, a radical remaking of the American automobile market.

Transportation

Why Are California's EV Sales Dropping? (msn.com) 315

"After years of rapid expansion, California's booming EV market may be showing signs of fatigue," reports the Los Angeles Times, "as high vehicle prices, unreliable charging networks and other consumer headaches appear to dampen enthusiasm for zero-emission vehicles.

"For the first time in more than a decade, electric vehicle sales dropped significantly in the last half of 2023..." Sales of all-electric cars and light trucks in California had started off strong in 2023, rising 48% in the first half of the year compared with a year earlier. By that time, California EV sales numbered roughly 190,807 — or slightly more than a quarter of all EV sales in the nation, according to the California New Car Dealers Assn. But it's what happened in the second half of last year that's generating jitters. Sales in the third quarter fell by 2,840 from the previous period — the first quarterly drop for EVs in California since the Tesla Model S was introduced in 2012. And the fourth quarter was even worse: Sales dropped 10.2%, from 100,151 to 89,933...

Propelled by the sales success of Tesla, and boosted by electric vehicles from other automakers entering the market, consumer acceptance of EVs had seemed like a given until recently. In fact, robust sales growth is a key assumption in the state's zero-emission vehicle plan... Under the no-gas mandate, zero-emission vehicles must account for 35% of all new vehicle sales by model year 2026.... Nationally, EV sales growth also has slowed as automakers such as Ford and General Motors cut back — at least temporarily — on EV and battery production plans. Hertz, the rental car giant, is also pulling back on plans to shift heavily toward EVs. Hertz several years ago announced plans to buy 100,000 Teslas but is now selling off its EV fleet.

Corey Cantor, EV analyst at BloombergNEF, an energy research firm, said that although recent sales figures are worrisome, there's plenty of momentum behind the EV transition, as evidenced by government mandates around the globe and massive investments by motor vehicle manufacturers and their suppliers. Those investments total $616 billion globally over five years, according to consulting firm AlixPartners.

But EVs haven't reached "price parity" with gas-powered engines, the article points out, so just 7.6% of the vehicles sold last year in the U.S. were electric — while in California, the market share for EVs was 20.1%.

The article also quantifies concerns about reliability of California's public charging system, which "according to studies from academic researchers and market analysts, can be counted on to malfunction at least 20% of the time." After $1 billion in state money for charger companies, the state's Energy Commission will now also start collecting reliability statistics, according to the article. But the article also cites wait times at the chargers. "Even if they were reliable, there aren't enough chargers to go around. EV sales have outpaced public charger installation."

Some good news? The federal government is spending $5 billion nationally to put fast chargers on major highways at 50-mile intervals. California will receive $384 million. Seven major automakers have also teamed up to build a North American charging network of their own, called Ionna. The joint venture plans to install at least 30,000 chargers — which would be open to any EV brand — at stations that will provide restrooms, food service and retail stores on site or nearby.

AI

Can Robots.txt Files Really Stop AI Crawlers? (theverge.com) 97

In the high-stakes world of AI, "The fundamental agreement behind robots.txt [files], and the web as a whole — which for so long amounted to 'everybody just be cool' — may not be able to keep up..." argues the Verge: For many publishers and platforms, having their data crawled for training data felt less like trading and more like stealing. "What we found pretty quickly with the AI companies," says Medium CEO Tony Stubblebine, "is not only was it not an exchange of value, we're getting nothing in return. Literally zero." When Stubblebine announced last fall that Medium would be blocking AI crawlers, he wrote that "AI companies have leached value from writers in order to spam Internet readers."

Over the last year, a large chunk of the media industry has echoed Stubblebine's sentiment. "We do not believe the current 'scraping' of BBC data without our permission in order to train Gen AI models is in the public interest," BBC director of nations Rhodri Talfan Davies wrote last fall, announcing that the BBC would also be blocking OpenAI's crawler. The New York Times blocked GPTBot as well, months before launching a suit against OpenAI alleging that OpenAI's models "were built by copying and using millions of The Times's copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more." A study by Ben Welsh, the news applications editor at Reuters, found that 606 of 1,156 surveyed publishers had blocked GPTBot in their robots.txt file.

It's not just publishers, either. Amazon, Facebook, Pinterest, WikiHow, WebMD, and many other platforms explicitly block GPTBot from accessing some or all of their websites.

On most of these robots.txt pages, OpenAI's GPTBot is the only crawler explicitly and completely disallowed. But there are plenty of other AI-specific bots beginning to crawl the web, like Anthropic's anthropic-ai and Google's new Google-Extended. According to a study from last fall by Originality.AI, 306 of the top 1,000 sites on the web blocked GPTBot, but only 85 blocked Google-Extended and 28 blocked anthropic-ai. There are also crawlers used for both web search and AI. CCBot, which is run by the organization Common Crawl, scours the web for search engine purposes, but its data is also used by OpenAI, Google, and others to train their models. Microsoft's Bingbot is both a search crawler and an AI crawler. And those are just the crawlers that identify themselves — many others attempt to operate in relative secrecy, making it hard to stop or even find them in a sea of other web traffic.

For any sufficiently popular website, finding a sneaky crawler is needle-in-haystack stuff.
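For concreteness, the per-crawler opt-outs described above are just User-agent/Disallow stanzas in a site's robots.txt. A minimal sketch of a file blocking the AI crawlers named in the article while leaving ordinary crawling alone might look like this (the policy shown is illustrative, not any particular site's actual file):

```
# Block known AI training crawlers from the whole site
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

# All other crawlers may access everything
User-agent: *
Allow: /
```

Note that CCBot is a judgment call for site owners: blocking it also removes the site from Common Crawl's general-purpose dataset, not just from AI training pipelines.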

In addition, the article points out, a robots.txt file "is not a legal document — and 30 years after its creation, it still relies on the good will of all parties involved.

"Disallowing a bot on your robots.txt page is like putting up a 'No Girls Allowed' sign on your treehouse — it sends a message, but it's not going to stand up in court."
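The "good will" point is visible in code: nothing in the protocol enforces a Disallow rule; a crawler simply chooses whether to consult the file before fetching. Python's standard-library `urllib.robotparser` shows what a well-behaved client does. The rules below are a hypothetical example, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# Parse an in-memory robots.txt that blocks GPTBot but allows everyone else.
rfp = RobotFileParser()
rfp.parse([
    "User-agent: GPTBot",
    "Disallow: /",
    "",
    "User-agent: *",
    "Allow: /",
])

# A compliant crawler asks before fetching; a sneaky one just doesn't call this.
print(rfp.can_fetch("GPTBot", "https://example.com/article"))        # False
print(rfp.can_fetch("SomeOtherBot", "https://example.com/article"))  # True
```

The asymmetry is the whole story: `can_fetch` is advisory, and the server has no way to verify that the party requesting a page ever consulted it.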
Social Networks

Reddit Has Reportedly Signed Over Its Content to Train AI Models (mashable.com) 78

An anonymous reader shared this report from Reuters: Reddit has signed a contract allowing an AI company to train its models on the social media platform's content, Bloomberg News reported, citing people familiar with the matter... The agreement, signed with an "unnamed large AI company", could be a model for future contracts of a similar nature, Bloomberg reported.
Mashable writes that the move "means that Reddit posts, from the most popular subreddits to the comments of lurkers and small accounts, could build up already-existing LLMs or provide a framework for the next generative AI play." It's a dicey decision from Reddit, as users are already at odds with the business decisions of the nearly 20-year-old platform. Last year, following Reddit's announcement that it would begin charging for access to its APIs, thousands of Reddit forums shut down in protest... This new AI deal could generate even more user ire, as debate rages on about the ethics of using public data, art, and other human-created content to train AI.
Some context from the Verge: The deal, "worth about $60 million on an annualized basis," Bloomberg writes, could still change as the company's plans to go public are still in the works.

Until recently, most AI companies trained their models on data from the open web without seeking permission. But that's proven to be legally questionable, leading companies to try to get data on firmer footing. It's not known which company Reddit made the deal with, but it's quite a bit more than the $5 million annual deal OpenAI has reportedly been offering news publishers for their data. Apple has also been seeking multi-year deals with major news companies that could be worth "at least $50 million," according to The New York Times.

The news also follows an October story that Reddit had threatened to cut off Google and Bing's search crawlers if it couldn't make a training data deal with AI companies.

The Courts

New Bill Would Let Defendants Inspect Algorithms Used Against Them In Court (theverge.com) 47

Lauren Feiner reports via The Verge: Reps. Mark Takano (D-CA) and Dwight Evans (D-PA) reintroduced the Justice in Forensic Algorithms Act on Thursday, which would allow defendants to access the source code of software used to analyze evidence in their criminal proceedings. It would also require the National Institute of Standards and Technology (NIST) to create testing standards for forensic algorithms, which software used by federal enforcers would need to meet.

The bill would act as a check on unintended outcomes produced by using technology to help solve crimes. Academic research has highlighted how human bias can be built into software, and how facial recognition systems in particular often struggle to differentiate Black faces. In light of such research, the use of algorithms to make consequential decisions in many sectors, including crime-solving and health care, has raised alarms among consumers and advocates.

Takano acknowledged that gaining or hiring the deep expertise needed to analyze the source code might not be possible for every defendant. But requiring NIST to create standards for the tools could at least give them a starting point for understanding whether a program matches the basic standards. Takano introduced previous iterations of the bill in 2019 and 2021, but they were not taken up by a committee.
