AI

OpenAI Puzzled as New Models Show Rising Hallucination Rates 98

OpenAI's latest reasoning models, o3 and o4-mini, hallucinate more frequently than the company's previous AI systems, according to both internal testing and third-party research. On OpenAI's PersonQA benchmark, o3 hallucinated 33% of the time -- double the rate of older models o1 (16%) and o3-mini (14.8%). The o4-mini performed even worse, hallucinating 48% of the time. Nonprofit AI lab Transluce discovered o3 fabricating processes it claimed to use, including running code on a 2021 MacBook Pro "outside of ChatGPT." Stanford adjunct professor Kian Katanforoosh noted his team found o3 frequently generates broken website links.

OpenAI says in its technical report that "more research is needed" to understand why hallucinations worsen as reasoning models scale up.
Movies

Netflix Revenue Rises To $10.5 Billion Following Price Hike (theverge.com) 15

Netflix's Q1 revenue rose to $10.5 billion, a 13% increase from last year, while net income grew to $2.9 billion. The company says it expects more growth in the coming months when it sees "the full quarter benefit from recent price changes and continued growth in membership and advertising revenue." The Verge reports: Netflix raised prices across most of its plans in January, with its premium plan hitting $24.99 per month. It also increased the price of its Extra Member option -- its solution to password sharing -- to $8.99 per month. Though Netflix already rolled out the increase in the US, UK, and Argentina, the streamer now plans to do the same in France. This is the first quarter that Netflix didn't reveal how many subscribers it gained or lost. It decided last year to report only "major subscriber milestones," as other revenue streams, like advertising, continue to grow. Netflix last reported having 300 million global subscribers in January.

During an earnings call on Thursday, Netflix co-CEO Greg Peters said the company expects to "roughly double" advertising revenue in 2025. The company launched its own advertising technology platform earlier this month. There are some changes coming to Netflix, too, as Peters confirmed that its homepage redesign for its TV app will roll out "later this year." He also hinted at adding an "interactive" search feature using "generative technologies," which sounds a lot like the AI feature Bloomberg reported on last week.
Further reading: Netflix CEO Counters Cameron's AI Cost-Cutting Vision: 'Make Movies 10% Better'
AI

Study Finds 50% of Workers Use Unapproved AI Tools 18

An anonymous reader quotes a report from SecurityWeek: An October 2024 study by Software AG suggests that half of all employees are Shadow AI users, and most of them wouldn't stop even if it was banned. The problem is the ease of access to AI tools, and a work environment that increasingly advocates the use of AI to improve corporate efficiency. It is little wonder that employees seek out their own AI tools to improve their personal efficiency and maximize their potential for promotion. It is frictionless, says Michael Marriott, VP of marketing at Harmonic Security. "Using AI at work feels like second nature for many knowledge workers now. Whether it's summarizing meeting notes, drafting customer emails, exploring code, or creating content, employees are moving fast." If the official tools aren't easy to access or feel too locked down, they'll use whatever's available, often via an open browser tab.

There is also almost never any malicious intent (absent, perhaps, the mistaken employment of rogue North Korean IT workers); merely a desire to do and be better. If this involves using unsanctioned AI tools, employees will likely not disclose their actions. The reasons may be complex, but they combine a reluctance to admit that their efficiency is AI-assisted rather than natural with the knowledge that use of personal Shadow AI might be discouraged. The result is that enterprises often have little knowledge of the extent of Shadow AI use, or of the risks it may present.
According to an analysis from Harmonic, ChatGPT is the dominant gen-AI model used by employees, with 45% of data prompts originating from personal accounts (such as Gmail). Image files accounted for 68.3% of uploaded files. The report also notes that 7% of employees were using Chinese AI models like DeepSeek, Baidu Chat, and Qwen.

"Overall, there has been a slight reduction in sensitive prompt frequency from Q4 2024 (down from 8.5% to 6.7% in Q1 2025)," reports SecurityWeek. "However, there has been a shift in the risk categories that are potentially exposed. Customer data (down from 45.8% to 27.8%), employee data (from 26.8% to 14.3%) and security (6.9% to 2.1%) have all reduced. Conversely, legal and financial data (up from 14.9% to 30.8%) and sensitive code (5.6% to 10.1%) have both increased. PII is a new category introduced in Q1 2025 and was tracked at 14.9%."
AI

Actors Who Sold AI Avatars Stuck In Black Mirror-Esque Dystopia (arstechnica.com) 16

Some actors who sold their likenesses to AI video companies like Synthesia now regret the decision, after finding their digital avatars used in misleading, embarrassing, or politically charged content. Ars Technica reports: Among them is a 29-year-old New York-based actor, Adam Coy, who licensed rights to his face and voice to a company called MCM for one year for $1,000 without thinking, "am I crossing a line by doing this?" His partner's mother later found videos where he appeared as a doomsayer predicting disasters, he told the AFP. South Korean actor Simon Lee's AI likeness was similarly used to spook naive Internet users but in a potentially more harmful way. He told the AFP that he was "stunned" to find his AI avatar promoting "questionable health cures on TikTok and Instagram," feeling ashamed to have his face linked to obvious scams. [...]

Even a company like Synthesia, which is publicly committed to ethically developing AI avatars and preventing their use in harmful content, can't guarantee that its content moderation will catch everything. A British actor, Connor Yeates, told the AFP that his video was "used to promote Ibrahim Traore, the president of Burkina Faso who took power in a coup in 2022" in violation of Synthesia's terms. [...] Yeates was paid about $5,000 for a three-year contract with Synthesia that he signed simply because he doesn't "have rich parents and needed the money." But he likely couldn't have foreseen his face being used for propaganda, as even Synthesia didn't anticipate that outcome.

Others may not like their AI avatar videos but consider the financial reward high enough to make up for the sting. Coy confirmed that money motivated his decision, and while he found it "surreal" to be depicted as a con artist selling a dystopian future, that didn't stop him from concluding that "it's decent money for little work." Potentially improving the climate for actors, Synthesia is forming a talent program that it claims will give actors a voice in decision-making about AI avatars. "By involving actors in decision-making processes, we aim to create a culture of mutual respect and continuous improvement," Synthesia's blog said.

AI

Netflix CEO Counters Cameron's AI Cost-Cutting Vision: 'Make Movies 10% Better' 24

Netflix Co-CEO Ted Sarandos pushed back on director James Cameron's recent assertion that AI could slash film production costs by half, arguing instead for quality improvements over cost reduction during Netflix's first-quarter earnings call Thursday. "I read the article too about what Jim Cameron said about making movies 50% cheaper," Sarandos said. "I remain convinced that there's an even bigger opportunity to make movies 10% better."

Sarandos pointed to Netflix's current AI implementations in set references, pre-visualization, VFX sequence preparation, and shot planning. He said AI-powered tools have democratized high-end visual effects that were once exclusive to big-budget productions. The executive cited 2019's "The Irishman" as a benchmark, noting its "very cutting-edge, very expensive de-aging technology that still had massive limitations." In contrast, he referenced cinematographer Rodrigo Prieto's directorial debut "Pedro Paramo," which employed AI-powered de-aging at "a fraction" of The Irishman's cost. "The entire budget of the film was about what the VFX cost on The Irishman," Sarandos explained. "Same creator using new tools, better tools, to do what was impossible five years ago."
Science

The Most-Cited Papers of the Twenty-First Century (nature.com) 13

Nature has published an analysis of the 21st century's most-cited scientific papers, revealing a surprising pattern: breakthrough discoveries like mRNA vaccines, CRISPR, and gravitational waves don't make the list. Instead, a 2016 Microsoft paper on "deep residual learning" networks claims the top spot, with citations ranging from 103,756 to 254,074 depending on the database.

The list overwhelmingly features methodology papers and software tools rather than groundbreaking discoveries. AI research dominates with four papers in the top ten, including Google's 2017 "Attention is all you need" paper that underpins modern language models.

The second-most-cited paper -- a 2001 guide for analyzing gene expression data -- was explicitly created to be cited after journal reviewers rejected references to a technical manual. As sociologist Misha Teplitskiy noted, "Scientists say they value methods, theory and empirical discoveries, but in practice the methods get cited more."
AI

AI Support Bot Invents Nonexistent Policy (arstechnica.com) 50

An AI support bot for the code editor Cursor invented a nonexistent subscription policy, triggering user cancellations and public backlash this week. When developer "BrokenToasterOven" complained about being logged out when switching between devices, the company's AI agent "Sam" falsely claimed this was intentional: "Cursor is designed to work with one device per subscription as a core security feature."

Users took the fabricated policy as official, with several announcing subscription cancellations on Reddit. "I literally just cancelled my sub," wrote the original poster, adding that their workplace was "purging it completely." Cursor representatives scrambled to correct the misinformation: "Hey! We have no such policy. You're of course free to use Cursor on multiple machines." Cofounder Michael Truell later apologized, explaining that a backend security change had unintentionally created login problems.
Moon

ESA Video Game Trains AI To Recognize Craters On the Moon 4

Longtime Slashdot reader Qbertino writes: German public news outlet Tagesschau reports (source: YouTube) on an ESA video game that helps train a future moon lander's guidance AI to spot craters. Gamers have already helped collect visual data on millions of craters. The Technical University of Darmstadt developed the game, called IMPACT, to support ESA's efforts to establish a base on the moon. An older article from August 2024 provides further details on the project.
Australia

Q-CTRL Unveils Jam-Proof Positioning System That's 50x More Accurate Than GPS (interestingengineering.com) 101

schwit1 shares a report from Interesting Engineering: Australia's Q-CTRL developed a new system called "Ironstone Opal," which uses quantum sensors to navigate without GPS. It's passive (meaning it doesn't emit signals that could be detected or jammed) and highly accurate. Instead of relying on satellites, Q-CTRL's system can read the Earth's magnetic field, which varies slightly depending on location (like a magnetic fingerprint or map). The system can determine where you are by measuring these variations using magnetometers. This is made possible using the company's proprietary quantum sensors, which are incredibly sensitive and stable. The system also comes with special AI-based software, which filters out interference like vibrations or electromagnetic noise (what they call "software ruggedization"). The system is small and compact and could, in theory, be installed in drones or cars and, of course, aircraft.
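The map-matching idea described here can be illustrated with a toy sketch. This is hypothetical (Ironstone Opal's actual algorithms are proprietary); it only shows the basic principle of fixing position by comparing a short track of magnetometer readings against a stored magnetic-anomaly map:

```python
import numpy as np

# Toy illustration of magnetic map matching (not Q-CTRL's algorithm):
# a vehicle carries a stored magnetic-anomaly map and fixes its position
# by finding where a short track of magnetometer readings fits best.

rng = np.random.default_rng(0)
field_map = rng.normal(50_000, 200, size=(100, 100))  # field strength in nT

# The vehicle's path, as (row, col) offsets from its unknown start cell.
PATH = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]

def locate(readings, field_map, path=PATH):
    """Return the start cell whose mapped track best matches the readings."""
    h, w = field_map.shape
    best, best_err = None, float("inf")
    for r in range(h - 2):          # keep the whole path inside the map
        for c in range(w - 2):
            err = sum((field_map[r + dr, c + dc] - m) ** 2
                      for (dr, dc), m in zip(path, readings))
            if err < best_err:
                best, best_err = (r, c), err
    return best

true_start = (42, 17)
readings = [field_map[42 + dr, 17 + dc] + rng.normal(0, 1)  # ~1 nT sensor noise
            for dr, dc in PATH]
print(locate(readings, field_map))  # expected to recover (42, 17)
```

A real system faces far harder conditions (continuous motion, platform interference, map error), which is where the quantum sensors and the "software ruggedization" filtering come in.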

Q-CTRL ran some live tests on the ground and in the air to validate the technology. As anticipated, they found that it could operate completely independently of GPS. Moreover, the company reports that its quantum GPS was 50 times more accurate than traditional GPS backup systems (like Inertial Navigation Systems, or INS). The system also delivered navigation precision on par with hitting a bullseye from 1,000 yards. Even when the equipment was mounted inside a plane, where interference is much worse, it outperformed existing systems by at least 11x. This is the first time quantum technology has been shown to outperform existing tech in a real-world commercial or military application, a milestone referred to as achieving "quantum advantage."

AI

Police Using AI Personas to Infiltrate Online Activist Spaces, Records Reveal (wired.com) 77

samleecole shares a report from 404 Media and Wired: American police departments near the United States-Mexico border are paying hundreds of thousands of dollars for an unproven and secretive technology that uses AI-generated online personas designed to interact with and collect intelligence on "college protesters," "radicalized" political activists, and suspected drug and human traffickers, according to internal documents, contracts, and communications 404 Media obtained via public records requests. Massive Blue, the New York-based company that is selling police departments this technology, calls its product Overwatch, which it markets as an "AI-powered force multiplier for public safety" that "deploys lifelike virtual agents, which infiltrate and engage criminal networks across various channels." According to a presentation obtained by 404 Media, Massive Blue is offering cops these virtual personas that can be deployed across the internet with the express purpose of interacting with suspects over text messages and social media. [...]

While the documents don't describe every technical aspect of how Overwatch works, they do give a high-level overview of what it is. The company describes a tool that uses AI-generated images and text to create social media profiles that can interact with suspected drug traffickers, human traffickers, and gun traffickers. After Overwatch scans open social media channels for potential suspects, these AI personas can also communicate with suspects over text, Discord, and other messaging services. The documents we obtained don't explain how Massive Blue determines who is a potential suspect based on their social media activity. Salzwedel, of Pinal County, said "Massive Blue's solutions crawl multiple areas of the Internet, and social media outlets are just one component. We cannot disclose any further information to preserve the integrity of our investigations." [...] Besides scanning social media and engaging suspects with AI personas, the presentation says that Overwatch can use generative AI to create "proof of life" images of a person holding a sign with a username and date written on it in pen.

AI

Microsoft Researchers Develop Hyper-Efficient AI Model That Can Run On CPUs 59

Microsoft has introduced BitNet b1.58 2B4T, the largest-scale 1-bit AI model to date with 2 billion parameters and the ability to run efficiently on CPUs. It's openly available under an MIT license. TechCrunch reports: The Microsoft researchers say that BitNet b1.58 2B4T is the first bitnet with 2 billion parameters, "parameters" being largely synonymous with "weights." Trained on a dataset of 4 trillion tokens -- equivalent to about 33 million books, by one estimate -- BitNet b1.58 2B4T outperforms traditional models of similar sizes, the researchers claim.

BitNet b1.58 2B4T doesn't sweep the floor with rival 2 billion-parameter models, to be clear, but it seemingly holds its own. According to the researchers' testing, the model surpasses Meta's Llama 3.2 1B, Google's Gemma 3 1B, and Alibaba's Qwen 2.5 1.5B on benchmarks including GSM8K (a collection of grade-school-level math problems) and PIQA (which tests physical commonsense reasoning skills). Perhaps more impressively, BitNet b1.58 2B4T is speedier than other models of its size -- in some cases, twice the speed -- while using a fraction of the memory.

There is a catch, however. Achieving that performance requires using Microsoft's custom framework, bitnet.cpp, which only works with certain hardware at the moment. Absent from the list of supported chips are GPUs, which dominate the AI infrastructure landscape.
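The "b1.58" in the model's name refers to ternary weights: each weight takes one of three values, -1, 0, or +1, carrying log2(3) ≈ 1.58 bits, which lets matrix multiplication collapse into additions and subtractions. A minimal sketch of this style of quantization (an illustration of the general technique, not Microsoft's bitnet.cpp implementation):

```python
import numpy as np

def quantize_ternary(w):
    """Absmean-style quantization to {-1, 0, +1} with one per-matrix scale.

    Illustrative only; BitNet's production kernels live in bitnet.cpp.
    """
    scale = np.mean(np.abs(w)) + 1e-8        # per-matrix scaling factor
    q = np.clip(np.round(w / scale), -1, 1)  # round, then clamp to ternary
    return q.astype(np.int8), scale

def ternary_matmul(x, q, scale):
    # q holds only -1/0/+1, so in a real kernel this reduces to
    # adds and subtracts of activations -- no multiplications.
    return (x @ q) * scale

rng = np.random.default_rng(1)
w = rng.normal(0.0, 0.1, size=(8, 4))   # full-precision weights
q, s = quantize_ternary(w)
x = rng.normal(size=(1, 8))             # a single activation vector
approx = ternary_matmul(x, q, s)
exact = x @ w                           # approx roughly tracks this
print(sorted(np.unique(q).tolist()))    # [-1, 0, 1]
```

The memory win is the point: each weight needs under two bits instead of 16 or 32, which is why a 2-billion-parameter bitnet can fit comfortably on a CPU.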
Education

Google Is Gifting Gemini Advanced To US College Students 30

Google is offering all U.S. college students a free year of its Gemini Advanced AI tools through its Google One AI Premium plan, as part of a push to expand Gemini's user base and compete with ChatGPT. It includes access to the company's Pro models, Veo 2 video generation, NotebookLM, Gemini Live and 2TB of Drive storage. Ars Technica reports: Google has a new landing page for the deal, allowing eligible students to sign up for their free Google One AI Premium plan. The offer is valid from now until June 30. Anyone who takes Google up on it will enjoy the free plan through spring 2026. The company hasn't specified an end date, but we would wager it will be June of next year. Google's intention is to give students an entire school year of Gemini Advanced from now through finals next year. At the end of the term, you can bet Google will try to convert students to paying subscribers.

As for who qualifies as a "student" in this promotion, Google isn't bothering with a particularly narrow definition. As long as you have a valid .edu email address, you can sign up for the offer. That's something that plenty of people who are not actively taking classes still have. You probably won't even be taking undue advantage of Google if you pretend to be a student -- the company really, really wants people to use Gemini, and it's willing to lose money in the short term to make that happen.
Privacy

ChatGPT Models Are Surprisingly Good At Geoguessing (techcrunch.com) 15

An anonymous reader quotes a report from TechCrunch: There's a somewhat concerning new trend going viral: People are using ChatGPT to figure out the location shown in pictures. This week, OpenAI released its newest AI models, o3 and o4-mini, both of which can uniquely "reason" through uploaded images. In practice, the models can crop, rotate, and zoom in on photos -- even blurry and distorted ones -- to thoroughly analyze them. These image-analyzing capabilities, paired with the models' ability to search the web, make for a potent location-finding tool. Users on X quickly discovered that o3, in particular, is quite good at deducing cities, landmarks, and even restaurants and bars from subtle visual clues.

In many cases, the models don't appear to be drawing on "memories" of past ChatGPT conversations, or on EXIF data, the metadata attached to photos that reveals details such as where the photo was taken. X is filled with examples of users giving ChatGPT restaurant menus, neighborhood snaps, facades, and self-portraits, and instructing o3 to imagine it's playing "GeoGuessr," an online game that challenges players to guess locations from Google Street View images. It's an obvious potential privacy issue. There's nothing preventing a bad actor from screenshotting, say, a person's Instagram Story and using ChatGPT to try to doxx them.

AI

Bot Students Siphon Millions in Financial Aid from US Community Colleges (voiceofsandiego.org) 47

Fraud rings using fake "bot" students have infiltrated America's community colleges, stealing over $11 million from California's system alone in 2024. The nationwide scheme, which began in 2021, targets open-admission institutions where scammers enroll fictitious students in online courses to collect financial aid disbursements.

"We didn't used to have to decide if our students were human," said Eric Maag, who has taught at Southwestern College for 21 years. Faculty now spend hours vetting suspicious enrollees and analyzing AI-generated assignments. At Southwestern in Chula Vista, professor Elizabeth Smith discovered 89 of her 104 enrolled students were fraudulent. The California Community College system estimates 25% of all applicants statewide are bots. Community college administrators describe fighting an evolving technological battle against increasingly sophisticated fraud tactics. The fraud crisis has particularly impacted asynchronous online courses, crowding real students out of classes and fundamentally altering faculty roles.
Facebook

Meta Blocks Apple Intelligence in iOS Apps (9to5mac.com) 22

Meta has disabled Apple Intelligence features across its iOS applications, including Facebook, WhatsApp, and Threads, according to Brazilian tech blog Sorcererhat Tech. The block affects Writing Tools, which enable text creation and editing via Apple's AI, as well as Genmoji generation. Users cannot access these features via the standard text field interface in Meta apps. Instagram Stories have also lost previously available keyboard stickers and Memoji functionality.

While Meta hasn't explained the decision, it likely aims to drive users toward Meta AI, its own artificial intelligence service that offers similar text and image generation capabilities. The move follows failed negotiations between Apple and Meta regarding Llama integration into Apple Intelligence, which reportedly collapsed over privacy disagreements. The companies also maintain ongoing disputes regarding App Store policies.
Television

LG TVs' Integrated Ads Get More Personal With Tech That Analyzes Viewer Emotions (arstechnica.com) 122

LG is partnering with Zenapse to integrate AI-driven emotional intelligence into its smart TVs, enabling hyper-targeted ads based on viewers' psychological traits, emotions, and behaviors. Ars Technica reports: The upcoming advertising approach comes via a multi-year licensing deal with Zenapse, a company describing itself as a software-as-a-service marketing platform that can drive advertiser sales "with AI-powered emotional intelligence." LG will use Zenapse's technology to divide webOS users into hyper-specific market segments that are supposed to be more informative to advertisers. LG Ad Solutions, LG's advertising business, announced the partnership on Tuesday.

The technology will be used to inform ads shown on LG smart TVs' homescreens, free ad-supported TV (FAST) channels, and elsewhere throughout webOS, per StreamTV Insider. LG will also use Zenapse's tech to "expand new software development and go-to-market products," it said. LG didn't specify the duration of its licensing deal with Zenapse. Zenapse's platform for connected TVs (CTVs), ZenVision, is supposed to be able to interpret the types of emotions shown in the content someone is watching on TV, partially by using publicly available information about the show's or movie's script and plot, StreamTV Insider reported. ZenVision also analyzes viewer behavior, grouping viewers based on their consumption patterns, the publication noted. Under the new partnership, ZenVision can use data that LG has gathered from the automatic content recognition software in LG TVs.

With all this information, ZenVision will group LG TV viewers into highly specified market segments, such as "goal-driven achievers," "social connectors," or "emotionally engaged planners," an LG spokesperson told StreamTV Insider. Zenapse's website for ZenVision points to other potential market segments, including "digital adopters," "wellness seekers," "positive impact & environment," and "money matters." Companies paying to advertise on LG TVs can then target viewers based on the ZenVision-specified market segments and deliver an "emotionally intelligent ad," as Zenapse's website puts it.

Businesses

OpenAI In Talks To Buy Windsurf For About $3 Billion (reuters.com) 5

According to Bloomberg (paywalled), OpenAI is in talks to buy AI-assisted coding tool Windsurf for about $3 billion. "The deal would be OpenAI's largest to date, the terms of which have not yet been finalized," notes Reuters. From a report: Windsurf was in talks with investors such as Kleiner Perkins and General Catalyst to raise funding at a $3 billion valuation, the report added. It closed a $150 million funding round led by General Catalyst last year, valuing it at $1.25 billion.
Google

Google Used AI To Suspend Over 39 Million Ad Accounts Suspected of Fraud (techcrunch.com) 25

An anonymous reader quotes a report from TechCrunch: Google on Wednesday said it suspended 39.2 million advertiser accounts on its platform in 2024 -- more than triple the number from the previous year -- in its latest crackdown on ad fraud. By leveraging large language models (LLMs) and using signals such as business impersonation and illegitimate payment details, the search giant said it could suspend a "vast majority" of ad accounts before they ever served an ad.

Last year, Google launched over 50 LLM enhancements to improve its safety enforcement mechanisms across all its platforms. "While these AI models are very, very important to us and have delivered a series of impressive improvements, we still have humans involved throughout the process," said Alex Rodriguez, a general manager for Ads Safety at Google, in a virtual media roundtable. The executive told reporters that a team of over 100 experts was assembled across Google, including members from the Ads Safety team, the Trust and Safety division, and researchers from DeepMind.
"In total, Google said it blocked 5.1 billion ads last year and removed 1.3 billion pages," adds TechCrunch. "In comparison, it blocked over 5.5 billion ads and took action against 2.1 billion publisher pages in 2023. The company also restricted 9.1 billion ads last year, it said."
AI

OpenAI Debuts Codex CLI, an Open Source Coding Tool For Terminals (techcrunch.com) 9

OpenAI has released Codex CLI, an open-source coding agent that runs locally in users' terminal software. Announced alongside the company's new o3 and o4-mini models, Codex CLI directly connects OpenAI's AI systems with local code and computing tasks, enabling them to write and manipulate code on users' machines.

The lightweight tool allows developers to leverage multimodal reasoning capabilities by passing screenshots or sketches to the model while providing access to local code repositories. Unlike more ambitious future plans for an "agentic software engineer" that could potentially build entire applications from descriptions, Codex CLI focuses specifically on integrating AI models with command-line interfaces.

To accelerate adoption, OpenAI is distributing $1 million in API credits through a grant program, offering $25,000 blocks to selected projects. While the tool expands AI's role in programming workflows, it comes with inherent risks -- studies show AI coding models frequently fail to fix security vulnerabilities and sometimes introduce new bugs, a particular concern when a tool is given system-level access.
AI

OpenAI Unveils o3 and o4-mini Models (openai.com) 2

OpenAI has released two new AI models that can "think with images" during their reasoning process. The o3 and o4-mini models represent a significant advancement in visual perception, enabling them to manipulate images -- cropping, zooming, and rotating -- as part of their analytical process.

Unlike previous models, o3 and o4-mini can agentically use all of ChatGPT's tools, including web search, Python code execution, and image generation. This allows them to tackle multi-faceted problems by selecting appropriate tools based on the task at hand.

The models have set new state-of-the-art performance benchmarks across multiple domains. On visual tasks, o3 achieved 86.8% accuracy on MathVista and 78.6% on CharXiv-Reasoning, while o4-mini scored 91.6% on the AIME 2024 math competition. In expert evaluations, o3 made 20% fewer major errors than its predecessor on complex real-world tasks. ChatGPT Plus, Pro, and Team users will see o3, o4-mini, and o4-mini-high in the model selector starting today, replacing o1, o3-mini, and o3-mini-high.

Slashdot Top Deals