Apple AI Researchers Boast Useful On-Device Model That 'Substantially Outperforms' GPT-4 (9to5mac.com)
Zac Hall reports via 9to5Mac: In a newly published research paper (PDF), Apple's AI gurus describe a system in which Siri can do much more than try to recognize what's in an image. The best part? It thinks one of its models for doing this benchmarks better than ChatGPT 4.0. In the paper (ReALM: Reference Resolution As Language Modeling), Apple describes something that could give a large language model-enhanced voice assistant a usefulness boost. ReALM takes into account both what's on your screen and what tasks are active. [...] If it works well, that sounds like a recipe for a smarter and more useful Siri.
Apple also sounds confident in its ability to complete such a task with impressive speed. Benchmarking is compared against OpenAI's ChatGPT 3.5 and ChatGPT 4.0: "As another baseline, we run the GPT-3.5 (Brown et al., 2020; Ouyang et al., 2022) and GPT-4 (Achiam et al., 2023) variants of ChatGPT, as available on January 24, 2024, with in-context learning. As in our setup, we aim to get both variants to predict a list of entities from a set that is available. In the case of GPT-3.5, which only accepts text, our input consists of the prompt alone; however, in the case of GPT-4, which also has the ability to contextualize on images, we provide the system with a screenshot for the task of on-screen reference resolution, which we find helps substantially improve performance."
So how does Apple's model do? "We demonstrate large improvements over an existing system with similar functionality across different types of references, with our smallest model obtaining absolute gains of over 5% for on-screen references. We also benchmark against GPT-3.5 and GPT-4, with our smallest model achieving performance comparable to that of GPT-4, and our larger models substantially outperforming it." Substantially outperforming it, you say? The paper concludes in part as follows: "We show that ReaLM outperforms previous approaches, and performs roughly as well as the state-of-the-art LLM today, GPT-4, despite consisting of far fewer parameters, even for onscreen references despite being purely in the textual domain. It also outperforms GPT-4 for domain-specific user utterances, thus making ReaLM an ideal choice for a practical reference resolution system that can exist on-device without compromising on performance."
lol (Score:3)
Apple boasts. (Score:3)
And Apple has never inflated or otherwise cherry-picked benchmarks or other performance data that happily fits their marketing message, while not disclosing less-than-favorable results.
Let's worry about the performance once they ship it, and less biased critics can review it in totality and say what it's really worth.
Re: (Score:2)
Re: (Score:1, Flamebait)
Not disagreeing with you here but it _is_ interesting that this is on-device; nothing is sent back to servers...
No, no. Nothing needs to be sent back to the servers. That doesn't mean it won't be.
Paranoid? Sorry. I'm living in the now. If they say they aren't collecting your data? They're lying.
Re: (Score:2)
No, no. Nothing needs to be sent back to the servers. That doesn't mean it won't be.
It does if Apple says it does, because they actually have a track record of stuff not going back to servers if you don't want it to.
If they say they aren't collecting your data? They're lying.
Maybe you are incapable of running a traffic proxy to verify but many are not. Apple does keep their word on this.
Re: Apple boasts. (Score:2)
It does if Apple says it does
Oh man, this made me spit out some of my soda from chuckling too hard.
Re: (Score:1)
Re: Apple boasts. (Score:2)
Re: (Score:3)
What was in it? Only Apple knows. There is literally no way to verify what is or isn't sent back.
Sure there is. It’s a device you control, on your network. You simply tell the device to trust a CA you set up, serve up a fake cert, MitM yourself, and then read the unencrypted packets. You can even do it in realtime if you use something more powerful than a Pi. This isn’t rocket science and there are plenty of tools to help researchers do this sort of thing. You can even find people who strip ads out of encrypted packets going between the YouTube app on an Apple TV and Google’s servers [ericdraken.com]
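Once a proxy you control is decrypting the traffic, a reasonable first pass is just tallying where the bytes go and when. Here is a minimal Python sketch of that step. The record format, the hostnames, and the thresholds are all made up for illustration; a real export from your proxy or tcpdump will look different and needs its own parsing.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records, as they might be exported from a proxy you control:
# (ISO timestamp, destination host, request size in bytes). All values invented.
RECORDS = [
    ("2024-04-02T01:03:10", "example-telemetry.invalid", 48_000),
    ("2024-04-02T09:15:42", "example-cdn.invalid", 1_200),
    ("2024-04-03T01:02:55", "example-telemetry.invalid", 51_500),
    ("2024-04-03T14:30:00", "example-cdn.invalid", 900),
]


def bytes_by_host(records):
    """Sum the bytes sent to each destination host."""
    totals = defaultdict(int)
    for _timestamp, host, size in records:
        totals[host] += size
    return dict(totals)


def nightly_uploads(records, start_hour=0, end_hour=4, min_bytes=10_000):
    """Return records sent in [start_hour, end_hour) that are at least min_bytes."""
    hits = []
    for timestamp, host, size in records:
        hour = datetime.fromisoformat(timestamp).hour
        if start_hour <= hour < end_hour and size >= min_bytes:
            hits.append((timestamp, host, size))
    return hits


if __name__ == "__main__":
    for host, total in sorted(bytes_by_host(RECORDS).items()):
        print(f"{host}: {total} bytes")
    for timestamp, host, size in nightly_uploads(RECORDS):
        print(f"large overnight transfer: {timestamp} -> {host} ({size} bytes)")
```

This only tells you which hosts to look at. Reading what was actually sent still means inspecting the decrypted payloads at the proxy.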
Re: (Score:2)
https://cybershack.com.au/cybershack/apple-caught-spying-on-you-again-iphone-data-not-safe/ [cybershack.com.au]
https://www.wired.com/story/apple-surveillance-technology/ [wired.com]
https://www.wired.com/story/apple-iphone-privacy-analytics-security-roundup/ [wired.com]
Re: (Score:2)
Seriously are all apple cult members this stupid and gullible?
Your username may be "ACForever", but you must be new here if you aren't yet aware that there are pedants on this site.
I refuted an objectively incorrect claim. Nothing more said, nothing more intended, so for you to label me as "stupid and gullible" in response to me providing a factually accurate correction to an objectively false claim made by the previous poster, the only conclusion I can reach is that you either suffer from poor reading comprehension or you are so steeped in your tribalistic thinking t
Re: Apple boasts. (Score:1)
You didn't refute what you think you did.
iDevices send back encrypted data, and you're assuming the code that handles that super secret transaction will be configurable to trust any CA you feed it.
Joke's on you.
Re: (Score:2)
iDevices send back encrypted data, and you're assuming the code that handles that super secret transaction will be configurable to trust any CA you feed it.
Yeah, because it is. You don’t need to take my word on it. You can verify this yourself. If you’re unable to decrypt any packets using the technique I laid out, you’ll know it didn’t accept a different CA. But it does, hence why the contents of those packets have been disclosed.
Re: (Score:2)
You have any proof?
Or you just looking to take drive-by shots based on absolutely nothing but "they COULD BE!!!"?
Why are you throwing shade at someone for making an assumption, when you yourself made the opposite assumption? Or did you even realize that was what you were doing?
In this, as in many other facets of life: if you are going to accuse someone of something, maybe have some proof.
Re: (Score:2)
Unless they are using some bespoke encryption inside the payload instead of relying on HTTPS / TLS, you can man-in-the-middle that by setting up a HTTPS proxy with certificates you sign and load onto the device. Then you can sit at the proxy server and dump whatever you like.
There are several network security products that do this at scale, as a feature. But there's no reason you can't hack together something similar using a free CA such as CloudFlare's open "cfssl" CA (https://github.com/cloudflare/cfssl
Re: (Score:2)
Unless they are using some bespoke encryption inside the payload instead of relying on HTTPS / TLS, you can man-in-the-middle that by setting up a HTTPS proxy with certificates you sign and load onto the device. Then you can sit at the proxy server and dump whatever you like.
There are several network security products that do this at scale, as a feature. But there's no reason you can't hack together something similar using a free CA such as CloudFlare's open "cfssl" CA (https://github.com/cloudflare/cfssl) where you can create the certificate authority, create a server cert for a squid proxy, load the CA cert onto your phone in a management profile along with the proxy url, and then run tcpdump on the proxy.
That's why we know that xgerrit is a lying sack of shit; or, more likely, some dodgy App is doing the Chatting. What IP is it targeting? That for sure isn't "Encrypted"!
Do you have any idea how many Haters would LOVE to catch Apple doing some sort of Data Dump like this?!?
Re: (Score:2)
Re: (Score:2)
We have no idea what gets sent back to Apple. I ran a proxy server for a development project and happened to notice that around 1am every night my iPhone sent a sizable encrypted payload to Apple. What was in it? Only Apple knows. There is literally no way to verify what is or isn't sent back. Plus, their terms of service and privacy policy could easily have holes to allow anything, and even that's assuming you don't specifically change one of the many default settings to allow sending data back during setup or any OS updates.
Funny. You're the only one that noticed?
Prove it.
Re: (Score:2)
Not disagreeing with you here but it _is_ interesting that this is on-device; nothing is sent back to servers...
No, no. Nothing needs to be sent back to the servers. That doesn't mean it won't be.
Paranoid? Sorry. I'm living in the now. If they say they aren't collecting your data? They're lying.
Sigh.
Re: (Score:2)
I'm sure GeekBench will also confirm that the LLM is 150% better than GPT4, also that the iPhone 14 has faster multicore performance than a dual socket 128 core Epyc server.
Re: (Score:2)
Apple has never inflated or otherwise cherry-picked benchmarks or other performance data that happily fits their marketing message, while not disclosing less-than-favorable results.
Apple ^H^H^H^H^H Everybody has never inflated or otherwise cherry-picked benchmarks or other performance data [. . .]
FTFY.
Let's worry about the performance once they ship it, and less biased critics can review it in totality and say what it's really worth.
Less biased? So, we won't be sending it to you, then, right?
Re: (Score:2)
1. I never claimed to be a product reviewer, and I don't want to be one.
2. You have no idea what my biases are.
3. I call companies out on their marketing bullshit, when they have a history of projecting rose-colored images that don't match easily observable reality. I do the same when Samsung is caught repeatedly cheating on benchmarks too. And the most rose-colored of marketing bullshit is pre-announcement of capability or performance when there is absolutely no way to independently verify.
So basically y
Boasts are cheap (Score:1)
Meh ... (Score:3)
Hyping everything just because it's from Apple is a little silly, only the LLM in a Flash paper deserved it (though it's mostly a variation of the theme in "Low-Rank Approximations for Conditional Feedforward Computation in Deep Neural Networks").
Making a digital assistant work better with random app/websites is nice, but outperforming GPT-4 on something it was never designed for is hardly impressive. The whole one-shot, unlabeled data spiel from the early GPT era was just hype, multimodal doesn't make the difference ... good labeling and supervised training/finetuning makes the difference.
Re: Meh ... (Score:3)
Re: (Score:2)
Apple really really wants to be seen as a technology company (despite dropping "computer" from their name) instead of being seen as "just" a consumer brand, so they need to attach themselves to AI. They actually have some interesting ML features and I love that they do things on-device, but between the Apple Car project cancellation and the predictable lackluster response to the Apple Vision Pro, they don't have anything big to get Wall Street and investors excited for the future. The fact that they're breaking their taboo of talking about unannounced products and research is a pretty strong sign that actual AI products are really far off. So yeah… this is all hype.
Go intercept some more Encrypted Packets "to Apple".
Moron.
Re: (Score:2)
Re: (Score:2)
Hyping everything just because it's from Apple is a little silly,
So is Hating everything just because it's from Apple; but that's been the status quo on Slashdot (and apparently now, the EU and the DOJ) for some time now.
Now what?
Re: (Score:2)
Ya add more HATERS to your super secret enemies list
Who fucking cares? (Score:2)
The only thing I use Siri for is setting a reminder like going to get laundry out of the dryer or to check on the oven. F
Re: (Score:2)
A lot of Apple's features are little different than what AWS releases: a very minimal viable product that's just sufficient enough for a 'feature release' - which then languishes with no useful additions for many years.
A couple examples:
* alarms set at one point in time will forever be in the list of alarms, regardless of whether they're active. If you set alarms by voice, this means you'll have an endless scrolling list unless you manually and arduously delete them.
* Siri features which haven't materially
Re: (Score:2)
A lot of Apple's features are little different than what AWS releases: a very minimal viable product that's just sufficient enough for a 'feature release' - which then languishes with no useful additions for many years.
A couple examples: ...
* alarms set at one point in time will forever be in the list of alarms, regardless of whether they're active. If you set alarms by voice, this means you'll have an endless scrolling list unless you manually and arduously delete them.
* Siri features which haven't materially improved since Siri's launch (Siri was cool when it launched; by the time Google came out with their voice assistant, it was second rate. Alexa still manages to be worse despite being markedly more featureful).
{Emphasis mine}
While I largely agree with Caimlas, they hit on what is, for me, one of the few cases where Siri really helps.
"Hey Siri, delete all alarms."
Re: (Score:2)
Wait, that actually works? Goddamn. That's one of the last things i'd have expected. Thanks! lol
Re: (Score:2)
The worst thing is that even for the alarm clock, it may not work. Let's say you have an alarm set up for 8AM every Monday, inactive. If on Tuesday you ask Siri "Hey, wake me up tomorrow morning at 8", it will activate your already existing alarm, oblivious to the fact that this alarm only rings on Mondays.
If you trust Siri, you won't even wake up.
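To make the failure concrete, here is a small Python sketch of the check that seems to be missing: before reusing an existing alarm, confirm it would actually ring at the requested date and time. The `Alarm` class and the `rings_on` and `wake_me` helpers are hypothetical names for illustration, not anything from Siri or iOS.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Alarm:
    hour: int
    minute: int
    weekdays: frozenset  # 0 = Monday ... 6 = Sunday; empty set means a one-off alarm
    active: bool = False


def rings_on(alarm, when):
    """True if this alarm, once active, would ring at the datetime `when`."""
    if (alarm.hour, alarm.minute) != (when.hour, when.minute):
        return False
    # A one-off alarm matches any day; a repeating alarm must include that weekday.
    return not alarm.weekdays or when.weekday() in alarm.weekdays


def wake_me(alarms, when):
    """Reuse an existing alarm only if it really rings at `when`; otherwise add one."""
    for alarm in alarms:
        if rings_on(alarm, when):
            alarm.active = True
            return alarm
    new_alarm = Alarm(when.hour, when.minute, frozenset(), active=True)
    alarms.append(new_alarm)
    return new_alarm


if __name__ == "__main__":
    alarms = [Alarm(8, 0, frozenset({0}))]     # inactive, Mondays only
    wednesday = datetime(2024, 4, 3, 8, 0)     # "tomorrow at 8", asked on a Tuesday
    chosen = wake_me(alarms, wednesday)
    print(chosen is alarms[0], len(alarms))
```

Matching on the time alone, without the weekday test in `rings_on`, reproduces the behavior described above: the Monday-only alarm gets switched on and never rings on Wednesday.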
Re: (Score:2)
So until your 'personal digital assistant' can actually assist me with the most basic of requests:
I
DON'T
FUCKING
CARE!
Well, there's your problem: You didn't phrase your request as a Question, nor a Command that Siri could possibly grant without some nonstandard Peripherals. . .
I think the key here is "domain specific". (Score:2)
I think this is key. That is, I suspect Apple is not saying "we've created something that can do everything GPT-4 can do, but do it on the device." I suspect Apple is saying "for a very limited domain--such as asking your iPhone to call someone, or asking for directions--Apple's AI outperforms GPT-4."
Apple's AI won't be composing poems in the style of Edgar Allan Poe about Penguins (my favorite thing to do with ChatGPT), but it may be a
That's going to be awesome (Score:2)
Re: More WOKE TRASH incoming (Score:2)
They really ought to be able to do this (Score:2)
My conversation with Siri today
Siri, Show all alarms
(long list)
Show active alarms
-> I found one, a 15:30 alarm
Turn it off
-> I turned off your 15:30 alarm
This seems to be quite an advancement, in that there is persistence of memory, minimally.
Sure, it fails at anything beyond this, but even without LLMs, with just Siri and existing technology, they ought to be able to handle a lot of tasks.
It also works for "Play (song name) on YouTube Music", though it picks the audio-only version, not the video, and requires me to log in.