BBC Optimizing UHD Video Streaming Over IP (bbc.co.uk)
johnslater writes: A friend at the BBC has written a short description of his project to deliver UHD video over IP networks. The application bypasses the OS network stack, and constructs network packets directly in a buffer shared with the network hardware, achieving a ten-fold throughput improvement. He writes: "Using this technique, we can send or receive uncompressed UHD 2160p50 video (more than 8 Gbps) using a single CPU core, leaving all the rest of the server's cores free for video processing." This is part of a broader BBC project to develop an end-to-end IP-based studio system.
Re: (Score:2)
How? Please explain.
I would expect that bypassing the network stack is no small feat.
Re: (Score:2, Informative)
IP packets and Ethernet frames are really quite small data structures, so the cost of processing them is vastly overshadowed by the cost of scheduling, context switching, memory management, etc. Because modern CPUs are limited by RAM access latency, they're only fast when they can either loop or stream. If there isn't enough looping or streaming work between all that overhead, performance tanks.
Re: (Score:3)
At 10Gb/s, the amount of data getting shuffled around in a normal network stack is enough to push the limits of the data buses. Most network stacks copy the data something like 4 times. That works as a multiplier and turns 10Gb/s into 40Gb/s of bus traffic. Context switching causes cache thrashing and can consume more cycles than processing the actual data; a single context switch can cost about 1,000 cycles on a modern CPU.
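You can see the shared-buffer idea on stock Linux without exotic hardware: AF_PACKET with a PACKET_RX_RING mmaps a ring of frame slots that the kernel fills and the application drains, so there's no recv() syscall or user-space copy per packet. A minimal sketch (needs root, error handling trimmed; the kernel still copies each frame into the ring once, which true kernel bypass like DPDK or netmap also eliminates by mapping the NIC's own descriptor rings):

```c
/* Minimal sketch: mmap'd AF_PACKET RX ring (PACKET_RX_RING).
 * Illustrative only -- error handling trimmed, needs CAP_NET_RAW. */
#include <stdio.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/mman.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <arpa/inet.h>

#define FRAME_SIZE 2048
#define FRAME_NR   512

int main(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    /* Describe the ring: 512 frames of 2 KB, split into 4 blocks. */
    struct tpacket_req req = {
        .tp_block_size = FRAME_SIZE * FRAME_NR / 4,
        .tp_block_nr   = 4,
        .tp_frame_size = FRAME_SIZE,
        .tp_frame_nr   = FRAME_NR,
    };
    setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));

    /* One mmap shared with the kernel -- packets appear here directly. */
    size_t map_len = (size_t)req.tp_block_size * req.tp_block_nr;
    char *ring = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    for (unsigned i = 0;; i = (i + 1) % FRAME_NR) {
        struct tpacket_hdr *hdr =
            (struct tpacket_hdr *)(ring + (size_t)i * FRAME_SIZE);

        /* Block only when the ring is empty; otherwise no syscall at all. */
        while (!(hdr->tp_status & TP_STATUS_USER))
            poll(&pfd, 1, -1);

        printf("frame %u: %u bytes\n", i, hdr->tp_len);

        hdr->tp_status = TP_STATUS_KERNEL;  /* hand the slot back */
    }
}
```

The only syscall left is the occasional poll() when the ring runs dry; everything else is plain reads and writes to shared memory.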
Re: Credulity (Score:2)
40 Gb/s is ~5 GB/s, a fraction of what any bus on a modern x64 processor can handle. It's about a fifth of the bandwidth of dual-channel DDR3-1600 memory (25.6 GB/s), which is the slowest a Skylake processor goes, and a bit more than 4 lanes of PCIe 3 (~3.9 GB/s).
Re: (Score:2)
FreeBSD is working on a new API to allow the network stack to work along with the network card, such that the CPU core that gets interrupted by the NIC is also the one that processes the packet, keeping the data hot in that core's cache.
Interesting (Score:5, Interesting)
Kernel bypass plus zero copy are, of course, old-hat. Worked on such stuff at Lightfleet, back when it did this stuff called work. Infiniband and the RDMA Consortium had been working on it for longer yet.
What sort of performance increase can you achieve?
Well, Ethernet latencies tend to run into milliseconds for just the stack. Tens, if not hundreds, of milliseconds for anything real. Infiniband can achieve eight-microsecond latencies. SPI can get down to two microseconds.
So you can certainly achieve the sorts of latency improvements quoted. It's hard work, especially when operating purely in software, but it can actually be done. It's about bloody time, too. This stuff should have been standard in 2005, not 2015! Bloody slowpokes. Back in my day, we had to shovel our own packets! In the snow! Uphill! Both ways!
Re: (Score:2)
Yeah, you just keep spouting off your unused testosterone.
We'll learn from gramps and profit from his experience. Just keep mouthing off to people who have accomplished things before your time instead of learning from them. I'm sure someone will realize your greatness some day and place the crown on your head as you truly deserve.
Re: (Score:3)
70ns for a signal to propagate over 10m of twisted pair copper. Start there.
I have Gigabit connection over my LAN and I still get...lemme test... 1ms to the router, 2ms to a random other machine. Over WAN, 109ms to Slashdot, 16ms to the BBC. World of Tanks EU server goes between 40-130ms, depending on how busy it is and whether my son is video skyping....
Re: (Score:2)
I'm guessing you're using standard ping there. The problem is that the sent and received times come from timers in the application itself, which also does the calculation. If you ask the system for the time and get 00:00:00.000, then ask again and get 00:00:00.001, it reports 1ms, even though the packet may have made the round trip much faster than that. You're seeing 1ms largely because you're using a timestamp with 1ms resolution.
the above is wrong (Score:2)
Standard linux distros support timestamping of the packet by the kernel when the packet is received. When userspace reads the packet it can also obtain the kernel timestamp of that packet.
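For reference, it's just a socket option. A minimal sketch on a UDP socket (Linux's SO_TIMESTAMPNS; the port number is an arbitrary choice for the example):

```c
/* Sketch: ask the Linux kernel to timestamp incoming packets
 * (SO_TIMESTAMPNS) and read the stamp back via recvmsg().
 * Illustrative only; error handling trimmed. */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in addr = {
        .sin_family      = AF_INET,
        .sin_port        = htons(9000),       /* arbitrary test port */
        .sin_addr.s_addr = htonl(INADDR_ANY),
    };
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPNS, &on, sizeof(on));

    char buf[2048], ctrl[256];
    struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
    };

    ssize_t n = recvmsg(fd, &msg, 0);

    /* The kernel's receive timestamp arrives as ancillary data. */
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_TIMESTAMPNS) {
            struct timespec ts;
            memcpy(&ts, CMSG_DATA(c), sizeof(ts));
            printf("%zd bytes, stamped at %ld.%09ld\n",
                   n, (long)ts.tv_sec, ts.tv_nsec);
        }
    }
}
```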
Re: (Score:2)
Sorry, what?
Even the kernel isn't accurate at doing this. On heavily loaded systems I've seen a 20ms wait before a packet gets stamped. Pre-emptive kernels and everything else mean that a packet might sit on the network card or in a buffer before it's collected and stamped by the system. The only way to get accurate timestamps is something like a Napatech or Myricom card using a third-party time source.
Re: (Score:2)
yep, I used the shell "Ping" command (Win7HP). I have no clue as to the inner mechanics.
Re: (Score:2)
According to my switch, a 64-byte frame takes 0.0023 ms (2.3 us) port to port.
According to a research paper, 1 Gb Ethernet over 1 km of fiber is 0.01476 ms (14.76 us) and 10 Gb Ethernet is 0.0056 ms (5.6 us), one way, not RTT.
Desktop to router through a switch: 0.12 ms (120 us), as measured in Windows via hrping.
Akamai CDN in ISP 1.25ms
ISP DHCP server 1.5ms
Chicago 6ms
Slashdot 6ms
Minneapolis 7ms
New York City 30ms
Atlanta 30ms
Miami 40ms
Houston 45ms
San Jose 60ms
San Francisco 65ms
Seattle 70ms
Lo
Re: (Score:2)
Re: Interesting (Score:4, Informative)
Yeah, this kind of thing has been around for a while.
These days the added latency of going through the kernel IP stack is generally measured in microseconds rather than milliseconds, but the relative difference is still about the same. Solarflare, Mellanox and others will happily sell you expensive Ethernet network cards that come bundled with drivers that let you bypass the kernel IP stack. The stack itself isn't especially slow, but the system calls and extra memcpys still all add up. I've also seen an in-house user space stack built largely on top of lwIP [wikipedia.org].
So I'd agree that none of this is particularly new, but I reckon it's still interesting that the BBC is using it. Maybe that'll help spur more widespread adoption.
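Short of buying those cards, stock Linux can at least amortize the syscall half of the overhead: sendmmsg() hands the kernel a whole batch of packets in one call. A rough sketch (destination address, port and payload size are placeholder assumptions):

```c
/* Sketch: amortizing syscall overhead with sendmmsg(), which pushes
 * a whole batch of UDP packets into the kernel in one call.
 * A stock-Linux mitigation, not full kernel bypass. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define BATCH 64

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in dst = {
        .sin_family = AF_INET,
        .sin_port   = htons(5004),                   /* assumed RTP-ish port */
    };
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);  /* example address */

    static char payload[BATCH][1400];                /* one chunk per packet */
    struct iovec iov[BATCH];
    struct mmsghdr msgs[BATCH];
    memset(msgs, 0, sizeof(msgs));

    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = payload[i];
        iov[i].iov_len  = sizeof(payload[i]);
        msgs[i].msg_hdr.msg_iov     = &iov[i];
        msgs[i].msg_hdr.msg_iovlen  = 1;
        msgs[i].msg_hdr.msg_name    = &dst;
        msgs[i].msg_hdr.msg_namelen = sizeof(dst);
    }

    /* 64 packets, one syscall -- instead of 64 sendto() calls. */
    int sent = sendmmsg(fd, msgs, BATCH, 0);
    printf("sent %d packets in one syscall\n", sent);
    return 0;
}
```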
Re: (Score:2)
Re: (Score:2)
Care to explain a little of the overall design?
MTU (Score:2)
If people would just accept a decent MTU, none of this would matter.
The IP maximum is 64 KB, but we're stuck with 1500 bytes (including protocol overhead) because you can't be sure that every hop will support a larger MTU.
Internally you can enable jumbo frames and shit will work, but once you need to go out over the internet all bets are off, so you limit your shit to 1500 and your performance goes to all hell.
We're basically delivering UHD movies via telegram.
Re: (Score:2)
The use case here is moving uncompressed video within a studio environment. There, you have full control over the hardware, and the Internet doesn't come into play. I'd think that in such cases they have no problem going to jumbo frames.
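For what it's worth, enabling jumbo frames on a Linux host is a one-liner (ip link set dev eth0 mtu 9000) or, programmatically, a single ioctl. A sketch, assuming an interface named eth0 and switches that accept 9000-byte frames:

```c
/* Sketch: bump an interface to a 9000-byte jumbo MTU on Linux
 * (equivalent to `ip link set dev eth0 mtu 9000`; needs root, and
 * every switch and NIC on the path must accept the same size). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);   /* any socket works for ioctl */

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);   /* assumed interface name */
    ifr.ifr_mtu = 9000;

    if (ioctl(fd, SIOCSIFMTU, &ifr) < 0)
        perror("SIOCSIFMTU");
    else
        printf("eth0 MTU set to %d\n", ifr.ifr_mtu);

    close(fd);
    return 0;
}
```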
Re: (Score:1)
Jumbo frames play very badly when you have other stuff going over the same link though. Nothing else can start sending until the frame currently on the wire has finished, and with jumbo frames those gaps are much longer. Yes, it improves throughput, but only when the link is doing approximately one thing.
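The gaps are easy to put numbers on. A quick back-of-envelope sketch of serialization time:

```c
/* Back-of-envelope: how long a single frame occupies the wire at a
 * given line rate -- which is exactly how long anything queued
 * behind it has to wait. Ignores preamble and inter-frame gap. */
#include <stdio.h>

int main(void)
{
    const double rates_gbps[] = { 1.0, 10.0 };
    const int frame_bytes[]   = { 1500, 9000, 65536 };

    for (int r = 0; r < 2; r++)
        for (int f = 0; f < 3; f++) {
            /* bits / (Gbit/s) gives ns; divide by 1000 for us */
            double us = frame_bytes[f] * 8.0 / (rates_gbps[r] * 1000.0);
            printf("%5d-byte frame at %4.0f Gbps: %8.2f us on the wire\n",
                   frame_bytes[f], rates_gbps[r], us);
        }
    return 0;
}
```

At 1 Gbps a 9000-byte jumbo frame holds the wire for 72 us versus 12 us for a 1500-byte frame; that difference is exactly the extra wait anything queued behind it sees.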
Re: (Score:2)
If people would just accept a decent MTU none of this would matter.
The max is 64 K but we're stuck with 1500 (including overhead) because you can't be sure that every hop will support your MTU.
Internally you can enable jumbo frames and shit will work, but once you need to go out over the internet all bets are off, so you limit your shit to 1500 and your performance goes to all hell.
We're basically delivering UHD movies via telegram.
Packet size is a tradeoff - for high throughput you want big packets, for low latency you want small packets. So fine, just tailor the packet size to your application - well no, when you're sharing a network, the packet sizes used by other applications have a significant impact.
So let's say you're doing something that requires low latency, such as VoIP. And let's say you've got QoS set up to ensure the small VoIP packets are always inserted in front of any big packets, since that's a sensible thing to do. Even then, a VoIP packet can still get stuck behind a jumbo packet that has already started transmitting: a 64 KB packet on a 2 Mbps link ties up the wire for about 328 ms, which is death for a phone call.
Re: (Score:2)
Re: (Score:2)
Packet size is a tradeoff - for high throughput you want big packets, for low latency you want small packets.
There'd be no such trade-off if routers and computers pipelined packets (cut-through forwarding), starting to forward, or queuing, as soon as the destination IP address has been read and an interface route determined, possibly after also verifying the IP header checksum.
Re: (Score:2)
64 * 1024 * 8 / 2 / 1000 / 1000 = 262 ms worst case, not 328 ms.
And routers should know the capability of the links and can split up the jumbo frames into multiple packets to let VOIP through ahead without wasting much bandwidth at all. Hell, my shitty D-Link does this - every boot it scans the link to determine connection speed and uses that in its QoS engine.
Further, the use case in the article is 8 Gbps in a studio environment. They can dedicate the entire link to video. 8 Gbps down a 2 Mbps pipe is never going to happen no matter what MTU you pick.
Re: (Score:2)
Re: (Score:2)
And for a 10 Mbps connection you drop the max MTU and split the packets. Routers in the middle of a path can do this.
Video Streaming Service A sends a 64 KB packet to ISP B over a 100 Mbps link, ISP B knows Customer C is on the Shit Tier package and can handle 10 Mbps, and decides to split up the 64 KB packet into 4 KB or whatever packets, Customer C gets their shit.
4 KB at 10 Mbps (~3.3 ms on the wire) is less than 64 KB at 100 Mbps (~5.2 ms), so splitting adds no additional jitter. And that's without even inspecting the traffic to see if it's Netflix or Skype or their own VoIP service.
Re: (Score:2)
I found my own way to protest. (Score:2)
https://birds-are-nice.me/musi... [birds-are-nice.me]
I show how the concept of the public domain has been crushed by demonstrating just how little popular music exists in it.
Re: (Score:2)
Damnit, posted to the wrong story! That was supposed to go to the one about TPP.
Re: (Score:3)
I also put the wrong link in.
Re: (Score:2)
I show how the concept of the public domain has been crushed by demonstrating just how little popular music exists in it.
Are you sure it wouldn't suck anyway?
Re: (Score:2)
Musical styles change. Entire genres of music have been invented in the last seventy years, the current copyright term for music here. There's no justification for a duration so long - in what way does it promote the creation of new music? It doesn't.
Baby steps. (Score:3)
How about the BBC stop requiring Flash for videos? That would be a better place to start.
Re: (Score:3)
Okey dokey [bbc.co.uk].
Re: (Score:3)
4k is aiming a bit low anyway. NHK, the Japanese equivalent of the BBC, is going directly to 8k for the 2020 Olympics. Test broadcasts will begin in 2018, about 2.5 years from now. 4k is going to be short lived.
Re: (Score:1)
Meanwhile, 4k is going to be needed (I use the phrase in the loosest possible way) and 2020 is 5 years away. Solving the problems at 4k is a step towards solving them at 8k, with faster processors, faster network infrastructure, better equipment, and better compression and decompression.
I can't wait (Score:3)
to be able to watch Eastenders in Ultra HD...
To the Beeb's credit though, the Sky at Night in UHD would surely be a lot more interesting. But out of the thousands of mediocre shows and movies released year after year, is it worth buying a new TV to marvel at a dozen really good programs? Somehow this doesn't seem like a good value proposition.
Re:I can't wait (Score:5, Informative)
This isn't for you to watch UHD. It's for internal use in production, so they can shunt live UHD video around their studios. That way they keep full quality right up until the final stage before distribution, when it gets resized according to the end device. Your TV will get plain old 1080p as always - but they'll have UHD capability ready to go for transmitting to cinemas or sending to big public displays, and they can archive a UHD version for future use so they can zoom in tighter on the action in future highlights.
Re: (Score:2)
Interestingly, the place to look for UHD content is YouTube (and recently Vimeo, as well). The flexibility of the 'amateur' video producer and that of the internet as a distribution platform really show in this area.
There is some beautiful and awesome stuff out there:
https://vimeo.com/115541651 [vimeo.com]
https://www.youtube.com/watch?... [youtube.com]
https://www.youtube.com/watch?... [youtube.com]
Given that the high-end smartphones are outputting UHD movies now as well, there is going to be an onslaught of UHD content.
UHD is that round antenna, right? (Score:1)
In order to pick up UHD I need to connect that round antenna to the back of the TV, right?
Intel DPDK (Score:1)
Re: (Score:3, Funny)
why not do 720p for everybody first, with no region locking or lockouts? fuck this uber hq shit. dvd quality is, quite frankly, good enough for all but the most anal of viewers.
Shush! If UHD doesn't take, we'll be forever stuck with 1080p computer monitors. Do not be the person who prevented 8 MPixels desktop monitors from becoming mainstream.
Dear Sirs,
I'm the head of the UHD Panel Manufacturers' Association. I'm sorry to say that, having read the grandparent post by an Anonymous Coward, our members have unanimously decided to cancel all further development and manufacturing of ludicrously high-definition panels, and to shut down the association. In fact, we've decided to stop bothering with 1080p panels as well and in future will just be selling 1280 x 720 displays.
The parent was correct in identifying the influence of a single post by an Anonymous Coward.
awesome! (Score:2)
now I can watch reruns of top gear and star trek TNG in UHD
Why aren't they using Intel's DPDK? (Score:2)
Intel's DPDK library is specifically built to bypass the OS and provide high-speed, low-latency networking. Seems like a natural fit.
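For anyone who hasn't seen it, the core of a DPDK application is a busy-poll loop over the NIC. A skeletal sketch (port and queue choices are placeholder assumptions; a real app needs proper EAL setup, error checks and core pinning):

```c
/* Skeleton of a DPDK receive loop. Illustrative only: error checks
 * trimmed, single port, single queue. Build against DPDK and run
 * with EAL args, e.g. ./app -l 0 -n 4 */
#include <stdio.h>
#include <stdint.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define RX_DESCS 1024
#define BURST_SZ   32

int main(int argc, char **argv)
{
    rte_eal_init(argc, argv);                 /* take over the NIC via the EAL */

    struct rte_mempool *pool = rte_pktmbuf_pool_create(
        "mbufs", 8192, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

    uint16_t port = 0;                        /* first DPDK-bound port (assumed) */
    struct rte_eth_conf conf = { 0 };
    rte_eth_dev_configure(port, 1, 1, &conf); /* 1 RX + 1 TX queue */
    rte_eth_rx_queue_setup(port, 0, RX_DESCS, rte_socket_id(), NULL, pool);
    rte_eth_tx_queue_setup(port, 0, RX_DESCS, rte_socket_id(), NULL);
    rte_eth_dev_start(port);

    struct rte_mbuf *pkts[BURST_SZ];
    for (;;) {
        /* Poll the NIC directly: no interrupts, no syscalls, no copies. */
        uint16_t n = rte_eth_rx_burst(port, 0, pkts, BURST_SZ);
        for (uint16_t i = 0; i < n; i++) {
            /* process pkts[i] in place: rte_pktmbuf_mtod() gives the data */
            rte_pktmbuf_free(pkts[i]);
        }
    }
    return 0;
}
```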
Re: (Score:1)