SoHo NAS With Good Network Throughput?
An anonymous reader writes "I work at a small business where we need to move around large datasets regularly (move onto test machine, test, move onto NAS for storage, move back to test machine, lather-rinse-repeat). The network is mostly OS X and Linux with one Windows machine (for compatibility testing). Our datasets are typically multiple GB in size, so network speed is as important as storage size. I'm looking for a preferably off-the-shelf solution that can sustain a significant portion of a GigE link; maxing out at 6 MB/s is useless. I've been looking at SoHo NASes that support RAID, such as Drobo, NetGear (formerly Infrant), and BuffaloTech (who unfortunately don't even list whether they support OS X). They all claim to come with a GigE interface, but what sort of network throughput can they really sustain? Most of the numbers I can find on the vendors' websites only talk about drive throughput, not network throughput, so I'm hoping some of you with real-world experience can shed some light here."
NAS disk architecture (Score:5, Interesting)
If you use a single-disk NAS and you are doing sequential reads through your files and file system, your throughput can't be greater than the read/write speed of a single disk, which is nowhere near GigE (1000 Mbps is about 125 MB/s, ignoring network protocol overhead). So you will need RAID (multiple disks) in your NAS, and you will want striped RAID (RAID 0) for performance. This means you will not have any redundancy, unless you go with the more expensive striped mirrors or mirrored stripes (RAID 1+0/0+1). RAID 5 gives you redundancy and isn't bad for reads, but will not be that great for writes.
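To put rough numbers on that, here's a back-of-envelope sketch. The ~80 MB/s per-spindle figure is an assumption (a typical 7200 rpm SATA sequential rate), so substitute your own drives' specs:

    # Rough back-of-envelope sequential-read ceilings for common RAID layouts.
    # DISK_MBPS is an assumed per-spindle rate, not a measurement.

    DISK_MBPS = 80.0          # assumed sequential read speed of one spindle
    GIGE_MBPS = 125.0         # 1000 Mbit/s / 8, ignoring protocol overhead

    def raid_read_mbps(level: str, disks: int) -> float:
        """Idealized sequential-read ceiling; real arrays fall short of this."""
        if level == "raid0":
            return disks * DISK_MBPS
        if level == "raid1":
            return DISK_MBPS                  # conservative: read one mirror
        if level == "raid5":
            return (disks - 1) * DISK_MBPS    # one disk's worth is parity
        if level == "raid10":
            return (disks // 2) * DISK_MBPS
        raise ValueError(level)

    for level, disks in [("raid0", 2), ("raid5", 4), ("raid10", 4)]:
        est = raid_read_mbps(level, disks)
        print(f"{level} x{disks}: ~{est:.0f} MB/s "
              f"({'can' if est >= GIGE_MBPS else 'cannot'} saturate GigE)")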
As you compare and contrast NAS device performance, be sure you understand the disk architecture in each case and make apples-to-apples comparisons (i.e., how does each device perform with the RAID architecture you actually intend to use? NAS devices that support RAID typically offer several). Also be sure that the numbers you see are based on the kind of disk activity you will generate. It doesn't do much good to get a solution that is great at small random reads (thanks to heavy use of cache and read-ahead) but runs out of steam when faced with steady sequential reads through the entire file system, where the cache is drained and read-ahead can't stay ahead.
Once you get past the NAS device's disk architecture, consider the file sharing protocol. Supposedly (I have no authoritative test results) CIFS/SMB (Windows file sharing) carries a 10% to 15% performance penalty compared to NFS (Unix file sharing). I have no idea how Apple's native file sharing protocol (AFP) compares, but (I think) OS X can speak all three, so you have some freedom to select the best one for the devices you are using. Of course, since there are multiple implementations of each file sharing protocol and of the underlying TCP stacks, you can't draw hard and fast conclusions about which is better without testing. One vendor's NFS may suck, and another vendor's good CIFS/SMB may beat its pants off, even if the NFS protocol is theoretically faster than the CIFS/SMB protocol.
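One crude way to run that test yourself: mount the same NAS export via each protocol and time an identical large copy to each mount. A minimal sketch, assuming the hypothetical mount points below and a multi-GB test file you already have:

    # Crude protocol shoot-out: time the same large copy over each protocol.
    # MOUNTS are hypothetical paths; mount the NAS via NFS, SMB, and AFP first.
    # Beware client-side caching: use a file larger than RAM for honest numbers.

    import os
    import shutil
    import time

    TEST_FILE = "testdata.bin"                                  # your own multi-GB file
    MOUNTS = ["/mnt/nas-nfs", "/mnt/nas-smb", "/mnt/nas-afp"]   # hypothetical

    size_mb = os.path.getsize(TEST_FILE) / 2**20
    for mnt in MOUNTS:
        dest = os.path.join(mnt, "throughput_test.bin")
        start = time.time()
        shutil.copyfile(TEST_FILE, dest)
        elapsed = time.time() - start
        print(f"{mnt}: {size_mb / elapsed:.1f} MB/s write")
        os.remove(dest)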
Whichever file sharing protocol you choose, it's very possible it will default to running over TCP rather than UDP. If so, pay attention to how you tune your file sharing protocol's READ/WRITE transaction sizes (if you can), and how you tune your TCP stack (window sizes), to get the best performance possible. If you use an implementation over UDP, you still have to pay attention to how you set your READ/WRITE buffer sizes, and to how your system deals with IP fragmentation when the UDP PDU size exceeds what fits in a single IP packet due to the READ/WRITE sizes you set.
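At the per-connection level, the TCP window tuning mentioned above boils down to socket buffer sizes. A minimal sketch, assuming a 4 MiB buffer target (tune for your bandwidth-delay product; modern kernels may autotune or clamp these anyway):

    # Minimal sketch of bumping TCP buffer (window) sizes on one socket.
    # BUF_BYTES is an assumption; whether this helps depends on OS autotuning
    # and the NAS's own TCP stack.

    import socket

    BUF_BYTES = 4 * 2**20    # 4 MiB, assumed; size to your RTT x rate

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_BYTES)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_BYTES)

    # The kernel may clamp or double the requested values; check what you got.
    print("rcvbuf:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    print("sndbuf:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))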
Finally, make sure your network infrastructure can support the data transfer rates you envision. Not all gigabit switches have full wire-speed, non-blocking performance on all ports simultaneously, and the ones that do are very expensive. For example, four machines transferring at full GigE simultaneously need a backplane that can move roughly 8 Gbps (4 ports x 1 Gbps, full duplex). You don't necessarily need a fully non-blocking backplane for your scenario, but make sure whatever switch you use has enough backplane capacity to handle your file transfers plus any other traffic going through it at the same time.
Understand your performance requirements (Score:4, Interesting)
How many gigabytes are "multiple" gigabytes? Seriously, moving around 5 GB is much easier than 50 GB, and enormously easier than 500 GB: at a sustained 125 MB/s (full GigE), those transfers take roughly 40 seconds, 7 minutes, and 70 minutes respectively.
Another thing to consider: how many consumers are there? A "consumer" is any process that requests the data. If this post is a disguised version of "how do I serve all my DVD rips to all the computers in my house" then you probably won't ever have too many consumers to worry about. On the other hand, I work for an algorithmic trading company; we store enormous data sets (real-time market data) that range anywhere from a few hundred MB to upwards of 20 GB per day. The problem is that the traders are constantly doing analysis, so they may kick off hundreds of programs that each read several files at a time (in parallel via threads).
From what I've gathered, when such a high volume of data is requested from a network store, the problem isn't the network, it's the disks themselves. I.e., with a single sequential transfer, it's quite easy to max out your network connection: disk I/O will almost always be faster. But with multiple concurrent reads, the disks can't keep up. And note that this problem is compounded when using something like RAID5 or RAID6, because not only does your data have to be read, but the parity info as well.
So the goal is actually to get many smaller disks, as opposed to fewer huge ones. The idea is to get as many spindles as possible; see the sketch below.
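Here's a crude model of why spindle count matters under concurrency: each extra parallel stream turns sequential access into seeking, and a seek-bound disk delivers a fraction of its sequential rate. Every number here is an assumption for illustration, not a measurement:

    # Crude illustration: aggregate throughput vs. spindle count when many
    # readers hit the array at once. SEQ_MBPS and SEEK_PENALTY are assumptions.

    SEQ_MBPS = 80.0       # assumed sequential rate of one spindle
    SEEK_PENALTY = 0.25   # assumed fraction of that rate under heavy seeking

    def aggregate_mbps(readers: int, spindles: int) -> float:
        streams_per_disk = max(readers / spindles, 1.0)
        # One stream per disk stays sequential; more than one becomes seek-bound.
        rate = SEQ_MBPS if streams_per_disk <= 1 else SEQ_MBPS * SEEK_PENALTY
        return rate * spindles

    for spindles in (2, 4, 8, 16):
        print(f"{spindles:2d} spindles, 16 readers: "
              f"~{aggregate_mbps(16, spindles):.0f} MB/s aggregate")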
If, however, your needs are more modest (e.g. serving DVD rips to your household), then it's pretty easy (and IMO fun) to build your own NAS from commodity parts.
You might also want to peruse the Ars Technica Forums [arstechnica.com]. I've seen a number of informative NAS-related threads there.
One more note: lots of people jump immediately to high-performance, high-cost RAID controllers. I personally prefer Linux software RAID. I've had no problems with the software itself; my only problem is getting enough SATA ports. It's hard to find a non-server-grade (i.e. cheap commodity) motherboard with more than six or eight SATA ports. It's even harder to find non-PCI SATA add-on cards. You don't want SATA on your PCI bus; maybe one disk is fine, but that bus is simply too slow for multiple modern SATA drives. It's not too hard to find two-port PCI Express SATA cards, but if you want to run a lot of disks, two ports per card isn't useful. I've only seen a couple [newegg.com] of four-port non-RAID PCIe SATA cards [newegg.com]. There's one eight-port gem [newegg.com], but it requires PCI-X, which, again, is hard to find on non-server-grade boards.
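For reference, setting up Linux software RAID is a couple of mdadm commands. A sketch, driven from Python purely for illustration; the device names and mount point are hypothetical, and the commands are destructive, so treat this as an outline rather than a ready-to-run script:

    # Sketch: build a 4-disk software RAID5 array with mdadm, then format
    # and mount it. DEVICES and the mount point are hypothetical examples.

    import subprocess

    DEVICES = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]  # hypothetical

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    run(["mdadm", "--create", "/dev/md0", "--level=5",
         f"--raid-devices={len(DEVICES)}", *DEVICES])
    run(["mkfs.ext4", "/dev/md0"])            # or xfs, often nicer for big files
    run(["mount", "/dev/md0", "/srv/nas"])    # hypothetical mount point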
This is misleading. (Score:4, Interesting)
However, as you say, benchmarking is the only way to really tell. Highly recommended.
Re:You could roll your own. (Score:3, Interesting)
The OP stated they have a business need for moving gigabytes of data quickly around the office. The extra money spent on a real server's power consumption could save them thousands of dollars a day worth of their time.
Even the cost of the power is pretty minimal for this. Figure 500 watts for 24 hours is 12 kWh; at worst you pay $0.20/kWh, which is $2.40/day, assuming 24-hour usage. My Linux PC NAS in the basement saturates GigE and draws under 100 watts active, or about $0.40/day at California pricing.
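The same back-of-envelope math, parameterized so you can plug in your own wattage and utility rate (the $0.20 and $0.17 per kWh figures are the worst-case and rough California rates from above):

    # Daily electricity cost: watts -> kWh -> dollars.

    def daily_cost(watts: float, usd_per_kwh: float, hours: float = 24) -> float:
        return watts / 1000 * hours * usd_per_kwh

    print(f"500 W server  @ $0.20/kWh: ${daily_cost(500, 0.20):.2f}/day")
    print(f"100 W Linux NAS @ $0.17/kWh: ${daily_cost(100, 0.17):.2f}/day")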
I think those little mini NAS boxes suck for all but the simplest home applications, and they're penny-wise/pound-foolish for data-intensive business applications.
Re:Go to SmallNetBuilder.com (Score:1, Interesting)
My onboard Realtek PCIe Gigabit Ethernet chip can consistently deliver 999 Mbit/s in benchmarks (using iperf or similar tools).
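If you don't have iperf handy, a bare-bones stand-in is easy to improvise: one end sinks bytes and times them, the other blasts a fixed amount of data. A sketch (port number and transfer size are arbitrary choices):

    # Bare-bones iperf-style test: run "python bench.py server" on one host,
    # "python bench.py client <host>" on the other. Requires Python 3.8+.

    import socket
    import sys
    import time

    PORT, CHUNK, TOTAL = 5201, 64 * 1024, 2**30   # 1 GiB test transfer

    def server():
        srv = socket.create_server(("", PORT))
        conn, _ = srv.accept()
        received, start = 0, time.time()
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            received += len(data)
        secs = time.time() - start
        print(f"{received * 8 / secs / 1e6:.0f} Mbit/s")

    def client(host):
        sock = socket.create_connection((host, PORT))
        buf = b"\0" * CHUNK
        sent = 0
        while sent < TOTAL:
            sock.sendall(buf)
            sent += len(buf)
        sock.close()

    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[2])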
Re:SMB (Score:3, Interesting)
I concur with this. Anything labeled "GigE" only means it offers an interface compliant with the specification, not that it can actually pass 1000 Mb/s.
A few days ago, I went digging for some information on switches. I'm a big Cisco fan, and I have specs on everything that I use. I know which of my switches can handle more traffic than others. That's kind of important.
Someone (who shall remain nameless) bought a GigE "switch". A name-brand, but consumer-grade, switch. He wanted GigE because he had large files to transfer between several machines simultaneously.
"switch" by their definition in the user manual simply means hub, except it can amplify the signal. No actual switching involved, other than the fact that it can "switch" between 10Mb/s, 100Mb/s and 1000Mb/s. {sigh}
And the pps rates were pathetic. Really pathetic. I broke out my spreadsheet of Cisco specs and had to scroll down to the slowest, oldest switches I can still get my hands on: a base-model Cisco Catalyst 2924 (not the enterprise firmware). The 2924 handles three times the pps of this spiffy new "GigE switch". {sigh}
I only looked into it because of other network problems: cascaded consumer-grade switches in what should be a high-speed operating environment. Nothing even came close to the old Cisco 2924. I'm not advocating running a new enterprise on old 2924s, and there are much faster switches lying around waiting for a home, but wouldn't it be prudent to use something better?
So the moral of my story.... Figure out what you're really dealing with, and don't look only at the label.
I was having a discussion with someone who does SAN work. He was all happy about his piece of equipment. I found out the specs of the components, then priced out better PC-based hardware running Linux. His gear did run Linux, but on a custom board. It was easy to outperform anything he had with better hardware and even better drives. If I recall correctly, he vetoed the idea of switching because he had once tried a Windows-based SAN and it wouldn't work. Tried once. With some third-party crap. It didn't work. {sigh}
I'm slowly prepping a friend's place to have a Linux machine as the SAN. Decent parts, standard protocols (SMB, NFS, and iSCSI). The only "slow" part is that there is no rush right now, so when I see a component that will do the job well, we buy it. Once we have all the parts, it'll be a running machine.
Re:To this whole chain of comments, I would like (Score:4, Interesting)
The only shops that actually look at cost/GB as a measuring stick are small shops, or shops with very specific needs.
Large corporations, government and high tech companies are usually more concerned with management costs, retention, migration and so forth.
This is simply not true. There are plenty of commodity storage requirements that do not require Fibre Channel or even NetApp level NAS. On the other end of the spectrum, cost/GB might not be a huge factor, but the cost of getting necessary IOPS is certainly a factor.
I work on Wall St. and we have multiple PB of storage. We have tons of EMC. However, products like the Sun X4500 and similar offerings from HP are changing the game. Couple that with being able to do 48 ports of line-rate 10GigE in a 1 RMU stackable, per-priority pause coming into use, and Data Center Ethernet down the road, and you have many reasons to seriously reconsider the scope of your Fibre Channel deployment.
If supported by NAS.. NetBEUI.. (Score:3, Interesting)
If the NAS supports the non-routable NetBEUI protocol:
Install the optional NetBEUI protocol stack located on the XP install disc (the same add-on also works on Vista).
Don't forget to disable (uncheck) the QoS Packet Scheduler; it will limit you to 20-25% of max link speed.
Lastly, you must also disable NetBIOS over TCP/IP; if it connects first, you won't see any performance boost. (The option is located in the TCP/IP Advanced/WINS dialog.)
The older, non-routable NetBEUI protocol stack in the NT/W2K days was roughly 10x more CPU-efficient per byte than NetBIOS over TCP/IP.
In XP/Vista environments it's still 5x more CPU-efficient than NetBIOS over TCP/IP.