Music The Internet Technology

How 136 People Became 7 Million Illegal File-Sharers

Barence writes "The British government's official figures on the level of illegal file sharing in the UK come from questionable research commissioned by the music industry. The Radio 4 show named More or Less examined the government's claim that 7m people in Britain are engaged in illegal file sharing. The 7m figure actually came from a report written about music industry losses for Forrester subsidiary Jupiter Research. The report was privately commissioned by none other than the UK's music trade body, the BPI. The 7m figure had been rounded up from an actual figure of 6.7m, gleaned from a 2008 survey of 1,176 net-connected households, 11.6% of which admitted to having used file-sharing software — in other words, only 136 people. That 11.6% was adjusted upwards to 16.3% 'to reflect the assumption that fewer people admit to file sharing than actually do it.' The 6.7m figure was then calculated based on an estimated number of internet users that disagreed with the government's own estimate. The wholly unsubstantiated 7m figure was then released as an official statistic."
  • by girlintraining ( 1395911 ) on Friday September 04, 2009 @07:19PM (#29318757)

    They think that a single copy of a song is worth over a hundred thousand dollars too. They claim to lose more in revenue each month than the GDP of most countries. All because of those dyyyeaaarrrn pirates. Enron looks positively boring in comparison to the accounting techniques the recording industry uses. None of this is news. About the only people that buy this crap are judges and legislators -- the rest of us are almost universally of the mindset that a bag of potato chips has more value than most of the recording industry's portfolio.

  • by Trepidity ( 597 ) <[gro.hsikcah] [ta] [todhsals-muiriled]> on Friday September 04, 2009 @07:21PM (#29318781)

    Some of the estimation steps might be sketchy, but the basic practice of estimating a population proportion from a sample of that population is not particularly questionable. That's how almost all studies of populations work, because taking censuses of all people in a country is rarely feasible. We have century-old statistical theory on how to put bounds on the sampling error, too, assuming the sample was indeed random.

    You could have a whole slew of these stories if you really objected to that basic methodology, e.g. nearly every estimate of N million people suffering from a disease or disorder is based on a sample.

  • by Anonymous Coward on Friday September 04, 2009 @07:23PM (#29318797)

    Drawing inferences to the broader population from a sample of about a thousand is a totally accepted and scientific practice. There are innumerable ways you can screw it up (most having to do with the sampling procedure being biased) but in principle there's nothing wrong with saying that 12% of a sample of 1176 implies about 12% of a population. The upwards adjustment to 16% is plausible given that social desirability bias is a known problem with surveys but a bit more of a judgement call and the numbers adjusting for this should never be presented out of context of some pretty serious hedges.

  • Re:Story meaning? (Score:4, Informative)

    by wizardforce ( 1005805 ) on Friday September 04, 2009 @07:27PM (#29318829) Journal

    So could someone please explain *why* is it a questionable research.

    1. the sample size is small.. probably too small to make the claims they did. 2. they altered the numbers on an estimate of how many people fileshare on the assumption that the number was under-reported. 3. conflict of interest... it's like the tobacco industry sponsoring studies claiming that smoking doesn't have anything to do with lung cancer... there is significant reason to believe that the study carries significant bias in favor of their conclusion and must at the least be repeated by other sources.

    So what is the point of this story? That statistics researchers use only a minor subset of people to do their research instead of asking everyone? They always have.

    No. real statistics researchers know that this study has numerous crippling flaws and should not be held as gospel by anyone. Even a first year stats student can see it. The reason this story is important is that it may influence governmental policy and it's flawed... That's dangerous.

  • Re:Story meaning? (Score:2, Informative)

    by Loomismeister ( 1589505 ) on Friday September 04, 2009 @07:31PM (#29318863)
    The point isn't that they surveyed a small group of people and therefore the statistics aren't significant. If you RTFA you would see that they based the 7m number on the false statistic that some 40m people were using the internet that year when there was really only like 33.9m. They also bumped up the percentage of filesharing people based on the assumption that some people lied about whether they had programs like that or not. Really the lesson here is to read the featured articles because the slashdot summaries as a general rule are misleading.
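
The arithmetic behind the figures quoted in the summary and in the comment above can be checked in a few lines. This is only a sketch: the 1,176, 11.6%, 16.3% and 6.7m values come from the summary, the 33.9m internet-user figure is the one quoted in the comment, and the variable and function names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the thread.
SAMPLE_SIZE = 1176            # surveyed households (from the summary)
ADMITTED_RATE = 0.116         # share admitting to using file-sharing software
ADJUSTED_RATE = 0.163         # share after the upward adjustment
HEADLINE_ESTIMATE = 6.7e6     # figure before it was rounded up to 7m
QUOTED_USERS = 33.9e6         # internet users, as quoted in the comment (assumption)


def implied_population(total: float, rate: float) -> float:
    """Population size that a total implies when it is rate * population."""
    return total / rate


def estimate(rate: float, population: float) -> float:
    """Number of people implied by applying a rate to a population."""
    return rate * population


respondents = round(SAMPLE_SIZE * ADMITTED_RATE)
implied_users = implied_population(HEADLINE_ESTIMATE, ADJUSTED_RATE)
adjusted_estimate = estimate(ADJUSTED_RATE, QUOTED_USERS)
unadjusted_estimate = estimate(ADMITTED_RATE, QUOTED_USERS)

print(f"respondents who admitted it: {respondents}")
print(f"internet users implied by 6.7m: {implied_users / 1e6:.1f}m")
print(f"16.3% of 33.9m: {adjusted_estimate / 1e6:.1f}m")
print(f"11.6% of 33.9m: {unadjusted_estimate / 1e6:.1f}m")
```

Under these inputs the 6.7m headline implies roughly 41m internet users, which is in line with the "some 40m" figure the comment objects to.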
  • Re:Story meaning? (Score:5, Informative)

    by Trepidity ( 597 ) <[gro.hsikcah] [ta] [todhsals-muiriled]> on Friday September 04, 2009 @07:37PM (#29318923)

    It doesn't really make sense to claim "sample size is small" for an 1,100-person sample. If the sampling was done in a random, unbiased manner, that size sample gives a margin of error of +/- 3%. If there are flaws in the sampling method, that's another thing, but the sample size alone doesn't seem problematic, unless you need accuracy better than +/- 3%.

  • Re:Story meaning? (Score:4, Informative)

    by Trepidity ( 597 ) <[gro.hsikcah] [ta] [todhsals-muiriled]> on Friday September 04, 2009 @07:58PM (#29319143)

    Basically, except that the confidence level for the interval is 95%, not 50%. Should've quoted that, but 95% is the usual assumed one.

  • Re:Story meaning? (Score:4, Informative)

    by Anonymous Coward on Friday September 04, 2009 @08:17PM (#29319303)

    A margin of error of +/- 3% is the maximum margin of error for a random sample of 1100 drawn from a large enough population at the 95% confidence level (actually it's really +/-2.95%), i.e. this is the margin of error when the observed % is 50%. The margin of error is less when the observed % approaches 0 or 100%.

    In the case of an observed % of 11.6 the margin of error is +/-1.9%, so it is 95% likely that the population figure is between 9.7% and 13.5%.
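
The two margins quoted above follow from the usual normal-approximation formula for a proportion, z * sqrt(p(1-p)/n). A minimal sketch, assuming a simple random sample of 1,100 and the conventional z = 1.96 for 95% confidence (the function name is illustrative):

```python
import math

Z_95 = 1.96  # standard normal critical value for a two-sided 95% interval


def margin_of_error(p: float, n: int, z: float = Z_95) -> float:
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)


# Worst case: the margin is largest when the observed proportion is 50%.
worst_case = margin_of_error(0.5, 1100)

# Margin at the proportion actually observed in the survey.
observed = margin_of_error(0.116, 1100)

print(f"worst case: +/-{worst_case * 100:.2f}%")
print(f"at 11.6%:   +/-{observed * 100:.2f}%")
```

The approximation is reasonable here because both n*p and n*(1-p) are large; for very rare outcomes an exact or Wilson interval would be a safer choice.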

  • Re:Story meaning? (Score:5, Informative)

    by Atario ( 673917 ) on Friday September 04, 2009 @08:29PM (#29319397) Homepage

    it's A SMALL SAMPLE

    No, it's not.

    http://www.raosoft.com/samplesize.html [raosoft.com]

    About 60 million people in the UK, sample size of 1,176, confidence level of 96% gives a margin of error of 2.99%. So, it's 96% likely that they got within 2.99% of the right answer (to the question of how many people admit to it).

    I hate seeing this "that's too small a sample size" objection to every single study, from people who clearly don't know enough about how sample sizes work.
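
The 2.99% figure can be reproduced without an online calculator. The sketch below uses the normal approximation with a finite population correction; it is not necessarily the formula the linked calculator uses, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def z_for_confidence(level: float) -> float:
    """Two-sided critical value of the standard normal for a confidence level."""
    return NormalDist().inv_cdf(0.5 + level / 2.0)


def margin_of_error(n: int, population: float, level: float, p: float = 0.5) -> float:
    """Worst-case (p = 0.5 by default) margin with finite population correction."""
    z = z_for_confidence(level)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * math.sqrt(p * (1.0 - p) / n) * fpc


uk = margin_of_error(1176, 60e6, 0.96)
ten_times_bigger = margin_of_error(1176, 600e6, 0.96)

print(f"population 60m:  +/-{uk * 100:.2f}%")
print(f"population 600m: +/-{ten_times_bigger * 100:.2f}%")
```

Multiplying the population by ten barely moves the result, which is the point of the comment: once the population is much larger than the sample, the margin depends on the sample size, not on the fraction of the population surveyed.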

  • Re:Story meaning? (Score:2, Informative)

    by Runaway1956 ( 1322357 ) on Friday September 04, 2009 @08:44PM (#29319527) Homepage Journal

    1 The 7m figure had actually been rounded up from an actual figure of 6.7m
    2 It gets worse. That 11.6% of respondents who admitted to file sharing was adjusted upwards to 16.3% "to reflect the assumption"
    3 The 6.7m figure was then calculated based on the estimated number of people with internet access in the UK.

    TFA is pretty clearly challenging those figures based on assumptions made, faulty estimates, and rounding up. The original "research" was clearly engineered to give a high number.

    Is there anything else I can help you with?

  • Re:Why the BBC rocks (Score:3, Informative)

    by FourthAge ( 1377519 ) on Friday September 04, 2009 @08:47PM (#29319549) Journal

    You can also avoid paying the licence fee if your TV can't receive over-the-air pictures, e.g. if it is disconnected from the aerial.

    There was once a "radio licence", you can still see a reference to it in one episode of Monty Python, but this was phased out when almost nobody owned a radio but not a TV.

    In the future, I expect the TV licence will be extended to include Internet connections as well, since those can now be used to receive BBC programmes too. At that point, we will see if the BBC can continue to convince people that it is worth the money.

  • by DarkOx ( 621550 ) on Friday September 04, 2009 @08:57PM (#29319647) Journal

    No, but you need some basis if you are going to make such an adjustment. There are ways to determine the rate of sampling error, for instance, and then use that. In this case that might be too much effort or get you into legally murky waters, so an honest researcher would write something like this:

    In my sample of XXXX, YY responded that they sometimes used p2p software in an illegal fashion. Based on this the number of extra legal file sharers in the total population would be ZZZZZZ. I would not expect a person who does not use p2p in an illegal way to respond to my survey in the affirmative while it is easier to imagine someone who does would respond in the negative; therefore the number may actually be greater than ZZZZZZ.

    ---
    Doing so would present the numbers as clearly as they can actually be known and state the assumptions and bias in a concise way.

  • Re:Story meaning? (Score:4, Informative)

    by wizardforce ( 1005805 ) on Friday September 04, 2009 @09:23PM (#29319815) Journal

    I just don't understand the stance that most people on this board seem to take regarding this issue. How can everyone be so supportive of what very obviously amounts to theft?

    not everyone does obviously... most reasonable slashdotters advocate for reformed copyright partly because of the unenforceable nature of longer copyright terms. many such as myself support the concept of a shorter more reasonable copyright term that does what the constitution requires: encourage the advancement of the arts.

    If you do indeed use all file-sharing applications for 100% legit purposes, please educate me what you use these services for that makes them so very essential to cause these very emotional posts here.

    most of the anger is directed toward the music/movie industry's response to piracy- weaken/destroy fair use, demonize all p2p [possibly restricting its use in the future out of fear] suing people as a scare tactic, excessive/un-constitutional fines, DRMed media etc...

  • Scoundrel Statistics (Score:5, Informative)

    by anyaristow ( 1448609 ) on Friday September 04, 2009 @09:37PM (#29319913)

    Even a first year stats student can see it.

    This is almost as cliche in arguments of statistics as the car analogy is on slashdot, and it's the sign of a scoundrel. If you actually had a first year stat student's understanding of stats you'd know where the weaknesses actually are, and where all the rest of the smoke blown in this discussion goes laughably wrong.

    So let's apply some first year stats to the issue.

    First, the sample size. Whether it is numerically large enough to be useful is a matter not only of its size but also the number of positive results. IOW, a sample size of 1176 is too small if you found 3 of what you're looking for, but if you found 136 (11.6% of 1176), you have plenty of samples. The question is then only whether you had a representative sample.

    My next concern would be precision. Using data with three or four significant digits (136, 1176) to make conclusions to seven significant digits (11.56463%) is silly, but that doesn't seem to have happened here. The only number in all of this that is fishy is the 16.3% number. To get three significant digits they'd have to know the number of lying households to that precision. If they had another study that determined this number they might very well have a number to that precision, but I'm assuming they just guessed.

    That's still not a problem. If you guess, you run your confidence interval through your formulae (here it's a simple product) to put a range on your results. If it's a from-your-ass guess you might put a 100% failure estimate on your low end (i.e. there might be no lying households at all) to arrive at a conservative range. Here, it looks like they used an estimate of 40%. They should have (and might have; I didn't RTFA) run the un-adjusted 11.6% through the formulae to get a conservative low-end range.

    Anyway, the number they finally used was 7m. One significant digit. That doesn't imply the same precision as, say, 6.7m would. In fact, if their figure for the number of lying households really was accurate to one digit (i.e. 35-45%) then rounding their final result to one digit was the correct procedure. If it was just a guess they should have run the absolute low estimate (probably, zero lying households) through to get a range.

    So, with actual first year stat knowledge it's possible to actually state what might be wrong with the study, and not resort to "any first year stat student" hand-waving. It's clear that the most-cited criticism (the sample size) is the result of ignorance and group think, not actual knowledge of statistics.
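
The range the comment asks for can be sketched by pushing a low and a high assumption through the same product. Everything here is illustrative: the low end assumes no lying households and the lower 95% sampling bound, the high end assumes the study's adjustment and the upper bound, and the 33.9m internet-user figure is the one quoted elsewhere in the thread, not a verified number.

```python
import math

N = 1176                 # sample size (from the summary)
OBSERVED = 0.116         # unadjusted rate
ADJUSTED = 0.163         # rate after the study's upward adjustment
USERS = 33.9e6           # internet users as quoted in the thread (assumption)
Z_95 = 1.96              # critical value for a 95% interval

# Sampling margin on the observed rate (normal approximation).
margin = Z_95 * math.sqrt(OBSERVED * (1.0 - OBSERVED) / N)

# The adjustment expressed as a multiplier on the observed rate (about 1.4).
adjustment_factor = ADJUSTED / OBSERVED

# Conservative low end: lower sampling bound, nobody lied.
low_estimate = (OBSERVED - margin) * USERS

# High end: upper sampling bound, scaled by the study's adjustment.
high_estimate = (OBSERVED + margin) * adjustment_factor * USERS

print(f"adjustment factor: {adjustment_factor:.2f}")
print(f"range: {low_estimate / 1e6:.1f}m to {high_estimate / 1e6:.1f}m")
```

Reporting the pair rather than a single rounded number makes the size of the guess visible, which is what the comment argues the study should have done.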

  • Re:Why the BBC rocks (Score:4, Informative)

    by _Shad0w_ ( 127912 ) on Friday September 04, 2009 @09:55PM (#29320011)

    You only need one license, you can have as many tellies as you like. Portable tellies used in caravans and the like will be covered by the license for your home as well.

    If you have two houses, you will need two licenses though, afaicr - which is why students away at Uni need to buy a license - including if they're in halls - even though their permanent residence might still be their parent's house.

    I find the BBC great value and love it dearly. I suspect people will say that's because I'm white, middle class and liberal or something.

  • Re:Story meaning? (Score:3, Informative)

    by Bigjeff5 ( 1143585 ) on Friday September 04, 2009 @10:38PM (#29320215)

    The second objection, and this applies to other studies too that try to make grand claims from small samples, is that it's A SMALL SAMPLE. For your survey to be representative, your sample has to be representative. It's also difficult to choose people independently at random, and without that assumption, all your basic statistics fall apart. Perhaps they went through a list of BT subscribers and pulled names at random -- but what if downloaders are overrepresented amongst BT subscribers?

    You don't seem to understand the way good polling and statistics work. If you already have solid data on the demographic makeup of your population, it does not take a very large sample size at all to get accurate results. A sample size of 1000+ is more than enough to come within 3% accuracy (plus or minus) for any given study provided you already have good demographic information. To be accurate with a small sample size, you do NOT want to choose your survey takers at random, at least not completely. Specifically who takes the survey is random, but where they come from, what income level they fall under, how many computers they own, etc. should not be random at all. That's how you make a small sample size representative of the population, and can therefore get accurate results.

    For example, if a census (which has a near 100% sample rate) 5 years ago told you that 75% of the population owns a computer, and 75% of computer owners use the internet, and 50% of internet users have broadband, you can get very accurate results with a sample size a fraction of a percent of the size of the total population by simply making certain that your smaller sample breakdown matches the larger survey. 100% of people surveyed should own a computer, since the survey would need to be 30% larger to include those who don't have a computer and still get the same accuracy (accuracy would be slightly better, but almost certainly not worth the expense). 75% of those people should have internet (you could start here instead with still very high accuracy), and 50% of those people should have broadband.

    A result of 10% of people share files from the study that followed demographics and only used 1,000 people is going to be exponentially more accurate than a survey of 10,000 people chosen completely at random. To get any kind of accuracy with a pure random sampling you would need to sample a very large percentage of the total population. This is impractical and idiotic and not very useful.

    Statistics done well are reliable, it's who's using the statistics, what they are saying about them, and what they aren't telling you about them that make statistics untrustworthy.

    It's not the statistician who is the liar, it's the lawyer, or marketer, or politician who is the liar. It's their fault that 60% of statistics can be made to say whatever the hell you want them to say. That said, I don't trust any numbers given by the MPAA, especially when they arbitrarily adjust them up. More than likely the number should have been adjusted up, but the 5% figure seems rather pulled from thin air and unjustified. 2% or 3% would be more conservative; boosting the number of filesharers by 50% just 'cause screams of desperation.
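
One standard way to use known demographics is post-stratification weighting, which is related to, but not the same as, the quota matching described in the comment above, and is not something the study is known to have done. In this sketch each stratum's survey result is weighted by that stratum's share of the population. The strata, shares and counts are entirely hypothetical.

```python
# Hypothetical strata: population share (e.g. from a census) and survey counts.
STRATA = {
    "broadband": {"pop_share": 0.50, "n": 700, "yes": 105},
    "dial-up": {"pop_share": 0.50, "n": 300, "yes": 15},
}


def raw_rate(strata: dict) -> float:
    """Unweighted rate: total positives over total respondents."""
    total_yes = sum(s["yes"] for s in strata.values())
    total_n = sum(s["n"] for s in strata.values())
    return total_yes / total_n


def poststratified_rate(strata: dict) -> float:
    """Each stratum's rate weighted by its share of the population."""
    return sum(s["pop_share"] * s["yes"] / s["n"] for s in strata.values())


raw = raw_rate(STRATA)
weighted = poststratified_rate(STRATA)

print(f"raw rate:      {raw:.3f}")
print(f"weighted rate: {weighted:.3f}")
```

In this made-up data broadband users are over-represented in the sample (70% of respondents against 50% of the population), so the raw rate overstates the population rate and the weighted figure corrects for it.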

  • by Bigjeff5 ( 1143585 ) on Friday September 04, 2009 @11:07PM (#29320355)

    Survey sizes of around 1000 are pretty standard. If you run the survey and get 3 positives out of 1000, you say "Oh shit, sample size is too small", then run the same survey with 5,000 or 10,000 people to catch a larger number of the people you are targeting - i.e. we're looking to see what percentage of people practice illegal file sharing, we need to find at least a decent number of illegal file sharers so we know our survey is accurate.

    It's not a matter of knowing what you'll get beforehand or rigging the study, you have to have a jumping off point somewhere, else you'll never do the study. If you get untrustworthy results, you simply adjust your sample size and conduct the survey again.

    A good study in this area would first investigate the characteristics of a large population WRT standardizing their likelihood of file-sharing.

    That is completely unnecessary if all you want to know is what percentage of people practice illegal file sharing. And I'm not sure what you mean by "standardizing" their likelihood of file sharing. Huh?

    This step in itself would involve many thousands of people in many different social, economic, and geographic strata.

    The government does this thing called a Census every few years, that collects just such information from close to 100% of the population, making it extremely reliable.

    (You might want to steer clear of race or national origin. It's likely significant, but too touchy.)

    Why? If it can affect your outcome, it should be in your demographics, otherwise your study is unreliable. Why the hell is that information "touchy"? Does a white guy not realize he's white? Or did someone forget to tell the Irishwoman she's from Ireland? What the hell?

  • Re:Story meaning? (Score:3, Informative)

    by Bigjeff5 ( 1143585 ) on Friday September 04, 2009 @11:58PM (#29320639)

    No, you missed the point of that post.

    The point was that the sample size has almost no bearing on the accuracy of the survey provided it is truly representative of the overall population.

    If you can get a sample size of 10 that is representative of a population of 60,000,000 people, you'll have a pretty accurate survey. The reality is, that's not possible in most cases. You'll generally have more than 10 demographics of varying percentages of the total population, making 10 simply too small. 1000, however, is not too small unless you are looking for very, very small percentages of the population. I.e., if you are expecting results of less than 2%, a sample size of 1000 is too small because the margin of error is around 3% - you could easily run the survey and get no positive hits at all. For that survey, you'd probably need to bump it up to around 10,000 to drop the margin of error low enough to get reliable results.

    Since the results they got were 11.6%, and the margin of error was about 3%, you can very reliably say between 8.6% and 14.6% of people use file sharing software.

    I don't like that they added 4.7% to their figure without anything to back that up, especially since that is nearly 50% of their results. They basically said 30% of file sharers lie about being file sharers, without any data to back that up. They also used 40 million as their figure for people on the internet, when the government survey states something like 33.5 million.

    The numbers they should have used were 2.9 - 4.9 million people use file sharing software. That is accurate and can be backed up by statistics. It is probably more like 6 million due to people lying about using file sharing software, but that's still just a number I pulled out of my ass, and not statistically accurate.

  • by anyaristow ( 1448609 ) on Saturday September 05, 2009 @01:09AM (#29320981)

    Was 1,176 a sample large enough to represent the 40,000,000? I would assume not. You could assume so. The fact would still be that we would both be assuming.

    Assume nothing. Google is your friend.

    Google: sample size [google.com]

    First result has all you need.

  • Re:Story meaning? (Score:3, Informative)

    by blueg3 ( 192743 ) on Saturday September 05, 2009 @09:48AM (#29322853)

    No, it's a thermodynamic concept that has been extended to information theory. Prior to major developments in statistical mechanics, entropy was a loss of energy associated with physical transformations. This significantly predates both statistical mechanics and information theory. Stat mech formulated the modern definition of entropy, and Shannon applied it to information theory.
