
Internet Data Mining for Investment Analysis

CaroKann writes "Reuters is reporting on a Wall Street investment research company, Majestic Research, that is using web crawling techniques to track business performance. Instead of attempting to estimate business conditions by talking to company management, or pounding the pavement visiting stores, this company uses data mining systems to collect real-time sales data and other information on companies that have a web presence. Using this data, Majestic attempts to estimate company earnings more accurately than traditional research outfits."

  • by eldavojohn ( 898314 ) * <eldavojohnNO@SPAMgmail.com> on Wednesday February 15, 2006 @08:32AM (#14723644) Journal
    But the New York-based research firm is winning converts among hedge funds who say its brand of Web-based quantitative analysis can be more accurate than traditional Wall Street research forecasts.
    Possibly because "traditional Wall Street research" involves reading tea leaves and throwing down chicken bones while watching Alan Greenspan do a rain dance to the gods in hopes that our economy will pick up.

    Economics and future fiscal predictions are completely theoretical. There are just too many variables involved, folks.
    • Would he be by any chance dancing with Ben Bernanke?
    • Possibly because "traditional Wall Street research" involves reading tea leaves and throwing down chicken bones while watching Alan Greenspan do a rain dance to the gods in hopes that our economy will pick up.

      And despite plaudits and accolades from government and business, Greenspan was no "genius" in the classic sense. He made mistakes early on (ask George Bush the First!) and eventually found a technique (the now famous "Raise/Lower Interest Rates a Quarter Point Shuffle") that seemed to work most of th

      • Greenspan's genius was not in his handling of the Federal Reserve; he was an expert at using the power he built through control of the FOMC to push politicians into deregulating the national economy. Regardless of your opinion of deregulating the national economy, I think anyone can appreciate the level of genius required for a single person to have such a substantial impact on so many decisions without publicly revealing the extent of that influence until nearing retirement.
        • But having one person wield that kind of power in an economy is anathema, and it makes all economic models suspect. Do you now have to add a "Greenspan" factor to your equations? And more importantly, if you do, is this factor transferable, so that the "Greenspan" factor becomes the "Bernanke" factor with the same relative weight? No one person (or nation, for that matter) should wield that kind of power, lest it create conditions where the misuse or abuse of that power would cause the dominoes of the wo
          • The only factor Greenspan had in most models was "the Greenspan put," which impacted growth only indirectly (because it freed up an awful lot of risk capital). Macro forecasts are not worth the paper they are printed on; my models were designed to interpret when others were missing things like market share shifts and competitive advantages that were forming or decaying. The economy played a relatively small role in how those dynamics shifted (e.g. Dell was a better competitor than HP in good times, 1998-2000, an
            • Macro forecasts are not worth the paper they are printed on; my models were designed to interpret when others were missing things like market share shifts and competitive advantages that...

              Good for you. And when you get your fixed interest rate mortgage, when a company takes out a loan for investment, when you have a great investment idea but need to buy a cascade of 10 year futures (invest in a gas pipeline but want to invest in the quality of the management rather than take risks on anything and eve
    • I was a traditional 'Wall Street research analyst' (on the buy side) and found this to be the source of several fruitful ideas (MS/DOJ settlement release half a day before everyone else, Apple G5, AMD/Intel competitive dynamics, and the post 9/11 Akamai vs everyone else's news sites report). You might be surprised how valuable some of the info is around here.
  • by drgonzo59 ( 747139 ) on Wednesday February 15, 2006 @08:37AM (#14723670)
    They can create bogus pages to feed to the Majestic bot like in the BMW vs. Google case.
  • by Rob T Firefly ( 844560 ) on Wednesday February 15, 2006 @08:42AM (#14723697) Homepage Journal
    We can expect yet another huge rise in fake blogs, fake product reviews on Amazon and such, and paid shills in chats and message boards. Swell.
  • by CaptainFork ( 865941 ) on Wednesday February 15, 2006 @08:43AM (#14723701)
    Based on manually mining (i.e. reading) Slashdot, I predict a spike in Majestic's share price about now...
  • I call Bull (Score:4, Insightful)

    by spectrokid ( 660550 ) on Wednesday February 15, 2006 @08:49AM (#14723730) Homepage
    TFA mentions data about drug prescriptions by hundreds of physicians. Is that lying around unorganised on the net? Tell me which algorithm you are going to use to predict, by webcrawling, how many XBOX365s are going to be sold next month! You think supermarkets post their sales figures on public webpages? Wal-Mart is said to have more data offline than is available on the entire public section of the net. Now give me access to that. On the other hand, if you work for the sales-tax administration (in Europe) and all the big companies file their invoices weekly, that is also a good starting point...
    • Re:I call Bull (Score:1, Interesting)

      by Anonymous Coward
      Having spoken to the company, I'll call your bull call. There are good proxies for retail sales data available from/over the web - how significant a channel is Amazon for some manufacturers? I don't doubt you can licence all kinds of goodness from other online properties - how about cars.com? Also TFA isn't entirely accurate. As well as data mining the internet, they also have access to a large number of proprietary (& $$) data sources - the drug prescription data you mention comes from companies lik
    • Except that tax statements are confidential. Quarterly results of listed companies, though, are public.
    • If I understand their site correctly, this company sets up information-exchange relationships with major online retailers for point-of-sale data, as well as gathering 'truly public' stats on eBay bids, Amazon stock levels, and other things you can read on the net. This idea isn't all that new, but it's the first time we've seen it used across all industries. I remember an industry body for pharmaceuticals that used to ask its members for rolled-up numbers for drug sales each year, and then
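Inferring sales from publicly visible stock levels, as mentioned above, can be sketched by summing the drops between successive readings of one product's displayed inventory. This is a minimal sketch assuming such readings are available at regular intervals; it is not Majestic's method, which the article does not describe, and the readings below are invented.

```python
from typing import Sequence


def units_sold_from_stock(snapshots: Sequence[int]) -> int:
    """Estimate units sold from successive stock-level readings of one product.

    Only decreases count as sales; increases are treated as restocks. Sales
    that happen in the same interval as a restock are missed, so the result
    is best read as a lower bound.
    """
    return sum(max(prev - cur, 0) for prev, cur in zip(snapshots, snapshots[1:]))


# Hypothetical hourly readings of one item's displayed stock level.
readings = [50, 47, 40, 60, 55]
print(units_sold_from_stock(readings))  # 15
```

Sampling more often shrinks the blind spot around restocks, at the cost of more crawler traffic.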
    • If I can rephrase the skepticism expressed in the parent post: most publicly available information posted on web sites will not yield any startling analysis that can't simply be gained by reading annual and quarterly reports.

      There was some comment in response to the parent post that the data mining company was licensing data. This also sounds suspicious. There is an SEC regulation called FD (for Fair Disclosure). This regulation states that you cannot preferentially provide investor information to so

  • by Cryofan ( 194126 ) on Wednesday February 15, 2006 @08:56AM (#14723772) Journal
    I wrote a project in Perl some years ago that would download online financial news stories, count the critical words, weigh their connotations, and compare the result to the direction of the stock market. For example, if the words "stocks" and "down" started showing up a lot in sentences in online news stories, you might expect a downward trend.

    I posted the preliminary code online in the Perl newsgroup.

    google "data mining" "news" "perl" etc
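A minimal sketch of the word-counting idea described above: score each sentence that mentions the topic word by summing lexicon weights, then average over the story. The word list and weights are hypothetical, and this is not a reconstruction of the original Perl code.

```python
import re
from typing import Optional

# Hypothetical connotation weights; the original project's word list is not known.
WEIGHTS = {"down": -1.0, "fall": -1.0, "slump": -2.0,
           "up": 1.0, "rise": 1.0, "rally": 2.0}


def sentence_score(sentence: str, topic: str = "stocks") -> Optional[float]:
    """Sum lexicon weights in a sentence, or None if it never mentions the topic."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if topic not in words:
        return None
    return sum(WEIGHTS.get(w, 0.0) for w in words)


def story_signal(text: str, topic: str = "stocks") -> float:
    """Average score over topic sentences; negative values suggest a downbeat story."""
    scores = [sentence_score(s, topic) for s in re.split(r"[.!?]+", text)]
    relevant = [x for x in scores if x is not None]
    return sum(relevant) / len(relevant) if relevant else 0.0


story = "Stocks went down today. Analysts expect stocks to slump further. It rained."
print(story_signal(story))  # -1.5
```

A bag-of-words lexicon like this cannot tell "down from record highs" apart from genuinely bad news, which is the ambiguity several replies in this thread raise.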
    • Train it for a particular stock automatically using the actual direction of the stock. Make the filter one input, among many others (Yahoo data), to a genetic algorithm system, and then give the lot away free. Bankrupt the big financial advice firms. :)

      Hmm, might be worth using it as an excuse to play with Ruby.

       
    • Wait a sec (Score:3, Interesting)

      by DrSbaitso ( 93553 )
      For example, if the words "stocks" and "down" started showing up a lot in sentences in online news stories, you might expect a downward trend.

      So you wrote a program that would read some stories that said the stock market was going down, and it told you the market was down? Did your program also see if weather news reports contained words like "rain" and "downpour" and hence "predict" rain?
      • Did your program also see if weather news reports contained words like "rain" and "downpour" and hence "predict" rain?

        In fact, in most places in the world, predicting the following day's weather to be the same as the present day's would be reasonably accurate. It could be further refined by taking local weather patterns into account. Of course, in the stock market this should be corrected for in projections using simple statistics, and is done so by most market participants (most weighted by ma
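The persistence forecast mentioned above ("tomorrow will look like today") is the usual baseline a weather or market model has to beat. A minimal sketch that scores it on a made-up sequence of daily conditions, not real data:

```python
from typing import Sequence


def persistence_accuracy(days: Sequence[str]) -> float:
    """Score the 'tomorrow will look like today' forecast over a sequence of observations."""
    if len(days) < 2:
        raise ValueError("need at least two days to score a forecast")
    hits = sum(1 for today, tomorrow in zip(days, days[1:]) if today == tomorrow)
    return hits / (len(days) - 1)


# Invented week of conditions, purely for illustration.
week = ["rain", "rain", "sun", "sun", "sun", "rain"]
print(persistence_accuracy(week))  # 0.6
```

A model that cannot beat this number on the same data adds nothing over persistence; the same check applies to day-over-day market direction.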

    • So, is this code still up somewhere? :)
  • by Anonymous Coward
    A friend of mine has developed software that goes even further. It parses streaming news stories for good/bad news and executes orders before humans even finish reading. That advantage is enough to make this company a mint.
    • A friend of mine has developed software that goes even further. It parses streaming news stories for good/bad news and executes orders before humans even finish reading. That advantage is enough to make this company a mint.

      Yeah, riiiight. Who knew it was so easy! Baloney.

      I read financial news stories, and even with my human brain focusing on the article it is difficult to tell whether the information presented is a positive or a negative indicator for the price of the stock.

      Here's an example:
      "XYZ Corp toda
  • Certainly at some level the things that happen around a company in public sources have a bearing on the stock price as many people base their investments on such info.

    But the real problem with everything like this is that, even if it works well for many things, there will be those who try to misuse it, and finding them all will be very hard. Further, it only takes one major problem case for your nice product to become a laughing stock.

  • This is great (Score:2, Interesting)

    by RoboSpork ( 953532 )
    Computers should be able to give a much more unbiased assessment of the economy than any person ever could. People are essentially incapable of interpreting economic data in a straightforward way; political agendas always seem to work their way into economists' opinions about the economy. By using algorithms to do the analysis (and allowing market forces to refine those algorithms), we should be able to get a much better understanding of the REAL economy.

    This is a good thing for mankind.
    • Good things like the crash of 1987? Computer-run mutual funds appear to do no better than fund managers, either. I suspect that most of the larger mutual funds have relatively strict rules about when to buy and sell, in order to minimize emotional choices.
    • Yes, in fact people *are* essentially incapable of interpreting economic data in a straightforward way ... but then, it is the economic activities of those same seemingly irrational people that the market reflects. My point being that while the machines will likely come to a deeper understanding of the market than us higher-cortex skin-sacks ... it will likely come at the price of what will appear as bias. Without change to the most fundamental mechanisms upon which the economy works, it might well be that
    • Well, I think computers and good algorithms can help, but who chooses the algorithms?

      Instead of trying to fully analyze the economy, let's try a relatively simple example: how do you determine who is in poverty in the USA? Do you go by a set income (e.g. anyone who makes less than $18,000/year)? If so, what if someone lives in an expensive city vs. an inexpensive rural area? Is it going to be the same amount in each state?

      Or is poverty determined by lifestyle? If you can't afford health insurance are you in
    • By using algorithms to do the analysis (and allowing market forces to refine those algorithms), we should be able to get a much better understanding of the REAL economy.

      There is no "REAL" market! (given the context of the thread I think you mean "market" and not "economy"). There is no holy grail of a perfect technical stock picking algorithm.

      We could have all the information in the world and all the computing power to analyze it and we would still not be able to predict the market because it is driven by p
      • Well, no, I did mean economy, and not market. I was also referring to algorithms to MEASURE and quantify the economy, not predict economic trends. Economists have a very hard time measuring very basic, hard economic quantities like GDP (ever heard of the black market?), total cash flow and the like. They also have a hard time measuring more subjective quantities like poverty rate and consumer satisfaction. Maybe you missed your economics class the day the professor explained this, but economics is a
        • Maybe you missed your economics class the day the professor explained this, but economics is a science that involves every aspect of all life (not just human).

          What a smarmy reply. I have opinions on the issues and addressed them, but you have to resort to personal attack to make yours. Every point you replied to had a personal insult attached. Since I make my living in the market, my thoughts on self-delusional stock-picking algorithms may have some validity. Surely you don't think that just because some f
          • Sorry, I re-read my post and it sounds like a flame; I apologize. I was a wee bit grumpy before I had my coffee. I wish you luck researching the market and shuffling that gravy around. I have done work with technical analysis myself and I know how hard it can be. I gather from your posts that you regard technical analysis as a tool and not an end-all solution, and I would certainly agree with that.
  • by DeveloperAdvantage ( 923539 ) on Wednesday February 15, 2006 @09:44AM (#14724124) Homepage
    This is interesting stuff. I would like to learn more about the algorithms they use to analyze their data - the article has very few details. It is neat how systems like this are becoming favored over traditional human analysts (or at least reducing the need for people).

    I remember back in grad school in the late 90s I worked on a major project to design an intelligent agent based system including the same functionality, but, in addition to pulling information off the internet, it could also take into account whatever other information could be gathered and interfaced into it (for example, there is also a lot of content on TV which could be fed into a system, in addition to the online data). It was a design project though and not implemented, perhaps I will need to resurrect it!

    I do think the whole area of quantitative, or at least semi-quantitative, analysis of information, both textual and numerical, is going to explode over the next few years, driven by vast amounts of incredibly cheap computing power and bandwidth. Computer applications do amazing stuff right now, but five years from now truly "intelligent" applications will exist. The term "artificial intelligence" has fallen out of fashion, perhaps a sign of how commonplace these systems have become.

    As an example, our local phone company has a voice recognition system which actually works reasonably well, much, much better than anything 5-10 years ago. We are certainly making progress.
    • This is old news. Data mining systems like this have been around for years. Some are even genuine data mining systems (i.e. a SELECT statement against two tables is not data mining!! argh... I digress)
  • Ties to Majestic 12? (Score:2, Interesting)

    by Anonymous Coward
    Does anyone know whether Majestic Research has any connection to Majestic 12 ( http://www.majestic12.co.uk/ [majestic12.co.uk])? For those who don't know, Majestic 12 is a distributed search engine. The distributed part is that a bunch of people donate CPU cycles and bandwidth to run a web crawler in SETI@home fashion. Now, I thought this was a good thing to join, because we kind of need some independent alternatives to Google. But if it turns out I'm sponsoring some marketing firm, well... I'd feel pretty s
  • Will it work? (Score:2, Informative)

    by Arwing ( 951573 )
    Nope, it won't, because even if it does, everyone will start using it and render it useless. There is only one trend in the stock market that is backed up by statistics over the long run, and that is that the stock market drifts upward over time. My professor did an experiment using computer modeling: he used a random number generator to decide whether the stock market goes up or down, added the 'upward drift' factor from historical data, and compared the result to the actual data over the last 75 years, and the two data sets look almos
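The professor's experiment, as described, can be sketched as a coin-flip random walk on log prices with a constant upward drift added each day. This is a minimal sketch; the drift, volatility, and 252 trading days per year below are illustrative assumptions, not the historical estimates used in that experiment.

```python
import math
import random
from typing import List


def simulate_index(start: float, days: int, drift: float, volatility: float,
                   seed: int = 0) -> List[float]:
    """Random walk on log prices: each day's log return is drift plus a fair coin flip of +/- volatility."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(days):
        shock = volatility if rng.random() < 0.5 else -volatility
        path.append(path[-1] * math.exp(drift + shock))
    return path


# Illustrative parameters only: roughly 75 years of trading days.
path = simulate_index(start=100.0, days=252 * 75, drift=0.0003, volatility=0.01, seed=42)
print(f"simulated level after 75 years: {path[-1]:.1f}")
```

Running it with several seeds and plotting the paths next to a real index is the comparison the post describes; with zero volatility the path reduces to pure compounding at the drift rate.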
    • There's another company doing similar 'evidence-based research/reporting' specifically for the Internet industry here [backchannel.co.uk]. In particular, they publish the 'inside story' on recent M&A deals and flameouts in the industry.
  • by JimDog ( 443171 ) on Wednesday February 15, 2006 @11:01AM (#14724774)
    I once interviewed with a group in San Francisco that did stuff like this. They weren't clear about who they were working for, but I do remember some of the techniques they mentioned during the interview. Some of these were actually implemented, others were just ideas:

    - An eBay crawler that could estimate the number of auctions and average selling price to predict whether eBay would make their earnings target or not. eBay quickly blacklisted their IP space, so they started using a bunch of open proxies they found.

    - By analyzing client/server communication for the Sims Online, they discovered that each connection was assigned a sequentially incrementing connection ID number. By looking at the rate at which the connection ID numbers were increasing each time they logged in, they determined that the Sims Online wasn't going to be nearly as popular as Electronic Arts was forecasting.

    - They talked about placing a camera somewhere in Union Square (in SF) to monitor the entrance to Tiffany's during the holiday shopping season, and doing image analysis to determine what percentage of shoppers left the store with a Tiffany's bag in hand.

    - Monitoring wireless carriers' spectrum to determine what percentage of GSM/CDMA channels were in use for data vs. voice. The communication itself is encrypted, of course, but you can still tell whether a channel is carrying voice or data. They wanted to determine whether wireless carriers' forecasts of revenue from data services were accurate.
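The Sims Online trick amounts to estimating a rate from a sequential counter: record the connection ID assigned at each login, fit a line through (time, ID), and read the slope as connections opened per day. A minimal sketch assuming strictly sequential IDs and invented observations; it is not the group's actual tooling, which the post does not describe.

```python
from typing import Sequence, Tuple


def connections_per_day(observations: Sequence[Tuple[float, int]]) -> float:
    """Least-squares slope of connection ID against time measured in days.

    Assumes every new connection gets the next ID in sequence, so the slope
    approximates how many connections the service opens per day.
    """
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = sum(t for t, _ in observations) / n
    mean_id = sum(i for _, i in observations) / n
    cov = sum((t - mean_t) * (i - mean_id) for t, i in observations)
    var = sum((t - mean_t) ** 2 for t, _ in observations)
    if var == 0:
        raise ValueError("observations must come from more than one point in time")
    return cov / var


# Hypothetical (day, connection ID) pairs recorded at each of our own logins.
obs = [(0.0, 1000), (1.0, 3000), (2.0, 5000), (3.0, 7000)]
print(connections_per_day(obs))  # 2000.0
```

Turning connections per day into a player count needs a further assumption (for example, average logins per player per day) before it can be compared with a published forecast.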
    • Nice examples, very nice. Sounds like a company with some innovative ideas, which are sadly rare in larger financial institutions. While you were unsure of their clients, do you have any published success rates, or even their name?

      The reason? I just may send my CV off!
    • Hey Uber Banker, Why don't you send US a CV if you are interested and have something to contribute: tony@majesticresearch.com Cheers!
  • Forget trying to analyze companies for optimum performance. For real performance, dump the whole batch and buy precious metals, because the fact is that the US economy has more debt than can ever be paid off at face value. IMHO, gold is practically guaranteed to outperform every investment class out there.

    Seriously, just watch what happens when the fed decides to print up money to try and stall off a cascading credit collapse. They will print up some, but that will make things worse because it will drive up
    • Almost every part of your post is wrong. The US government's debt (Federal debt only) is about 40% of US GDP. Compare this to countries like Japan which are well over 100%.

      Gold, historically, has underperformed any other asset class. Over the last 200 years. Over the last 50 years. And over the last 10 years. Gold is traditionally seen as a good inflation hedge, for obvious reasons, but it's not a good long-term investment.

      The US government will never resort to seignorage (printing of money) as you see
      • I've never understood what makes these crazy psycho-libertarians think that there is anything that makes gold of inherently high worth. Honestly, it's just a metal, it's not even that rare, and with the exception of being a particularly good conductor the only thing that makes it valuable is that some people think it's pretty.

        It's really bizarre. It's like they've never bothered to read any economics texts ever.
        • Your very ignorance is what presents my opportunity. It is true that there is little inherent value in gold. It is heavy, malleable, conductive, shiny, stable, and scarce. But what of those dollars you so tout? Their value stems from the demand for American goods and services and their scarcity. America's manufacturing capacity has been declining (e.g. GM) and the supply of dollars expands approximately at a 7% annual rate, doubling roughly every ten years. When our future obligations come due--when A
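The "7% a year, doubling roughly every ten years" arithmetic holds up: a quantity compounding at rate r doubles after ln 2 / ln(1 + r) years, about 10.2 years at 7%. The 7% growth figure is the poster's claim and is not checked here; the snippet only verifies the compounding arithmetic and compares it with the familiar rule of 72.

```python
import math


def doubling_time(rate: float) -> float:
    """Years for a quantity compounding at a constant annual rate to double."""
    return math.log(2) / math.log1p(rate)


def rule_of_72(rate_percent: float) -> float:
    """Mental-arithmetic approximation of the doubling time."""
    return 72 / rate_percent


print(f"{doubling_time(0.07):.2f} years (exact), {rule_of_72(7):.2f} years (rule of 72)")
# 10.24 years (exact), 10.29 years (rule of 72)
```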
          • Money is an inefficient store of value. It, like gold, tends to decline in value over time (inflation) when many other investments increase in value. Gold decreases in value relative to the dollar, and has been doing so for a long time, as the GP pointed out. So gold is, at least at this point, one of the few things out there that is an even less efficient store of value than money.

            Investing in gold is also not productive, from an economic perspective, because gold is useless. You need money to make mor
      • Almost every part of your post is wrong. The US government's debt (Federal debt only) is about 40% of US GDP. Compare this to countries like Japan which are well over 100%.

        I didn't say the US government, even though their debt is nice and high too. It's the total debt in the US economy vs what the economy puts out and economic growth. (not to mention unfunded promises)

        Gold, historically, has underperformed any other asset class. Over the last 200 years. Over the last 50 years. And over the last 10 ye

  • The firm's methods differ from traditional Wall Street research, where analysts make forecasts based on conversations with company executives, advertisers, suppliers and mall visits to forecast company results and make recommendations.
  • IDL have been doing this for years - http://www.investor-dynamics.com/ [investor-dynamics.com]
  • Maybe they'll soon announce a deal with Google?
  • Hello all:

    I'd like to point out that there is a difference between a prediction and a summary. From what I have read so far, the tool described in the article generates a summary, which may be used as a prediction.

    Let s(t) be the summary of a system (in this case, the economy) at a given time t. Then:

    A prediction p(S) is a prediction based on a set of summaries S, where S = {s(t), s(t-1), s(t-2), ..., s(t-n)}.

    One can always make a prediction based on a very small number of summaries. |S| = 0 is a guess. |S| =
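To make the notation concrete, here is one deliberately simple choice of p(S), assuming each summary s(t) is a single number (say, an index level) and that a straight-line fit through the window is an acceptable predictor. Both are illustrative assumptions; the post does not specify how p should be built.

```python
from typing import Sequence


def predict_next(summaries: Sequence[float]) -> float:
    """Predict s(t+1) from S = [s(t-n), ..., s(t-1), s(t)], listed oldest first.

    One summary gives a persistence forecast; more summaries give a
    least-squares straight line through the window, extended one step.
    """
    k = len(summaries)
    if k == 0:
        raise ValueError("with |S| = 0 there is nothing to predict from")
    if k == 1:
        return float(summaries[0])
    mean_x = (k - 1) / 2
    mean_y = sum(summaries) / k
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in enumerate(summaries))
             / sum((x - mean_x) ** 2 for x in range(k)))
    return mean_y + slope * (k - mean_x)


print(predict_next([1.0, 2.0, 3.0]))  # 4.0
```

Growing the window (larger n) smooths out noise in individual summaries but reacts more slowly to turning points.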
  • ...this company uses data mining systems to collect real-time sales data and other information on companies that have a web presence....
    Can anyone name a company that keeps "real-time sales data" on the web?
  • This is a pretty common trick, and used in one form or another on Wall Street for many years.

    I have seen ones that scanned EDGAR filings (that one got canceled when the company was destroyed in the 9/11 attack), campaign contributions (works wonderfully for telcos and other highly regulated industries), patent filings (generally work surprisingly well, though no one knows why), job ads, and many others.

    I even heard of one that analyzed free internet porn...(insert your favorite joke here, but it actually was a fairl
