Internet Data Mining for Investment Analysis
CaroKann writes "Reuters is reporting on a Wall Street investment research company, Majestic Research, that is using web crawling techniques to track business performance. Instead of attempting to estimate business conditions by talking to company management, or pounding the pavement visiting stores, this company uses data mining systems to collect real-time sales data and other information on companies that have a web presence. Using this data, Majestic attempts to estimate company earnings more accurately than traditional research outfits."
Traditional Wall Street Research? (Score:5, Funny)
Economics and future fiscal predictions are completely theoretical. There are just too many variables involved, folks.
Re:Traditional Wall Street Research? (Score:1)
Re:Traditional Wall Street Research? (Score:2)
Oh, wait, is that the new guy?
He needs to get in the news some more so we can all learn to remember (and maybe even pronounce) his name already.
Re:Traditional Wall Street Research? (Score:2)
And despite plaudits and accolades from government and business, Greenspan was no "genius" in the classic sense. He made mistakes early on (ask George Bush the First!) and eventually found a technique (the now famous "Raise/Lower Interest Rates a Quarter Point Shuffle") that seemed to work most of th
Re:Traditional Wall Street Research? (Score:2)
Re:Traditional Wall Street Research? (Score:2)
Re:Traditional Wall Street Research? (Score:3, Interesting)
Re:Traditional Wall Street Research? (Score:3, Informative)
Good for you. And when you get your fixed interest rate mortgage, when a company takes out a loan for investment, when you have a great investment idea but need to buy a cascade of 10 year futures (invest in a gas pipeline but want to invest in the quality of the management rather than take risks on anything and eve
Re:Traditional Wall Street Research? (Score:2)
Re:Traditional Wall Street Research? (Score:2)
Stock Memes - I posted this one year ago (Score:2, Interesting)
http://www.realmeme.com/Main/miner/stock.jsp?star
Now that the companies know that... (Score:3, Insightful)
Cue the web spam... (Score:5, Insightful)
My data mining results (Score:4, Funny)
I call Bull (Score:4, Insightful)
Re:I call Bull (Score:1, Interesting)
Re:I call Bull (Score:1)
Re:I call Bull (Score:1)
Re:I call Bull (warranted skepticism) (Score:2)
If I can rephrase the skepticism expressed in the parent post: most publicly available information posted on web sites will not yield any startling analysis that can't simply be gained by reading annual and quarterly reports.
There was some comment in response to the parent post that the data mining company was licensing data. This also sounds suspicious. There is an SEC regulation called FD (for Fair Disclosure). This regulation states that you cannot preferentially provide investor information to so
mining online news stories for word connotations (Score:3, Interesting)
I posted the preliminary code online in the perl newsgroup.
google "data mining" "news" "perl" etc
Apply Bayesian filter for buy/sell advice (Score:2)
Hmm, might be worth using it as an excuse to play with Ruby.
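For anyone curious what that might look like, here is a minimal sketch of a spam-filter-style naive Bayes classifier applied to headlines. It is in Python rather than Ruby, and the training headlines, the "buy"/"sell" labels and the function names are all made up for illustration; none of it comes from the Perl code mentioned above.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def train(labelled_headlines):
    """Count words per label and headlines per label."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labelled_headlines:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, doc_counts

def classify(text, word_counts, doc_counts):
    """Return the label with the highest log-probability (add-one smoothing)."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # prior
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data, invented for this sketch.
TRAINING = [
    ("profits surge as sales beat forecast", "buy"),
    ("record earnings and strong growth", "buy"),
    ("shares plunge after profit warning", "sell"),
    ("weak sales and heavy losses reported", "sell"),
]

if __name__ == "__main__":
    model = train(TRAINING)
    print(classify("strong sales growth", *model))
```

With only four training headlines this is a toy. A real filter would need a large set of stories labelled by what the stock actually did afterwards, which is the hard part.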
Wait a sec (Score:3, Interesting)
So you wrote a program that would read some stories that said the stock market was going down, and it told you the market was down? Did your program also see if weather news reports contained words like "rain" and "downpour" and hence "predict" rain?
Re:Wait a sec (Score:2)
In fact, in most places in the world, predicting the following day's weather to be the same as the present day's weather would be reasonably accurate. It could be further refined by taking into account local weather patterns. Of course, in the stock market this should be corrected for in projections using simple statistics, and most market participants do so (most weighted by ma
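The "tomorrow will be like today" rule is usually called a persistence forecast, and it makes a handy baseline: any fancier predictor should at least beat it. A rough sketch in Python, where the function name and the weather observations are invented purely for illustration:

```python
def persistence_hit_rate(series):
    """Fraction of steps where 'same as the previous value' was correct."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    predictions = series[:-1]   # forecast for step i+1 is the value at step i
    actual = series[1:]
    hits = sum(p == a for p, a in zip(predictions, actual))
    return hits / len(actual)

# Hypothetical week of weather, invented for this sketch.
WEATHER = ["sun", "sun", "sun", "rain", "rain", "sun", "sun"]

if __name__ == "__main__":
    print(f"persistence hit rate: {persistence_hit_rate(WEATHER):.2f}")
```

The same function works on a series of "up"/"down" market days, which is one way to check whether a news-reading program adds anything beyond repeating yesterday.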
Re:mining online news stories for word connotation (Score:1)
One thing that would need to be done is to collect a set of the most important words. I never had time to do that (this was a senior project for my comp sci degree).
I just came up with a set of critical words on my own (probably 30 words or so, of negative, positive and neutral connotational value).
But if you had completed a collection a
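A hand-picked word list like the one described above could be scored along these lines. This is only a sketch in Python, not the original Perl: the words, their +1/-1/0 weights and the function name are assumptions made up for the example.

```python
# Hypothetical connotation lexicon: +1 positive, -1 negative, 0 neutral.
CONNOTATION = {
    "surge": 1, "gain": 1, "beat": 1, "record": 1,
    "plunge": -1, "loss": -1, "miss": -1, "lawsuit": -1,
    "announce": 0,
}

def score_story(text, lexicon=CONNOTATION):
    """Sum the connotation values of every known word in the story."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return sum(lexicon.get(w, 0) for w in words)

if __name__ == "__main__":
    print(score_story("Shares surge after record quarter"))
    print(score_story("Profit miss triggers lawsuit, shares plunge."))
```

A positive total suggests an upbeat story and a negative total a gloomy one. Word counting like this ignores negation and context, so "no loss" scores the same as "loss".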
Re:mining online news stories for word connotation (Score:1)
Re:mining online news stories for word connotation (Score:2)
So, is this code still up somewhere?
Re:mining online news stories for word connotation (Score:1)
But here was the url:
http://www.geocities.com/uhdseniorproject
now gone, and no cache.
But I still think the idea is a great one...
Realtime News Analysis (Score:2, Interesting)
Re:Realtime News Analysis (Score:2)
Yeah, riiiight. Who knew it was so easy! Baloney.
I read financial news stories, and even with my human brain focusing on the article it is difficult to tell whether the information presented is a positive or a negative indicator for the price of the stock.
Here's an example:
"XYZ Corp toda
Will it work? Yes and no. (Score:2)
But the real problem with anything like this is that even if it works well for many things, there will be those who try to misuse it, and finding all of them will be very hard. Further, it only takes one major problem case for your nice product to become a laughingstock.
This is great (Score:2, Interesting)
This is a good thing for mankind.
Re:This is great??? (Score:1)
Re:This is great (Score:1)
Re:This is great (Score:1)
Instead of trying to fully analyze the economy, let's try a relatively simple example: how do you determine who is in poverty in the USA? Do you go by a set income (e.g. anyone who makes less than $18,000/year)? If so, what if someone lives in an expensive city vs. an inexpensive rural area? Is it going to be the same amount in each state?
Or is poverty determined by lifestyle? If you can't afford health insurance are you in
Re:This is great (Score:2)
There is no "REAL" market! (given the context of the thread I think you mean "market" and not "economy"). There is no holy grail of a perfect technical stock picking algorithm.
We could have all the information in the world and all the computing power to analyze it and we would still not be able to predict the market because it is driven by p
Re:This is great (Score:1)
Re:This is great (Score:2)
What a smarmy reply. I have opinions on the issues and addressed them, but you have to resort to personal attacks to make yours. Every point you replied to had a personal insult attached. Since I make my living in the market, my thoughts on self-delusional stock-picking algorithms may have some validity. Surely you don't think that just because some f
Re:This is great (Score:1)
the rise of the machines... (Score:3, Insightful)
I remember back in grad school in the late 90s I worked on a major project to design an intelligent-agent-based system with the same functionality. In addition to pulling information off the internet, it could also take into account whatever other information could be gathered and fed into it (for example, there is a lot of content on TV that could be used alongside the online data). It was a design project, though, and never implemented. Perhaps I will need to resurrect it!
I do think the whole area of quantitative, or at least semi-quantitative, analysis of information, both textual and numerical, is going to explode over the next few years, driven by vast amounts of incredibly cheap computing power and bandwidth. Computer applications do amazing stuff right now, but five years from now truly "intelligent" applications will exist. The term "artificial intelligence" has fallen out of fashion, perhaps a sign of how commonplace these systems have now become.
As an example, our local phone company has a voice recognition system which actually works reasonably well, much, much better than anything 5-10 years ago. We are certainly making progress.
This is nothing new (Score:1)
Re:This is nothing new (Score:2)
"The origins of the abacus are disputed, suggestions including invention in Babylonia and in China, to have taken place between 2400 BC and 300 BC."
Granted, there has been some improvement since then.
Ties to Majestic 12? (Score:2, Interesting)
Re:Ties to Majestic 12? (Score:1)
> connections to Majestic 12 (http://www.majestic12.co.uk/)?
As the founder of the Majestic-12 project I can assure you that we are not related in any way, shape or form.
Will it work? (Score:2, Informative)
Re:Will it work? (Score:1)
Several groups do this (Score:3, Interesting)
- An eBay crawler that could estimate the number of auctions and average selling price to predict whether eBay would make their earnings target or not. eBay quickly blacklisted their IP space, so they started using a bunch of open proxies they found.
- By analyzing client/server communication for the Sims Online, they discovered that each connection was assigned a sequentially incrementing connection ID number. By looking at the rate at which the connection ID numbers were increasing each time they logged in, they determined that the Sims Online wasn't going to be nearly as popular as Electronic Arts was forecasting.
- They talked about placing a camera somewhere in Union Square (in SF) to monitor the entrance to Tiffany's during the holiday shopping season, and doing image analysis to determine what percentage of shoppers left the store with a Tiffany's bag in hand.
- Monitoring wireless carriers' spectrum to determine what percentage of GSM/CDMA channels were in use for data vs. voice. The communication itself is encrypted of course, but you can still tell whether a channel is carrying voice or data. They wanted to determine if wireless carriers' forecasts about revenue from data services were accurate.
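The sequential connection ID trick from the Sims Online example comes down to simple arithmetic: sample the ID at known times and divide the increase by the elapsed time. A minimal sketch in Python, where the function name, timestamps and ID values are all hypothetical and not taken from the actual analysis:

```python
from datetime import datetime

def connections_per_hour(samples):
    """Estimate the connection rate from (timestamp, connection_id) samples.

    Assumes IDs are handed out sequentially, one per new connection,
    and that samples are sorted by time.
    """
    (t_first, id_first), (t_last, id_last) = samples[0], samples[-1]
    hours = (t_last - t_first).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("samples must span a positive amount of time")
    return (id_last - id_first) / hours

# Hypothetical observations: the ID seen at each login.
SAMPLES = [
    (datetime(2003, 1, 6, 20, 0), 1_000_000),
    (datetime(2003, 1, 6, 22, 0), 1_003_000),
    (datetime(2003, 1, 7, 20, 0), 1_036_000),
]

if __name__ == "__main__":
    print(f"{connections_per_hour(SAMPLES):.0f} connections/hour")
```

Comparing that rate against the subscriber numbers a company is forecasting gives a rough sanity check, provided the IDs really are global and sequential.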
Re:Several groups do this (Score:1, Flamebait)
The reason? I just may send my CV off!
Re:Several groups do this (Score:2)
Re:Several groups do this (Score:1)
Well, then they should analyze gold (Score:2)
Seriously, just watch what happens when the Fed decides to print up money to try and stall off a cascading credit collapse. They will print up some, but that will make things worse because it will drive up
Re:Well, then they should analyze gold (Score:2)
Gold has historically underperformed every other asset class: over the last 200 years, over the last 50 years, and over the last 10 years. Gold is traditionally seen as a good inflation hedge, for obvious reasons, but it's not a good long-term investment.
The US government will never resort to seigniorage (printing of money) as you see
Re:Well, then they should analyze gold (Score:2)
It's really bizarre. It's like they've never bothered to read any economics texts ever.
Re:Well, then they should analyze gold (Score:1)
Re:Well, then they should analyze gold (Score:2)
Investing in gold is also not productive, from an economic perspective, because gold is useless. You need money to make mor
Re:Well, then they should analyze gold (Score:2)
Almost every part of your post is wrong. The US government's debt (Federal debt only) is about 40% of US GDP. Compare this to countries like Japan which are well over 100%.
I didn't say the US government, even though their debt is nice and high too. It's the total debt in the US economy vs what the economy puts out and economic growth. (not to mention unfunded promises)
Gold, historically has underperformed any other asset class. Over the last 200 years. Over the last 50 years. And over the last 10 ye
Differing Research Methods (Score:1)
This is not new (Score:1)
Maybe they'll soon announce a deal with Google? (Score:1)
Difference between a Prediction and a Summary (Score:1)
I'd like to highlight that there is a difference between a Prediction and a Summary. From what I have read so far, the tool described in the article generates a summary, which may be used as a prediction.
Let s(t) be the Summary of a system (in this case, the economy) at any given time, then:
A prediction, p(S), would be based on a set of summaries S, where: S == {s(t), s(t-1), s(t-2),
One can always make a prediction based on a very small number of summaries. |S| = 0 is a guess. |S| =
Real-Time? (Score:1)
The more things change... (Score:1)
I have seen ones that scanned EDGAR filings (got canceled when the company was destroyed in the 9/11 attack), campaign contributions (works wonderfully for the telco and other highly regulated industries), patent filings (generally surprisingly well, though no one knows why), job ads, and many others.
I even heard of one that analyzed free internet porn...(insert your favorite joke here, but it actually was a fairl