Software Science

Recognizing Scenes Like the Brain Does

Roland Piquepaille writes "Researchers at the MIT McGovern Institute for Brain Research have used a biological model to train a computer model to recognize objects, such as cars or people, in busy street scenes. Their innovative approach, which combines neuroscience and artificial intelligence with computer science, mimics how the brain functions to recognize objects in the real world. This versatile model could one day be used for automobile driver's assistance, visual search engines, biomedical imaging analysis, or robots with realistic vision. Here is the researchers' paper in PDF format."
This discussion has been archived. No new comments can be posted.
  • by Anonymous Coward on Sunday February 11, 2007 @05:52PM (#17975790)
    I hate when these articles talk about some research, but there isn't so much as a block diagram to show how the model works...

  • by Trepidity ( 597 ) <[gro.hsikcah] [ta] [todhsals-muiriled]> on Sunday February 11, 2007 @07:06PM (#17976350)
    As someone in AI research myself, I'd say the more common reasons are:

    1. The code is in a horrible hacked-together state and so not really fit for release, and nobody wants to put in the effort that would be needed to clean it up; or

    2. The researchers don't want to release their code because keeping it secret creates a "research moat" that guarantees that they'll get to publish all the follow-up papers themselves, since anyone else who wanted to extend the work would have to first invest the time to reimplement it from scratch (this is more common in implementation-intensive areas like graphics)
  • by Wills ( 242929 ) on Sunday February 11, 2007 @07:14PM (#17976400)
    Apologies for blowing my own trumpet here, but there was much earlier work in the 1980s and 1990s on recognizing objects in images of outdoor scenes using neural networks that achieved similarly high accuracy to the system mentioned in this article:

    1. WPJ Mackeown (1994), A Labelled Image Database, unpublished PhD Thesis, Bristol University.

    Design of a database of colorimetrically calibrated, high quality images of street scenes and rural scenes, with highly accurate near-pixel ground-truth labelling based on a hierarchy of object categories. Example of labelled image from database [kcl.ac.uk]

    Design of a neural network system that recognized categories of objects by labelling regions in random test images from the database, achieving 86% accuracy.

    The database is now known as the Sowerby Image Database and is available from the Advanced Technology Centre, British Aerospace PLC, Bristol, UK. If you use it, please cite: WPJ Mackeown (1994), A Labelled Image Database, PhD Thesis, Bristol University.

    2. WPJ Mackeown, P Greenway, BT Thomas, WA Wright (1994).
    Road recognition with a neural network, Engineering Applications of Artificial Intelligence, 7(2):169-176.

    A neural network system that recognized categories of objects by labelling regions in random test images of street scenes and rural scenes, achieving 86% accuracy.

    3. NW Campbell, WPJ Mackeown, BT Thomas, T Troscianko (1997).
    Interpreting image databases by region classification. Pattern Recognition, 30(4):555-563.

    A neural network system that recognized categories of objects by labelling regions in random test images of street scenes and rural scenes, achieving 92% accuracy.

    There has been various follow-up research since then [google.com].

  • by NTiOzymandias ( 753325 ) on Sunday February 11, 2007 @07:18PM (#17976444)
    The paper claims the source code is (or will be) here [mit.edu]. Next time, ask the paper.
  • by odyaws ( 943577 ) on Monday February 12, 2007 @12:26AM (#17978836)
    Disclaimer: I work with the MIT algorithms daily and know several of the authors of this work (though I'm not at MIT).

    > This paper's claim to recognize scenes like the brain does is overdrawn. As far as I can tell from their paper (it is a journal version of their CVPR paper), only their low-level Gabor features are similar to what the brain does.

    Their low-level Gabor filters are indeed similar to V1 simple cells. The similarity between their model and the brain goes a lot further, though. The processing goes through alternating stages of enhanced feature selectivity with roughly Gaussian tuning (the S layers) and pooling over spatial location and scale via a max operation (the C layers). If you read more papers from their lab, you'll see there is a significant amount of biological plausibility in both of these operations, and a great deal of effort has gone into tuning the various layers to behave in accordance with physiological data.
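    For readers asking what the S/C alternation above looks like in practice, here is a toy sketch (purely illustrative, not the authors' code; 1-D numbers stand in for 2-D Gabor filter responses):

```python
import math

def s_layer(values, prototypes, sigma=1.0):
    # Selectivity: each unit responds with a Gaussian of the distance
    # between its input and a stored prototype (template matching).
    return [[math.exp(-(v - p) ** 2 / (2 * sigma ** 2)) for v in values]
            for p in prototypes]

def c_layer(responses, pool=2):
    # Invariance: max-pool over a local neighbourhood, trading spatial
    # precision for tolerance to position and scale.
    return [[max(row[i:i + pool]) for i in range(0, len(row), pool)]
            for row in responses]

# Toy input standing in for V1-like Gabor filter outputs.
features = [0.1, 0.9, 0.2, 1.1]
s1 = s_layer(features, prototypes=[1.0, 0.0])  # S layer: Gaussian tuning
c1 = c_layer(s1)                               # C layer: max pooling
```

    A real HMAX-style model stacks several such S/C pairs over image patches at multiple scales; this only shows the two operations the comment describes.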

    > The rest of the paper uses the currently popular bag-of-features model, which discards all spatial information between image features, which I don't think the brain does.

    The model is roughly equivalent to a bag-of-features one, but with the nice property (from a biologist's perspective) that it builds the bag in a biologically plausible way. The features themselves are picked randomly from natural images in a training stage that takes the place of human development. Discarding spatial information makes the model a lot more tractable, and it isn't clear what role spatial information plays in the processing of the ventral visual system, which is what their algorithm models.
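    To see concretely what "discards all spatial information" means, here is a minimal bag-of-features sketch (toy 1-D descriptors and a made-up codebook, not the paper's actual pipeline):

```python
def bag_of_features(descriptors, codebook):
    # Assign each local descriptor to its nearest codebook entry and
    # count occurrences; where in the image each descriptor came from
    # is thrown away -- only the histogram of "visual words" survives.
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda i: abs(d - codebook[i]))
        hist[nearest] += 1
    return hist

codebook = [0.0, 0.5, 1.0]          # "visual words" learned in training
image_a = [0.1, 0.45, 0.9, 0.95]    # descriptors from anywhere in the image
print(bag_of_features(image_a, codebook))  # [1, 1, 2]
```

    Any permutation of the descriptors produces the same histogram, which is exactly the spatial-information loss being discussed.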

    > Furthermore, for classification they consider a Support Vector Machine and boosting. Both of these classifiers are certainly not comparable to what the brain does. Why not use a neural network if their aim is to mimic the brain?

    They use these classifiers on top of their algorithm simply to determine how good the model is at extracting relevant feature information. Since they want to quantify how much information is there, it is wise to choose the best method they can to locate it.

    > Furthermore, they only consider feed-forward information, whereas research shows that there is at least as much information going back as there is going forward.

    Feedback is definitely very important (it is what my own research is about), but feedforward accomplishes a lot with a vastly simpler computational model.

    > Don't get me wrong, it is still a nice paper with good results (however, all Caltech datasets are highly artificial, with objects artificially rotated in one direction). So, nice paper, but to compare it with the workings of the human brain is too much.

    Here are the Caltech datasets they used: vision.caltech.edu [caltech.edu]. I think the "artificial" datasets you refer to are the "3D objects on turntable" ones, which are indeed a bit artificial. However, the images referred to in the paper discussed here are from the Caltech-101 dataset, which consists of real-world images of objects from 101 different categories - most of them not at all artificial.
  • Re:nothing new (Score:4, Informative)

    by kripkenstein ( 913150 ) on Monday February 12, 2007 @03:03AM (#17979806) Homepage
    I agree that the paper isn't revolutionary. In addition, it turns out that, after the 4-layer biologically-motivated system, they feed everything into a linear SVM (Support Vector Machine) or gentleBoost. For those that don't know, SVMs and Boosting are the 'hottest' topics in machine learning these days; they are considered the state of the art in that field. So basically what they are doing is providing some preprocessing before applying standard, known methods. (And if you say "but it's a linear SVM", well, it is linear because the training data is already separable.)
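    To make the "biologically-motivated preprocessing, then a standard linear classifier" point concrete, here is a toy perceptron used as a stand-in for the linear SVM (both learn a separating hyperplane w·x + b over the extracted features; purely illustrative, with made-up feature vectors):

```python
def train_linear(samples, labels, epochs=20, lr=0.1):
    # Perceptron training: nudge the hyperplane toward any sample it
    # currently misclassifies. Labels are -1 or +1.
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy feature vectors standing in for the model's preprocessed outputs.
X = [[1.0, 0.2], [0.9, 0.1], [0.1, 1.0], [0.2, 0.9]]
y = [1, 1, -1, -1]
w, b = train_linear(X, y)
```

    On linearly separable features - which, as noted above, the preprocessed training data apparently is - a linear decision boundary is all the final classifier needs.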

    That said, they do present a simple and biologically-motivated preprocessing layer that appears to be useful, which reflects back on the brain. In summary, I would say that this paper helps more to understand brain functioning than to develop machines that can achieve human-like vision capabilities. So, very nice, but let's not over-hype it.
  • by HuguesT ( 84078 ) on Monday February 12, 2007 @09:28AM (#17981734)
    This is a nice paper by respected researchers in AI and vision; however, pretty much the entire content of the journal it was published in (IEEE Transactions on Pattern Analysis and Machine Intelligence) is at that level. Why single out this particular paper?

    Interested readers can browse the current and back issues of PAMI [ieee.org] and either go to their local scientific library (PAMI is recognisable from afar by its bright yellow cover) or search the web for interesting articles. Researchers often put their own papers on their home pages. For example, here is the publication page of one of the authors [mit.edu] (I'm not him).

    For the record, I think justifying various ad-hoc vision/image-analysis techniques by approximations of their biological underpinnings is of limited interest. When asked whether computers would one day think, Edsger Dijkstra famously answered, "Can submarines swim?" In the same manner, it has been observed that (for example) most neural network architectures make worse classifiers than standard logistic regression [usf.edu], not to mention Support Vector Machines [kernel-machines.org], which is what this article uses, BTW.

    The summary by our friend Roland P. is not very good:

    This versatile model could one day be used for automobile driver's assistance, visual search engines, biomedical imaging analysis, or robots with realistic vision


    • Working automated driving software already exists. The December 2006 issue of IEEE Computer magazine [computer.org] covered it. Read about the car that drove a thousand miles [unipr.it] on Italy's roads thanks to Linux, no less.
    • Visual search engines exist at the research level. The whole field is called "Content-Based Retrieval", and the main issue is not so much searching as formulating the question.
    • Biomedical image analysis has been going strong for decades and is used every day in your local hospital. Ask your doctor!
    • Robotic vision is pretty much as old as computers themselves. There are even fun robot competitions like RoboCup [robocup.org].


    I could go on with lists and links but the future is already here, generally inconspicuously. Read about it.
