Google Open Sources Its Image-Captioning AI (zdnet.com) 40

An anonymous Slashdot reader quotes ZDNet: Google has open-sourced a model for its machine-learning system, called Show and Tell, which can view an image and generate accurate and original captions... The image-captioning system is available for use with TensorFlow, Google's open machine-learning framework, and boasts a 93.9 percent accuracy rate on the ImageNet classification task, inching up from previous iterations.

The code includes an improved vision model, allowing the image-captioning system to recognize different objects in images and hence generate better descriptions. The improved image model also aids the captioning system's powers of description, so that it not only identifies a dog, grass, and a frisbee in an image, but also describes the color of the grass and other contextual details.

  • Finally I can build that automatic nemesis-recognition missile.

  • When I dreamed of having an intelligent computer a decade or two ago, I never dreamed it could only be accomplished by sending queries to some big corporate-controlled cluster and getting responses back. I don't want to use Siri or Echo because of the spying that is, so far, inherent to this kind of AI, and because Amazon and Google exist mainly to sell us stuff, to exploit us, and to get us to buy more of something. When open-source AI is capable of doing something useful, then I will run it on my own machine.


    • Re: (Score:2, Informative)

      by Anonymous Coward

      It does stay on your machine. The Google Cloud Compute API doesn't even offer image captioning as a service right now. If you want to test this, you're going to have to get a nice NVIDIA GPU and compile their TensorFlow code by following the README.md on GitHub.

      The reality is this isn't a useful product for robotics because the output of the network is a natural language caption. If you wanted to use this model for robotics, you would chop off the classifier and use the pre-trained Inception v3 model for wha

    • I don't want to use Siri or Echo because of the spying that is, so far, inherent to this kind of AI, and because Amazon and Google exist mainly to sell us stuff, to exploit us, and to get us to buy more of something.

      Did anyone ever put a gun to your head and make you buy, say, or act against your wishes? You get exploited when you are dumb; it's as simple as that. Increase your awareness; don't blame or whine that your opponent is too strong.

  • With the advances in machine learning and the easy availability of tools like this, it would be so very satisfying to put serious time and energy into studying these interesting topics. However, like probably several others here, with a mortgage and, in my case, twins on the way, it is nearly impossible to break away from the day job...
    • by Anonymous Coward on Monday September 26, 2016 @01:08AM (#52960559)

      If you've got $1200, you've got enough money to play in the arena. If you want to do "DeepMind"-level work, you need a substantially larger farm of GPUs.

      If you don't feel a need to replicate the latest flashy advances, there's still plenty of opportunity to make really interesting contributions with an NVIDIA GTX 960, training networks on 28x28x1-resolution MNIST images.

      The time requirement is mostly reading, in 15-30 minute chunks. It took me a year of reading to feel fluent.

      Start here:
      http://www.dspguide.com/ch26.htm
      Then read these:
      https://en.wikipedia.org/wiki/Artificial_neuron
      https://en.wikipedia.org/wiki/Artificial_neural_network
      https://en.wikipedia.org/wiki/Multilayer_perceptron
      https://en.wikipedia.org/wiki/Softmax_function
      http://stats.stackexchange.com/questions/126238/what-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-network
      http://image.slidesharecdn.com/cnn-toupload-final-151117124948-lva1-app6892/95/convolutional-neural-networks-cnn-44-638.jpg?cb=1455889178
      (TL;DR: Using sigmoid/tanh as your transfer function suffers from something called "vanishing gradients", where the derivative (used for backpropagation) approaches zero as the weights of the network become large. Restricted Boltzmann Machines (RBMs) use an alternative to backpropagation known as "contrastive divergence", so it was popular to stack them into "deep belief networks" (just a multi-layer RBM trained one layer at a time). The ReLU transfer function has grown popular because it sidesteps the vanishing-gradient problem, which means you can safely skip RBMs and DBNs in your reading, at least initially.)
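
      To make the vanishing-gradient point concrete, here is a minimal numpy sketch (the inputs are made-up values for illustration):

      import numpy as np

      def sigmoid_grad(x):
          s = 1.0 / (1.0 + np.exp(-x))
          return s * (1.0 - s)            # peaks at 0.25, shrinks toward 0 as |x| grows

      def relu_grad(x):
          return 1.0 if x > 0 else 0.0    # constant 1 along any active path

      for x in [0.0, 2.0, 5.0, 10.0]:
          print("x=%5.1f  sigmoid'=%.6f  relu'=%.0f" % (x, sigmoid_grad(x), relu_grad(x)))
      # Backpropagation multiplies one such derivative per layer, so a deep stack
      # of sigmoids multiplies many numbers <= 0.25 together and the gradient vanishes.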

      Then read these:
      https://en.wikipedia.org/wiki/Support_vector_machine
      https://en.wikipedia.org/wiki/Convolutional_neural_network (Will explain what "Pooling Layers" are)
      https://www.reddit.com/r/MachineLearning/comments/3klqdh/q_whats_the_difference_between_crossentropy_and/

      Difference between "regression" and "classification":
      A regression network outputs the activations of its output neurons directly, while a classifier network passes them through the softmax function so that all the outputs sum to one and can be read as probabilities.
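
      A tiny numpy sketch of that distinction (the logits are invented numbers):

      import numpy as np

      def softmax(z):
          e = np.exp(z - np.max(z))       # subtract the max for numerical stability
          return e / e.sum()

      logits = np.array([2.0, 1.0, 0.1])  # raw output-neuron activations
      print(logits)                       # a regression network emits these directly
      print(softmax(logits))              # a classifier squashes them into probabilities
      print(softmax(logits).sum())        # always 1.0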

      The most important thing to understand: it is trivial to train a neural network to perform well on its own training data (that's what backpropagation DOES). What is difficult is collecting enough data (preferably labeled) that you can hold out a significant portion for validation (to catch overtraining), plus another holdout set for TESTING. Your goal is to teach the network to generalize to the general case; the techniques that encourage this are called "regularization". The test holdout set is for verifying that you didn't overfit the validation data by "hill climbing" on it.
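
      For example, a minimal three-way split (the sizes are invented; scale to your dataset):

      import numpy as np

      rng = np.random.default_rng(0)
      idx = rng.permutation(10000)        # indices of a hypothetical dataset
      train, val, test = idx[:7000], idx[7000:8500], idx[8500:]
      # Fit on train, tune hyperparameters against val, and touch test exactly once
      # at the end; if val keeps improving while test doesn't, you have been
      # hill-climbing the validation set.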
      Cool trick: https://en.wikipedia.org/wiki/Dropout_(neural_networks)
      http://fastml.com/regularizing-neural-networks-with-dropout-and-with-dropconnect/
      https://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_topologies (Neural Networks meet Evolutionary Algorithms)
      https://people.cs.uct.ac.za/~gnitschke/projects/papers/2009-Niche%20Particle%20Swarm%20Optimization%20for%20Neural%20Network%20Ensembles.pdf
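
      The dropout trick from the first link above boils down to a few lines; here is a numpy sketch of the common "inverted dropout" variant (the layer shape is arbitrary):

      import numpy as np

      rng = np.random.default_rng(0)

      def dropout(h, keep_prob=0.5, training=True):
          if not training:
              return h                            # use the whole network at test time
          mask = rng.random(h.shape) < keep_prob  # randomly silence neurons
          return h * mask / keep_prob             # rescale to keep the expected value

      h = rng.random((4, 8))                      # a made-up hidden-layer activation
      print(dropout(h).round(2))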

      Other things to know: the learning rate is how quickly the network adjusts its weights (how far you jump around during stochastic gradient descent). Bigger steps mean a faster approach to local minima, but you tend to "overshoot" the high-performing valleys and get stuck on the low-performing surface. This is why it's generally a good idea to "anneal" your learning rate over time (see the toy example after the link below).
      http://sebastianruder.com/optimizing-gradient-descent/
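
      Here is a toy illustration, minimizing f(w) = w^2 with plain gradient descent (all numbers are arbitrary):

      # With lr near 1, each update overshoots w = 0 and oscillates across the
      # valley; halving lr every 10 steps lets w settle into the minimum.
      w, lr = 5.0, 0.95
      for step in range(40):
          w -= lr * 2 * w                 # the gradient of w**2 is 2*w
          if step % 10 == 9:
              lr *= 0.5                   # anneal the learning rate
      print(w)                            # ends up very close to 0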

      Other cool things to learn about:
      Autoencoders and "transfer learning", i.e. you can get most of the value of Google's enormous GPU farms by simply downloading their pretrained Inception models, then using them as pretrained feature extractors for other experiments.
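
      A sketch of that idea; note this assumes the current Keras "applications" API rather than the slim checkpoints that were current when this was posted:

      import numpy as np
      from tensorflow.keras.applications import InceptionV3

      # include_top=False chops off the classifier, leaving the convolutional features.
      base = InceptionV3(weights="imagenet", include_top=False, pooling="avg")
      images = np.random.rand(2, 299, 299, 3).astype("float32")  # stand-in images
      features = base.predict(images)     # one 2048-d feature vector per image
      print(features.shape)               # (2, 2048): feed these to your own model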

      Caffe vs. TensorFlow vs. Keras vs. Torch? I vote: TensorFlow.
      https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html
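
      The linked tutorial targets the old r0.9 API; the same softmax-regression exercise written against today's tf.keras looks roughly like this:

      import tensorflow as tf

      (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
      x_train, x_test = x_train / 255.0, x_test / 255.0    # scale pixels to [0, 1]

      model = tf.keras.Sequential([
          tf.keras.layers.Flatten(input_shape=(28, 28)),   # the 28x28x1 images above
          tf.keras.layers.Dense(10, activation="softmax"), # a single softmax layer
      ])
      model.compile(optimizer="sgd",
                    loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])
      model.fit(x_train, y_train, epochs=5, validation_split=0.1)
      print(model.evaluate(x_test, y_test))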

      Good luck!

      • Wow! Thanks so much for your friendly and ultra-helpful reply. This will really help get me started. There is so much cynicism here on /. that it is wonderful to read such an informative reply. Thanks again!
    • It's all about ordering your desires: D1, D2, D3... You can always break away from foo if your desire for bar is higher. If you can give up D3 for D2, and D2 for D1, you can realize any D1. [No one forced anyone to have a mortgage, or even kids, or to raise them in an expensive place and lifestyle, or made anyone sit in a cubicle to pick up a paycheck.] When a person lacks courage or passion for D1, he or she starts blaming the environment, or complains there's too much cynicism around.
      • Not blaming anything.
        No one indeed forced me to have kids, but it was something -let's call it D1- that we found very important, more important than my other personal interests.
        Also, having your own house paid off is actually a good way to keep poverty at bay when you're old.
        So, not blaming anything; I'm not even unhappy with my job, and my own family priorities are more important. ML is a personal interest that I hope to develop.
        • Sorry, then why do you say something is going to be "so very satisfying"? I assume "so very" means it falls among a person's top five desires. I like many things in life, but I won't call them "so very satisfying". In that case I would start throwing away the less important stuff and focus on my top few. In fact, life taught me I can't even have D2 if I want D1.
  • by Gravis Zero ( 934156 ) on Sunday September 25, 2016 @11:19PM (#52960239)

    The nerve of this infernal program is so obscene it must be untenable! It captioned my dick pic as "YAUPFAN (Yet Another Unimpressive Penis From A Narcissist)"! Kudos for having it create its own acronyms, but I won't stand for a machine-generated insult and neither should you! Though if you have a standing desk, it's cool, I totally get it. ;)

  • Average Slashdotter: Lemme try it on a selfie!

    Computer returns analysis title: "300 Lb. Fatass Masturbating"

    Slashdotter: Eerie!

  • Still not as impressive as the one that invented toothpaste and made art.
