Geoffrey Hinton has a hunch about what’s next for AI
Deep learning ushered in the latest revolution in AI, transforming computer vision and the field as a whole. Hinton believes that deep learning should be almost all that’s needed to fully replicate human intelligence.
But despite rapid progress, major challenges remain. Expose a neural network to an unfamiliar data set or a foreign environment, and it proves brittle and inflexible. Self-driving cars and essay-writing language generators impress, but things can go awry. AI visual systems can be easily confused: a coffee cup recognized from the side would be unknown viewed from above if the system had not been trained on that view; and with the manipulation of a few pixels, a panda can be mistaken for an ostrich, or even a school bus.
GLOM addresses two of the most difficult problems in visual perception: understanding a whole scene in terms of objects and their natural parts; and recognizing objects when seen from a new viewpoint. (GLOM’s focus is on vision, but Hinton hopes the idea could be applied to language as well.)
An object such as Hinton’s face, for example, is made up of his dog-tired eyes (too many people asking him questions; too little sleep), his mouth and ears, and a prominent nose, all topped by a tousle of mostly gray hair. And given that nose, he is easily recognized even at first glance in profile view.
These two factors, the part-whole relationship and the viewpoint, are, in Hinton’s view, essential to how humans see. “If GLOM ever works,” he says, “it will do perception in a way that is much more human-like than current neural networks.”
Grouping parts into wholes, however, can be a hard problem for computers, since parts are sometimes ambiguous. A circle could be an eye, or a donut, or a wheel. As Hinton explains, the first generation of AI vision systems tried to recognize objects by relying mostly on the geometry of the part-whole relationship: the spatial orientation between parts, and between parts and wholes. The second generation instead relied mostly on deep learning, training neural networks on large amounts of data. With GLOM, Hinton combines the best aspects of both approaches.
“There’s a kind of intellectual humility that I like about it,” says Gary Marcus, founder and CEO of Robust.AI and a well-known critic of the field’s heavy reliance on deep learning. Marcus admires Hinton’s willingness to challenge something that brought him fame, to acknowledge that it isn’t quite working. “It’s brave,” he says. “And it’s a great corrective to say, ‘I’m trying to think outside the box.’”
GLOM architecture
In developing GLOM, Hinton tried to model some of the mental shortcuts, intuitive strategies or heuristics, that people use to make sense of the world. “GLOM, and much of Geoff’s work, is about looking at the heuristics people seem to have, building neural networks that could themselves have those heuristics, and then showing that the networks do better at vision as a result,” says Nick Frosst, a computer scientist in Toronto who worked with Hinton at Google Brain.
With visual perception, one strategy is to parse the parts of an object, such as different facial features, and thereby understand the whole. If you see a certain nose, you might recognize it as part of Hinton’s face: a part-whole hierarchy. To build a better vision system, Hinton says, “I have a strong intuition that we need to use part-whole hierarchies.” The human brain understands this part-whole composition by creating what is called a “parse tree,” a branching diagram that shows the hierarchical relationship between the whole, its parts, and its subparts. The face itself sits at the top of the tree, and the component eyes, nose, ears, and mouth form the branches below.
One of Hinton’s main goals with GLOM is to replicate the parse tree in a neural network, which would distinguish it from the neural nets that came before. For technical reasons, this is hard to do. “It’s difficult because each image would be parsed by a person into a unique parse tree, so we would want a neural net to do the same,” Frosst says. “It’s tricky to get something with a static architecture (the neural net itself) to take on a new structure (a parse tree) for each new image it sees.” Hinton has made various attempts before. GLOM is a major revision of his previous attempt from 2017, combined with other related advances in the field.
[Image: a GLOM vector announcing “I’m part of a nose!”]
One general way to think about GLOM’s architecture is as follows: the image of interest (say, a photograph of Hinton’s face) is divided into a grid. Each region of the grid is a “location” on the image; one location might contain the iris of an eye, while another might contain the tip of the nose. Each location in the network has about five layers, or levels. And level by level, the system makes a prediction, with a vector representing the content or information. At a level near the bottom, the vector at the tip-of-the-nose location might predict: “I’m part of a nose!” And at the next level up, in building a more coherent representation of what it is seeing, the vector might predict: “I’m part of a face at side-angle view!”
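The article leaves the implementation open, but the state it describes, a grid of locations each holding one embedding vector per level, can be held in a single array. The sketch below is purely illustrative; all names and sizes are our own assumptions, not Hinton’s:

```python
import numpy as np

# Illustrative sketch of GLOM's state (names and sizes are assumptions):
# an image cut into a grid of "locations", each holding one embedding
# vector per level of the part-whole hierarchy.
GRID_H, GRID_W = 8, 8   # 8x8 grid of image locations
NUM_LEVELS = 5          # roughly: sub-part, part, object, scene levels
DIM = 16                # dimensionality of each embedding vector

rng = np.random.default_rng(0)

# state[row, col, level] is the vector at that location and level;
# a low level might come to mean "part of a nose", a higher level
# "part of a face at side-angle view".
state = rng.normal(size=(GRID_H, GRID_W, NUM_LEVELS, DIM))

print(state.shape)  # (8, 8, 5, 16)
```

In this picture, inference means iteratively refining every vector in `state` until neighboring locations settle on consistent predictions, as described next.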
But then the question is: do neighboring vectors at the same level agree? When they agree, the vectors point in the same direction, toward the same conclusion: “Yes, we both belong to the same nose.” Or, further up the parse tree: “Yes, we both belong to the same face.”
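“Pointing in the same direction” can be read as cosine similarity. A minimal helper for checking agreement between two vectors (our own illustration, not code from the paper) might look like this:

```python
import numpy as np

def agreement(u, v):
    """Cosine similarity: values near 1.0 mean the two vectors point the
    same way, i.e. they have reached the same conclusion ("same nose")."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

nose_a = np.array([1.0, 2.0, 0.5])
nose_b = np.array([1.1, 1.9, 0.6])   # a slight variation of the same idea
wheel  = np.array([-2.0, 0.1, 3.0])  # a different conclusion entirely

print(agreement(nose_a, nose_b) > 0.9)  # True: near-parallel vectors agree
print(agreement(nose_a, wheel) > 0.9)   # False: they disagree
```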
Seeking consensus about the nature of an object (about what, exactly, the object is, after all), GLOM’s vectors iterate, location by location and layer upon layer, interacting with neighboring vectors side to side as well as with vectors at the levels above and below.
However, Hinton says, the network’s averaging is not indiscriminate. It averages selectively, among neighboring predictions that display similarities. “This is a very well-known phenomenon in America; it’s called an echo chamber,” he says. “What you do is you only accept opinions from people who already agree with you; and then what happens is that you get an echo chamber where a whole bunch of people have exactly the same opinion. GLOM actually uses that in a constructive way.” The analogous phenomenon in Hinton’s system is those “islands of agreement.”
“Imagine a bunch of people in a room, shouting slight variations of the same idea,” says Frosst; or imagine those people as vectors pointing in slight variations of the same direction. “They would, after a while, converge on the one idea, and they would all feel it stronger, because it had been confirmed by the others around them.” That is how GLOM’s vectors reinforce and amplify their collective predictions about an image.
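This selective, echo-chamber-style averaging can be sketched as a similarity-weighted update, in which each vector moves toward the others in proportion to how much it already agrees with them. This is a toy illustration under our own assumptions (the weighting scheme and temperature are ours), not the paper’s actual update rule:

```python
import numpy as np

def echo_chamber_step(vectors, temperature=0.5):
    """One round of selective averaging over a set of same-level vectors.

    Each vector is pulled toward a weighted average of all the vectors,
    where the weights grow with existing agreement (cosine similarity):
    similar vectors reinforce each other, dissenters are largely ignored.
    """
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = normed @ normed.T                       # pairwise agreement
    weights = np.exp(sim / temperature)           # emphasize agreement
    weights /= weights.sum(axis=1, keepdims=True) # rows sum to 1
    return weights @ vectors                      # selective average

rng = np.random.default_rng(1)
# five vectors shouting slight variations of the same idea
crowd = np.array([1.0, 0.0, 0.0]) + 0.2 * rng.normal(size=(5, 3))
for _ in range(20):
    crowd = echo_chamber_step(crowd)

# after a few rounds the crowd has converged on a single direction
print(crowd.std(axis=0).max() < 1e-2)  # True: spread is now tiny
```

Because every row of `weights` is a convex combination, each round contracts the vectors toward a consensus; the exponential weighting just makes that consensus form among those who already agree.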
GLOM uses these islands of agreeing vectors to accomplish the trick of representing a parse tree in a neural network. Whereas some recent neural networks use agreement among vectors for activation, GLOM uses agreement for representation: building up representations of things in the net. For instance, when several vectors agree that they all represent part of the nose, their small cluster of agreement collectively represents the nose in the net’s parse tree for the face. Another smallish cluster of agreeing vectors might represent the mouth in the parse tree; and the big cluster at the top of the tree would represent the emergent conclusion that the image as a whole is Hinton’s face. “The way to represent a parse tree here,” Hinton explains, “is that at the object level you have a big island; the parts of the object are smaller islands; the subparts are even smaller islands, and so on.”
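Reading the parse tree off the network then amounts to finding those islands: grouping adjacent locations whose vectors at a given level nearly coincide. One toy way to do this (our own construction, not from the paper) is a flood fill over a cosine-similarity threshold:

```python
import numpy as np

def islands(vectors, grid, threshold=0.95):
    """Group grid locations into 'islands' of agreeing vectors.

    Two neighboring locations join the same island when their vectors'
    cosine similarity exceeds the threshold. Run at the object level this
    would yield one big island ("the face"); at a lower level, smaller
    islands ("the nose", "the mouth").
    """
    h, w = grid
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    labels = -np.ones(h * w, dtype=int)   # -1 means "not yet visited"
    next_label = 0
    for start in range(h * w):
        if labels[start] >= 0:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:                       # flood fill over agreeing neighbors
            i = stack.pop()
            r, c = divmod(i, w)
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < h and 0 <= cc < w:
                    j = rr * w + cc
                    if labels[j] < 0 and normed[i] @ normed[j] > threshold:
                        labels[j] = next_label
                        stack.append(j)
        next_label += 1
    return labels.reshape(h, w)

# a 2x4 grid: left half holds "nose" vectors, right half "mouth" vectors
nose, mouth = np.array([1.0, 0.0]), np.array([0.0, 1.0])
vecs = np.array([nose, nose, mouth, mouth,
                 nose, nose, mouth, mouth])
print(islands(vecs, (2, 4)))  # [[0 0 1 1]
                              #  [0 0 1 1]]
```

Running this at each level, and nesting the smaller islands inside the bigger ones, would give the big-island/smaller-islands structure Hinton describes.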
According to Hinton’s longtime friend and collaborator Yoshua Bengio, a computer scientist at the University of Montreal, if GLOM manages to solve the engineering challenge of representing a parse tree in a neural net, it would be a feat; it would be important for making neural nets work properly. “Geoff has produced amazingly powerful intuitions many times in his career, many of which have proved right,” Bengio says. “Hence I pay attention to them, especially when he feels as strongly about one as he does about GLOM.”
The strength of Hinton’s conviction is rooted not only in the echo-chamber analogy but also in the mathematical and biological analogies that inspired and justified some of the design decisions in GLOM’s novel engineering.
“Geoff is a highly unusual thinker in that he is able to leverage complex mathematical concepts and integrate them with biological constraints to develop theories,” says Sue Becker, a former student of Hinton’s who is now a computational cognitive neuroscientist at McMaster University. “Researchers who are more narrowly focused on either mathematical theory or neurobiology are much less likely to solve the infinitely compelling puzzle of how both machines and humans learn and think.”
Turning philosophy into engineering
So far, Hinton’s new idea has been well received, especially in some of the world’s greatest echo chambers. “On Twitter, I’m getting a lot of likes,” he says. And a YouTube tutorial laid claim to the term “MeGLOMania.”
Hinton is the first to admit that at present GLOM is little more than philosophical musing (he spent a year as a philosophy undergrad before switching to experimental psychology). “If an idea sounds good in philosophy, it is good,” he says. “How would you ever have a philosophical idea that just sounds like rubbish but turns out to be true? That wouldn’t pass as a philosophical idea.” Science, by comparison, is “full of things that sound like complete rubbish” but turn out to work remarkably well; neural networks, for example, he says.
GLOM is designed to be philosophically compelling. But will it work?