Reblog: Using Machine Learning to Explore Neural Network Architecture

At Google, we have successfully applied deep learning models to many applications, from image recognition to speech recognition to machine translation. Typically, our machine learning models are painstakingly designed by a team of engineers and scientists. This process of manually designing machine learning models is difficult because the search space of all possible models can be combinatorially large — a typical 10-layer network can have ~10^10 candidate networks! For this reason, the process of designing networks often takes a significant amount of time and experimentation by those with significant machine learning expertise.
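A rough back-of-the-envelope check of that number (the per-layer figure below is my assumption, not from the post): if each of 10 layers offers on the order of 10 candidate configurations and the choices are independent, the options multiply out to 10^10 networks.

```python
# Back-of-the-envelope check (assumed numbers, not from the original post):
# ~10 candidate configurations per layer, chosen independently across 10 layers.
choices_per_layer = 10
num_layers = 10
print(choices_per_layer ** num_layers)  # 10,000,000,000 candidate networks
```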

Our GoogLeNet architecture. Design of this network required many years of careful experimentation and refinement from initial versions of convolutional architectures.

To make this process of designing machine learning models much more accessible, we’ve been exploring ways to automate the design of machine learning models. Among many algorithms we’ve studied, evolutionary algorithms [1] and reinforcement learning algorithms [2] have shown great promise. But in this blog post, we’ll focus on our reinforcement learning approach and the early results we’ve gotten so far.

In our approach (which we call “AutoML”), a controller neural net can propose a “child” model architecture, which can then be trained and evaluated for quality on a particular task. That feedback is then used to inform the controller how to improve its proposals for the next round. We repeat this process thousands of times — generating new architectures, testing them, and giving that feedback to the controller to learn from. Eventually, the controller learns to assign a high probability to areas of architecture space that achieve better accuracy on a held-out validation dataset, and low probability to areas of architecture space that score poorly.
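To make that loop concrete, here is a toy sketch — entirely my own simplification, not Google's controller RNN. A tabular "controller" keeps one categorical distribution over a few layer-width choices per layer, samples child architectures, scores them with a stub `evaluate_child()` (standing in for actually training a child model and measuring validation accuracy), and applies a REINFORCE-style update so better-than-average choices become more probable.

```python
import math
import random

# Toy sketch of the controller/child feedback loop. NOT Google's AutoML
# controller (a recurrent net trained with reinforcement learning); this
# stand-in keeps one categorical distribution per layer and nudges it toward
# choices that appeared in higher-scoring architectures.

LAYER_CHOICES = [16, 32, 64, 128]   # e.g. filter counts per layer (assumed)
NUM_LAYERS = 4
LEARNING_RATE = 0.5

# One vector of preference scores (logits) per layer.
logits = [[0.0] * len(LAYER_CHOICES) for _ in range(NUM_LAYERS)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample_architecture():
    """Controller proposes a child architecture: one choice index per layer."""
    arch = []
    for layer in range(NUM_LAYERS):
        probs = softmax(logits[layer])
        arch.append(random.choices(range(len(LAYER_CHOICES)), weights=probs)[0])
    return arch

def evaluate_child(arch):
    """Stub reward: pretend wider layers give slightly better validation accuracy."""
    return sum(LAYER_CHOICES[i] for i in arch) / (NUM_LAYERS * max(LAYER_CHOICES))

baseline = 0.0
for step in range(200):
    arch = sample_architecture()
    reward = evaluate_child(arch)
    baseline = 0.9 * baseline + 0.1 * reward      # moving-average baseline
    advantage = reward - baseline
    for layer, choice in enumerate(arch):
        probs = softmax(logits[layer])
        # REINFORCE-style update: raise the log-probability of sampled choices
        # in proportion to how much better than average the child scored.
        for k in range(len(LAYER_CHOICES)):
            grad = (1.0 if k == choice else 0.0) - probs[k]
            logits[layer][k] += LEARNING_RATE * advantage * grad

best = [LAYER_CHOICES[max(range(len(LAYER_CHOICES)), key=lambda k: logits[l][k])]
        for l in range(NUM_LAYERS)]
print("Most likely architecture:", best)
```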

We’ve applied this approach to two heavily benchmarked datasets in deep learning: image recognition with CIFAR-10 and language modeling with Penn Treebank.

from Pocket

Did you enjoy this article? Then read the full version from the author’s website.

Musings about when and how this research might manifest in human products are welcome. Tweet to me @dilan_shah.

Reblog: Why Voxels Are the Future of Video Games, VR, and Simulating Reality

At VRLA this past month, I had the opportunity to see first-hand how the technology gap is closing in terms of photorealistic rendering in virtual reality. Using the ODG R-9 Smartglasses, Otoy was showing a CG scene rendered with OctaneRender that was so realistic I couldn’t tell whether or not it was real. The ORBX VR media file produced when you build a scene with Octane can be played back at 18K on the Gear VR, and Unity and Otoy are actively working to integrate Otoy’s rendering pipeline into the Unity 2017 release of the engine. In short, with a light-field render option, you can move your head around within the scene, provided the device’s positional tracking allows for it.


Octane is an unbiased renderer. In computer graphics, unbiased rendering is a method of rendering that does not introduce systematic errors or distortions into the estimation of illumination. In the early 2010s, Octane became a pipeline used mainly for visualization work, rendering everything from a single tree to an entire building for architects. And VFX knowledge that took root at Sony Pictures Imageworks about 13 years ago is now coming to VR content through Magnopus (see the Foundry talk “How VR, AR, & MR Are Driving True Pipeline Convergence”).
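To illustrate what “no systematic errors” means, here is a tiny Monte Carlo example of my own (not from the article, and not a renderer): the estimator’s expected value equals the true integral, so the only error is noise that shrinks as samples accumulate — which is why an unbiased render converges to the correct image as sample counts grow.

```python
import random

# Toy illustration of an *unbiased* Monte Carlo estimator, the idea behind
# unbiased renderers: no systematic error, only variance that averages out.
# f(x) is a stand-in for an illumination integrand, not real light transport.

def f(x):
    return x * x  # true integral over [0, 1] is 1/3

def mc_estimate(num_samples):
    # Average f at uniformly random points: E[estimate] = true integral.
    total = sum(f(random.random()) for _ in range(num_samples))
    return total / num_samples

for n in (10, 1_000, 100_000):
    print(n, mc_estimate(n))  # converges toward 1/3, with no systematic bias
```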

A change of pace: the voxel, so named as a shortened form of “volume element,” is kind of like an atom. It represents a value on a regular grid in three-dimensional space.

This is analogous to a texel, which represents 2D image data in a bitmap (which is sometimes referred to as a pixmap). As with pixels in a bitmap, voxels themselves do not typically have their position (their coordinates) explicitly encoded along with their values. Instead, the position of a voxel is inferred based upon its position relative to other voxels (i.e., its position in the data structure that makes up a single volumetric image). In contrast to pixels and voxels, points and polygons are often explicitly represented by the coordinates of their vertices. A direct consequence of this difference is that polygons are able to efficiently represent simple 3D structures with lots of empty or homogeneously filled space, while voxels are good at representing regularly sampled spaces that are non-homogeneously filled. [1]
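A minimal sketch of that difference (my own toy example, not from the article): a dense voxel grid stores only values, and a voxel’s (x, y, z) is recovered from where it sits in the underlying array, whereas a polygon mesh stores vertex coordinates explicitly.

```python
# Toy dense voxel grid: coordinates are implicit in the array layout.
class VoxelGrid:
    def __init__(self, nx, ny, nz, fill=0):
        self.nx, self.ny, self.nz = nx, ny, nz
        self.values = [fill] * (nx * ny * nz)  # values only, no coordinates stored

    def _index(self, x, y, z):
        # The flat-array index encodes the position in space.
        return x + self.nx * (y + self.ny * z)

    def get(self, x, y, z):
        return self.values[self._index(x, y, z)]

    def set(self, x, y, z, value):
        self.values[self._index(x, y, z)] = value

# By contrast, a polygon stores coordinates explicitly for each vertex:
triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

grid = VoxelGrid(16, 16, 16)
grid.set(3, 5, 7, 1)       # mark one "filled" voxel
print(grid.get(3, 5, 7))   # -> 1
```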

Within the last year, I’ve seen Google’s Tango launch on the Lenovo Phab 2 Pro (and soon the Asus ZenFone AR), Improbable’s SpatialOS demoed live, and, at Otoy’s booth at VRLA, the ODG R-9 displaying an incredibly realistic scene rendered entirely from some form of point cloud data.

“The ability to more accurately model reality in this manner should come as no surprise, given that reality is also voxel based. The difference being that our voxels are exceedingly small, and we call them subatomic particles.”

[1] Wikipedia: Voxel
from Pocket

Did you enjoy this article? Then read the full version from the author’s website.