The Crazy Abilities of Mobile Phones for 3D Data Collection

Pixel8Earth
7 min read · Sep 20, 2019


One of the technology trajectories that gets us most excited at Pixel8 is the exponential improvement in smart phone cameras. The arms race between Google, Apple and Samsung has created a subsidized technical marvel that 35% of the world’s population has in their pocket. It wasn’t long ago that a smart phone photo meant a small, grainy JPEG you could text to a friend. Those days have rapidly disappeared and we are diving headlong into an amazing 3D future. To lay this out we will cover capabilities developed by 1) Apple, 2) Google and 3) Samsung et al.

Apple

Apple often gets knocked for being behind the competition with the iPhone, but for aspects of 3D capture they’ve been leading. Specifically, Apple has provided TrueDepth for iOS since WWDC 2018. There are two fascinating aspects to TrueDepth: 1) the ability to create a per-pixel depth map for 2D images taken by the iPhone and 2) streaming 3D point clouds of objects seen by the iPhone camera. For the static 2D depth map, the iPhone generates a distance value from the camera to each pixel instead of the color value it normally records when taking an image. Here is an example from Apple:

iPhone Depth Map Generation from Two Cameras

You’ll notice that the depth map is generated from two images taken by two different cameras (the telephoto camera and the wide-angle camera). For the iPhone it is the dual cameras that make the depth map possible:

Because the two parallel cameras are a small distance apart on the back of the device, similar features found in both images show a parallax shift: objects that are closer to the camera shift by a greater distance between the two images. The capture system uses this difference, or disparity, to infer the relative distances from the camera to objects in the image, as shown below.

This is similar to the stereo pairs used by satellites to create 3D models of terrain, just at a much smaller scale. In fact, scale is one of the biggest challenges with current smart phone generated depth maps. Apple’s TrueDepth distance calculations are only accurate out to five meters, so the feature is mostly useful for portraits, which is Apple’s focus. That said, depth maps are critical inputs for photogrammetry and ranges are likely to improve. It is definitely going in the right direction.
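To make the disparity idea concrete, here is a minimal sketch of the standard stereo relationship between disparity and depth, assuming a simple pinhole model. The focal length and baseline values are illustrative guesses for a phone-sized camera, not Apple’s published numbers:

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Convert a stereo disparity (in pixels) to depth (in meters).

    depth = focal_length * baseline / disparity
    A larger disparity means the object is closer to the cameras.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Example: ~1400 px focal length and a ~10 mm baseline between the two lenses.
print(depth_from_disparity(disparity_px=7.0, focal_length_px=1400, baseline_m=0.01))  # ~2.0 m
```

Because the baseline between the two lenses is only about a centimeter, small disparity errors translate into large depth errors at range, which is exactly why the useful range tops out at a few meters.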

This brings us to Apple’s more impressive innovation: streaming 3D point clouds. Building off of the depth map generation in TrueDepth, Apple takes frames from the video feed and calculates a point cloud from each frame’s depth map, which is pretty mind blowing. The depth map covered above gives a “Z” distance value for every pixel coordinate. To generate the 3D point cloud, the “X” and “Y” positions are calculated by subtracting the principal point from the pixel coordinate, multiplying by the depth and dividing by the focal length. Applying this to every pixel via the camera’s intrinsic matrix generates the point cloud:

Apple’s Equation for Calculating a Point Cloud from a Depth Map
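As a rough illustration of that unprojection, here is a minimal sketch using a generic pinhole intrinsic model with focal lengths (fx, fy) and principal point (cx, cy). This is our own illustrative code consistent with the equation above, not Apple’s API, and the numbers are made up:

```python
import numpy as np

def point_cloud_from_depth(depth, fx, fy, cx, cy):
    """Unproject a depth map (meters, shape HxW) into an Nx3 point cloud.

    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    Z = depth[v, u]
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no depth reading

# Example with a fake 4x4 depth map where everything sits 2 m away.
depth = np.full((4, 4), 2.0)
print(point_cloud_from_depth(depth, fx=1400, fy=1400, cx=2, cy=2).shape)  # (16, 3)
```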

Similar to the original depth map, this point cloud is made possible by harnessing two cameras: 1) an infrared (IR) camera and 2) a traditional RGB camera. The IR camera provides depth and the RGB camera provides color textures, resulting in an RGB-D output. In action this creates a streaming point cloud sent to the phone and viewed through the camera. Startups like 6D.ai have leveraged this capability brilliantly to do indoor mapping:

One downside to the point clouds is that they suffer from the same distance constraints on accuracy as the static depth maps. The accuracy of the point clouds deteriorates with the square of the distance (D²) from the camera. The combination of TrueDepth and point clouds has teed Apple up well for their investments in augmented reality and also set the stage for a powerful 3D capture platform.
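For a feel of what that quadratic falloff means, here is a minimal sketch using the common stereo error approximation, error ≈ Z² × disparity error / (focal length × baseline). The numbers are illustrative assumptions, not measured iPhone values:

```python
def stereo_depth_error(depth_m, focal_length_px, baseline_m, disparity_error_px=0.5):
    """Approximate depth uncertainty for a stereo rig.

    error ≈ Z^2 * disparity_error / (focal_length * baseline)
    Doubling the distance roughly quadruples the error.
    """
    return (depth_m ** 2) * disparity_error_px / (focal_length_px * baseline_m)

for z in (1, 2, 5):  # meters
    print(z, round(stereo_depth_error(z, focal_length_px=1400, baseline_m=0.01), 3))
# 1 0.036, 2 0.143, 5 0.893 -> the error grows ~4x every time the distance doubles
```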

Google

While Apple took the early lead with depth maps, Google was not sitting idle. Apple’s one-to-one relationship between hardware and software makes it a lot easier to push out new capabilities requiring a specific camera setup (dual cameras and an IR camera in this case). Google’s Android, on the other hand, is deployed across a plethora of different devices and hardware manufacturers. The Pixel, though, has given Google a consolidated platform to test with, and this is where we’ve been seeing their new features pushed first. Depth maps were no exception. Earlier this year Android Q started testing depth maps for images taken with a Pixel. Amazingly, this worked for every generation of Pixel, including those with only one camera.

How do you do stereo-based depth mapping with only one camera? Google had been experimenting with deep learning models to predict depth maps from images taken with a single camera. To do so, Google used a classic MacGyver-style rig to collect training data for their model:

Google Training Data Collection for Depth Map Prediction
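For intuition only, here is a heavily simplified sketch of what training such a single-image depth predictor could look like: a toy convolutional network regresses a depth map and is supervised with depth derived from the multi-camera rig. This is our own illustration of the general idea, not Google’s actual architecture or training pipeline:

```python
import torch
import torch.nn as nn

# Toy monocular depth network: RGB image in, one-channel depth map out.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

# Stand-ins for one training batch: single images plus "ground truth" depth
# that, in Google's case, came from the multi-camera capture rig.
images = torch.rand(8, 3, 64, 64)
target_depth = torch.rand(8, 1, 64, 64)

optimizer.zero_grad()
pred_depth = model(images)
loss = loss_fn(pred_depth, target_depth)
loss.backward()
optimizer.step()
print(float(loss))
```

Once trained on enough rig-captured data, a model like this only needs the single RGB image at inference time, which is what lets one-camera Pixels produce depth maps.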

The upshot is that, now that Android Q has been publicly released as Android 10, anyone running the OS can generate depth maps with their phone. This radically increases the percentage of the smart phone market that can generate depth maps, which is marvelous. Google didn’t stop with static depth maps and has also added the concept of dynamic depth maps, a.k.a. augmented reality photos:

An augmented reality (AR) photo is an image that contains the pose of the capturing device, lighting estimate information, horizontal and/or vertical surface planes in the world, and camera intrinsics.

This approach creates a host of fascinating metadata and data combinations.

Dynamic Depth stores the pose (that is, the position and orientation) of the camera(s) relative to the world, or the poses of objects (e.g. 3D assets) in the world. This enables applications to use multiple images together, as when mapping depth data onto a photograph, and provides information about the image capture, such as the position and orientation of the image sensor.

This also includes data placing objects in a relative coordinate system called “Realm” and an absolute coordinate system called “Earth” (a.k.a. WGS84). Lastly, there is an “Object” coordinate system for aligning with cloud anchors. We’ve not seen a quantification of accuracy for Google’s depth maps, but they are likely to exhibit a similar pattern to Apple’s.
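As a crude way to poke at these files, a Dynamic Depth JPEG is a container in which secondary images (such as the depth map) are concatenated after the primary photo, so you can often pull the embedded images out by scanning for JPEG start-of-image markers. The sketch below is a rough hack of ours, not Google’s recommended route (the proper approach is to parse the Dynamic Depth XMP metadata), and the filename is hypothetical:

```python
def split_embedded_jpegs(path):
    """Crudely split a Dynamic Depth JPEG into its concatenated JPEG parts.

    The container stores secondary images (e.g. the depth map) after the
    primary photo; each part begins with a JPEG start-of-image marker.
    Note: this can also pick up embedded EXIF thumbnails, so inspect the parts.
    """
    data = open(path, "rb").read()
    soi = b"\xff\xd8\xff"
    offsets = []
    i = data.find(soi)
    while i != -1:
        offsets.append(i)
        i = data.find(soi, i + 1)
    return [data[a:b] for a, b in zip(offsets, offsets[1:] + [len(data)])]

# parts[0] is the primary photo; later parts typically include the depth map.
# for n, part in enumerate(split_embedded_jpegs("ar_photo.jpg")):
#     open(f"part_{n}.jpg", "wb").write(part)
```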

Overall, Google has really pushed the wide-scale availability and adoption of 3D capture to the mobile masses. This is key to enabling a large percentage of the population to participate in all the possible opportunities 3D brings.

Samsung, Huawei, LG, etc.

While Google and Apple get most of the press, a variety of hardware vendors have quietly been pushing the envelope on multiple fronts. Arguably, Samsung has been the most progressive in next-generation camera technologies, specifically in developing “time of flight” (ToF) cameras, where the phone fires a laser pulse and times its return to calculate depth. This is the same principle that LiDAR and older technologies like Microsoft’s Kinect operate on. While Huawei, LG, Oppo and Honor View have all been shipping phones with ToF cameras this year, Samsung has gone a step further and shipped a 3D scanning capability.
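As a back-of-the-envelope illustration of the time-of-flight principle (not any particular vendor’s implementation), depth is simply the speed of light multiplied by half the measured round-trip time:

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def tof_depth(round_trip_seconds):
    """Depth from a time-of-flight measurement.

    The pulse travels out and back, so distance = speed of light * time / 2.
    """
    return SPEED_OF_LIGHT * round_trip_seconds / 2

# A ~6.7 nanosecond round trip corresponds to roughly one meter of depth,
# which hints at the timing precision these sensors need.
print(tof_depth(6.67e-9))  # ~1.0 m
```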

Samsung recommends limiting 3D scanning to objects smaller than 80 cm, but the direction and capabilities really start to open the aperture on what is possible. The amazing part, we find, is that capabilities once reserved for high-end 3D laser scanners costing $75,000+ are turning up in commodity phones. Granted, the ranges are vastly different at this point, but the range of smart phone sensors is only going to increase based on current trajectories.

The best part of the “camera phone” arms race is that we, the app developers and consumers, only benefit. Apple is broadly expected to ship a ToF camera in its 2020 iPhone to enable better augmented reality experiences. Possibly paired with an AR glasses release?

Conclusion

When we started looking at how we could scale data collection to build a high definition 3D map of the globe, we thought through several scenarios. While the need to leverage multiple sources was front of mind, there was a missing component in the traditional mix of satellites, aerial, drones and cars. Each of the traditional platforms had significant operational costs to scale collection globally. Only a few technology players have the resources to pull it off. As a startup, we certainly didn’t.

This led us down the path of looking to smart phones. While Apple, Google and Samsung’s use cases aren’t specific to global 3D mapping, the ingredients are all there, and the capabilities are all improving. The beauty is that the “arms race” across providers is subsidizing technical capabilities far beyond what they’d cost on their own. The market for phone hardware is intrinsically tied to massive software markets with fabulous margins. For the application developer, that means your per-unit hardware cost is functionally zero. When the competition is other apps, it is easy to overlook this. When the co-opetition is expensive dedicated professional hardware and operators, it potentially changes the game. At least that is what we want to find out.


Written by Pixel8Earth

We are building a multi-source 3D map of the globe one image at a time.
