The Once and Future 3D Map
A Little Background
Way back in 2008 there was a session at WhereCamp titled “Is 3D Shit”. It was a clickbait name to be sure, but it packed the room to discuss the value of 3D for mapping. Google and Microsoft had both made big investments to give cities gorgeous 3D treatments in their mapping platforms. A year earlier Google had launched Google SketchUp to crowdsource 3D buildings across the globe. Eleven years later, the details from that session are pretty blurry. Chaitanya’s blog post (above) is the only record I can find of the discussion, and he summed it up thus:
3D immersive environments are being prepared to provide a rich user experience and interaction. Thus allowing application developers of the future to build rich applications. I can think of a few use case scenarios in city planning, crime heat map patterns thru time, pollution heat maps thru time… Thats about all I can think of as potential applications at this point in time, but that is no reason to not build a data set that can open up the doors for developers to build applications that we today have not even thought of.
I do clearly remember Brian McClendon turning up, and after he explained Google’s rationale we all felt a bit silly about the title. In the clarity of hindsight, then and now, Brian was spot on. While 3D can at times feel like a superfluous feature for humans well versed in interpreting reality, it is critical both for augmented reality (AR) and for how robots (autonomous vehicles (AVs), drones, etc.) will navigate our future world. Where we struggled to find business use cases over a decade ago, we are awash in them now.
The Problematic Mismatch Between Now and Then
The brilliant 3D work done by the community circa 2008 was primarily driven by the use of satellite and aerial imagery to generate models of cities and terrain. The science behind the capture and 3D renderings is fascinating, and this video illustrating Google’s approach is brilliant. That said, the 3D renderings from aerial/satellite imagery tend to have a Monet quality: beautiful from a distance, but a bit of a melted-butter mess up close.
Considering these images are taken from mile(s) away, it is not a bad trade-off. We can see the photogrammetric difficulty satellite/aerial perspectives face in generating detailed terrestrial views of the earth in this image from SpaceNet below:
The challenge we face today is that the emerging use cases for 3D maps are primarily based on the need for terrestrial detail to enable immersive AR experiences and solid robotic navigation. Where multi-view stereo models from satellite/aerial imagery struggle the most is exactly where the market needs the greatest detail.
Computer Vision to the Rescue?
To fill this gap in the market there has been a surge in startups leveraging computer vision, advances in structure from motion and LiDAR to generate detailed terrestrial 3D models. Driven by the need for both detailed indoor and outdoor mapping, there have been impressive advancements in generating 3D point clouds and meshes for a variety of use cases. In the collective sprint to enable a new paradigm of 3D modeling, the rigor of absolute accuracy and spatial coordinate systems has been eschewed for the speed of relative accuracy and the simplicity of Cartesian coordinates. The simple way to think of the difference between the two is that absolute accuracy provides a rigorous x,y,z location on the earth’s surface, with a margin of error for matching that location repeatedly. Relative accuracy, on the other hand, could tell you, with a margin of error, the height of a curb and accurately describe its shape, but not place that curb accurately on the earth in relation to other known spatial coordinates. The current focus on relative accuracy has meant massive progress on the 3D modeling problem, but has in turn created a potential anchor for getting to scale.
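To make the distinction concrete, here is a minimal sketch of what promoting a relative measurement to absolute coordinates involves. It assumes a hypothetical curb point measured in a local east/north/up (ENU) frame plus a single surveyed anchor point; the coordinates and the pyproj/NumPy tooling are illustrative choices, not a description of any particular product.

```python
import numpy as np
from pyproj import Transformer

# A curb corner measured in a *relative* local frame: metres east, north
# and up from wherever the capture session happened to start.
local_enu = np.array([12.40, -3.75, 0.15])

# Anchoring the local frame requires one point with *absolute* accuracy,
# e.g. a hypothetical RTK fix for the session origin (Boulder, CO).
anchor_lat, anchor_lon, anchor_h = 40.0176, -105.2797, 1624.0

# Convert the anchor from geodetic (EPSG:4979) to earth-centred,
# earth-fixed (ECEF, EPSG:4978) coordinates.
to_ecef = Transformer.from_crs("EPSG:4979", "EPSG:4978", always_xy=True)
x0, y0, z0 = to_ecef.transform(anchor_lon, anchor_lat, anchor_h)

# Rotation taking local ENU axes to ECEF axes at the anchor point.
lat, lon = np.radians(anchor_lat), np.radians(anchor_lon)
enu_to_ecef = np.array([
    [-np.sin(lon), -np.sin(lat) * np.cos(lon), np.cos(lat) * np.cos(lon)],
    [ np.cos(lon), -np.sin(lat) * np.sin(lon), np.cos(lat) * np.sin(lon)],
    [         0.0,               np.cos(lat),               np.sin(lat)],
])

# The relatively accurate point now has an absolute position on the earth.
x, y, z = np.array([x0, y0, z0]) + enu_to_ecef @ local_enu
```

The math is simple; the hard part, and the expensive part, is getting that anchor fix accurately and repeatedly, which is the subject of the next section.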
The Single Sensor Siren
Operating with just Cartesian coordinates and relative accuracy works exceedingly well when you have just one source/sensor generating data. Measurements still work very well, and you have all the ingredients you need for a navigable 3D map. This single-data-source approach also maps well to Silicon Valley’s ethos of winner-take-all investment. In success there is one de facto search engine, one e-commerce hub, one ride-sharing service, etc. While the reality is always more complicated, it is a strong investment thesis throughout the start-up ecosystem. Thus the idea of one sensor/platform providing “the” 3D map of the future is enticing. The allure of relying on relative accuracy is further reinforced by the small geographic scope of the vast majority of early-stage efforts to leverage 3D maps for AVs and AR. The boutique nature of early 3D efforts today makes it easier to ignore the potential challenges of managing a global-scale data model and the imagery collection needed to feed it.
Further complicating the path forward is the extraordinary expense of collecting high-precision data with absolute accuracy. Capturing terrestrial data is primarily accomplished by specialty cars equipped with differential GPS (RTK), LiDAR and optical sensors. The vehicles can cost north of $200,000, and leasing data collection can cost upwards of $5,000 per kilometer. While a handful of players have the resources to do this at wide scale, the economics are a significant inhibitor to persistent data updates and coverage.
A Car/Plane Imaging Platform in Your Pocket
What if we could take the photogrammetric power of Google’s aerial 3D modeling and the ground perspective of HD mapping vehicles, then package them into an app on your phone? That is a core enabler for Pixel8Earth to economically map the world in 3D. There are 2.7 billion smartphones globally, enabling 35% of the world’s population to capture photos with location-rich EXIF data. Turning those photos into georectified 3D models with high-definition absolute accuracy is our mission. Success means not only do we have an economical data collection platform, but also the ability to conflate that data with any other geospatial data source for the planet. For instance, both phones and cars are bad at capturing roofs and tall buildings.
Further, satellites and aerial assets are fabulous at capturing huge swaths of geography where granular terrestrial detail is less critical.
In short, being able to use multiple sources/sensors (satellite, aerial, streetview, drone, LiDAR, et al.) for creating a 3D digital twin of the globe means we can get to true scale faster and adapt as new data collection capabilities emerge. Neither OpenStreetMap nor Google Maps was built from a single source of data, and our bet is that neither will be the future 3D HD map of the globe.
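As a small illustration of the location-rich EXIF data mentioned above, here is a minimal sketch that pulls a latitude/longitude out of a smartphone photo using Pillow. The filename is a placeholder, and this is a simplification rather than our actual ingestion pipeline; altitude, heading and timestamp tags are read similarly.

```python
from PIL import Image
from PIL.ExifTags import GPSTAGS, TAGS

def exif_lat_lon(path):
    """Return (lat, lon) from a photo's EXIF GPS block, or None."""
    exif = Image.open(path)._getexif() or {}
    gps = {}
    for tag_id, value in exif.items():
        if TAGS.get(tag_id) == "GPSInfo":
            gps = {GPSTAGS.get(k, k): v for k, v in value.items()}

    if "GPSLatitude" not in gps:
        return None

    def to_degrees(dms, ref):
        # EXIF stores degrees/minutes/seconds as rational numbers.
        d, m, s = (float(v) for v in dms)
        deg = d + m / 60.0 + s / 3600.0
        return -deg if ref in ("S", "W") else deg

    return (to_degrees(gps["GPSLatitude"], gps["GPSLatitudeRef"]),
            to_degrees(gps["GPSLongitude"], gps["GPSLongitudeRef"]))

print(exif_lat_lon("IMG_1234.jpg"))  # e.g. (40.0176, -105.2797)
```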
Delivering the Goods
This all sounds fantastic, but where are we on delivering the core capabilities behind the idea? Step one: a mobile app that will take standard smartphone photographs and generate a high-fidelity 3D model from them.
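To give a feel for what is happening under the hood, here is a minimal two-view structure-from-motion sketch in Python with OpenCV. The filenames, the guessed focal length and the two-photo setup are illustrative assumptions; a real pipeline chains many overlapping photos with calibrated intrinsics.

```python
import cv2
import numpy as np

# Two overlapping smartphone photos (placeholder filenames).
img1 = cv2.imread("photo_1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("photo_2.jpg", cv2.IMREAD_GRAYSCALE)

# 1. Find and match local features between the two views.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]  # ratio test

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# 2. Recover the relative camera pose. The focal length is a rough guess;
#    a real pipeline calibrates the camera or reads EXIF metadata.
h, w = img1.shape
K = np.array([[w, 0, w / 2], [0, w, h / 2], [0, 0, 1]], dtype=np.float64)
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# 3. Triangulate matched pixels into a sparse 3D point cloud.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
pts3d = (pts4d[:3] / pts4d[3]).T  # N x 3, in an arbitrary local frame
```

A production reconstruction refines everything with bundle adjustment and densifies the sparse cloud with multi-view stereo. Note that the result lives in an arbitrary local frame, exactly the relative-accuracy world described earlier, until it is georectified.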
From here we have the option of generating a mesh from the dense point cloud to fill in the details as a 3D surface.
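One common way to do this, sketched below with the open-source Open3D library, is Poisson surface reconstruction over the dense cloud. The filenames and parameters are placeholders, and this is one standard technique rather than necessarily the one behind our renderings.

```python
import open3d as o3d

# Load a dense point cloud (placeholder filename) and estimate normals,
# which Poisson reconstruction requires.
pcd = o3d.io.read_point_cloud("dense_cloud.ply")
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))

# Fit a watertight triangle mesh to the oriented points.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)
o3d.io.write_triangle_mesh("dense_mesh.ply", mesh)
```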
Obviously photogrammetric renderings are not where we’ve been spending our time, but you can begin to see the potential for photorealistic 3D renderings from the point clouds. Going back to the dense point clouds, we can leverage the fact that we are georectifying the data and begin to stitch collects together. First, let’s get some more collects of the St. Julien to flesh it out.
Next, let’s start to collect and align the rest of the block along Walnut St.
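Aligning overlapping collects like these is a classic point cloud registration problem. Below is a minimal sketch using Open3D’s ICP implementation; the filenames and distance threshold are placeholders. Because the collects are already georectified, a near-identity initial guess is reasonable and ICP only has to refine the fine alignment.

```python
import numpy as np
import open3d as o3d

# Two overlapping collects of the same block (placeholder filenames).
source = o3d.io.read_point_cloud("collect_walnut_1.ply")
target = o3d.io.read_point_cloud("collect_walnut_2.ply")

# Georectification gives a good starting point, so the initial transform
# is the identity; ICP refines the residual misalignment.
threshold = 0.5  # max correspondence distance, assumed metres
result = o3d.pipelines.registration.registration_icp(
    source, target, threshold, np.identity(4),
    o3d.pipelines.registration.TransformationEstimationPointToPoint())

print(result.fitness, result.inlier_rmse)  # how well the collects overlap
source.transform(result.transformation)    # snap the collects together
o3d.io.write_point_cloud("walnut_block_aligned.ply", source + target)
```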
Then we can turn the georectification volume up by conflating our smartphone ground collects with aerial imagery to build out our roofs.
The best part of this process is that the more detail we add to our collective 3D world, the better we can align future contributions. The accuracy of the map continues to improve as more data and nuance are added to the surface. The second best part of the process is that we have the potential to convert all the relative-accuracy computer vision work to a spatial coordinate system with absolute accuracy. We can have the best of both worlds.
How Do We Scale?
It is one thing to say there are 2.7 billion smartphones capable of collecting photos for our new 3D world, but an entirely different thing to make those contributions happen. We are forever inspired by OpenStreetMap and plan on testing the concept for 3D mapping by holding a “mapping party” in Boulder on October 6th, 2019. We’ll meet at the Bohemian Biergarten at 1pm and make a go at mapping downtown. Beers and pretzels at the pub afterwards.
Since any EXIF-enabled photo is fair game, we can augment community collections through a variety of public sources and potential partnerships. The key across these efforts will be creating a shared resource instead of walled gardens. The beauty of a spatial coordinate system is that we can align all our data in one global model and constantly improve on a collaborative ground truth in the process. We strongly believe there will be no single silver bullet for creating a high-definition 3D map of the globe. Collectively, a fusion of data sources, sensors and contributors can get us to the promised land.