OpenARK Drone-based 3D Reconstruction
CS 194-26/294-26 Fall 2020
Nitzan Orr, Shubha Jagannatha
link to our research paper
link to our video presentation
Our team has been working this semester to design, implement, and test two end-to-end pipelines for conducting 3D reconstruction via a drone in real time.
Some of the technical challenges we encountered which relate to class topics included:
We faced a number of other technical challenges with hardware, streaming, and the ZED API, which are discussed in more detail below.
Real-time drone-based 3D reconstruction has an extremely powerful use case in drone navigation, particularly autonomous navigation.
Here are some of the typical issues faced when piloting a drone remotely:
With real-time 3D reconstruction, drones can generate a mesh to be used for navigation. This mesh can either be sent back to a human pilot for steering the drone or it can be used by the drone itself for autonomous navigation.
Additionally, the generated mesh can be used to map an environment for use in graphics and VR applications.
We split our research group into two teams: Team ARK (Shubha) and Team ZED (Nitzan).
Team ARK’s main focus was to create a pipeline using the Vive Center’s open source augmented reality SDK, OpenARK, as the base for our 3D reconstruction and try out OpenARK’s SLAM.
Team ZED’s goal was to use ZED’s built-in SDK as a base for 3D reconstruction and to try out ZED SLAM.
We chose to use the ZED 2 RGB-D camera, which is known to work well outdoors. With each frame, we grab the RGB image, depth image, IMU data, and timestamp. On the drone itself, we chose the lightweight yet powerful Jetson NX to run the SLAM algorithm and stream the data. Both were installed on the drone via custom mounts designed and 3D printed by our team.
Lastly, we aimed to stream the data to a powerful, on-site server to conduct 3D reconstruction. For our testing purposes, we streamed to our laptops.
Before attempting to get our “online” or real time 3D reconstruction working, our team had to work through the challenge of making OpenARK compatible with the ZED 2 camera. One of the biggest roadblocks was getting the output format of the ZED 2 camera to match the necessary inputs needed for OpenARK’s 3D reconstruction and SLAM algorithms.
OpenARK takes in the RGB images, depth image, IMU data, and timestamp from the camera. The resolution of images provided by the camera (1280 x 720) was different from the resolution needed for OpenARK (640 x 480), so our team had to find a means to modify our camera intrinsics to work well for a downsampled image.
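The intrinsics adjustment can be sketched as below, assuming a standard pinhole model; the focal length and principal point values are illustrative placeholders, not the actual ZED 2 calibration. Note that going from 1280x720 to 640x480 changes the aspect ratio, so the horizontal and vertical scale factors differ:

```python
# Sketch of rescaling pinhole intrinsics when downsampling an image.
# The input values here are illustrative, not the real ZED 2 calibration.

def rescale_intrinsics(fx, fy, cx, cy, src_wh, dst_wh):
    """Scale focal lengths and principal point by the resize ratios."""
    sx = dst_wh[0] / src_wh[0]   # horizontal scale factor
    sy = dst_wh[1] / src_wh[1]   # vertical scale factor
    return fx * sx, fy * sy, cx * sx, cy * sy

# e.g. downsampling 1280x720 (ZED 2 output) to 640x480 (OpenARK input)
fx, fy, cx, cy = rescale_intrinsics(700.0, 700.0, 640.0, 360.0,
                                    (1280, 720), (640, 480))
```

Because 1280→640 halves the width while 720→480 scales the height by 2/3, fx and fy end up scaled differently, which is exactly why the original intrinsics cannot be reused as-is.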
After obtaining the downsampled images, we fed these images and IMU data into OpenARK’s SLAM algorithm which is built with OKVIS (open keyframe-based visual-inertial SLAM). Due to the “black box” nature of ZED’s API, our team was unsure of whether the IMU data required was calibrated or uncalibrated, so we had to run a number of tests to determine whether calibrated or uncalibrated IMU data gave the best results with OpenARK.
With the pose estimation obtained through OpenARK’s SLAM, we project the depth data into a point cloud. We iterate through this point cloud and integrate every measurement point into the TSDF volume. Lastly, we extract the triangle mesh from this volume using the marching cubes algorithm. Here is OpenARK’s 3D Reconstruction and SLAM pipeline.
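The depth-to-point-cloud step above can be sketched as follows. This is an illustrative NumPy version, not OpenARK's implementation; it assumes depth in meters, a pinhole camera, and a 4x4 camera-to-world pose matrix from SLAM:

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy, T_wc):
    """Back-project a depth image (meters) into world-frame 3D points.

    T_wc is the 4x4 camera-to-world pose estimated by SLAM.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx                 # pinhole back-projection
    y = (v - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    pts_world = (T_wc @ pts_cam.T).T[:, :3]
    valid = depth.reshape(-1) > 0         # drop invalid (zero) depth pixels
    return pts_world[valid]

# toy example: a 2x2 depth image with one invalid pixel, identity pose
depth = np.array([[1.0, 2.0], [0.0, 1.5]])
pts = depth_to_pointcloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5,
                          T_wc=np.eye(4))
```

Each resulting point would then be integrated into the TSDF volume before marching cubes extracts the triangle mesh.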
Through our implementation and testing of offline 3D reconstruction, our team was able to get good outputs from OpenARK's 3D reconstruction but was unable to get great results from OpenARK's SLAM. This is potentially because OpenARK SLAM's loop closure functionality did not work at the time. Additionally, the timestamps we were receiving from the camera had a non-negligible amount of error, since they were system timestamps, not hardware timestamps. We ultimately decided to use OpenARK's 3D reconstruction with ZED's SLAM output in our Team ARK pipeline. Here are some outputs with OpenARK's 3D reconstruction and ZED's SLAM.
We want the drone operator to be able to inspect the map in real time as it's being recorded from the drone and reconstructed. However, the first step in 3D reconstruction is simultaneous localization and mapping (SLAM), which requires more compute than is available on board the drone. We remedied this by streaming the captured data from the ZED 2 to a ground station computer. Visual SLAM requires image and depth data, and improves its estimates with inertial data. Thus, our application encodes and sends 720p RGB images, depth images, and inertial measurement unit (IMU) data over synchronized streams at 15 fps via a WiFi connection. On the remote computer, our application receives the data and uses visual-inertial SLAM to estimate the camera pose for each frame.
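One way to keep the three streams aligned is to tag every frame with a shared timestamp in a fixed wire header, with the encoded image payloads following it. The format below is a hypothetical illustration of that idea, not the actual protocol our application uses:

```python
import struct

# Hypothetical per-frame header for a synchronized RGB + depth + IMU stream.
# Little-endian: timestamp (ns), accel xyz, gyro xyz, then payload lengths.
HEADER_FMT = "<Q6fII"  # u64 ts, 6 x f32 IMU, u32 rgb_len, u32 depth_len

def pack_header(ts_ns, accel, gyro, rgb_len, depth_len):
    """Serialize one frame header; image bytes would follow on the wire."""
    return struct.pack(HEADER_FMT, ts_ns, *accel, *gyro, rgb_len, depth_len)

def unpack_header(buf):
    ts_ns, ax, ay, az, gx, gy, gz, rgb_len, depth_len = struct.unpack(
        HEADER_FMT, buf)
    return ts_ns, (ax, ay, az), (gx, gy, gz), rgb_len, depth_len
```

Because RGB, depth, and IMU samples for a frame travel under one timestamp, the receiver can hand SLAM a consistent (image, depth, IMU, timestamp) tuple even if packets arrive with jitter.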
On the ground station computer, we utilized the SLAM library provided by the ZED SDK to calculate the left RGB sensor's pose in each frame while building a representation of the environment. The library requires an RGB image, a depth image rectified to that RGB image, accelerometer and gyroscope data, and a timestamp for each frame. Given these visual observations, IMU data, and timestamps, SLAM estimates the camera pose and the map.
As the camera observes more of the environment, a larger map representation is formed; however, this map is not human-readable, so we use 3D reconstruction as well. Using the ZED SDK and OpenGL, we render each frame in a graphical 3D environment and visualize the areas of the camera's perspective that have been mapped. Once we're done mapping, we extract the 3D mesh using the ZED SDK, which also texturizes the data. The 3D mesh results from projecting each pixel based on its depth and then filtering the point cloud to create a compressed representation of the scene's geometry. Pictured below are the real-time visualization of the mesh being created and the final mesh output. Check out our demo video for footage of the mesh being created with ZED 2!
After developing our offline 3D reconstruction algorithm, converting it to real time was a matter of reading in each additional frame obtained through the camera and using the projected depth points to continually update our TSDF volume. The mesh is extracted after every volume update so that users can visualize it in real time, as shown in the image below. Pictured are the real-time visualization of the mesh being created and the final mesh obtained. Check out our demo video for footage of the mesh being created with OpenARK!
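The per-voxel rule behind this incremental integration is a weighted running average of truncated signed distances. The sketch below shows that rule for a single voxel with illustrative constants; OpenARK applies the same idea over the whole voxel grid:

```python
# Minimal single-voxel sketch of an incremental TSDF update: each new
# frame's signed-distance observation is truncated, normalized, and
# averaged into the stored value with a running weight.
# trunc and max_weight here are illustrative, not OpenARK's settings.

def tsdf_update(tsdf, weight, sdf_obs, trunc=0.05, max_weight=100.0):
    d = max(-1.0, min(1.0, sdf_obs / trunc))   # truncate and normalize
    new_tsdf = (tsdf * weight + d) / (weight + 1.0)
    new_weight = min(weight + 1.0, max_weight)  # cap so old frames fade
    return new_tsdf, new_weight

tsdf, w = 0.0, 0.0
for obs in [0.02, 0.03, 0.025]:  # successive distance observations (m)
    tsdf, w = tsdf_update(tsdf, w, obs)
```

After all observations are folded in, marching cubes extracts the triangle mesh from the zero level set of the TSDF.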
One challenge we faced was that we ran this as a single-threaded process, so streaming input was delayed while the mesh was updating. This is something our team would like to improve in the future.
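A sketch of the fix we have in mind: put the stream receiver and the mesh updater on separate threads connected by a bounded queue, so incoming frames are not blocked by a slow mesh update. Frame contents and timings below are stand-ins for the real RGB+depth+IMU data:

```python
import queue
import threading
import time

frames = queue.Queue(maxsize=8)  # bounded buffer between the two threads
meshed = []

def receiver(n_frames):
    """Stand-in for the thread reading the network stream."""
    for i in range(n_frames):
        frames.put(f"frame-{i}")  # real pipeline: decoded RGB+depth+IMU
    frames.put(None)              # sentinel: end of stream

def mesher():
    """Stand-in for the thread doing TSDF updates + mesh extraction."""
    while True:
        frame = frames.get()
        if frame is None:
            break
        time.sleep(0.001)         # stand-in for the slow volume update
        meshed.append(frame)

t_recv = threading.Thread(target=receiver, args=(5,))
t_mesh = threading.Thread(target=mesher)
t_recv.start(); t_mesh.start()
t_recv.join(); t_mesh.join()
```

With a bounded queue, a sustained mesh-update stall applies backpressure to the receiver rather than stalling the stream silently; a drop-oldest policy would be another reasonable choice for a live feed.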
Here is the final pipeline used with OpenARK’s 3D reconstruction:
In the future, we aim to take our mesh outputs and import them into Unity to create an actual VR drone-piloting application!