Implementing Big Data Lake for Heterogeneous Data Sources

Modern connected cities are more and more leveraging advances in ICT to improve their services and the quality of life of their inhabitants. The data generated from different sources, such as environmental sensors, social networking platforms, traffic counters, are harnessed to achieve these end goals. However, collecting, integrating, and analyzing all the heterogeneous data sources available from the cities is a challenge. This article suggests a data lake approach built on Big Data technologies, to gather all the data together for further analysis. The platform, described here, enables data collection, storage, integration, and further analysis and visualization of the results. This solution is the first attempt to integrate a diverse set of data sources from four pilot cities as part of the CUTLER project (Coastal urban development through the lenses of resiliency). The design and implementation details, as well as usage scenarios are presented in this paper.