High-quality, meaningful data are crucial for successfully implementing analytics solutions that apply artificial intelligence (AI) and perform simulations using physics-based models. In this context, this paper proposes a semi-automated approach for the semantic enrichment of building energy consumption data for Sofia, delivering a more meaningful dataset for further analytics and simulations. The aim is to enrich the building energy consumption dataset of the City of Sofia, Bulgaria, sourced from the Sustainable Energy Development Agency, with cadastral and spatial data, including a cadastral identifier, geometry, coordinates, built-up area, and number of floors. The data enrichment process is time-consuming because it requires substantial manual work. For this reason, a semi-automated data enrichment pipeline has been developed, comprising processing activities such as data classification, cleaning, filtering, validation, aggregation, augmentation, and formatting. A dedicated crawler has been developed to collect the additional data needed for the enrichment. As a result, 1991 of a total of 2586 building data points (about 77%) have been successfully enriched. The enriched dataset is used for statistical and clustering analyses and applied to elaborate the energy atlas of Sofia.

20th International Conference on Artificial Intelligence Applications and Innovations