Title : Big Trajectory Data Warehouse: application to autonomous robots planning in the agricultural context.

Nowadays, more and more trajectory data is collected from new acquisition systems (smartphones, vehicles, etc.). A trajectory is described by temporal and spatial data, and it is accompanied by contextual data (such as fields, markets, weather, etc.). Trajectory data can therefore be considered Big Data exhibiting the 3 V's: Velocity, Variety, and Volume. In particular, a large set of trajectory data is generated in the context of the I-Site CAP2025 SupeRob project, which aims to provide an information system for the planning and monitoring of autonomous robots in the agricultural context.

Recent approaches adopt multimodel databases (MMDBs) to natively handle the variety and volume issues arising from the increasing amounts of heterogeneous data (structured, semi-structured, graph-based, etc.) made available. Data Warehouses (DWs) and OLAP systems allow the online analysis of huge datasets through simple, user-friendly interfaces. However, traditional DWs and OLAP systems fall short when analyzing such heterogeneous data because they rely on relational DBMSs for storage and querying, thus constraining data variety into the rigidity of a structured schema.

This project will provide a preliminary investigation of the performance of MMDBs when used to store multidimensional trajectory Big Data for OLAP analysis. The proposals will be applied to data generated in the context of the SupeRob project to help robotics experts visually analyze their datasets.

Supervisor : Sandro Bimonte

Email  : sandro.bimonte@inrae.fr

Laboratory : TSCF

Team : COPAIN

Institute : INRAE

Co-supervisor: Roland Lenain

Laboratory: TSCF

Institute : INRAE


Work plan

Analysis of spatio-temporal functionalities of multimodel databases for the identification of a reference platform.
Proposal of a logical model for trajectory DW
Integration of Variety in the logical model for trajectory DW
Design of a multimodel trajectory DW using data issued from the SupeRob project

Location: INRAE, Campus Cezeaux, Clermont Ferrand, France

Date : 6 months starting March-April 2021



Context. With advanced techniques for acquiring geographical positions (sensors, connected objects, etc.), huge volumes of trajectory data are being generated. These trajectory data are one of the most important sources of information for many applications in different areas, such as mobility (travel behavior, etc.), the environment, marketing, and agriculture (fleet management, tractors, precision farming via sensors, etc.), which characterize the data and applications of I-Site challenges 1 and 2.

In particular, in the agricultural context, more and more work is being done to deploy autonomous robots that reduce the cost of manual labor for farmers, improve their quality of life, and promote agro-ecology through agricultural practices with a lower impact on the agricultural ecosystem. Autonomous robots move across plots to perform technical tasks such as plowing or mechanical weeding. They are programmed to perform these tasks while minimizing movement on the plots, following planned trajectories and avoiding potential obstacles, either fixed (such as a rut or a pole) or mobile (humans, animals, or vehicles), that require a deviation from the predefined trajectory. To analyze the experiments of trajectory computation algorithms, a possible solution is to set up a Data Warehouse. The resulting data set constitutes Trajectory Big Data. This is the context of the I-Site CAP2025 SupeRob project, which aims to provide an information system for the planning and monitoring of autonomous robots in the agricultural context, and in which a large set of trajectory data is generated.

Big Data are notoriously characterized by (at least) the 3 V's: volume, velocity, and variety. To handle velocity and volume, some distributed file system-based storage (such as Hadoop) and new Database Management Systems (DBMSs) have been proposed. In particular, four main categories of NoSQL databases have been proposed [1]: key-value, extensible record, graph-based, and document-based. Although NoSQL DBMSs have successfully proved to support the volume and velocity features, variety is still a challenge [19]. Indeed, several practical applications ask for collecting and analyzing data of different types: structured (e.g., relational tables), semi-structured (e.g., XML and JSON), and unstructured (such as text, images, etc.). Using the right DBMS for the right data type is essential to grant good storage and analysis performance. Traditionally, each DBMS has been conceived for handling a specific data type, for example, relational DBMSs for structured data, document-based DBMSs for semi-structured data, etc. Therefore, when an application requires different data types, two solutions are possible: (i) integrating all data into a single DBMS, or (ii) using two or more DBMSs together. The former solution presents serious drawbacks: first of all, some types of data cannot be stored and analyzed (e.g., the pure relational model does not support the storage of images and XML arrays [27]); besides, even when data can be converted and stored in the target DBMS, querying performance could be unsatisfactory. The latter approach (known as polyglot persistence [15]) presents important challenges as well, namely, technically managing more DBMSs, complex query languages, inadequate performance optimization, etc. Multimodel databases (MMDBs) have recently been proposed to overcome these issues. An MMDB is a DBMS that natively supports different data types under a single query language to grant performance, scalability, and fault tolerance [19].
Remarkably, using a single platform for multimodel data promises to deliver several benefits to users besides that of providing a unified query interface; namely, it will simplify query operations, reduce development and maintenance issues, speed up development, and eliminate migration problems [19]. Examples of MMDBs are Postgres and ArangoDB.

Related work. Handling variety while granting at the same time volume and velocity is even more complex in Data Warehouses (DWs) and OLAP systems. Indeed, warehoused data result from the integration of huge volumes of heterogeneous data, and OLAP requires very good performance for data-intensive analytical queries [18]. Traditional DW architectures rely on a single, relational DBMS for storage and querying. To offer better support to volume while maintaining velocity, some recent works propose the usage of NoSQL DBMSs; for example, [7] relies on a document-based DBMS, and [4] on a column-based DBMS. NoSQL proposals for DWs are based on a single data model, and all data are transformed to fit that model (document, graph, etc.). Overall, although these approaches offer interesting results in terms of volume and velocity, they have been mainly conceived and tested for structured data, without taking variety into account. Furthermore, to facilitate OLAP querying, DWs are normally based on the multidimensional model, which introduces the concepts of facts, dimensions, and measures to analyze data, so source data must be forcibly transformed to fit a multidimensional logical schema. Since this is not always painless because of the schemaless nature of some source data, some recent works (such as [11]) propose to directly rewrite OLAP queries over document stores that are not organized according to the multidimensional model, following a schema-on-read approach.
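To make the multidimensional vocabulary concrete, the following is a minimal Python sketch of facts, dimensions, and measures, together with a roll-up from the plot level to the farm level. All names (TrajectoryFact, PLOT_TO_FARM, distance_m) and all values are hypothetical and chosen only for illustration; they are not taken from the SupeRob dataset or from any cited work.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical spatial hierarchy: each plot (fine level) belongs to a farm
# (coarser level). In a real DW this would be a dimension table.
PLOT_TO_FARM = {"plot_a": "farm_1", "plot_b": "farm_1", "plot_c": "farm_2"}


@dataclass(frozen=True)
class TrajectoryFact:
    """One fact: dimension keys plus a numeric measure."""
    robot: str         # key into a (hypothetical) robot dimension
    plot: str          # key into the spatial dimension, finest level
    day: str           # key into the temporal dimension, ISO date
    distance_m: float  # measure: distance travelled, in metres


# Illustrative facts only, not real experimental data.
FACTS = [
    TrajectoryFact("r1", "plot_a", "2021-03-01", 120.0),
    TrajectoryFact("r1", "plot_b", "2021-03-01", 80.0),
    TrajectoryFact("r2", "plot_c", "2021-03-02", 200.0),
    TrajectoryFact("r2", "plot_a", "2021-03-02", 50.0),
]


def roll_up_distance_by_farm(facts):
    """Aggregate the distance measure (SUM) from plot level up to farm level."""
    totals = defaultdict(float)
    for fact in facts:
        totals[PLOT_TO_FARM[fact.plot]] += fact.distance_m
    return dict(totals)


if __name__ == "__main__":
    print(roll_up_distance_by_farm(FACTS))
```

The roll-up is the typical OLAP operation that the multidimensional model is meant to make simple: the analyst picks a measure, an aggregation function, and a level of each dimension hierarchy.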

Contribution. However, even this approach relies on a single DBMS. An interesting direction towards a solution for effectively handling the 3 V's in DW and OLAP systems is represented by MMDBs. A multimodel data warehouse (MMDW) can store data according to the multidimensional model and, at the same time, let each of its elements be natively represented through the most appropriate model. It will support OLAP querying over large volumes of multimodel and multidimensional data, thus ensuring support for volume, velocity, and variety alike.*
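As a rough illustration of the idea, the sketch below mimics in plain Python a star schema whose elements use different models: the robot dimension mixes relational attributes with a schemaless JSON document, and each fact carries its trajectory as an array of points. The structures, field names, and values are assumptions made for this example; they do not reproduce the logical model of the cited work, and an actual MMDW would keep these elements in an MMDB rather than in memory.

```python
import json
import math
from collections import defaultdict

# Hypothetical robot dimension: a relational attribute ("name") plus a
# schemaless JSON document ("specs") whose fields vary from robot to robot.
ROBOT_DIM = {
    1: {"name": "r1",
        "specs": json.loads('{"tool": "weeder", "sensors": ["lidar", "gps"]}')},
    2: {"name": "r2",
        "specs": json.loads('{"tool": "plough"}')},
    3: {"name": "r3",
        "specs": json.loads('{"tool": "weeder", "width_cm": 90}')},
}

# Hypothetical facts: a foreign key to the dimension, a numeric measure,
# and the trajectory itself stored as an array of (x, y) points in metres.
FACTS = [
    {"robot_id": 1, "duration_s": 300, "path": [(0.0, 0.0), (3.0, 4.0)]},
    {"robot_id": 2, "duration_s": 240, "path": [(0.0, 0.0), (0.0, 2.0), (2.0, 2.0)]},
    {"robot_id": 3, "duration_s": 360, "path": [(1.0, 1.0), (4.0, 5.0)]},
]


def path_length(path):
    """Length of a polyline: sum of Euclidean distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def length_by_tool(facts, robot_dim):
    """Group by a field read from the JSON part of the dimension and sum the
    travelled length derived from the array-valued trajectory."""
    totals = defaultdict(float)
    for fact in facts:
        # Robots without a "tool" field are grouped under "unknown".
        tool = robot_dim[fact["robot_id"]]["specs"].get("tool", "unknown")
        totals[tool] += path_length(fact["path"])
    return dict(totals)


if __name__ == "__main__":
    print(length_by_tool(FACTS, ROBOT_DIM))
```

The point of the sketch is that the grouping attribute is read directly from the document part of the dimension, without first flattening it into relational columns.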

We will apply our proposals to the dataset used in the context of the SupeRob project in order to provide robotics experts with an easy visual interface to explore their data by means of simple, graphical reporting tools.



Sandro Bimonte, Yassine Hifdi, Mohammed Maliari, Patrick Marcel, Stefano Rizzi: To Each His Own: Accommodating Data Variety by a Multimodel Star Schema. DOLAP 2020: 66-73