A major technical breakthrough from DAMO Academy: a look inside the aerospace database engine Ganos

Introduction: Ganos is a new-generation location intelligence engine for aerospace data, developed by the Database and Storage Lab of DAMO Academy under Li Feifei. Built on four architectural ideas (platform as a service, multi-model fusion, computing pushdown, and cloud native), it provides government, enterprise, institutional, and Internet customers with hybrid storage, query, and analysis services for moving objects, spatial/spatio-temporal data, and remote sensing data. It addresses the complex workflows, high barrier to entry, and low application efficiency typical of aerospace big data, and is mainly used in urban management, transportation and logistics, natural resources, aerospace, and the Internet of Things.

Author | Xie Jiong
Source | Alibaba Technical Official Account

1. R&D background

1 What is aerospace big data

With the rapid development of the mobile Internet, location-aware technology, and earth observation technology, aerospace sensing data such as moving-object, spatial/spatio-temporal, and remote sensing data has grown explosively and become an important foundation for new infrastructure and the digital base.

In the narrow sense, aerospace data comes from space-based and air-based platforms: for example, GNSS (Global Navigation Satellite System) data from space-based platforms, and aerial imagery and video from air-based platforms. In the broad sense used in this article, aerospace big data covers all location-related data in both geographic space (Spatial) and outer space (Space). With the landing of Tianwen-1 and the Zhurong rover on Mars, large volumes of Martian remote sensing imagery and spatial information will be sent back, giving everyone an intuitive feel for aerospace big data from beyond the Earth.

Take epidemic prevention and control as an example. Trajectory data from people, vehicles, and other moving objects can be used to trace transmission sources and screen potentially exposed groups; AIS vessel data relayed by maritime communication satellites can be used to analyze the epidemic's impact on port trade; and so on. In such complex analysis scenarios, the rapid acquisition, storage, and efficient querying of new kinds of aerospace sensing data (remote sensing imagery, moving objects, IoT communications) play a key role in supporting intelligent decision-making.

2 Challenges faced by aerospace big data

Data structures are complex, diverse, and hard to manage

Compared with unstructured data such as text and images, aerospace data is more varied in type, highly unstructured, large per object, and multi-dimensional, which makes integrated management and efficient retrieval very challenging. For example:

  • Large, complex physical features made up of millions of points, such as the Yangtze River or Yellow River, complex buildings, and irrigation districts;
  • Spatio-temporal trajectories of moving objects made up of tens of millions of points, such as ultra-long trip records of cars, ships, and aircraft;
  • Large, high-resolution remote sensing images made up of trillions of pixels that provide continuous coverage...

Dynamic data requires higher-dimensional computation

Traditional spatial data mostly describes static features such as rivers, railways, and buildings. With the spread of mobile apps and IoT, there is more and more dynamic data from moving objects (people, cars, ships, and so on). Recording how locations change over time requires the system to provide spatio-temporal modeling, spatio-temporal indexing, and spatio-temporal analysis and computation.
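To make "spatio-temporal modeling" concrete, the following is a minimal sketch that models a moving object as a time-ordered list of samples and answers "where was it at time t?" by linear interpolation. The `Sample` type and the `position_at` function are hypothetical names invented for this illustration; they are not part of Ganos, and a real engine would use its own trajectory type and indexes.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    t: float    # timestamp in seconds
    lon: float  # longitude in degrees
    lat: float  # latitude in degrees


def position_at(track: list[Sample], t: float) -> tuple[float, float] | None:
    """Linearly interpolate the position at time t.

    Assumes the track is sorted by timestamp; returns None when t falls
    outside the recorded time range.
    """
    if not track or t < track[0].t or t > track[-1].t:
        return None
    times = [s.t for s in track]
    i = bisect_right(times, t)       # first sample strictly later than t
    if i == len(track):              # t equals the last timestamp
        return (track[-1].lon, track[-1].lat)
    a, b = track[i - 1], track[i]    # a.t <= t < b.t
    w = (t - a.t) / (b.t - a.t)
    return (a.lon + w * (b.lon - a.lon), a.lat + w * (b.lat - a.lat))


track = [Sample(0, 120.0, 30.0), Sample(10, 120.1, 30.0), Sample(30, 120.1, 30.2)]
print(position_at(track, 5))    # halfway along the first segment
print(position_at(track, 31))   # outside the recorded range
```

The binary search over timestamps is the simplest possible form of a time index; it shows why a trajectory has to be stored as an ordered spatio-temporal object rather than as unrelated points.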

Poor performance in big data and heavy-computation scenarios

Being unstructured, large per object, and dynamic, aerospace data is inherently high-volume. Single tables ranging from tens of millions to tens of billions of rows will no longer be isolated cases, which places higher demands on storage cost, elasticity, and read/write efficiency. When large-scale data needs online analysis and computing services, traditional production and application workflows based on offline preprocessing (such as offline slicing) face great challenges.

Intelligence requires fused management of multi-model data

Fused management and cross-model query and analysis of text, time-series, spatio-temporal, graph, and other data are an important foundation for intelligence. Intelligence built on a single data model cannot effectively support the discovery of complex business knowledge or reveal how things develop and where they are heading. There is still a large gap between specialized single-model systems and general multi-model systems, and closing it requires a new architecture at the level of the database itself.

3 DAMO Academy's first aerospace database

In response, DAMO Academy developed Ganos, a new-generation aerospace database engine that tackles integrated management, fast cross-model query, and efficient analysis of aerospace data at the lowest level of the database and storage stack. It achieves second-level display of "100-million-scale" polygon datasets and second-level spatio-temporal dynamic mosaicking of remote sensing imagery covering "tens of millions of square kilometers". With the advantages of "integrated management, large-scale elastic services, and independently controllable core technology," it supports air, space, ground, and sea applications and has become a new kind of database infrastructure supporting the development of the "Sky Network" and "Nebulas" industry.

2. The evolution of aerospace data processing architecture

In 1995, to meet the needs of the enterprise (B2B) market, ESRI of the United States introduced the groundbreaking spatial data engine SDE ("Modeling our world"), built on a commercial relational database plus middleware, and it influenced a generation. More than 20 years have passed. With the evolution of Hadoop, Spark, and distributed database technologies, distributed spatial data engines have developed rapidly in recent years and have shown unique advantages in some large-scale spatial analysis and processing scenarios. So where will spatial data processing evolve next?

We believe that integrating aerospace information processing into PaaS (Platform as a Service), with cloud databases and storage platforms at the core to provide real-time ingestion, efficient storage, and elastic computing for aerospace data, is the inevitable trend for spatio-temporal information cloud architectures. We break this down into four directions of architectural evolution: platform as a service, multi-model fusion, computing pushdown, and cloud native.

1 Platform as a service

Unlike the traditional spatio-temporal data engine, which uses a general-purpose database for storage and adds external middleware, the new generation of aerospace database engine adopts a platform-as-a-service architecture. The aerospace engine is built into different cloud systems such as OLTP databases, OLAP data warehouses, data lakes, and NoSQL multi-model databases. Compared with traditional solutions, this has inherent advantages in ease of use, computing efficiency, and transactional consistency. Going forward, cross-platform capabilities can be established quickly on the basis of SQL standardization. Through this product portfolio, we can provide solutions for massive aerospace big data that span online processing, online analysis, offline computing, and offline storage.

2 Multi-model fusion

Traditional spatio-temporal data processing is centered on geographic information system (GIS) or remote sensing image processing software and emphasizes specialization. Over time, this specialization produces a highly specialized, semi-closed system, which in turn weakens its ability to process other kinds of multi-model data together. From an IT perspective, aerospace/spatio-temporal data is being "de-centered": it becomes just one category among many kinds of multi-model data, and the database provides general-purpose associations that lower the professional barrier. Through these associations, aerospace/spatio-temporal data can be managed and processed together with general relational data, text, time series, graphs, and other multi-model data. This pan-spatio-temporal capability gives complex big data services much more flexibility.

3 Computing pushdown

Computing pushdown is an important trend in the evolution of IT architectures. Pushing the key computations of a spatial information system down into the database or big data system brings computation closer to the data. The system can then use storage-level pushdown, parallel processing, and GPU/FPGA heterogeneous acceleration to compute where the data lives. This reduces the I/O latency caused by moving large intermediate results over the network, simplifies business logic, and improves the performance of the overall system.
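The effect of pushdown on data movement can be illustrated with a small, self-contained sketch. It uses Python's standard `sqlite3` module as a stand-in for any database (it is not Ganos, and the `poi` table and the query window are invented for this example) and compares filtering in the application with letting the database evaluate the same predicate.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poi (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO poi (x, y) VALUES (?, ?)",
    [(i % 100, i // 100) for i in range(10_000)],  # a 100 x 100 grid of points
)

XMIN, YMIN, XMAX, YMAX = 10, 10, 12, 12  # hypothetical query window


def filter_in_application() -> tuple[int, int]:
    """Fetch every row, then filter on the client side.

    Returns (rows transferred, rows matching).
    """
    rows = conn.execute("SELECT id, x, y FROM poi").fetchall()
    hits = [r for r in rows if XMIN <= r[1] <= XMAX and YMIN <= r[2] <= YMAX]
    return len(rows), len(hits)


def filter_pushed_down() -> tuple[int, int]:
    """Let the database evaluate the predicate; only matches are transferred."""
    rows = conn.execute(
        "SELECT id, x, y FROM poi WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ?",
        (XMIN, XMAX, YMIN, YMAX),
    ).fetchall()
    return len(rows), len(rows)


print("application filter:", filter_in_application())  # (10000, 9)
print("pushed-down filter:", filter_pushed_down())     # (9, 9)
```

Both variants find the same 9 points, but the first moves all 10,000 rows across the database boundary while the second moves only the matches. Over a network, and with large geometries or rasters instead of small rows, this difference is what the pushdown argument is about.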

4 Cloud native

The new generation of aerospace database engine was born on the public cloud and is extending from public cloud to hybrid cloud. Our view is that flexible data must be backed by algorithms, and flexible algorithms must be backed by computing power. For example, traditional aerospace data applications need a large amount of slice preprocessing, which makes the data inflexible. To make the data more flexible, the industry introduced schemes that combine static pre-caching with dynamic slicing, but such algorithms are clearly complex. To keep algorithms flexible, they in turn must be backed by computing power: with enough elastic computing power, a single algorithm can stay simple and general. This is what cloud-native capabilities are for. The essence of cloud native is resource pooling, through which elastic and large-scale services become possible; the essence of cloud services is the economics of computing power.

3. Sand into a tower: building the foundation

Following these four ideas (platform as a service, multi-model fusion, computing pushdown, and cloud native), DAMO Academy designed and implemented Ganos, a new-generation aerospace database engine. We have made continued breakthroughs in key technologies for aerospace data processing, such as global aerospace grid coding, parallel multi-model query processing, and accelerated display of large-scale vector graphics, and have built a complete technical stack covering data storage, indexing, query, analysis, and visualization. This gives Ganos differentiated competitiveness in the core areas of aerospace multi-model data processing.

1 Overall framework

Ganos takes its name from Gaea, the goddess of the earth, and Chronos, the god of time, representing the deep combination of space and time. It is not a standalone cloud product but a set of spatial, spatio-temporal, and multi-dimensional data storage and processing solutions. The lower layer supports large-scale storage of land, sea, air, and space data, including fast batch writes, multi-dimensional spatial representation, multi-dimensional spatio-temporal indexing, and tiered hot/cold storage. The upper layer provides data management, batch query processing, analysis, computation, and operations.

Figure: Ganos capability framework

In terms of product structure, Ganos integrates aerospace data processing capabilities into the cloud relational database RDS PG, the cloud-native relational database PolarDB, the cloud-native data warehouse AnalyticDB PostgreSQL, the multi-model database Lindorm, and Data Lake Analytics (DLA), and uses this product portfolio to build an integrated foundation for aerospace databases and big data. Combined with AI Earth (the first AI engine for the broad natural resources industry released by DAMO Academy), OSS object storage, microservice frameworks, and other parts of the technical ecosystem, it provides core capabilities for users building a new kind of cloud-native aerospace big data platform with slice-free storage, integrated space and time, dynamic computing, and intelligent analysis. Such a platform can be widely used in industries such as urban management, natural resources, emergency management, and transportation and logistics.

Figure: Ganos ecosystem

2 Aerospace multi-model support and global grid coding

A single data model can no longer meet the needs of today's digital applications. Ganos has built an aerospace multi-model engine from the ground up, with native support for storing, querying, analyzing, and computing more than 10 major categories of aerospace data. Through integration with the multi-model database Lindorm, it also supports integrated management and processing of key-value, wide-table, time-series, spatio-temporal, search, and file data.

Figure: Aerospace multi-model engine

On this basis, Ganos, in combination with PolarDB, introduces a new grid data type, geomgrid, based on the GeoSOT global subdivision theory. It supports operations such as encoding aerospace objects and computing on grid objects. The aerospace grid code is a discrete, multi-scale system for identifying and measuring regional locations, developed from the GeoSOT geospatial subdivision theory. Its core is to divide the space from the center of the Earth to 60,000 kilometers above the ground into trillions of multi-scale, high-precision grid cells of different sizes, and to assign each cell a globally unique integer code. The system connects seamlessly with the GeoSOT-based grid big data platform of Peking University/Xuanji Fuxi to form an integrated aerospace database and grid big data solution. Native grid data types strengthen the database's unified spatio-temporal identification, accelerate aerospace computation, and enable data sharing based on geospatial grids.

Figure: Schematic diagram of aerospace grid division
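The general idea of a multi-scale grid with integer codes can be sketched as follows. This is a deliberately simplified two-dimensional quadtree with bit-interleaved (Z-order) codes; it is not the actual GeoSOT subdivision, it ignores height, and `grid_code` and `parent` are hypothetical helper names rather than functions of the geomgrid type.

```python
def grid_code(lon: float, lat: float, level: int) -> int:
    """Return an integer code for the quadtree cell containing (lon, lat).

    Simplification: the plane [-180, 180) x [-90, 90) is halved `level` times
    along each axis, and the row/column bits are interleaved (Z-order).
    """
    if not (-180 <= lon < 180 and -90 <= lat < 90):
        raise ValueError("coordinate out of range")
    n = 1 << level                                   # cells per axis at this level
    col = min(int((lon + 180) / 360 * n), n - 1)     # clamp guards float rounding
    row = min(int((lat + 90) / 180 * n), n - 1)
    code = 0
    for bit in range(level - 1, -1, -1):             # most significant bit first
        code = (code << 2) | (((row >> bit) & 1) << 1) | ((col >> bit) & 1)
    return code


def parent(code: int) -> int:
    """Code of the enclosing cell one level up: drop the last two bits."""
    return code >> 2


fine = grid_code(116.4, 39.9, 16)
coarse = grid_code(116.4, 39.9, 15)
print(fine, coarse, parent(fine) == coarse)
```

Because a cell's code is a prefix of the codes of all cells inside it, "which objects fall in this region" turns into integer prefix or range comparisons, which is one reason grid codes can speed up spatial computation and simplify data exchange.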

3 Separation of storage and compute and multi-level parallel computing acceleration

Based on PolarDB, Ganos adopts an architecture with separated storage and compute on distributed shared storage. Separating compute and storage fully decouples the components (compute, memory, storage) of a formerly monolithic database design into independently scalable resource pools. To reduce the write and query latency that this separation introduces, the shared storage system uses an end-to-end user-space design that combines high-speed data transfer and access technologies such as RDMA and SPDK with database pushdown to near-storage computing that is co-designed with the storage media hardware. Together, these effectively increase the storage scale and processing capacity for aerospace data.

On top of the storage-compute separation and distributed shared storage architecture, Ganos combines two-stage query optimization with multi-node parallel query to form a cross-node parallel query processing framework for aerospace data. Three things work together to improve parallel processing performance:

  • The distributed shared storage architecture avoids the network I/O overhead of shuffling data across nodes;
  • A two-stage query based on a topological index, with coarse filtering followed by fine filtering, greatly improves the performance of aerospace data queries and filtering;
  • Cross-node parallelism and intra-node parallelism together form a multi-level parallel framework. An authoritative third-party evaluation showed that an overlay analysis with area calculation on 200 million map patches (polygons), run with 80 parallel processes, produced its result within 10 minutes (including output of a very large result set of 78 million records), at least an order of magnitude faster than traditional big data solutions.

Figure: Cross-node parallel query processing framework based on two-stage optimization
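The coarse-then-fine pattern and the partitioned execution described above can be sketched in a few lines. This is an illustration under stated assumptions, not the Ganos implementation: the coarse filter is a plain bounding-box test rather than a real index, the fine filter is a textbook ray-casting point-in-polygon test, and a thread pool merely stands in for cross-node and intra-node workers (Python threads do not give real CPU parallelism here).

```python
from concurrent.futures import ThreadPoolExecutor


def bbox(poly):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a polygon."""
    xs, ys = [p[0] for p in poly], [p[1] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)


def contains(poly, pt):
    """Fine filter: exact even-odd ray-casting test."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):                      # edge crosses the ray's row
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def two_stage_query(polys, pt):
    """Return (indexes of polygons containing pt, number of exact tests run)."""
    x, y = pt
    candidates = []
    for i, poly in enumerate(polys):                  # coarse filter: cheap box test
        xmin, ymin, xmax, ymax = bbox(poly)
        if xmin <= x <= xmax and ymin <= y <= ymax:
            candidates.append(i)
    hits = [i for i in candidates if contains(polys[i], pt)]   # fine filter
    return hits, len(candidates)


def parallel_query(polys, pt, workers=2):
    """Partition the polygons, query each partition, and merge the results."""
    size = max(1, -(-len(polys) // workers))          # ceiling division
    parts = [(off, polys[off:off + size]) for off in range(0, len(polys), size)]

    def run(part):
        off, chunk = part
        hits, _ = two_stage_query(chunk, pt)
        return [off + i for i in hits]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(i for hits in pool.map(run, parts) for i in hits)


polys = [
    [(0, 0), (4, 0), (0, 4)],              # triangle: box contains (3, 3), shape does not
    [(2, 2), (5, 2), (5, 5), (2, 5)],      # square containing (3, 3)
    [(10, 10), (12, 10), (12, 12), (10, 12)],   # far away: rejected by the box test
]
print(two_stage_query(polys, (3, 3)))   # ([1], 2)
print(parallel_query(polys, (3, 3)))    # [1]
```

The coarse filter discards the distant square without any exact geometry work, and the fine filter then removes the triangle, a false positive of the box test. The same split between cheap candidate selection and exact verification is what makes each partition's work small enough to parallelize well.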

4 Intelligent online dynamic processing services

Building the "most powerful brain" for aerospace data requires data organization, processing, and application models based on dynamic computation. Taking large-scale remote sensing image processing as an example, Ganos combines the aerospace indexing of PolarDB, the aerospace multi-model storage of Lindorm, and the aerospace computing capability of DLA Serverless Spark into a new processing framework built on unitized storage, spatio-temporal organization, and pixel-level access:

  • Unitized storage: each remote sensing scene is stored as a unit, avoiding extra preprocessing so that the data stays flexible;
  • Spatio-temporal organization: with the original scene as the unit, the time dimension is built into the architecture, so that the full dataset is structured in both time and space;
  • Pixel-level access: the design keeps the original pixel matrix of each image to preserve the accuracy of every pixel, including its temporal, spatial, and spectral information, and to supply intelligent services with the freshest raw material. Users specify spatial and temporal bounds and other conditions, and Ganos uses elastic cloud computing power to compute the result dynamically and in parallel.

Internal tests show that, with elastic serverless computing power, a spatio-temporal mosaic of a thousand remote sensing scenes can be produced within seconds. The traditional preprocessing/pre-slicing mode is replaced by on-demand, dynamic, parallel spatio-temporal computation, saving at least 50% of storage and processing costs.

Figure: Dynamic spatio-temporal mosaic of raster data
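A toy version of on-demand mosaicking, in the spirit of the three principles above, might look like this. Scenes keep their original pixel matrices, carry an acquisition date, and are composed only when a user asks for a window and a time range. The `Scene` type, the shared integer pixel grid, and the "latest acquisition wins" rule are all assumptions made for the sketch; the real pipeline runs on PolarDB, Lindorm, and DLA Serverless Spark and is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class Scene:
    acquired: date            # time dimension kept with every scene
    col0: int                 # left edge in a shared pixel grid
    row0: int                 # top edge in a shared pixel grid
    pixels: list[list[int]]   # original pixel matrix, stored unmodified


def mosaic(scenes: list[Scene], window: tuple[int, int, int, int],
           start: date, end: date, nodata: int = 0) -> list[list[int]]:
    """Compose the window (col0, row0, width, height) from scenes in [start, end]."""
    wc, wr, w, h = window
    out = [[nodata] * w for _ in range(h)]
    selected = sorted(
        (s for s in scenes if start <= s.acquired <= end),
        key=lambda s: s.acquired,
    )
    for s in selected:                    # later acquisitions overwrite earlier ones
        for r, line in enumerate(s.pixels):
            for c, value in enumerate(line):
                x, y = s.col0 + c - wc, s.row0 + r - wr
                if 0 <= x < w and 0 <= y < h:
                    out[y][x] = value
    return out


scenes = [
    Scene(date(2021, 1, 1), 0, 0, [[1, 1], [1, 1]]),
    Scene(date(2021, 2, 1), 1, 0, [[2, 2], [2, 2]]),
    Scene(date(2021, 6, 1), 0, 0, [[9, 9], [9, 9]]),   # outside the requested period
]
print(mosaic(scenes, (0, 0, 3, 2), date(2021, 1, 1), date(2021, 3, 1)))
# [[1, 2, 2], [1, 2, 2]]
```

Nothing is pre-sliced: changing the window or the time range only changes which scenes are selected and which pixels are copied, which is why this style of processing depends on elastic computing power rather than on precomputed tiles.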

5 Coordinating visualization and computation to break service boundaries

Aerospace data is a special kind of graphic and image data. A single storage structure can hardly satisfy both fast computation and fast display. In the past, going from query, analysis, and computation to large-scale display forced users through a complex restructuring of large datasets, an "entropy-increasing" process.

Another design goal of Ganos is to coordinate computation and visualization, so that storage, computation, and visualization are integrated on the database side. Once a large volume of vector data is in the database, browsing all of it in real time has long been a hard problem in the industry, and using extra tools to cut and publish map tiles costs time and effort. Ganos designs a sparse vector pyramid index: the client interacts with the database in real time and can display "100-million-scale" polygon features within seconds, while building the index takes only minutes and about 5% additional storage. Using a database index structure to accelerate visualization greatly reduces the complexity of data processing for users. The technique is easy to integrate into data management tools such as pgAdmin, so that hundreds of millions of imported geometries can be visualized within seconds. This solves the long-standing problem that traditional data management tools can query large vector datasets but cannot display them.

Figure: Fast on-device display of hundreds of millions of real polygons, with data provided by Jietai Tianyu
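The article does not describe the internals of the sparse vector pyramid index, so the following is only one possible reading of the idea, written as a sketch: for each zoom level, the extent is divided into cells and at most one representative feature is kept per cell, so that any viewport query returns a bounded number of features. Features are reduced to centroid points for brevity, and `cell_of`, `build_pyramid`, and `visible` are hypothetical names.

```python
def cell_of(x, y, level, size):
    """Cell (col, row) of point (x, y) on a 2^level x 2^level grid over [0, size)."""
    n = 1 << level
    return (min(int(x / size * n), n - 1), min(int(y / size * n), n - 1))


def build_pyramid(features, size, max_level):
    """features: list of (id, x, y). Returns {level: {cell: representative id}}."""
    pyramid = {}
    for level in range(max_level + 1):
        cells = {}
        for fid, x, y in features:
            # Keep only the first feature seen in each cell; the rest are
            # skipped at this level, which is what keeps the index sparse.
            cells.setdefault(cell_of(x, y, level, size), fid)
        pyramid[level] = cells
    return pyramid


def visible(pyramid, level, cmin, rmin, cmax, rmax):
    """Representative features whose cell falls inside the viewport cell range."""
    return sorted(
        fid for (c, r), fid in pyramid[level].items()
        if cmin <= c <= cmax and rmin <= r <= rmax
    )


features = [(1, 1.0, 1.0), (2, 3.0, 1.0), (3, 6.0, 6.0), (4, 6.1, 2.0)]
pyramid = build_pyramid(features, size=8.0, max_level=2)
print([len(pyramid[z]) for z in range(3)])   # [1, 3, 4]
print(visible(pyramid, 2, 0, 0, 1, 0))       # [1, 2]
```

At coarse levels, features that would land on the same screen cell collapse to a single representative, so the amount of data sent to the client depends on the viewport resolution rather than on the total number of stored features.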

4. Building ecosystem solutions

1 DB for AI: integration with AI Earth

AI Earth, developed in-house by Alibaba DAMO Academy, fuses and analyzes multi-source earth observation data such as satellite imagery, drone imagery, real-time video streams, meteorological data, and IoT data. It intelligently interprets and senses, in real time, changes in targets such as buildings, land, vegetation, and rivers, and provides professional services for the global environment and ecology.

Ganos and AI Earth together address the management and computation of the Earth's air, space, ground, and sea data in an innovative way, forming a "DB for AI" product portfolio. Ganos provides intelligent storage and management of large-scale aerospace data; on top of it, AI Earth performs operations such as change detection, land-cover classification, and target extraction, enabling intelligent computational analysis and in-depth information mining.

Figure: DB for AI: Ganos + AI Earth

2 DB for GIS: co-building with GIS platforms

A GIS platform is a specialized system for spatial data processing. DB for GIS changes the way GIS and general-purpose databases have been connected for more than two decades. Pushing the core spatial computation of GIS down to a cloud platform centered on a cloud-native database, in order to accelerate computation, is a feasible path for the next generation of GIS systems. Ganos has completed compatibility adaptation with mainstream GIS platform software such as SuperMap, ArcGIS (ESRI), and MapGIS (Zhongdi), and supports seamless migration of existing GIS applications. The spatial data engine of the GIS platform can push spatial queries, analysis, and computation down to Ganos and use the multi-model processing, efficient indexing, multi-level parallel computing, and elastic resource scheduling of the aerospace database engine to accelerate them. In turn, Ganos uses GIS platform tools for full-space modeling and data display: above and below ground, indoors and outdoors, on land and at sea.

Figure: DB for GIS architecture

The in-depth integration with GIS platforms implements Alibaba Cloud's integration strategy and follows a "one horizontal, one vertical" platform approach. "One vertical" means vertical integration: the GIS platform integrates Ganos technically to improve overall system performance, while Ganos uses the GIS platform to broaden its range of spatial business capabilities. "One horizontal" means building a platform ecosystem through combined brands, providing professional full-space digital solutions for GIS-intensive digital applications. Together, the two expand the "area" covered by spatial data services.

5. Supporting air, space, ground, and sea applications

On the cloud, the boundaries of the traditional spatial information industry are gradually dissolving, and the reach of aerospace applications keeps extending. The aerospace database engine Ganos has been applied across air, space, ground, and sea, covering industries such as natural resources, disaster and emergency response, transportation and logistics, aerospace, travel, security, agriculture, oceans, water conservancy, and science and education, as well as social networking, fitness, gaming, and O2O.

  • Working with Feichangzhun and SuperMap to achieve millisecond-level spatio-temporal playback and display of 2.5 billion global flight trajectory points;
  • Supporting Alibaba's digital planet engine, enabling spatio-temporal dynamic organization, on-demand logical mosaicking, and fast pixel-level access for PB-scale remote sensing data;
  • In agricultural informatization, the agricultural geographic big data platform represented by Guoyuan Technology, through its B2B business transformation, relies on Ganos to manage geographic information resources, combines artificial intelligence and big data technologies on the cloud, and provides new agricultural big data products and services for modern agriculture;
  • Integrating with DataV to provide aerospace data retrieval and multi-dimensional terrain analysis for DataV.CityPro, the professional edition of Alibaba Cloud's 3D city rendering engine;
  • Deep integration with the Peking University/Xuanji grid big data platform to form an integrated grid database and big data solution;
  • Working with DAMO Academy's AI Earth to form an integrated intelligent platform for remote sensing big data management and AI, applied at the provincial and ministerial level in natural resources, environmental protection, and water conservancy;
  • Powering a global natural disaster risk big data service platform, fully supporting spatio-temporal process modeling and risk map publication for 12 disaster types worldwide, including earthquakes, typhoons, landslides, and forest and grassland fires.

6. Conclusion

In the era of cloud computing and big data, aerospace big data will become the foundation of location intelligence. Making satellite "eyes in the sky" more capable and IoT devices more intelligent requires new models for organizing, processing, and applying aerospace data. In the future, we will further integrate the management and processing of location, temporal, and multi-model information, expand computational intelligence, and extend scenarios to the deep earth, deep sea, and deep space. Ganos will continue to build on cloud and spatial infrastructure capabilities, advance key technologies such as aerospace multi-model support and grid coding, distributed parallel acceleration, and online dynamic processing, provide basic cloud services for enterprises building their aerospace "most powerful brain," and help spatio-temporal cloud computing, as a basic engine of digital transformation, reach more customers.


Original link: https://developer.aliyun.com/article/784264?
