Much of the information that businesses, public agencies or consumers need to access is currently in the hands of multiple players spread across the Internet. For various reasons, it is not always possible to combine different data sources to form a single database. One alternative solution is to integrate the information virtually using new intelligent methods.
Olaf Hartig
Senior Associate Professor of Computer Science
Wallenberg Academy Fellow 2023
Institution: Linköping University
Research field: Management of data and databases, focusing on data on the internet and graph data, as well as problems associated with the distribution of data over multiple autonomous and/or heterogeneous sources
Olaf Hartig is a researcher at Linköping University and a Wallenberg Academy Fellow. Hartig and his research team envision bringing together information that is currently scattered across the Web, or even within a single organization's intranet. This will not take place physically, however, but virtually, so that each player retains control over their own information.
At present, the information is often found in “knowledge graphs.” This technology for representing and analyzing information is fairly new, but has already become established.
Knowledge graphs are like large digital maps of the way different things in our world fit together. They show how people, places and concepts are linked to each other in different ways, like branches in a network. By linking together data from separate sources, knowledge graphs can reveal new links and provide better understanding of complex information.
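As a rough illustration of this idea, a knowledge graph can be written down as a set of (subject, predicate, object) statements. The sketch below is only a toy: the names, the facts, and the helper functions `objects`, `subjects` and `suggest` are hypothetical and are not taken from Hartig's work or from any particular product.

```python
# A toy knowledge graph stored as (subject, predicate, object) triples.
# All names and facts are illustrative placeholders, not real data.
triples = {
    ("alice", "likes", "film_a"),
    ("film_a", "has_genre", "drama"),
    ("film_b", "has_genre", "drama"),
    ("film_c", "has_genre", "comedy"),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    """Return every subject linked to `obj` by `predicate`."""
    return {s for s, p, o in graph if p == predicate and o == obj}


def suggest(graph, person):
    """Follow links: liked items -> their genres -> other items in those genres."""
    liked = objects(graph, person, "likes")
    genres = {g for item in liked for g in objects(graph, item, "has_genre")}
    candidates = {s for g in genres for s in subjects(graph, "has_genre", g)}
    return candidates - liked


print(suggest(triples, "alice"))
```

Following the links from a person to what they like, on to a genre, and back to other items in that genre is the kind of connection that only becomes visible once the separate facts are linked together.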
Knowledge graphs are widely used: from streaming services and e-commerce to large industrial concerns and biomedical research.
“When a streaming service suggests a new movie or series based on what the user usually likes, this is often done with the help of a knowledge graph. So this is a technology that many people come into contact with,” Hartig explains.
Interweaving scattered data
Increasingly, the information on which analyses are based is not gathered in a single place; it is managed by different actors. It may also be distributed among various units in a company or public agency. Hartig is developing methods for weaving this information together.
“In this context it is important to emphasize that it is not always possible to create a gigantic composite knowledge graph. In many use cases that may be impractical for technical reasons, or due to organizational obstacles, and it may also entail legal problems.”
Hartig’s research team is therefore concentrating on integrating knowledge graphs virtually, so that each player retains control over their own data.
Intelligent methods for “federating” knowledge graphs enable users to ask questions spanning numerous data sources while still receiving an integrated answer as though all the information were gathered in a single graph.
“For the user, it appears as a single integrated knowledge source, but below the surface the data continue to be distributed and controlled by their original owners.”
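The principle can be sketched in a few lines of code. The example below is a deliberately simplified assumption of how federation might look, not the research team's method: the `Source` class, the functions `federated_match` and `employer_cities`, and all data are hypothetical. Each source answers questions about its own data, and only the answers are combined.

```python
# Minimal federation sketch: each source keeps its own triples and only
# answers pattern queries. All names and facts are hypothetical.
class Source:
    def __init__(self, name, triples):
        self.name = name
        self._triples = set(triples)  # stays under the owner's control

    def match(self, subject=None, predicate=None, obj=None):
        """Answer one triple pattern; None acts as a wildcard."""
        return {
            (s, p, o)
            for s, p, o in self._triples
            if subject in (None, s)
            and predicate in (None, p)
            and obj in (None, o)
        }


def federated_match(sources, **pattern):
    """Send the same pattern to every source and merge the answers."""
    results = set()
    for source in sources:
        results |= source.match(**pattern)
    return results


def employer_cities(sources, person):
    """Join across sources: person -> employer -> city."""
    cities = set()
    for _, _, employer in federated_match(
        sources, subject=person, predicate="works_for"
    ):
        for _, _, city in federated_match(
            sources, subject=employer, predicate="based_in"
        ):
            cities.add(city)
    return cities


hr = Source("hr", [("alice", "works_for", "acme")])
registry = Source("registry", [("acme", "based_in", "city_x")])

print(employer_cities([hr, registry], "alice"))
```

Neither source alone can answer the question, yet the caller receives one combined answer. A real federation engine would also have to decide which sources to ask and in what order, and to cope with differing vocabularies and interfaces, all of which this sketch leaves out.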
Early fascination with the Web
Hartig’s interest in this research began when he was a master’s student in Berlin. He was fascinated by the way the Web was evolving into a new platform with myriad dimensions.
“It was exciting to see not only how people came into contact with each other via the Web, but also how computers can use the Web to understand and link together data from disparate sources.”
Since then, a consistent theme of his research has been how to integrate different data sources. Virtual integration of data poses numerous challenges. Different data sources use different words and terms for the same thing, and the technical interfaces often differ.
“Knowledge representation and the management of knowledge bases are key elements of AI, and Sweden is lagging behind a little in this respect. Hopefully, my project will help to strengthen the field, both in the academic world and in industry.”
“The goal is to develop algorithms and solutions that take these various forms of heterogeneity into account, and are nonetheless efficient and scalable.”
Related to this goal, the research team is also developing benchmark tests to evaluate how well new methods perform in different scenarios.
Improving AI
The research is closely related to the current debate on artificial intelligence. AI is often associated with machine learning and neural networks, but knowledge representation and knowledge graphs are also part of the efforts to improve AI. Coupling advanced language models such as ChatGPT with knowledge graphs could be a way of reducing the risk of the system making up its own facts. A knowledge graph can serve as a knowledge anchor and make AI models more reliable.
“Future knowledge graphs and large language models will be able to interact so their answers are more accurate and traceable. The user won’t only receive an answer, they will also receive an explanation of why that answer was given,” says Hartig.
The funding under the Wallenberg Academy Fellow scheme has given this research a real boost, both in financial terms and by raising its visibility.
“This is a research field that has not received as much attention in Sweden as it has elsewhere,” says Hartig.
The research may ultimately provide a key for how businesses and public agencies should manage their data. Virtual integration of information will make it possible to improve collaboration between organizations, make better decisions, and create new services – without data providers having to give up control of their own data sources. This is a future vision that will allow knowledge to flow more freely across borders in the cyber world.
“It may eventually help us to share and connect more reliable information and lead to new breakthroughs in various areas of science.”
Text Nils Johan Tjärnlund
Translation Maxwell Arding
Photo Magnus Bergström