# What is a Knowledge Graph

December 9, 2017

Knowledge graphs are large networks of entities and their semantic relationships. They are a powerful tool that changes the way we do data integration, search, analytics, and context-sensitive recommendations. Knowledge graphs have been successfully utilized by the large Internet tech companies, with prominent examples such as the Google Knowledge Graph. Open knowledge graphs such as Wikidata make community-created knowledge freely accessible.

### Overview of Knowledge Graphs

The World Wide Web is a vast repository of knowledge, with data present in multiple modalities such as text, video, images, and structured tables. However, most of this data is unstructured, and extracting information from it in a structured, machine-readable format remains very difficult. Knowledge graphs aim to build large repositories of structured knowledge that machines can understand. Search engines such as Google and Bing use them to improve the relevance and quality of search results. Applications such as Google Now, Microsoft Cortana, and Apple Siri also use them to understand natural language queries, answer questions, and make recommendations to the user. The construction of knowledge graphs is thus a major step towards intelligent, personalized machines.

Web search engines have evolved beyond presenting the classic ten blue links on the search engine result page as the answer to a keyword query. Modern Web search engines also include results from verticals such as news, images, and videos. Beyond that, they present rich result pages when users query for information about specific entities such as actors or movies. The structure of the information presented depends on the type of entity the user asks about, and may include news articles, pictures, factual statements, and related entities. This is because users of Web search engines often look for specific entities online. Indeed, about 50% of the query workload a commercial search engine receives is related to specific entities.

#### What is a Knowledge Graph?

Knowledge graphs provide an opportunity to expand our understanding of how knowledge can be managed on the Web and how that knowledge can be distinguished from more conventional Web-based data publication schemes such as Linked Data. Large-scale information processing systems are able to extract massive collections of interrelated facts, but unfortunately transforming these candidate facts into useful knowledge is a formidable challenge.

“A knowledge graph is a structured graphical representation of semantic knowledge and relations where nodes in the graph represent the entities and the edges represent the relation between them. Constructing a knowledge graph involve extracting relations from unstructured text followed by efficient storage in graphical databases.”
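The definition above can be made concrete with a minimal sketch in Python. It assumes facts are stored as (subject, relation, object) triples and indexed by subject; the names `TRIPLES`, `build_graph`, and `neighbors` are hypothetical and chosen only for illustration. A real system would typically use a dedicated graph database rather than an in-memory dictionary.

```python
from collections import defaultdict

# Illustrative facts as (subject, relation, object) triples.
# Entities are the nodes; each triple is one labeled, directed edge.
TRIPLES = [
    ("Google", "acquired", "Metaweb"),
    ("Metaweb", "developed", "Freebase"),
    ("Wikidata", "project_of", "Wikimedia Foundation"),
]


def build_graph(triples):
    """Index triples by subject: subject -> list of (relation, object)."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def neighbors(graph, entity, relation=None):
    """Return objects linked from `entity`, optionally filtered by relation."""
    return [
        obj
        for rel, obj in graph.get(entity, [])
        if relation is None or rel == relation
    ]


graph = build_graph(TRIPLES)
print(neighbors(graph, "Google", "acquired"))
```

Indexing by subject keeps the lookup for outgoing edges cheap; answering "who acquired Metaweb?" would need a second index keyed by object.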

Knowledge graphs are large repositories of structured information about entities such as persons, locations, and organizations, and the relations between them. A key challenge in producing a knowledge graph is incorporating noisy information from different sources in a consistent manner. Information extraction systems operate over many source documents, such as web pages, and use a collection of strategies to generate candidate facts from the documents, spanning syntactic, lexical, and structural features of text. Ultimately, these extraction systems produce candidate facts that include a set of entities, attributes of these entities, and the relations between these entities, which we refer to as the extraction graph.
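To illustrate what reconciling noisy candidate facts might look like, here is a rough sketch in Python. It is not the method used by any of the systems or papers mentioned in this post. It assumes each extractor attaches a confidence to a candidate triple, treats sources as independent, and combines repeated sightings with a simple noisy-OR rule. The data, the confidence values, the threshold, and the names `CANDIDATES`, `aggregate`, and `accept` are all made up for the example.

```python
from collections import defaultdict

# Made-up candidate facts: (subject, relation, object, source, confidence).
# The third candidate is a deliberately wrong extraction.
CANDIDATES = [
    ("Freebase", "developed_by", "Metaweb", "page_a", 0.6),
    ("Freebase", "developed_by", "Metaweb", "page_b", 0.5),
    ("Freebase", "developed_by", "Wikimedia", "page_c", 0.3),
]


def aggregate(candidates):
    """Combine per-source confidences for each triple with noisy-OR.

    Assumes sources are independent: the combined score is
    1 - product(1 - confidence) over all sources reporting the triple.
    """
    miss_probability = defaultdict(lambda: 1.0)
    for subject, relation, obj, _source, confidence in candidates:
        miss_probability[(subject, relation, obj)] *= 1.0 - confidence
    return {fact: 1.0 - miss for fact, miss in miss_probability.items()}


def accept(scores, threshold=0.7):
    """Keep triples whose combined score reaches the (arbitrary) threshold."""
    return sorted(fact for fact, score in scores.items() if score >= threshold)


scores = aggregate(CANDIDATES)
print(accept(scores))
```

In this toy data the fact seen on two pages clears the threshold while the single low-confidence extraction does not. Real pipelines also have to resolve duplicate entities and enforce ontological constraints, which this sketch ignores.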

### Illustration

Never-Ending Language Learning (NELL): Research project from Carnegie Mellon University attempting to create a computer system that learns over time to read the web.

Freebase/Probase: Large collaborative knowledge base consisting of data composed mainly by its community members.

Figure 1: Knowledge graph of Knowledge graphs

Metaweb: The company that developed Freebase, described as an “open, shared database of the world’s knowledge”. Metaweb was acquired by Google in 2010, and most of the Freebase data was subsequently made available to Wikidata.

Cyc: Common-sense knowledge base containing vast quantities of fundamental human knowledge: facts, rules of thumb, and heuristics for reasoning about the objects and events of everyday life.

GDELT: Monitors news outlets from nearly every corner of every country and identifies the people, locations, organizations, events, etc. that appear in them, creating a free, open platform for computing on the entire world.

DBpedia: Open, free and comprehensive knowledge base constantly improved through a crowd-sourced community effort to extract structured information from Wikipedia.

YAGO: Semantic knowledge base from the Max-Planck Institute, derived from Wikipedia, WordNet, and GeoNames.

Wikidata: Project of the Wikimedia Foundation: a free, collaborative, multilingual, secondary database, collecting structured data to provide support for all other Wikimedia projects, and beyond.

LinkedIn’s Knowledge Graph: Built upon “entities” on LinkedIn, such as members, jobs, titles, skills, companies, geographical locations, and schools, forming an ontology of the professional world. Not publicly available.

OpenIE: Quality information extraction at web scale; toolkit originating from the University of Washington.

PROSPERA: Hadoop-based scalable knowledge-harvesting engine which combines pattern-based gathering of relational fact candidates.

ConceptNet: Freely available semantic network that originated from the crowdsourcing project Open Mind Common Sense, launched in 1999 at the MIT Media Lab.

WordNet: Nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept.

### References

[1] Sebastien Dery, available online at: https://medium.com/@sderymail/challenges-of-knowledge-graph-part-1-d9ffe9e35214

[2] Kumar, Kundan, and Siddhant Manocha, “Constructing knowledge graph from unstructured text”, Self 3, no. 4 (2007).

[3] Pujara, Jay, Hui Miao, Lise Getoor, and William W. Cohen. “Knowledge graph identification.” (2013): 542.
