FrameBase

FrameBase is a linked open knowledge base meant to uniformly represent a wide range of knowledge, tackling the semantic heterogeneity among various sources of structured knowledge, such as those in the Linked Open Data cloud. It provides a flexible and uniform way of capturing n-ary relationships by adapting and combining repositories of frames from the fields of linguistics and cognitive science (FrameNet and WordNet) into a large, wide-coverage vocabulary that can be used to represent complex knowledge and can be extended with more specific elements. In other words: if you can express it with language, you can express it with FrameBase, barring some very specific concepts that need to be coined or imported from domain-specific KBs.

There are two interconnected representation levels in FrameBase:

A highly expressive layer where information is represented with explicit entities that instantiate the frames and represent specific situations, processes or events of any kind; the frames are organized into a rich hierarchy.
A less expressive but simpler layer based on direct binary predicates between the elements (participants, properties) of the frames. This layer is more compact to store and query, and it is connected to the other layer by means of Reification-Dereification (ReDer) rules. It can also be used to connect to similar predicates in other sources of structured knowledge or in natural language.
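A minimal sketch of how a ReDer rule could relate the two layers, assuming triples are plain Python tuples of strings. The rule format (`ReDerRule`), the function names and all IRIs (`fb:frame-Marriage`, `fb:fe-Partner_1`, `fb:dp-isMarriedTo`, `ex:...`) are hypothetical illustrations, not FrameBase's actual vocabulary or rule syntax.

```python
from typing import Callable, NamedTuple

Triple = tuple[str, str, str]
RDF_TYPE = "rdf:type"


class ReDerRule(NamedTuple):
    """Hypothetical ReDer rule: one frame class, two of its frame
    elements, and the direct binary predicate that links them."""
    frame_class: str
    subject_role: str
    object_role: str
    predicate: str


def dereify(kb: set[Triple], rule: ReDerRule) -> set[Triple]:
    """Frame instances -> direct binary triples (expressive -> simple layer)."""
    instances = {s for (s, p, o) in kb if p == RDF_TYPE and o == rule.frame_class}
    derived: set[Triple] = set()
    for inst in instances:
        subjects = {o for (s, p, o) in kb if s == inst and p == rule.subject_role}
        objects = {o for (s, p, o) in kb if s == inst and p == rule.object_role}
        derived |= {(subj, rule.predicate, obj) for subj in subjects for obj in objects}
    return derived


def reify(kb: set[Triple], rule: ReDerRule,
          mint_id: Callable[[str, str], str]) -> set[Triple]:
    """Direct binary triples -> frame instances (simple -> expressive layer)."""
    out: set[Triple] = set()
    for (s, p, o) in kb:
        if p == rule.predicate:
            inst = mint_id(s, o)  # new node standing for the situation
            out |= {(inst, RDF_TYPE, rule.frame_class),
                    (inst, rule.subject_role, s),
                    (inst, rule.object_role, o)}
    return out


rule = ReDerRule("fb:frame-Marriage", "fb:fe-Partner_1",
                 "fb:fe-Partner_2", "fb:dp-isMarriedTo")
kb = {
    ("ex:m1", RDF_TYPE, "fb:frame-Marriage"),
    ("ex:m1", "fb:fe-Partner_1", "ex:John"),
    ("ex:m1", "fb:fe-Partner_2", "ex:Mary"),
    ("ex:m1", "fb:fe-Time", "2000"),
}
binary = dereify(kb, rule)  # {('ex:John', 'fb:dp-isMarriedTo', 'ex:Mary')}
```

In this sketch, dereification drops `fb:fe-Time`, which is why the binary layer is simpler but less expressive; reifying `binary` again recovers the frame instance except for that element.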

FrameBase is distributed in RDF (Resource Description Framework), though it can be translated to other formats. The FrameBase schema includes rich linguistic annotations using the Lemon model.

FrameBase connects to other knowledge bases by means of integration rules that can link data in ways that cannot be expressed with existing binary properties such as owl:sameAs and rdfs:subClassOf.
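As an illustration of why such rules go beyond owl:sameAs, the sketch below turns an identified binary fact plus a date attached to its identifier into one frame instance with several frame elements, which requires minting a new node and combining several triples. The input layout (`external_facts`, `external_meta`), the `ext:`/`fbi:` prefixes and the rule itself are hypothetical; FrameBase's actual integration rules may be written differently.

```python
Triple = tuple[str, str, str]

# Hypothetical source data: binary facts with identifiers, plus metadata
# triples whose subject is a fact identifier.
external_facts: dict[str, Triple] = {
    "ext:fact1": ("ext:John", "ext:isMarriedTo", "ext:Mary"),
    "ext:fact2": ("ext:John", "ext:livesIn", "ext:Paris"),
}
external_meta: list[Triple] = [
    ("ext:fact1", "ext:since", "2000"),
]


def integrate_marriages(facts: dict[str, Triple],
                        meta: list[Triple]) -> set[Triple]:
    """Hypothetical integration rule: each marriage fact and its metadata
    become a single frame instance with one triple per frame element."""
    out: set[Triple] = set()
    for fact_id, (s, p, o) in facts.items():
        if p != "ext:isMarriedTo":
            continue  # this rule only handles marriages
        inst = "fbi:" + fact_id.split(":", 1)[1]  # mint a node for the situation
        out |= {
            (inst, "rdf:type", "fb:frame-Marriage"),
            (inst, "fb:fe-Partner_1", s),
            (inst, "fb:fe-Partner_2", o),
        }
        # Metadata about the fact becomes another element of the same frame.
        out |= {(inst, "fb:fe-Time", m_o)
                for (m_s, m_p, m_o) in meta
                if m_s == fact_id and m_p == "ext:since"}
    return out


integrated = integrate_marriages(external_facts, external_meta)
```

The output is a pattern of four triples around a new node, something a one-to-one mapping between existing resources cannot produce.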

Furthermore, because FrameBase builds on FrameNet, which is tied to linguistic semantics, it offers additional possibilities for interfacing with natural language, both for querying and for text mining.

Currently, FrameBase integrates knowledge from different large-scale LOD knowledge bases such as YAGO2s, Freebase, and events from DBpedia and Schema.org, so they can be queried under our single schema.

FrameBase thus represents a significantly novel way of connecting the Linked Data world to natural language that is highly expressive, easily expandable, and allows us to draw on natural language processing techniques.

Currently, FrameBase is being used by the PIKES project.

The following English sentences illustrate how their information can be represented in FrameBase:

Albert Einstein won the Nobel Prize in 1921.

In 1921, Albert Einstein won the Nobel Prize for his work in the photoelectric effect, which was carried out in 1905.

Albert Einstein worked on the photoelectric effect in 1905.

Diagram showing example knowledge: the bubbles are nodes of the knowledge graph, the arrows with a white head are "instance of" relations, the black arrows are relations representing "frame elements", and the green arrows are direct binary predicates obtained from ReDer rules.
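The three sentences could be encoded roughly as follows, assuming triples as Python tuples. The frame and frame-element names (`fb:frame-Win_prize`, `fb:frame-Work`, `fb:fe-Reason`, ...) and the `ex:`/`dbr:` IRIs are illustrative assumptions, not necessarily the exact FrameBase identifiers shown in the diagram.

```python
Triple = tuple[str, str, str]

# "Albert Einstein won the Nobel Prize in 1921."
win = "ex:win1"
sentence1: set[Triple] = {
    (win, "rdf:type", "fb:frame-Win_prize"),
    (win, "fb:fe-Competitor", "dbr:Albert_Einstein"),
    (win, "fb:fe-Prize", "dbr:Nobel_Prize"),
    (win, "fb:fe-Time", "1921"),
}

# "Albert Einstein worked on the photoelectric effect in 1905."
work = "ex:work1"
sentence3: set[Triple] = {
    (work, "rdf:type", "fb:frame-Work"),
    (work, "fb:fe-Agent", "dbr:Albert_Einstein"),
    (work, "fb:fe-Goal", "dbr:Photoelectric_effect"),
    (work, "fb:fe-Time", "1905"),
}

# The long sentence contains both situations; "for his work" becomes one
# extra frame element linking the prize-winning instance to the work instance.
sentence2: set[Triple] = sentence1 | sentence3 | {(win, "fb:fe-Reason", work)}
```

Because each situation is a node in this sketch, the 1905 date stays attached to the work and the 1921 date to the prize, which a flat set of binary predicates about Einstein could not keep apart.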

Comparison of existing representation models

The knowledge bases in the LOD cloud use different models to represent n-ary relations, which leads to inconsistency and makes it impossible to link their knowledge with binary predicates such as owl:sameAs. The FrameBase model subsumes these models with less overall overhead, providing a flexible two-layer model that combines the benefits of each of them.

Model using direct binary predicates alone. These predicates are convenient to query, but the model fails to connect the pieces of knowledge related to the same event or situation. For instance, if John married twice, we would not know *when* he married Mary.
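A small sketch of the problem, using hypothetical predicate names and made-up dates: with binary predicates alone, both marriage dates hang off John, so a query cannot tell which one belongs to Mary, while one node per marriage keeps each date with its partners.

```python
Triple = tuple[str, str, str]

# Binary predicates only: dates and spouses are attached directly to John.
binary_kb: set[Triple] = {
    ("ex:John", "ex:marriedTo", "ex:Mary"),
    ("ex:John", "ex:marriedTo", "ex:Sue"),
    ("ex:John", "ex:marriedOn", "1990"),
    ("ex:John", "ex:marriedOn", "2005"),
}


def dates_binary(kb: set[Triple], person: str) -> list[str]:
    # The best we can do is list every marriage date of the person.
    return sorted(o for (s, p, o) in kb if s == person and p == "ex:marriedOn")


# One node per marriage keeps each date with its partners.
event_kb: set[Triple] = {
    ("ex:m1", "ex:partner", "ex:John"), ("ex:m1", "ex:partner", "ex:Mary"),
    ("ex:m1", "ex:time", "1990"),
    ("ex:m2", "ex:partner", "ex:John"), ("ex:m2", "ex:partner", "ex:Sue"),
    ("ex:m2", "ex:time", "2005"),
}


def dates_with(kb: set[Triple], a: str, b: str) -> list[str]:
    # Marriages that have both a and b as partners, then their dates.
    events = ({s for (s, p, o) in kb if p == "ex:partner" and o == a}
              & {s for (s, p, o) in kb if p == "ex:partner" and o == b})
    return sorted(o for (s, p, o) in kb if s in events and p == "ex:time")
```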

Model using RDF reification, as in YAGO. It includes the direct binary predicates and generates entities for situations or events, but it does so pairwise, so these entities have to be further connected. Therefore, it generates a large overhead (it suffers from combinatorial explosion) when situations have many participants or much associated data (the example only has 3 elements). Furthermore, it mixes information about the event with metadata about the statements, such as provenance.
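The combinatorial growth can be made concrete with a rough count. This sketch assumes one binary statement per pair of elements, each reified with the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`); it is a generic illustration, not YAGO's exact scheme, and it leaves out the additional triples needed to connect the statement nodes to each other. The pairwise predicate names are hypothetical.

```python
from itertools import combinations

Triple = tuple[str, str, str]


def reify_pairwise(elements: dict[str, str]) -> set[Triple]:
    """Assert and reify one binary statement per pair of elements
    (1 asserted triple + 4 reification triples per pair)."""
    triples: set[Triple] = set()
    pairs = combinations(sorted(elements.items()), 2)
    for i, ((r1, v1), (r2, v2)) in enumerate(pairs):
        pred = f"ex:{r1}_{r2}"  # hypothetical pairwise predicate
        stmt = f"ex:stmt{i}"    # statement node for this pair
        triples |= {
            (v1, pred, v2),
            (stmt, "rdf:type", "rdf:Statement"),
            (stmt, "rdf:subject", v1),
            (stmt, "rdf:predicate", pred),
            (stmt, "rdf:object", v2),
        }
    return triples


def single_event(elements: dict[str, str]) -> set[Triple]:
    """One event node: a type triple plus one triple per element."""
    ev = "ex:event"
    return {(ev, "rdf:type", "ex:Event")} | {
        (ev, f"ex:{role}", value) for role, value in elements.items()
    }


marriage = {"partner1": "ex:John", "partner2": "ex:Mary", "time": "2000"}
pairwise_count = len(reify_pairwise(marriage))  # 3 pairs x 5 triples
event_count = len(single_event(marriage))
```

For the three-element marriage this gives 15 triples against 4 for a single event node; with 6 elements it is 75 against 7, since the number of pairs grows as n(n-1)/2.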

Model using subproperties. Like the model based on RDF reification, it includes direct binary predicates and generates event/situation entities pairwise; even though it does so with lower overhead, it still requires the mutual connections.
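For comparison, this sketch assumes the subproperty model works like the singleton-property approach: every pairwise statement gets its own fresh subproperty, and the generic direct predicate is recovered through RDFS subproperty entailment. All property names are hypothetical, and the triples that would connect the pairwise statements to each other are not included.

```python
from itertools import combinations

Triple = tuple[str, str, str]


def subproperty_pairwise(elements: dict[str, str]) -> set[Triple]:
    """One fresh subproperty per pairwise statement (2 triples per pair)."""
    triples: set[Triple] = set()
    pairs = combinations(sorted(elements.items()), 2)
    for i, ((r1, v1), (r2, v2)) in enumerate(pairs):
        generic = f"ex:{r1}_{r2}"  # hypothetical direct predicate
        fresh = f"{generic}-{i}"   # hypothetical statement-specific subproperty
        triples |= {(v1, fresh, v2), (fresh, "rdfs:subPropertyOf", generic)}
    return triples


def infer_direct(triples: set[Triple]) -> set[Triple]:
    """One step of RDFS subproperty entailment (rule rdfs7)."""
    sub_of = {(s, o) for (s, p, o) in triples if p == "rdfs:subPropertyOf"}
    return {(s, sup, o) for (s, p, o) in triples
            for (sub, sup) in sub_of if p == sub}


marriage = {"partner1": "ex:John", "partner2": "ex:Mary", "time": "2000"}
stored = subproperty_pairwise(marriage)
direct = infer_direct(stored)
```

Each pair costs 2 stored triples here instead of 5 with standard reification, but there are still n(n-1)/2 pairs, and nothing yet ties the pairwise statements together.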

Model using the so-called *Neo-Davidsonian* representation. Several knowledge bases such as Freebase or DBpedia use it, but in an ad-hoc way, with different coverage and different vocabularies. The FRED system also uses this model, based on FrameNet. Furthermore, this model does not offer direct binary predicates, which produces overhead in the knowledge base and in the queries when only two elements of the situation are required.
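To illustrate the query overhead, the sketch below answers "who is married to whom" over a Neo-Davidsonian graph, which needs a join through the event node, and over a direct binary predicate, which is a single lookup. The IRIs and predicate names are hypothetical.

```python
Triple = tuple[str, str, str]

# Neo-Davidsonian style: one event node, one triple per participant.
neo_kb: set[Triple] = {
    ("ex:m1", "rdf:type", "ex:Marriage"),
    ("ex:m1", "ex:partner1", "ex:John"),
    ("ex:m1", "ex:partner2", "ex:Mary"),
    ("ex:m1", "ex:time", "1990"),
}

# Direct binary predicate for the same two elements.
direct_kb: set[Triple] = {("ex:John", "ex:marriedTo", "ex:Mary")}


def spouses_neo(kb: set[Triple]) -> set[tuple[str, str]]:
    """Two elements of the situation: join both partners through the event."""
    events = {s for (s, p, o) in kb if p == "rdf:type" and o == "ex:Marriage"}
    return {
        (a, b)
        for ev in events
        for (s1, p1, a) in kb if s1 == ev and p1 == "ex:partner1"
        for (s2, p2, b) in kb if s2 == ev and p2 == "ex:partner2"
    }


def spouses_direct(kb: set[Triple]) -> set[tuple[str, str]]:
    """Same question with a direct binary predicate: one pattern, no join."""
    return {(s, o) for (s, p, o) in kb if p == "ex:marriedTo"}
```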

Model in FrameBase. For situations with many elements, it has a lower overhead than the models using subproperties or RDF reification, and it creates a single "event/frame" entity. Frames are clustered by nearly equivalent meanings, such as those for "marriage (noun)", "marry (verb)" and "wedding (noun)". The green triples use direct binary predicates that can be inferred with special "ReDer" rules, only when needed. FrameBase offers a wide vocabulary with tens of thousands of frames, such as "Marriage", with accompanying ReDer rules, and means to connect to natural language.
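A minimal sketch of what clustering by nearly equivalent meaning buys, assuming a hypothetical lookup table from lexical units to one frame class: whichever word evokes the situation, the same frame class and frame elements are used, so the resulting triples are identical. The table entries and IRIs are illustrative, not FrameBase's actual identifiers.

```python
Triple = tuple[str, str, str]

# Hypothetical cluster: nearly equivalent lexical units map to one frame class.
FRAME_OF_LEXICAL_UNIT: dict[str, str] = {
    "marriage.n": "fb:frame-Marriage",
    "marry.v": "fb:frame-Marriage",
    "wedding.n": "fb:frame-Marriage",
}


def frame_instance(lexical_unit: str, inst: str,
                   elements: dict[str, str]) -> set[Triple]:
    """One type triple plus one triple per frame element, whatever the word."""
    frame = FRAME_OF_LEXICAL_UNIT[lexical_unit]
    return {(inst, "rdf:type", frame)} | {
        (inst, role, value) for role, value in elements.items()
    }


partners = {"fb:fe-Partner_1": "ex:John", "fb:fe-Partner_2": "ex:Mary"}
from_verb = frame_instance("marry.v", "ex:m1", partners)    # "John married Mary"
from_noun = frame_instance("wedding.n", "ex:m1", partners)  # "the wedding of John and Mary"
```

In this sketch an instance with n frame elements takes n + 1 triples, independently of which word evoked it.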

This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement No. FP7-SEC-2012-312651 (ePOOLICE project).

Additional funding was provided by the Danish Council for Independent Research (DFF) under grant agreement No. DFF-4093-00301 (part of the QWeb project), as well as the National Basic Research Program of China Grants 2011CBA00300, 2011CBA00301, and NSFC Grants 61033001, 61361136003, 61450110088.

The FrameBase logo and icon are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License by the FrameBase team.