Introduction: FrameNet
The backbone of FrameBase's schema is FrameNet, a repository of linguistic frames. FrameNet uses frames to annotate the meaning of sentences in natural language. Certain words, called lexical units (LUs) in FrameNet, evoke certain frames, depending on the linguistic context in which they appear. The frame represents a sort of situation, event, process, or object, and it has several frame elements (FEs), which are properties that each frame can have and that specify its particular meaning. For example, consider the following sentence:
"On his marriage on 2 April 1622 to Elizabeth , daughter of Robert Coxe , grocer , of London , he had land in Hampshire , Surrey , and Wiltshire settled on him."
The noun "marriage" evokes the frame "Forming_relationships". Some of the frame elements for this frame are filled in this sentence:
- The pronoun "his" fills the frame element "Partner_1". To know which entity it refers to, anaphora resolution must be performed over the preceding sentences of the text.
- "Elizabeth , daughter of Robert Coxe , grocer , of London" fills the frame element "Partner_2".
- "2 April 1622" fills the frame element "Time".
A sentence like "He married on April the 2nd 1622" would evoke the same frame "Forming_relationships", even though the lexical unit is different (the verb "to marry"). The first and third frame elements would be filled with equivalent values, but the second would be left unspecified.
The FrameBase Schema
FrameBase's RDFS schema declares a class for each frame in FrameNet, and an outgoing property for each frame element. This makes it possible to represent in an RDF knowledge base the knowledge one can express in natural language, following the same frame patterns that would be used to annotate natural language. It translates FrameNet relations such as frame inheritance, perspectivization, and frame element inheritance into RDFS counterparts. The vocabulary is strictly RDFS, which allows more efficient inference in many current triplestores.
Specific frame classes are created for each lexical unit in a frame, and these are declared subclasses of the non-lexicalized frame. We refer to these very specific LU-based frames as LU-microframes.
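As a rough illustration of how the introductory example sentence might be represented under this schema, the following Python sketch lists the triples for a single frame instance. The class and property IRIs follow the format given in the "FrameBase IRIs" section; the event and person IRIs under example.org are hypothetical, as is the choice to store the date as a plain string literal.

```python
# Namespaces as listed in the "FrameBase IRIs" section.
FRAME = "http://framebase.org/frame/"
FE = "http://framebase.org/fe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical IRIs for the event and its participants (not FrameBase data).
event = "http://example.org/event/marriage-1622"
groom = "http://example.org/person/groom"
bride = "http://example.org/person/Elizabeth_Coxe"

# One frame instance, typed with an LU-microframe, with three frame elements.
triples = [
    (event, RDF_TYPE, FRAME + "Forming_relationships.marriage.noun"),
    (event, FE + "Forming_relationships.Partner_1", groom),
    (event, FE + "Forming_relationships.Partner_2", bride),
    (event, FE + "Forming_relationships.Time", "2 April 1622"),
]


def to_ntriples(subject, predicate, obj):
    """Serialize one triple; objects not starting with 'http' become plain literals."""
    term = f"<{obj}>" if obj.startswith("http") else f'"{obj}"'
    return f"<{subject}> <{predicate}> {term} ."


lines = [to_ntriples(*t) for t in triples]
for line in lines:
    print(line)
```

Because the LU-microframe is declared a subclass of the non-lexicalized frame, an RDFS reasoner would also infer that the event is an instance of the frame "Forming_relationships".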
FrameBase extends this FrameNet-based backbone with WordNet. It adds new microframes based on synsets (sets of synonymous word-POS-sense tuples), which are called synset-microframes. These new frames are linked to existing LU-microframes, and they serve to create clusters of microframes (both LU- and synset-based) that have near-equivalent meaning. This includes nominalizations such as "marry.v" and "marriage.n" and other morpho-semantic equivalences. Each cluster is represented by an individual miniframe.
All the microframes under a given miniframe, together with the miniframe itself, are connected by the property fb-meta:isSimilarTo, forming a clique.
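The clique structure can be sketched as follows. This is only an illustration: the miniframe and microframe IRIs are made up for the example, and emitting a triple in both directions for every pair is an assumption about how the clique would be materialized.

```python
from itertools import permutations

FRAME = "http://framebase.org/frame/"
IS_SIMILAR_TO = "http://framebase.org/meta/isSimilarTo"


def similarity_clique(miniframe, microframes):
    """Return isSimilarTo triples for every ordered pair of cluster members."""
    members = [miniframe, *microframes]
    return [(a, IS_SIMILAR_TO, b) for a, b in permutations(members, 2)]


# Hypothetical cluster: one miniframe and two of its LU-microframes.
cluster = similarity_clique(
    FRAME + "Forming_relationships.m.marry.verb",
    [
        FRAME + "Forming_relationships.marry.verb",
        FRAME + "Forming_relationships.marriage.noun",
    ],
)

for subject, predicate, obj in cluster:
    print(subject, predicate, obj)
```

A cluster with n members yields n * (n - 1) directed triples, so the three members above produce six.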
FrameBase IRIs
FrameBase IRIs can be built using the Java library, but we also provide an EBNF specification here. (Note, however, that some examples and illustrations use a previous format for the IRIs.)
RESOURCE_IRI = ( "http://framebase.org/" , ( MACROFRAME_PATH | LU_MINIFRAME_PATH | LU_MICROFRAME_PATH | SYNSET_MICROFRAME_PATH |
FRAME_ELEMENT_PATH | DIRECT_BINARY_PREDICATE_PATH | METASCHEMA_CLASS_PATH | METASCHEMA_PROPERTY_PATH ) ) | INSTANCE_PATH ;
MACROFRAME_PATH = "frame/" , FRAMENET_FRAME_NAME ;
(*Example: http://framebase.org/frame/Destroying *)
LU_MINIFRAME_PATH = "frame/" , FRAMENET_FRAME_NAME , S , "m" , S , FRAMENET_LU_LEXEME , S , POS ;
(*Example: http://framebase.org/frame/Destroying.m.blow+up.verb *)
LU_MICROFRAME_PATH = "frame/" , FRAMENET_FRAME_NAME , S , FRAMENET_LU_LEXEME , S , POS ;
(*Example: http://framebase.org/frame/Destroying.demolition.noun *)
SYNSET_MICROFRAME_PATH = "frame/" , SYNSET_OFFSET , S , SYNSET_REPRESENTANT_WORD , S , POS ;
(*Example: http://framebase.org/frame/Synset00217014.destruction.noun *)
FRAME_ELEMENT_PATH = "fe/" , FRAMENET_FRAME_NAME , S , FE_NAME ;
(*Example: http://framebase.org/fe/Destroying.Cause *)
DIRECT_BINARY_PREDICATE_PATH = "dbp/" , FRAMENET_FRAME_NAME , S , DBP_NAME ;
(*Example: http://framebase.org/dbp/Destroying.isDestroyedBy *)
S = "." ;
POS = "verb" | "noun" | "adjective" | "adverb" | "conjunction" |
"determiner" | "interjection" | "numeral" | "preposition" | "pronoun" | "subordinate_conjunction" | "other" ;
METASCHEMA_CLASS_PATH = "meta/" , ( "MetaClassClass" | "MetaPropertyClass" |
"Frame" | "Macroframe" | "Miniframe" | "Microframe" | "LuMicroframe" | "SynsetMicroframe" |
"FrameElementPropertyClass" | "DirectBinaryPredicateClass" ) ;
(*Example: http://framebase.org/meta/Frame*)
METASCHEMA_PROPERTY_PATH = "meta/" , ( "inheritsFrom" | "isPerspectiveOf" | "isSimilarTo" | "hasLexicalForm" |
"hasDefinition" | "hasSyntacticallyAnnotatedLexicalLabel" |
"hasFramenetFrame" | "hasFramenetLU" | "hasFramenetFE" | "hasSynsetNumber" |
"isCreatedFromNumberOfFramenetAnnotatedSentences" | "isExtendedRule" | "isOriginalRule" ) ;
INSTANCE_PATH = ( "http://framebase.org/fi/" , HEX_HASH ) | IRI_FROM_DIFFERENT_DOMAIN ;
Note that the namespace of an IRI is the substring up to and including the last '/'. This gives the following namespaces:
- http://framebase.org/frame/ for frame classes. The standard prefix is fbframe.
- http://framebase.org/fe/ for FE properties. The standard prefix is fbfe.
- http://framebase.org/dbp/ for DBP properties. The standard prefix is fbdbp.
- http://framebase.org/meta/ for FrameBase meta-classes and meta-properties. The standard prefix is fbmeta.
The old namespace http://framebase.org/ns is part of FrameBase 1.x and is now deprecated.
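The namespace rule can be sketched in a few lines of Python. The function names are hypothetical, and the abbreviated form is meant for display only: local names that contain characters such as '+' or '%' would need escaping before they could be used as prefixed names in Turtle or SPARQL.

```python
# Standard prefixes for the four FrameBase namespaces listed above.
STANDARD_PREFIXES = {
    "http://framebase.org/frame/": "fbframe",
    "http://framebase.org/fe/": "fbfe",
    "http://framebase.org/dbp/": "fbdbp",
    "http://framebase.org/meta/": "fbmeta",
}


def split_iri(iri):
    """Split an IRI into (namespace, local name) at the last '/'."""
    namespace, slash, local = iri.rpartition("/")
    return namespace + slash, local


def abbreviate(iri):
    """Return 'prefix:local' for a known namespace, else the IRI unchanged."""
    namespace, local = split_iri(iri)
    prefix = STANDARD_PREFIXES.get(namespace)
    return f"{prefix}:{local}" if prefix else iri


print(abbreviate("http://framebase.org/fe/Destroying.Cause"))
print(abbreviate("http://framebase.org/frame/Destroying.m.blow+up.verb"))
```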
Each block (the content of every non-terminal that appears inside a non-terminal ending in _PATH) is encoded using the following procedure:
- The following mapping is applied:
{('+'<->' '),('.'->':'),(':'->U+EAAA)}
- The result is %-encoded as specified by RFC 3987.
The decoding uses the inverse procedure.
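A minimal Python sketch of this procedure is shown below; it is not the Java library's implementation. The function names are hypothetical, and the sketch assumes that `urllib.parse.quote` is an acceptable stand-in for the RFC 3987 step: it percent-encodes every non-ASCII character, which is stricter than RFC 3987 requires, but still yields a valid IRI.

```python
from urllib.parse import quote, unquote

PUA_COLON = "\uEAAA"  # private-use character that stands in for a literal colon

# The substitutions are applied simultaneously, one character at a time,
# so str.translate is used rather than chained str.replace calls.
_ENCODE_MAP = str.maketrans({"+": " ", " ": "+", ".": ":", ":": PUA_COLON})
_DECODE_MAP = str.maketrans({"+": " ", " ": "+", ":": ".", PUA_COLON: ":"})


def encode_block(block):
    """Apply the character mapping, then percent-encode the result."""
    mapped = block.translate(_ENCODE_MAP)
    # '+' and ':' are kept as-is; '/' is NOT in the safe set, so it is escaped
    # and cannot be confused with the namespace separator.
    return quote(mapped, safe="+:")


def decode_block(encoded):
    """Inverse procedure: percent-decode, then apply the inverse mapping."""
    return unquote(encoded).translate(_DECODE_MAP)


print(encode_block("blow up"))
print(encode_block("a:b"))
print(decode_block("blow+up"))
```

Note that `unquote` (unlike `unquote_plus`) leaves '+' untouched, which is what the inverse mapping needs.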
This method was chosen for the following reasons:
- It allows the IRI path to be structured as blocks separated by dot characters. The original blocks represent information from the external datasets that were used to mint the IRI, and they can be recovered with a regular expression and a simple mapping. This works because any dot occurring inside a block is mapped to a colon (a character that occurs less often in linguistic datasets), and any colon is mapped to a Private Use Area Unicode character (U+EAAA), which is not expected to occur in an external dataset but can still be represented under RFC 3987, since %-encoding can accommodate any Unicode character. This makes the mapping reversible without ambiguity. While it is not necessary for IRIs to carry information in this way, and any data relevant to a FrameBase IRI is also accessible through RDF properties, we believe it is convenient.
- It is backward-compatible with bare RFC 3987 while improving readability. Whitespace is more common than the + sign in the source blocks, which is why the two are swapped. We chose the + sign because it does not need %-encoding and is commonly associated with whitespace through the application/x-www-form-urlencoded encoding. Note, however, that that encoding applies to form data rather than to paths, and it is not the encoding used here.
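To illustrate the first point, the sketch below recovers the original blocks from an IRI with a regular expression and the inverse mapping. The function name is hypothetical, and the sketch does not try to decide which kind of resource the IRI denotes; it only splits and decodes.

```python
import re
from urllib.parse import unquote

# Inverse of the block mapping: '+' <-> ' ', ':' -> '.', U+EAAA -> ':'.
_DECODE_MAP = str.maketrans({"+": " ", " ": "+", ":": ".", "\uEAAA": ":"})


def blocks_of(iri):
    """Return the decoded blocks of the local name (the part after the last '/')."""
    local = iri.rpartition("/")[2]
    # Dots inside a block were mapped to colons when the IRI was minted,
    # so every remaining dot is a block separator.
    raw_blocks = re.findall(r"[^.]+", local)
    return [unquote(raw).translate(_DECODE_MAP) for raw in raw_blocks]


print(blocks_of("http://framebase.org/frame/Destroying.m.blow+up.verb"))
print(blocks_of("http://framebase.org/fe/Destroying.Cause"))
```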