XML Release Notes for r1.3

Changes have been made to the frames.xml (top half of the database), luxxx.xml (annotation data), and lexxx.xml (lexical entry report) files to provide, for each LU, the list of lexemes contained in its lemma. This information was already available in frames.xml in release 1.2, but it is now also available in the annotation and lexical entry reports. The format has changed slightly: all of the "lexeme" elements are now grouped into a "lexemes" element, and the name of each lexeme is now represented as parsed content rather than as an attribute.

A further, minor change to frames.xml is that the capitalization of the grouping element has been changed to be more consistent with the elements it contains. This change has also been made to all the luxxx.xml files.

The format of the luxxx.xml files has also been changed slightly by introducing a "rank" attribute on the "layer" elements. Because this change and the specification of component lexemes mentioned above have been applied systematically, neither is represented as a difference in the luXXXXDiff.xml files. Similarly, the formatting changes to the "lexemes" and "lexeme" elements in frames.xml are not represented in framesDiff.xml.

A new XML document, semtypes.xml, has been created for this release. It contains information about semantic types and their relations.

No formatting changes have been made to any other XML documents. Some DTD files, however, have been repaired so that they correctly reflect the structure of the XML documents. The DTDs for this release are:

annotationV1_3.dtd
corpusV1_1.dtd
framesDiffV1_1.dtd
framesV1_3.dtd
frRelationV1_1.dtd
lexentryV1_1.dtd
luDiffV1_1.dtd
metaRelationV1_2.dtd
semtypesV1_1.dtd

With this release, the XML files refer to their corresponding DTDs locally in "../docs" rather than via a URL to the FrameNet website. The DTD files are therefore distributed along with the XML data.
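A minimal sketch of reading the component lexemes of one LU with Python's standard library, assuming the r1.3 layout described above (a "lexemes" element grouping "lexeme" elements whose name is text content). The `lexunit` wrapper, its attributes, and the attribute name `name` used for the older, ungrouped layout are illustrative assumptions, not taken from the DTDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical LU fragment in the r1.3 layout. Only the <lexemes>/<lexeme>
# grouping and the lexeme name as text content come from the release notes;
# the <lexunit> wrapper and its attributes are illustrative assumptions.
R13_SAMPLE = """
<lexunit name="give up.v">
  <lexemes>
    <lexeme>give</lexeme>
    <lexeme>up</lexeme>
  </lexemes>
</lexunit>
"""

# Hypothetical pre-r1.3 fragment: ungrouped <lexeme> elements with the name
# in an attribute. The attribute name "name" is an assumption.
R12_SAMPLE = """
<lexunit name="give up.v">
  <lexeme name="give"/>
  <lexeme name="up"/>
</lexunit>
"""

def lexeme_names(lu_elem):
    """Return the lexeme names of one LU element, in document order."""
    group = lu_elem.find("lexemes")
    if group is not None:
        # r1.3: each name is the parsed content of a <lexeme> in the group.
        return [(lx.text or "").strip() for lx in group.findall("lexeme")]
    # Fallback for the older layout (assumed attribute name "name").
    return [lx.get("name", "") for lx in lu_elem.findall("lexeme")]

print(lexeme_names(ET.fromstring(R13_SAMPLE)))  # ['give', 'up']
print(lexeme_names(ET.fromstring(R12_SAMPLE)))  # ['give', 'up']
```

Reading the text content rather than an attribute is the only change needed for the new grouping; the fallback branch is there only for tools that must read both layouts.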
The luXXXXDiff.xml and framesDiff.xml files catalogue the changes that have been made to the data between releases 1.2 and 1.3. Changes in format are not represented.

A note about the structure of these files: if a new element (e.g. subcorpus, annotationSet, frame) has been created, the corresponding diff element is marked with changeType "Created", and its content (i.e. its children) is the same as the content of the created element. This contrasts with cases where existing data has been changed: there, the children of a changed element are not listed unless those children were themselves changed. Please refer to the release notes for r1.2 for more information on the structure of the luXXXXDiff.xml files.

***************PREVIOUS RELEASE NOTES********************************

XML Release Notes for r1.2

Minor changes have been made to the frames.xml (top half of the database) and luxxx.xml (annotation data) files. In frames.xml, the "cBy" field has been removed from the "frame", "fe" and "lexUnit" elements. In the luxxx.xml files, the "cBy" and "cDate" attributes have both been removed from the "label" element.

When the framesDiff.xml file is generated, frame and frame element definitions are now compared to see whether their content has changed since the previous version. This comparison was not done in the previous (r1.1) version of framesDiff.xml.

No changes were made to the frRelations.xml file.

Two new XML documents have been created for this release: luXXXDiff.xml and leXXX.xml. These are described in detail below.

luXXXDiff.xml
----------------------------------------------------------------------
This is a new XML format for this release. The purpose of the luXXXDiff.xml files is to show specific changes between the current luXXX.xml files and the previous (r1.1) release of the luXXX.xml files.
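The following minimal Python sketch shows one way to walk a diff file under the changeType rules in these notes: an element marked "Created" carries a copy of the new content, while other elements list only those children that changed themselves. The sample fragment and its ID attributes are hypothetical, and the decision not to descend into a "Created" element is a design assumption based on that copying rule.

```python
import xml.etree.ElementTree as ET

# Hypothetical luXXXXDiff.xml fragment. The nesting mirrors the luXXX.xml
# files and the changeType values come from the release notes; the ID
# attributes are illustrative only.
SAMPLE = """
<lexunit-annotation changeType="None">
  <subcorpus changeType="None">
    <annotationSet ID="1" changeType="Created">
      <layer name="FE"/>
    </annotationSet>
    <annotationSet ID="2" changeType="None">
      <layer name="FE" changeType="Edited"/>
    </annotationSet>
  </subcorpus>
</lexunit-annotation>
"""

def changed_elements(elem, path=()):
    """Yield (path, changeType) for each element whose changeType is not "None".

    The children of a "Created" element are a copy of the new content, so
    the walk does not descend into them; the children of any other element
    are listed only when they changed themselves, so the walk continues.
    """
    path = path + (elem.tag,)
    change = elem.get("changeType", "None")
    if change != "None":
        yield "/".join(path), change
    if change == "Created":
        return
    for child in elem:
        yield from changed_elements(child, path)

for p, c in changed_elements(ET.fromstring(SAMPLE)):
    print(c, p)
```

For this sample the walk reports the created annotationSet and the edited layer inside the unchanged annotationSet, and nothing else.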
The general format of the luXXXDiff.xml files follows that of the luXXX.xml files, with similar child element names in the same order. luXXXDiff.xml files are created only for data that has changed since the last release, so there is NOT one for every luXXX.xml file.

As mentioned above, the XML elements in these files are nested in the same order as in the luXXX.xml files. If the change occurred at the top level (lexunit-annotation), the "changeType" will say "Edited", followed by the field that was changed with its new and old values. If the change did not occur at the top level but in one of the child elements, the top-level changeType may say "None" while one of the children says "Edited".

The following is a general outline of the luXXXDiff.xml. The items in parentheses indicate the attributes that were compared between the two versions of the luXXX.xml files (r1.1 vs r1.2) to create the luXXXDiff.xml.

I. lexunit-annotation (name, frame, pos, incorporatedFE)
   A. definition
   B. subcorpus (none*)
      1. annotationSet (none**)
         a. layer (name)
            i. label (name, start, end, iType)

* In the current software configuration a user cannot change the name of a subcorpus, so this attribute is not compared.
** The attributes of an annotationSet contain only its ID and status. These values are not compared when the luXXXDiff.xml is created.

leXXX.xml
___________________________________________________________________
This is a new XML format that mirrors the Lexical Entry Reports (LE Reports) on the public and internal web pages. It is intended for anyone who wants the information contained in the LE Reports without having to compile it from the luXXX.xml files. There is one report for each lexical unit.

The format of the leXXX.xml can be summarized as follows (items in parentheses are the attributes of each XML element):

I. lexical-entry (luName, frameName, luPOS, incorporatedFE)
   A. definition
   B. semtypes
      1. semtype
   C. governors
      1. governor (lemma, type)
         a. annotationSet-ids
            i. id
   D. FERealizations
      1. FERealization (total)
         a. valence-unit (fe, pt, gf)
         b. annotationSet-ids
            i. id
   E. FEGroupRealizations
      1. FEGroupRealization (total)
         a. fes
            i. fe
         b. pattern (total)
            i. valence-unit (fe, pt, gf)
            ii. annotationSet-ids
                a. id

The XML element FERealizations represents the first table in the report, titled "Frame Elements and Their Syntactic Realizations". Each FERealization element represents one line in that table. The XML element FEGroupRealizations represents the second table, titled "Valence Patterns". Each FEGroupRealization element represents the FEs in a pattern, and its child element "pattern" gives the FE, GF and PT involved in that pattern.

************************************************************************

XML Release Notes for r1.1

The XML format for both frames.xml (top half of the data) and the luxxx.xml files (annotation data) has been changed with this release. The changes reflect the changes made to the database when the new version of the FrameNet II software was implemented in January 2003.

Two new XML file types have been created for this release. The file frRelation.xml shows the frame-to-frame relations, and the file framesDiff.xml describes the major changes that have been made to the frames since the previous release in October 2002.

The rest of this document describes the current format of the XML files, the differences in format from the previous release, and the Document Type Definitions (DTDs) for the XML files.

FRAMES.XML
------------------------------------------------
Current Format for r1.1

The frames.xml file contains information about the basic frame-semantic objects: frames, frame elements, lexunits, lexemes, etc. The data is divided into frames, which are presented in order of frame ID number.
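A minimal sketch of reading frames.xml in this layout with Python's standard library. It assumes the element names given in the DTD explanation later in these notes ('frames', 'frame', 'definition', 'fes', 'lexunits', 'semtypes'), an 'fe' element with the name and coreType attributes listed in the framesDiff outline, and integer frame IDs. The frame IDs, names and the xmlcreated value in the sample are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical frames.xml fragment. Element names follow the DTD excerpt
# explained in these notes; the fe attributes name and coreType follow the
# framesDiff outline. IDs, names and xmlcreated are made up.
SAMPLE = """
<frames xmlcreated="hypothetical">
  <frame ID="1" name="FrameA">
    <definition>Frame A definition</definition>
    <fes>
      <fe name="Theme" coreType="Core"/>
      <fe name="Time" coreType="Peripheral"/>
    </fes>
    <lexunits/>
    <semtypes/>
  </frame>
  <frame ID="2" name="FrameB">
    <definition>Frame B definition</definition>
    <fes>
      <fe name="Agent" coreType="Core"/>
    </fes>
    <lexunits/>
    <semtypes/>
  </frame>
</frames>
"""

def core_fes_by_frame(root):
    """Map each frame name to the names of its Core frame elements.

    Also checks the documented ordering: frames appear in order of frame ID
    (assumed here to be an integer).
    """
    result = {}
    last_id = None
    for frame in root.findall("frame"):
        frame_id = int(frame.get("ID"))
        if last_id is not None and frame_id < last_id:
            raise ValueError(f"frame {frame_id} appears after frame {last_id}")
        last_id = frame_id
        result[frame.get("name")] = [
            fe.get("name")
            for fe in frame.iterfind("fes/fe")
            if fe.get("coreType") == "Core"
        ]
    return result

print(core_fes_by_frame(ET.fromstring(SAMPLE)))
# {'FrameA': ['Theme'], 'FrameB': ['Agent']}
```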
Each frame has the same elements, and their format can be summarized as:

I. Frame A
   A. Frame A Definition
   B. FrameElements
      1. FrameElement A
         a. FrameElement A Definition
         b. Semantic Types (FrameElement)
            i. Semantic Type
      2. FrameElement B
      etc.
   C. Lexical Units
      1. LexUnit A
         a. LexUnit A Definition
         b. Annotation
            i. Number Annotated
            ii. Total Sentences
         c. Lexeme
         d. Semantic Types (LexUnit)
            i. Semantic Type
      2. LexUnit B
      etc.
   D. Semantic Types (Frame)
      1. Semantic Type
II. Frame B
   A. Frame B Definition
   etc.

Format of the Previous (r1.0) Release of frames.xml

In the previous version of frames.xml, frames were listed alphabetically, and the following basic format was used for each frame:

I. Frame A
   A. Frame A Definition
   B. FrameElements
      1. FrameElement A
         a. FrameElement Definition
      2. FrameElement B
      etc.
   C. Lemmas
      1. Lemma A
         a. notes (Lemma)
            1. note
         b. Annotation
            i. Number Annotated
            ii. Total Sentences
         c. Lexeme
      2. Lemma B
      etc.
II. Frame B
   A. Frame B Definition
   etc.

Major differences between frames.xml r1.1 and r1.0:

The root element now records when the XML was created (XMLCreated).
The lemmas element has been renamed to lexunits, and the lemma element to lexunit.
The notes element has been eliminated.
Semantic types are provided as elements (rather than attributes) for frames, frame elements and lexunits, where they occur in the database.
Definitions have been stripped of formatting code to make them more human-readable.
LexUnit definitions (or sense descriptions) are now included as definition elements under each LexUnit.
The Core attribute of FrameElement has been renamed to coreType, and its values have changed from 'true' or 'false' to 'Core', 'Peripheral', 'Extra-Thematic' or 'Core-Unexpressed'.

LUXXX.XML FILES
------------------------------------------------
Current Format for r1.1

The luxxx.xml files contain information about the annotation objects: lexunit-annotation, subcorpora, annotationSets, layers, etc. There are now two versions of the XML for each LexUnit.
The xmlPOS version contains the BNC/Penn Treebank part-of-speech tags. Both sets of luxxx.xml files have the same names but have been zipped into separate directories for the release.

The general format of the XML has also undergone a major change for this release. As before, each lexunit is contained in its own file, which now (r1.1) has the following general format:

I. lexunit-annotation
   A. definition
   B. subcorpus
      1. annotationSet
         a. layers
            i. layer
               a. labels
                  1. label
         b. sentence
            i. text
            ii. parts-of-speech*
                a. pos*

* Only present in the xmlPOS version of the luxxx.xml files

Format of the previous (r1.0) release

I. lexunit
   A. definition
   B. subcorpus
      1. sentence
         a. text
         b. layers
            i. layer
               a. labels
                  1. label
               b. parents
                  1. parent

Major differences between luxxx.xml r1.1 and r1.0:

The root element has been renamed from lexunit to lexunit-annotation.
There is a new attribute, incorporatedFE, on the lexunit-annotation element.
The child of subcorpus is now annotationSet (previously sentence).
The sentence element is now a child of annotationSet.
Layers are now children of annotationSet rather than of sentence.
Parent elements have been eliminated.
Layer elements have only 'ID' and 'name' attributes; the 'cBy', 'lexUnitRef', 'frame' and 'lemma' attributes have been eliminated.
Color information about each label (the bgColorS, fgColorS, bgColorP and fgColorP attributes) no longer appears.
The 'cBy' and 'cDate' attributes have been eliminated from the sentence element.

FRRELATION.XML
------------------------------------------------
Current Format for r1.1

The frRelation.xml file contains information about the frame-to-frame relations. Its format can be summarized as:

I. fr-relations
   A. frame relation type 1
      1. frame-relations
         a. frame relation 1
            1. fe-relation A
            2. fe-relation B
            etc.
         b. frame relation 2
            1. fe-relation A
            2. fe-relation B
            etc.
   B. frame relation type 2
   etc.

Format of the previous (r1.0) release
None.
This is the first release version.

FRAMESDIFF.XML
------------------------------------------------
Current Format for r1.1

A new XML type was created for this release: framesDiff.xml. Its purpose is to present changes in the data that have occurred since the last release.

The general format of framesDiff.xml follows that of frames.xml, with similar child element names in the same order, presented by frame and ordered by frame ID. Each frame appears whether or not it has been changed. If a change in the data has occurred at the frame level, the 'changeType' attribute will say 'edited' rather than 'none'. The FrameElements and/or LexUnits of that frame that have been changed appear as children of that frame.

The following is a general outline of framesDiff.xml. The items in parentheses indicate the attributes that were compared between the two versions of frames.xml (r1.0 vs r1.1) to create framesDiff.xml.

I. Frame A (Name)
   A. definition (not compared; see * below)
   B. fe (FrameElement) (name, abbrev, coreType, colors)
      1. definition (not compared; see * below)
   C. lexunit (name**, pos, status)
      1. definition
      2. annotated
      3. total
      4. oldannotated
      5. oldtotal
II. Frame B
etc.

* The definition of a Frame or FrameElement appears only when it is a newly created Frame or FrameElement, respectively. Future versions of framesDiff.xml will also show whether a definition itself has been changed.
** Please note that the names of all lexunits changed for this release: the part of speech has been appended to each LexUnit name. If a lexunit name is otherwise changed, it will appear with an 'oldname' following changeType = "Edited".

All of the child elements are optional: each appears only if some portion of its data has changed, followed by the attribute or child element that changed. As noted above, the 'frame' element appears even if its name has not changed.
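A minimal sketch of listing the frames in framesDiff.xml that carry any change, using the rules above: every frame appears, a frame-level change sets its own changeType, and changed fes and lexunits appear as its children. The root tag, the frame and fe 'name' attributes and the sample values are hypothetical; the case-insensitive comparison is a defensive choice because these notes write the value both as 'edited' and as "Edited".

```python
import xml.etree.ElementTree as ET

# Hypothetical framesDiff.xml fragment. The root tag, names and cDate value
# are assumptions; the changeType values and the rule that changed fes and
# lexunits appear as children of their frame come from the release notes.
SAMPLE = """
<framesDiff>
  <frame name="FrameA" changeType="none"/>
  <frame name="FrameB" changeType="none">
    <fe name="Goal" changeType="Edited"/>
  </frame>
  <frame name="FrameC" changeType="Created" cDate="hypothetical">
    <definition>Frame C definition</definition>
  </frame>
</framesDiff>
"""

def changed_frames(root):
    """Return {frame name: (frame changeType, [(tag, name, changeType), ...])}.

    A frame counts as changed if its own changeType is not 'none' (compared
    case-insensitively) or if it has fe/lexunit children, since only changed
    children are listed.
    """
    summary = {}
    for frame in root.findall("frame"):
        own = frame.get("changeType", "none")
        kids = [
            (child.tag, child.get("name"), child.get("changeType"))
            for child in frame
            if child.tag in ("fe", "lexunit")
        ]
        if own.lower() != "none" or kids:
            summary[frame.get("name")] = (own, kids)
    return summary

for name, (own, kids) in changed_frames(ET.fromstring(SAMPLE)).items():
    print(name, own, kids)
```

Frames whose own changeType is 'none' and that have no fe or lexunit children are skipped, which matches the note that such frames appear in the file even when nothing in them changed.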
For example, if a frame name has changed, the frame element has changeType = "Edited", followed by the old value. An fe whose coreType has changed is marked in the same way.

A lexunit can also have changeType = "Edited" without being followed by an old value. This happens when it is the children of the lexunit that have changed, not the attributes of the lexunit itself. (The original excerpt for this case showed the values 1, 592, 0 and 592 for annotated, total, oldannotated and oldtotal.)

New frames, fes and lexunits also appear in framesDiff.xml: they have changeType = "Created", carry a cDate attribute indicating when they were created, and are followed by their definition (where available). Deleted frames, fes and lexunits have changeType = "Deleted" and are followed by their definition.

Format of the previous (r1.0) release
None. This is the first release version.

DOCUMENT TYPE DEFINITION (DTD) GENERAL INFORMATION
------------------------------------------------
DTDs are written in a formal syntax that specifies precisely which elements and entities may appear in an XML document and what the elements' contents and attributes are. Each DTD starts with a general outline of the elements that are, or may be, included in the XML. This outline is followed by the DTD syntax that the XML parser reads when the document is validated.

All XML files of a specific type (the luxxx.xml files, for example) must follow the rules listed in their DTD. A person using the data can therefore determine which elements will always be present and which are optional, what the children of each element will be (if any), and the possible attributes of each element.

The DTDs are essentially in outline form from the top level down: each element is defined, and its possible children are listed, followed by the possible attributes of that element. A portion of the frames DTD (Framesv1_1) is explained below as an example.

The element declaration for 'frames' specifies that it has zero or more children named 'frame'.
The 'zero or more' is indicated by the asterisk following the word frame. A '?' following the name would indicate 'zero or one' of the element, and '+' would indicate 'one or more'.

The attribute-list declaration for 'frames' gives it one attribute, 'xmlcreated', whose data type is CDATA (any string of text valid in XML). #REQUIRED indicates that this attribute must appear every time.

The element declaration for 'frame' lists four children: 'definition', 'fes', 'lexunits' and 'semtypes'. Since none of them has a modifier (?, + or *), all of these children must always be present.

The attribute-list declaration for 'frame' lists four attributes: ID, name, cBy and cDate. The #IMPLIED keyword marks cBy and cDate as optional: each instance of the element may or may not provide a value for them.

The declaration for 'definition' indicates that the only content of this element is parsed character data (no child elements).

The DTD then goes on to define each of the other children of 'frame' (fes, lexunits, semtypes), along with their children and attributes, in the same manner.
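To make the occurrence indicators concrete, here is a minimal Python sketch that checks an element's ordered child names against a comma-separated DTD sequence with optional '?', '*' or '+' modifiers. The two model strings are an assumption about how the declarations explained above are written, namely `<!ELEMENT frames (frame*)>` and `<!ELEMENT frame (definition, fes, lexunits, semtypes)>`. This is an illustration of the indicators only, not a substitute for validating with a real DTD-aware parser, and it deliberately ignores choices (|), nested groups and #PCDATA.

```python
import re

# Sequence content models as described in the explanation above, assumed to
# be written in the DTD as <!ELEMENT frames (frame*)> and
# <!ELEMENT frame (definition, fes, lexunits, semtypes)>.
FRAMES_MODEL = "frame*"
FRAME_MODEL = "definition, fes, lexunits, semtypes"

def content_model_regex(model):
    """Compile a comma-separated DTD sequence into a regex over child names.

    Supports element names with an optional ?, * or + indicator only;
    choices (|), nested groups and #PCDATA are out of scope.
    """
    parts = []
    for token in (t.strip() for t in model.split(",")):
        m = re.fullmatch(r"([\w.-]+)([?*+]?)", token)
        if m is None:
            raise ValueError(f"unsupported content-model token: {token!r}")
        name, indicator = m.groups()
        # Match each child name together with a trailing comma so that,
        # e.g., "fe" cannot match the start of "fes".
        parts.append(f"(?:{re.escape(name)},){indicator}")
    return re.compile("".join(parts))

def children_match(model, child_names):
    """True if the ordered child element names satisfy the sequence model."""
    encoded = "".join(name + "," for name in child_names)
    return content_model_regex(model).fullmatch(encoded) is not None

print(children_match(FRAMES_MODEL, []))  # True: zero frames is allowed
print(children_match(FRAME_MODEL, ["definition", "fes", "semtypes"]))  # False
```

The second call fails because 'lexunits' carries no modifier and is therefore required.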