XML Release Notes for r1.3

Changes have been made to frames.xml (top-half of the database),
luxxx.xml (annotation data), and lexxx.xml (lexical entry report)
files to provide, for each LU, the list of lexemes contained
by its lemma.  This information was already available in frames.xml
in release 1.2, but is now also available in the annotation and
lexical entry reports.  The format has been changed slightly: all
of the "lexeme" elements are now grouped into a "lexemes" element, and
the name of each lexeme is now represented as parsed character data
(element content) rather than as an attribute.
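
The grouping described above can be read with any XML parser.  The
following is a minimal Python sketch, not an official tool.  The sample
fragment is hypothetical: the enclosing "lexunit" element and its
"name" attribute are assumptions, and only the "lexemes"/"lexeme"
nesting and the lexeme name as element content come from these notes.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: only the lexemes/lexeme nesting and the lexeme
# name as character data are taken from the release notes.
SAMPLE = """
<lexunit name="give up.v">
  <lexemes>
    <lexeme>give</lexeme>
    <lexeme>up</lexeme>
  </lexemes>
</lexunit>
"""

def lexeme_names(lu_element):
    """Return the lexeme names listed under an LU's "lexemes" element."""
    group = lu_element.find("lexemes")
    if group is None:
        return []
    return [(lex.text or "").strip() for lex in group.findall("lexeme")]

if __name__ == "__main__":
    print(lexeme_names(ET.fromstring(SAMPLE)))
```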

A further, minor change to frames.xml is that capitalization of the
<semtypes> grouping element has been changed to <semTypes> to be more
consistent with its contained elements (<semType>).  This change has
also been made to all the luxxx.xml files.
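
Code that has to read both r1.2 and r1.3 files may need to accept
either spelling of the grouping element.  A small sketch of one way to
do that follows; the sample "frame" fragments and the "name" attribute
on <semType> are made up for illustration.

```python
import xml.etree.ElementTree as ET

def semtype_group(element):
    """Return the semantic-type grouping child under either spelling."""
    group = element.find("semTypes")       # spelling used from r1.3 on
    if group is None:
        group = element.find("semtypes")   # spelling used before r1.3
    return group

# Hypothetical fragments; the "name" attribute is an assumption.
OLD = ET.fromstring('<frame><semtypes><semType name="X"/></semtypes></frame>')
NEW = ET.fromstring('<frame><semTypes><semType name="X"/></semTypes></frame>')

if __name__ == "__main__":
    for frame in (OLD, NEW):
        print(semtype_group(frame).tag)
```

XML names are case sensitive, so a lookup for one spelling will not
match the other.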

The format of the luxxx.xml files has also been changed slightly,
introducing a "rank" attribute to the "layer" elements.  Because this
change and the specification of component lexemes mentioned above have
been applied systematically, neither of these changes is represented
as a difference in the luXXXXDiff.xml files.  Similarly, the formatting
changes to the "lexemes" and "lexeme" elements in frames.xml are not
represented in framesDiff.xml.

A new XML document, semtypes.xml, has been created for this release.
This document contains information about semantic types and their
relations.

No formatting changes have been made to any other xml documents.  Some
DTD files, however, have been repaired to correctly reflect the structure
of the xml documents.  The following is the list of DTDs for this release:

annotationV1_3.dtd
corpusV1_1.dtd
framesDiffV1_1.dtd
framesV1_3.dtd
frRelationV1_1.dtd
lexentryV1_1.dtd
luDiffV1_1.dtd
metaRelationV1_2.dtd
semtypesV1_1.dtd

With this release, the xml files refer to their corresponding DTDs
locally in "../docs" rather than via a URL to the FrameNet website. The
DTD files are thus distributed along with the xml data.
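
To check which DTD a given file points at, the DOCTYPE declaration can
be read without loading the DTD itself.  The sketch below uses Python's
standard expat binding.  The sample document is an assumption: the
pairing of a "frames" root with "../docs/framesV1_3.dtd" is inferred
from the DTD list above, not copied from a released file.

```python
import xml.parsers.expat

def dtd_system_id(xml_text):
    """Return the system identifier of the DOCTYPE declaration, or None."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append(system_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found[0] if found else None

# Hypothetical document head, for illustration only.
SAMPLE = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE frames SYSTEM "../docs/framesV1_3.dtd">\n'
    '<frames XMLCreated="unknown"/>'
)

if __name__ == "__main__":
    print(dtd_system_id(SAMPLE))
```

A relative identifier such as this one is resolved against the location
of the xml file, so the data and "docs" directories need to keep their
relative positions for validation to work.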

The luXXXXDiff.xml and framesDiff.xml files catalogue the changes that
have been made to the data between release 1.2 and 1.3.  Changes in
format are not represented.  A note about the structure of these files:
if a new element (e.g. subcorpus, annotationSet, frame, etc.) has been
created, the corresponding diff element will be marked with changeType
"Created".  The content (i.e. children) of this diff element is the
same as the content of the created element.  This contrasts with cases
where already existing data has been changed: there, children of
changed elements are not listed unless those children were themselves
changed.  Please refer to the release notes for r1.2 for more
information on the structure of luXXXXDiff.xml files.


***************PREVIOUS RELEASE NOTES********************************

XML Release Notes for r1.2

Minor changes have been made to the frames.xml (top-half of the
database) and luxxx.xml (annotation data) files.  In frames.xml,
the "cBy" field has been removed from the "frame" element, the "fe"
element and the "lexUnit" element.  In the luxxx.xml files, the "cBy"
and "cDate" fields have both been removed from the "label" element.  In
the generation of the framesDiff.xml file, frame and frame element
definitions are now compared to see if the content has changed since
the previous version.  This comparison was not done in the previous
(r1.1) version of framesDiff.xml.


No changes were made to the frRelation.xml file.

Two new XML documents have been created for this release, the
luXXXDiff.xml and the leXXX.xml.  These are described in detail below.

luXXXDiff.xml
----------------------------------------------------------------------

This is a new XML format for this release.  The purpose of the
luXXXDiff.xml files is to show specific changes between the current
luXXX.xml files and the previous (r1.1 version) release of the
luXXX.xml files.  The general format of these luXXXDiff.xml files
follows the luXXX.xml with similar child element names, in the same
order.  The luXXXDiff.xml files are only created for the data that has
changed since the last release.  This means that there IS NOT one of
these for every luXXX.xml file that is created.  As mentioned above,
the xml elements in this file are nested in the same order as in the
luXXX.xml files.  If the change has occurred at the top level
(lexunit-annotation), the "changeType" will say "Edited", followed by
the field that was changed with its new and old values.  If the change
has not occurred at the top level, but rather in one of the child
elements, the change type may say "None" while one of the children
will say "Edited".

The following is a general outline of the luXXXDiff.xml.  The items in
parentheses indicate the attributes that were compared between the two
versions of the luXXX.xml files (r1.1 vs r1.2) to create the
luXXXDiff.xml.

I.	lexunit-annotation (name, frame, pos, incorporatedFE)
	A.	definition
	B.	subcorpus (none*)
		1.	annotationSet (none**)
			a.	layer (name)
				i.	label (name, start, end, iType)

* In the current software configuration it is not possible for a user
to change the name of a subcorpus so this attribute is not compared.

** The attributes of annotationSet contain only the ID and status of
the annotationSet.  These values are not compared during the creation
of the luXXXDiff.xml.
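
Because a top-level changeType of "None" can hide an edit further down,
a reader of these files usually has to walk the whole tree.  The
following Python sketch does that.  The sample is hypothetical: the
element names follow the outline above, but the assumption that every
level carries a "changeType" attribute, and all attribute values, are
made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical diff fragment following the outline above.
SAMPLE = """
<lexunit-annotation name="cause.v" changeType="None">
  <subcorpus changeType="None">
    <annotationSet changeType="None">
      <layer name="FE" changeType="None">
        <label name="Cause" start="0" end="8" changeType="Edited"/>
      </layer>
    </annotationSet>
  </subcorpus>
</lexunit-annotation>
"""

def changed_elements(root):
    """Return (path, changeType) for each element not marked "None"."""
    def walk(elem, path):
        here = path + "/" + elem.tag
        change = elem.get("changeType")
        if change is not None and change != "None":
            yield here, change
        for child in elem:
            yield from walk(child, here)
    return list(walk(root, ""))

if __name__ == "__main__":
    for path, change in changed_elements(ET.fromstring(SAMPLE)):
        print(change, path)
```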


leXXX.xml
----------------------------------------------------------------------

This is a new XML format that mirrors the Lexical Entry Reports (LE
Reports) on the public and internal web pages.  This xml format is for
anyone who wishes to have the information contained in the LE Reports
without having to compile all of the information from the luXXX.xml.
There is one report for each lexical unit.  The format of the
leXXX.xml can be summarized as follows (items in parentheses are the
attributes of each of the XML Elements):

I.	lexical-entry (luName, frameName, luPOS, incorporatedFE)
	A.	definition
	B.	semtypes
		1.	semtype
	C.	governors
		1.	governor (lemma, type)
			a.	annotationSet-ids
				i.	id
	D.	FERealizations
		1.	FERealization (total)
			a.	valence-unit (fe, pt, gf)
			b.	annotationSet-ids
				i.	id
	E.	FEGroupRealizations
		1.	FEGroupRealization (total)
			a.	fes
				i.	fe
			b.	pattern (total)
				i.	valence-unit (fe, pt, gf)
				ii.	annotationSet-ids
					a.	id

The XML Element FERealizations represents the first table in the report,
titled "Frame Elements and Their Syntactic Realizations".  Each
FERealization Element represents one line in the table.  The XML
Element FEGroupRealizations represents the second table, titled
"Valence Patterns".  Each of the FEGroupRealization Elements represents
the fes in a pattern, and the child element "pattern" displays the fe,
gf and pt involved in that pattern.
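
As an illustration of how the first table could be rebuilt from a
leXXX.xml file, here is a short Python sketch.  Element and attribute
names follow the outline above; the sample values (lexical unit, frame
element, phrase type, ids) are invented for the example and are not
taken from the release data.

```python
import xml.etree.ElementTree as ET

# Hypothetical lexical entry; structure follows the outline above.
SAMPLE = """
<lexical-entry luName="cause.v" frameName="Causation" luPOS="V">
  <FERealizations>
    <FERealization total="2">
      <valence-unit fe="Cause" pt="NP" gf="Ext"/>
      <annotationSet-ids><id>101</id><id>102</id></annotationSet-ids>
    </FERealization>
  </FERealizations>
</lexical-entry>
"""

def fe_realization_rows(root):
    """Return one (fe, pt, gf, total, ids) tuple per FERealization."""
    rows = []
    for realization in root.iter("FERealization"):
        unit = realization.find("valence-unit")
        if unit is None:
            continue  # nothing to report for this line
        ids = [e.text for e in realization.findall("annotationSet-ids/id")]
        total = int(realization.get("total", "0"))
        rows.append((unit.get("fe"), unit.get("pt"), unit.get("gf"),
                     total, ids))
    return rows

if __name__ == "__main__":
    for row in fe_realization_rows(ET.fromstring(SAMPLE)):
        print(row)
```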

************************************************************************

XML Release Notes for r1.1

The xml format for both frames.xml (top-half of the data) and the
luxxx.xml files (annotation data) has been changed with this release.
The changes reflect the changes made to the database when the new
version of the FrameNet II software was implemented in January, 2003.
Two new XML file types have been created for this release.  The file
frRelation.xml shows the frame-to-frame relations and the file
framesDiff.xml defines the major changes that have been made to the
frames since the previous release in October 2002.  

The rest of this document describes the current format of the XML
files, the differences in the format from the previous release, and the
Document Type Definitions (DTDs) for the XML files.

FRAMES.XML
------------------------------------------------

Current Format for r1.1

The frames.xml file contains information about the basic
frame-semantic objects: frames, frame elements, lexunits, lexemes,
etc.  The data is divided into frames and presented in order by the ID
number of the frame.  Each frame has the same elements and their
format can be summarized as:

I.	Frame A
	A.	Frame A Definition
	B.	FrameElements
		1.	FrameElement A
			a.	FrameElement A Definition
			b.	Semantic Types (FrameElement)
				i.	Semantic Type
		2.	FrameElement B
			etc.
	C.	Lexical Units
		1.	LexUnit A
			a.	LexUnit A Definition
			b.	Annotation
				i.	Number Annotated
				ii.	Total Sentences
			c.	Lexeme
			d.	Semantic Types (LexUnit)
				i.	Semantic Type
		2.	LexUnit B
			etc.
	D.	Semantic Types (Frame)
		1.	Semantic Type

II.	Frame B
	A.	Frame B Definition
	etc.


Format of The Previous (r1.0) Release of frames.xml

In the previous version of the frames.xml, frames were listed
alphabetically and the following basic format was used for each Frame:

I.	Frame A
	A.	Frame A Definition
	B.	FrameElements
		1.	FrameElement A
			a.	FrameElement Definition
		2.	FrameElement B
			etc.
	C.	Lemmas
		1.	Lemma A
			a.	notes (Lemma)
				1.	note
			b.	Annotation
				i.	Number Annotated
				ii.	Total Sentences
			c.	Lexeme
		2.	Lemma B
			etc.
	
II.	Frame B
	A.	Frame B Definition
	etc.


Major differences between frames.xml R1.1 and R1.0:

	Root Element has XMLCreated element showing when XML was created.

	The lemmas element has been renamed to lexunits and the lemma
	element renamed to lexunit.

	Notes element has been eliminated.

	Semantic Types provided as elements (rather than attributes)
	for Frames, FrameElements and LexUnits (where they occur in
	the database).
	
	Definitions have been stripped of formatting code to make them
	more human readable.

	LexUnit Definitions (or Sense Descriptions) are now included
	as Definition elements under each LexUnit.

	The Core attribute of FrameElement has been changed to
	coreType and its values from 'true' or 'false' to 'Core',
	'Peripheral', 'Extra-Thematic' or 'Core-Unexpressed'.

LUXXX.XML FILES
------------------------------------------------

Current Format for R1.1

The luxxx.xml files contain information about the annotation objects:
lexunit-annotation, subcorpora, annotationSets, layers, etc.

There are now two versions of XML for each LexUnit.  The xmlPOS
version contains the BNC/PENN TreeBank parts of speech tags.  Both
sets of luxxx.xmls will have the same name, but have been zipped into
separate directories for the release.

The general format of the XML has also undergone a major change for
this release.  As before, each lexunit is contained in its own file and
now (r1.1) has the following general format:

I.	lexunit-annotation
	A.	definition
	B.	subcorpus
		1.	annotationSet
			a.	layers
				i.	layer
					a.	labels
						1.	label
			b.	sentence
				i.	text
				ii.	parts-of-speech*
					a.	pos*
	*	Only present in the xmlPOS version of the luxxx.xml files
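
The nesting above means that a label and the sentence text it refers to
are siblings two levels apart under the same annotationSet.  The Python
sketch below pairs them up.  The sample file content is invented, and
the treatment of "end" as the index of the last covered character
(inclusive) is an assumption that should be checked against the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation file; structure follows the outline above.
SAMPLE = """
<lexunit-annotation name="cause.v" frame="Causation" pos="V">
  <definition>hypothetical</definition>
  <subcorpus name="sample">
    <annotationSet ID="1">
      <layers>
        <layer ID="1" name="FE">
          <labels>
            <label name="Cause" start="0" end="8"/>
            <label name="Effect" start="17" end="25"/>
          </labels>
        </layer>
      </layers>
      <sentence>
        <text>The storm caused the flood.</text>
      </sentence>
    </annotationSet>
  </subcorpus>
</lexunit-annotation>
"""

def labelled_spans(root):
    """Return (layer name, label name, covered text) triples."""
    spans = []
    for aset in root.iter("annotationSet"):
        text = aset.findtext("sentence/text", default="")
        for layer in aset.findall("layers/layer"):
            for label in layer.findall("labels/label"):
                start, end = label.get("start"), label.get("end")
                if start is None or end is None:
                    continue  # label without a text span
                # Assumption: "end" is inclusive, hence the + 1.
                spans.append((layer.get("name"), label.get("name"),
                              text[int(start):int(end) + 1]))
    return spans

if __name__ == "__main__":
    for span in labelled_spans(ET.fromstring(SAMPLE)):
        print(span)
```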


Format of the previous (r1.0) release

I.	lexunit
	A.	definition
	B.	subcorpus
		1.	sentence
			a.	text
			b.	layers
				i.	layer
					a.	labels
						1.	label
					b.	parents
						1.	parent
		


Major Differences between luxxx.xml r1.1 and r1.0

	Name of the Root Element changed to lexunit-annotation from lexunit
	
	New attribute incorporatedFE in lexunit-annotation element

	Child of subcorpus is now annotationSet (previously sentence)
	
	Sentence Element now a child of annotationSet
	
	Layers are now children of annotationSet rather than sentence

	Parent Elements have been eliminated

	Layer Elements have 'ID' and 'name' attributes only.  The
	'cBy', 'lexUnitRef', 'frame', and 'lemma' attributes have been
	eliminated

	Color information (bgColorS, fgColorS, bgColorP and fgColorP
	attributes) about each label no longer appears

	'cBy' and 'cDate' attributes were eliminated from Sentence Element



FRRELATION.XML
------------------------------------------------

Current Format for r1.1

The frRelation.xml contains information about the frame-to-frame
relations.  Its format can be summarized as:

I.	fr-relations
	A.	frame relation type 1
		1.	frame-relations
			a.	frame relation 1
				1.	fe-relation A
				2.	fe-relation B
				etc.
			b.	frame relation 2
				1.	fe-relation A
				2.	fe-relation B
				etc.
			
	B.	frame relation type 2
	etc.


Format of the previous (r1.0) release

None.  This is the first release version




FRAMESDIFF.XML
------------------------------------------------

Current Format for r1.1

A new XML type was created for this release: framesDiff.xml.  The
purpose of this xml is to present changes in the data that have
occurred since the last release.  The general format of the
framesDiff.xml follows that of frames.xml, with similar child element
names, in the same order, presented by frame, and ordered by frame ID.
Each frame will appear whether it has been changed or not.  If a change
in the data has occurred at the frame level, the 'changeType' attribute
will say 'Edited' rather than 'None'.  The FrameElements and/or
LexUnits of that frame that have been changed will appear as children
of that frame.

The following is a general outline of the framesDiff.xml.  The items in
parentheses indicate the attributes that were compared between the two
versions of frames.xml (r1.0 vs r1.1) to create the framesDiff.xml.

I.	Frame A (Name)
	A.	definition (not compared see * below)
	B.	fe (FrameElement) (name, abbrev, coreType, colors)
		1.	definition (not compared see * below)
	C.	lexunit (name**, pos, status)
		1.	definition
		2.	annotated
		3.	total
		4.	oldannotated
		5.	oldtotal
II.	Frame B
	etc.

* The definitions for Frame and FrameElement will only appear when it
  is a newly created Frame or FrameElement respectively.  Future
  versions of the framesDiff.xml will also show if a definition itself
  has been changed.

** Please note that the names of all lexunits changed for this
   release.  The parts of speech have been appended to each LexUnit
   name.  If the lexunit name is otherwise changed, it will appear with
   an 'oldname' attribute following the changeType = "Edited".


All of the child elements are optional and appear only if some portion
of their data has changed; each is then followed by the attribute or
child element that has been changed.

As noted above the 'frame' element will appear even if its name has
not changed.  For example:

	<frame ID="5" name="Causation" changeType="None">

If the frame name had changed the xml would look like this:

  <frame ID="13" name="Cause_to_amalgamate" changeType="Edited"
  oldname="Amalgamating">

Notice the changeType = "Edited" and that this is followed by the old
value.

Another example shows that the coreType has changed for an fe:

   <fe ID="1755" name="Place" changeType="Edited"
   coreType="Peripheral" oldcoreType="False" />
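
In both examples the previous value is carried in an attribute named
"old" plus the name of the changed attribute.  Assuming that this
naming holds for other compared attributes as well (these notes only
show it for "name" and "coreType"), the old and new values can be
paired up mechanically, as in this Python sketch:

```python
import xml.etree.ElementTree as ET

def edited_attributes(elem):
    """Return {attribute: (old value, new value)} for a diff element."""
    changes = {}
    for key, old_value in elem.attrib.items():
        # Assumption: "old" + attribute name holds the previous value.
        if key.startswith("old") and len(key) > 3:
            name = key[3:]
            changes[name] = (old_value, elem.get(name))
    return changes

# The two elements shown above, closed so that they parse on their own.
FRAME = ET.fromstring(
    '<frame ID="13" name="Cause_to_amalgamate" changeType="Edited" '
    'oldname="Amalgamating"/>'
)
FE = ET.fromstring(
    '<fe ID="1755" name="Place" changeType="Edited" '
    'coreType="Peripheral" oldcoreType="False"/>'
)

if __name__ == "__main__":
    print(edited_attributes(FRAME))
    print(edited_attributes(FE))
```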

The following xml excerpt shows that the lexunit has a changeType =
"Edited" but that is not followed by the old value.  This is because
it is the children of the lexunit that have changed not the attributes
of the lexunit itself.

    	<lexunit ID="2" name="cause.v" changeType="Edited">
      	<annotated>1</annotated>
      	<total>592</total>
      	<oldannotated>0</oldannotated>
      	<oldtotal>592</oldtotal>
    	</lexunit>
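
The four count elements can be turned into simple differences.  The
Python sketch below does this for the excerpt above; the function name
is made up for the example, and it assumes that all four children hold
plain integers.

```python
import xml.etree.ElementTree as ET

# The lexunit excerpt shown above.
SAMPLE = """
<lexunit ID="2" name="cause.v" changeType="Edited">
  <annotated>1</annotated>
  <total>592</total>
  <oldannotated>0</oldannotated>
  <oldtotal>592</oldtotal>
</lexunit>
"""

def count_deltas(lexunit):
    """Return (change in annotated, change in total), or None."""
    def number(tag):
        value = lexunit.findtext(tag)
        return int(value) if value is not None else None

    tags = ("annotated", "oldannotated", "total", "oldtotal")
    values = [number(tag) for tag in tags]
    if any(v is None for v in values):
        return None  # counts not reported for this lexunit
    annotated, old_annotated, total, old_total = values
    return annotated - old_annotated, total - old_total

if __name__ == "__main__":
    print(count_deltas(ET.fromstring(SAMPLE)))
```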

New frames, fes and lexunits also appear in the framesDiff.xml.  They
have a changeType = "Created", have an attribute cDate to indicate
when they were created, and are followed by their definition
(where available).

Deleted Frames, fes and lexunits have a changeType = "Deleted" and are
followed by their definition.



Format of the previous (r1.0) release

	None.  This is the first release version


DOCUMENT TYPE DEFINITION (DTD) GENERAL INFORMATION
------------------------------------------------

DTDs are written in a formal syntax that explains precisely which
elements and entities may appear in an XML document and what the
elements' contents and attributes are.  The DTDs each start with a
general outline of the elements that are/may be included in the XML.
This general information is followed by the dtd syntax that is read by
the xml parser when the document is validated.  All xml files of a
specific type (luxxx.xml files for example) must follow the rules
listed in the DTD.  A person using the data can therefore determine
which elements will always be present and which are optional, what the
children of each element will be (if any), and the possible attributes
of each element.  The DTDs are essentially in outline form from the
top level down.  Each element is defined, its possible children are
listed followed by the possible attributes of that element.

A portion of the frames.xml DTD (version 1.1) is shown below as an example:

     <!ELEMENT frames (frame*)>

     <!ATTLIST frames 
               XMLCreated  CDATA     #REQUIRED>

     <!ELEMENT frame (definition, fes, lexunits, semtypes)>

     <!ATTLIST frame 
               ID       CDATA         #REQUIRED
               name      CDATA         #REQUIRED
               cBy      CDATA         #IMPLIED
               cDate     CDATA         #IMPLIED> 

     <!ELEMENT definition (#PCDATA)>

explanation:
	
	<!ELEMENT frames (frame*)>	

The <!ELEMENT> above defines an element whose name is 'frames' and has
zero or more children named 'frame'.  The 'zero or more' is indicated
by the asterisk following the word frame.  A '?' following the name
would indicate 'zero or one' of the element and '+' would indicate
'one or more' of the element.

     <!ATTLIST frames 
          XMLCreated  CDATA     #REQUIRED>

defines the attribute list for the 'frames' element.  In this case the
'frames' element has one attribute 'XMLCreated' whose data type is
'CDATA' (any string of text valid in xml).  #REQUIRED indicates that
this attribute is required to appear every time.

     <!ELEMENT frame (definition, fes, lexunits, semtypes)>
			
defines the element 'frame' and shows that it has four children:
'definition', 'fes', 'lexunits' and 'semtypes'.  Since there is no
modifier (?, +, or *), each of these children must always be present,
exactly once and in the order listed.

     <!ATTLIST frame 
               ID       CDATA         #REQUIRED
               name      CDATA         #REQUIRED
               cBy      CDATA         #IMPLIED
               cDate     CDATA         #IMPLIED> 

defines the attribute list for the 'frame' element.  There are four
listed attributes: ID, name, cBy and cDate.  ID and name are #REQUIRED.
The #IMPLIED keyword indicates that the other two attributes (cBy and
cDate) are optional and each instance of the element may or may not
provide a value for the attribute.

    <!ELEMENT definition (#PCDATA)>
	
defines the 'definition' element and indicates that the only child of
this element is parsed character data (no child elements).

The dtd then goes on to define each of the other 'frame' element
children (fes, lexunits, semtypes) and their children and attributes
in a similar manner.
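
The reading rules explained above can be summarized in a few lines of
Python.  This is only a sketch for the simple declarations quoted in
this section: it handles a flat, comma-separated list of children and
#PCDATA, and does not attempt nested groups or choices ("|"), which
real DTDs may contain.

```python
import re

# Matches simple declarations such as <!ELEMENT frames (frame*)>.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+\(([^)]*)\)\s*>")

OCCURRENCE = {
    "": "exactly one",
    "?": "zero or one",
    "*": "zero or more",
    "+": "one or more",
}

def describe(declaration):
    """Return (element name, [(child, occurrence), ...])."""
    match = ELEMENT_DECL.search(declaration)
    if match is None:
        raise ValueError("not a simple element declaration")
    name, model = match.group(1), match.group(2)
    children = []
    for item in model.split(","):
        item = item.strip()
        if not item:
            continue
        if item == "#PCDATA":
            children.append(("#PCDATA", "character data"))
        elif item[-1] in "?*+":
            children.append((item[:-1], OCCURRENCE[item[-1]]))
        else:
            children.append((item, OCCURRENCE[""]))
    return name, children

if __name__ == "__main__":
    print(describe("<!ELEMENT frames (frame*)>"))
    print(describe("<!ELEMENT frame (definition, fes, lexunits, semtypes)>"))
    print(describe("<!ELEMENT definition (#PCDATA)>"))
```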
