
Understanding XML terminology

What are XML and CNXML?

XML stands for eXtensible Markup Language. It is a way of surrounding text with markup that conveys information about the text's content or format. CNXML is the particular dialect of XML used in Connexions. Here is an example of some markup in CNXML:
Example 1
<para>
  This is a paragraph in <term>CNXML</term>.
  Notice that the markup contains tags that 
  express the meaning of it.
</para>

What is a Tag?

Tags are the markers that enclose the text they describe. In the above example, para and term are the tags being used. In XML, the opening (first) tag is surrounded by angle brackets (< and >). The closing (second) tag is also surrounded by angle brackets, but its name is preceded by a slash (/). A closing tag never contains attribute information.

What is an Attribute?

Sometimes in XML, the tag name alone cannot carry all the information one might want to convey. For example, in CNXML, there are two types of code tags: 'inline' (the code is included in the current line of text) and 'block' (the code is set apart from the text). This information can be given in attributes, which sit between the name of the opening tag and its closing angle bracket (>), as in the example below:
Example 2
<para>
  This is a paragraph written in 
  <code type="inline">CNXML</code>.  
  In the output, the word 'CNXML' would not 
  be offset from the rest of the text.
</para>
In this example, type is the attribute name and inline is the attribute value. The attribute name is always followed by an equals sign (=), and the attribute value is always surrounded by double or single quotes (" or '). In CNXML, the attributes one can use for a given tag are strictly defined.
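To see how a program reads tags and attributes, here is a minimal Python sketch that parses the fragment from Example 2 with the standard library's xml.etree.ElementTree module. The variable names are illustrative only, and the sketch does not reflect how Connexions itself processes CNXML.

```python
import xml.etree.ElementTree as ET

# The CNXML fragment from Example 2, stored as a Python string.
fragment = """<para>
  This is a paragraph written in
  <code type="inline">CNXML</code>.
</para>"""

para = ET.fromstring(fragment)  # parse the fragment; the root element is <para>
code = para.find("code")        # first <code> element directly inside <para>

print(para.tag)          # para
print(code.tag)          # code
print(code.attrib)       # {'type': 'inline'}
print(code.get("type"))  # inline
print(code.text)         # CNXML
```

The parser reports the attribute as a name/value pair attached to the opening tag, which matches the description above.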

What is a Child/Parent?

In XML, tags can be nested inside other tags. In the above example, a code tag is nested inside a para tag. It can be said that code is a child of para and that para is the parent of code. In XML, it is not valid to write tags that do not nest properly, as in this example, where the emphasis tag ends before its child, foreign, has ended:
Example 3
It is <emphasis><foreign>muy</emphasis> 
importante</foreign> that you write proper XML.
However, "siblings" nested inside a single tag are allowed:
Example 4
<para>
  Shakespeare wrote both <cite>Macbeth</cite>
  and <cite>Hamlet</cite>.
</para>
In CNXML, the children that a given tag can contain are strictly defined.
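The nesting rule can be checked mechanically. The following Python sketch assumes the standard library's xml.etree.ElementTree parser and a hypothetical helper named is_well_formed. It wraps the text of Example 3 in a para tag so that both samples have a single enclosing tag, then asks the parser whether each one is acceptable.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if the XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Example 3: <emphasis> closes before its child <foreign> has closed.
bad = ("<para>It is <emphasis><foreign>muy</emphasis> "
       "importante</foreign> that you write proper XML.</para>")

# Example 4: two <cite> siblings inside one <para>.
good = ("<para>Shakespeare wrote both <cite>Macbeth</cite> "
        "and <cite>Hamlet</cite>.</para>")

print(is_well_formed(bad))    # False
print(is_well_formed(good))   # True

# The children of <para> in the valid sample, in document order.
children = [child.tag for child in ET.fromstring(good)]
print(children)               # ['cite', 'cite']
```

This only tests whether tags nest properly. It does not test whether a given child is allowed inside a given parent, which requires validating against the CNXML definition.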

What is escaping?

Escaping allows one to write certain characters that would otherwise be interpreted by XML processors for another purpose. For example, if you want to use the "less than" symbol, you have to write it in XML as &lt;, since that angle bracket (<) is read by the processors as the beginning of an opening or closing tag. The following characters should be escaped: < (written as &lt;), > (written as &gt;), & (written as &amp;), " (written as &quot;), and ' (written as &apos;).
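As a rough illustration of escaping, the Python sketch below uses the escape function from the standard library's xml.sax.saxutils module and then parses the result to confirm that the original characters come back. The sample text and variable names are made up for this illustration.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "if a < b && b > c"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)
print(escaped)   # if a &lt; b &amp;&amp; b &gt; c

# Quotes are not replaced by default; extra replacements can be passed in.
quoted = escape('say "hi"', {'"': "&quot;"})
print(quoted)    # say &quot;hi&quot;

# A parser turns the escaped forms back into the original characters.
para = ET.fromstring("<para>" + escaped + "</para>")
print(para.text == raw)   # True
```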

What is CDATA?

If the text of your paragraph or code block uses a lot of characters that would otherwise need to be escaped, you might find it convenient to use CDATA, or (Unparsed) Character Data, so that you do not have to escape each character individually. This is done by beginning the section in question with <![CDATA[ and ending it with ]]>, as in this example:
Example 5
<code type="block">
<![CDATA[
<html>
  <title>This is an HTML document</title>
  <body>This is its body text</body>
</html>
]]>
</code>

In CNXML, leaving out the <![CDATA[ and ]]> lines in the above code will result in a validation error, because the validator will interpret the <html> tag as unknown CNXML inside the <code> tag.
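The effect of CDATA can be observed with a general-purpose parser. This Python sketch, which assumes the standard library's xml.etree.ElementTree module and a shortened version of the HTML from Example 5, parses the same code block with and without the CDATA markers. ElementTree checks only that the markup is well formed, so it does not report a CNXML validation error, but it does show what a validator would see in each case.

```python
import xml.etree.ElementTree as ET

with_cdata = """<code type="block"><![CDATA[
<html>
  <title>This is an HTML document</title>
</html>
]]></code>"""

without_cdata = """<code type="block">
<html>
  <title>This is an HTML document</title>
</html>
</code>"""

# With CDATA, the HTML is plain text inside <code>, not markup.
literal = ET.fromstring(with_cdata)
print(len(literal))               # 0 (no child elements)
print("<html>" in literal.text)   # True

# Without CDATA, <html> is parsed as a child element of <code>.
parsed = ET.fromstring(without_cdata)
print(len(parsed))      # 1
print(parsed[0].tag)    # html
```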

Additional help topics: Connexions Tutorial and Reference, The CNXML 0.5 Specification, Using CNXML with other XML languages, CNXML and MathML