A Brief Introduction to XML

Module by: R.G. (Dick) Baldwin

Summary: This module is part of a collection dedicated to learning XML.

Preface

General

This module is part of a collection dedicated to learning XML.

Viewing tip

I recommend that you open another copy of this document in a separate browser window and use the following links to easily find and view the figures while you are reading about them.

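Figures
  • Figure 1. The structure of a simple book.
  • Figure 2. Very simple XML syntax.
  • Figure 3. XML syntax with attributes.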

Supplemental material

I recommend that you also study the other lessons in my extensive collection of online programming tutorials. You will find a consolidated index at www.DickBaldwin.com.

A brief introduction to XML

The name XML derives from eXtensible Markup Language. According to Wikipedia,

"A markup language is a system for annotating text in a way which is syntactically distinguishable from that text."

In other words, when text has been annotated or marked up, the annotations can be easily distinguished from the original text. For example, if you turn in a term paper and the professor annotates it with a red pencil, you can easily distinguish her annotations from your original text. However, XML doesn't use color to annotate text. Instead, XML uses specially formatted text to annotate text.

Structured documents

XML gives us a way to create and maintain structured documents in plain text that can be rendered in a variety of different ways. For example, before I upload this document to the Connexions website for publishing, I will convert it into CNXML, which is one of the many flavors of XML. Once the document is on the website in that format, programs on the website have the ability to render it in the form of a web page (which you are probably reading right now) or in the form of a PDF document, which you can download and print if you choose to do so.

There is a lot of jargon involved in XML. One of my objectives will be to explain the jargon.

What do I mean by a "structured document?"

I will answer this question by providing an example. A book is a structured document. In its simplest form, a book may be composed of chapters. The chapters may be composed of sections. The sections may contain illustrations and tables. The tables are composed of rows and columns. Thus, it would be possible to draw a picture that illustrates the structure of a book.

What do I mean by "plain text?"

Characters such as the letters of the alphabet and punctuation marks are represented in the computer by numeric values, similar to a simple substitution code that a child might devise. For example, in one popular encoding scheme (ASCII), the upper-case version of the character "A" is represented by the value 65, a "B" is represented by the value 66, a "C" is represented by 67, etc.
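The correspondence between characters and numeric values is easy to check for yourself. The following is a minimal sketch in Python (my choice of language for illustration only; nothing in this module depends on it) that uses the built-in functions ord and chr to go from a character to its numeric value and back again.

```python
# Look up the numeric value for a few characters.
for ch in "ABC":
    print(ch, ord(ch))  # ord() returns the numeric value of a character

# Collect the values, then convert them back into characters.
codes = [ord(ch) for ch in "ABC"]
text = "".join(chr(n) for n in codes)  # chr() is the inverse of ord()

# Encoding the string as ASCII produces one byte per character,
# and each byte holds the same numeric value.
ascii_bytes = "ABC".encode("ascii")
print(list(ascii_bytes))
```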

Different encoding schemes

The actual correspondence between the characters and the specific numeric values representing the characters has been described by several different encoding schemes over the years. One of the most common and enduring schemes was devised in the 1960s by a committee of the American Standards Association. This encoding scheme is commonly known as the ASCII code, an acronym for American Standard Code for Information Interchange.

XML supports several encoding schemes

XML is not confined to the use of the ASCII encoding scheme. Several different encoding schemes can be used. However, all of them have been selected to make it possible to read a raw XML document without the requirement for any special software.

What do I mean by a raw XML document?

By a raw XML document, I am referring to the string of sequential characters that makes up the document, before any specific rendering has been applied to the document.
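To make the idea of a raw document and its encoding concrete, here is a small sketch, again in Python as an assumed illustration language. The sample document text is my own and is not taken from a real file. The same string of characters is stored under two different encoding schemes (UTF-8 and UTF-16, chosen here only as examples) and then recovered unchanged from each.

```python
# A raw XML document is just a string of sequential characters.
raw = "<book><chap>Text for Chapter 1</chap></book>"

# Store the same characters under two different encoding schemes.
utf8_bytes = raw.encode("utf-8")
utf16_bytes = raw.encode("utf-16")

# The stored bytes differ, but decoding recovers the same characters.
from_utf8 = utf8_bytes.decode("utf-8")
from_utf16 = utf16_bytes.decode("utf-16")

print(len(raw), len(utf8_bytes), len(utf16_bytes))
print(from_utf8 == raw, from_utf16 == raw)
```

The byte counts differ because UTF-16 uses two bytes for each of these characters (plus a two-byte marker at the front), while UTF-8 uses one byte for each of them.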

What do I mean by rendering?

The most common use of the word rendering in the information technology world means to present something for human consumption. Thus, we render the specifications for a new house as a set of drawings.

When we speak of rendering a drawing or an image, we usually mean that we are going to present it in a way that makes it look like a drawing or an image to a human observer.

When we speak of rendering a document, we usually mean that we are going to present it in a way that a human will recognize such as a book, a newspaper, a web page, or some other document style.

Consider a newspaper, for example

There are at least two different ways to render a newspaper. One way is to print the information (daily news), mostly in black and white, on large sheets of low-grade paper commonly known as newsprint. This is the rendering format that ends up on my driveway each morning.

Render on a computer screen

Another way to render a newspaper is to present the information on a computer screen, usually in full color, with the information content trying to fight its way through dozens of animated advertisements. This is the rendering format that ends up on my computer screen each day when I check for the news of the day.

The base information doesn't change

The base information for the newspaper doesn't (or shouldn't) change for these two renderings. After all, news is news and the content of the news shouldn't depend on how it is presented. What does change is the manner in which that information is presented.

A newspaper is a structured document

A newspaper is a structured document consisting of pages, columns, etc. When the information content of a newspaper is created and maintained in XML, that same information content can be rendered either on newsprint paper or on your computer screen without having to rewrite the information content.

Achieving Structure

Consider the simple structure shown in Figure 1 that represents a book having two chapters with some text in each chapter:

Figure 1: The structure of a simple book.
Begin Book

  Begin Chapter 1
    Text for Chapter 1
  End Chapter 1
  
  Begin Chapter 2
    Text for Chapter 2
  End Chapter 2
  
End Book

Obviously a real book has a lot more structure than this, such as the preface, the table of contents, paragraphs in the text, and an alphabetical index. However, I am trying to keep this example as simple as possible.
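The structure in Figure 1 can also be described as nested data. The sketch below is one possible way to do that in Python. The dictionary keys ("label", "text", "children") and the function name outline are hypothetical names that I made up for this illustration. The function walks the nested data and produces the same Begin/End outline as Figure 1 (without the blank lines).

```python
# The simple book from Figure 1 as nested data.
# The key names are invented for this illustration.
book = {
    "label": "Book",
    "children": [
        {"label": "Chapter 1", "text": "Text for Chapter 1"},
        {"label": "Chapter 2", "text": "Text for Chapter 2"},
    ],
}

def outline(node, depth=0):
    """Return the Begin/End outline for a node as a list of lines."""
    pad = "  " * depth
    lines = [f"{pad}Begin {node['label']}"]
    if "text" in node:
        lines.append(f"{pad}  {node['text']}")
    for child in node.get("children", []):
        lines.extend(outline(child, depth + 1))
    lines.append(f"{pad}End {node['label']}")
    return lines

print("\n".join(outline(book)))
```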

The Objective of XML

Perhaps the primary reason for using XML is to make it possible to share the same physical document among different computer systems in a way that they all understand.

No small task

That is no small task. Over the years, dozens of different types of computers have been built, operating under several different operating systems, and running thousands of different programs. As a result, insofar as the exchange of structured documents is concerned, the computer world is a modern manifestation of the "Tower of Babel" where everyone spoke a different language. XML attempts to rectify this situation by providing a common language for structured documents.

What Does XML Contribute?

Without getting into the technical details at this point, XML provides a definition of a simple scheme by which the structure and the content of a document can be established. The resulting physical document is so simple that any computer (or any human) can read it with only a modest amount of preparation. You will sometimes see XML referred to as a "meta" language.

What Does Meta Mean?

In computer jargon, the term meta is often used to identify something that provides information about something else. (If you want to impress someone at your next cocktail party, mention that meta information is information about information.)

For example, consider the listings of stock prices, bond prices, and mutual fund prices that commonly appear in most daily newspapers. The various tables on the page provide information about the bid and ask prices for the various stock, bond, and mutual fund instruments.

Usually somewhere on the page, you will find an explanation as to how to interpret the information presented throughout the remainder of the page. You could probably think of the information contained in the explanation as meta information. It provides information about other information.

So, why might people refer to XML as a meta language?

If you write a book, XML doesn't tell you how to structure the document that represents your book. Rather, it provides you with a set of rules that you can use to establish structure and content when you create the document that represents your book. It is up to you to decide how you will use those rules to establish the structure and content of your book.

Information about new languages

You might say that XML is a language that provides information about a new language that you are free to invent. For example, MXML, the markup language used for programming with Flex, is based on XML. XML doesn't specify that language. Instead, XML provides the tools used by the inventors of MXML to specify the structure of the language.

Different flavors of XML

Similarly, XML doesn't specify CNXML. Instead, XML provides the tools used by the inventors of CNXML to specify the format of documents suitable for publication on the Connexions website. In the past, I have also published documents on a particular IBM website. That website uses a different flavor of XML to specify the format of documents suitable for publication on the website.

Transportable

If you follow the rules for creating an XML document, the document that you create can be easily transported among various computers and rendered in a variety of different ways.

Multiple renderings

For example, you might want to have two different renderings of your book. One rendering might be in conventional printed format and the other rendering might be in an online format. The use of XML makes it practical to render your book in two or more different ways without any requirement to modify the original document that you produce.

Applying XML

At this point, I am going to provide two different examples of actual XML code, either of which might reasonably represent the simple book example presented earlier in Figure 1. The first example is shown in Figure 2.

Figure 2: Very simple XML syntax.
<book>
  <chap>
    Text for Chapter 1
  </chap>

  <chap>
    Text for Chapter 2
  </chap>
</book>

If you compare this example with the book example given earlier in Figure 1, you should be able to see a one-to-one correspondence between the "elements" in this XML code and the description of the book presented earlier.
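A program can recover that structure from the raw text. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither is part of this module's subject matter). It reads the XML code from Figure 2 and prints a Begin/End outline in the style of Figure 1, using the element names in place of the words Book and Chapter.

```python
import xml.etree.ElementTree as ET

# The XML code from Figure 2, held as a raw string.
figure2 = """
<book>
  <chap>
    Text for Chapter 1
  </chap>

  <chap>
    Text for Chapter 2
  </chap>
</book>
"""

# Parse the raw text into a tree; root is the outermost element.
root = ET.fromstring(figure2.strip())

# Walk the tree and print an outline in the style of Figure 1.
print("Begin", root.tag)
for chap in root:
    print("  Begin", chap.tag)
    print("   ", chap.text.strip())
    print("  End", chap.tag)
print("End", root.tag)

# The text of each chapter, with the surrounding whitespace removed.
chapter_texts = [chap.text.strip() for chap in root]
```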

Introducing attributes

The example in Figure 3 provides an improvement over the example in Figure 2. Figure 3 provides an "attribute" in each of the chapter elements. Each attribute specifies the chapter number.

Figure 3: XML syntax with attributes.
<book>
  <chap number="1">
    Text for Chapter 1
  </chap>

  <chap number="2">
    Text for Chapter 2
  </chap>
</book>
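To tie Figure 3 back to the earlier discussion of multiple renderings, here is a hedged sketch, again assuming Python and its standard xml.etree.ElementTree module. The function names render_text and render_html, and the particular output layouts, are my own inventions for illustration. They are not how the Connexions website actually renders documents. The point is only that one unmodified XML document can feed two different renderings.

```python
import xml.etree.ElementTree as ET
from html import escape

# The XML code from Figure 3, held as a raw string.
figure3 = """<book>
  <chap number="1">
    Text for Chapter 1
  </chap>

  <chap number="2">
    Text for Chapter 2
  </chap>
</book>"""

book = ET.fromstring(figure3)

def render_text(book):
    """Render the book as plain text (hypothetical layout)."""
    lines = []
    for chap in book.findall("chap"):
        lines.append(f"Chapter {chap.get('number')}")  # read the attribute
        lines.append(chap.text.strip())
        lines.append("")
    return "\n".join(lines)

def render_html(book):
    """Render the same book as a simple web page (hypothetical layout)."""
    parts = ["<html><body>"]
    for chap in book.findall("chap"):
        parts.append(f"<h1>Chapter {escape(chap.get('number'))}</h1>")
        parts.append(f"<p>{escape(chap.text.strip())}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)

print(render_text(book))
print(render_html(book))
```

Neither function changes the XML document. Adding a third rendering would mean writing a third function, not rewriting the book.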

That's a wrap

That's enough for this module. In the next module, I will begin discussing the syntax shown in Figure 3 and begin the explanation of tags, elements, content, and attributes.

Miscellaneous

This section contains a variety of miscellaneous materials.

Note:

Housekeeping material
  • Module name: A Brief Introduction to XML
  • File: FlexXhtml0080.htm
  • Revised: 11/07/13

Note:

Disclaimers:

Financial: Although the Connexions site makes it possible for you to download a PDF file for this module at no charge, and also makes it possible for you to purchase a pre-printed version of the PDF file, you should be aware that some of the HTML elements in this module may not translate well into PDF.

I also want you to know that I receive no financial compensation from the Connexions website even if you purchase the PDF version of the module.

In the past, unknown individuals have copied my modules from cnx.org, converted them to Kindle books, and placed them for sale on Amazon.com showing me as the author. I neither receive compensation for those sales nor do I know who does receive compensation. If you purchase such a book, please be aware that it is a copy of a module that is freely available on cnx.org and that it was made and published without my prior knowledge.

Affiliation: I am a professor of Computer Information Technology at Austin Community College in Austin, TX.

-end-
