[XML] Note 1

SGML (Standard Generalized Markup Language) is a metalanguage used to create markup languages.
XML is a subset of SGML.

XML is a technology for creating markup languages to describe data of virtually any type in a structured manner.

XML can be used to create markup languages for describing data in almost any field.

The basic unit of XML is the document, so we speak of an "XML document."

XML documents are commonly stored in text files that end in the extension ".xml."

An XML parser (or XML processor) is the software program that processes an XML document.

Steps of an xml parser:
1. reads the XML document
2. checks its syntax
3. reports errors
4. allows programmatic access to the document's contents
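
The steps above can be sketched with Python's standard xml.etree.ElementTree module. This is only a minimal illustration, not the parser discussed in the book; the sample documents and the helper name parse_or_report are made up for this note.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents used only for illustration.
GOOD = "<letter><to>Alice</to><body>Hello</body></letter>"
BAD = "<letter><to>Alice</letter>"  # <to> is never closed


def parse_or_report(text):
    """Return (root, None) on success or (None, error message) on failure."""
    try:
        root = ET.fromstring(text)  # steps 1 and 2: read and check syntax
        return root, None
    except ET.ParseError as err:    # step 3: report errors
        return None, str(err)


root, error = parse_or_report(GOOD)
recipient = root.find("to").text    # step 4: programmatic access
print(recipient)

_, bad_error = parse_or_report(BAD)
print("error:", bad_error)
```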

Rules for a well-formed XML document:
1. single root element (the document contains exactly one root element)
2. start tag and end tag for each element
3. properly nested tags
4. attribute values in quotes (ex. abc="123")
5. proper capitalization (names are case sensitive)
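
As a rough check of the rules above, the sketch below feeds one deliberately broken snippet per rule to Python's xml.etree.ElementTree and records whether the parser accepts it. The snippets and the helper name is_well_formed are assumptions made for this note.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets, one per rule; only the first is well-formed.
SAMPLES = {
    "well-formed": '<note id="1"><to>Alice</to></note>',
    "two root elements": "<a></a><b></b>",
    "missing end tag": "<a><b></a>",
    "improper nesting": "<a><b><c></b></c></a>",
    "unquoted attribute": "<a id=1></a>",
    "case mismatch": "<Note></note>",
}


def is_well_formed(text):
    """True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = {name: is_well_formed(text) for name, text in SAMPLES.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```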

An xml document contains data, not formatting information.

Markup text is enclosed in angle brackets ("<" and ">").

Insignificant whitespace characters may be collapsed into a single whitespace character or even removed entirely.

Reserved characters and their built-in entities in XML:

Character                   Built-in entity
ampersand (&)               &amp;
left-angle bracket (<)      &lt;
right-angle bracket (>)     &gt;
apostrophe (')              &apos;
quotation mark (")          &quot;
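
One way to apply these entities programmatically is Python's xml.sax.saxutils module. Its escape() function replaces &, < and > by default, so the two quote characters are passed in as extra mappings. The sample string and the dictionary names are chosen here only for illustration.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > by default; the quotes are supplied as extras.
QUOTES = {"'": "&apos;", '"': "&quot;"}
UNQUOTES = {"&apos;": "'", "&quot;": '"'}

raw = 'Tom & Jerry <"it\'s">'
escaped = escape(raw, QUOTES)            # reserved characters -> entities
restored = unescape(escaped, UNQUOTES)   # entities -> original characters

print(escaped)
print(restored == raw)
```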


The information in a processing instruction (PI) is passed by the parser to the application that is using the XML document.
Document authors may create their own processing instructions.
Almost any name may be used for a PI target except the reserved word "xml."
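
To illustrate how an application might receive a PI, the sketch below uses Python's xml.dom.minidom, where a processing instruction appears as a node with a target and a data string. The target name "myapp" and its content are invented for this example.

```python
from xml.dom import minidom

# Hypothetical document: "myapp" is a made-up PI target.
DOC = '<?xml version="1.0"?><?myapp mode="draft"?><note>Hello</note>'

dom = minidom.parseString(DOC)

# Collect (target, data) for every PI that is a direct child of the document.
pis = [
    (node.target, node.data)
    for node in dom.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
print(pis)
```

The parser does not interpret the data string itself; it is up to the application to decide what mode="draft" means.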

In a CDATA (Character DATA) section, reserved characters and whitespace are allowed.
Characters in this section are not processed as markup by the XML parser, so they do not need to be escaped.

A CDATA section begins with <![CDATA[ and terminates with ]]>.
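
A small sketch, again with Python's xml.etree.ElementTree, shows that text inside a CDATA section reaches the application unchanged and matches the same text written with entities. The element name "code" and its content are assumptions made for this note.

```python
import xml.etree.ElementTree as ET

# The same text written twice: once in a CDATA section, once with entities.
WITH_CDATA = "<code><![CDATA[if (a < b && c > d) { run(); }]]></code>"
WITH_ENTITIES = "<code>if (a &lt; b &amp;&amp; c &gt; d) { run(); }</code>"

cdata_text = ET.fromstring(WITH_CDATA).text
entity_text = ET.fromstring(WITH_ENTITIES).text

print(cdata_text)
print(cdata_text == entity_text)
```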


The content above is compiled from: XML How to Program, Chapter 5