XML Basics for API Professionals

XML is often seen by API developers as an outdated and overly complicated technology that has been replaced by JSON. This view is understandable, but incomplete. XML still plays an important role in integration scenarios, especially when working with legacy systems, industry standards, and document-oriented data.

This article provides the XML knowledge that API professionals need when integrating with legacy systems.

Why XML Is Still Relevant

When working with APIs, there is a good chance that you will still encounter XML.

a) Legacy Services

Many existing enterprise systems expose XML-based interfaces, for example SOAP web services described in WSDL. These systems are often business-critical and cannot easily be replaced. Instead, they must be integrated using API gateways or transformation layers.

b) Standard Formats

XML is popular in government, finance, logistics, and healthcare to standardize document formats. The European Union, for example, defines multiple XML-based exchange formats. To process or exchange documents in these formats, XML support is required. Examples include:

UBL (Universal Business Language)
PEPPOL e-invoicing
SEPA payment formats
HL7 healthcare messages
NIEM government data exchange

c) Markup Language

JSON, as the name JavaScript Object Notation suggests, is designed to serialize objects. XML, in contrast, is a markup language. It has often been used as a serialization format, even though that was not its primary purpose.

For semi-structured and document-oriented content, XML remains a strong choice. Office document formats are a good example. The OpenDocument Format (ODF) uses XML to represent complex structured content such as text, tables, and styles.

What Is XML?

XML stands for Extensible Markup Language and is a text-based data format.

Originally, XML was envisioned as a universal format for structured documents and even as a successor to HTML. That never fully materialized. Instead, XML became widely adopted as a platform-neutral format for data exchange between applications. When it was introduced, XML offered several capabilities that were considered revolutionary.

Consider the following data:

1016,2026-03-16,C100,Anna Müller,P10,USB Cable,2,5.90,P11,Wireless Mouse,1,19.90,P12,Laptop Stand,1,29.90

Without additional information, this data is difficult to understand. Expressed in XML, it becomes:

<order id="1016">
    <date>2026-03-16</date>
    <customer>C100</customer>
    <name>Anna Müller</name>
    <items>
        <item>
            <article>P10</article>
            <description>USB Cable</description>
            <quantity>2</quantity>
            <price>5.90</price>
        </item>
        <item>
            <article>P11</article>
            <description>Wireless Mouse</description>
            <quantity>1</quantity>
            <price>19.90</price>
        </item>
        <item>
            <article>P12</article>
            <description>Laptop Stand</description>
            <quantity>1</quantity>
            <price>29.90</price>
        </item>
    </items>
</order>

XML is human-readable and self-describing. This means the structure and meaning of the data are embedded directly in the document, making it easier to understand without consulting external documentation.
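To illustrate what self-describing means in practice, the following minimal sketch reads the order with Python's standard xml.etree.ElementTree module and addresses every value by name rather than by position. The variable names and the choice of Decimal for prices are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

ORDER_XML = """<order id="1016">
    <date>2026-03-16</date>
    <customer>C100</customer>
    <name>Anna Müller</name>
    <items>
        <item>
            <article>P10</article>
            <description>USB Cable</description>
            <quantity>2</quantity>
            <price>5.90</price>
        </item>
        <item>
            <article>P11</article>
            <description>Wireless Mouse</description>
            <quantity>1</quantity>
            <price>19.90</price>
        </item>
        <item>
            <article>P12</article>
            <description>Laptop Stand</description>
            <quantity>1</quantity>
            <price>29.90</price>
        </item>
    </items>
</order>"""

order = ET.fromstring(ORDER_XML)

# Values are looked up by element or attribute name, not by column index.
order_id = order.get("id")
customer_name = order.findtext("name")

# Sum price * quantity over all <item> elements.
# Decimal avoids binary floating-point rounding for currency amounts.
total = sum(
    Decimal(item.findtext("price")) * int(item.findtext("quantity"))
    for item in order.iter("item")
)

print(order_id, customer_name, total)
```

With the comma-separated line, the same calculation would depend on knowing which column holds which value and how many columns belong to one item.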

XML Basics

XML documents may start with an optional XML declaration:

<?xml version="1.0" encoding="UTF-8"?>

It specifies the XML version and the character encoding. If a document is not encoded in UTF-8 or UTF-16, declaring the encoding is essential, because those are the encodings a parser assumes when no declaration is present. A document saved in another encoding, such as ISO-8859-1, without a declaration will have its non-ASCII characters, for example the German ä, ö, or ü, misread or rejected.
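The effect of the encoding declaration can be sketched with Python's standard xml.etree.ElementTree module. The example encodes the same text as ISO-8859-1 bytes twice, once with and once without a declaration. The sample element and variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

body = "<name>Anna Müller</name>"

# The same content as ISO-8859-1 bytes, with and without a declaration.
declared = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode("iso-8859-1")
undeclared = body.encode("iso-8859-1")

# With the declaration, the parser knows how to decode the bytes.
name_declared = ET.fromstring(declared).text

# Without it, the parser assumes UTF-8. The ISO-8859-1 byte for "ü"
# is not valid UTF-8, so parsing fails.
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(name_declared, undeclared_ok)
```

A failing parse is the better outcome here. Some byte sequences happen to be valid in both encodings, in which case the text is silently decoded into the wrong characters.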

The basic concept in XML is the element. An element consists of an opening and closing tag:

<order></order>

Each element can contain child elements, forming a hierarchy with a single root element at the top.

<order>
    <date>2026-03-16</date>
    <customer>C100</customer>
    <name>Anna Müller</name>
</order>

Here, <order> is the root element and the parent of three child elements. XML documents form a tree, so elements must be properly nested: a tag opened inside an element must be closed before that element is closed. The following example violates this rule and is therefore not well-formed:

<order><date></order></date>
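A parser rejects such a document outright instead of trying to repair it. As a minimal sketch, a well-formedness check can be written with Python's standard xml.etree.ElementTree module. The helper name is_well_formed is an assumption for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, unclosed elements, and similar errors.
        return False


print(is_well_formed("<order><date></date></order>"))  # properly nested
print(is_well_formed("<order><date></order></date>"))  # tags closed in the wrong order
```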

XML elements are ordered. This makes it possible to represent lists by repeating elements:

<items>
    <item>
        <article>P10</article>
        <description>USB Cable</description>
        <quantity>2</quantity>
        <price>5.90</price>
    </item>
    <item>
        <article>P11</article>
        <description>Wireless Mouse</description>
        <quantity>1</quantity>
        <price>19.90</price>
    </item>
    <item>
        <article>P12</article>
        <description>Laptop Stand</description>
        <quantity>1</quantity>
        <price>29.90</price>
    </item>
</items>
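Because element order is preserved, a repeated element maps naturally onto a list. The following sketch, using Python's standard xml.etree.ElementTree module, turns the items above into a list of dictionaries. The flat dictionary layout is an assumption chosen for illustration, and all values stay strings because XML itself carries no type information.

```python
import xml.etree.ElementTree as ET

ITEMS_XML = """<items>
    <item>
        <article>P10</article>
        <description>USB Cable</description>
        <quantity>2</quantity>
        <price>5.90</price>
    </item>
    <item>
        <article>P11</article>
        <description>Wireless Mouse</description>
        <quantity>1</quantity>
        <price>19.90</price>
    </item>
    <item>
        <article>P12</article>
        <description>Laptop Stand</description>
        <quantity>1</quantity>
        <price>29.90</price>
    </item>
</items>"""

items = ET.fromstring(ITEMS_XML)

# findall returns the matching child elements in document order.
rows = [
    {child.tag: child.text for child in item}
    for item in items.findall("item")
]
articles = [row["article"] for row in rows]

print(articles)
```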

Attributes

Elements may also have attributes:

<order date="2026-03-16">

Attributes help keep the syntax compact. Multiple attributes can be placed in the start tag, but each attribute must have a unique name. Attribute order is not significant, and parsers may return attributes in a different order.
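These rules can be observed with Python's standard xml.etree.ElementTree module, which exposes attributes as a dictionary. In this sketch, the id attribute and the fallback value EUR for a missing currency attribute are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<order id="1016" date="2026-03-16"/>')
b = ET.fromstring('<order date="2026-03-16" id="1016"/>')

# Attributes are name/value pairs; the order in the start tag carries no meaning.
same_attributes = a.attrib == b.attrib

# A missing attribute can be given a fallback value by the application.
currency = a.get("currency", "EUR")

# Repeating an attribute name in one start tag is a well-formedness error.
try:
    ET.fromstring('<order id="1016" id="1017"/>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

print(same_attributes, currency, duplicate_accepted)
```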

You do not need more than the basics discussed so far to use XML. However, XML offers many additional features, and this is where things can become more complex.

Namespaces

XML namespaces are used to avoid naming conflicts and to combine multiple XML vocabularies in one document.

<office:document-content
        xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
        xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
    <office:body>
        <office:text>
            <text:h>An XML Poem</text:h>
            <text:p>
                Tags whisper structure where data finds its frame,
                Verbose yet timeless, <text:span text:style-name="BOLD">XML</text:span> remembers every name.
            </text:p>
        </office:text>
    </office:body>
</office:document-content>

The elements:

<text:p>, <text:h>, <text:span>

belong together because they share the prefix text. In the same document, there is a second family of elements using the prefix office. A prefix is only a local shorthand: it must be bound to a namespace, and it is the namespace, not the prefix, that identifies the vocabulary. This declaration defines that binding:

xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"

It maps the prefix text to a namespace identifier, allowing all elements with that prefix to be grouped into the same vocabulary. A common technique for making namespaces globally unique is to use a URL:

http://www.w3.org/2001/XMLSchema

Because the W3C controls the domain w3.org, it can use URLs within that domain to create globally unique namespace identifiers. In this context, the value is used as an identifier, not as a locator. For that reason, these names are referred to as URIs (Uniform Resource Identifiers).

Using URLs to achieve uniqueness is effective, but it can also be confusing. The namespace value looks like a web address, which often leads users to assume that it points to a resource that must be retrieved. In most cases, this is not true. The URI is only used as a unique name.
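The following sketch shows how a parser treats namespace names, using Python's standard xml.etree.ElementTree module. The module expands each prefixed name to the form {namespace}localname, so two documents that use different prefixes for the same namespace produce identical element names. Nothing is downloaded while parsing. The prefixes t and o on the query side are arbitrary choices for this example.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Two different prefixes bound to the same namespace name.
tag_a = ET.fromstring(f'<text:p xmlns:text="{TEXT_NS}">Hello</text:p>').tag
tag_b = ET.fromstring(f'<t:p xmlns:t="{TEXT_NS}">Hello</t:p>').tag

DOC = f"""<office:document-content
        xmlns:office="{OFFICE_NS}"
        xmlns:text="{TEXT_NS}">
    <office:body>
        <office:text>
            <text:h>An XML Poem</text:h>
        </office:text>
    </office:body>
</office:document-content>"""

root = ET.fromstring(DOC)

# The query declares its own prefix mapping; only the namespace names must match.
ns = {"o": OFFICE_NS, "t": TEXT_NS}
heading = root.find("o:body/o:text/t:h", ns).text

print(tag_a == tag_b, tag_a, heading)
```

Code that compares prefixes instead of namespace names tends to break as soon as a sender chooses different prefixes, which is a common source of integration bugs.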

There are many rules and techniques that define how namespaces are declared and where they are valid. This added complexity contributed to XML being perceived as difficult compared to simpler formats.

However, the complexity can be worthwhile. Namespaces allow elements from different XML vocabularies to be combined in a single document. This provides a high degree of flexibility and is one of the reasons the language is called Extensible Markup Language.

Think of an office document that contains elements from a word processing vocabulary and a spreadsheet vocabulary. It may also include elements from a security specification that add digital signatures. In addition, metadata about the author and the document itself might be expressed using the Dublin Core specification.

Namespaces allow all of these different vocabularies to be combined in a single document.

Markup Language

An element can contain both child elements and text mixed together. This concept is called mixed content. Consider the paragraph element:

<text:p>
    Tags whisper structure where data finds its frame,
    Verbose yet timeless, <text:span text:style-name="BOLD">XML</text:span> remembers every name.
</text:p>

This element has three child nodes, in this order:

A text node: Tags whisper structure where data finds its frame, Verbose yet timeless,
An element: <text:span text:style-name="BOLD">XML</text:span>
A text node: remembers every name.

Mixed content allows expressing semi-structured content such as text documents or HTML pages, where the structure is not as rigid as in object-based data. Compare this with the order example earlier, which follows a strict hierarchical structure.
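How an API exposes mixed content differs between libraries. As one example, Python's standard xml.etree.ElementTree module has no separate text nodes: the text before the first child is stored in the parent's text property, and the text that follows a child element is stored in that child's tail property. The sketch below reads the paragraph this way; the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

P_XML = f"""<text:p xmlns:text="{TEXT_NS}">
    Tags whisper structure where data finds its frame,
    Verbose yet timeless, <text:span text:style-name="BOLD">XML</text:span> remembers every name.
</text:p>"""

p = ET.fromstring(P_XML)
span = p[0]  # the only child element of the paragraph

before = p.text    # text in front of <text:span>
inside = span.text # text inside <text:span>
after = span.tail  # text following </text:span>

# Attributes in a namespace are addressed by their expanded name as well.
style = span.get(f"{{{TEXT_NS}}}style-name")

# itertext() walks all text in document order; whitespace is normalized here.
plain = " ".join("".join(p.itertext()).split())

print(plain)
```

Forgetting the tail is a frequent mistake when processing documents: reading only the text property of each element silently drops everything that follows an inline element.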

Markup language is part of the name XML, but this aspect is often overlooked. XML is not primarily an object notation. JSON and YAML are usually better suited for serializing objects. XML shines when the data has a document structure and markup is required.

XML Technologies

XML itself is only the core standard. Additional standards build on top of XML and provide more advanced features.

One important standard is XML Schema (XSD), which allows describing the structure of a family of XML documents and therefore defining an XML vocabulary.

XPath is used to query XML documents and select nodes within them. XSLT transforms XML documents into other XML vocabularies or into formats such as HTML, plain text, or JSON.
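To give an impression of XPath, the following sketch queries a shortened version of the order document. It uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath; full XPath, XSD validation, and XSLT generally require an additional library. The shortened document and the variable names are assumptions for this example.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<order id="1016">
    <items>
        <item>
            <article>P10</article>
            <description>USB Cable</description>
            <price>5.90</price>
        </item>
        <item>
            <article>P11</article>
            <description>Wireless Mouse</description>
            <price>19.90</price>
        </item>
    </items>
</order>"""

order = ET.fromstring(ORDER_XML)

# Select the price of the item whose <article> child has the text P11.
mouse_price = order.findtext(".//item[article='P11']/price")

# Select all descriptions by following a path from the root element.
descriptions = [d.text for d in order.findall("./items/item/description")]

print(mouse_price, descriptions)
```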

These are just a few of the technologies that form the XML ecosystem.

Conclusion

XML is not outdated. It remains a critical technology for standardized data exchange and document-oriented formats such as ODF.

However, XML was often used in older REST APIs and web services where its advanced features were not needed. In these cases, namespaces and the verbose syntax made XML less suitable. As a result, XML in such APIs is often considered legacy technology that is gradually being replaced by the more lightweight JSON, which is also better suited for object serialization.

Modern API gateways support XML alongside JSON and OpenAPI and make it possible to integrate with XML-based systems.

Understanding XML is therefore essential for building reliable integration architectures and modernizing legacy systems.