Loading XML Data–CONVERT Option 2

I was playing around with some XML lately, and had to load a file that looked like this::

2015-07-22 14_40_59-mathis.xml - Notepad

I ran a simple query, one that used the OPENROWSET and a CONVERT to load the data.

WITH XmlFile (Contents) AS (
SELECT CONVERT (XML, BulkColumn) 
FROM OPENROWSET (BULK 'C:\mathis.xml', SINGLE_BLOB) AS XmlData
)
SELECT *
FROM   XmlFile
GO

However, that didn’t work. I received this message in SSMS.

Msg 6359, Level 16, State 1, Line 1

Parsing XML with internal subset DTDs not allowed. Use CONVERT with style option 2 to enable limited internal subset DTD support.

Hmmm. That seems to make sense. Let’s add an option to CONVERT of 2. I’ve assumed the last parameter is the one mentioned, as with date conversions, and added that. I can hover with SQL Prompt and see that.

2015-07-22 14_43_13-SQLQuery1.sql - ARISTOTLE.sandbox (ARISTOTLE_Steve (71))_ - Microsoft SQL Server

Let’s change the code:

WITH XmlFile (Contents) AS (
SELECT CONVERT (XML, BulkColumn,2) 
FROM OPENROWSET (BULK 'C:\mathis.xml', SINGLE_BLOB) AS XmlData
)
SELECT *
FROM   XmlFile
GO

That works.

Why?

If you go to the BOL page for CAST and CONVERT, you will find an XML styles section. There are possible values of 0 (default), 1, 2, and 3. In this case, the 2 enables an internal DTD processing, which basically uses a default document of sorts for parsing the XML. No external DTD is needed and this is treated as a standalone document.

I am not an XML expert, but I’m guessing here that I’ve included a document that doesn’t conform to some specification and the additional style parameter allows SQL Server to ignore some of what’s there.

If anyone knows more, I would like to better understand how this works.

About way0utwest

Editor, SQLServerCentral
This entry was posted in Blog and tagged , , . Bookmark the permalink.

4 Responses to Loading XML Data–CONVERT Option 2

  1. it’s impossible to tell without seeing the XML header, but the idea is that SQL server thinks that your header contains a document type declaration (DTD), therefore it is called an “internal DTD”. Take look here – https://en.wikipedia.org/wiki/Document_type_definition

  2. way0utwest says:

    Thanks. I’ll see if that might be the case.

  3. You have an external DTD declaration in your XML “” and that is what makes SQL Server cranky.

    The DTD is used to enforce the structure of the XML document much like XML Schemas but with the additional feature of defining your own entities. Entities works like macros and SQL Server can handle those if they are included in an internal DTD where the entire DTD is included in the XML document.

    SQL Server does not validate the XML against the DTD, neither for internal or external DTD’s. External DTD’s is never loaded and if I remember correctly (Michael Rys blog/whitepaper/presentation ?) that was a design decision made early because external DTD’s could then possibly be used to perform a DoS attack. .

    Using the parameter 2 in the convert is you saying to SQL Server that you know what you are doing and that you don’t expect the DTD structure to be enforced and that external DTD’s will not be used.

    Loading your XML will fail if it use entities defined in the external DTD.

    Msg 9448, Level 16, State 1, Line 1
    XML parsing: line 5, character 13, well formed check: undeclared entity

Comments are closed.