I know, weird name for a blog title, but I’ve spent the better part of the last two weeks figuring out how to take a rather large XML file (13-25 Mb on average), import the information into a SQL Server database into 6 related tables and then use SQL Server Integration Services to query the data to generate 80+ fixed width format files for subsequent transfer to another system for processing.  When I first began this endeavor I grabbed my “Beginning XML 3rd Ed.” book by Wrox to study-up on XPath and XQuery. After some reading, and trial and error I made an attempt at using SSIS to import the data.  In short, this would have worked well had there been any data to actually import from within the element tags.  Unfortunately, this was not the case.  All of the data contained in this XML file was stored in the attribute tags of each element.  For those who are technologically challenged, let me explain.

<Person>Jay</Person>     Within the beginning and closing tag of person, there exists a Jay.  Jay is the data stored within the element “Person”.  Unfortunately, in my situation the XML looked like this:

<Person value=”Jay”></Person>   As I attempted my import, there was nothing between the tags to import.

This particular XML file has more than 60 elements and a minimum of 4-5 attributes per tag.  Traversing the tree was becoming quite cumbersome.  I went to my newest favorite search engine BING (http://www.bing.com) and searched for “XML Import Solution".  Fairly high in the result set was an open source tool called XMLlstar.  XMLstar is a command line tool that utilizes XPATH to perform queries of XML files and generate text files or other formats for importation into databases.  It was a fairly easy tool to utilize but I still needed something more robust and something that my corporate IT group would allow to be placed on the network.  I continued researching and discovered LINQ.  LINQ is a Microsoft tool that is absolutely phenomenal.  Visual Studio 2008 even has IDE support for LINQ to SQL classes.  Using this tool Visual Studio builds the classes necessary to access the relational data stored in the database.  I wasn’t quite interested in this feature yet but I was very interested in the query capabilities of LINQ to access the data stored within the xml file.  In reality, an xml file is nothing more than a representation of a relational database.

If you’re interested in the solution to my problem keep reading, if not then bail out now (if you haven’t already.)

To begin with I needed to read the xml file and query the specific elements and attributes with the data necessary to load into the database.  For this purpose I used the XDocument and XElement. 

I primarily code in C# so the code that follows are snippets of the code required to process my file.

Imports

using System.XML;
using System.XML.LINQ;
using System.LINQ;

XDocument doc = new XDocument(Path.GetFullPath(“c:\ImportData\myXMLfile.xml”);
XElement ex = doc.Element("RootElement")

So now what does all this garbledy gook do?  Let’s see…   The most important part is to reference the LINQ classes.  This is done using the Imports statements. 

The XDocument is a reference to an XML document that I’ve loaded providing the file path and file name.  The instance I’ve created is called “doc”.

The XElement sets a reference to the top most element of the XDocument referred to simply as doc.  Now that I have a document and a starting location I can use LINQ to query the data I require for my processing.  It’s probably best to determine if an element has attributes before trying to get their values.  I can check to determine if attributes exist for an element by using the following:

StringWriter sb = new StringWriter();   // I’m going to use a StringWriter to write any found attributes to.
            try
            {
                if (e.HasAttributes)  // checking to see if the selected element has any attributes
                {
                    foreach (string item in e.Attributes())  // element has attributes so reiterate through the collection
                    {
                        sb.Write("Attribute Name" + item + "\n");       // write out the attribute name
                 
                }

            }
            catch (Exception exc)   // catch any exceptions and show them in a messagebox
            {
                MessageBox.Show("{0} Exception Caught: " + exc);  //note that the Messagebox Class is found in System.Windows.Forms

                        }

Thus far we haven’t seen any LINQ usage.  I’m documenting the “On-the-job” training process I used as I was learning about XML and gaining access to attributes so it might take a few blog entries to complete this. :)   Tomorrow night I’ll tackle part 2 and show how I extracted the data required for my import and then follow through with the SSIS dataflow design.  Heck maybe I’ll even include some screenshots to text the full functionality of this Windows Live Writer.

Till next time…

Jay