xBus

CSVMessage

 Simple parsing and serialization

The first example shows the data structures involved in parsing and serializing CSV data to and from XML documents without an interface definition. The CSV data is shown first, followed by the corresponding XML document.

01 Name, Rank, Location
02 John Doe, Software Engineer, Munich
03 "Powers, Mary", Senior Software Engineer, London
01:
This CSV example starts with a header line that names the three fields.
02 - 03:
The data consists of two lines; each line must have exactly the same number of columns. Because the first field of the second data line contains the delimiter, it must be enclosed in quote characters.
01 <?xml version="1.0" encoding="UTF-8"?>
02 <EmployeesInCSV>
03 <Header>
04 <Heading>Name</Heading>
05 <Heading>Rank</Heading>
06 <Heading>Location</Heading>
07 </Header>
08 <Records>
09 <Record>
10 <Name>John Doe</Name>
11 <Rank>Software Engineer</Rank>
12 <Location>Munich</Location>
13 </Record>
14 <Record>
15 <Name>Powers, Mary</Name>
16 <Rank>Senior Software Engineer</Rank>
17 <Location>London</Location>
18 </Record>
19 </Records>
20 </EmployeesInCSV>
02:
The name of the root element is the name of the interface as defined in the configuration. During serialization, the name of the root element is ignored.
03 - 07:
These lines represent the header of the CSV data. Each header entry is enclosed in a Heading tag.
08 - 19:
The Records section contains the data; each record is enclosed in a Record tag. The content of each field is enclosed in a tag whose name is either derived from the header or, if the CSV data doesn't contain a header, the constant name Field.
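
The mapping described above can be sketched in a few lines of Python. This is only an illustration of the CSV-to-XML layout, not the xBus implementation; the function name csv_to_xml and its parameters are assumptions made for this example, and it assumes that every header entry is usable as a tag name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_name, has_header=True):
    """Map CSV text to the Header/Records XML layout shown above."""
    # skipinitialspace drops the blank after each delimiter, so
    # "John Doe, Software Engineer" yields "Software Engineer".
    rows = list(csv.reader(io.StringIO(csv_text), skipinitialspace=True))
    root = ET.Element(root_name)

    if has_header:
        names, rows = rows[0], rows[1:]
        header = ET.SubElement(root, "Header")
        for name in names:
            ET.SubElement(header, "Heading").text = name
    else:
        names = None

    records = ET.SubElement(root, "Records")
    for row in rows:
        record = ET.SubElement(records, "Record")
        for i, value in enumerate(row):
            # Tag name comes from the header, or the constant "Field".
            tag = names[i] if names else "Field"
            ET.SubElement(record, tag).text = value
    return root


sample = (
    "Name, Rank, Location\n"
    "John Doe, Software Engineer, Munich\n"
    '"Powers, Mary", Senior Software Engineer, London\n'
)
doc = csv_to_xml(sample, "EmployeesInCSV")
print(ET.tostring(doc, encoding="unicode"))
```

The quoted field "Powers, Mary" is kept as a single value because the csv module treats the quote character the same way as described for line 03 of the CSV example.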
 Parsing using an interface description file

CSV parsing works automatically, without any additional meta information. Sometimes, however, users want to change the metadata names. With an interface description file, users can define the header values, the field names, or both.

The steps to achieve the renaming of CSV metadata are:

  1. Create an interface description file
  2. Indicate the definition file in the configuration:
    Chapter   Section                 Key               Value
    System    source interface name   DescriptionFile   Filename


DTD for interface description file

The interface description file must be in XML format and must be valid against the DTD InterfaceSpecCSV.dtd:


<?xml version="1.0" encoding="UTF-8"?>

<!ELEMENT InterfaceSpec  ( Header?, Records?) >
    <!-- Root element -->
<!ATTLIST InterfaceSpec Name NMTOKEN #REQUIRED >

<!ELEMENT Header ( Field+ ) >
    <!-- File header -->
<!ATTLIST Header Name NMTOKEN #IMPLIED >

<!ELEMENT Records ( Record+ ) >
    <!-- Part which contains the Content Fields-->

<!ELEMENT Record ( Field+ ) >
    <!-- Structure description for a single record type. -->
<!ATTLIST Record Name NMTOKEN #REQUIRED >

<!ELEMENT Field EMPTY >
    <!-- Specification of a single field -->
<!ATTLIST Field Name CDATA #REQUIRED >
<!ATTLIST Field Format ( alpha | blank | const | date | num ) #IMPLIED >
<!ATTLIST Field Value CDATA #IMPLIED >
    <!-- Necessary if <Format>="const".
         Specifies the constant value. -->
<!ATTLIST Field DateFormat NMTOKEN #IMPLIED >
    <!-- Necessary if <Format>="date".
         Format strings like "yyMMdd" should be used which obey the
         java.text.SimpleDateFormat conventions. -->
<!ATTLIST Field DecimalPoint ( comma | dot ) #IMPLIED >
    <!-- Necessary if <Format>="num" and float values may be specified. -->
<!ATTLIST Field Length NMTOKEN #IMPLIED >
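
The comments in the DTD state that Value is necessary for Format="const" and DateFormat for Format="date", but a DTD cannot express such conditional requirements, so a validating parser will not report them. The following Python sketch shows one way such a check could be done by hand. The function check_fields, the lookup table, and the sample description are assumptions made for this example and are not part of xBus.

```python
import xml.etree.ElementTree as ET

# Attribute that the DTD comments mark as necessary for a given Format.
REQUIRED_FOR_FORMAT = {"const": "Value", "date": "DateFormat"}


def check_fields(spec_xml):
    """Return a list of problems found in the Field elements."""
    problems = []
    root = ET.fromstring(spec_xml)
    for field in root.iter("Field"):
        fmt = field.get("Format")
        needed = REQUIRED_FOR_FORMAT.get(fmt)
        if needed and field.get(needed) is None:
            problems.append(
                f'Field "{field.get("Name")}": Format="{fmt}" needs {needed}'
            )
    return problems


# Hypothetical description: the second field lacks its Value attribute.
spec = """
<InterfaceSpec Name="Demo">
  <Records>
    <Record Name="entry">
      <Field Name="created" Format="date" DateFormat="yyMMdd"/>
      <Field Name="marker" Format="const"/>
    </Record>
  </Records>
</InterfaceSpec>
"""
print(check_fields(spec))
```

For the sample description, the check reports only the field named marker, because the date field carries its DateFormat attribute.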



Interface description example

The following example shows how the sample CSV format from the top of the page can be modified by an interface description.

The CSV file:
Name, Rank, Location
John Doe, Software Engineer, Munich
"Powers, Mary", Senior Software Engineer, London

A suitable interface description file:
<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE InterfaceSpec SYSTEM "InterfaceSpecCSV.dtd">

<InterfaceSpec Name="CSVFullTest">
      <Header Name="employeesHeader">
                  <Field Name="Full name"/>
                  <Field Name="Status in company"/>
                  <Field Name="Working location"/>
      </Header>
      <Records>
            <Record Name="employee">
                  <Field Name="name"/>
                  <Field Name="status"/>
                  <Field Name="location"/>
            </Record>
    </Records>
</InterfaceSpec>

Using the CSV data and the description file, the input values are parsed into an XML representation. This XML representation is used internally by the xBus and is the starting point for further processing.

The parsed data in XML format:
<?xml version="1.0" encoding="UTF-8"?>
<EmployeesInCSV>
  <Header>
    <Heading>Full name</Heading>
    <Heading>Status in company</Heading>
    <Heading>Working location</Heading>
  </Header>
  <Records>
    <Record>
      <name>John Doe</name>
      <status>Software Engineer</status>
      <location>Munich</location>
    </Record>
    <Record>
      <name>Powers, Mary</name>
      <status>Senior Software Engineer</status>
      <location>London</location>
    </Record>
  </Records>
</EmployeesInCSV>
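
The renaming shown in this example can be sketched in Python as follows. The sketch is not the xBus implementation: the function parse_with_description is an assumption made for this example, it only handles the case where the CSV data has a header line and the description contains both a Header and a single Record, and the description below omits the XML declaration and DOCTYPE for brevity.

```python
import csv
import io
import xml.etree.ElementTree as ET


def parse_with_description(csv_text, spec_xml, root_name):
    """Parse CSV (with header line) and rename metadata per the description."""
    spec = ET.fromstring(spec_xml)
    headings = [f.get("Name") for f in spec.findall("Header/Field")]
    tags = [f.get("Name") for f in spec.findall("Records/Record/Field")]

    rows = list(csv.reader(io.StringIO(csv_text), skipinitialspace=True))
    data_rows = rows[1:]  # the CSV header line is replaced by the description

    root = ET.Element(root_name)
    header = ET.SubElement(root, "Header")
    for heading in headings:
        ET.SubElement(header, "Heading").text = heading
    records = ET.SubElement(root, "Records")
    for row in data_rows:
        record = ET.SubElement(records, "Record")
        for tag, value in zip(tags, row):
            ET.SubElement(record, tag).text = value
    return root


csv_text = (
    "Name, Rank, Location\n"
    "John Doe, Software Engineer, Munich\n"
    '"Powers, Mary", Senior Software Engineer, London\n'
)
spec_xml = """<InterfaceSpec Name="CSVFullTest">
  <Header Name="employeesHeader">
    <Field Name="Full name"/>
    <Field Name="Status in company"/>
    <Field Name="Working location"/>
  </Header>
  <Records>
    <Record Name="employee">
      <Field Name="name"/>
      <Field Name="status"/>
      <Field Name="location"/>
    </Record>
  </Records>
</InterfaceSpec>"""

doc = parse_with_description(csv_text, spec_xml, "EmployeesInCSV")
print(ET.tostring(doc, encoding="unicode"))
```

The headings come from the Header section of the description and the tag names from its Record section, which matches the parsed XML shown above.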


Processing details:
When an interface description file is used, a check is made whether the number of fields per record in the description file matches the number in the data file. An error occurs if they differ.
Only names that are valid tag names are allowed in the record section of the description file. If the description file contains an invalid name for a field, the tag "field" is used in the XML representation instead.
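
The two checks described above can be sketched in Python. Both helper functions are assumptions made for this example, and the regular expression is a simplified, ASCII-only approximation of what counts as a valid tag name; the rule applied by xBus itself may differ.

```python
import re

# Simplified, ASCII-only approximation of an XML element name.
TAG_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*\Z")


def tag_names(described_names):
    """Replace names that cannot serve as tag names with "field"."""
    return [n if TAG_NAME.match(n) else "field" for n in described_names]


def check_field_count(described_names, record):
    """Raise if the record does not have as many fields as described."""
    if len(record) != len(described_names):
        raise ValueError(
            f"expected {len(described_names)} fields, got {len(record)}"
        )


names = ["name", "Status in company", "location"]
print(tag_names(names))
check_field_count(names, ["John Doe", "Software Engineer", "Munich"])
```

In this sketch, "Status in company" falls back to "field" because it contains spaces, while a record with three values passes the count check without an error.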

The mechanism for selecting header values and tag names is fairly complex. The following table gives an overview of which names are used, depending on the content of the data file and the description file:

CSV file                  Description file                          Header information taken from   Tag names of entries taken from
has header                No description file                       CSV header                      CSV header
has header                does not contain any information          CSV header                      CSV header
has header                contains only Header information          Description file                Header information in description file
has header                contains only Records information         CSV header                      Records information in description file
has header                contains Header and Records information   Description file                Records information in description file
does not contain header   No description file                       No header                       Tag name = "field"
does not contain header   does not contain any information          No header                       Tag name = "field"
does not contain header   contains only Header information          Description file                Header information in description file
does not contain header   contains only Records information         No header                       Records information in description file
does not contain header   contains Header and Records information   Description file                Records information in description file
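
The table can also be read as two short precedence rules, sketched here in Python. The function select_sources and its boolean parameters are assumptions made for this example; a missing or empty description file is modelled by passing False for both description flags.

```python
def select_sources(csv_has_header, spec_has_header, spec_has_records):
    """Return (header source, tag-name source) following the table above."""
    # Header values: description file first, then the CSV header.
    if spec_has_header:
        header_source = "Description file"
    elif csv_has_header:
        header_source = "CSV header"
    else:
        header_source = "No header"

    # Tag names: Records section, then Header section, then CSV header.
    if spec_has_records:
        tag_source = "Records information in description file"
    elif spec_has_header:
        tag_source = "Header information in description file"
    elif csv_has_header:
        tag_source = "CSV header"
    else:
        tag_source = 'Tag name = "field"'
    return header_source, tag_source


print(select_sources(True, False, True))
```

Written this way, the description file always takes precedence over the CSV header, and within the description file the Records section takes precedence over the Header section when tag names are chosen.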