Example – Basic XML
Situation
Document to be recognized
<?xml version="1.0" encoding="UTF-8"?>
<INVOICES>
  <INVOICE>
    <SENDERID>8710400000006</SENDERID>
    <SENDERIDCODELIST>14</SENDERIDCODELIST>
    <RECIPIENTID>8712345678906</RECIPIENTID>
    <RECIPIENTIDCODELIST>14</RECIPIENTIDCODELIST>
    <TESTINDICATOR>TRUE</TESTINDICATOR>
    <BASEEDIVERSION>96A</BASEEDIVERSION>
    <BASEEDIVERSIONSTANDARD></BASEEDIVERSIONSTANDARD>
    <GROUP>ESC</GROUP>
    <DOCUMENTTYPE>380</DOCUMENTTYPE>
    <INVOICENUMBER>104109</INVOICENUMBER>
    <MSGFUNCTION>9</MSGFUNCTION>
    <ACK>NA</ACK>
    <INVOICEDATE>20201105</INVOICEDATE>
    <INVOICETIME>1635</INVOICETIME>
    <TAXRATE></TAXRATE>
    <TAXCATEGORY></TAXCATEGORY>
    <EXCISEFREE></EXCISEFREE>
    <DATES>
      <DELIVERYDATE>20201105</DELIVERYDATE>
    </DATES>
  </INVOICE>
</INVOICES>
Configuration steps
- Preparation
  - Examine how to recognize data (sub)types.
  - Examine which metadata is necessary and available (not necessary for documents that will be discarded).
- Configuration
  - Configure document recognition.
  - Configure how to extract and set metadata.
- Testing your configuration.
Step 1: Preparation
Step 1a: Requirements
SmartBridge needs to extract metadata from the document, so it needs to know where to find it. Therefore, go over your document and prepare Xpaths to locate at least the following metadata:
- Document format (XML, etc.)
- Type of document (invoice, etc.)
- (Optional: Version of the type of document, e.g. D96A)
- Sender identifier (and the type of the identifier)
- Recipient identifier (and the type of the identifier)
- Envelope number or document number
- (Optional: Value to identify test documents)
Make a note for required metadata that cannot be found in the document, and set your own value for this metadata.
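Before configuring anything, it can help to verify candidate Xpaths against a copy of the document. The following is a minimal sketch using Python's standard library, not part of SmartBridge. The dictionary keys, the `find_text` helper, and the `ENVELOPENUMBER` node are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example invoice; only the nodes checked below are kept.
SAMPLE = """<INVOICES>
  <INVOICE>
    <SENDERID>8710400000006</SENDERID>
    <SENDERIDCODELIST>14</SENDERIDCODELIST>
    <RECIPIENTID>8712345678906</RECIPIENTID>
    <RECIPIENTIDCODELIST>14</RECIPIENTIDCODELIST>
    <TESTINDICATOR>TRUE</TESTINDICATOR>
    <BASEEDIVERSION>96A</BASEEDIVERSION>
    <INVOICENUMBER>104109</INVOICENUMBER>
  </INVOICE>
</INVOICES>"""


def find_text(root, absolute_path):
    """Return the text at a simple absolute path such as /A/B/C, or None.

    ElementTree does not accept absolute paths on an element, so the first
    segment is compared with the root tag and the rest is searched relatively.
    Empty nodes have no text and are therefore also reported as None.
    """
    parts = absolute_path.strip("/").split("/")
    if parts[0] != root.tag:
        return None
    if len(parts) == 1:
        return root.text
    node = root.find("./" + "/".join(parts[1:]))
    return None if node is None else node.text


# Candidate paths to verify. ENVELOPENUMBER is a hypothetical node that the
# example does not contain; it is included to show how a gap is reported.
CANDIDATES = {
    "sender_id": "/INVOICES/INVOICE/SENDERID",
    "recipient_id": "/INVOICES/INVOICE/RECIPIENTID",
    "document_number": "/INVOICES/INVOICE/INVOICENUMBER",
    "envelope_number": "/INVOICES/INVOICE/ENVELOPENUMBER",
}

root = ET.fromstring(SAMPLE)
found = {name: find_text(root, path) for name, path in CANDIDATES.items()}
missing = [name for name, value in found.items() if value is None]

for name, value in found.items():
    shown = value if value is not None else "NOT FOUND, set a static value"
    print(f"{name}: {shown}")
```

Anything that ends up in `missing` is metadata for which you would note a value of your own.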
Step 1b: Create a new Document Structure
- Click on the + at the bottom of the page to add a document structure, then select ‘Add XML’.
- Enter a descriptive name.
Field | Description | Example value |
---|---|---|
Name | Give a name to the document structure you are defining. The name is used for your own reference only. | Exact DESADV |
Step 2: Configuration
Step 2a: Configure how to recognize the document
Identify what content sets this document apart from other XML documents
We might be processing other XML documents that closely resemble this document. How can SmartBridge tell these XML documents apart? Configure one or more of the following fields as a recognition method:
Recognize XML document using | Possible identifier |
---|---|
Doctype (DTD) reference | N/a in example; leave empty |
Namespace reference | N/a in example; leave empty |
XML Schema reference | N/a in example; leave empty |
XML node | /INVOICES (we could also use /INVOICES/INVOICE or /INVOICES/INVOICE/INVOICENUMBER) |
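To get a feel for what recognition by XML node means, the check can be approximated outside SmartBridge. This is a rough sketch with Python's standard library and says nothing about how SmartBridge implements recognition internally. The `node_exists` helper and the `ORDERS` document are hypothetical.

```python
import xml.etree.ElementTree as ET


def node_exists(xml_text, absolute_path):
    """Return True if a simple absolute path such as /A/B exists in xml_text."""
    root = ET.fromstring(xml_text)
    parts = absolute_path.strip("/").split("/")
    if parts[0] != root.tag:
        return False          # the root element itself does not match
    if len(parts) == 1:
        return True           # the path names only the root element
    return root.find("./" + "/".join(parts[1:])) is not None


invoice = (
    "<INVOICES><INVOICE>"
    "<INVOICENUMBER>104109</INVOICENUMBER>"
    "</INVOICE></INVOICES>"
)
# Hypothetical other XML document that should NOT be recognized as an invoice.
order = "<ORDERS><ORDER><ORDERNUMBER>1</ORDERNUMBER></ORDER></ORDERS>"

for path in ("/INVOICES", "/INVOICES/INVOICE", "/INVOICES/INVOICE/INVOICENUMBER"):
    print(path, node_exists(invoice, path), node_exists(order, path))
```

Each of the three paths matches the invoice and none matches the other document. When similar documents share the same root element, a deeper path is the more selective choice.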
Step 2b: Setting metadata
Skip this step for files that are not used for further processing.
When comparing the example XML document against the requirements, you will find that you need to:
- Have SmartBridge dynamically set values for standard metadata, by extracting values from the document.
- Provide your own static values for standard metadata.
- Skip adding custom metadata.
The nodes of our example document contain the following data:
Required metadata | Remark | Where to find it | Additionally set |
---|---|---|---|
Document format (XML, etc.) | XML is automatically recognized. No need to configure this. | n/a | n/a |
Type of document (invoice, etc.) | The example document has a node that contains the Document Type, but in a proprietary format: Document Type ‘380’ stands for ‘invoice’. Therefore, we assume this information is unavailable. This means we should manually label the document with this property. | No Xpath available; needs to be assigned. Set value ‘INVOIC’. | Xpath = False |
Optional: Version of the type of document, e.g. D96A | Can be extracted from the document. | /INVOICES/INVOICE/BASEEDIVERSION | Xpath = True |
Sender identifier | Can be extracted from the document. | /INVOICES/INVOICE/SENDERID | Xpath = True |
Type of Sender identifier | Can be extracted from the document. | /INVOICES/INVOICE/SENDERIDCODELIST | Xpath = True |
Recipient identifier | Can be extracted from the document. | /INVOICES/INVOICE/RECIPIENTID | Xpath = True |
Type of Recipient identifier | Can be extracted from the document. | /INVOICES/INVOICE/RECIPIENTIDCODELIST | Xpath = True |
Envelope number or document number. | Can be extracted from the document. | /INVOICES/INVOICE/INVOICENUMBER | Xpath = True |
Optional: Value to identify test documents | | /INVOICES/INVOICE/TESTINDICATOR | |
Click ‘Save’ to save your settings.
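The difference between static values (Xpath = False) and extracted values (Xpath = True) can be illustrated with a small sketch. It is written with Python's standard library under stated assumptions and is not SmartBridge's own mechanism. The field names, the `RULES` list, and the `resolve` function are hypothetical. The table leaves the Xpath setting for the test indicator empty, so treating it as an extracted value is an assumption as well.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example invoice.
SAMPLE = """<INVOICES>
  <INVOICE>
    <SENDERID>8710400000006</SENDERID>
    <SENDERIDCODELIST>14</SENDERIDCODELIST>
    <RECIPIENTID>8712345678906</RECIPIENTID>
    <RECIPIENTIDCODELIST>14</RECIPIENTIDCODELIST>
    <TESTINDICATOR>TRUE</TESTINDICATOR>
    <BASEEDIVERSION>96A</BASEEDIVERSION>
    <DOCUMENTTYPE>380</DOCUMENTTYPE>
    <INVOICENUMBER>104109</INVOICENUMBER>
  </INVOICE>
</INVOICES>"""

# (metadata field, value, is_xpath). Field names are illustrative only.
RULES = [
    ("document_type", "INVOIC", False),  # static label, no usable node
    ("document_version", "/INVOICES/INVOICE/BASEEDIVERSION", True),
    ("sender_id", "/INVOICES/INVOICE/SENDERID", True),
    ("sender_id_type", "/INVOICES/INVOICE/SENDERIDCODELIST", True),
    ("recipient_id", "/INVOICES/INVOICE/RECIPIENTID", True),
    ("recipient_id_type", "/INVOICES/INVOICE/RECIPIENTIDCODELIST", True),
    ("document_number", "/INVOICES/INVOICE/INVOICENUMBER", True),
    ("test_indicator", "/INVOICES/INVOICE/TESTINDICATOR", True),  # assumption
]


def resolve(root, rules):
    """Build a metadata dict: static values as-is, Xpath values looked up.

    Assumes every path points to a node below the root element.
    """
    metadata = {}
    for field, value, is_xpath in rules:
        if not is_xpath:
            metadata[field] = value
            continue
        parts = value.strip("/").split("/")
        node = None
        if parts[0] == root.tag:
            node = root.find("./" + "/".join(parts[1:]))
        metadata[field] = None if node is None else node.text
    return metadata


metadata = resolve(ET.fromstring(SAMPLE), RULES)
for field, value in metadata.items():
    print(f"{field}: {value}")
```

Note that the document type comes out as the static label and not as the proprietary code 380 that the document itself carries.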
Step 3: Test your Document Structure
- Have the test file ready (see Step 1).
- In the upper right-hand corner, click on the test button. A new window will open.
- Click on Browse... to select the test file, for testing whether the new Document Structure matches your test file.
- (Optional step unless your Document Structure uses communication attributes) Configure the Inhouse Recognition Parameters section.
- Click on Test to analyze your file. You will see all the information SmartBridge is able to extract from the document, using the Document Structure that you created.
- Review the Results section. Correct your Document Structure in case you run into unexpected results (e.g. when the results show the name of a different Document Structure), then test again.
You might run into unexpected results when you test a Document Structure definition that contains Macros.
In most cases, SmartBridge is able to set these Macros when a communication module processes the document first. However, this testing method does not use a communication module. As a consequence, you will likely encounter unprocessed Macros in the test results. This is expected behavior.
Visual example of end result
Example of a document structure for an XML file.