Developer's Diary
Daily software development, with Terry Ebdon
Parsing SimpleMind XML - Part One

Tuesday 11th July, 2017

SimpleMind is excellent, and inexpensive, mind mapping software. I've used it for many years, and like it a lot. But... I hit a couple of problems with it recently. Not with the software itself, but with the way I was using it.

The Problems

  1. It appeared to be randomly losing images associated with map nodes. I now believe the issue was due to me purging the temp folder while it was running. While investigating the issue I needed to understand how the embedded images were stored, and write a script to check for the issue.
  2. Map topics can be hyperlinked to each other. These links don't have to be in the same mind map. I have hundreds of interlinked mind maps, in a complex folder tree. Moving, or renaming, a mind map can break links. There's no warning when that happens. I needed a way to scan the entire folder tree and find the broken links.
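
The folder scan in the second problem starts with finding every mind map in the tree. Here's a minimal sketch of that first step only. The helper name findMindMaps is my own invention for illustration, and checking the links themselves is not covered here.

```groovy
import groovy.io.FileType

// Hypothetical helper: collects every .smmx file below the given root folder.
List<File> findMindMaps(File root) {
    def found = []
    root.eachFileRecurse(FileType.FILES) { File f ->
        // Compare case-insensitively, as Windows file names often vary in case
        if (f.name.toLowerCase().endsWith('.smmx')) {
            found << f
        }
    }
    found
}

// Usage, assuming a folder of mind maps in the current directory:
// findMindMaps(new File('MindMaps')).each { println it.path }
```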

The SMMX files

SimpleMind files have a .smmx file type. The file is a zip archive containing an XML file and the embedded document images. I've created a test file with five map nodes, two external links and an embedded image. This was copied from a “corrupt” branch, so will have a problem.

The test Mind Map looks like this:

2017-07-11_Image_Test_smmx.png

Cracking open the zip file reveals this:

├───document
│       mindmap.xml
│
└───images
        db7f9c0c717f68833d76f51e4eeef85df359bba8.png
        hWPR03nipkq5KJPfDTtXIQ.png

The smmx file contains two embedded images, which is one more than you might expect.
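
The same listing can be produced from Groovy, which is handy when checking many files. This is a rough sketch: it builds a small stand-in archive first so that it runs without a real .smmx file, and the entry names in it are made up to mimic the layout above.

```groovy
import java.util.zip.ZipEntry
import java.util.zip.ZipFile
import java.util.zip.ZipOutputStream

// Build a tiny stand-in archive; the image names and contents are dummies.
File archive = File.createTempFile('sketch', '.smmx')
archive.deleteOnExit()
new ZipOutputStream(new FileOutputStream(archive)).withCloseable { zos ->
    ['document/mindmap.xml', 'images/first.png', 'images/second.png'].each { name ->
        zos.putNextEntry(new ZipEntry(name))
        zos.write('dummy'.bytes)
        zos.closeEntry()
    }
}

// List every entry, then pick out the embedded images.
def zipFile = new ZipFile(archive)
def entryNames = zipFile.entries().toList().collect { it.name }
zipFile.close()
def imageNames = entryNames.findAll { it.startsWith('images/') }

entryNames.each { println it }
println "${imageNames.size()} embedded image(s)"
```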

A peek at the XML shows it's easy enough to understand. It would be nice if it was indented though. That's a job for the W3C's HTML Tidy tool:

C:\SimpleMindTools>tidy -qe -xml mindmap.xml --output mindmap.tidied.xml

That slightly obscure command will:

  • Read the input file as well-formed XML, that's the -xml part.
  • Suppress non-essential output, i.e. operate in 'quiet mode', that's the '-q'.
  • Only show errors and warnings, that's the e in '-qe'.

This will show a heap of invalid character warnings. That's fine; the output file is throw-away. I'm just using it to “pretty-print” the file.

The tidied XML is now nicely indented, making it a lot easier to see the XML's structure.
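
If HTML Tidy isn't to hand, Groovy can do a similar pretty-print with XmlUtil.serialize. This is an alternative sketch, not what I actually ran, and the sample XML is a cut-down stand-in that I've assumed from the structure of the real file.

```groovy
import groovy.xml.XmlUtil

// A one-line stand-in for the unindented mindmap.xml (assumed structure).
def raw = '<simplemind-mindmaps><mindmap><meta><title text="Image Test"/></meta></mindmap></simplemind-mindmaps>'

// serialize() re-emits the document with one element per line, indented.
def pretty = XmlUtil.serialize(raw)
println pretty
```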

From the tidied file you can see this top level structure:

I've omitted some irrelevant tags from the above diagram, for clarity.

Parsing the XML

The example code, SmmxTest.groovy, is available on BitBucket under an Apache 2.0 licence. Line numbers, in the following description, match those in the file on BitBucket.

First I crack open the ZIP file, at line 12, using Java's ZipFile class:

10 import groovy.xml.DOMBuilder
11 import groovy.xml.dom.DOMCategory
12 def zipFile = new java.util.zip.ZipFile(new File('Image Test.smmx'))

Then iterate through the entries, until the mindmap XML file is found:

15 zipFile.entries().each {
16  if ( it.name == "document/mindmap.xml" ) {
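
Iterating works, but ZipFile can also look an entry up by name. The sketch below shows that alternative, using a stand-in archive so it runs on its own. The archive contents are dummies, and this is not the code in SmmxTest.groovy.

```groovy
import java.util.zip.ZipEntry
import java.util.zip.ZipFile
import java.util.zip.ZipOutputStream

// Stand-in archive holding a minimal document entry.
File archive = File.createTempFile('lookup', '.smmx')
archive.deleteOnExit()
new ZipOutputStream(new FileOutputStream(archive)).withCloseable { zos ->
    zos.putNextEntry(new ZipEntry('document/mindmap.xml'))
    zos.write('<simplemind-mindmaps/>'.bytes)
    zos.closeEntry()
}

def zipFile = new ZipFile(archive)
// getEntry() returns null, rather than throwing, when the name is absent.
def entry = zipFile.getEntry('document/mindmap.xml')
def missing = zipFile.getEntry('document/nothing.xml')
def xmlText = entry ? zipFile.getInputStream(entry).getText('UTF-8') : null
zipFile.close()

println xmlText
```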

Now load the XML into Groovy's XmlParser:

19  def xml = zipFile.getInputStream( it )
20  def parser = new XmlParser(false,false,true)
21  def simpleMind = parser.parseText(xml.text[41..-1])

There's a little hack at line 21. Notice that it's not sending the entire XML string to the XmlParser. That odd looking xml.text[41..-1] takes the text from index 41 onwards, i.e. it drops the first 41 characters. Why? Because that's the XML header and end-of-line characters.
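
A hard-coded offset breaks as soon as the header changes length. A less brittle sketch is to find where the XML declaration ends and cut there. The helper name stripProlog is hypothetical, the sample text is my assumption of what the start of the file looks like, and the import assumes Groovy 3 or later (older versions find XmlParser without it).

```groovy
import groovy.xml.XmlParser

// Hypothetical helper: removes a leading XML declaration, plus any
// byte-order mark or whitespace in front of the root element.
String stripProlog(String text) {
    int declStart = text.indexOf('<?xml')
    int declEnd = text.indexOf('?>')
    // Only cut when a declaration is really present
    String body = (declStart >= 0 && declEnd > declStart) ? text.substring(declEnd + 2) : text
    body.replaceFirst(/^[\uFEFF\s]+/, '')
}

// Assumed sample: byte-order mark, declaration, CRLF, then the root element.
def sample = '\uFEFF<?xml version="1.0" encoding="utf-8"?>\r\n<simplemind-mindmaps/>'
def root = new XmlParser(false, false, true).parseText(stripProlog(sample))
println root.name()
```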

2017-07-11_strip_xml_header.png
Note
This is my initial test code, used to understand the XML file. It was written as throw-away code, not production software. Writing this throw-away code gave me the information I needed to construct the real software, which I'll show you later.

At line 24 I check that the name of the top-level element matches the one I expect, based on the output from HTML Tidy. Note the use of an assertion; another indication that this is test software. Production software should anticipate an incorrect file being provided, and behave gracefully.

24  assert simpleMind.name() == "simplemind-mindmaps"
25  def mindmap = simpleMind.mindmap
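
For comparison, here's a sketch of the more graceful behaviour production code might have: report the problem and carry on, instead of failing an assertion. The helper name parseMindMap is hypothetical, returning null is just one possible design, and the import assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlParser
import org.xml.sax.SAXException

// Hypothetical helper: returns the root node of a SimpleMind document,
// or null (after reporting why) when the text is something else.
Node parseMindMap(String xmlText) {
    Node root
    try {
        root = new XmlParser(false, false, true).parseText(xmlText)
    } catch (SAXException e) {
        System.err.println "Not well-formed XML: ${e.message}"
        return null
    }
    if (root.name() != 'simplemind-mindmaps') {
        System.err.println "Unexpected root element: ${root.name()}"
        return null
    }
    root
}
```

A caller can then skip a bad file and move on to the next one, which matters when scanning hundreds of mind maps.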

At line 29 I display the mind map's title. Then, at lines 30 to 36, the child images are examined.

28  mindmap.with {
29  println "*** ${meta.title[0].attributes().text} <<<"
30  topics[0].each { topic ->
31  println "${topic.attributes().text} - ${topic.attributes().text}"
32  topic.children.each { child ->
33  child.image.each { img ->
34  println "child image name: ${img.attributes().name}"
35  }
36  }

Note the Groovy syntax used to navigate the parsed tree. Groovy calls this notation GPath, and it is very similar to XPath.
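
To see the path-style navigation in isolation, here's a small self-contained sketch. The sample XML is a made-up miniature that I've assumed from the element and attribute names used in the code above. It is not a real SimpleMind file, and the import assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlParser

// Made-up miniature document (assumed structure, not a real mind map).
def sample = '''<simplemind-mindmaps>
  <mindmap>
    <meta><title text="Image Test"/></meta>
    <topics>
      <topic text="Central theme">
        <children><image name="example.png"/></children>
      </topic>
      <topic text="No image"/>
    </topics>
  </mindmap>
</simplemind-mindmaps>'''

def root = new XmlParser(false, false, true).parseText(sample)

// Each dot steps down one element level; .@name reads an attribute.
def title = root.mindmap.meta.title[0].@text
def topicTexts = root.mindmap.topics.topic.collect { it.@text }
def imageNames = root.mindmap.topics.topic.children.image.collect { it.@name }

println title
println topicTexts
println imageNames
```

The roughly equivalent XPath for the last expression would be /simplemind-mindmaps/mindmap/topics/topic/children/image/@name.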


© 2017 Terry Ebdon

Find me coding on BitBucket, networking on LinkedIn and hanging out on Twitter.