Plump
Practically Lenient and Unimpressive Markup Parser

v0.1.11

What is Plump?

Plump is a parser for HTML/XML-like documents that focuses on being lenient towards invalid markup. It can handle invalid attributes, bad closing tag order, unencoded entities, unknown tag types, self-closing tags and so on. It parses documents into a class representation and offers a small set of DOM functions to manipulate it. You can, however, change it to parse into your own classes instead.

How To

Load Plump through Quicklisp or ASDF:

(ql:quickload :plump)

Using the PARSE function, Plump transforms a string, pathname or file into a document:

(plump:parse "<foo><bar this is=\"a thing\">baz</bar><span id=\"test\">oh my")
#<PLUMP-DOM:ROOT {1004F3FCC3}>

This returns a root node. If you want the parsed document appended to a root node you created yourself (or to any other node that accepts children), you can pass that node to the PARSE function. To turn the document back into readable markup, call SERIALIZE:

(plump:serialize *)
<foo><bar this="" is="a thing">baz</bar><span id="test">oh my</span></foo>
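As a minimal sketch of parsing into a node you made yourself, the helper below creates one root with MAKE-ROOT and parses several fragments into it. It assumes the node is passed through PARSE's :ROOT keyword argument and that SERIALIZE returns a string when given NIL as its stream argument, as in current Plump releases; PARSE-FRAGMENTS itself is a hypothetical name, not part of Plump.

```lisp
(ql:quickload :plump)

;; Hypothetical helper: parse every fragment into one shared root node.
(defun parse-fragments (&rest fragments)
  (let ((root (plump:make-root)))
    (dolist (fragment fragments root)
      ;; Assumption: :ROOT makes PARSE append to ROOT instead of a fresh root.
      (plump:parse fragment :root root))))

;; Passing NIL as the stream makes SERIALIZE return a string instead of printing.
(plump:serialize (parse-fragments "<a>x</a>" "<b>y</b>") nil)
```

Both fragments end up as siblings under the same root, so serializing it should give the two elements one after the other.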

Using the DOM functions, you can traverse the document and modify it:

(plump:remove-child (plump:get-element-by-id ** "test"))
#<PLUMP-DOM:ELEMENT span {100517D8F3}>
(plump:serialize ***)
<foo><bar this="" is="a thing">baz</bar></foo>
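Traversal works the same way for reading data out of a document. The sketch below, under the assumption that GET-ELEMENTS-BY-TAG-NAME, ATTRIBUTE and TEXT behave as in current Plump releases, collects every `li` element together with its ID attribute and text content. ITEM-IDS-AND-TEXTS is a hypothetical helper, not a Plump function.

```lisp
(ql:quickload :plump)

;; Hypothetical helper: pair each <li>'s ID attribute with its text content.
(defun item-ids-and-texts (html)
  (let ((root (plump:parse html)))
    (mapcar (lambda (item)
              (cons (plump:attribute item "id")   ; attribute value as a string
                    (plump:text item)))           ; concatenated child text
            (plump:get-elements-by-tag-name root "li"))))

(item-ids-and-texts "<ul><li id=\"a\">one</li><li id=\"b\">two</li></ul>")
```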

By default, Plump includes a few special tag dispatchers to catch HTML oddities like the doctype, self-closing tags and comments. The self-closing tags in particular can cause problems in XML documents. To parse without any HTML "tricks", bind *TAG-DISPATCHERS* to NIL before parsing:

(let ((plump:*tag-dispatchers* ())) (plump:parse "foo"))

Extending Plump

If you want to handle a certain tag in a special way, you can write your own tag dispatcher. By default, comments, the doctype and self-closing tags are handled this way.

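(plump:define-tag-dispatcher my-dispatcher (name)
    ;; Test form: use this dispatcher when the tag name is "my-tag".
    (string-equal name "my-tag")
  ;; Read the tag's attributes.
  (let ((attrs (plump:read-attributes)))
    ;; The next character is "/" for a self-closing tag ("/>") or ">".
    ;; If it was "/", also consume the closing ">".
    (when (char= (plump:consume) #\/)
      (plump:consume))
    ;; MY-TAG is assumed to be a node class you have defined yourself.
    (make-instance 'my-tag :parent plump:*root* :attributes attrs)))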

During parsing, all nodes are created through the functions MAKE-ROOT, MAKE-ELEMENT, MAKE-TEXT-NODE and MAKE-COMMENT. By overriding these functions, you can have the parser build your own DOM instead.
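To illustrate what these constructors do, here is a minimal sketch that builds a small tree by hand with them, the same calls the parser makes while reading a document. It assumes, as in current Plump releases, that MAKE-ELEMENT and MAKE-TEXT-NODE automatically append the new node to the given parent and that ATTRIBUTE is SETF-able. BUILD-GREETING is a hypothetical name.

```lisp
(ql:quickload :plump)

;; Hypothetical helper: build <p id="greeting">hi</p> without parsing.
(defun build-greeting ()
  (let* ((root (plump:make-root))
         ;; Assumption: MAKE-ELEMENT appends the new element to ROOT.
         (p (plump:make-element root "p")))
    ;; Set an attribute on the element.
    (setf (plump:attribute p "id") "greeting")
    ;; Assumption: MAKE-TEXT-NODE appends the text node to P.
    (plump:make-text-node p "hi")
    root))

(plump:serialize (build-greeting) nil)
```

An overriding version of these functions would create your own node classes at exactly these points instead.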

Other Guff

Plump is licensed under the Artistic License 2.0 and is ©2014 TymoonNET/NexT, Nicolas Hafner.
This library can be obtained via git from https://github.com/Shinmera/plump.git. For questions, patches or suggestions, please contact me via email or open a GitHub issue.
