zope.structuredtext
Documentation¶
Using Structured Text¶
The goal of StructuredText is to make it possible to express structured text using a relatively simple plain text format. Simple structures, like bullets or headings, are indicated through conventions that are natural, for some definition of “natural”. Hierarchical structures are indicated through indentation. The use of indentation to express hierarchical structure is inspired by the Python programming language.
Use of StructuredText consists of one to three logical steps. In the first step, a text string is converted to a network of objects using the structurize() facility, as in the following example:
from zope.structuredtext.stng import structurize

# Read the source text, then build the paragraph hierarchy.
with open("mydocument.txt") as f:
    raw = f.read()
st = structurize(raw)
The output of structurize() is simply a StructuredTextDocument object containing StructuredTextParagraph objects arranged in a hierarchy. Paragraphs are delimited by strings of two or more whitespace characters beginning and ending with newline characters. Hierarchy is indicated by indentation. The indentation of a paragraph is the minimum number of leading spaces in a line containing non-whitespace characters after converting tab characters to spaces (assuming a tab stop every eight characters).
StructuredTextNode objects support the read-only subset of the Document Object Model (DOM) API. It should be possible to process StructuredTextNode hierarchies using XML tools such as XSLT.
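The paragraph-delimiting and indentation rules described above can be sketched in plain Python. This is a minimal illustration using only the standard library, not the code that structurize() actually runs; the names PARAGRAPH_DELIMITER, split_paragraphs, and paragraph_indent are hypothetical, and the delimiter pattern is an assumption based on the description.

```python
import re

# Assumed delimiter: a newline, optional whitespace, then another newline.
PARAGRAPH_DELIMITER = re.compile(r"\n\s*\n")


def split_paragraphs(text):
    """Split text into paragraphs, dropping empty pieces."""
    pieces = PARAGRAPH_DELIMITER.split(text.expandtabs(8))
    return [piece for piece in pieces if piece.strip()]


def paragraph_indent(paragraph):
    """Return the smallest indentation among the non-blank lines."""
    lines = paragraph.expandtabs(8).splitlines()
    indents = [len(line) - len(line.lstrip(" ")) for line in lines if line.strip()]
    return min(indents, default=0)


text = "Title\n\n  Body line one\n    continued\n\n\tTabbed paragraph"
for paragraph in split_paragraphs(text):
    # Print each paragraph with its computed indentation.
    print(paragraph_indent(paragraph), repr(paragraph))
```

Because the delimiter must end with a newline, the leading spaces of the following paragraph are left in place, which is what lets the indentation be measured afterwards.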
The second step in using StructuredText is to apply additional structuring rules based on text content. A variety of different text rules can be used. Typically, these are used to implement a structured text language for producing documents, but any sort of structured text language could be implemented in the second step. For example, it is possible to use StructuredText to implement structured text formats for representing structured data. The second step, which could consist of multiple processing steps, is performed by processing, or “coloring”, the hierarchy of generic StructuredTextParagraph objects into a network of more specialized objects. Typically, the objects produced should also implement the DOM API to allow processing with XML tools.
A document processor is provided to convert a StructuredTextDocument object containing only StructuredTextParagraph objects into a StructuredTextDocument object containing a richer collection of objects such as bullets, headings, emphasis, and so on, using hints in the text. Hints are selected based on conventions of the sort typically seen in electronic mail or newsgroup postings. It should be noted, however, that these conventions are somewhat culturally dependent. Fortunately, the document processor is easily customized to implement alternative rules. Here’s an example of using the document processor to convert the output of the previous example:
from zope.structuredtext.document import Document
doc = Document()(st)
The final step is to process the colored networks produced from the second step to produce additional outputs. The final step could be performed by Python programs, or by XML tools. A Python outputter is provided for the document processor output that produces Hypertext Markup Language (HTML) text:
from zope.structuredtext.html import HTML
html = HTML()(doc)
Customizing the document processor¶
The document processor is driven by two tables. The first table, named paragraph_types, is a sequence of callable objects or method names for coloring paragraphs. If a table entry is a string, then it is the name of a method of the document processor to be used. For each input paragraph, the objects in the table are called until one returns a value (not None). The value returned replaces the original input paragraph in the output. If none of the objects in the paragraph types table return a value, then a copy of the original paragraph is used. The new object returned by calling a paragraph type should implement the ReadOnlyDOM, StructuredTextColorizable, and StructuredTextSubparagraphContainer interfaces. See the zope.structuredtext.document source file for examples.
A paragraph type may return a list or tuple of replacement paragraphs, thus allowing a paragraph to be split into multiple paragraphs.
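The lookup rules for the paragraph types table can be sketched as follows. This is a simplified illustration under stated assumptions, not the document processor's own code: plain strings and dictionaries stand in for paragraph objects, and color_paragraph, MiniProcessor, and split_on_semicolons are hypothetical names.

```python
def color_paragraph(processor, paragraph, paragraph_types):
    """Return the list of objects that replaces `paragraph`."""
    for entry in paragraph_types:
        # A string entry names a method on the processor; other entries are called directly.
        rule = getattr(processor, entry) if isinstance(entry, str) else entry
        result = rule(paragraph)
        if result is None:
            continue  # this rule did not apply, try the next table entry
        # A list or tuple result splits one paragraph into several.
        return list(result) if isinstance(result, (list, tuple)) else [result]
    # No rule matched: keep the paragraph (strings are immutable, so no copy is needed here).
    return [paragraph]


class MiniProcessor:
    """Hypothetical processor with one method-based rule."""

    def bullet(self, paragraph):
        if paragraph.startswith("- "):
            return {"type": "bullet", "text": paragraph[2:]}
        return None


def split_on_semicolons(paragraph):
    """Hypothetical rule that splits one paragraph into several."""
    if ";" in paragraph:
        return [part.strip() for part in paragraph.split(";")]
    return None


table = ["bullet", split_on_semicolons]
processor = MiniProcessor()
for source in ["- first item", "one; two", "plain text"]:
    print(color_paragraph(processor, source, table))
```

The order of the table matters in this sketch: the first entry that returns something other than None wins, and later entries are never consulted for that paragraph.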
The second table, text_types, is a sequence of callable objects or method names for coloring text. The callable objects in this table are used in sequence to transform the input text into new text or objects. The callable objects are passed a string and return nothing (None) or a three-element tuple consisting of:
- a replacement object,
- a starting position, and
- an ending position
The text from the starting position to the ending position is (logically) replaced with the replacement object. The replacement object is typically an object that implements the ReadOnlyDOM and StructuredTextColorizable interfaces. The replacement object can also be a string or a list of strings or objects. Replacement is done from beginning to end, and text after the replacement ending position will be passed to the character type objects for processing.
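The three-element return convention for text types can be illustrated with a small standalone sketch. It is not the library's implementation: the emphasis pattern, the ("em", text) tuple standing in for a markup object, and the names emphasis and apply_text_type are all assumptions made for illustration, and only a single text type is applied rather than a whole table.

```python
import re

# Assumed convention for this sketch: *text* marks emphasis.
EMPHASIS = re.compile(r"\*([^*\s][^*]*)\*")


def emphasis(text):
    """Text type: return (replacement, start, end), or None if nothing matches."""
    match = EMPHASIS.search(text)
    if match is None:
        return None
    return ("em", match.group(1)), match.start(), match.end()


def apply_text_type(text, text_type):
    """Apply one text type from the beginning to the end of `text`."""
    pieces = []
    while text:
        found = text_type(text)
        if found is None:
            break
        replacement, start, end = found
        if start:
            pieces.append(text[:start])  # untouched text before the match
        pieces.append(replacement)
        text = text[end:]  # only the remainder is processed further
    if text:
        pieces.append(text)
    return pieces


print(apply_text_type("plain *strong claim* and *more*", emphasis))
```

The result is a list that mixes plain substrings with replacement objects, which mirrors the "list of strings or objects" mentioned above.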
API¶
zope.structuredtext.document¶
Structured text document parser
- class zope.structuredtext.document.Document
  Bases: object
  Calling an instance (for example, x()) requires a structured text structure. Doc will then parse each paragraph in the structure and will find the special structures within each paragraph. Each special structure will be stored as an instance. Special structures within another special structure are stored within the ‘top’ structure. For example, ‘-underline this-’ would be turned into an underline instance. ‘-underline this’ would be stored as an underline instance with a strong instance stored in its string.
  - parse(raw_string, text_type, type=<type 'type'>)
    Parse accepts a raw_string, an expr to test the raw_string, and the raw_string’s subparagraphs.
    Parse will continue to search through raw_string until all instances of expr in raw_string are found.
    If no instances of expr are found, raw_string is returned. Otherwise, a list of substrings and instances is returned.
- class zope.structuredtext.document.DocumentWithImages
  Bases: zope.structuredtext.document.Document
  Document with images
zope.structuredtext.stletters¶
Structured text character classes
zope.structuredtext.stng¶
Core document model.
- zope.structuredtext.stng.indention(str, front=<built-in method match of _sre.SRE_Pattern object>)
  Find the number of leading spaces. If none, return 0.
- zope.structuredtext.stng.insert(struct, top, level)
  Find what will be the parent paragraph of a sentence and return that paragraph’s sub-paragraphs. The new paragraph will be appended to those sub-paragraphs.
- zope.structuredtext.stng.display(struct)
  Runs through the structure and prints out the paragraphs. If the insertion works correctly, display’s results should mimic the original paragraphs.
- zope.structuredtext.stng.display2(struct)
  Runs through the structure and prints out the paragraphs. If the insertion works correctly, display’s results should mimic the original paragraphs.
- zope.structuredtext.stng.findlevel(levels, indent)
  Remove all level information of levels with a greater level of indentation. Then return the level at which this paragraph should be inserted.
- zope.structuredtext.stng.structurize(paragraphs, delimiter=<_sre.SRE_Pattern object>)
  Accepts paragraphs, which is a list of lines to be parsed. structurize creates a structure which mimics the structure of the paragraphs. Structure => [paragraph,[sub-paragraphs]]
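The [paragraph, [sub-paragraphs]] shape mentioned for structurize can be illustrated with a small stack-based sketch. It only shows how indentation levels can be turned into nesting and is not the algorithm used by insert and findlevel; build_hierarchy and its (indent, text) input format are hypothetical.

```python
def build_hierarchy(paragraphs):
    """Nest (indent, text) pairs into [text, [sub-paragraphs]] lists."""
    root = []
    # Each stack entry is (indent, children list) for a currently open ancestor.
    stack = [(-1, root)]
    for indent, text in paragraphs:
        # Close every level indented at least as far as this paragraph.
        while stack[-1][0] >= indent:
            stack.pop()
        node = [text, []]
        stack[-1][1].append(node)  # attach to the nearest less-indented ancestor
        stack.append((indent, node[1]))
    return root


paragraphs = [(0, "Title"), (2, "Body"), (2, "More"), (0, "Next")]
print(build_hierarchy(paragraphs))
```

A paragraph becomes a child of the closest preceding paragraph with a smaller indentation, so "Body" and "More" both end up under "Title" while "Next" starts a new top-level entry.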
- class zope.structuredtext.stng.StructuredTextDocument(subs=None, **kw)
  Bases: zope.structuredtext.stng.StructuredTextParagraph
  A StructuredTextDocument holds StructuredTextParagraphs as its subparagraphs.
- class zope.structuredtext.stng.StructuredTextExample(subs, **kw)
  Bases: zope.structuredtext.stng.StructuredTextParagraph
  Represents a section of document with literal text, as for examples
- class zope.structuredtext.stng.StructuredTextBullet(src, subs=None, **kw)
  Bases: zope.structuredtext.stng.StructuredTextParagraph
  Represents a section of a document with a title and a body
- class zope.structuredtext.stng.StructuredTextNumbered(src, subs=None, **kw)
  Bases: zope.structuredtext.stng.StructuredTextParagraph
  Represents a section of a document with a title and a body
- class zope.structuredtext.stng.StructuredTextDescriptionTitle(src, subs=None, **kw)
  Bases: zope.structuredtext.stng.StructuredTextParagraph
  Represents a section of a document with a title and a body
- class zope.structuredtext.stng.StructuredTextDescriptionBody(src, subs=None, **kw)
  Bases: zope.structuredtext.stng.StructuredTextParagraph
  Represents a section of a document with a title and a body
- class zope.structuredtext.stng.StructuredTextDescription(title, src, subs, **kw)
  Bases: zope.structuredtext.stng.StructuredTextParagraph
  Represents a section of a document with a title and a body
- class zope.structuredtext.stng.StructuredTextSectionTitle(src, subs=None, **kw)
  Bases: zope.structuredtext.stng.StructuredTextParagraph
  Represents a section of a document with a title and a body
- class zope.structuredtext.stng.StructuredTextSection(src, subs=None, **kw)
  Bases: zope.structuredtext.stng.StructuredTextParagraph
  Represents a section of a document with a title and a body
- class zope.structuredtext.stng.StructuredTextTable(rows, src, subs, **kw)
  Bases: zope.structuredtext.stng.StructuredTextParagraph
  rows is a list of lists containing tuples, which represent the columns/cells in each row. EX rows = [[(‘row 1:column1’,1)],[(‘row2:column1’,1)]]
- class zope.structuredtext.stng.StructuredTextRow(row, kw)
  Bases: zope.structuredtext.stng.StructuredTextParagraph
  row is a list of tuples, where each tuple is the raw text for a cell/column and the span of that cell/column. EX [(‘this is column one’,1), (‘this is column two’,1)]
- class zope.structuredtext.stng.StructuredTextColumn(text, span, align, valign, typ, kw)
  Bases: zope.structuredtext.stng.StructuredTextParagraph
  StructuredTextColumn is a cell/column in a table. A cell can hold multiple paragraphs. The cell is either classified as a StructuredTextTableHeader or StructuredTextTableData.
- class zope.structuredtext.stng.StructuredTextImage(value, **kw)
  Bases: zope.structuredtext.stng.StructuredTextMarkup
  A simple embedded image
zope.structuredtext.stdom¶
DOM implementation in StructuredText: read-only methods
- class zope.structuredtext.stdom.ParentNode
  Bases: object
  A node that can have children, or, more precisely, that implements the child access methods of the DOM.
  - getChildNodes(type=<type 'type'>, sts=(<type 'unicode'>, <type 'str'>))
    Returns a NodeList that contains all children of this node. If there are no children, this is an empty NodeList.
- class zope.structuredtext.stdom.NodeWrapper(aq_self, aq_parent)
  Bases: zope.structuredtext.stdom.ParentNode
  This is an acquisition-like wrapper that provides parent access for DOM sans circular references!
  - getParentNode()
    The parent of this node. All nodes except Document, DocumentFragment, and Attr may have a parent.
  - getPreviousSibling()
    The node immediately preceding this node. If there is no such node, this returns None.
- class zope.structuredtext.stdom.Node
  Bases: zope.structuredtext.stdom.ParentNode
  Node Interface
  - getParentNode()
    The parent of this node. All nodes except Document, DocumentFragment, and Attr may have a parent.
  - getPreviousSibling()
    The node immediately preceding this node. If there is no such node, this returns None.
  - getNextSibling()
    The node immediately following this node. If there is no such node, this returns None.
- class zope.structuredtext.stdom.Element
  Bases: zope.structuredtext.stdom.Node
  Element interface
  - getNodeName()
    The name of the element
  - getParentNode()
    The parent of this node. All nodes except Document, DocumentFragment, and Attr may have a parent.
  - getAttributeNode(name)
    Retrieves an Attr node by name or None if there is no such attribute.
  - getAttributes()
    Returns a NamedNodeMap containing the attributes of this node (if it is an element) or None otherwise.
  - getElementsByTagName(tagname)
    Returns a NodeList of all the Elements with a given tag name in the order in which they would be encountered in a preorder traversal of the Document tree.
    Parameter: tagname, the name of the tag to match (* = all tags).
    Return value: a new NodeList object containing all the matched Elements.
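The preorder traversal described for getElementsByTagName can be sketched on a tiny stand-in tree. SimpleElement and elements_by_tag_name are hypothetical names used only for illustration; they are not part of zope.structuredtext.stdom, and a plain list is returned instead of a NodeList.

```python
class SimpleElement:
    """Hypothetical stand-in for an element: a name plus ordered children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def elements_by_tag_name(node, tagname):
    """Collect descendants of `node` named `tagname` in preorder ('*' matches all)."""
    found = []
    for child in node.children:
        # Preorder: record the child itself before descending into it.
        if tagname == "*" or child.name == tagname:
            found.append(child)
        found.extend(elements_by_tag_name(child, tagname))
    return found


doc = SimpleElement("doc", [
    SimpleElement("section", [
        SimpleElement("p"),
        SimpleElement("section", [SimpleElement("p")]),
    ]),
    SimpleElement("p"),
])
print([element.name for element in elements_by_tag_name(doc, "*")])
```

Visiting each element before its children is what makes the result order match the order in which the elements appear in the document.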
- class zope.structuredtext.stdom.NodeList(list=None)
  Bases: object
  NodeList interface - Provides the abstraction of an ordered collection of nodes.
  Python extensions: can use sequence-style ‘len’, ‘getitem’, and ‘for..in’ constructs.
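The idea of a read-only node collection that also supports Python's sequence protocols can be sketched as follows. SimpleNodeList is a hypothetical class written for illustration, not the stdom NodeList; the item method follows the DOM convention of returning None for an out-of-range index, which is an assumption about how such a wrapper would typically behave.

```python
class SimpleNodeList:
    """Hypothetical read-only, ordered collection of nodes."""

    def __init__(self, nodes=None):
        self._nodes = list(nodes or [])

    def item(self, index):
        """DOM-style access: return None instead of raising when out of range."""
        if 0 <= index < len(self._nodes):
            return self._nodes[index]
        return None

    def __len__(self):
        return len(self._nodes)

    def __getitem__(self, index):
        # Raising IndexError past the end is what lets 'for..in' terminate.
        return self._nodes[index]


nodes = SimpleNodeList(["heading", "paragraph", "bullet"])
print(len(nodes), nodes[0], [node for node in nodes])
```

Defining only `__len__` and `__getitem__` is enough for len(), indexing, and iteration, so no separate iterator method is needed in this sketch.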
- class zope.structuredtext.stdom.NamedNodeMap(data=None)
  Bases: object
  NamedNodeMap interface - Is used to represent collections of nodes that can be accessed by name. NamedNodeMaps are not maintained in any particular order.
  Python extensions: can use sequence-style ‘len’, ‘getitem’, and ‘for..in’ constructs, and mapping-style ‘getitem’.
- class zope.structuredtext.stdom.Attr(name, value, specified=1)
  Bases: zope.structuredtext.stdom.Node
  Attr interface - The Attr interface represents an attribute in an Element object. Attr objects inherit the Node Interface
  - getName()
    The name of this node, depending on its type
zope.structuredtext.html¶
HTML renderer for STX documents.
zope.structuredtext.docbook¶
Render STX document as docbook.
- class zope.structuredtext.docbook.DocBook
  Bases: object
  Structured text document renderer for Docbook.
zope.structuredtext¶
Zope structured text markup
Consider the following example:
>>> from zope.structuredtext.stng import structurize
>>> from zope.structuredtext.document import DocumentWithImages
>>> from zope.structuredtext.html import HTMLWithImages
>>> from zope.structuredtext.docbook import DocBook
>>> from zope.structuredtext.docbook import DocBookChapterWithFigures
>>> from zope.structuredtext.docbook import DocBookArticle
We first need to structurize the string and make a full-blown document out of it:
>>> structured_string = '''
... Title Here
...
... Body text here.'''
>>> struct = structurize(structured_string)
>>> doc = DocumentWithImages()(struct)
Now feed it to some output generator, in this case HTML or DocBook:
>>> HTMLWithImages()(doc, level=1)
'<html>...'
>>> DocBook()(doc, level=1)
'<!DOCTYPE book ...<book>...'
>>> DocBookArticle()(doc, level=1)
'<!DOCTYPE article ...<article>...'
>>> DocBookChapterWithFigures()(doc, level=1)
'<chapter>...'
For HTML, there is a shortcut:
>>> from zope.structuredtext import stx2html
>>> stx2html(structured_string)
'<html>...'
If we have references in the text we can use a different function:
>>> from zope.structuredtext import stx2htmlWithReferences
>>> stx2htmlWithReferences(structured_string)
'<html>...'
Changes¶
4.3 (2018-10-09)¶
- Add support for Python 3.7.
4.2.0 (2017-09-05)¶
- Add support for Python 3.5 and 3.6.
- Drop support for Python 2.6 and 3.3.
- Add support for PyPy and PyPy3.
- Support several new elements (inner and named links, underlines, etc) in the docbook writer.
- Fix the XML output of DocBookBook.
- 100% test coverage, maintained by CI and tox.
- Unused internal code in the stdom module was removed. See issue 3.
4.1.0 (2014-12-29)¶
- Drop dependency on six.
- Add support for Python 3.4.
- Add support for testing on Travis.
4.0.0 (2013-02-25)¶
- Add support for Python 3.3.
- Drop support for Python 2.4 and 2.5.
3.5.1 (2010-12-03)¶
- Remove antique copyright assertions in regression texts, in conformance with repository policy.
3.5.0 (2010-04-30)¶
- Update docs to conform to ZTK / Sphinx usage.
- LP #120376: Output valid html for non-ASCII characters.
3.4.0 (2007/09/01)¶
- Public release for completeness of Zope 3.4.
3.2.0 (2006/01/05)¶
- Corresponds to the version of the zope.structuredtext package shipped as part of the Zope 3.2.0 release.
- Only coding style / documentation changes.
3.0.0 (2004/11/07)¶
- Corresponds to the version of the zope.structuredtext package shipped as part of the Zope X3.0.0 release.