Copyright © 2007 Dave Bayer. Subject to a BSD-style license.
This module is part of the Annote project.
module Split (Split(..),Delims,split,unsplit,joinDoc,joinCode,joinDebug) where
Split
divides an input file into code, documentation, and external documentation.
Regex
provides regular expression matching.
It is a wrapper around Text.Regex.
import Regex (mkRegex,isMatch)
Split is the type of a line of text, marked as one of
Code, Delim, Doc, Blank, Ext, or Shell.
A Code line represents source code in the target language.
A Delim line marks a transition to or from documentation.
A Doc line is documentation to be processed by Annote.
A Blank line contains only whitespace.
An Ext line is intended for an alternative documentation system, and is ignored by Annote.
A Shell line is the result of an external computation.
A Code, Delim, Doc, Blank, or Ext line is expected to be a single
line of text, not terminated with a newline.
A Shell line can evaluate to many
lines of text, and is expected to be terminated with a newline.
data Split
   = Code String
   | Delim String
   | Doc String
   | Blank String
   | Ext String
   | Shell (IO String)
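The difference between Shell and the other constructors can be illustrated with a minimal sketch. It uses a local copy of the type, a hypothetical helper `content`, and an `IORef` counter that stands in for an external computation; none of these names are part of the module. Building a Shell value runs nothing; the action runs only when its text is demanded.

```haskell
import Data.IORef (newIORef, modifyIORef, readIORef)

-- Local copy of the Split type, so that the sketch stands alone.
data Split
   = Code String
   | Delim String
   | Doc String
   | Blank String
   | Ext String
   | Shell (IO String)

-- Hypothetical helper: the text carried by a line.
-- Only Shell has to perform IO to obtain it.
content :: Split -> IO String
content x = case x of
   Code s  -> return s
   Delim s -> return s
   Doc s   -> return s
   Blank s -> return s
   Ext s   -> return s
   Shell y -> y

main :: IO ()
main = do
   counter <- newIORef (0 :: Int)
   -- Stand-in for an external computation: it records that it ran,
   -- and produces newline-terminated text.
   let line = Shell (modifyIORef counter (+1) >> return "generated\n")
   before <- readIORef counter   -- the action has not run yet
   text   <- content line        -- the action runs here
   after  <- readIORef counter
   print (before, text, after)
```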
Delims is the type of a tuple describing the start and end delimiters
for Doc and Ext blocks. These are the strings corresponding to the
option keys
(DocStart, DocEnd, ExtStart, ExtEnd)
type Delims = (String,String,String,String)
split is a finite state machine which transforms input text into
a list of Split lines.
split :: Bool → Delims → String → [Split]
split isCode (start,end,xstart,xend) text = startFilt $ lines text
   where
Note that the indented functions below are contained within the where clause
of split, so its arguments are in scope.
startFilt is the initial line-oriented filter to apply to text.
If isCode is true, then we are processing annotated source code,
and text starts off as code. Otherwise, we are processing markup for a
supporting web page, and text starts off as documentation. One can still
include code fragments, by surrounding them with inside-out delimiters.
   startFilt = if isCode then inCode else inDoc
startDoc, endDoc, startExt, endExt are predicates matching
the corresponding delimiter lines.
   startDoc, endDoc, startExt, endExt :: String → Bool
   startDoc x = isMatch x $ mkRegex start
   endDoc x = isMatch x $ mkRegex end
   endExt x = isMatch x $ mkRegex xend
   startExt x = if null xstart
      then False
      else isMatch x $ mkRegex xstart
isBlank is a predicate matching blank lines.
   isBlank :: String → Bool
   isBlank x = all (`elem` " \t") x
inCode, inDoc, inExt
are line-oriented filters that call one another; they can be thought of
as the states of the finite state machine.
   inCode, inDoc, inExt :: [String] → [Split]
   inCode [] = []
   inCode (x:xt)
      | startDoc x = Delim x : inDoc xt
      | startExt x = Ext x : inExt xt
      | isBlank x = Blank x : inCode xt
      | otherwise = Code x : inCode xt
   inDoc [] = []
   inDoc (x:xt)
      | endDoc x = Delim x : inCode xt
      | isBlank x = Blank x : inDoc xt
      | otherwise = Doc x : inDoc xt
   inExt [] = []
   inExt (x:xt)
      | endExt x = Ext x : inCode xt
      | otherwise = Ext x : inExt xt
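The three states can be traced on a small input with the following sketch. It is an approximation under stated assumptions: delimiters are matched as literal line prefixes with `isPrefixOf` rather than through `mkRegex` and `isMatch`, the name `splitSketch` and the delimiters in `demoDelims` are hypothetical, and the type is a local copy without Shell, which split never produces.

```haskell
import Data.List (isPrefixOf)

-- Local copy of the constructors that split can produce.
data Split
   = Code String | Delim String | Doc String | Blank String | Ext String
   deriving (Eq, Show)

type Delims = (String, String, String, String)

-- Assumption: a delimiter matches when it is a literal prefix of the line.
splitSketch :: Bool -> Delims -> String -> [Split]
splitSketch isCode (start, end, xstart, xend) text = startFilt (lines text)
   where
   startFilt = if isCode then inCode else inDoc

   startDoc x = start `isPrefixOf` x
   endDoc x   = end `isPrefixOf` x
   startExt x = not (null xstart) && xstart `isPrefixOf` x
   endExt x   = xend `isPrefixOf` x
   isBlank x  = all (`elem` " \t") x

   inCode [] = []
   inCode (x:xt)
      | startDoc x = Delim x : inDoc xt
      | startExt x = Ext x : inExt xt
      | isBlank x  = Blank x : inCode xt
      | otherwise  = Code x : inCode xt

   inDoc [] = []
   inDoc (x:xt)
      | endDoc x  = Delim x : inCode xt
      | isBlank x = Blank x : inDoc xt
      | otherwise = Doc x : inDoc xt

   inExt [] = []
   inExt (x:xt)
      | endExt x  = Ext x : inCode xt
      | otherwise = Ext x : inExt xt

-- Hypothetical delimiters, chosen only for this example.
demoDelims :: Delims
demoDelims = ("{-", "-}", "=begin", "=end")

demo :: [Split]
demo = splitSketch True demoDelims "x = 1\n{-\nSets x.\n-}\n\ny = 2\n"
-- [Code "x = 1",Delim "{-",Doc "Sets x.",Delim "-}",Blank "",Code "y = 2"]

main :: IO ()
main = mapM_ print demo
```

Because startDoc is tested before startExt, a documentation delimiter that also matches the start of an Ext block takes precedence, as in the guards above.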
unsplit is the inverse to split; it transforms a list of Split
lines into output text of type IO String.
The IO monad is necessary because of Shell lines that involve
external computations.
unsplit :: [Split] → IO String
unsplit xs =
f turns a Split into an IO ShowS; the IO monad is needed
because of Shell lines that involve external computations.
Recall the type
type ShowS = String -> String
found in
Prelude to facilitate constant-time concatenation using function
composition. We use ShowS because a naive implementation of unsplit
causes Annote to spend the majority of its time concatenating.
   let f :: Split → IO ShowS
       f x = let ln s = s ++ "\n"
                 io s = return (ln s ++)
             in case x of
                Code s → io s
                Delim s → io s
                Doc s → io s
                Blank s → io s
                Ext s → io s
                Shell y → do { s ← y; io s }
g is the IO ShowS analog to string concatenation:
       g :: IO ShowS → IO ShowS → IO ShowS
       g x y = do { s ← x; t ← y; return (s . t) }
We use f to convert each Split into an IO ShowS, then use
g to concatenate these into a single IO ShowS. We evaluate
s on the empty string to return an IO String.
   in do s ← foldr g (return id) $ map f xs
         return (s [])
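The ShowS technique can be tried in isolation with a minimal sketch. It assumes a local type reduced to two constructors and uses the hypothetical names `toShowS`, `andThen`, and `unsplitSketch` in place of f, g, and unsplit. The fold starts from `return id`, so that an empty list yields the empty string.

```haskell
-- Local stand-in for the module's Split type, reduced to two constructors.
data Split = Doc String | Shell (IO String)

-- Convert one line into a function that prepends the line and a newline.
toShowS :: Split -> IO ShowS
toShowS x = case x of
      Doc s   -> io s
      Shell y -> y >>= io
   where
   io s = return ((s ++ "\n") ++)

-- Run two actions in order and compose their results.
andThen :: IO ShowS -> IO ShowS -> IO ShowS
andThen x y = do { s <- x; t <- y; return (s . t) }

-- Compose all lines, then apply the composite to the empty string.
unsplitSketch :: [Split] -> IO String
unsplitSketch xs = do
   s <- foldr andThen (return id) (map toShowS xs)
   return (s "")

main :: IO ()
main = unsplitSketch [Doc "The date is:", Shell (return "Monday")] >>= putStr
```

Each line is appended to the front of whatever follows it, so no intermediate string is copied more than once, whatever the order in which the compositions are built.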
joinDoc recombines a split list as documentation output,
combining blank lines, leaving out Delim and Ext lines,
and delimiting code using startCode and endCode.
The IO monad is necessary because of Shell lines that involve
external computations.
joinDoc :: [Split] → IO String
joinDoc text = (unsplit . preCode) text
   where
Note that the indented functions below are contained within the where
clause of joinDoc, so its arguments are in scope.
startCode, endCode, and blank are strings used
to delimit code, or replace blank lines.
The Split constructors will be stripped by unsplit, so the ambiguity as
to whether blank should be constructed using Doc or Code turns out
not to matter.
   startCode, endCode, blank :: Split
   startCode = Doc "\n<pre class=\"code\">"
   endCode = Doc "</pre>\n"
   blank = Doc ""
preCode, preDoc, inCode, inDoc, skipCode, skipDoc
can again be thought of as the states of a finite state machine.
We skip when reading blanks, writing blank lines only as needed.
Note the invariant that Doc and Shell constructors get identical
treatment in each function.
   preCode, preDoc, inCode, inDoc, skipCode, skipDoc :: [Split] → [Split]
preCode, preDoc:
We are potentially reading code or documentation,
but we have not yet read a non-blank line.
   preCode [] = [blank]
   preCode (x:xt) = case x of
      Code _ → startCode : x : inCode xt
      Delim _ → preDoc xt
      Doc _ → x : inDoc xt
      Shell _ → x : inDoc xt
      _ → preCode xt
   preDoc [] = [blank]
   preDoc (x:xt) = case x of
      Code _ → startCode : x : inCode xt
      Delim _ → preCode xt
      Doc _ → x : inDoc xt
      Shell _ → x : inDoc xt
      _ → preDoc xt
inCode, inDoc: We are reading code or documentation.
The most recently read lines were non-blank.
   inCode [] = [endCode]
   inCode (x:xt) = case x of
      Code _ → x : inCode xt
      Delim _ → endCode : preDoc xt
      Doc _ → endCode : x : inDoc xt
      Shell _ → endCode : x : inDoc xt
      Blank _ → skipCode xt
      Ext _ → inCode xt
   inDoc [] = [blank]
   inDoc (x:xt) = case x of
      Code _ → startCode : x : inCode xt
      Delim _ → preCode xt
      Doc _ → x : inDoc xt
      Shell _ → x : inDoc xt
      Blank _ → skipDoc xt
      Ext _ → inDoc xt
skipCode, skipDoc: We are reading code or documentation.
We have read a non-blank line; the most recently read lines were blank.
   skipCode [] = [endCode]
   skipCode (x:xt) = case x of
      Code _ → blank : x : inCode xt
      Delim _ → endCode : preDoc xt
      Doc _ → endCode : x : inDoc xt
      Shell _ → endCode : x : inDoc xt
      _ → skipCode xt
   skipDoc [] = [blank]
   skipDoc (x:xt) = case x of
      Code _ → startCode : x : inCode xt
      Delim _ → preCode xt
      Doc _ → blank : x : inDoc xt
      Shell _ → blank : x : inDoc xt
      _ → skipDoc xt
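The effect of the six states can be traced with a pure sketch. It assumes a local type `Line` without Shell (which is treated exactly like Doc), emits plain strings instead of Split values, and uses the hypothetical name `joinDocLines`; the transitions are otherwise copied from the functions above.

```haskell
-- Pure stand-in for Split; Shell is omitted because it is handled like Doc.
data Line = Code String | Delim String | Doc String | Blank String | Ext String

startCode, endCode, blank :: String
startCode = "\n<pre class=\"code\">"
endCode   = "</pre>\n"
blank     = ""

-- The same six states as joinDoc, emitting strings.
joinDocLines :: [Line] -> [String]
joinDocLines = preCode
   where
   preCode [] = [blank]
   preCode (x:xt) = case x of
      Code s  -> startCode : s : inCode xt
      Delim _ -> preDoc xt
      Doc s   -> s : inDoc xt
      _       -> preCode xt
   preDoc [] = [blank]
   preDoc (x:xt) = case x of
      Code s  -> startCode : s : inCode xt
      Delim _ -> preCode xt
      Doc s   -> s : inDoc xt
      _       -> preDoc xt
   inCode [] = [endCode]
   inCode (x:xt) = case x of
      Code s  -> s : inCode xt
      Delim _ -> endCode : preDoc xt
      Doc s   -> endCode : s : inDoc xt
      Blank _ -> skipCode xt
      Ext _   -> inCode xt
   inDoc [] = [blank]
   inDoc (x:xt) = case x of
      Code s  -> startCode : s : inCode xt
      Delim _ -> preCode xt
      Doc s   -> s : inDoc xt
      Blank _ -> skipDoc xt
      Ext _   -> inDoc xt
   skipCode [] = [endCode]
   skipCode (x:xt) = case x of
      Code s  -> blank : s : inCode xt
      Delim _ -> endCode : preDoc xt
      Doc s   -> endCode : s : inDoc xt
      _       -> skipCode xt
   skipDoc [] = [blank]
   skipDoc (x:xt) = case x of
      Code s  -> startCode : s : inCode xt
      Delim _ -> preCode xt
      Doc s   -> blank : s : inDoc xt
      _       -> skipDoc xt

-- Two blank lines between code lines, then a documentation block.
demo :: [Line]
demo =
   [ Code "x = 1", Blank "", Blank "", Code "y = 2"
   , Delim "{-", Doc "Text.", Delim "-}" ]
-- joinDocLines demo ==
--    ["\n<pre class=\"code\">","x = 1","","y = 2","</pre>\n","Text.",""]

main :: IO ()
main = putStr (unlines (joinDocLines demo))
```

The two Blank lines between the code lines collapse into a single empty line, and the Delim lines disappear from the output.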
joinCode recombines a split list as code output,
leaving out documentation.
We avoid unsplit in order to directly return a String.
joinCode :: [Split] → String
joinCode text = (unlines . inCode) text
   where
   inCode, inDoc :: [Split] → [String]
   inCode [] = []
   inCode (x:xt) = case x of
      Code s → s : inCode xt
      Blank s → s : inCode xt
      _ → inDoc xt
   inDoc [] = []
   inDoc (x:xt) = case x of
      Code s → s : inCode xt
      Delim _ → inCode xt
      _ → inDoc xt
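A small worked input shows which lines survive. The sketch below assumes a local type `Line` without Shell and the hypothetical name `joinCodeSketch`; the two states are copied from joinCode.

```haskell
-- Pure stand-in for Split, without Shell.
data Line = Code String | Delim String | Doc String | Blank String | Ext String

joinCodeSketch :: [Line] -> String
joinCodeSketch = unlines . inCode
   where
   inCode [] = []
   inCode (x:xt) = case x of
      Code s  -> s : inCode xt
      Blank s -> s : inCode xt
      _       -> inDoc xt
   inDoc [] = []
   inDoc (x:xt) = case x of
      Code s  -> s : inCode xt
      Delim _ -> inCode xt
      _       -> inDoc xt

-- Code, a documentation block, more code, an Ext block, more code.
demo :: [Line]
demo =
   [ Code "x = 1", Blank ""
   , Delim "{-", Doc "Text.", Blank "", Delim "-}"
   , Code "y = 2"
   , Ext "=begin", Ext "notes", Ext "=end"
   , Code "z = 3" ]
-- joinCodeSketch demo == "x = 1\n\ny = 2\nz = 3\n"

main :: IO ()
main = putStr (joinCodeSketch demo)
```

Blank lines are kept only while in code; the blank line inside the documentation block is dropped along with the documentation itself.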
joinDebug recombines a split list, tagged with Split constructor
names for debugging purposes.
We avoid unsplit in order to directly return a String.
joinDebug :: [Split] → String
joinDebug text = (unlines . tag) text
   where
   tag :: [Split] → [String]
   tag [] = []
   tag (x:xt) = t : tag xt where
      t = case x of
         Code s → "C " ++ s
         Delim s → "= " ++ s
         Doc s → "D " ++ s
         Blank s → "B " ++ s
         Ext s → "E " ++ s
         Shell _ → "S"
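The tagged format is easiest to see on a short input. This sketch assumes a local type `Line` without Shell and a per-line function `tagLine` applied with `map`, which is a variation on the explicit recursion above; both names are hypothetical.

```haskell
-- Pure stand-in for Split, without Shell.
data Line = Code String | Delim String | Doc String | Blank String | Ext String

-- Prefix one line with the tag for its constructor.
tagLine :: Line -> String
tagLine x = case x of
   Code s  -> "C " ++ s
   Delim s -> "= " ++ s
   Doc s   -> "D " ++ s
   Blank s -> "B " ++ s
   Ext s   -> "E " ++ s

joinDebugSketch :: [Line] -> String
joinDebugSketch = unlines . map tagLine

main :: IO ()
main = putStr (joinDebugSketch
   [ Code "x = 1", Delim "{-", Doc "Sets x.", Delim "-}", Blank "" ])
-- C x = 1
-- = {-
-- D Sets x.
-- = -}
-- B
```

The tag for a Blank line is followed by a space and then the line's own whitespace, so the last line of output is not empty.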