Lou Burnard Consulting
Up to a point...
Documents might come in...
Good luck!
Regexps (regular expressions) allow you to specify patterns that match strings of characters, and then to manipulate the resulting matches, for example:
xyz : the string xyz
[xyz] : any one of the characters x, y or z
\d+ : one or more digits
\p{Lu}+ : one or more uppercase Unicode letters
XPath is a standard syntax for matching parts of an XML tree in terms of its elements and attributes, for example:
//p : any <p> anywhere in the document
//p[anchor] : any <p> that has an <anchor> child
//p[@rend] : any <p> with a rend attribute
//p[starts-with(@rend,'Head')] : any <p> whose rend attribute value starts with "Head"
Regular expressions and XPath are both built into oXygen.
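Outside oXygen, the same patterns and paths can be tried from a script. The following is a minimal sketch in Python using only the standard library; the sample text and XML fragment are invented for illustration. Two caveats: Python's built-in re module does not accept \p{Lu} (the third-party regex package does), and ElementTree implements only a subset of XPath, so starts-with() is emulated here with an ordinary string test.

```python
import re
import xml.etree.ElementTree as ET

# --- Regular expressions (Python's built-in re module) ---
# Invented sample text, for illustration only.
SAMPLE_TEXT = "see xyz on page 42 of the 2024 edition"

literal = re.findall(r"xyz", SAMPLE_TEXT)   # the string xyz
one_of = re.findall(r"[xyz]", SAMPLE_TEXT)  # any one of the characters x, y, z
digits = re.findall(r"\d+", SAMPLE_TEXT)    # runs of one or more digits
# re rejects \p{Lu}+ as a bad escape; the third-party "regex" package supports it.

# --- XPath-style queries (ElementTree understands only a subset of XPath) ---
# Invented sample document, for illustration only.
SAMPLE_XML = """<text>
  <p rend="Head1">Chapter One</p>
  <p>Plain text with an <anchor n="a1"/> in it.</p>
  <p rend="indent">Indented paragraph.</p>
  <div><p rend="HeadSub">Nested heading</p></div>
</text>"""

root = ET.fromstring(SAMPLE_XML)
all_p = root.findall(".//p")                # //p : every <p> below the root
with_anchor = root.findall(".//p[anchor]")  # //p[anchor] : <p> with an <anchor> child
with_rend = root.findall(".//p[@rend]")     # //p[@rend] : <p> with a rend attribute
# //p[starts-with(@rend,'Head')] : starts-with() is not supported, so filter in Python
head_p = [p for p in with_rend if p.get("rend").startswith("Head")]

if __name__ == "__main__":
    print(literal, one_of, digits)
    print(len(all_p), len(with_anchor), len(with_rend), [p.text for p in head_p])
```

Run as a script, this prints ['xyz'] ['x', 'y', 'z'] ['42', '2024'] followed by 4 1 3 ['Chapter One', 'Nested heading'].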
If you have lots of texts in a specific arcane format, it is worth investing time and effort to translate them, using whatever tools are at your disposal.
A multi-stage path is usually easiest, e.g.
Remember: the computer should be doing the boring repetitive work, not you!
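As one hypothetical illustration of a multi-stage path, here is a small Python sketch in three stages: regular expressions split and tidy the raw text, a second pass wraps each block in <p>, and a third pass uses an XPath-style selection to promote headings to <head>. The input convention (a leading # marking a heading) and all function names are assumptions made for this sketch, not a real format or tool.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical "arcane" input: blank-line-separated blocks, headings marked with a leading '#'.
RAW = """#Chapter One

This   is the   first paragraph.

Second paragraph,
split over two lines.

#Chapter Two

Final paragraph."""


def stage1_normalise(raw: str) -> list[str]:
    """Stage 1 (regex): split on blank lines and collapse runs of whitespace."""
    blocks = re.split(r"\n\s*\n", raw.strip())
    return [re.sub(r"\s+", " ", block).strip() for block in blocks]


def stage2_to_xml(blocks: list[str]) -> ET.Element:
    """Stage 2: wrap every block in <p>, flagging '#' blocks with rend="Head"."""
    body = ET.Element("body")
    for block in blocks:
        p = ET.SubElement(body, "p")
        if block.startswith("#"):
            p.set("rend", "Head")
            p.text = block[1:].strip()
        else:
            p.text = block
    return body


def stage3_promote_heads(body: ET.Element) -> ET.Element:
    """Stage 3 (XPath-style): turn each <p> whose rend starts with 'Head' into <head>."""
    for p in body.findall("./p[@rend]"):  # findall returns a list, so renaming is safe
        if p.get("rend", "").startswith("Head"):
            p.tag = "head"
            del p.attrib["rend"]
    return body


def convert(raw: str) -> str:
    """Run the three stages in order and serialise the result."""
    body = stage3_promote_heads(stage2_to_xml(stage1_normalise(raw)))
    return ET.tostring(body, encoding="unicode")


if __name__ == "__main__":
    print(convert(RAW))
```

Each stage is small enough to check on its own before the next one runs, which is what makes the repetitive part safe to hand over to the computer.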