parmancer
Parse text into structured data types with parser combinators.
Parmancer has type annotations for parsers and intermediate results. Using a type checker with Parmancer gives immediate feedback about parser result types, and gives type errors when creating invalid combinations of parsers.
Installation
pip install parmancer
Documentation
https://parmancer.com
Introductory example
This example shows a parser which can parse text like "Hello World! 1 + 2 + 3" to extract the name in Hello <name>! and find the sum of the numbers which come after it:
from parmancer import regex, digits, seq, string
# A parser which extracts a name from a greeting using a regular expression
greeting = regex(r"Hello (\w+)! ", group=1)
# A parser which takes integers separated by ` + `,
# converts them to `int`s, and sums them.
adder = digits.map(int).sep_by(string(" + ")).map(sum)
# The `greeting` and `adder` parsers are combined in sequence
parser = seq(greeting, adder)
# The type of `parser` is `Parser[tuple[str, int]]`, meaning it's a parser which
# will return a `tuple[str, int]` when it parses text.
# Now the parser can be applied to the example string, or other strings following the
# same pattern.
result = parser.parse("Hello World! 1 + 2 + 3")
# The result is a tuple containing the `greeting` result followed by the `adder` result
assert result == ("World", 6)
# Parsing different text which matches the same structure:
assert parser.parse("Hello Example! 10 + 11") == ("Example", 21)
Type checkers such as mypy and Pylance's type checker help during development by revealing type information and catching type errors.
Here the in-line types are displayed automatically with VSCode's Python extension and the 'Inlay Hints' setting:
When the type of a parser doesn't match what's expected, such as in the following example, a type error reveals the problem as soon as the code is type checked, without having to run the code.
In this example the Parser.unpack method is being used to unpack the result tuple of type (str, int) into a function which expects arguments of type (str, str) which is a type incompatibility:
Dataclass parsers
A key feature of Parmancer is the ability to create parsers which return dataclass instances using a short syntax where parsers are directly associated with each field of a dataclass.
Each dataclass field has a parser associated with it using the take field descriptor instead of the usual dataclasses.field.
The entire dataclass parser is then combined using the gather function, creating a parser which sequentially applies each field's parser, assigning each result to the dataclass field it is associated with.
from dataclasses import dataclass
from parmancer import regex, string, take, gather
# Example text which a sensor might produce
sample_text = """Device: SensorA
ID: abc001
Readings (3:01 PM)
300.1, 301, 300
Readings (3:02 PM)
302, 1000, 2500
"""
numeric = regex(r"\d+(\.\d+)?").map(float)
any_text = regex(r"[^\n]+")
line_break = string("\n")
# Define parsers for the sensor readings and device information
@dataclass
class Reading:
    # Matches text like `Readings (3:01 PM)`
    timestamp: str = take(regex(r"Readings \(([^)]+)\)", group=1) << line_break)
    # Matches text like `300.1, 301, 300`
    values: list[float] = take(numeric.sep_by(string(", ")) << line_break)

@dataclass
class Device:
    # Matches text like `Device: SensorA`
    name: str = take(string("Device: ") >> any_text << line_break)
    # Matches text like `ID: abc001`
    id: str = take(string("ID: ") >> any_text << line_break)
    # Matches the entire `Reading` dataclass parser 0, 1 or many times
    readings: list[Reading] = take(gather(Reading).many())

# Gather the fields of the `Device` dataclass into a single combined parser
# Note the `Device.readings` field parser uses the `Reading` dataclass parser
parser = gather(Device)
# The result of the parser is a nicely structured `Device` dataclass instance,
# ready for use in the rest of the code with minimal boilerplate to get this far
assert parser.parse(sample_text) == Device(
    name="SensorA",
    id="abc001",
    readings=[
        Reading(timestamp="3:01 PM", values=[300.1, 301, 300]),
        Reading(timestamp="3:02 PM", values=[302, 1000, 2500]),
    ],
)
Dataclass parsers come with type annotations which make it easy to write them with hints from an IDE.
For example, a dataclass field of type str cannot be associated with a parser of type Parser[int] - the parser has to produce a string (Parser[str]) for it to be compatible, and a type checker can reveal this while writing code in an IDE:
Why use Parmancer?
- Simple construction: Simple parsers can be defined concisely and independently, and then combined with short, understandable combinator functions and methods which replace the usual branching and sequencing boilerplate of parsers written in vanilla Python.
- Modularity, testability, maintainability: Each intermediate parser component is a complete parser in itself, which means it can be understood, tested and modified in isolation from the rest of the parser.
- Regular Python: Some approaches to parsing use a separate grammar definition outside of Python which goes through a compilation or generation step before it can be used in Python, which can lead to black boxes. Parmancer parsers are defined as Python code rather than a separate grammar syntax.
- Combination features: The parser comes with standard parser combinator methods and functions such as: combining parsers in sequence; matching alternative parsers until one matches; making a parser optional; repeatedly matching a parser until it no longer matches; mapping a parsing result through a function, and more.
- Type checking: Parmancer has a lot of type information which makes it easier to use with IDEs and type checkers.
- Debug mode: Built-in debug mode (parser.parse(text, debug=True)) provides a detailed parse tree visualization, including failures, to help understand and fix parsing issues.
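To make the combinator idea in the list above concrete, here is a minimal pure-Python sketch. It does not use Parmancer and is not how Parmancer is implemented; the names `literal`, `sequence`, `either`, `many` and `mapped` are made up for illustration. Each parser is modeled as a function which takes the text and a position and returns a `(value, new_position)` pair, or `None` on failure.

```python
from typing import Any, Callable, Optional, Tuple

# A parser is modeled as: (text, position) -> (value, new_position), or None on failure.
ParseFn = Callable[[str, int], Optional[Tuple[Any, int]]]


def literal(expected: str) -> ParseFn:
    """Match an exact string."""
    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        if text.startswith(expected, pos):
            return expected, pos + len(expected)
        return None
    return run


def sequence(*parsers: ParseFn) -> ParseFn:
    """Apply parsers one after another, collecting results in a tuple."""
    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        values = []
        for parser in parsers:
            outcome = parser(text, pos)
            if outcome is None:
                return None
            value, pos = outcome
            values.append(value)
        return tuple(values), pos
    return run


def either(*parsers: ParseFn) -> ParseFn:
    """Try alternatives in order and keep the first one which matches."""
    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        for parser in parsers:
            outcome = parser(text, pos)
            if outcome is not None:
                return outcome
        return None
    return run


def many(parser: ParseFn) -> ParseFn:
    """Repeat a parser until it no longer matches; zero matches is a success."""
    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        values = []
        while True:
            outcome = parser(text, pos)
            # Stop on failure, or on a zero-width match which would loop forever.
            if outcome is None or outcome[1] == pos:
                return values, pos
            value, pos = outcome
            values.append(value)
    return run


def mapped(parser: ParseFn, func: Callable[[Any], Any]) -> ParseFn:
    """Pass a successful result through a function."""
    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        outcome = parser(text, pos)
        if outcome is None:
            return None
        value, new_pos = outcome
        return func(value), new_pos
    return run


# Small parsers are defined independently and then combined.
bit = either(mapped(literal("0"), int), mapped(literal("1"), int))
bits = sequence(literal("0b"), many(bit))

assert bits("0b101", 0) == (("0b", [1, 0, 1]), 5)
assert bits("101", 0) is None
```

Each helper returns a complete parser, which is what allows `bit` to be tested on its own before it is reused inside `bits`.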
Parmancer is not designed for high-performance parsing: its speed is similar to other pure Python parsing libraries. Its purpose is to create understandable, testable and maintainable parsers.
Parmancer is in development so its public API is not stable. Please leave feedback and suggestions in the GitHub issue tracker.
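Parmancer is based on Parsy (https://parsy.readthedocs.io/en/latest/overview.html) and typed-parsy (https://github.com/python-parsy/typed-parsy); Parsy is an excellent parsing library.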
Debug mode
When developing parsers, it can be helpful to understand why a parser fails on certain input. Parmancer includes a debug mode that provides detailed information about parser execution when parsing fails.
To enable debug mode, pass debug=True to the parse() method:
from parmancer import string, regex, seq, ParseError
# Create a simple parser that expects a greeting followed by a number
parser = seq(string("Hello "), regex(r"\d+"))
# This will fail - let's see why
try:
    parser.parse("Hello world", debug=True)
except ParseError as e:
    print(e)
The debug output shows a parse tree indicating which parsers succeeded and which failed:
failed with '\d+'
Furthest parsing position:
Hello world
~~~~~~^
Debug information:
==================
Parse tree:
Parser
└─KeepOne
  └─sequence
    ├─'Hello ' = 'Hello '
    └─\d+ X (failed)
This shows that the 'Hello ' parser succeeded, but the \d+ regex parser failed when it encountered "world" instead of digits.
Debug mode is useful during development but has performance overhead, so it should be disabled in production code.
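One way to keep debug mode out of production code is to switch it on from the environment. This is a sketch only: the `PARSER_DEBUG` variable and the `parse_text` helper are hypothetical names, and the helper assumes nothing more than a parser object whose `parse` method accepts the `debug` keyword shown above.

```python
import os
from typing import Any, Mapping, Protocol


class SupportsDebugParse(Protocol):
    """Any object with a `parse(text, debug=...)` method."""

    def parse(self, text: str, debug: bool = False) -> Any: ...


def debug_enabled(env: Mapping[str, str] = os.environ) -> bool:
    # PARSER_DEBUG is a made-up variable name for this sketch.
    return env.get("PARSER_DEBUG", "") == "1"


def parse_text(
    parser: SupportsDebugParse, text: str, env: Mapping[str, str] = os.environ
) -> Any:
    # Debug output is only requested when the environment opts in.
    return parser.parse(text, debug=debug_enabled(env))
```

With this arrangement the call sites stay the same in development and production, and only the environment decides whether the debug overhead is paid.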
API documentation and examples
The API docs include minimal examples of each parser and combinator.
The GitHub repository (https://github.com/parmancer/parmancer) has an examples folder containing larger examples which use multiple features.
from parmancer.parser import (
    FailureInfo,
    ParseError,
    Parser,
    Result,
    TextState,
    any_char,
    char_from,
    end_of_text,
    forward_parser,
    from_enum,
    gather,
    gather_perm,
    look_ahead,
    one_of,
    regex,
    seq,
    span,
    stateful_parser,
    string,
    string_from,
    success,
    take,
)
from parmancer.debug import DebugTextState

__all__ = [
    "string",
    "regex",
    "whitespace",
    "padding",
    "digit",
    "digits",
    "letter",
    "string_from",
    "char_from",
    "span",
    "any_char",
    "end_of_text",
    "from_enum",
    "seq",
    "one_of",
    "success",
    "look_ahead",
    "take",
    "gather",
    "gather_perm",
    "stateful_parser",
    "forward_parser",
    "Parser",
    "Result",
    "ParseError",
    "FailureInfo",
    "TextState",
    "DebugTextState",
]


whitespace: Parser[str] = regex(r"\s+")
r"""1 or more spaces: `regex(r"\s+")`"""

padding: Parser[str] = regex(r"\s*")
r"""0 or more spaces: `regex(r"\s*")`"""

letter: Parser[str] = any_char.gate(lambda c: c.isalpha()).with_name("Letter")
r"""A character ``c`` for which ``c.isalpha()`` is true."""

digit: Parser[str] = regex(r"[0-9]").with_name("Digit")
"""A numeric digit."""

digits: Parser[str] = regex(r"[0-9]+").with_name("Digits")
"""Any number of numeric digits in a row."""
def string(string: str) -> Parser[str]:
    return String(string)
A parser which matches the value of string exactly.
For example:
from parmancer import string
assert string("ab").many().parse("abab") == ["ab", "ab"]
def regex(
    pattern: PatternType,
    *,
    flags: re.RegexFlag = re.RegexFlag(0),
    group: str
    | int
    | Tuple[str | int]
    | Tuple[str | int, str | int]
    | Tuple[str | int, str | int, str | int]
    | Tuple[str | int, str | int, str | int, str | int]
    | Tuple[str | int, str | int, str | int, str | int, str | int]
    | Tuple[str | int, ...] = 0,
) -> Parser[str | Tuple[str, ...]]:
    if isinstance(pattern, str):
        exp = re.compile(pattern, flags)
    else:
        if flags:
            # Need to recompile with the specified flags
            exp = re.compile(pattern.pattern, flags)
        else:
            exp = pattern

    return Regex(exp, flags, group)
Match a regex pattern.
The optional group specifies which regex group(s) to keep as the parser result, using the re.match syntax.
The default is 0, meaning the entire string matched by the regex is used as the result.
Numbered and named capture groups are supported.
When group contains a single value: int; str; tuple[int];
tuple[str]; then the result is a string: Parser[str].
When group contains a tuple of 2 or more elements, the result is a tuple of
those strings, for example a group of (1, 2, 3) produces
a Parser[tuple[str, str, str]]: the result is a tuple of 3 strings.
Some examples:
from parmancer import regex
assert regex(r".").parse(">") == ">"
assert regex(r".(a)", group=1).parse("1a") == "a"
assert regex(r".(?P<name>a)", group="name").parse("1a") == "a"
assert regex(
r"(?P<hours>\d\d):(?P<minutes>\d\d)", group=("hours", "minutes")
).parse("10:20") == ("10", "20")
The optional flags is passed to re.compile.
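Since the `group` argument follows the `re.match` group syntax, the standard library shows what to expect from each form. The snippet below uses only `re`, not Parmancer: a single group gives a string and several groups give a tuple of strings. It also mirrors the recompilation step in the `regex` source, where a precompiled pattern is compiled again from its `.pattern` when flags are supplied.

```python
import re

time_pattern = re.compile(r"(?P<hours>\d\d):(?P<minutes>\d\d)")
match = time_pattern.match("10:20")
assert match is not None

# Group 0 (the default) is the entire matched text.
assert match.group(0) == "10:20"
# A single numbered or named group gives one string.
assert match.group(1) == "10"
assert match.group("minutes") == "20"
# Several groups at once give a tuple of strings.
assert match.group("hours", "minutes") == ("10", "20")
assert match.group(1, 2) == ("10", "20")

# Recompiling an existing pattern with new flags, as the `regex` source does
# when `flags` is given together with a precompiled pattern.
greeting = re.compile(r"hello")
assert greeting.match("HELLO") is None
case_insensitive = re.compile(greeting.pattern, re.IGNORECASE)
assert case_insensitive.match("HELLO") is not None
```

Note that recompiling from `.pattern` uses only the newly supplied flags, so any flags baked into the original compiled pattern would need to be passed again.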
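whitespace: Parser[str] = regex(r"\s+")
1 or more whitespace characters: regex(r"\s+")
padding: Parser[str] = regex(r"\s*")
0 or more whitespace characters: regex(r"\s*")
digit: Parser[str] = regex(r"[0-9]").with_name("Digit")
A numeric digit.
digits: Parser[str] = regex(r"[0-9]+").with_name("Digits")
One or more numeric digits in a row.
letter: Parser[str] = any_char.gate(lambda c: c.isalpha()).with_name("Letter")
A character c for which c.isalpha() is true.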
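According to the module source, `whitespace` and `padding` are built on `\s`, `digit` and `digits` on `[0-9]`, and `letter` on `str.isalpha()`. Those choices have a few consequences worth knowing, which the standard-library checks below illustrate without using Parmancer itself.

```python
import re

# `whitespace` is regex(r"\s+"): it needs at least one whitespace character,
# and tabs and newlines count as well as spaces.
assert re.fullmatch(r"\s+", " \t\n") is not None
assert re.fullmatch(r"\s+", "") is None
# `padding` is regex(r"\s*"): it also matches the empty string.
assert re.fullmatch(r"\s*", "") is not None

# `digit` and `digits` use [0-9], which accepts ASCII digits only...
arabic_indic_three = "\u0663"
assert re.fullmatch(r"[0-9]", arabic_indic_three) is None
# ...whereas \d on a str pattern also matches other Unicode decimal digits.
assert re.fullmatch(r"\d", arabic_indic_three) is not None

# `letter` relies on str.isalpha(), which is true for non-ASCII letters too.
assert "é".isalpha()
assert not "1".isalpha()
```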
def string_from(*strings: str) -> Parser[str]:
    return reduce(
        operator.or_,
        # Sort longest first, so that overlapping options work correctly
        (string(s) for s in sorted(strings, key=len, reverse=True)),
    )
Any string from a given collection of strings.
from parmancer import string_from
parser = string_from("cat", "dog")
assert parser.parse("cat") == "cat"
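According to the source, `string_from` sorts its options longest first before combining them as alternatives. The pure-Python sketch below shows why that matters when one option is a prefix of another. The `first_match` helper is a made-up stand-in for trying alternatives in order; it is not part of Parmancer.

```python
from typing import Iterable, Optional


def first_match(options: Iterable[str], text: str) -> Optional[str]:
    """Return the first option which `text` starts with, trying them in order."""
    for option in options:
        if text.startswith(option):
            return option
    return None


options = ["in", "int"]

# In the given order, the shorter option shadows the longer one.
assert first_match(options, "integer") == "in"

# Sorted longest first, the longer overlapping option gets its chance.
longest_first = sorted(options, key=len, reverse=True)
assert longest_first == ["int", "in"]
assert first_match(longest_first, "integer") == "int"
assert first_match(longest_first, "inside") == "in"
```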
def char_from(string: str) -> Parser[str]:
    return any_char.gate(lambda c: c in string).with_name(f"[{string}]")
Any character contained in string.
For example:
from parmancer import char_from
assert char_from("abc").parse("c") == "c"
assert char_from("abc").match("d").status is False
def span(length: int) -> Parser[str]:
    return Span(length)
A parser which matches any string span of length length.
For example, to match any string of length 3 and then check that it satisfies a condition:
from parmancer import span
# Match any 3 characters where the first character equals the last character
parser = span(3).gate(lambda s: s[0] == s[2])
assert parser.parse("aba") == "aba"
# A case which doesn't match:
assert parser.match("abc").status is False
def from_enum(enum: Type[E]) -> Parser[E]:
    return EnumMember(enum)
Match any value from an enum, producing the enum value as a result.
For example:
import enum
from parmancer import from_enum
class Pet(enum.Enum):
    CAT = "cat"
    DOG = "dog"
pet = from_enum(Pet)
assert pet.parse("cat") == Pet.CAT
assert pet.parse("dog") == Pet.DOG
# This case doesn't match:
assert pet.match("foo").status is False
def seq(*parsers: Parser[Any]) -> Parser[Tuple[Any, ...]]:
    return Sequence(parsers)
A sequence of parsers is applied in order, and their results are stored in a tuple.
For example:
from parmancer import seq, regex
word = regex(r"[a-zA-Z]+")
number = regex(r"\d").map(int)
parser = seq(word, number, word, number, word | number)
assert parser.parse("a1b2a") == ("a", 1, "b", 2, "a")
assert parser.parse("a1b23") == ("a", 1, "b", 2, 3)
There are multiple related methods for combining parsers where the result is a tuple: adding another parser result to the end of the tuple; concatenating two tuple parsers together; unpacking the tuple result as args to a function, etc.
Here is an example which includes more tuple-related methods. Note that type
annotations are available throughout: a type checker can find the tuple type
for each parser, and it can tell that the unpack method is correctly unpacking
a tuple[int, str, bool] to a function which expects those types for its arguments.
from parmancer import digit, letter, seq, string
def demo(score: int, letter: str, truth: bool) -> str:
    return str(score) if truth else letter
score = digit.map(int)
truth = string("T").result(True) | string("F").result(False)
# This parser's result is a tuple[int, str, bool]
params = seq(score, letter, truth)
assert params.parse("1aT") == (1, "a", True)
# That tuple can be unpacked as arguments for the demo function
parser = params.unpack(demo)
assert parser.parse("1aT") == "1"
assert parser.parse("2bF") == "b"
# Another parser which returns a tuple[int, int, int]
triple_score = seq(score, score, score)
assert triple_score.parse("123") == (1, 2, 3)
assert triple_score.parse("900") == (9, 0, 0)
# These tuple parsers can be concatenated in sequence by adding them
combined = params + triple_score
assert combined.parse("1aT234") == (1, "a", True, 2, 3, 4)
def one_of(parser: Parser[Any], *parsers: Parser[Any]) -> Parser[Any]:
    return OneOf((parser, *parsers))
All parsers are tried, exactly one must succeed.
For example, this can be used to fail on ambiguous inputs by specifying that exactly
one parser must match the input. For date formats, the date string "01-02-03" may
be ambiguous in general whereas "2001-02-03" may be considered unambiguous:
from parmancer import one_of, seq, string, regex, ParseError
two_digit = regex(r"\d{2}").map(int)
four_digit = regex(r"\d{4}").map(int)
sep = string("-")
ymd = seq((four_digit | two_digit) << sep, two_digit << sep, two_digit)
dmy = seq(two_digit << sep, two_digit << sep, four_digit | two_digit)
# Exactly one of the formats must match: year-month-day or day-month-year
date = one_of(ymd, dmy)
# This unambiguous input leads to a successful parse
assert date.parse("2001-02-03") == (2001, 2, 3)
# This ambiguous input leads to a failure to parse
assert date.match("01-02-03").status is False
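The "exactly one must succeed" rule can also be sketched without Parmancer. In the snippet below, `pattern_parser` and `exactly_one` are hypothetical helpers written for this illustration: every candidate is run, and the parse is accepted only when a single candidate matched.

```python
import re
from typing import Callable, List, Optional, Tuple

Date = Tuple[int, int, int]
Candidate = Callable[[str], Optional[Date]]


def pattern_parser(pattern: str) -> Candidate:
    """Build a candidate which returns three ints on a full match, else None."""
    compiled = re.compile(pattern)

    def run(text: str) -> Optional[Date]:
        match = compiled.fullmatch(text)
        if match is None:
            return None
        first, second, third = (int(part) for part in match.groups())
        return first, second, third

    return run


def exactly_one(candidates: List[Candidate], text: str) -> Date:
    """Run every candidate and succeed only if a single one matched."""
    results = [
        result for candidate in candidates if (result := candidate(text)) is not None
    ]
    if len(results) != 1:
        raise ValueError(f"expected exactly 1 match, got {len(results)}")
    return results[0]


ymd = pattern_parser(r"(\d{4}|\d{2})-(\d{2})-(\d{2})")
dmy = pattern_parser(r"(\d{2})-(\d{2})-(\d{4}|\d{2})")

# Only the year-month-day candidate matches this input.
assert exactly_one([ymd, dmy], "2001-02-03") == (2001, 2, 3)

# Both candidates match this input, so it is rejected as ambiguous.
try:
    exactly_one([ymd, dmy], "01-02-03")
except ValueError:
    ambiguous = True
else:
    ambiguous = False
assert ambiguous
```

The difference from ordinary alternatives is that the search does not stop at the first success; every candidate is tried so that a second success can be detected.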
def success(success_value: T) -> Parser[T]:
    return Success(success_value)
A parser which always succeeds with a result of success_value and doesn't modify
the input state.
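A typical use for an always-succeeding parser is as the last alternative, where it supplies a default value. The following pure-Python sketch models that idea with plain functions; `literal`, `succeed` and `either` are made-up stand-ins and not Parmancer's implementation.

```python
from typing import Any, Callable, Optional, Tuple

# A parser is modeled as: (text, position) -> (value, new_position), or None on failure.
ParseFn = Callable[[str, int], Optional[Tuple[Any, int]]]


def literal(expected: str) -> ParseFn:
    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        if text.startswith(expected, pos):
            return expected, pos + len(expected)
        return None
    return run


def succeed(value: Any) -> ParseFn:
    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        # Always succeeds, and the position is returned unchanged.
        return value, pos
    return run


def either(first: ParseFn, second: ParseFn) -> ParseFn:
    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        outcome = first(text, pos)
        return outcome if outcome is not None else second(text, pos)
    return run


# An explicit "-" is consumed; otherwise "+" is used and nothing is consumed.
sign = either(literal("-"), succeed("+"))

assert sign("-5", 0) == ("-", 1)
assert sign("5", 0) == ("+", 0)
```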
def look_ahead(parser: Parser[T]) -> Parser[T]:
    return LookAhead(parser)
Check whether a parser matches the next part of the input without changing the state of the parser: no input is consumed and no result is kept.
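Regular expressions have a similar concept in the lookahead assertion `(?=...)`, which makes for a quick standard-library illustration of "check without consuming". This uses `re` only and is an analogy, not Parmancer's implementation.

```python
import re

# (?=\d) checks that a digit comes next but matches zero characters.
peek = re.compile(r"(?=\d)")
match = peek.match("42 apples")
assert match is not None
assert match.end() == 0  # nothing was consumed
assert match.group(0) == ""  # and no text was kept

# The check fails when the next character is not a digit.
assert peek.match("apples") is None

# After a successful check, a following pattern still sees the full input.
number = re.compile(r"(?=\d)\d+")
number_match = number.match("42 apples")
assert number_match is not None
assert number_match.group(0) == "42"
```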
def take(
    parser: Parser[T],
    *,
    init: bool = True,
    repr: bool = True,
    hash: bool | None = None,
    compare: bool = True,
    metadata: Mapping[Any, Any] | None = None,
) -> T:
    if metadata is None:
        metadata = {}
    return cast(
        T,
        field(
            init=init,
            repr=repr,
            hash=hash,
            compare=compare,
            metadata={**metadata, "parser": parser},
        ),
    )
Assign a parser to a field of a dataclass.
Use this in a dataclass in conjunction with gather to concisely define parsers
which return dataclass instances.
from dataclasses import dataclass
from parmancer import gather, regex, take, whitespace
@dataclass
class Person:
    # Each field has a parser associated with it.
    name: str = take(regex(r"\w+") << whitespace)
    age: int = take(regex(r"\d+").map(int))
# "Gather" the dataclass fields into a combined parser which returns
# an instance of the dataclass
person_parser = gather(Person)
person = person_parser.parse("Bilbo 111")
assert person == Person(name="Bilbo", age=111)
def gather(
    model: Type[DataclassType], field_order: Optional[Iterable[str]] = None
) -> Parser[DataclassType]:
    field_parsers = get_parsers_from_fields(model)
    if field_order is not None:
        field_parsers = {name: field_parsers[name] for name in field_order}
    return DataclassSequence(model, field_parsers)
Gather parsers from the fields of a dataclass into a single combined parser. Each field parser is applied in sequence, and each value is then assigned to that field to create an instance of the dataclass. That dataclass is the result of the combined parser.
from dataclasses import dataclass
from parmancer import take, string, gather, regex
@dataclass
class Example:
    foo: int = take(regex(r"\d+").map(int))
    bar: bool = take(string("T").result(True) | string("F").result(False))
parser = gather(Example)
assert parser.parse("123T") == Example(foo=123, bar=True)
def gather_perm(model: Type[DataclassType]) -> Parser[DataclassType]:
    return DataclassPermutation(model)
Parse all fields of a dataclass parser in any order.
Example:
from dataclasses import dataclass
from parmancer import take, string, gather_perm, regex
@dataclass
class Example:
    foo: int = take(regex(r"\d+").map(int))
    bar: bool = take(string("T").result(True) | string("F").result(False))
parser = gather_perm(Example)
assert parser.parse("T123") == Example(foo=123, bar=True)
def forward_parser(parser_iterator: Callable[[], Iterator[Parser[T]]]) -> Parser[T]:
    return ForwardParser(parser_iterator=parser_iterator)
Define a parser which refers to another parser which hasn't been defined yet.
Wrap a generator which yields the parser to refer to. This makes recursive parser definitions possible, for example:
from parmancer import forward_parser, string, Parser
from typing import Iterator
@forward_parser
def _parser() -> Iterator[Parser[str]]:
    yield parser
# `parser` refers to itself recursively via `_parser`.
parser = string("a") | string("(") >> _parser << string(")")
assert parser.parse("(a)") == "a"
assert parser.parse("(((a)))") == "a"
class Parser(Generic[T_co]):
Parser base class that defines the core parsing interface.
The generic type parameter T_co represents the type of value that the parser produces
when it successfully parses input text. For example:
- Parser[str]: A parser that produces string values
- Parser[int]: A parser that produces integer values
- Parser[List[str]]: A parser that produces lists of strings
- Parser[Tuple[str, int]]: A parser that produces tuples containing a string and an integer
The _co suffix indicates that the type parameter is covariant, which means that if
Child is a subtype of Parent, then Parser[Child] is a subtype of Parser[Parent].
Subclasses can override the parse_result method to create a specific parser, see
String for example.
def parse(
    self: Parser[T_co],
    text: str,
    state_handler: Type[TextState] = TextState,
    debug: bool = False,
) -> T_co:
    if debug:
        # Import here to avoid circular imports
        from parmancer.debug import DebugTextState

        state: TextState = DebugTextState.start(text)
    else:
        state = state_handler.start(text)
    result = (self << end_of_text).parse_result(state)
    if not result.status:
        raise ParseError(result.state.failures, result.state)
    return result.value
Run the parser on input text, returning the parsed value or raising a
ParseError on failure.
text - the text to be parsed
state_handler (optional) - the class to use for handling parser state
debug (optional) - if True, enables debug mode with detailed error information
Debug mode provides detailed information about parser execution when parsing fails, including a parse tree that shows successful parsers (marked with "= value") and failed parsers (marked with "X (failed)"). This is useful during development but has performance overhead.
def match(
    self, text: str, state_handler: Type[TextState] = TextState, debug: bool = False
) -> Result[T_co]:
    if debug:
        # Import here to avoid circular imports
        from parmancer.debug import DebugTextState

        state: TextState = DebugTextState.start(text)
    else:
        state = state_handler.start(text)
    return (self << end_of_text).parse_result(state)
Run the parser on input text, returning the parsed result.
Unlike Parser.parse, this method does not raise an error if parsing fails. Instead, it
returns a Result type wrapping either the parser output or the failure state.
text - the text to be parsed
state_handler (optional) - the class to use for handling parser state
debug (optional) - if True, enables debug mode with detailed error information
Debug mode provides the same detailed parser execution information as Parser.parse,
but accessible through the Result object's state rather than a raised exception.
def parse_result(self, state: TextState) -> Result[T_co]:
    return NotImplemented  # type: ignore[no-any-return]
Given the input text and the current parsing position (state), parse and return a result (success with the parsed value, or failure with failure info).
Override this method in subclasses to create a specific parser.
def result(self: Parser[Any], value: T) -> Parser[T]:
    return self >> Success(value)
Replace the current result with the given value.
def many(
    self: Parser[T_co],
    min_count: int = 0,
    max_count: int | float = float("inf"),
) -> Parser[List[T_co]]:
    return Range(self, min_count=min_count, max_count=max_count)
Repeat the parser until it doesn't match, storing all matches in a list. Optionally set a minimum or maximum number of times to match.
Parameters
- min_count: Match at least this many times
- max_count: Match at most this many times
Returns
A new parser which will repeatedly apply the previous parser
def times(self: Parser[T_co], count: int) -> Parser[List[T_co]]:
    return self.many(min_count=count, max_count=count).with_name(f"times({count})")
Repeat the parser a fixed number of times, storing all matches in a list.
Parameters
- count: Number of times to apply the parser
Returns
A new parser which will repeat the previous parser count times
def at_most(self: Parser[T_co], count: int) -> Parser[List[T_co]]:
    return self.many(0, count).with_name(f"at_most({count})")
Repeat the parser at most count times.
Parameters
- count: Maximum number of repeats
Returns
A new parser which will repeat the previous parser up to count times
def at_least(self: Parser[T_co], count: int) -> Parser[List[T_co]]:
    return self.many(min_count=count, max_count=float("inf")).with_name(
        f"at_least({count})"
    )
Repeat the parser at least count times.
Parameters
- count: Minimum number of repeats
Returns
A new parser which will repeat the previous parser at least count times
def until(
    self: Parser[T_co],
    until_parser: Parser[Any],
    min_count: int = 0,
    max_count: int | float = float("inf"),
) -> Parser[List[T_co]]:
    return Until(self, until_parser, min_count, max_count)
Repeatedly apply the parser until the until_parser matches, optionally
setting a minimum or maximum number of times to repeat.
Parameters
- until_parser: Repeats will stop when this parser matches
- min_count: Optional minimum number of repeats required to succeed
- max_count: Optional maximum number of repeats before the until_parser must succeed
Returns
A new parser which will repeat the previous parser until until_parser matches
def sep_by(
    self: Parser[T_co],
    sep: Parser[Any],
    *,
    min_count: int = 0,
    max_count: int | float = float("inf"),
) -> Parser[List[T_co]]:
    return Range(
        self, separator_parser=sep, min_count=min_count, max_count=max_count
    )
Alternately apply this parser and the sep parser, keeping a list of results
from this parser.
For example, to match a comma-separated list of values, keeping only the values and discarding the commas:
from parmancer import regex, string
value = regex(r"\d+")
sep = string(", ")
parser = value.sep_by(sep)
assert parser.parse("1, 2, 30") == ["1", "2", "30"]
Parameters
- sep: The parser acting as a separator
- min_count: Optional minimum number of repeats
- max_count: Optional maximum number of repeats
Returns
A new parser which will apply this parser multiple times, with sep applied between each repeat.
def bind(
    self: Parser[T1],
    bind_fn: Callable[[T1], Parser[T2]],
) -> Parser[T2]:
    return Bind(self, bind_fn)
Bind the result of the current parser to a function which returns another parser.
Parameters
- bind_fn: A function which will take the result of the current parser as input and return another parser which may depend on the result.
Returns
The bound parser created by bind_fn
597 def map( 598 self: Parser[T1], 599 map_fn: Callable[[T1], T2], 600 map_name: Optional[str] = None, 601 ) -> Parser[T2]: 602 """Convert the current result to a new result by passing its value through 603 ``map_fn`` 604 605 :param map_fn: The current parser result value will be passed through this 606 function, creating a new result. 607 :param map_name: A name to give to the map function 608 :return: A new parser which will convert the previous parser's result to a new 609 value using ``map_fn`` 610 """ 611 if map_name is None: 612 map_name = "map" 613 if hasattr(map_fn, "__name__"): 614 map_name = map_fn.__name__ 615 616 return Map(parser=self, map_callable=map_fn, map_name=map_name)
Convert the current result to a new result by passing its value through map_fn.
Parameters
- map_fn: The current parser result value will be passed through this function, creating a new result.
- map_name: A name to give to the map function
Returns
A new parser which will convert the previous parser's result to a new value using map_fn.
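The way map chooses a display name can be tried in isolation. The sketch below is a standalone stand-in rather than part of Parmancer: resolve_map_name is a hypothetical helper that copies only the map_name defaulting logic, on the assumption that an explicitly passed map_name always takes precedence over the function's own __name__.

```python
from functools import partial
from typing import Any, Callable, Optional


def resolve_map_name(map_fn: Callable[..., Any], map_name: Optional[str] = None) -> str:
    """Hypothetical helper mirroring how `map` picks a display name."""
    if map_name is None:
        # Fall back to "map", then prefer the callable's own name when it has one
        map_name = "map"
        if hasattr(map_fn, "__name__"):
            map_name = map_fn.__name__
    return map_name


print(resolve_map_name(int))                   # a named callable
print(resolve_map_name(lambda value: value))   # lambdas carry the name "<lambda>"
print(resolve_map_name(partial(int, base=2)))  # partial objects have no __name__
print(resolve_map_name(int, "to_int"))         # an explicit name is kept
```

Passing map_name is mainly worthwhile for lambdas and partial objects, whose default names say little about what the mapping does.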
def map_failure(
    self, failure_transform: Callable[[FailureInfo], FailureInfo]
) -> Parser[T_co]:
    """Transform a failure state using a transform function, used for example to add
    additional context to a parser failure.

    :param failure_transform: A function which converts a ``FailureInfo`` into
        another ``FailureInfo``
    :return: A parser which will map its failure info using ``failure_transform``
    """
    return MapFailure(self, failure_transform)
Transform a failure state using a transform function, used for example to add additional context to a parser failure.
Parameters
- failure_transform: A function which converts a FailureInfo into another FailureInfo
Returns
A parser which will map its failure info using failure_transform.
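A failure_transform is an ordinary function from one FailureInfo to another. Since FailureInfo is a frozen dataclass, the transform has to build a new instance instead of editing the one it receives. The sketch below uses a stand-in FailureInfo with the same two fields so that it runs without the library; in_header is a hypothetical transform chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True, eq=True)
class FailureInfo:
    """Stand-in with the same two fields as Parmancer's FailureInfo."""

    index: int
    message: str


def in_header(info: FailureInfo) -> FailureInfo:
    """Hypothetical transform: prefix the message and keep the index unchanged."""
    return replace(info, message=f"in header: {info.message}")


original = FailureInfo(index=12, message="digits")
mapped = in_header(original)
print(mapped)
```

With the real class, a function of this shape is what would be passed as the failure_transform argument, for example some_parser.map_failure(in_header).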
def unpack(
    self: Parser[Tuple[Unpack[Ts]]],
    transform_fn: Callable[[Unpack[Ts]], T2],
) -> Parser[T2]:
    """When the result is a tuple, it can be unpacked and passed as *args to
    ``transform_fn``, creating a new result containing the function's output.

    :param transform_fn: Function to unpack the current result tuple into as args
    :return: An updated parser which will unpack its result into ``transform_fn``
        to produce a new result
    """
    return self.bind(lambda value: Success(transform_fn(*value))).with_name(
        "unpack"
    )
When the result is a tuple, it can be unpacked and passed as *args to
transform_fn, creating a new result containing the function's output.
Parameters
- transform_fn: Function to unpack the current result tuple into as args
Returns
An updated parser which will unpack its result into transform_fn to produce a new result.
def tuple(self: Parser[T]) -> Parser[Tuple[T]]:
    """Wrap the result in a tuple of length 1."""
    return self.map(lambda value: (value,), "Wrap tuple")
Wrap the result in a tuple of length 1.
def append(
    self: Parser[Tuple[Unpack[Ts]]], other: Parser[T2]
) -> Parser[Tuple[Unpack[Ts], T2]]:
    """
    Append the result of another parser to the end of the current parser's result tuple

    ```python
    from parmancer import string

    initial = string("First").tuple()
    appended = initial.append(string("Second"))

    assert appended.parse("FirstSecond") == ("First", "Second")
    ```
    """
    return self.bind(
        lambda self_value: other.bind(
            lambda other_value: Success((*self_value, other_value))
        )
    )
Append the result of another parser to the end of the current parser's result tuple
from parmancer import string
initial = string("First").tuple()
appended = initial.append(string("Second"))
assert appended.parse("FirstSecond") == ("First", "Second")
def list(self: Parser[T]) -> Parser[List[T]]:
    """Wrap the result in a list."""
    return self.map(lambda value: [value], map_name="Wrap list")
Wrap the result in a list.
def concat(
    self: Parser[Iterable[SupportsAdd[T, T1]]],
) -> Parser[T1]:
    """
    Add all the elements of an iterable result together.

    For an iterable of strings, this concatenates the strings:

    ```python
    from parmancer import digits, string

    delimited = digits.sep_by(string("-"))

    assert delimited.parse("0800-12-3") == ["0800", "12", "3"]

    assert delimited.concat().parse("0800-12-3") == "0800123"
    ```
    """

    return self.map(partial(reduce, operator.add), "Concat")
Add all the elements of an iterable result together.
For an iterable of strings, this concatenates the strings:
from parmancer import digits, string
delimited = digits.sep_by(string("-"))
assert delimited.parse("0800-12-3") == ["0800", "12", "3"]
assert delimited.concat().parse("0800-12-3") == "0800123"
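The reduction behind concat can be tried on plain Python values. This standalone sketch uses only the standard library and the same partial(reduce, operator.add) callable that appears in the source listing; the name add_all is made up for the example. It shows that anything supporting + is combined, and that reduce with no initial value raises TypeError for an empty iterable. How Parmancer reports that case when a parser produces an empty list is not covered here.

```python
import operator
from functools import partial, reduce

# The same callable that `concat` passes to `map`
add_all = partial(reduce, operator.add)

joined = add_all(["0800", "12", "3"])  # strings are concatenated
total = add_all([1, 2, 3])             # numbers are summed
flat = add_all([[1], [2, 3]])          # lists are joined end to end

# With no initial value, `reduce` raises TypeError on an empty iterable
try:
    add_all([])
    empty_ok = True
except TypeError:
    empty_ok = False

print(joined, total, flat, empty_ok)
```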
def keep_right(self, other: Parser[T]) -> Parser[T]:
    """
    This parser is run, followed by the other parser, but only the result of the
    other parser is kept.

    Another way to use this is with the `>>` operator:

    ```python
    from parmancer import string

    parser = string("a") >> string("b")
    # The "a" is matched but not kept as part of the result
    assert parser.parse("ab") == "b"
    ```
    """
    return KeepOne(left=(self,), keep=other)
This parser is run, followed by the other parser, but only the result of the other parser is kept.
Another way to use this is with the >> operator:
from parmancer import string
parser = string("a") >> string("b")
# The "a" is matched but not kept as part of the result
assert parser.parse("ab") == "b"
def keep_left(self: Parser[T], other: Parser[Any]) -> Parser[T]:
    """
    This parser is run, followed by the other parser, but only the result of this
    parser is kept.

    Another way to use this is with the `<<` operator:

    ```python
    from parmancer import string

    parser = string("a") << string("b")
    # The "b" is matched but not kept as part of the result
    assert parser.parse("ab") == "a"
    ```
    """
    return KeepOne(keep=self, right=(other,))
This parser is run, followed by the other parser, but only the result of this parser is kept.
Another way to use this is with the << operator:
from parmancer import string
parser = string("a") << string("b")
# The "b" is matched but not kept as part of the result
assert parser.parse("ab") == "a"
def gate(self: Parser[T], gate_function: Callable[[T], bool]) -> Parser[T]:
    """
    Fail the parser if ``gate_function`` returns False when called on the result,
    otherwise succeed without changing the result.
    """
    return Gate(self, gate_function)
Fail the parser if gate_function returns False when called on the result,
otherwise succeed without changing the result.
def optional(
    self: Parser[T1], default: Optional[T2] = None
) -> Parser[T1 | Optional[T2]]:
    """
    Make the previous parser optional by returning a result with a value of
    ``default`` if the parser failed.
    """
    return Choice((self, success(default)))
Make the previous parser optional by returning a result with a value of
default if the parser failed.
def with_name(self, name: str) -> Parser[T_co]:
    """Set the name of the parser."""
    return NamedParser(name=name, parser=self)
Set the name of the parser.
def breakpoint(self) -> Parser[T_co]:
    """Insert a breakpoint before the current parser runs, for debugging."""

    @stateful_parser
    def parser(state: TextState) -> Result[T_co]:
        breakpoint()
        result = self.parse_result(state)
        return result

    return parser
Insert a breakpoint before the current parser runs, for debugging.
@dataclass(**_slots)
class Result(Generic[T_co]):
    """
    A result of running a parser, including whether it failed or succeeded, the parsed
    value if it succeeded, the text state after parsing, and any failure information
    about the furthest position in the text which has been parsed so far.

    The generic type parameter `T_co` represents the type of the parsed value when the
    parsing operation succeeds. This type corresponds to the return type of the parser
    that produced this result. For example:

    - `Result[str]`: Result of a parser that produces string values
    - `Result[int]`: Result of a parser that produces integer values
    - `Result[List[T]]`: Result of a parser that produces lists of values of type T

    The `_co` suffix indicates that the type parameter is covariant, which means that if
    `Child` is a subtype of `Parent`, then `Result[Child]` is a subtype of `Result[Parent]`.
    """

    status: bool
    state: TextState
    failure_info: FailureInfo
    value: T_co

    def expect(self: Self) -> Self:
        """
        Raise `ResultAsException` if parsing failed, otherwise return the result.

        This is useful in stateful parsers as a way to exit parsing part way through
        a function; the `ResultAsException` will then be caught and turned into a
        failure `Result` by the `StatefulParser`.
        """
        if not self.status:
            raise ResultAsException(self)
        return self

    def map_failure(
        self, failure_transform: Callable[[FailureInfo], FailureInfo]
    ) -> Result[T_co]:
        """
        If the result is a failure, map the failure to a new failure value by applying
        `failure_transform`.
        """
        if self.status:
            return self
        mapped_info = failure_transform(self.failure_info)
        failures = self.state.failures
        if self.failure_info in self.state.failures:
            # Need to update the failures state
            failures = tuple(
                mapped_info if info is self.failure_info else info for info in failures
            )
        return Result(
            self.status, self.state.replace_failures(failures), mapped_info, self.value
        )
A result of running a parser, including whether it failed or succeeded, the parsed value if it succeeded, the text state after parsing, and any failure information about the furthest position in the text which has been parsed so far.
The generic type parameter T_co represents the type of the parsed value when the
parsing operation succeeds. This type corresponds to the return type of the parser
that produced this result. For example:
- Result[str]: Result of a parser that produces string values
- Result[int]: Result of a parser that produces integer values
- Result[List[T]]: Result of a parser that produces lists of values of type T
The _co suffix indicates that the type parameter is covariant, which means that if
Child is a subtype of Parent, then Result[Child] is a subtype of Result[Parent].
def expect(self: Self) -> Self:
    """
    Raise `ResultAsException` if parsing failed, otherwise return the result.

    This is useful in stateful parsers as a way to exit parsing part way through
    a function; the `ResultAsException` will then be caught and turned into a
    failure `Result` by the `StatefulParser`.
    """
    if not self.status:
        raise ResultAsException(self)
    return self
Raise ResultAsException if parsing failed, otherwise return the result.
This is useful in stateful parsers as a way to exit parsing part way through
a function; the ResultAsException will then be caught and turned into a
failure Result by the StatefulParser.
def map_failure(
    self, failure_transform: Callable[[FailureInfo], FailureInfo]
) -> Result[T_co]:
    """
    If the result is a failure, map the failure to a new failure value by applying
    `failure_transform`.
    """
    if self.status:
        return self
    mapped_info = failure_transform(self.failure_info)
    failures = self.state.failures
    if self.failure_info in self.state.failures:
        # Need to update the failures state
        failures = tuple(
            mapped_info if info is self.failure_info else info for info in failures
        )
    return Result(
        self.status, self.state.replace_failures(failures), mapped_info, self.value
    )
If the result is a failure, map the failure to a new failure value by applying
failure_transform.
class ParseError(ValueError):
    """A parsing error."""

    def __init__(self, failures: Tuple[FailureInfo, ...], state: TextState) -> None:
        """Create a parsing error with specific failures for a given parser state."""
        self.failures: Tuple[FailureInfo, ...] = failures
        self.state: TextState = state

    def __str__(self) -> str:
        """
        Error text to display, including information about whichever parser(s) consumed
        the most text, along with a small window of context showing where parsing
        failed. If debug mode was used, includes detailed parser state information.
        """
        furthest_state = self.state.at(max(failure.index for failure in self.failures))
        messages = sorted(f"'{info.message}'" for info in self.failures)

        # Build the basic error message
        if len(messages) == 1:
            basic_error = f"failed with {messages[0]}\nFurthest parsing position:\n{furthest_state.context_display()}"
        else:
            basic_error = f"failed with {', '.join(messages)}\nFurthest parsing position:\n{furthest_state.context_display()}"

        # Check if this is a debug state and add debug information
        try:
            # Import here to avoid circular imports
            from parmancer.debug import DebugTextState

            if isinstance(self.state, DebugTextState):
                debug_info = self.state.get_debug_info()
                return f"{basic_error}\n\n{debug_info}"
        except ImportError:
            # If debug module isn't available, just return basic error
            pass

        return basic_error
A parsing error.
def __init__(self, failures: Tuple[FailureInfo, ...], state: TextState) -> None:
    """Create a parsing error with specific failures for a given parser state."""
    self.failures: Tuple[FailureInfo, ...] = failures
    self.state: TextState = state
Create a parsing error with specific failures for a given parser state.
Inherited Members
- builtins.BaseException
- with_traceback
- add_note
- args
@dataclass(frozen=True, eq=True)
class FailureInfo:
    """Information about a parsing failure: the text index and a message."""

    index: int
    message: str
Information about a parsing failure: the text index and a message.
@dataclass(frozen=True, **_slots)
class TextState:
    """
    Parsing state: the input text, the current index of parsing, failures from previous
    parser branches for error reporting.

    Note that many `TextState` objects are created during parsing and they all contain
    the original input `text`, but these are all references to the same original string
    rather than copies.
    """

    text: str
    """The full text being parsed."""
    index: int
    """Index at start of the remaining unparsed text."""
    failures: Tuple[FailureInfo, ...] = tuple()
    """Previously encountered parsing failures, used for reporting parser failures."""

    @classmethod
    def start(cls: Type[Self], text: str) -> Self:
        """Initialize TextState for the given text with the index at the start."""
        return cls(text, 0)

    def progress(
        self: Self, index: int, failures: Tuple[FailureInfo, ...] = tuple()
    ) -> Self:
        """
        Create a new state from the current state, maintaining any extra information

        Every time a new state is made from an existing state, it should pass through
        this function to keep any values other than the basic TextState fields.
        This is similar to making a shallow copy but doesn't require mutation after
        the copy is made.
        """
        return type(self)(
            self.text,
            index,
            failures,
            **{
                field.name: getattr(self, field.name)
                for field in fields(self)
                if field.name not in ("text", "index", "failures")
            },
        )

    def at(self: Self, index: int) -> Self:
        """Move `index` to the given value, returning a new state."""
        return self.progress(index, self.failures)

    def apply(
        self: Self, parser: Parser[T_co], raise_failure: bool = True
    ) -> Result[T_co]:
        """
        Apply a parser to the current state, returning the parsing `Result` which may
        be a success or failure.
        """
        result = parser.parse_result(self)
        if not result.status and raise_failure:
            raise ResultAsException(result)
        return result

    def success(self: Self, value: T) -> Result[T]:
        """Produce a success Result with the given value."""
        return Result(True, self, FailureInfo(-1, ""), value)

    def failure(self: Self, message: str) -> Result[Any]:
        """Create a failure Result with the given failure message."""
        info = FailureInfo(index=self.index, message=message)

        new_state = self.merge_failures((info,))
        return Result(
            False,
            new_state,
            info,
            None,
        )

    def merge_state_failures(self: Self, state: TextState) -> Self:
        return self.merge_failures(state.failures)

    def merge_failures(self: Self, other: Tuple[FailureInfo, ...]) -> Self:
        furthest_failure = (
            max(info.index for info in self.failures) if self.failures else -1
        )
        result_failures: Tuple[FailureInfo, ...] = self.failures
        for failure in other:
            if furthest_failure < failure.index:
                furthest_failure = failure.index
                result_failures = (failure,)
            elif furthest_failure == failure.index:
                result_failures = (*result_failures, failure)

        return self.progress(self.index, result_failures)

    def replace_failures(self: Self, failures: Tuple[FailureInfo, ...]) -> Self:
        """Replace any current failures with new failures."""
        return self.progress(self.index, failures)

    def line_col(self: Self) -> LineColumn:
        """The line and column at the current parser index in the text."""
        return LineColumn.from_index(self.text, self.index)

    def context_display(self) -> str:
        """
        Text which displays a context window around the current parser position, with
        an indicator pointing to the character at the current index.
        """
        window, cursor = context_window(self.text, self.index, width=40)
        context: List[str] = []
        for i, line in enumerate(window):
            if i == cursor.line:
                context.append(line.rstrip("\n") + "\n")
                context.append("~" * cursor.column + "^\n")
            else:
                context.append(line)
        return "".join(context)

    def remaining(self: Self) -> str:
        """All of the text remaining to be parsed, from the current index onward."""
        return self.text[self.index :]
Parsing state: the input text, the current index of parsing, failures from previous parser branches for error reporting.
Note that many TextState objects are created during parsing and they all contain
the original input text, but these are all references to the same original string
rather than copies.
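- text: The full text being parsed.
- index: Index at start of the remaining unparsed text.
- failures: Previously encountered parsing failures, used for reporting parser failures.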
@classmethod
def start(cls: Type[Self], text: str) -> Self:
    """Initialize TextState for the given text with the index at the start."""
    return cls(text, 0)
Initialize TextState for the given text with the index at the start.
def progress(
    self: Self, index: int, failures: Tuple[FailureInfo, ...] = tuple()
) -> Self:
    """
    Create a new state from the current state, maintaining any extra information

    Every time a new state is made from an existing state, it should pass through
    this function to keep any values other than the basic TextState fields.
    This is similar to making a shallow copy but doesn't require mutation after
    the copy is made.
    """
    return type(self)(
        self.text,
        index,
        failures,
        **{
            field.name: getattr(self, field.name)
            for field in fields(self)
            if field.name not in ("text", "index", "failures")
        },
    )
Create a new state from the current state, maintaining any extra information
Every time a new state is made from an existing state, it should pass through this function to keep any values other than the basic TextState fields. This is similar to making a shallow copy but doesn't require mutation after the copy is made.
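The field-preserving copy described above can be sketched without the library. State and TaggedState below are stand-ins invented for this example (State has the same three basic fields as TextState, with plain strings as failures); the progress body follows the source listing. The point is that a subclass instance comes back as the same subclass with its extra field intact, while failures falls back to an empty tuple unless it is passed explicitly.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class State:
    """Stand-in for TextState with the same three basic fields."""

    text: str
    index: int
    failures: Tuple[str, ...] = ()

    def progress(self, index: int, failures: Tuple[str, ...] = ()) -> "State":
        # Rebuild via type(self) so subclasses survive, copying every extra field
        extras = {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if f.name not in ("text", "index", "failures")
        }
        return type(self)(self.text, index, failures, **extras)


@dataclass(frozen=True)
class TaggedState(State):
    """Hypothetical subclass carrying one extra field."""

    tag: str = ""


start = TaggedState("abc", 0, failures=("oops",), tag="debug")
moved = start.progress(2)
print(moved)
```

Because the default for failures is empty, callers that want to keep earlier failures pass them through, which is what at does with self.failures.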
def at(self: Self, index: int) -> Self:
    """Move `index` to the given value, returning a new state."""
    return self.progress(index, self.failures)
Move index to the given value, returning a new state.
def apply(
    self: Self, parser: Parser[T_co], raise_failure: bool = True
) -> Result[T_co]:
    """
    Apply a parser to the current state, returning the parsing `Result` which may
    be a success or failure.
    """
    result = parser.parse_result(self)
    if not result.status and raise_failure:
        raise ResultAsException(result)
    return result
Apply a parser to the current state, returning the parsing Result which may
be a success or failure.
def success(self: Self, value: T) -> Result[T]:
    """Produce a success Result with the given value."""
    return Result(True, self, FailureInfo(-1, ""), value)
Produce a success Result with the given value.
def failure(self: Self, message: str) -> Result[Any]:
    """Create a failure Result with the given failure message."""
    info = FailureInfo(index=self.index, message=message)

    new_state = self.merge_failures((info,))
    return Result(
        False,
        new_state,
        info,
        None,
    )
Create a failure Result with the given failure message.
def merge_failures(self: Self, other: Tuple[FailureInfo, ...]) -> Self:
    furthest_failure = (
        max(info.index for info in self.failures) if self.failures else -1
    )
    result_failures: Tuple[FailureInfo, ...] = self.failures
    for failure in other:
        if furthest_failure < failure.index:
            furthest_failure = failure.index
            result_failures = (failure,)
        elif furthest_failure == failure.index:
            result_failures = (*result_failures, failure)

    return self.progress(self.index, result_failures)
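merge_failures has no docstring, so the rule it applies is sketched here as a standalone function. FailureInfo is a stand-in with the same fields as the real class, and the free function takes the current failures as an argument instead of reading them from a state object. Only failures at the furthest text index survive: a failure further along replaces everything collected so far, one at the same index is added, and earlier ones are dropped.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FailureInfo:
    """Stand-in with the same fields as Parmancer's FailureInfo."""

    index: int
    message: str


def merge_failures(
    current: Tuple[FailureInfo, ...], other: Tuple[FailureInfo, ...]
) -> Tuple[FailureInfo, ...]:
    """Keep only the failures recorded at the furthest text index."""
    furthest = max(info.index for info in current) if current else -1
    result = current
    for failure in other:
        if furthest < failure.index:
            # A failure further into the text replaces everything so far
            furthest = failure.index
            result = (failure,)
        elif furthest == failure.index:
            # A failure at the same index is reported alongside the others
            result = (*result, failure)
    return result


kept = merge_failures(
    (FailureInfo(3, "digits"),),
    (FailureInfo(1, "letter"), FailureInfo(3, "sign"), FailureInfo(7, "comma")),
)
tied = merge_failures((FailureInfo(3, "digits"),), (FailureInfo(3, "sign"),))
print(kept)
print(tied)
```

This is consistent with ParseError listing several messages at one furthest position when alternative parsers all fail at the same index.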
def replace_failures(self: Self, failures: Tuple[FailureInfo, ...]) -> Self:
    """Replace any current failures with new failures."""
    return self.progress(self.index, failures)
Replace any current failures with new failures.
def line_col(self: Self) -> LineColumn:
    """The line and column at the current parser index in the text."""
    return LineColumn.from_index(self.text, self.index)
The line and column at the current parser index in the text.
def context_display(self) -> str:
    """
    Text which displays a context window around the current parser position, with
    an indicator pointing to the character at the current index.
    """
    window, cursor = context_window(self.text, self.index, width=40)
    context: List[str] = []
    for i, line in enumerate(window):
        if i == cursor.line:
            context.append(line.rstrip("\n") + "\n")
            context.append("~" * cursor.column + "^\n")
        else:
            context.append(line)
    return "".join(context)
Text which displays a context window around the current parser position, with an indicator pointing to the character at the current index.
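The indicator line is built from tildes up to the cursor column followed by a caret. The sketch below reproduces only that formatting for a single line; mark_position is a hypothetical helper, and the windowing done by the library's context_window function is left out because its behavior is not shown here.

```python
def mark_position(line: str, column: int) -> str:
    """Hypothetical helper: show `line` with a marker under `column`."""
    # Same marker format as the source: tildes up to the column, then a caret
    return line.rstrip("\n") + "\n" + "~" * column + "^\n"


display = mark_position("Hello World! 1 + x", 17)
print(display, end="")
```

Printed in a monospaced font, the caret lands under the character at the given column, here the unexpected x.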
@dataclass(frozen=True)
class DebugTextState(TextState):
    """
    A TextState subclass that captures parser execution information for debug display.
    When a parser fails, this state can provide detailed information about what
    parsers were attempted and where the failure occurred.
    """

    tree: Node = field(default_factory=Node.default)

    def progress(
        self: Self, index: int, failures: Tuple[FailureInfo, ...] = tuple()
    ) -> Self:
        """
        Override progress to maintain the tree state across state transitions.
        """
        # Use the parent's progress method but ensure we keep our tree
        new_state = super().progress(index, failures)
        # The tree should already be preserved by the parent's progress method
        # since it uses **{field.name: getattr(self, field.name) for field in fields(self)}
        return new_state

    def success(self: Self, value: Any) -> Any:
        # Capture successful parser results in the tree
        stack = ParseStack.get_from_stack()
        if stack.path:  # Only add to tree if we have a valid path
            node = Node(stack.path[-1], [], result=value)
            append_tree(self.tree, stack.path, node)

        return super().success(value)

    def failure(self: Self, message: str) -> Any:
        # Capture failure in the tree when it actually occurs
        stack = ParseStack.get_from_stack()
        if stack.path:  # Only add to tree if we have a valid path
            node = Node(stack.path[-1], [], result=Failure)
            append_tree(self.tree, stack.path, node)

        return super().failure(message)

    def get_debug_info(self: Self) -> str:
        """Get formatted debug information for this state."""
        return format_debug_info(self, self.tree)
A TextState subclass that captures parser execution information for debug display. When a parser fails, this state can provide detailed information about what parsers were attempted and where the failure occurred.
def progress(
    self: Self, index: int, failures: Tuple[FailureInfo, ...] = tuple()
) -> Self:
    """
    Override progress to maintain the tree state across state transitions.
    """
    # Use the parent's progress method but ensure we keep our tree
    new_state = super().progress(index, failures)
    # The tree should already be preserved by the parent's progress method
    # since it uses **{field.name: getattr(self, field.name) for field in fields(self)}
    return new_state
Override progress to maintain the tree state across state transitions.
def success(self: Self, value: Any) -> Any:
    # Capture successful parser results in the tree
    stack = ParseStack.get_from_stack()
    if stack.path:  # Only add to tree if we have a valid path
        node = Node(stack.path[-1], [], result=value)
        append_tree(self.tree, stack.path, node)

    return super().success(value)
Produce a success Result with the given value.
def failure(self: Self, message: str) -> Any:
    # Capture failure in the tree when it actually occurs
    stack = ParseStack.get_from_stack()
    if stack.path:  # Only add to tree if we have a valid path
        node = Node(stack.path[-1], [], result=Failure)
        append_tree(self.tree, stack.path, node)

    return super().failure(message)
Create a failure Result with the given failure message.
def get_debug_info(self: Self) -> str:
    """Get formatted debug information for this state."""
    return format_debug_info(self, self.tree)
Get formatted debug information for this state.