Library Reference

This page documents how to include CiteURL in your Python programming projects.

The first step is to instantiate a Citator, which by default contains all of CiteURL's built-in Templates:

from citeurl import Citator
citator = Citator()

After that, you can feed it text to return a list of Citations it finds:

text = """
Federal law provides that courts should award prevailing civil rights plaintiffs reasonable attorneys fees, 42 USC § 1988(b), and, by discretion, expert fees, id. at (c). This is because the importance of civil rights litigation cannot be measured by a damages judgment. See Riverside v. Rivera, 477 U.S. 561 (1986). But Evans v. Jeff D. upheld a settlement where the plaintiffs got everything they wanted, on condition that they waive attorneys' fees. 475 U.S. 717 (1986). This ruling lets savvy defendants create a wedge between plaintiffs and their attorneys, discouraging civil rights suits and undermining the court's logic in Riverside, 477 U.S. at 574-78.
"""
citations = citator.list_cites(text)

Once you have a list of citations, you can get information about each one:

print(citations[0].text)
# 42 USC § 1988(b)
print(citations[0].tokens)
# {'Title': '42', 'Section': '1988', 'subsection': '(b)'}
print(citations[0].URL)
# https://www.law.cornell.edu/uscode/text/42/1988#b

You can also use insert_links() to insert the citations back into the source text as HTML hyperlinks:

from citeurl import insert_links
output = insert_links(citations, text)

You can also compare citations to one another, to determine whether they reference the same material or a subsection thereof:

art_I = citator.cite('U.S. Const. Art. I')
also_art_I = citator.cite('Article I of the U.S. Constitution')
art_I_sec_3 = citator.cite('U.S. Const. Art. I, § 3')

assert art_I == also_art_I
assert art_I_sec_3 in art_I

Citator

A collection of citation templates, and the tools to match text against them en masse.

Attributes:

Name Type Description
templates

a dictionary of citation templates that this citator will try to match against

__init__(self, defaults=['caselaw', 'general federal law', 'specific federal laws', 'state law', 'secondary sources'], yaml_paths=[], templates={}) special

Create a citator from any combination of CiteURL's default template sets (by default, all of them), plus any custom templates you want, either by pointing to custom YAML files or making Template objects at runtime.

Parameters:

Name Type Description Default
defaults

names of files to load from the citeurl/templates folder. Each file contains one or more of CiteURL's built-in templates relevant to the given topic.

['caselaw', 'general federal law', 'specific federal laws', 'state law', 'secondary sources']
yaml_paths list

paths to custom YAML files to load templates from. These are loaded after the defaults, so they can inherit and/or overwrite them.

[]
templates dict

optional dictionary of Template objects to load directly. These are loaded last, after the defaults and any yaml_paths.

{}
Source code in citeurl/citator.py
def __init__(
    self,
    defaults = [
        'caselaw',
        'general federal law',
        'specific federal laws',
        'state law',
        'secondary sources',
    ],
    yaml_paths: list[str] = [],
    templates: dict[str, Template] = {},
):
    """
    Create a citator from any combination of CiteURL's default
    template sets (by default, all of them), plus any custom
    templates you want, either by pointing to custom YAML files or
    making Template objects at runtime.

    Arguments:
        defaults: names of files to load from the citeurl/templates
            folder. Each file contains one or more of CiteURL's
            built-in templates relevant to the given topic.
        yaml_paths: paths to custom YAML files to load templates
            from. These are loaded after the defaults, so they can
            inherit and/or overwrite them.
        templates: optional dictionary of Template objects to load
            directly. These are loaded last, after the defaults and
            any yaml_paths.
    """
    self.templates = {}

    yamls_path = Path(__file__).parent.absolute() / 'templates'    
    for name in defaults or []:
        yaml_file = yamls_path / f'{name}.yaml'
        self.load_yaml(yaml_file.read_text())

    for path in yaml_paths:
        self.load_yaml(Path(path).read_text())
    self.templates.update(templates)
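Because each source is loaded into the same dictionary, a later source replaces any earlier template with the same name. The following is a minimal sketch of that precedence using plain dictionaries; the names `builtin`, `from_yaml_files`, and `runtime` are hypothetical, and strings stand in for Template objects.

```python
# Hypothetical stand-ins: plain strings play the role of Template objects.
builtin = {'caselaw': 'built-in caselaw', 'usc': 'built-in usc'}
from_yaml_files = {'usc': 'custom usc from YAML'}
runtime = {'caselaw': 'runtime caselaw', 'my court': 'runtime my court'}

templates = {}
# Same order as Citator.__init__: defaults, then yaml_paths, then templates.
for source in (builtin, from_yaml_files, runtime):
    templates.update(source)

print(templates['usc'])      # custom usc from YAML
print(templates['caselaw'])  # runtime caselaw
```

Names that were overwritten keep their original position in the dictionary, so overriding a template should not change the order in which templates are tried.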

cite(self, text, broad=True)

Check the given text against each of the citator's templates and return the first citation detected, or None.

If broad is true, matching is case-insensitive and each template's broad regexes are used in addition to its normal regexes.

Source code in citeurl/citator.py
def cite(self, text: str, broad: bool=True) -> Citation:
    """
    Check the given text against each of the citator's templates and
    return the first citation detected, or None.

    If broad is true, matching is case-insensitive and each
    template's broad regexes are used in addition to its normal
    regexes.
    """
    for template in self.templates.values():
        cite = template.cite(text, broad=broad)
        if cite:
            return cite
    else:
        return None
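To illustrate the control flow only, here is a standard-library sketch of "try each template in order and return the first hit", with broad mode reduced to case-insensitive matching. The `PATTERNS` table and `first_match` function are hypothetical and are not CiteURL's built-in templates or API.

```python
import re

# Hypothetical patterns; these are not CiteURL's built-in templates.
PATTERNS = {
    'us code': r'(?P<title>\d+) U\.?S\.?C\.? § (?P<section>\d+)',
    'us reports': r'(?P<volume>\d+) U\.S\. (?P<page>\d+)',
}

def first_match(text, broad=True):
    """Return (template name, tokens) for the first template that matches."""
    flags = re.I if broad else 0
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags)
        if match:
            return name, match.groupdict()
    return None

print(first_match('see 42 usc § 1983'))
print(first_match('see 42 usc § 1983', broad=False))
```

Note that "first" means the first template in dictionary order that matches, not the citation that appears earliest in the text.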

from_yaml(yaml) classmethod

Create a citator from scratch (i.e. without the default templates) by loading templates from the specified YAML string.

Source code in citeurl/citator.py
@classmethod
def from_yaml(cls, yaml: str):
    """
    Create a citator from scratch (i.e. without the default
    templates) by loading templates from the specified YAML string.
    """
    citator = cls(defaults=None)
    citator.load_yaml(yaml)
    return citator

list_cites(self, text, id_breaks=None)

Find all citations in the given text, whether longform, shortform, or idform. They will be listed in order of appearance. If any two citations overlap, the shorter one will be deleted.

Wherever the id_breaks pattern matches, it will interrupt chains of idform citations.

Source code in citeurl/citator.py
def list_cites(
    self,
    text: str,
    id_breaks: re.Pattern = None,
) -> list[Citation]:
    """
    Find all citations in the given text, whether longform,
    shortform, or idform. They will be listed in order of
    appearance. If any two citations overlap, the shorter one will
    be deleted. 

    Wherever the id_breaks pattern matches, it will interrupt chains
    of idform citations.
    """
    # first get a list of all long and shortform (not id.) citations
    longforms = []
    for template in self.templates.values():
        longforms += template.list_longform_cites(text)

    shortforms = []
    for citation in longforms:
        shortforms += citation.get_shortform_cites()

    citations = longforms + shortforms
    _sort_and_remove_overlaps(citations)

    # Figure out where to interrupt chains of idform citations,
    # i.e. anywhere a longform or shortform citation starts, plus
    # the start of any substring that matches the id_breaks pattern
    breakpoints = [c.span[0] for c in citations]
    if id_breaks:
        breakpoints += [
            match.span()[0] for match in
            id_breaks.finditer(text)
        ]
    breakpoints = sorted(set(breakpoints))

    # for each cite, look for idform citations until the next cite
    # or until the next breakpoint
    idforms = []
    for cite in citations:
        # find the next relevant breakpoint, and delete any
        # breakpoints that are already behind the current citation
        for i, breakpoint in enumerate(breakpoints):
            if breakpoint >= cite.span[1]:
                breakpoints = breakpoints[i:]
                break
        try:
            breakpoint = breakpoints[0]
        except IndexError:
            breakpoint = None

        # find the first idform reference to the citation, then the
        # first idform reference to that idform, and so on, until
        # the breakpoint
        idform = cite.get_idform_cite(until_index=breakpoint)
        while idform:
            idforms.append(idform)
            idform = idform.get_idform_cite(until_index=breakpoint)

    citations += idforms
    _sort_and_remove_overlaps(citations)
    return citations
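The helper `_sort_and_remove_overlaps` is not shown on this page. As a rough sketch of the behavior described above (order of appearance, shorter citation dropped on overlap), the following works on plain `(start, end)` tuples. It is an assumption about the approach, not the library's implementation, and unlike the helper it returns a new list instead of modifying its argument.

```python
def sort_and_remove_overlaps(spans):
    """Sort (start, end) spans by position; where two overlap, keep the longer."""
    kept = []
    for span in sorted(spans):
        if kept and span[0] < kept[-1][1]:      # overlaps the previous span
            if span[1] - span[0] > kept[-1][1] - kept[-1][0]:
                kept[-1] = span                 # the new span is longer
            continue
        kept.append(span)
    return kept

spans = [(30, 50), (0, 17), (10, 17), (45, 48)]
result = sort_and_remove_overlaps(spans)
print(result)  # [(0, 17), (30, 50)]
```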

load_yaml(self, yaml)

Load templates from the given YAML, overwriting any existing templates with the same name.

Source code in citeurl/citator.py
def load_yaml(self, yaml: str):
    """
    Load templates from the given YAML, overwriting any existing
    templates with the same name.
    """
    for name, data in safe_load(yaml).items():
        self.templates[name] = Template.from_dict(
            name, data, inheritables=self.templates
        )

to_yaml(self)

Save this citator to a YAML string to load later

Source code in citeurl/citator.py
def to_yaml(self):
    "Save this citator to a YAML string to load later"
    yamls = [t.to_yaml() for t in self.templates.values()]
    return '\n\n'.join(yamls)

Citation

A legal reference found in text.

Attributes:

Name Type Description
tokens

dictionary of the values that define this citation, such as its volume and page number, or its title, section, and subsection

URL

the location, if any, where this citation can be found online, defined by the template's URL_builder

name

a uniform, human-readable representation of this citation, written by the template's name_builder

text

the actual text of this citation as found in the source text

source_text

the full text that this citation was found in

template

the template whose regexes found this citation or its parent

parent

the earlier citation, if any, that this citation is a shortform or idform child of

raw_tokens

dictionary of tokens as captured in the original regex match, before normalization. Note that for child citations, raw_tokens will include any raw_tokens inferred from the parent citation.

idform_regexes

list of regex pattern objects to find child citations later in the text, valid until the next different citation appears.

shortform_regexes

list of regex pattern objects to find child citations anywhere in the subsequent text

__contains__(self, other_cite) special

Returns True if both citations are from templates with the same name, and the only difference between their tokens is that the other one has a more specific (i.e. higher-indexed) token than any of this one's. Severable tokens are considered a match if the other token's value starts with this one's.

Source code in citeurl/citation.py
def __contains__(self, other_cite):
    """
    Returns True if both citations are from templates with the same
    name, and the only difference between their tokens is that the
    other one has a more specific (i.e. higher-indexed) token than
    any of this one's. Severable tokens are considered a match if
    the other token's value *starts with* this one's.
    """
    if (
        other_cite.template.name != self.template.name
        or other_cite.tokens == self.tokens
    ):
        return False
    for key, value in self.tokens.items():
        if value and other_cite.tokens.get(key) != value:
            if (
                self.template.tokens[key].severable
                and other_cite.tokens[key]
                and other_cite.tokens[key].startswith(value)
            ):
                continue
            else:
                return False
    else:
        return True
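The containment rule can be tried out on plain token dictionaries. This sketch mirrors the loop above but leaves out the template-name check and takes the severable token names as an argument; the `contains` function and the token names are illustrative assumptions.

```python
def contains(parent, child, severable=()):
    """Return True if the `child` tokens describe a subdivision of `parent`."""
    if parent == child:
        return False
    for key, value in parent.items():
        if not value or child.get(key) == value:
            continue
        child_value = child.get(key)
        if key in severable and child_value and child_value.startswith(value):
            continue  # e.g. '(b)(2)' is a prefix of '(b)(2)(A)'
        return False
    return True

art_I = {'article': 'I', 'section': None}
art_I_sec_3 = {'article': 'I', 'section': '3'}
print(contains(art_I, art_I_sec_3))   # True
print(contains(art_I_sec_3, art_I))   # False

broad = {'title': '42', 'section': '1988', 'subsection': '(b)(2)'}
narrow = {'title': '42', 'section': '1988', 'subsection': '(b)(2)(A)'}
print(contains(broad, narrow, severable=('subsection',)))  # True
```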

__eq__(self, other_cite) special

Returns True if both citations are from templates with the same name, and they have the exact same token values.

Source code in citeurl/citation.py
def __eq__(self, other_cite):
    """
    Returns True if both citations are from templates with the same
    name, and they have the exact same token values.
    """
    return (
        other_cite.template.name == self.template.name
        and other_cite.tokens == self.tokens
    )

Template

A pattern to recognize a single kind of citation and extract information from it.

__init__(self, name, tokens={}, meta={}, patterns=[], broad_patterns=[], shortform_patterns=[], idform_patterns=[], name_builder=None, URL_builder=None, inherit_template=None) special

Parameters:

Name Type Description Default
name str

the name of this template

required
tokens dict

The full dictionary of TokenTypes that citations from this template can contain. These must be listed in order from least-specific to most. For instance, the U.S. Constitution's template puts 'article' before 'section' before 'clause', because articles contain sections, and sections contain clauses.

{}
patterns list

Patterns are essentially regexes to recognize long-form citations to this template. However, wherever a token would appear in the regex, it should be replaced by the name of the token, enclosed in curly braces.

Patterns are matched in the order that they are listed, so if there is a pattern that can only find a subset of tokens, it should be listed after the more-complete pattern so that the better match won't be precluded.

[]
broad_patterns list

Same as patterns, except that they will only be used in contexts like search engines, where convenience is more important than avoiding false positive matches. When used, they will be used in addition to the normal patterns.

[]
shortform_patterns list

Same as patterns, but these will only go into effect after a longform citation has been recognized. If a shortform pattern includes "same TOKEN_NAME" in curly braces, e.g. "{same volume}", the bracketed portion will be replaced with the exact text of the corresponding raw_token from the long-form citation.

[]
idform_patterns list

Same as shortform_patterns, except that they will only be used to scan text until the next different citation occurs.

[]
URL_builder StringBuilder

StringBuilder to construct URLs for found citations

None
name_builder StringBuilder

StringBuilder to construct canonical names of found citations

None
meta dict

Optional metadata relating to this template. Patterns and StringBuilders can access metadata fields as if they were tokens, though fields can be overridden by tokens with the same name.

{}
inherit_template

another Template whose values this one should copy unless expressly overwritten.

None
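The "{same TOKEN_NAME}" substitution described for shortform_patterns can be approximated with the standard library. In this sketch, `fill_same_tokens` is a hypothetical helper, the raw token is escaped with `re.escape` (an assumption about how literal text should be handled), and the pincite is written as an ordinary named group instead of a `{pincite}` token.

```python
import re

def fill_same_tokens(pattern, raw_tokens):
    """Replace each '{same NAME}' with the escaped raw text of that token."""
    def substitute(match):
        return re.escape(raw_tokens[match.group(1)])
    return re.sub(r'\{same ([^}]+)\}', substitute, pattern)

# Raw tokens as they might be captured from "477 U.S. 561"
raw_tokens = {'volume': '477', 'reporter': 'U.S.', 'page': '561'}
shortform = r'{same volume} {same reporter} at (?P<pincite>\d+)'

regex = re.compile(fill_same_tokens(shortform, raw_tokens))
match = regex.search("the court's logic in Riverside, 477 U.S. at 574-78.")
print(match.group('pincite'))  # 574
```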
Source code in citeurl/citator.py
def __init__(
    self,
    name: str,
    tokens: dict[str, TokenType] = {},
    meta: dict[str, str] = {},
    patterns: list[str] = [],
    broad_patterns: list[str] = [],
    shortform_patterns: list[str] = [],
    idform_patterns: list[str] = [],
    name_builder: StringBuilder = None,
    URL_builder: StringBuilder = None,
    inherit_template = None,
):
    """
    Arguments:
        name: the name of this template

        tokens: The full dictionary of TokenTypes that citations from
            this template can contain. These must be listed in order
            from least-specific to most. For instance, the U.S.
            Constitution's template puts 'article' before 'section'
            before 'clause', because articles contain sections, and
            sections contain clauses.

        patterns: Patterns are essentially regexes to recognize
            long-form citations to this template. However, wherever
            a token would appear in the regex, it should be replaced
            by the name of the token, enclosed in curly braces.

            Patterns are matched in the order that they are listed,
            so if there is a pattern that can only find a subset of
            tokens, it should be listed after the more-complete
            pattern so that the better match won't be precluded.

        broad_patterns: Same as `patterns`, except that they will
            only be used in contexts like search engines, where
            convenience is more important than avoiding false
            positive matches. When used, they will be used in
            addition to the normal patterns.

        shortform_patterns: Same as `patterns`, but these will only
            go into effect after a longform citation has been
            recognized. If a shortform pattern includes "same
            TOKEN_NAME" in curly braces, e.g. "{same volume}", the
            bracketed portion will be replaced with the exact text
            of the corresponding `raw_token` from the long-form
            citation.

        idform_patterns: Same as `shortform_patterns`, except that
            they will only be used to scan text until the next
            different citation occurs.

        URL_builder: `StringBuilder` to construct URLs for found
            citations

        name_builder: `StringBuilder` to construct canonical names
            of found citations

        meta: Optional metadata relating to this template. Patterns
            and StringBuilders can access metadata fields as if they
            were tokens, though fields can be overridden by tokens
            with the same name.

        inherit_template: another `Template` whose values this one
            should copy unless expressly overwritten.
    """
    kwargs = locals()
    for attr, default in {
        'name':               None,
        'tokens':             {},
        'patterns':           [],
        'broad_patterns':     [],
        'shortform_patterns': [],
        'idform_patterns':    [],
        'URL_builder':        None,
        'name_builder':       None,
        'meta':               {},
    }.items():
        if inherit_template and kwargs[attr] == default:
            value = inherit_template.__dict__.get(attr)
        elif attr.endswith('patterns') and not kwargs[attr]:
            value = []
        else:
            value = kwargs[attr]
        self.__dict__[attr] = value

    # update inherited StringBuilders with the correct metadata
    if inherit_template and self.meta:
        if self.URL_builder:
            self.URL_builder = copy(self.URL_builder)
            self.URL_builder.defaults = self.meta
        if self.name_builder:
            self.name_builder = copy(self.name_builder)
            self.name_builder.defaults = self.meta

    # use the template's metadata and tokens to make a dictionary
    # of replacements to insert into the regexes before compilation
    replacements = {k:str(v) for (k, v) in self.meta.items()}
    replacements.update({
        k:fr'(?P<{k}>{v.regex})(?!\w)'
        for (k,v) in self.tokens.items()
    })

    # compile the template's regexes and broad_regexes
    self.regexes = []
    self.broad_regexes = []
    for kind in ['regexes', 'broad_regexes']:
        if kind == 'broad_regexes':
            pattern_list = self.patterns + self.broad_patterns
            flags = re.I
        else:
            pattern_list = self.patterns
            flags = 0

        for p in pattern_list:
            pattern = process_pattern(
                p,
                replacements,
                add_word_breaks=True)
            try:
                regex = re.compile(pattern, flags)
                self.__dict__[kind].append(regex)
            except re.error as e:
                i = 'broad ' if kind == 'broad_regexes' else ''
                raise re.error(
                    f'{self} template\'s {i}pattern "{pattern}" has '
                    f'an error: {e}'
                )

    self._processed_shortforms = [
        process_pattern(p, replacements, add_word_breaks=True)
        for p in self.shortform_patterns
    ]
    self._processed_idforms = [
        process_pattern(p, replacements, add_word_breaks=True)
        for p in self.idform_patterns
    ]
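The replacement step in the listing above turns each `{token}` placeholder into a named capture group followed by a `(?!\w)` lookahead. A simplified, standalone sketch of that idea follows; `fill_pattern` is a hypothetical stand-in for `process_pattern`, whose source is not shown here, and it ignores metadata and word-break handling.

```python
import re

def fill_pattern(pattern, token_regexes):
    """Swap each '{name}' for a named capture group built from that token's regex."""
    for name, regex in token_regexes.items():
        pattern = pattern.replace('{' + name + '}', rf'(?P<{name}>{regex})(?!\w)')
    return pattern

# Hypothetical token regexes and pattern, loosely modeled on a U.S. Code cite
token_regexes = {'title': r'\d+', 'section': r'\d+[A-Za-z]?'}
pattern = r'{title} U\.?S\.?C\.? § {section}'

regex = re.compile(fill_pattern(pattern, token_regexes))
print(regex.search('See 42 USC § 1983.').groupdict())
```

The lookahead keeps a token from matching only the front of a longer word, so in this sketch "42 USC § 1983abc" produces no match at all.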

cite(self, text, broad=True, span=(0,))

Return the first citation that matches this template. If 'broad' is True, case-insensitive matching and broad regex patterns will be used. If no matches are found, return None.

Source code in citeurl/citator.py
def cite(self, text, broad: bool=True, span: tuple=(0,)) -> Citation:
    """
    Return the first citation that matches this template. If 'broad'
    is True, case-insensitive matching and broad regex patterns will
    be used. If no matches are found, return None.
    """
    regexes = self.broad_regexes if broad else self.regexes
    matches = match_regexes(text, regexes, span=span)
    for match in matches:
        try:
            return Citation(match, self)
        except SyntaxError: # invalid citation
            continue
    else:
        return None

from_dict(name, values, inheritables={}) classmethod

Return a template from a dictionary of values, like a dictionary created by parsing a template from YAML format.

Source code in citeurl/citator.py
@classmethod
def from_dict(cls, name: str, values: dict, inheritables: dict={}):
    """
    Return a template from a dictionary of values, like a dictionary
    created by parsing a template from YAML format.
    """
    values = {
        k.replace(' ', '_'):v
        for k,v in values.items()
    }

    # when pattern is listed in singular form,
    # replace it with a one-item list
    items = values.items()
    values = {}
    for key, value in items:
        if key.endswith('pattern'):
            values[key + 's'] = [value]
        else:
            values[key] = value

    # unrelated: when a single pattern is split
    # into a list (likely to take advantage of
    # YAML anchors), join it into one string
    for k,v in values.items():
        if not k.endswith('patterns'):
            continue
        elif v is None:
            values[k] = None
            continue
        for i, pattern in enumerate(v):
            if type(pattern) is list:
                values[k][i] = ''.join(pattern)

    inherit = values.get('inherit')

    if inherit:
        values.pop('inherit')
        try:
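            # index directly so an unknown name raises the KeyError
            # handled below (dict.get would silently return None)
            values['inherit_template'] = inheritables[inherit]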
        except KeyError:
            raise KeyError(
                f'Template "{name}" tried to inherit unknown '
                + f'template "{inherit}"'
            )

    for key in ['name_builder', 'URL_builder']:
        data = values.get(key)
        if data:
            data['defaults'] = values.get('meta') or {}
            values[key] = StringBuilder.from_dict(data)
    values['tokens'] = {
        k: TokenType.from_dict(k, v)
        for k,v in values.get('tokens', {}).items()
    }
    return cls(name=name, **values)
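The key cleanup at the top of `from_dict` is easy to miss, so here it is isolated as a small sketch. `normalize_keys` is a hypothetical function that reproduces only those first steps (spaces to underscores, singular `pattern` keys wrapped in lists, fragment lists joined); the sample dictionary is made up for illustration.

```python
def normalize_keys(values):
    """Mimic the key cleanup at the start of Template.from_dict."""
    normalized = {}
    for key, value in values.items():
        key = key.replace(' ', '_')
        if key.endswith('pattern'):       # singular form: wrap in a list
            normalized[key + 's'] = [value]
        else:
            normalized[key] = value
    # a pattern written as a list of fragments is joined into one string
    for key, patterns in normalized.items():
        if key.endswith('patterns') and patterns is not None:
            normalized[key] = [
                ''.join(p) if isinstance(p, list) else p for p in patterns
            ]
    return normalized

parsed = {
    'pattern': ['{title} ', 'USC § {section}'],
    'idform pattern': 'Id. at {section}',
    'URL builder': {'parts': ['https://example.com/{title}/{section}']},
}
cleaned = normalize_keys(parsed)
print(cleaned['patterns'])         # ['{title} USC § {section}']
print(cleaned['idform_patterns'])  # ['Id. at {section}']
```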

list_longform_cites(self, text, broad=False, span=(0,))

Get a list of all long-form citations to this template found in the given text.

Source code in citeurl/citator.py
def list_longform_cites(self, text, broad: bool=False, span: tuple=(0,)):
    """
    Get a list of all long-form citations to this template found in
    the given text.
    """
    cites = []
    regexes = self.broad_regexes if broad else self.regexes
    for match in match_regexes(text, regexes, span=span):
        try:
            cites.append(Citation(match, self))
        except SyntaxError:
            continue
    return cites

to_dict(self)

save this Template to a dictionary of values

Source code in citeurl/citator.py
def to_dict(self) -> dict:
    "save this Template to a dictionary of values"
    output = {}
    if self.meta:
        output['meta'] = self.meta
    output['tokens'] = {
        k:v.to_dict() for k, v in self.tokens.items()
    }
    for key in ['patterns', 'shortform_patterns', 'idform_patterns']:
        value = self.__dict__.get(key)
        if not value:
            continue
        elif len(value) > 1:
            output[key] = value
        else: # de-pluralize lists that contain only one pattern
            output[key[:-1]] = value[0]
    for key in ['name_builder', 'URL_builder']:
        if self.__dict__.get(key):
            output[key] = self.__dict__[key].to_dict()

    spaced_output = {k.replace('_', ' '):v for k, v in output.items()}

    return spaced_output

to_yaml(self)

save this Template to a YAML string

Source code in citeurl/citator.py
def to_yaml(self) -> str:
    "save this Template to a YAML string"
    return safe_dump(
        {self.name: self.to_dict()},
        sort_keys = False,
        allow_unicode = True,
    )

TokenType

These objects represent categories of tokens that might be found in a citation.

Attributes:

Name Type Description
regex

A regular expression that matches the actual text of the token as found in any document, like the "42" in "42 USC § 1983" or the "Fourteenth" in "The Fourteenth Amendment". This regex will automatically be enclosed in a named capture group and inserted into any of the template's match patterns wherever the token's name appears in curly braces.

edits

Steps to normalize the token as captured in the regex into a value that is consistent across multiple styles.

default

Set the token to this value if it is not found in the citation.

severable

If two citations only differ based on this token, and only because one of the tokens extends longer than the other, e.g. "(b)(2)" and "(b)(2)(A)", then severable means that the former citation is thought to encompass the latter.

from_dict(name, data) classmethod

load a TokenType from a dictionary of values

Source code in citeurl/tokens.py
@classmethod
def from_dict(cls, name: str, data: dict):
    "load a TokenType from a dictionary of values"
    return cls(
        regex = data['regex'],
        default = data.get('default'),
        edits = [
            TokenOperation.from_dict(v)
            for v in data.get('edits', [])
        ],
        severable=data.get('severable', False)
    )

to_dict(self)

save this TokenType to a dictionary for storage in YAML format

Source code in citeurl/tokens.py
def to_dict(self) -> dict:
    "save this TokenType to a dictionary for storage in YAML format"
    output = {'regex': self.regex}
    if self.edits:
        output['edits'] = [
            e.to_dict() for e in self.edits
        ]
    if self.default:
        output['default'] = self.default
    if self.severable:
        output['severable'] = True
    return output

TokenOperation

A function to perform a predefined string manipulation

__init__(self, action, data, mandatory=True, token=None, output=None) special

Parameters:

Name Type Description Default
action str

The kind of string manipulation that this operation will perform, using the given data. There are a few different options:

'sub': Regex substitution to perform on the text. Needs a list of two values: [PATTERN, REPLACEMENT]

'lookup': Check if the token matches any of the given regexes (via case-insensitive matching), and if so, replace it with the corresponding value. Needs a dictionary of regex: replacement pairs.

'case': Capitalize the token in the specified way. Options are 'upper', 'lower', and 'title'.

'lpad': Left pad the token with zeros until it is the specified number of characters long. Requires an int specifying the number of characters. You can also specify the padding character by providing a tuple: (MINIMUM_LENGTH, PADDING_CHARACTER).

'number_style': Assume that the token is a number, either in the form of digits, Roman numerals, or number words like "thirty-seven". Convert it into the specified number format, which can be any of these:

'cardinal', e.g. "twenty-seven"

'cardinal spaced', e.g. "twenty seven"

'cardinal unspaced', e.g. "twentyseven"

'ordinal', e.g. "twenty-seventh"

'ordinal spaced', e.g. "twenty seventh"

'ordinal unspaced', e.g. "twentyseventh"

'roman numeral', e.g. 'xxvii'

'digit', e.g. '27'

Note that number formatting only works for positive whole numbers that do not exceed 40.

required
data

any data that a given action needs specified, as described above

required
mandatory bool

whether a failed lookup or format action should invalidate the entire citation

True
token str

Necessary for operations in StringBuilders. This value lets you provide the name of the input token to use, allowing you to then use the modify_dict() method.

None
output str

If this value is set, modify_dict() will save the operation's output to the dictionary key with this name instead of modifying the input token in place.

None
Source code in citeurl/tokens.py
def __init__(
    self,
    action: str,
    data,
    mandatory: bool = True,
    token: str = None,
    output: str = None,
):
    """
    Arguments:
        action: The kind of string manipulation that this operation
            will perform, using the given data. There are a few
            different options:

            'sub': Regex substitution to perform on the text. Needs
                a list of two values: [PATTERN, REPLACEMENT]

            'lookup': Check if the token matches any of the given
                regexes (via case-insensitive matching), and if so,
                replace it with the corresponding value. Needs a
                dictionary of `regex`: `replacement` pairs.

            'case': Capitalize the token in the specified way.
                Options are 'upper', 'lower', and 'title'.

            'lpad': Left pad the token with zeros until it is the
                specified number of characters long. Requires an
                int specifying the number of characters. You can
                also specify the padding character by providing a
                tuple: (MINIMUM_LENGTH, PADDING_CHARACTER).

            'number_style': Assume that the token is a number,
                either in the form of digits, Roman numerals, or
                number words like "thirty-seven". Convert it into
                the specified number format, which can be any of
                these:

                'cardinal', e.g. "twenty-seven"

                'cardinal spaced', e.g. "twenty seven"

                'cardinal unspaced', e.g. "twentyseven"

                'ordinal', e.g. "twenty-seventh"

                'ordinal spaced', e.g. "twenty seventh"

                'ordinal unspaced', e.g. "twentyseventh"

                'roman numeral', e.g. 'xxvii'

                'digit', e.g. '27'

                Note that number formatting only works for positive
                whole numbers that do not exceed 40.

        data: any data that a given action needs specified, as
            described above

        mandatory: whether a failed lookup or format action should
            invalidate the entire citation

        token: Necessary for operations in StringBuilders. This
            value lets you provide the name of input token to use,
            allowing you to then use the modify_dict() method.

        output: If this value is set, modify_dict() will save the
            operation's output to the dictionary key with this name
            instead of modifying the input token in place.
    """
    if action == 'sub':
        self.func = lambda x: re.sub(data[0], data[1], x)
    elif action == 'lookup':
        table = {
            re.compile(k, flags=re.I):v
            for k, v in data.items()
        }
        self.func = lambda x: self._lookup(x, table, mandatory)
    elif action == 'case':
        self.func = lambda x: self._set_case(x, data)
    elif action == 'lpad':
        self.func = lambda x: self._left_pad(x, data)
    elif action == 'number_style':
        action_options = ['cardinal', 'ordinal', 'roman', 'digit']
        if data not in action_options:
            raise SyntaxError(
                f'{data} is not a valid number style. Valid options: '
                f'{action_options}'
            )
        self.func = lambda x: self._number_style(x, data, mandatory)
    else:
        raise SyntaxError(
            f'{action} is not a defined token operation.'
        )

    self.action = action
    self.data = data
    self.mandatory = mandatory
    self.token = token
    self.output = output
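To make the simpler actions concrete, the sketch below re-implements 'sub', 'case', 'lpad', and 'lookup' with the standard library. It is an approximation under stated assumptions: `apply_edit` is hypothetical, the lookup uses `re.fullmatch`, failures raise `ValueError`, and the `mandatory` flag and 'number_style' action are left out. The private helpers the class actually calls (`_lookup`, `_set_case`, `_left_pad`) are not shown on this page and may behave differently.

```python
import re

def apply_edit(token, action, data):
    """Apply one simplified edit to a token string."""
    if action == 'sub':
        pattern, replacement = data
        return re.sub(pattern, replacement, token)
    if action == 'case':
        return {'upper': token.upper, 'lower': token.lower, 'title': token.title}[data]()
    if action == 'lpad':
        # data is either a length or a (length, padding character) pair
        length, char = data if isinstance(data, tuple) else (data, '0')
        return token.rjust(length, char)
    if action == 'lookup':
        for pattern, replacement in data.items():
            if re.fullmatch(pattern, token, flags=re.I):
                return replacement
        raise ValueError(f'{token!r} is not in the lookup table')
    raise ValueError(f'{action} is not a defined token operation')

states = {r'cal(if)?\.?': 'CA', r'n\.?y\.?': 'NY'}  # hypothetical lookup table
print(apply_edit('(b)', 'sub', [r'[()]', '']))   # b
print(apply_edit('usc', 'case', 'upper'))        # USC
print(apply_edit('7', 'lpad', 3))                # 007
print(apply_edit('Calif.', 'lookup', states))    # CA
```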

from_dict(data) classmethod

load a TokenOperation from a dictionary of values

Source code in citeurl/tokens.py
@classmethod
def from_dict(cls, data: dict):
    "load a TokenOperation from a dictionary of values"
    for key in ['sub', 'lookup', 'case', 'lpad', 'number style']:
        value = data.get(key)
        if value:
            # dictionary keys use spaces; action names use underscores
            action = key.replace(' ', '_')
            action_data = value
            break
    else:
        # no recognized action key was present in the dictionary
        raise SyntaxError(
            f'{data} does not contain a defined token operation.'
        )
    mandatory = data.get('mandatory', True)
    token = data.get('token')
    output = data.get('output')
    return cls(action, action_data, mandatory, token, output)

modify_dict(self, tokens)

apply this operation to a dictionary of tokens, editing them as appropriate

Source code in citeurl/tokens.py
def modify_dict(self, tokens: dict):
    """
    apply this operation to a dictionary of tokens,
    editing them as appropriate
    """
    if not tokens.get(self.token):
        return
    if self.output:
        tokens[self.output] = self.func(tokens[self.token])
    else:
        tokens[self.token] = self.func(tokens[self.token])
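
To make the in-place versus `output` behavior concrete, here is a minimal standalone sketch of the same logic. The function name `modify_tokens` and the sample token values are hypothetical; it does not use CiteURL itself.

```python
def modify_tokens(tokens, func, token, output=None):
    # skip silently when the input token is missing or empty
    if not tokens.get(token):
        return
    # write to the output key if one is given, else overwrite the input token
    tokens[output or token] = func(tokens[token])

tokens = {'title': '42', 'section': '1988'}

# no output key: the 'title' token is edited in place
modify_tokens(tokens, lambda x: x.zfill(3), 'title')

# 'subsection' is not set, so nothing happens
modify_tokens(tokens, str.upper, 'subsection')

# output key given: 'section' is untouched and a new 'anchor' token appears
modify_tokens(tokens, lambda x: f'sec-{x}', 'section', output='anchor')

print(tokens)
# {'title': '042', 'section': '1988', 'anchor': 'sec-1988'}
```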

to_dict(self)

save this TokenOperation to a dictionary of values

Source code in citeurl/tokens.py
def to_dict(self) -> dict:
    "save this TokenOperation to a dictionary of values"
    output = {}
    for key in ['token', 'output']:
        if self.__dict__.get(key):
            output[key] = self.__dict__[key]
    output[self.action] = self.data
    if not self.mandatory:
        output['mandatory'] = False

    spaced_output = {k.replace('_', ' '):v for k, v in output.items()}

    return spaced_output
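
The dictionary shape that from_dict() reads and to_dict() writes can be sketched without CiteURL as follows. The functions `read_operation` and `write_operation` are hypothetical stand-ins that store the fields in a plain dict rather than a TokenOperation, and the sample values are invented for illustration.

```python
ACTIONS = ['sub', 'lookup', 'case', 'lpad', 'number style']

def read_operation(data: dict) -> dict:
    # the first action key present wins; spaces become underscores
    for key in ACTIONS:
        if data.get(key):
            return {
                'action': key.replace(' ', '_'),
                'data': data[key],
                'mandatory': data.get('mandatory', True),
                'token': data.get('token'),
                'output': data.get('output'),
            }
    raise ValueError(f'no recognized action in {data}')

def write_operation(op: dict) -> dict:
    # only set values are saved, and 'mandatory' only when it is False
    saved = {k: op[k] for k in ('token', 'output') if op.get(k)}
    saved[op['action']] = op['data']
    if not op['mandatory']:
        saved['mandatory'] = False
    # underscores become spaces again on the way out
    return {k.replace('_', ' '): v for k, v in saved.items()}

original = {'token': 'article', 'number style': 'roman', 'mandatory': False}
loaded = read_operation(original)

print(loaded['action'])                     # number_style
print(write_operation(loaded) == original)  # True
```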

StringBuilder

A tool that takes a dictionary of values and constructs a piece of text from them. This is used for citation templates' name builders and URL builders.

Attributes:

Name Type Description
parts

A list of strings that will be concatenated to create the string. Parts may contain bracketed references to citations' token values as well as templates' metadata. If a part references a token whose value is not set, the part will be omitted from the created string.

edits

A list of TokenOperations that will be performed on the provided tokens before the string is constructed. If the edits have output values, it is possible for them to define entirely new tokens for the sole purpose of building the string.

defaults

A dictionary of default token values to use when not overwritten by the citation. Generally these are provided by the template's meta attribute.
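
The way parts, tokens, and defaults interact can be sketched roughly as below. This is not CiteURL's implementation: the function `build_string` is hypothetical, the example URL is invented, edits are left out entirely, and the sketch assumes the bracketed references are curly-brace placeholders such as `{title}`.

```python
def build_string(parts, tokens, defaults=None):
    # citation tokens with a value override the defaults
    set_tokens = {k: v for k, v in tokens.items() if v}
    values = {**(defaults or {}), **set_tokens}

    pieces = []
    for part in parts:
        try:
            pieces.append(part.format(**values))
        except KeyError:
            # the part references a token with no value, so omit it
            continue
    return ''.join(pieces)

parts = ['https://example.com/{title}/{section}', '#{subsection}']

print(build_string(parts, {'title': '42', 'section': '1988', 'subsection': 'b'}))
# https://example.com/42/1988#b

print(build_string(parts, {'title': '42', 'section': '1988'}))
# https://example.com/42/1988
```

Splitting the string into parts is what lets an optional token, like the subsection here, drop out cleanly along with its surrounding punctuation.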

from_dict(data) classmethod

load StringBuilder from dictionary of values

Source code in citeurl/tokens.py
@classmethod
def from_dict(cls, data: dict):
    "load StringBuilder from dictionary of values"
    edits = [
        TokenOperation.from_dict(o)
        for o in data.get('edits', [])
    ]
    parts = data['parts']
    defaults = data.get('defaults') or {}
    return cls(parts, edits, defaults)

to_dict(self)

save StringBuilder to a dictionary of values

Source code in citeurl/tokens.py
def to_dict(self) -> dict:
    "save StringBuilder to a dictionary of values"
    output = {'parts': self.parts}
    if self.edits:
        output['edits'] = [op.to_dict() for op in self.edits]
    return output

insert_links()

Add each citation back into the given text as HTML hyperlinks, placed via the spans where they were initially found.

Parameters:

Name Type Description Default
citations list

list of citation objects to insert back into the text

required
text str

the string where all the citations were found.

required
attrs dict

various HTML link attributes to give each inserted link

{'class': 'citation'}
add_title bool

whether to use citation.name for link titles

True
URL_optional bool

whether to insert a hyperlink even when the citation does not have an associated URL

False
redundant_links bool

whether to insert a hyperlink if it would go to the same URL as the previous link

True

Returns:

Type Description
str

text, with an HTML a element for each citation.

Source code in citeurl/hyperlink.py
def insert_links(
    citations: list[Citation],
    text: str,
    attrs: dict = {'class': 'citation'},
    add_title: bool = True,
    URL_optional: bool = False,
    redundant_links: bool = True,
) -> str:
    """
    Add each citation back into the given text as HTML hyperlinks,
    placed via the spans where they were initially found.

    Arguments:
        citations: list of citation objects to insert back into the text
        text: the string where all the citations were found.
        attrs: various HTML link attributes to give each inserted link
        add_title: whether to use citation.name for link titles
        URL_optional: whether to insert a hyperlink even when the
            citation does not have an associated URL
        redundant_links: whether to insert a hyperlink if it would go to
            the same URL as the previous link

    Returns:
        text, with an HTML `a` element for each citation.
    """
    # copy attrs so neither the caller's dict nor the shared default is mutated
    attrs = dict(attrs)
    offset = 0
    last_URL = None
    for cite in citations:
        attrs['href'] = cite.URL

        if not cite.URL and not URL_optional:
            continue
        if not redundant_links and cite.URL == last_URL:
            continue

        if add_title:
            attrs['title'] = cite.name

        attr_str = ''.join([
            f' {k}="{v}"'
            for k, v in attrs.items() if v
        ])
        link = f'<a{attr_str}>{cite.text}</a>'

        span = (
            cite.span[0] + offset,
            cite.span[1] + offset,
        )
        text = text[:span[0]] + link + text[span[1]:]

        offset += len(link) - len(cite.text)
        last_URL = cite.URL
    return text
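
The offset bookkeeping in the loop above can be tried out with a small standalone sketch. `FakeCite` and `link_spans` are hypothetical stand-ins that carry only the attributes the loop reads, and the URLs are invented; real Citation objects come from a Citator.

```python
import re
from dataclasses import dataclass

@dataclass
class FakeCite:
    # hypothetical stand-in for a citation: matched text, its span, and a URL
    text: str
    span: tuple
    URL: str

def link_spans(cites, text):
    offset = 0
    for cite in cites:
        link = f'<a href="{cite.URL}">{cite.text}</a>'
        start, end = cite.span[0] + offset, cite.span[1] + offset
        text = text[:start] + link + text[end:]
        # later spans shift right by however much this link grew the text
        offset += len(link) - len(cite.text)
    return text

source = 'See 42 USC 1988 and 42 USC 1983.'
cites = [
    FakeCite(m.group(), m.span(), f'https://example.com/{m.group(1)}')
    for m in re.finditer(r'42 USC (\d+)', source)
]

linked = link_spans(cites, source)
print(linked)
```

Because each span is adjusted by a single running offset, this approach relies on the citations being supplied in the order they appear in the text.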

cite()

Convenience function to find a single citation in text, or None. See Citator.cite() for more info.

Source code in citeurl/citator.py
def cite(
    text: str,
    broad: bool = True,
    citator: Citator='DEFAULT'
) -> Citation:
    """
    Convenience function to find a single citation in text, or None. See
    Citator.cite() for more info.
    """
    if citator == 'DEFAULT':
        citator = _get_default_citator()
    return citator.cite(text, broad=broad)

list_cites()

Convenience function to list all citations in a text. For more info, see Citator.list_cites().

Source code in citeurl/citator.py
def list_cites(text, citator: Citator='DEFAULT', id_breaks=None):
    """
    Convenience function to list all citations in a text. For more info,
    see Citator.list_cites().
    """
    if citator == 'DEFAULT':
        citator = _get_default_citator()
    return citator.list_cites(text, id_breaks=id_breaks)