Improve parsing of frus-history index cross references #352

@joewiz

In the frus-history index (see the source TEI), range-based cross references are encoded in a distinctive way that hsg-shell's ODD doesn't parse the way our pre-TEI Publisher site did. This seems to be causing server errors.

Here is a sample encoded cross reference:

<item>
    <term>Aandahl, Fredrick</term>, <ref target="#range(b_446-start,b_446-end)"
        >196–199</ref>, <ref target="#b_447">203</ref>, <ref target="#b_448"
        >206</ref>
</item>

The syntax used in the first of these @target attributes is based on the TEI Guidelines' support for XPointer; I use only the range pointer scheme. Specifically, the cross reference points to the range between two <anchor> elements, identified by their @xml:id attributes, in the body of the book:

  1. Line 11548
    <anchor xml:id="b_446-start" corresp="#b_446-end"/>
  2. Line 11711
    <anchor xml:id="b_446-end" corresp="#b_446-start"/>
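
As a minimal sketch of how such a @target value can be split into its two anchor IDs, the function below mirrors the substring/tokenize approach in the original code quoted later in this issue. The name local:range-ids is hypothetical and is not part of hsg-shell; the pattern check is an added assumption so that malformed targets return the empty sequence rather than partial IDs.

```xquery
xquery version "3.1";

(: Hypothetical helper, not part of hsg-shell.
   Splits a range pointer such as "#range(b_446-start,b_446-end)"
   into its start and end IDs. Returns the empty sequence if the
   target does not have the expected shape. :)
declare function local:range-ids($target as xs:string) as xs:string* {
    if (matches($target, '^#range\([^,()]+,[^,()]+\)$')) then
        (: take the text between the parentheses and split on the comma :)
        for $id in tokenize(substring-before(substring-after($target, '('), ')'), ',')
        return normalize-space($id)
    else
        ()
};

(: returns ("b_446-start", "b_446-end") :)
local:range-ids('#range(b_446-start,b_446-end)')
```

The two IDs could then be resolved with id(), as the original code does via root($node)/id($range-start), and the result checked for existence before a link is built.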

My original handling for this, on our pre-TEI Publisher-based website, was to examine where the targets were located, and replace the book's original "196–199, 203, 206" with a web-relevant description of the target section, e.g., "Ch. 8 paras 34–39, Ch. 8 para 47, Ch. 8 para 52".

The Internet Archive contains a snapshot of the old rendering of the page.

"Ch. 8 paras 34–39, Ch. 8 para 47, Ch. 8 para 52" were given the URLs:

However, the current hsg site fails to parse the links correctly, generating URLs like this:

Our website performs a 302 redirect from these URLs, respectively, to:

... which appears to be a graceful recovery, but @windauer reported finding errors in the logs:

2019-12-20 10:40:09,297 [qtp731870416-10326] ERROR (DeferredFunctionCall.java [isEmpty]:203) - Exception in deferred function: not-found publication frus-history-monograph document frus-history section b_806 not found [at line 99, column 13, source: /db/apps/hsg-shell/modules/pages.xqm]
In function:
    pages:load-fallback-page(xs:string, xs:string, xs:string?) [85:13:/db/apps/hsg-shell/modules/pages.xqm]
    pages:load-xml(xs:string, xs:string, xs:string?, xs:string, xs:boolean?) [49:67:/db/apps/hsg-shell/modules/pages.xqm]
    pages:load(node(), map(*), xs:string?, xs:string?, xs:string?, xs:string, xs:boolean) [-1:-1:/db/apps/hsg-shell/modules/pages.xqm]
    templates:process-output(element(), map(*), item()*, element()) 
   ....

This error occurs about 10 times in a row, followed by:

2019-12-20 10:40:09,300 [qtp731870416-10326] WARN  (HttpChannel.java [handleException]:591) - /exist/apps/hsg-shell/historicaldocuments/frus-history/b_806 
javax.servlet.ServletException: javax.servlet.ServletException: An error occurred while processing request to /exist/apps/hsg-shell/historicaldocuments/frus-history/b_806: Committed
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:162) ~[jetty-server-9.4.24.v20191120.jar:9.4.24.v20191120]
        ...
    ... 18 more

Here is the original code I wrote to transform the links:

(: handle xpointer-style range references, as found in the frus-history, e.g.,
    index entries like: 
        <term>Washington, George</term>, <ref target="#range(b_37-start,b_37-end)">9–10</ref>
    point to:
        <anchor xml:id="b_37-start" corresp="#b_37-end"/>
    and:
        <anchor xml:id="b_37-end" corresp="#b_37-start"/>
:)
else if (starts-with($target, '#range')) then
    let $range := substring-after($target, '(')
    let $range := substring-before($range, ')')
    let $range := tokenize($range, ',')
    let $range-start := $range[1]
    let $range-end := $range[2]
    let $target-start-node := root($node)/id($range-start)
    let $target-end-node := root($node)/id($range-end)
    (: use ancestor notes to ensure linkability :)
    let $target-start-node := if ($target-start-node/ancestor::tei:note) then $target-start-node/ancestor::tei:note else $target-start-node
    let $target-end-node := if ($target-end-node/ancestor::tei:note) then $target-end-node/ancestor::tei:note else $target-end-node
    let $target-start-node-ancestor-div := $target-start-node/ancestor::tei:div[1]
    let $target-end-node-ancestor-div := $target-end-node/ancestor::tei:div[1]
    let $same-ancestor-divs := $target-start-node-ancestor-div is $target-end-node-ancestor-div
    (: use the ancestor chapter div's heading, e.g., "Chapter 9: ...", but chop off at the colon :)
    let $target-nodes := ($target-start-node, $target-end-node)
    let $target-divs := ($target-start-node-ancestor-div, $target-end-node-ancestor-div)
    let $target-node-labels := 
        let $both-notes := $target-nodes[1]/self::tei:note and $target-nodes[2]/self::tei:note
        let $one-note := $target-nodes[1]/self::tei:note or $target-nodes[2]/self::tei:note
        for $target-node at $n in $target-nodes
        let $ancestor-div-label :=
            if ($same-ancestor-divs and $n = 2) then
                ()
            else 
                string-join(functx:remove-elements-deep($target-divs[$n]/tei:head[1], 'note'), '')
        let $ancestor-div-label :=
            if (contains($ancestor-div-label, ':')) then substring-before($ancestor-div-label, ':') else $ancestor-div-label
        let $node-label :=
            if ($target-node/self::tei:note) then 
                concat(if ($n = 1 and $both-notes) then 'footnotes ' else 'footnote ', $target-node/@n)
            else
                (: paragraph-like-block-number :)
                concat(if ($one-note) then 'para ' else if ($n = 1) then 'paras ' else '', index-of($target-start-node-ancestor-div/*[not(self::tei:head)][not(self::tei:byline)][not(self::tei:p[@rend='sectiontitlebold'])], $target-node/ancestor::element()[parent::tei:div][1]))
        return
            string-join(($ancestor-div-label, $node-label), ' ')
    let $label :=
        replace(string-join($target-node-labels, '–'), 'Chapter', 'Ch.')
    let $target-node-destination-hash := 
        if ($target-start-node/self::tei:note) then
            concat('#fnref', substring-after($target-start-node/@xml:id, 'fn'))
        else
            concat('#', $range-start)
    return
        (: check to make sure the targets exist :)
        if ($target-start-node and $target-end-node) then
            element a { 
                attribute href { concat($abs-site-uri, $volume, '/', $target-start-node-ancestor-div/@xml:id, $target-node-destination-hash, $persistent-view) },
                $label 
                }
        (: display the label in case of malformed links :)
        else
            $label
(: handle single point references, as found in the frus-history, e.g.,
    index entries like:
     <term>Woodford, Stewart</term>, <ref target="#b_803">98</ref>
    point to:
     <anchor xml:id="b_803"/>
:)
else if (starts-with($target, '#b')) then
    let $url := substring-after($target, '#')
    let $target-node := root($node)/id($url)
    let $target-node := if ($target-node/ancestor::tei:note) then $target-node/ancestor::tei:note else $target-node
    let $destination-div := $target-node/ancestor::tei:div[1]
    (: use the ancestor chapter div's heading, e.g., "Chapter 9: ...", but chop off at the colon :)
    let $head := string-join(functx:remove-elements-deep($destination-div/tei:head[1], 'note'), '')
    let $target-node-label :=
        if ($target-node/self::tei:note) then 
            concat('footnote ', $target-node/@n)
        else
            concat('para ', index-of($destination-div/*[not(self::tei:head)][not(self::tei:byline)][not(self::tei:p[@rend='sectiontitlebold'])], $target-node/ancestor::element()[parent::tei:div][1]))
    let $label := replace(concat(if (contains($head, ':')) then substring-before($head, ':') else $head, ' ', $target-node-label), 'Chapter', 'Ch.')
    let $target-node-destination-hash := 
        if ($target-node/self::tei:note) then
            concat('#fnref', substring-after($target-node/@xml:id, 'fn'))
        else
            $target
    return
        if ($target-node) then 
            element a { 
                attribute href { concat($abs-site-uri, $volume, '/', $destination-div/@xml:id, $target-node-destination-hash, $persistent-view) },
                $label 
                }
        (: display the label in case of malformed links :)
        else 
            $label
else
    element a { 
        attribute href { concat($abs-site-uri, $volume, '/', substring-after($target, '#'), $persistent-view) }, 
        $type,
        render:recurse($node, $options) 
        }

We should research the logs to find the source of the error messages above, and, if needed, adapt the original link parsing code to our current ODD-based method for transforming TEI into HTML.
