Converting a .tei file to a .txt file

<biblStruct xml:id="b0">
    <analytic>
        <title level="a" type="main">The Semantic Web</title>
        <author>
            <persName xmlns="http://www.tei-c.org/ns/1.0">
                <forename type="first">T</forename>
                <surname>Berners-Lee</surname>
            </persName>
        </author>
        <author>
            <persName xmlns="http://www.tei-c.org/ns/1.0">
                <forename type="first">J</forename>
                <surname>Hendler</surname>
            </persName>
        </author>
        <author>
            <persName xmlns="http://www.tei-c.org/ns/1.0">
                <forename type="first">O</forename>
                <surname>Lassilia</surname>
            </persName>
        </author>
    </analytic>
    <monogr>
        <title level="j">Scientific American</title>
        <imprint>
            <date type="published" when="2001-05" />
        </imprint>
    </monogr>
</biblStruct>

1 answer

User

#1 · posted 2024-04-29 05:44:52

I know you are looking for a Python solution, but XSLT is a very convenient alternative that is well suited to .xml files, so I am posting an XSLT solution anyway.

I think it can easily be integrated into your Python solution.
So here is the necessary XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:month="http://month.com">
    <xsl:output method="text" />
    <xsl:strip-space elements="*" />

    <month:month>
        <month name="Jan" />
        <month name="Feb" />
        <month name="Mar" />
        <month name="Apr" />
        <month name="May" />
        <month name="Jun" />
        <month name="Jul" />
        <month name="Aug" />
        <month name="Sep" />
        <month name="Oct" />
        <month name="Nov" />
        <month name="Dec" />
    </month:month>

    <!-- First author: no leading separator. The explicit priority lets this
         template win over author[last()] when there is only one author. -->
    <xsl:template match="author[position()=1]" priority="1">
        <xsl:value-of select="concat(tei:persName/tei:forename, '. ',tei:persName/tei:surname)" />
    </xsl:template>

    <!-- Authors in between: prefixed with a comma. -->
    <xsl:template match="author">
        <xsl:value-of select="concat(', ',tei:persName/tei:forename, '. ',tei:persName/tei:surname)" />
    </xsl:template>

    <!-- Last author: prefixed with " and ". -->
    <xsl:template match="author[last()]">
        <xsl:value-of select="concat(' and ',tei:persName/tei:forename, '. ',tei:persName/tei:surname)" />
    </xsl:template>

    <!-- Combine authors, titles and the publication date into one line. -->
    <xsl:template match="/biblStruct">
        <xsl:apply-templates select="analytic/author" />
        <!-- Month number taken from @when, e.g. 5 for "2001-05". -->
        <xsl:variable name="mon" select="number(substring(monogr/imprint/date/@when,6,2))" />
        <xsl:value-of select='concat(" &apos;",analytic/title,"&apos;",", ",monogr/title, ", ")' />
        <!-- Look up the month name in the month:month data island of this stylesheet. -->
        <xsl:value-of select="document('')/xsl:stylesheet/month:month/month[$mon]/@name" />
        <!-- Append the four-digit year. -->
        <xsl:value-of select="concat(' ',substring(monogr/imprint/date/@when,1,4))" />
    </xsl:template>

</xsl:stylesheet>

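You do not need to know much about XSLT to understand this code:
There are three templates that match author elements: one matches the first author, one matches the last() author, and one matches all authors in between. They differ only in how they handle the separators, such as ", " and " and ".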
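
For readers who prefer to see the separator logic outside XSLT, here is a minimal Python sketch of what the three author templates do. The function name format_authors and the list-of-tuples input are my own assumptions for illustration, not part of the stylesheet.

```python
def format_authors(authors):
    """Join (forename, surname) pairs the way the three author templates do."""
    parts = []
    last_index = len(authors) - 1
    for index, (forename, surname) in enumerate(authors):
        name = f"{forename}. {surname}"
        if index == 0:
            parts.append(name)             # first author: no separator
        elif index == last_index:
            parts.append(" and " + name)   # last author: " and "
        else:
            parts.append(", " + name)      # authors in between: ", "
    return "".join(parts)


# Names copied from the sample record in the question.
authors = [("T", "Berners-Lee"), ("J", "Hendler"), ("O", "Lassilia")]
print(format_authors(authors))
```

The sketch checks for the first author before the last one, so a record with a single author gets no separator at all. That ordering is a choice made in this sketch.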

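The last template processes the whole XML document and combines the output of the other three templates. It also converts the numeric month into a string by looking it up in the month:month data island.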
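
The month lookup can be sketched in Python as well. This is only an illustration under the assumption that the when attribute always starts with a year and a two-digit month (as in "2001-05"); MONTH_NAMES and format_when are hypothetical names that mirror the data island and the substring() calls.

```python
# Same abbreviations as the month:month data island in the stylesheet.
MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_when(when):
    """Turn a value such as '2001-05' into 'May 2001'."""
    year = when[0:4]               # like substring(@when, 1, 4)
    month_number = int(when[5:7])  # like number(substring(@when, 6, 2))
    # XPath positions are 1-based, Python lists are 0-based.
    return f"{MONTH_NAMES[month_number - 1]} {year}"


print(format_when("2001-05"))
```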

You should also look at the namespaces declared on the xsl:stylesheet element:

One for XSL: http://www.w3.org/1999/XSL/Transform
One for TEI: http://www.tei-c.org/ns/1.0
One for the month data island: http://month.com
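
The TEI namespace matters because, in the sample record, persName and its children are in that namespace while biblStruct, analytic and author are not. Here is a small sketch with Python's standard xml.etree.ElementTree that shows the same distinction. The shortened SAMPLE string and the TEI_NS mapping are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping, playing the role of xmlns:tei in the stylesheet.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Shortened copy of the sample record (two authors only).
SAMPLE = """<biblStruct xml:id="b0"><analytic>
<author><persName xmlns="http://www.tei-c.org/ns/1.0">
<forename type="first">T</forename><surname>Berners-Lee</surname>
</persName></author>
<author><persName xmlns="http://www.tei-c.org/ns/1.0">
<forename type="first">J</forename><surname>Hendler</surname>
</persName></author>
</analytic></biblStruct>"""

root = ET.fromstring(SAMPLE)

# author has no namespace, so it is addressed without a prefix;
# persName and surname need the tei: prefix.
surnames = [
    author.findtext("tei:persName/tei:surname", namespaces=TEI_NS)
    for author in root.findall("analytic/author")
]
print(surnames)

# Without the prefix, the lookup finds nothing.
print(root.find("analytic/author/persName"))
```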

I hope I have given a convincing example of using an XSLT file for the conversion. The xsl:output element with method="text" specifies the desired plain-text output.
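
As for calling a stylesheet from Python, one option is the third-party lxml package. The sketch below assumes lxml is installed and uses a deliberately cut-down, hypothetical stylesheet that only prints the article title. The stylesheet above could be loaded in its place, for example from a file with etree.parse.

```python
from lxml import etree  # third-party package, assumed to be installed

# Cut-down stylesheet for illustration only (not the one from the answer).
XSLT_SOURCE = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />
  <xsl:strip-space elements="*" />
  <xsl:template match="/biblStruct">
    <xsl:value-of select="analytic/title" />
  </xsl:template>
</xsl:stylesheet>"""

# Shortened copy of the sample record.
TEI_SOURCE = b"""<biblStruct xml:id="b0">
  <analytic><title level="a" type="main">The Semantic Web</title></analytic>
</biblStruct>"""

transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
result = transform(etree.fromstring(TEI_SOURCE))

# With method="text", str() on the result gives the plain text output.
text = str(result)
print(text)
```

The resulting string can then be written to a .txt file with the usual Python file functions.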
