How to remove HTML tags and markup from a given string

Posted 2024-05-23 19:24:28


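While scraping some websites, I noticed that some of the text contains HTML tags, CSS styles, and characters that the target encoding cannot represent ("undefined" characters). Because of these characters, I get an error when I insert the text into my database.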

Sample text:

text = '<p><strong style="background-color: inherit;">PROGRAM DESCRIPTION:&nbsp;</strong><span style="background-color: inherit;">U.S. Embassy Moscow’s Public Affairs Section (PAS) is...'

When I try to insert it into the database, I get the following error:

Arguments: (UnicodeEncodeError('charmap', '<p><strong style="background-color: inherit;">PROGRAM DESCRIPTION:&nbsp;</strong><span style="background-color: inherit;">U.S. Embassy Moscow’s Public Affairs Section (PAS) is pleased to announce “International Cooperation,” which supports American and Russian cooperation in multinational and binational frameworks to address global issues, rule of law, and international agreements and partnerships. Funding for this program is now available through our PAS grants office.&nbsp;This call for proposals outlines our funding priorities, strategic themes, and the procedure for submitting requests for funding. Applicants may apply for funding for any amount up to $200,000.&nbsp;&nbsp;</span>&nbsp;</p>\n<p><span style="background-color: inherit;">&nbsp;</span>&nbsp;</p>\n<p><span style="background-color: inherit;">Maximum for Each Award: $200,000.&nbsp;&nbsp;</span><span style="background-color: inherit; color: rgb(200, 38, 19);">You do not need to submit a grant request for the maximum amount.&nbsp;&nbsp;Submit a budget that suited to your organization\'s capacity and tailored to accomplish your goals.&nbsp;&nbsp;Note that smaller awards may be approved more quickly.\u202f</span><span style="color: rgb(200, 38, 19);">&nbsp;</span></p>\n<p>&nbsp;</p>\n<p><span style="background-color: inherit;">Deadline for Applications:&nbsp;&nbsp;Rolling until June&nbsp;1, 2021.&nbsp;&nbsp;</span><span style="background-color: inherit; color: rgb(200, 38, 19);">Applications are reviewed every month, and funds are distributed as needed on a first-come-first served basis. 
Feel free to submit your application at any time between now and June&nbsp;1.</span><span style="background-color: inherit;">&nbsp;&nbsp;Some grants will be awarded immediately, but in some cases, grant funding will not be available until September 2021.</span>&nbsp;</p>\n<p>&nbsp;</p>\n<p><span style="background-color: inherit;">Please carefully follow all instructions below.&nbsp;Please use the grant application document and budget template found on our website.</span>&nbsp;</p>\n<p><strong style="background-color: inherit;">Purpose:&nbsp;</strong><span style="background-color: inherit;">PAS Moscow invites proposals for projects that\u202f</span><strong style="background-color: inherit;">promote&nbsp;American and Russian cooperation in multinational and binational frameworks to address global issues, rule of law, and international agreements and partnerships</strong><span style="background-color: inherit;">.&nbsp;</span><strong style="background-color: inherit;">&nbsp;</strong><span style="background-color: inherit;">The program promotes a broader understanding of the impact of American and&nbsp;Russian&nbsp;cooperation and multinational engagement.&nbsp;Competitive proposals should also include a connection with American expert(s), organization(s), or institution(s)&nbsp;that will promote increased cooperation between the people of the United States and Russia even after the program has&nbsp;ended.</span>&nbsp;</p>\n<p>&nbsp;</p>\n<p><strong style="background-color: inherit;">COVID-19 disclaimer:</strong><span style="background-color: inherit;">&nbsp;Out of an abundance of caution,&nbsp;due to the ongoing pandemic,&nbsp;PAS Moscow cannot fund projects that involve travel or in-person interaction at this time.&nbsp;Any proposal that includes an in-person element must also explain how the program would take place virtually in the likely event that in-person activities remain impossible.&nbsp;Proposals that do not include such a plan may not be given further 
consideration by the grants committee.&nbsp;</span>&nbsp;</p>\n<p>&nbsp;</p>\n<p><span style="background-color: inherit;">Proposals can include, but are not limited to, the following themes:&nbsp;</span>&nbsp;</p>\n<p>&nbsp;</p>\n<p><strong style="background-color: inherit;">Area One:&nbsp;International Commitments</strong>&nbsp;</p>\n<ul>\n <li><span style="background-color: inherit;">Increasing awareness and addressing global issues, rule of law, and international agreements and&nbsp;partnerships;</span>&nbsp;</li>\n <li><span style="background-color: inherit;">American and Russian collaboration in international festivals and events promoting global citizenry and&nbsp;issues;</span>&nbsp;</li>\n <li><span style="background-color: inherit;">Promoting&nbsp;pluralism&nbsp;and&nbsp;&nbsp;peaceful&nbsp;coexistence,&nbsp;of&nbsp;different interests and values within society</span>&nbsp;</li>\n</ul>\n<p>&nbsp;</p>\n<p><strong style="background-color: inherit;">Area Two:&nbsp;Educating Global Citizens</strong>&nbsp;</p>\n<ul>\n <li><span style="background-color: inherit;">Promoting English language skills in Russia and US-Russia English language partnerships</span>&nbsp;</li>\n <li><span style="background-color: inherit;">Supporting US-Russia debate partnerships and debate education and skill-building in Russia</span>&nbsp;</li>\n <li><span style="background-color: inherit;">Developing civil-society educational programming</span>&nbsp;</li>\n</ul>\n<p>&nbsp;</p>\n<p><strong style="background-color: inherit;">Area Three:&nbsp;Bilateral Cooperation</strong>&nbsp;</p>\n<ul>\n <li><span style="background-color: inherit;">Supporting Sister City projects and cooperation</span>&nbsp;</li>\n <li><span style="background-color: inherit;">Informing about the 80th&nbsp;anniversary of the&nbsp;World War II&nbsp;Lend-Lease Act&nbsp;</span>&nbsp;</li>\n</ul>\n<p>&nbsp;</p>\n<p><span style="background-color: inherit;">Proposals may&nbsp;</span><strong style="background-color: 
inherit;">NOT</strong><span style="background-color: inherit;">:&nbsp;</span>&nbsp;</p>\n<ul>\n <li><span style="background-color: inherit;">Solely benefit one Russian or American entity, business, or university.</span>&nbsp;</li>\n <li><span style="background-color: inherit;">Generate income. All income generated from grant projects must be used to further the goals of the programs.&nbsp;&nbsp;For example, income earned through a demonstration project must be used to extend the duration of the project or fund more participants.</span>&nbsp;</li>\n <li><span style="background-color: inherit;">Should not primarily focus on academic or scientific research.</span>&nbsp;</li>\n <li><span style="background-color: inherit;">Should not be political in nature.</span>&nbsp;</li>\n</ul>\n<p>&nbsp;</p>\n<p><strong style="background-color: inherit;">Deadline for Applications</strong><span style="background-color: inherit;">:&nbsp;Rolling until June&nbsp;1, 2021</span>&nbsp;</p>\n<p><strong style="background-color: inherit;">Total Amount Available:&nbsp;</strong><span style="background-color: inherit;">Amount pending funds availability</span>&nbsp;</p>\n<p><strong style="background-color: inherit;">Maximum for Each Award:&nbsp;&nbsp;</strong><span style="background-color: inherit;">&nbsp;$200,000</span>&nbsp;</p>\n<p><strong style="background-color: inherit;">Who may apply:&nbsp;&nbsp;U.S. 
and Russian nonprofits,&nbsp;institutions of higher education, NGOs and civil society, museums, parks, reserves, and community&nbsp;organizations.&nbsp;</strong><span style="background-color: inherit;">Corporate entries and individuals may only receive grant funding in limited circumstances.&nbsp;Please email us if you have questions about funding for corporations or individuals.</span>&nbsp;</p>\n<p><span style="background-color: inherit;">&nbsp;&nbsp;</span>&nbsp;</p>\n<p><span style="background-color: inherit;">Please download the instructions and budget template.</span>&nbsp;</p>', 1174, 1175, 'character maps to <undefined>'),)
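The `'charmap'` codec name and the reason `character maps to <undefined>` suggest that the string is being encoded with a legacy single-byte code page rather than UTF-8. Which code page is involved is not stated in the error, so the sketch below assumes `cp1252` (a common Windows default) purely for illustration. The helper `find_unencodable` is a hypothetical name, not part of any library. One plausible culprit in the text above is the narrow no-break space `\u202f`, which has no mapping in `cp1252`.

```python
def find_unencodable(text, encoding="cp1252"):
    """Return (index, character) pairs that the given encoding cannot represent.

    The default encoding is an assumption; substitute the one your
    database driver, console, or log handler actually uses.
    """
    problems = []
    for index, char in enumerate(text):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            problems.append((index, char))
    return problems


if __name__ == "__main__":
    sample = "PAS Moscow invites proposals for projects that\u202fpromote cooperation"
    # Lists every character that would trigger the 'charmap' error.
    print(find_unencodable(sample))
    # UTF-8 can represent every Unicode character, so this does not raise.
    print(sample.encode("utf-8"))
```

If the list comes back empty for the encoding in use, the failure is probably elsewhere, for example in a different layer that applies its own encoding.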

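I want to remove those tags and unencodable characters so that my text is clean and the database error goes away. How can I do this? Thank you very much for your help.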
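One possible approach, sketched here with only the standard library: parse the markup with `html.parser.HTMLParser` so that tags and their `style` attributes are dropped and entities such as `&nbsp;` are decoded, then apply Unicode NFKC normalization so that no-break spaces (`\xa0`, `\u202f`) become ordinary spaces, and finally collapse runs of whitespace. The names `_TextExtractor` and `clean_html` are hypothetical, chosen for this example. Third-party parsers such as BeautifulSoup are a common alternative and are not shown here.

```python
import re
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &nbsp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def get_text(self):
        return "".join(self._chunks)


def clean_html(raw):
    """Strip tags, decode entities, and normalize whitespace in `raw`."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flushes any text still buffered at the end of the input
    # NFKC maps compatibility characters (including \xa0 and \u202f) to
    # their plain equivalents; it also rewrites things like ligatures.
    text = unicodedata.normalize("NFKC", parser.get_text())
    # Collapse newlines, tabs, and repeated spaces into single spaces.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    text = ('<p><strong style="background-color: inherit;">PROGRAM DESCRIPTION:'
            '&nbsp;</strong><span style="background-color: inherit;">'
            'U.S. Embassy Moscow’s Public Affairs Section (PAS) is...')
    print(clean_html(text))
```

A few caveats apply to this sketch. Text from adjacent elements is joined without a separator, so markup with no whitespace between tags can run words together. The contents of `<script>` and `<style>` elements are kept as text. The cleaned string still contains non-ASCII characters such as `’`, so the database connection would likely still need a Unicode-capable encoding such as UTF-8.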

