How to remove double quotes nested inside other double quotes using a regular expression

Posted 2024-06-01 01:03:49


I am trying to use json.loads to load data collected with Beautiful Soup. However, the data has a problem: some fields contain double quotes. For example:

"rComments":"He is a very easy grader, but gets boring occasionally. I wish he would quit saying "Without further ado..." Cancer Bio is a great class because there is a different lecturer each time."

This causes the following error:

JSONDecodeError: Expecting ',' delimiter: line 1 column 3556 (char 3555)
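
The error can be reproduced on a much smaller input. The following is a minimal sketch with a made-up payload (the `broken` string is hypothetical, not taken from the real response): the parser treats the first inner quote as the end of the string and then expects a comma or closing brace.

```python
import json

# Hypothetical shortened payload: the quotes around "hello" are not escaped.
broken = '{"rComments":"He said "hello" to the class."}'

try:
    json.loads(broken)
    message = None
except json.JSONDecodeError as exc:
    # The parser ends the string at the first inner quote, then finds a letter
    # where it expected ',' or '}'.
    message = exc.msg
    print(f"{exc.msg} at char {exc.pos}")
```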

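Is there a way, using a regular expression or some other method, to replace the double quotes around "Without further ado..." with single quotes or no quotes at all? I need to keep the other double quotes, because JSON requires them.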

Here is a copy of my code. It fails for any professor ID whose comments contain nested double quotes.

import json
import re

import requests
from bs4 import BeautifulSoup

# Make Request
url1 = 'https://www.ratemyprofessors.com/paginate/professors/ratings?tid={}&filter=&courseCode=&page=1'.format(124880)
page1 = requests.get(url1)
soup1 = BeautifulSoup(page1.text, "html.parser")
soup1 = str(soup1)

# Remove Double Quotes in Comments
soup1 = re.sub(r'(?:[\b\s\:]\".*)(?:.*)(\")(?:.*\")', '', soup1)

# Create Dictionary
Dict1 = json.loads(soup1)

I also tried the following regular expression, but it did not work either.

:r"(\".*?)\"(.*?)\"(.*\")

For reference, this is what repr(soup1) returns:

'\'{"ratings":[{"attendance":"N/A","clarityColor":"good","easyColor":"average","helpColor":"good","helpCount":2,"id":29366967,"notHelpCount":0,"onlineClass":"","quality":"awesome","rClarity":5,"rClass":"BIOL4015","rComments":"One of my favorite professors at Tech. Really cares about his students, and even brought us apples from Elijay and snacks during the final. His tests are not too bad and the group project is pretty easy. Good teacher and even better human being.","rDate":"01/01/2018","rEasy":3.0,"rEasyString":"3.0","rErrorMsg":null,"rHelpful":5,"rInterest":"N/A","rOverall":5.0,"rOverallString":"5.0","rStatus":1,"rTextBookUse":"Yes","rTimestamp":1514816343000,"rWouldTakeAgain":"Yes","sId":361,"takenForCredit":"Yes","teacher":null,"teacherGrade":"B+","teacherRatingTags":["Inspirational","Caring"],"unUsefulGrouping":"people","usefulGrouping":"people"},{"attendance":"Not Mandatory","clarityColor":"good","easyColor":"average","helpColor":"good","helpCount":0,"id":28805507,"notHelpCount":0,"onlineClass":"","quality":"awesome","rClarity":5,"rClass":"BIOL3450","rComments":"GOAT","rDate":"10/30/2017","rEasy":3.0,"rEasyString":"3.0","rErrorMsg":null,"rHelpful":5,"rInterest":"N/A","rOverall":5.0,"rOverallString":"5.0","rStatus":1,"rTextBookUse":"Yes","rTimestamp":1509404689000,"rWouldTakeAgain":"Yes","sId":361,"takenForCredit":"Yes","teacher":null,"teacherGrade":"A","teacherRatingTags":["Caring","Get ready to read","Accessible outside class"],"unUsefulGrouping":"people","usefulGrouping":"people"},{"attendance":"N/A","clarityColor":"average","easyColor":"good","helpColor":"poor","helpCount":0,"id":19977224,"notHelpCount":0,"onlineClass":"","quality":"poor","rClarity":2,"rClass":"BIOL3450","rComments":"Dr Merril is a really, really nice person, and I\\\'m sure he\\\'s great doing his research but he is just not a good professor for a lecture based class with 150ish people. He\\\'s soft spoken, moves too fast in lecture and goes into unnecessary detail. 
Also does not hold office hours. Would rather defer students to TA.","rDate":"03/31/2012","rEasy":4.0,"rEasyString":"4.0","rErrorMsg":null,"rHelpful":1,"rInterest":"Low","rOverall":1.5,"rOverallString":"1.5","rStatus":1,"rTextBookUse":"Yes","rTimestamp":1333212949000,"rWouldTakeAgain":"N/A","sId":361,"takenForCredit":"N/A","teacher":null,"teacherGrade":"N/A","teacherRatingTags":[],"unUsefulGrouping":"people","usefulGrouping":"people"},{"attendance":"N/A","clarityColor":"good","easyColor":"good","helpColor":"average","helpCount":0,"id":15545116,"notHelpCount":0,"onlineClass":"","quality":"good","rClarity":5,"rClass":"BIOL3340","rComments":"Dr. Merrill is a very nice man and a decent teacher. Class attendance isn\\\'t necessary, however, he does offer extra credit for attendence occasionally. The class is all memorization and a lot of nit-picky information. Didn\\\'t like the class too much, but he was a fine teacher.","rDate":"03/18/2009","rEasy":4.0,"rEasyString":"4.0","rErrorMsg":null,"rHelpful":2,"rInterest":"Meh","rOverall":3.5,"rOverallString":"3.5","rStatus":1,"rTextBookUse":"Yes","rTimestamp":1237418592000,"rWouldTakeAgain":"N/A","sId":361,"takenForCredit":"N/A","teacher":null,"teacherGrade":"N/A","teacherRatingTags":[],"unUsefulGrouping":"people","usefulGrouping":"people"},{"attendance":"N/A","clarityColor":"good","easyColor":"poor","helpColor":"good","helpCount":1,"id":10944025,"notHelpCount":0,"onlineClass":"","quality":"awesome","rClarity":5,"rClass":"BIOL8802","rComments":"He is a very easy grader, but gets boring occasionally. I wish he would quit saying "Without further ado..." 
Cancer Bio is a great class because there is a different lecturer each time.","rDate":"11/18/2005","rEasy":1.0,"rEasyString":"1.0","rErrorMsg":null,"rHelpful":4,"rInterest":"It\\\'s my life","rOverall":4.5,"rOverallString":"4.5","rStatus":1,"rTextBookUse":"N/A","rTimestamp":1132303531000,"rWouldTakeAgain":"N/A","sId":361,"takenForCredit":"N/A","teacher":null,"teacherGrade":"N/A","teacherRatingTags":[],"unUsefulGrouping":"people","usefulGrouping":"person"},{"attendance":"N/A","clarityColor":"good","easyColor":"average","helpColor":"good","helpCount":0,"id":614809,"notHelpCount":0,"onlineClass":"","quality":"awesome","rClarity":4,"rClass":"3331","rComments":"Not very challenging","rDate":"02/22/2003","rEasy":2.0,"rEasyString":"2.0","rErrorMsg":null,"rHelpful":5,"rInterest":"N/A","rOverall":4.5,"rOverallString":"4.5","rStatus":1,"rTextBookUse":"N/A","rTimestamp":1045879151000,"rWouldTakeAgain":"N/A","sId":361,"takenForCredit":"N/A","teacher":null,"teacherGrade":"N/A","teacherRatingTags":[],"unUsefulGrouping":"people","usefulGrouping":"people"}],"remaining":0}\''

1 Answer
User
#1 · Posted 2024-06-01 01:03:49

It looks like the API you are downloading from returns JSON, not HTML, so you do not need to parse it with BeautifulSoup. You can simply do the following:

import requests


url = 'https://www.ratemyprofessors.com/paginate/professors/ratings?tid={}&filter=&courseCode=&page=1'.format(124880)
page = requests.get(url)
page.json()
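
If the text in hand has already lost its escaping (for example, a saved copy of the string shown in the question), a heuristic repair may still be possible. The sketch below is one such heuristic, not a general JSON fixer: `escape_inner_quotes` is a hypothetical helper, and it assumes a quote closes a string only when the next non-whitespace character is `,`, `:`, `}`, `]` or the end of the input. A comment such as `he said "hi", then left` would defeat it.

```python
import json


def escape_inner_quotes(text):
    """Heuristically escape double quotes that appear inside JSON string values."""
    out = []
    in_string = False
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if not in_string:
            if ch == '"':
                in_string = True
            out.append(ch)
        elif ch == "\\" and i + 1 < n:
            # Copy an existing escape sequence (such as \" or \\) untouched.
            out.append(text[i:i + 2])
            i += 1
        elif ch == '"':
            # Look ahead to the next non-whitespace character.
            j = i + 1
            while j < n and text[j].isspace():
                j += 1
            if j == n or text[j] in ",:}]":
                # Assumed to be the real end of the string.
                in_string = False
                out.append(ch)
            else:
                # Assumed to be a stray quote inside the value: escape it.
                out.append('\\"')
        else:
            out.append(ch)
        i += 1
    return "".join(out)


# Shortened, hypothetical version of the failing record from the question.
broken = (
    '{"rComments":"I wish he would quit saying "Without further ado..." '
    'Cancer Bio is great.","rEasy":1.0}'
)
data = json.loads(escape_inner_quotes(broken))
print(data["rComments"])
```

Raw line breaks inside a string value are a separate problem; passing `strict=False` to `json.loads` allows them. Parsing the response directly with `page.json()` remains the simpler route whenever the original response is available.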
