我有一个HTML源代码文件中的以下原始数据
{$deletedFields:[courses,projects,description,degreeName,recommendations,honors,entityLocale,activities,grade,fieldOfStudyUrn,testScores,degreeUrn],entityUrn:urn:li:fs_education:(ACoAAAIUJvgBC7QTHSmLEjgtomzxvwceeM71E1c,75863717),school:urn:li:fs_miniSchool:11709,timePeriod:urn:li:fs_education:(ACoAAAIUJvgBC7QTHSmLEjgtomzxvwceeM71E1c,75863717),timePeriod,schoolName:Charles University in Prague,fieldOfStudy:Economics, Politics,schoolUrn:urn:li:fs_miniSchool:11709,$type:com.linkedin.voyager.identity.profile.Education},
{$deletedFields:[courses,projects,description,recommendations,honors,entityLocale,activities,grade,fieldOfStudyUrn,testScores,degreeUrn],entityUrn:urn:li:fs_education:(ACoAAAIUJvgBC7QTHSmLEjgtomzxvwceeM71E1c,26812055),school:urn:li:fs_miniSchool:17888,timePeriod:urn:li:fs_education:(ACoAAAIUJvgBC7QTHSmLEjgtomzxvwceeM71E1c,26812055),timePeriod,degreeName:BA,schoolName:Occidental College,fieldOfStudy:Economics,schoolUrn:urn:li:fs_miniSchool:17888,$type:com.linkedin.voyager.identity.profile.Education},
{$deletedFields:[],profileId:ACoAAAIUJvgBC7QTHSmLEjgtomzxvwceeM71E1c,elements:[urn:li:fs_education:(ACoAAAIUJvgBC7QTHSmLEjgtomzxvwceeM71E1c,26812055),urn:li:fs_education:(ACoAAAIUJvgBC7QTHSmLEjgtomzxvwceeM71E1c,75863717)],paging:urn:li:fs_profileView:ACoAAAIUJvgBC7QTHSmLEjgtomzxvwceeM71E1c,educationView,paging,$type:com.linkedin.voyager.identity.profile.EducationView,$id:urn:li:fs_profileView:ACoAAAIUJvgBC7QTHSmLEjgtomzxvwceeM71E1c,educationView},
{$deletedFields:[],start:501,end:1000,$type:com.linkedin.voyager.identity.profile.EmployeeCountRange,$id:urn:li:fs_position:(ACoAAAIUJvgBC7QTHSmLEjgtomzxvwceeM71E1c,323432440),company,employeeCountRange}
{$deletedFields:[month,day],year:2007,$type:com.linkedin.common.Date,$id:urn:li:fs_education:(ACoAAAIUJvgBC7QTHSmLEjgtomzxvwceeM71E1c,75863717),timePeriod,startDate},
{$deletedFields:[month,day],year:2004,$type:com.linkedin.common.Date,$id:urn:li:fs_education:(ACoAAAIUJvgBC7QTHSmLEjgtomzxvwceeM71E1c,26812055),timePeriod,startDate},
{$deletedFields:[month,day],year:2008,$type:com.linkedin.common.Date,$id:urn:li:fs_education:(ACoAAAIUJvgBC7QTHSmLEjgtomzxvwceeM71E1c,26812055),timePeriod,endDate},
{$deletedFields:[month,day],year:2007,$type:com.linkedin.common.Date,$id:urn:li:fs_education:(ACoAAAIUJvgBC7QTHSmLEjgtomzxvwceeM71E1c,75863717),timePeriod,endDate},
我需要的是从中提取一些数据。你知道吗
schoolname = re.findall(r',schoolname:(.*?),' , page_html)
fieldofstudy = skills = re.findall(r'fieldOfStudy:(.*?),s' , page_html)
degreename = re.findall(r'degreeName:(.*?),' , page_html)
s码校名:查尔斯大学在布拉格
五研究方向:经济学,政治
开始时间:2007年
结束日期:2007年
s码学校名称:西方学院你知道吗
五研究方向:经济学你知道吗
d级白鹭名:BA你知道吗
开始时间:2004年
结束日期:2008年
定义数据容器
class School
:逐行读取文件:
用Python:3.4.2测试
相关问题 更多 >
编程相关推荐