Java string search algorithm
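I have a file containing thousands of lines of numbers and text. From it I want to create a new file that contains only the lines with certain keywords. My code is below, but in the output file I can see some lines that contain none of these keywords. I have put samples of the input and output data at the end.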
import java.io.*;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TagIdentifier {
/**
* Creates an instance of TagIdentifier that copies every line of the input file
* containing at least one of the given keywords to the output file.
*
* @param inputFilePath the name and address of the file that should be read
* @param OutputFilePath the name and address of the output file
* @param fieldSeparator the character used to split the fields in a line (currently unused)
* @param keywords the list of tags/keywords that should be found
* @throws FileNotFoundException if the input file cannot be opened
* @throws IOException if reading or writing fails
*/
public TagIdentifier(String inputFilePath, String OutputFilePath, String fieldSeparator,List<String> keywords )
throws FileNotFoundException, IOException {
/*Create a file to write the result in*/
FileWriter fileStream = new FileWriter(OutputFilePath, false);
BufferedWriter fileResult = new BufferedWriter(fileStream);
/* Create a file reader and buffer the data */
FileReader flickrFileReader = new FileReader(inputFilePath);
BufferedReader bufferedReader = new BufferedReader(flickrFileReader);
//StringBuffer stringBuffer = new StringBuffer();
String line;
int linecount = 0;
while ((line = bufferedReader.readLine()) != null) {
linecount++;
for (String keyword : keywords) {
//keyword = "//b"+keyword+"//b";
Pattern p = Pattern.compile(keyword, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(line);
if(m.find()){
fileResult.write(line);
fileResult.newLine();
break;
}
}
}
bufferedReader.close();
fileResult.close(); // also closes the underlying FileWriter
}
}
<B>Sample of the list of tags:</B><br>
[place_of_worship, place of worship, religious_administration, cathedral, chapel, mosque, Church, temple, Religion, animist, bahai, buddhist, christian, hindu, jain, jewish, multifaith, muslim, pagan, pastafarian, scientologist, shinto, sikh, spiritualist, taoist, unitarian, yazidi, zoroastrian, nichiren, jodo_shinshu, jodo_shu, vajrayana, shingon_shu, zen, thai_mahanikaya, thai_thammayut, ahmadiyya, alaouite, druze, ibadi, ismaili, nondenominational, shia, sunni, sufi, asatru, celtic, greco-roman, wicca, EVKdFSMiD, VKdFSMA, CotFSM, irani, parsi, alternative, ashkenazi, buchari, conservative, egalitarian, hasidic , humanistic , kabbalistic , karaite , liberal , lubavitch , lubavitch_messianic , mizrachi_baghdadi , mizrachi_chida , mizrachi_jerusalemite , mizrachi_livorno , mizrachi_moroccan , modern_orthodox , neo_orthodox , nondenominational , orthodox, Orthodox Judaism, orthodox_ashkenaz , orthodox_sefard , progressive , reconstructionist , reform , renewal , samaritan , sefardi , sefardi_amsterdam , sefardi_london , traditional , ultra_orthodox , unaffiliated , yemenite , yemenite_baladi , yemenite_shami , Devi/Bhagavati, Krishna, Siva, Parasurama, Muthappan, adventist, alliance, anglican, assemblies_of_god, apostolic, armenian_apostolic, assyrian, baptist, catholic, catholic_apostolic, christ_scientist, christian_community, church_of_scotland, church_of_sweden, coptic_orthodox, czechoslovak_hussite, dutch_reformed, episcopal, evangelical, evangelical_covenant, exclusive_brethren, foursquare, greek_catholic, greek_orthodox, iglesia_ni_cristo, jehovahs_witness, kimbanguist, living_waters_church, lutheran, mariavite, maronite, mennonite, messianic_jewish, methodist, mission_covenant_church_of_sweden, moravian, mormon, nazarene, new_apostolic, nondenominational, orthodox, old_believers, old_catholic, pentecostal, philippine_independent, polish_catholic, polish_national_catholic, presbyterian, protestant, quaker, reformed, roman_catholic, russian_orthodox, 
salvation_army, santo_daime, serbian_orthodox, seventh_day_adventist, spiritist, united, united_church_of_christ, united_free_church_of_scotland, united_methodist, united_reformed, uniting]
<br>
<b>Sample of lines that I want to filter by tags:</b><br>
35653969 15 -0.14235 51.506416 74937968@N00 DSC02635 1124566870 1085303897 http://www.flickr.com/photos/mount_otz/35653969/ 6 UK;England;London;Hyde Park;Speaker's Corner;Singers uk;england;london;hydepark;speakerscorner;singers<br>
35654116 15 -0.14235 51.506416 74937968@N00 DSC02641 1124566908 1085304006 http://www.flickr.com/photos/mount_otz/35654116/ 5 UK;England;London;Hyde Park;Speaker's Corner uk;england;london;hydepark;speakerscorner<br>
35654245 15 -0.14235 51.506416 74937968@N00 DSC02639 1124566937 1085303967 http://www.flickr.com/photos/mount_otz/35654245/ "Today Speaker's Corner has three main topics, religion, the "evil USA" and the war - he was overdoing the first one....<br />" 5 UK;England;London;Hyde Park;Speaker's Corner uk;england;london;hydepark;speakerscorner<br>
<b>Sample of input:</b>
1934995263 15 -0.072269 51.502712 99245765@N00 "Zaha Hadid Exhibition, Abu Dhabi Performing Arts Centre, 2007- Ongoing" 1194630416 1194615799 http://www.flickr.com/photos/blahflowers/1934995263/ 7 architecture;buildings;Abu Dhabi;United Arab Emirates;London;Zaha Hadid;Design Museum architecture;buildings;abudhabi;unitedarabemirates;london;zahahadid;designmuseum
1935258871 15 -0.128198 51.508354 20914166@N00 lomographers 1194632555 1194632555 http://www.flickr.com/photos/dreifachzucker/1935258871/ if I only remembered all the names. 22 "voigtl?�nder;bessa;rangefinder;wide angle;bessa L;super wide heliar aspherical 15mm f:4,5;film;agfa ultra 100;c41;analog;analogue;september;2007;september 21, 2007;september 2007;september 21 til 23, 2007;london;england;uk;united kingdom;lomography world congress 2007;lomo green shoot with scootiepye" voigtl?�nder;bessa;rangefinder;wideangle;bessal;superwideheliaraspherical15mmf45;film;agfaultra100;c41;analog;analogue;september;2007;september212007;september2007;september21til232007;london;england;uk;unitedkingdom;lomographyworldcongress2007;lomogreenshootwithscootiepye
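A likely cause of the unexpected lines is that `m.find()` accepts a keyword anywhere inside the line, including inside a longer tag. For instance, the keyword `united` from the tag list is found in both "United Arab Emirates" and "unitedarabemirates" in the first sample input line, so that line would be written to the output. Note also that the commented-out line uses `"//b"`, which is not a regex word boundary (that would be `"\\b"`), and that a word boundary alone would still accept "United Arab Emirates". The sketch below instead compares whole tags against the keyword set. It is only an illustration under assumptions: the class name `TagLineFilter` is hypothetical, and the separator regex `[\t;]` assumes that fields are tab-separated and tags within a field are separated by `;`, which the samples above do not confirm.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the original TagIdentifier class. */
final class TagLineFilter {

    private final Set<String> keywords = new HashSet<>();
    private final Pattern tokenSeparator;

    /**
     * @param keywordList         tags to look for; surrounding spaces and case are ignored
     * @param tokenSeparatorRegex regex that separates tags in a line (assumed, e.g. "[\t;]")
     */
    TagLineFilter(List<String> keywordList, String tokenSeparatorRegex) {
        for (String keyword : keywordList) {
            // The tag list contains entries with stray spaces, such as "hasidic ".
            String normalized = keyword.trim().toLowerCase(Locale.ROOT);
            if (!normalized.isEmpty()) {
                keywords.add(normalized);
            }
        }
        this.tokenSeparator = Pattern.compile(tokenSeparatorRegex);
    }

    /** Returns true only if some whole token of the line equals a keyword. */
    boolean matches(String line) {
        for (String token : tokenSeparator.split(line)) {
            if (keywords.contains(token.trim().toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```

Inside the existing read loop, the inner `for` over keywords could then be replaced by a single `if (filter.matches(line))` around the two write calls, with the filter built once before the loop. Because a whole field is compared, a keyword that only occurs inside a free-text description would no longer match; whether that is wanted depends on the data.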
0 answers