Java string search algorithm

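I have a file with thousands of lines of numbers and text. I want to create a new file from it that contains only the lines with certain keywords. Here is my code, but in the output file I can see some lines that contain none of these keywords. I have put samples of the input and output data at the end.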

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;
import java.util.regex.Pattern;

public class TagIdentifier {
    /**
     * Creates an instance of TagIdentifier that copies every line of the input
     * file containing at least one keyword to the output file.
     *
     * @param inputFilePath  the name and address of the file that should be read
     * @param OutputFilePath the name and address of the output file
     * @param fieldSeparator the character used to split the fields in a line (currently unused)
     * @param keywords       the list of tags/keywords that should be found
     * @throws IOException if a file cannot be opened, read, or written
     */
    public TagIdentifier(String inputFilePath, String OutputFilePath, String fieldSeparator, List<String> keywords)
            throws IOException {
        // try-with-resources closes the reader and the writer, even if an exception is thrown
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(inputFilePath));
             BufferedWriter fileResult = new BufferedWriter(new FileWriter(OutputFilePath, false))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                for (String keyword : keywords) {
                    // keyword = "\\b" + keyword + "\\b"; // disabled whole-word attempt (originally written with "//b", which is not a word boundary)
                    // Each keyword is compiled as a regular expression; find() matches it
                    // anywhere in the line, including inside longer words.
                    Pattern p = Pattern.compile(keyword, Pattern.CASE_INSENSITIVE);
                    if (p.matcher(line).find()) {
                        fileResult.write(line);
                        fileResult.newLine();
                        break; // write each matching line only once
                    }
                }
            }
        }
    }
}
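
One possible cause of the unexpected lines is that `find()` accepts a keyword anywhere in the line, so a keyword such as `united` also matches inside `unitedkingdom`. The following is a minimal sketch of whole-word matching, not the poster's code: the class name `WholeWordKeywordFilter` is hypothetical, and it assumes every keyword starts and ends with a letter or digit so that `\b` is meaningful. Keywords are trimmed and passed through `Pattern.quote` so that stray whitespace or regex metacharacters in the list do not change the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not from the original post): whole-word, case-insensitive keyword matching. */
final class WholeWordKeywordFilter {
    private final Pattern pattern;

    WholeWordKeywordFilter(List<String> keywords) {
        List<String> quoted = new ArrayList<>();
        for (String keyword : keywords) {
            String trimmed = keyword.trim();          // drop stray whitespace around the keyword
            if (!trimmed.isEmpty()) {
                quoted.add(Pattern.quote(trimmed));   // treat the keyword as literal text
            }
        }
        if (quoted.isEmpty()) {
            throw new IllegalArgumentException("at least one non-blank keyword is required");
        }
        // One alternation for all keywords, anchored by word boundaries on both sides.
        String regex = "\\b(?:" + String.join("|", quoted) + ")\\b";
        this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    /** Returns true if the line contains at least one keyword as a whole word. */
    boolean matches(String line) {
        return pattern.matcher(line).find();
    }
}
```

Building the pattern once, outside the read loop, also avoids recompiling every keyword for every line. Note that whole-word matching is still word-based: a line containing the tag `united kingdom` would still match the keyword `united`, so comparing whole tags rather than words may be closer to what is wanted.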
Sample of the list of tags:
[place_of_worship, place of worship, religious_administration, cathedral, chapel, mosque, Church, temple, Religion, animist, bahai, buddhist, christian, hindu, jain, jewish, multifaith, muslim, pagan, pastafarian, scientologist, shinto, sikh, spiritualist, taoist, unitarian, yazidi, zoroastrian, nichiren, jodo_shinshu, jodo_shu, vajrayana, shingon_shu, zen, thai_mahanikaya, thai_thammayut, ahmadiyya, alaouite, druze, ibadi, ismaili, nondenominational, shia, sunni, sufi, asatru, celtic, greco-roman, wicca, EVKdFSMiD, VKdFSMA, CotFSM, irani, parsi, alternative, ashkenazi, buchari, conservative, egalitarian, hasidic    , humanistic    , kabbalistic   , karaite   , liberal   , lubavitch , lubavitch_messianic   , mizrachi_baghdadi , mizrachi_chida    , mizrachi_jerusalemite , mizrachi_livorno  , mizrachi_moroccan , modern_orthodox   , neo_orthodox  , nondenominational , orthodox, Orthodox Judaism, orthodox_ashkenaz , orthodox_sefard   , progressive   , reconstructionist , reform    , renewal   , samaritan , sefardi   , sefardi_amsterdam , sefardi_london    , traditional   , ultra_orthodox    , unaffiliated  , yemenite  , yemenite_baladi   , yemenite_shami    , Devi/Bhagavati, Krishna, Siva, Parasurama, Muthappan, adventist, alliance, anglican, assemblies_of_god, apostolic, armenian_apostolic, assyrian, baptist, catholic, catholic_apostolic, christ_scientist, christian_community, church_of_scotland, church_of_sweden, coptic_orthodox, czechoslovak_hussite, dutch_reformed, episcopal, evangelical, evangelical_covenant, exclusive_brethren, foursquare, greek_catholic, greek_orthodox, iglesia_ni_cristo, jehovahs_witness, kimbanguist, living_waters_church, lutheran, mariavite, maronite, mennonite, messianic_jewish, methodist, mission_covenant_church_of_sweden, moravian, mormon, nazarene, new_apostolic, nondenominational, orthodox, old_believers, old_catholic, pentecostal, philippine_independent, polish_catholic, polish_national_catholic, presbyterian, protestant, quaker, 
reformed, roman_catholic, russian_orthodox, salvation_army, santo_daime, serbian_orthodox, seventh_day_adventist, spiritist, united, united_church_of_christ, united_free_church_of_scotland, united_methodist, united_reformed, uniting]
Sample of lines that I want to filter by tags:

35653969    15  -0.14235    51.506416   74937968@N00    DSC02635    1124566870  1085303897  http://www.flickr.com/photos/mount_otz/35653969/        6   UK;England;London;Hyde Park;Speaker's Corner;Singers    uk;england;london;hydepark;speakerscorner;singers
35654116    15  -0.14235    51.506416   74937968@N00    DSC02641    1124566908  1085304006  http://www.flickr.com/photos/mount_otz/35654116/        5   UK;England;London;Hyde Park;Speaker's Corner    uk;england;london;hydepark;speakerscorner
35654245    15  -0.14235    51.506416   74937968@N00    DSC02639    1124566937  1085303967  http://www.flickr.com/photos/mount_otz/35654245/    "Today Speaker's Corner has three main topics, religion, the &quot;evil USA&quot; and the war - he was overdoing the first one....<br />"   5   UK;England;London;Hyde Park;Speaker's Corner    uk;england;london;hydepark;speakerscorner


Sample of input:
1934995263  15  -0.072269   51.502712   99245765@N00    "Zaha Hadid Exhibition, Abu Dhabi Performing Arts Centre, 2007- Ongoing"    1194630416  1194615799  http://www.flickr.com/photos/blahflowers/1934995263/        7   architecture;buildings;Abu Dhabi;United Arab Emirates;London;Zaha Hadid;Design Museum   architecture;buildings;abudhabi;unitedarabemirates;london;zahahadid;designmuseum                                                                                                                                        
1935258871  15  -0.128198   51.508354   20914166@N00    lomographers    1194632555  1194632555  http://www.flickr.com/photos/dreifachzucker/1935258871/ if I only remembered all the names. 22  "voigtländer;bessa;rangefinder;wide angle;bessa L;super wide heliar aspherical 15mm f:4,5;film;agfa ultra 100;c41;analog;analogue;september;2007;september 21, 2007;september 2007;september 21 til 23, 2007;london;england;uk;united kingdom;lomography world congress 2007;lomo green shoot with scootiepye" voigtländer;bessa;rangefinder;wideangle;bessal;superwideheliaraspherical15mmf45;film;agfaultra100;c41;analog;analogue;september;2007;september212007;september2007;september21til232007;london;england;uk;unitedkingdom;lomographyworldcongress2007;lomogreenshootwithscootiepye
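
Because the sample lines carry their tags in a dedicated `;`-separated field, another option is to compare keywords against whole tags instead of searching the entire line, which also ignores accidental hits in the URL or description. This is only a sketch under assumptions: the class `TagFieldFilter` and its `tagFieldIndex` parameter are hypothetical, the fields are assumed to be separated by the `fieldSeparator` string (for example a tab), and the position of the tag field has to be determined from the real file.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper (not from the original post): exact, case-insensitive tag matching. */
final class TagFieldFilter {
    private final Set<String> keywords = new HashSet<>();
    private final String fieldSeparator;
    private final int tagFieldIndex;

    /** tagFieldIndex is the 0-based position of the ';'-separated tag field (an assumption about the file layout). */
    TagFieldFilter(List<String> keywords, String fieldSeparator, int tagFieldIndex) {
        for (String keyword : keywords) {
            String normalized = normalize(keyword);
            if (!normalized.isEmpty()) {
                this.keywords.add(normalized);
            }
        }
        this.fieldSeparator = fieldSeparator;
        this.tagFieldIndex = tagFieldIndex;
    }

    /** Returns true if any tag in the tag field equals one of the keywords. */
    boolean matches(String line) {
        // Pattern.quote makes the separator literal; limit -1 keeps empty fields in place.
        String[] fields = line.split(Pattern.quote(fieldSeparator), -1);
        if (tagFieldIndex < 0 || tagFieldIndex >= fields.length) {
            return false; // line has no tag field at the assumed position
        }
        // Some tag fields in the sample are wrapped in double quotes; drop them before splitting.
        String tagField = fields[tagFieldIndex].replace("\"", "");
        for (String tag : tagField.split(";")) {
            if (keywords.contains(normalize(tag))) {
                return true;
            }
        }
        return false;
    }

    private static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT);
    }
}
```

With exact tag comparison, the tag `united kingdom` no longer matches the keyword `united`, while a tag such as `Religion` still matches `religion` regardless of case.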

0 answers