Java: how do I make strange characters in Spanish text go away? They persist even after changing the JDBC URL to UTF-8
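Words like súbito and autónomo come out garbled. Why don't they come out right? I ran into a similar problem when inserting Russian text into a MySQL database through JDBC: the Russian characters showed up as ???? instead of text. That problem went away when I changed the JDBC URL to use UTF-8 encoding:
    jdbc:mysql://localhost/metaphor_repository?characterEncoding=utf8
Doing the same thing does not solve the problem here: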
    public void readPatterns() throws FileNotFoundException, IOException, InstantiationException, ClassNotFoundException, IllegalAccessException, SQLException {
        //Code to initialize database and stuff
        PreparedStatement preparedStatement = null;
        String key1 = null;
        String databaseURL = "jdbc:mysql://localhost/metaphor_repository?characterEncoding=utf8";
        String databaseUser = "root";
        String databasePassword = "D0samrD9";
        String dbName = "metaphor_repository";
        Connection conn = null;
        Class.forName("com.mysql.jdbc.Driver").newInstance();
        conn = DriverManager.getConnection(databaseURL, databaseUser, databasePassword);
        System.out.println("CONNECTED");
        String insertTableSQL = "INSERT INTO source_domain_spanish_oy2_jul2014_2(filename, seed, words, frequency, type, after_before) VALUES(?,?,?,?,?,?);";
        String foldername = "/Desktop/Espana/AdjectiveBefore/";
        File Folder = new File(foldername);
        File[] ListOfFiles = Folder.listFiles();
        for (int x = 0; x < ListOfFiles.length; x++) {
            File file = new File(ListOfFiles[x].getAbsolutePath());
            InputStream in = new FileInputStream(file);
            InputStreamReader reader1 = new InputStreamReader(in);
            BufferedReader br = new BufferedReader(reader1);
            String fileData = new String();
            String filename = ListOfFiles[x].getName().toUpperCase();
            int total;
            BufferedWriter out;
            FileWriter fstream;
            BufferedWriter outLog;
            String fileName = new String("/Desktop/Espana/AdjectiveBeforeResult/" + ListOfFiles[x].getName());
            fstream = new FileWriter(fileName);
            out = new BufferedWriter(fstream);
            while ((fileData = br.readLine()) != null) {
                Map<String, Integer> sortedMapDesc = searchDatabase(fileData);
                //Code Written By Aniruth to extract some info: seed, before_after
                String seed = fileData;
                String before_after = seed.split("\\[")[0];
                seed = seed.replaceAll("\\(v.\\)", "");
                seed = seed.replaceAll("\\(n.\\)", "");
                seed = seed.substring(seed.indexOf("]") + 1, seed.indexOf("."));
                seed = seed.substring(seed.indexOf("[") + 1, seed.indexOf("]"));
                seed = seed.replaceAll("'", "");
                seed = seed.trim();
                seed = seed.toUpperCase();
                Set<String> keySet = sortedMapDesc.keySet();
                total = 0;
                Iterator<String> keyItr = keySet.iterator();
                out.write("++++++++++++++++++++++++++++++++++++++++++\n");
                if (sortedMapDesc.isEmpty()) {
                    out.write(fileData + "\n");
                    out.write(fileData + "returned zero results \n");
                    out.flush();
                } else {
                    out.write(fileData + "\n");
                    int i = 1;
                    String spaceString = " ";
                    while (keyItr.hasNext()) {
                        key1 = keyItr.next();
                        for (int k = 0; k < 40 - key1.length(); k++) {
                            spaceString = spaceString + " ";
                        }
                        total = total + sortedMapDesc.get(key1);
                        out.write(i + ":" + "'" + filename + "'" + ":" + "'" + seed + "'" + ":" + "'" + key1.replaceAll("'", "") + "'" + ":" + sortedMapDesc.get(key1) + ":" + "'" + "ADJ" + "'" + ":" + "'" + before_after + "'" + "\n");
                        //Code to add to the databases
                        preparedStatement = conn.prepareStatement(insertTableSQL);
                        preparedStatement.setString(1, filename);
                        preparedStatement.setString(2, seed);
                        preparedStatement.setString(3, key1);
                        if (sortedMapDesc.get(key1) != null) {
                            preparedStatement.setInt(4, sortedMapDesc.get(key1));
                        } else {
                            preparedStatement.setInt(4, 0);
                        }
                        preparedStatement.setString(5, "ADJ");
                        preparedStatement.setString(6, before_after);
                        System.out.println("Checking Prepared Statement:" + preparedStatement);
                        preparedStatement.executeUpdate();
                        System.out.println("Record Inserted :| ");
                        preparedStatement.close();
                        //System.out.println(out.toString());
                        i++;
                        spaceString = " ";
                    }
                    out.flush();
                }
            }
        }
        conn.close();
    }
# Answer 1
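This may be the first problem: the question's code creates its reader with new InputStreamReader(in), without naming a charset.
That loads the file using the platform default encoding, which may or may not be appropriate for the file in question.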
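To see why a wrong default matters, here is a minimal sketch of the mismatch. It assumes the input files are UTF-8 and that the platform default is a single-byte charset such as ISO-8859-1; neither is confirmed by the question, and the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {
    // Encodes the text as UTF-8 bytes, then decodes those same bytes
    // with a different charset, as a mis-configured reader would.
    static String decodeUtf8BytesAs(String text, Charset decodingCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodingCharset);
    }

    public static void main(String[] args) {
        String original = "s\u00fabito"; // "súbito", written with an escape to avoid source-encoding issues
        String garbled = decodeUtf8BytesAs(original, StandardCharsets.ISO_8859_1);

        // 'ú' is two bytes in UTF-8 (0xC3 0xBA); ISO-8859-1 turns each byte into its own char.
        System.out.println(original.length()); // 6
        System.out.println(garbled.length());  // 7
    }
}
```

If this is what is happening, no JDBC setting can repair it: the string is already wrong before it reaches the PreparedStatement.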
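The same applies later on, where the output file is opened with new FileWriter(fileName).
Again, that uses the platform default encoding.
Always be explicit about your encoding. UTF-8 is usually a good choice if you are in a position to choose.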
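Being explicit could look something like the sketch below. It assumes the files really are UTF-8 (check this first), and Utf8Files with its methods are hypothetical helper names, not part of the question's code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

final class Utf8Files {
    // Opens a file for reading, decoding its bytes as UTF-8
    // regardless of the platform default encoding.
    static BufferedReader openReader(File file) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8));
    }

    // Opens a file for writing, encoding characters as UTF-8.
    // FileWriter is avoided because its one-argument constructors
    // do not let you name a charset.
    static BufferedWriter openWriter(File file) throws IOException {
        return new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8));
    }

    // Copies a text file line by line, UTF-8 in and UTF-8 out.
    static void copyLines(File source, File target) throws IOException {
        try (BufferedReader reader = openReader(source);
             BufferedWriter writer = openWriter(target)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.write('\n');
            }
        }
    }
}
```

In the question's loop, openReader(file) would stand in for the InputStreamReader/BufferedReader pair and openWriter(new File(fileName)) for the FileWriter/BufferedWriter pair.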
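Next, work out where the problem actually arises. Log the exact UTF-16 code units in your strings as integers, and try to work out when they go from "good" to "bad" (if they start off good at all). See my blog post on diagnosing this sort of problem for more details. A small helper that dumps the numeric value of every char in a string is useful for this (adjusted to your logging infrastructure etc., of course).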
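One possible shape for such a helper is sketched here. StringDump and describe are names invented for this example, and the output format is only a suggestion.

```java
import java.util.Locale;

final class StringDump {
    // Returns one line per UTF-16 code unit: index, hex value, decimal value.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(String.format(Locale.ROOT, "%d: U+%04X (%d)", i, (int) c, (int) c));
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A correctly decoded 'ú' is a single code unit, U+00FA (250).
        System.out.print(describe("\u00fa"));
        // The same letter mis-decoded from UTF-8 shows up as two units: 195 and 186.
        System.out.print(describe("\u00c3\u00ba"));
    }
}
```

Calling describe(fileData) right after readLine() and describe(key1) right before setString(...) should show at which step the values change. Seeing 195 followed by 186 where you expected 250 points at the reader's charset rather than at JDBC.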