Merging multiple .doc files into a single .csv

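2 answers

User

#1 · edited 2024-06-09 07:39:51

Problem solved. Here is my script, which seems to work well on a subsample of the data. Many thanks to everyone. I also managed to extract the date from the file title (I had left that out of the original question to avoid complicating it further, hence the few extra lines of code).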

files     <- list.files(pattern = "\\.txt$")  # all .txt files in the working directory
files.ID  <- substr(basename(files), 1, 7)  # first 7 characters of each file name

# Obtain the date from the file title.
# Splitting on non-digit runs yields three pieces per file: an empty string
# (all file names start with a space), the ID, and the date as DDMMYY ("010513").
a <- unlist(strsplit(files, "[^0-9]+"))
b <- a[seq(3, length(a), 3)]  # keep every third piece, i.e. the date
d <- paste(substr(b, 1, 2), substr(b, 3, 4), substr(b, 5, 6), sep = "/")  # dd/mm/yy
files.date <- as.POSIXct(d, format = "%d/%m/%y")  # %y because the year has two digits

library(qdap)  # strip() comes from the qdap package

x <- length(files)
# data frame with columns ID, date and text, one row per file
reports <- data.frame(ID = files.ID, date = files.date, text = character(x), stringsAsFactors = FALSE)
for (i in seq_len(x)) {
  texto <- paste(readLines(files[i]), collapse = "\n ")
  # strip() returns the cleaned text, so the result must be assigned back
  texto <- strip(texto, char.keep = c(".","?","!","-","+","±","~","=","&","%","$","£","@","*","(",")",":",";",">","<"))
  reports$text[i] <- texto
}
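
The date step in the script above relies on every file name splitting into exactly three pieces. A minimal alternative sketch, assuming hypothetical file names of the form " 1234567 010513.txt" (leading space, ID, then DDMMYY right before the extension), pulls the ID and the date with `sub()` and parses the date with `as.Date()` instead of `as.POSIXct()`. The file names and the name `date.str` are made up for illustration.

```r
# Hypothetical file names following the pattern described in the script:
# a leading space, an ID, then the date as DDMMYY.
files <- c(" 1234567 010513.txt", " 7654321 241299.txt")

# First run of digits in each name: the ID.
files.ID <- sub("^[^0-9]*([0-9]+).*$", "\\1", files)

# The six digits directly before the extension: the date string.
date.str <- sub("^.*([0-9]{6})\\.txt$", "\\1", files)

# Parse DDMMYY directly; %y is the two-digit year.
files.date <- as.Date(date.str, format = "%d%m%y")

print(data.frame(ID = files.ID, date = files.date))
```

Anchoring the pattern on the extension means a name with an extra number in it does not shift the positions of the other pieces, which the every-third-element approach would be sensitive to.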

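User

#2 · edited 2024-06-09 07:39:51

In R you can use a loop to process a directory full of files and, inside the loop, use read.transcript from the qdap package to read in and process each file. qdap will also do some text analysis for you. The author of that package is often active online, and you may get a more complete answer from him. However, reading up on qdap may be all you need to get a solid start. Questions about the details of looping over and processing the files belong in a separate question (there are many such questions already, and you can probably find what you need with a search). Here is a simple loop structure to give you the idea: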

files <- list.files(pattern = "\\.docx$", ignore.case = TRUE)
files.noext <- tools::file_path_sans_ext(basename(files))  # file names without the extension
out.files <- paste0(files.noext, ".csv")

for (i in seq_along(files)) {
    # process the files here with qdap, accumulating the results into a new
    # structure to be determined; write out as csv
    # you might need two passes, one to unpack the docx, then one to assemble them
    # into a single structure for further analysis
}
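
The "accumulate, then write out as csv" step in the loop above can be sketched in base R as follows. This is only an illustration under stated assumptions: `read_one()` is a hypothetical helper standing in for whatever extracts the text of one document (for instance a qdap call), and the demo uses small plain-text files in a temporary directory rather than real .docx files.

```r
# Hypothetical reader: stands in for the real per-document extraction step.
# Here it simply reads a plain-text file into a one-row data frame.
read_one <- function(path) {
  data.frame(
    file = basename(path),
    text = paste(readLines(path, warn = FALSE), collapse = "\n"),
    stringsAsFactors = FALSE
  )
}

# Demo input: two small text files in a temporary directory.
demo.dir <- file.path(tempdir(), "merge_demo")
dir.create(demo.dir, showWarnings = FALSE)
writeLines(c("first line", "second line"), file.path(demo.dir, "a.txt"))
writeLines("only line", file.path(demo.dir, "b.txt"))

files <- list.files(demo.dir, pattern = "\\.txt$", full.names = TRUE)

# Read every file, then stack the one-row data frames into a single one.
merged <- do.call(rbind, lapply(files, read_one))

# Write the combined result as one CSV file.
out.file <- file.path(demo.dir, "merged.csv")
write.csv(merged, out.file, row.names = FALSE)
```

Collecting the pieces with `lapply()` and binding them once avoids growing a data frame row by row inside the loop, and it produces one combined CSV rather than one output file per input.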
