How do you use Java streams and lambdas to compare two files and find the sum of values within a group?


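Basically, I have two files: counties.txt and people.txt

Each line in the counties file has

County
State
ZipCode

Each line in the people file has

First name
Last name
Income
ZipCode

What I need to do is calculate the average income of each county from the people.txt file, using streams and lambdas in Java 8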

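I created two classes that store all the relevant information and have getter and setter methods. County class: String county, String state, long zipcode and int averageIncome. Person class: String firstName, String lastName, int income, long zipcode, String county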
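For context, here is a minimal sketch of how the people file might be loaded into such objects. The question does not state the file layout, so the comma-separated format is an assumption, and `PeopleParsingSketch`, `parsePerson` and `parsePeople` are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PeopleParsingSketch {

    // Stand-in for the Person class described above (only the fields read from the file).
    static class Person {
        private final String firstName;
        private final String lastName;
        private final int income;
        private final long zipcode;

        Person(String firstName, String lastName, int income, long zipcode) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.income = income;
            this.zipcode = zipcode;
        }

        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
        int getIncome() { return income; }
        long getZipcode() { return zipcode; }
    }

    // Assumption: one person per line, fields separated by commas.
    static Person parsePerson(String line) {
        String[] parts = line.split(",");
        return new Person(parts[0].trim(),
                          parts[1].trim(),
                          Integer.parseInt(parts[2].trim()),
                          Long.parseLong(parts[3].trim()));
    }

    static List<Person> parsePeople(Stream<String> lines) {
        return lines.filter(line -> !line.trim().isEmpty()) // skip blank lines
                    .map(PeopleParsingSketch::parsePerson)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // With a real file, the lines could come from Files.lines(Paths.get("people.txt")).
        List<Person> people = parsePeople(
                Stream.of("Ann, Lee, 100, 10001", "", "Bob, Ray, 200, 10002"));
        people.forEach(p -> System.out.println(p.getFirstName() + " " + p.getIncome()));
    }
}
```

Taking a `Stream<String>` rather than a file path keeps the parsing step independent of where the lines come from.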

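Once I have found the average income, I want to compare each person's income with the average income of their county. This is what I have tried. It works if a county has only one zipcode, but if that is not the case, it filters on each zipcode separately instead of combining them per county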

       //filter the people based on zip code , find the average county income and set it in county object
   counties.forEach(county -> {
       int sum = persons.stream().filter(p -> p.getZipcode() == county.getZipcode()).mapToInt(p -> p.getIncome()).sum();
       int count = (int)persons.stream().filter(p -> p.getZipcode() == county.getZipcode()).count();
       county.setAverageIncome(sum/count);
   });

   //filter the people based on zip code , set updated income
   counties.forEach(county -> {
       List<Person> person = persons.stream().filter(p -> p.getZipcode() == county.getZipcode()).collect(Collectors.toList());
       persons.removeAll(person);
       person.forEach(per -> {
           per.setIncome(per.getIncome() - county.getAverageIncome());
           per.setCounty(county.getCounty());
           persons.add(per);
       });

   });

I am fairly sure the error is in finding the average county income, but it could also be in updating the compared income

# Answer 1

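You said that a county can have multiple zip codes. I don't think your code takes that into account. Your list of counties probably contains multiple entries with the same county but a different zipCode. That is why you are averaging income per zipCode rather than per county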

Map<Long, String> zipToCounty = counties.stream()
                                        .collect(Collectors.toMap(County::getZipCode, County::getCounty));

Map<String, List<Person>> peopleInCounty = persons.stream()
                                                  .collect(Collectors.groupingBy(p -> zipToCounty.get(p.getZipCode())));

Map<String, Double> averagePerCounty = peopleInCounty.entrySet()
                                                     .stream()
                                                     .collect(Collectors.toMap(e -> e.getKey(),
                                                                               e -> e.getValue()
                                                                                     .stream()
                                                                                     .mapToInt(Person::getIncome)
                                                                                     .average()
                                                                                     .orElse(0)));

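In this sample

1. I first map each zip code to its county, so that there is a quick way to look it up
2. Then I group the people by county
3. Then I average the income of the people in each county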

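Now you have the average income of each county and the people living in it. This makes it easy to go over those people and compute the difference between their income and the county average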
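A minimal sketch of that last step, assuming a trimmed-down `Person` with only a name, an income and a zip code. The class `IncomeDifferenceSketch` and the method `differenceFromCountyAverage` are hypothetical names, and `Collectors.averagingInt` is used here as a shorthand for the grouping and averaging shown above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class IncomeDifferenceSketch {

    // Hypothetical, trimmed-down stand-in for the Person class from the question.
    static class Person {
        private final String name;
        private final int income;
        private final long zipCode;

        Person(String name, int income, long zipCode) {
            this.name = name;
            this.income = income;
            this.zipCode = zipCode;
        }

        String getName() { return name; }
        int getIncome() { return income; }
        long getZipCode() { return zipCode; }
    }

    // Maps each person's name to (income - average income of that person's county).
    // Assumes unique names and that every zip code is present in zipToCounty.
    static Map<String, Double> differenceFromCountyAverage(List<Person> persons,
                                                           Map<Long, String> zipToCounty) {
        // Average income per county, found through each person's zip code.
        Map<String, Double> averagePerCounty = persons.stream()
                .collect(Collectors.groupingBy((Person p) -> zipToCounty.get(p.getZipCode()),
                                               Collectors.averagingInt(Person::getIncome)));

        return persons.stream()
                .collect(Collectors.toMap(Person::getName,
                        (Person p) -> p.getIncome()
                                - averagePerCounty.get(zipToCounty.get(p.getZipCode()))));
    }

    public static void main(String[] args) {
        Map<Long, String> zipToCounty = new HashMap<>();
        zipToCounty.put(10001L, "Alpha");
        zipToCounty.put(10002L, "Alpha"); // same county, second zip code
        zipToCounty.put(20001L, "Beta");

        List<Person> persons = Arrays.asList(
                new Person("Ann", 100, 10001L),
                new Person("Bob", 200, 10002L),
                new Person("Cid", 500, 20001L));

        System.out.println(differenceFromCountyAverage(persons, zipToCounty));
    }
}
```

Ann and Bob live in different zip codes of the same county, so both are compared against the shared average of 150 rather than against their own zip code's average.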

Edit

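To process the data further, you should rewrite the data type for the county. Right now, the class you have still does not reflect a real county (which can have multiple zip codes), but only a single entry of the file (which has exactly one zip code). In addition, in your original approach you replace the person's actual income with the difference from the county average. The difference should be a separate attribute, because the two things do not mean the same. Whether the county average and the difference from a person's income should be stored in the county and person objects at all (that is, whether they really belong there) is an entirely different topic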

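In general, keep in mind that classes should reflect real-world objects. So, long story short, here are the data types we need (getters and setters omitted for readability):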

public static class Person
{
    private Integer income;
    private Long    zipCode;  // not the county, as the person file does not contain it
    private String  firstname, lastname;

    private Integer aboveAverage;
}

public static class RawCounty
{
    private Long   zipCode;
    private String county;
    private String state;
}

public static class County
{
    private Set<Long> zipCodes;
    private String    county;
    private String    state;

    private Double    average;
}

Next, we change the algorithm from before to:

Collection<County> counties = rawCounties.stream()
                                         .collect(Collectors.groupingBy(RawCounty::getCounty,
                                                                        Collectors.mapping(raw -> {
                                                                            County county = new County();
                                                                            county.setState(raw.getState());
                                                                            county.setCounty(raw.getCounty());
                                                                            county.setZipCodes(new HashSet<>(Arrays.asList(raw.getZipCode())));
                                                                            return county;
                                                                        }, Collectors.reducing((c1, c2) -> {
                                                                            c1.getZipCodes()
                                                                              .addAll(c2.getZipCodes());
                                                                            return c1;
                                                                        }))))
                                         .values()
                                         .stream()
                                         .map(Optional::get)
                                         .collect(Collectors.toSet());

Map<Long, County> zipToCounty = new HashMap<>();
counties.forEach(c -> c.getZipCodes().forEach(z -> zipToCounty.put(z, c)));

Map<String, List<Person>> peopleInCounty = persons.stream()
                                                  .collect(Collectors.groupingBy(p -> zipToCounty.get(p.getZipCode())
                                                                                                 .getCounty()));

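// Counties without any matching people get an empty list instead of null.
counties.forEach(c -> c.setAverage(peopleInCounty.getOrDefault(c.getCounty(), Collections.emptyList())
                                                 .stream()
                                                 .mapToInt(Person::getIncome)
                                                 .average()
                                                 .orElse(0)));

// aboveAverage is an Integer, so the double difference is rounded before it is stored.
counties.forEach(c -> peopleInCounty.getOrDefault(c.getCounty(), Collections.emptyList())
                                    .forEach(p -> p.setAboveAverage((int) Math.round(p.getIncome() - c.getAverage()))));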

So, what are we doing here?

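1. We group all county file entries (RawCounty) by county, create County objects for them, and merge them together
2. To make it easier to look up the county of a zip code in step 3, we now map each zip code to the county it belongs to
3. Same as before: group all people by county
4. For each county, get its residents, compute their average income, and set it as the county's average value
5. Go over the counties and people once more, read the existing county average, and compute the difference between it and each person's income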
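The merge in the first step can also be sketched more compactly when only the zip codes per county are needed. This is an alternative to the `Collectors.reducing` approach above, not a replacement for it. `ZipGroupingSketch`, `zipCodesByCounty` and the simplified `RawCounty` (state omitted) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class ZipGroupingSketch {

    // Trimmed-down stand-in for the RawCounty class above (state omitted).
    static class RawCounty {
        private final String county;
        private final Long zipCode;

        RawCounty(String county, Long zipCode) {
            this.county = county;
            this.zipCode = zipCode;
        }

        String getCounty() { return county; }
        Long getZipCode() { return zipCode; }
    }

    // Collects all zip codes that appear for the same county name.
    static Map<String, Set<Long>> zipCodesByCounty(List<RawCounty> rawCounties) {
        return rawCounties.stream()
                .collect(Collectors.groupingBy(RawCounty::getCounty,
                        Collectors.mapping(RawCounty::getZipCode, Collectors.toSet())));
    }

    // Inverts the grouping so a zip code can be looked up directly.
    static Map<Long, String> zipToCounty(Map<String, Set<Long>> zipCodesByCounty) {
        Map<Long, String> result = new HashMap<>();
        zipCodesByCounty.forEach((county, zips) -> zips.forEach(zip -> result.put(zip, county)));
        return result;
    }

    public static void main(String[] args) {
        List<RawCounty> raw = Arrays.asList(
                new RawCounty("Alpha", 10001L),
                new RawCounty("Alpha", 10002L),
                new RawCounty("Beta", 20001L));

        Map<String, Set<Long>> byCounty = zipCodesByCounty(raw);
        System.out.println(byCounty);
        System.out.println(zipToCounty(byCounty));
    }
}
```

Both this sketch and the code above group by county name only. If the same county name can occur in more than one state, the grouping key would need to include the state as well.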

Now the elements in the lists counties and persons should be properly initialized
