Finding a "representative" set of data in a database - Q&A - Python中文网


Posted 2024-04-18 15:08:52



I have data in a database that looks roughly like this:

id (int): not null, unique
measurement_id (int): not null
range_id (int): not null, unique
temperature (int)
TimeOfDay (string): either Dawn, Day, or Night
Weather.Clear (Boolean): either true or null
Weather.Cloudy (Boolean): either true or null
Weather.Fog (Boolean): either true or null
Weather.Snow (Boolean): either true or null
Area.City (Boolean): either true or null
Area.Country (Boolean): either true or null
etc.

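There are hundreds or thousands of rows of this data. Suppose someone has computed statistics on it: for example, 40% of the rows have Day in the TimeOfDay column, 65% have true in the Weather.Clear column, and so on. If Weather.Clear is set to true, then Weather.Cloudy is null, and so on.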

My job is to find a "representative" group of, say, 1000 rows. So I need 1000 rows in which 40% (400 rows) have Day in the TimeOfDay column, 65% (650 rows) have true in the Weather.Clear column, and so on.
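To make the target concrete, the arithmetic above can be sketched in Python. This is only an illustration: the function name `target_counts` and the dictionary labels are made up for this example, and the shares are the example figures quoted in the question.

```python
def target_counts(sample_size, shares):
    """Convert per-column target shares (fractions) into whole row counts."""
    return {label: round(sample_size * share) for label, share in shares.items()}


# Example figures from the question; the labels are only descriptive strings.
shares = {"TimeOfDay = Day": 0.40, "Weather.Clear = true": 0.65}
counts = target_counts(1000, shares)
print(counts)
```

The counts are per column and overlap: a single row can count toward both the Day quota and the Weather.Clear quota.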

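I realize this is very hard to do with SQL (Oracle) queries alone (maybe I am wrong), so what approach should I take with a general-purpose programming language such as Python to get the result I need? Is there an algorithm for this?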

Regards.


1 answer


Answer 1 · Posted 2024-04-18 15:08:52

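In statistics, one way to obtain a representative set of data is random sampling.

A simple way to implement this in SQL is as follows:

1) Assign each row in the table a random value between 0 and 1

2) Sort the data on the random column

3) Take the first N rows in that order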

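-- Tag every row with a random key, sort by that key, keep the first 1000 rows.
SELECT id
FROM
  (SELECT id,
    rnd
  FROM
    ( SELECT id, dbms_random.value rnd FROM t  -- step 1: random value in [0, 1)
    )
  ORDER BY rnd                                 -- step 2: sort on the random column
  )
WHERE rownum <= 1000;                          -- step 3: take the first N rows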
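Since the question asks about Python, the same three steps can be sketched there as well. This is a minimal illustration, not a definitive implementation: it assumes the rows have already been fetched into a list of dictionaries, the helper names `sample_by_random_key` and `share` are hypothetical, and the data below is synthetic stand-in data rather than the real table.

```python
import random


def sample_by_random_key(rows, n, seed=None):
    """Tag each row with a random key, sort by that key, keep the first n rows."""
    rng = random.Random(seed)
    keyed = [(rng.random(), row) for row in rows]  # step 1: random value per row
    keyed.sort(key=lambda pair: pair[0])           # step 2: sort on the random key
    return [row for _, row in keyed[:n]]           # step 3: take the first n rows


def share(rows, column, value):
    """Fraction of rows whose `column` equals `value` (0.0 for an empty list)."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(column) == value) / len(rows)


# Synthetic stand-in data (an assumption for the demo): 40% of rows are "Day".
rows = [{"id": i, "TimeOfDay": "Day" if i % 10 < 4 else "Night"}
        for i in range(10_000)]

sample = sample_by_random_key(rows, 1000, seed=0)
print(len(sample))
print(share(rows, "TimeOfDay", "Day"), share(sample, "TimeOfDay", "Day"))
```

A simple random sample reproduces the population shares only approximately, so comparing `share` on the full data and on the sample is a quick way to see how close a particular draw came.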
