
Using Regex to Match Data from Baidu's Epidemic Map

2020-05-02 13:25 Author: 雾削木FHZ

Over the May Day holiday, with nothing better to do at home, I took another shot at analyzing Baidu's epidemic data, and found that it is completely different from the data I analyzed last time. Baidu keeps changing its data structure, so I don't really recommend building a data-fetching tool on top of it. After analyzing the main data packet at https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_pc_3, I found that this time the structure is split into three layers.
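As a rough illustration of the starting point (a minimal sketch, not the downloadable source linked at the end), the Python snippet below fetches that page and checks for the three markers the rest of this post walks through. The use of the requests library and a desktop User-Agent is an assumption, and the markers can of course move whenever Baidu reshuffles the structure again.

import requests

URL = "https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_pc_3"

def fetch_raw_text():
    # Plain GET; a desktop User-Agent is assumed to be enough to get the full page.
    resp = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    text = fetch_raw_text()
    # The three markers this post describes, in the order they are discussed.
    for marker in ('"caseList":[', '"caseOutsideList":[', '"globalList":['):
        print(marker, "found at offset", text.find(marker))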

Layer 1: Domestic data

从 "caseList":[ 开始之后就是国内的数据;数据结构为:

// Province-level data

{"confirmed":"1","died":"0","crued":"1","relativeTime":"1588176000","confirmedRelative":"0","diedRelative":"0","curedRelative":"0","curConfirm":"0","curConfirmRelative":"0","icuDisable":"1","area":"\u897f\u85cf","subList":[{"city":"\u62c9\u8428","confirmed":"1","died":"0","crued":"1","confirmedRelative":"0","curConfirm":"0","cityCode":"100"}]},

From this entry you can extract the province's totals as well as the data for each of its cities.

The city data begins after "subList":[ , with each city following this structure:

{"city":"\u62c9\u8428","confirmed":"1","died":"0","crued":"1","confirmedRelative":"0","curConfirm":"0","cityCode":"100"}

So we can first split the provinces out of the source and then parse the cities within each one. Shown in a tree view, the result looks like this:

[Screenshot: tree view of the parsed province/city data]

A few entries come out misaligned because the format is not entirely consistent.
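To make the splitting concrete, here is a minimal Python sketch of the layer-1 parse. It assumes the field order shown in the samples above and that "caseOutsideList" follows "caseList" in the raw text; it illustrates the approach rather than reproducing the downloadable source.

import json
import re

# One regex per province block (field order assumed from the sample above),
# and another for the city entries inside its "subList".
prov_re = re.compile(
    r'\{"confirmed":"(\d*)","died":"(\d*)","crued":"(\d*)".*?'
    r'"area":"(.*?)","subList":\[(.*?)\]\}',
    re.S)
city_re = re.compile(
    r'\{"city":"(.*?)","confirmed":"(\d*)","died":"(\d*)","crued":"(\d*)"')

def u(s):
    # Names arrive as \uXXXX escapes (see the samples); decode them as a JSON string.
    return json.loads(f'"{s}"')

def parse_domestic(text):
    # Assumes "caseOutsideList" follows "caseList" in the raw packet.
    start = text.find('"caseList":[')
    end = text.find('"caseOutsideList":[')
    segment = text[start:end] if start != -1 and end != -1 else text
    provinces = []
    for confirmed, died, cured, area, sub in prov_re.findall(segment):
        cities = [{"city": u(c), "confirmed": cc, "died": cd, "cured": cr}
                  for c, cc, cd, cr in city_re.findall(sub)]
        provinces.append({"area": u(area), "confirmed": confirmed,
                          "died": died, "cured": cured, "cities": cities})
    return provinces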

Layer 2: Countries

This layer holds the data for each country along with its provinces/states;

{"confirmed":"1","died":"","crued":"","relativeTime":"1588089600","confirmedRelative":"","curConfirm":"1","icuDisable":"1","area":"\u79d1\u6469\u7f57","subList":[]},

It is much the same as the domestic structure above: sub-region data is stored inside "subList":[] ;

This layer begins at "caseOutsideList":[

The tail of this layer is padded with a jumble of data. Apart from this summary block:

{"confirmed":"84387","died":"4643","cured":"78893","asymptomatic":"981","asymptomaticRelative":"25","unconfirmed":"9","relativeTime":"1588176000","confirmedRelative":"12","unconfirmedRelative":"3","curedRelative":"60","diedRelative":"0","icu":"38","icuRelative":"-3","overseasInput":"1670","unOverseasInputCumulative":"82715","overseasInputRelative":"6","unOverseasInputNewAdd":"6","curConfirm":"851","curConfirmRelative":"-48","icuDisable":"1"},"summaryDataOut":{"confirmed":"3242610","died":"230232","curConfirm":"2033124","cured":"979254","confirmedRelative":"64731","curedRelative":"42292","diedRelative":"5828","curConfirmRelative":"16611","relativeTime":"1588176000"},

everything else there is just miscellaneous filler.
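Under the same assumptions, a sketch for this layer can slice from "caseOutsideList":[ to "globalList":[ and reuse the province-style regex. \d* (rather than \d+) is used because the numeric fields here can be empty strings, and the summary/filler blocks fall through the pattern because they spell the field "cured" rather than "crued".

import json
import re

# Country entries share the province-style shape; empty numeric fields are allowed.
country_re = re.compile(
    r'\{"confirmed":"(\d*)","died":"(\d*)","crued":"(\d*)".*?'
    r'"area":"(.*?)","subList":\[(.*?)\]\}',
    re.S)

def parse_outside(text):
    # Assumes "globalList" follows "caseOutsideList" (layer 3, below).
    start = text.find('"caseOutsideList":[')
    end = text.find('"globalList":[')
    segment = text[start:end] if start != -1 and end != -1 else ""
    countries = []
    for confirmed, died, cured, area, sub in country_re.findall(segment):
        countries.append({
            "area": json.loads(f'"{area}"'),        # decode the \uXXXX country name
            "confirmed": confirmed, "died": died, "cured": cured,
            "has_sub_regions": bool(sub.strip()),   # subList may hold provinces/states
        })
    return countries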

Layer 3: Continents

This layer starts right after the filler at the end of layer 2, beginning at "globalList":[ .

其中{"area":"\u4e9a\u6d32","subList":[   为板块开始

The country entries inside it follow this structure: {"confirmed":"15","died":"","crued":"","relativeTime":1588089600,"confirmedRelative":"","curConfirm":"15","country":"\u5854\u5409\u514b\u65af\u5766"}

Right after the last ], the continent's totals follow directly: "died":"14128","crued":"190055","confirmed":"442739","curConfirm":"238556","confirmedRelative":"11282"},

and then the next continent's data begins with another {"area":"\u6b27\u6d32","subList":[ ,

The continent blocks cover: Asia, Europe, Africa, North America, South America, Oceania, and Other.

There is also one unfamiliar block whose name translates to "Hot" (热门), which left me thoroughly confused.

We can treat this "Hot" block, which starts with {"area":"\u70ed\u95e8","subList":[ , as the end of the data,

or use "foreignTrendList": as the marker for where the "Hot" block ends.
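Putting those markers together, a sketch for the third layer could slice from "globalList":[ to "foreignTrendList": and, for each continent block, pull its "area", the country entries in its "subList", and the trailing totals. Again this is an illustration built from the structure described above, not the published source.

import json
import re

# Each continent block: its "area", a "subList" of country entries, and the
# totals that trail the closing "],".
continent_re = re.compile(r'\{"area":"(.*?)","subList":\[(.*?)\],(.*?)\}', re.S)
country_re = re.compile(
    r'\{"confirmed":"(\d*)","died":"(\d*)","crued":"(\d*)".*?"country":"(.*?)"\}',
    re.S)

def parse_global(text):
    start = text.find('"globalList":[')
    end = text.find('"foreignTrendList":')          # end marker noted above
    segment = text[start:end] if start != -1 and end != -1 else ""
    continents = []
    for area, sub, totals in continent_re.findall(segment):
        countries = [{"country": json.loads(f'"{name}"'),
                      "confirmed": c, "died": d, "cured": r}
                     for c, d, r, name in country_re.findall(sub)]
        continents.append({
            "area": json.loads(f'"{area}"'),        # Asia, Europe, ... plus the odd "Hot"
            "totals": dict(re.findall(r'"(\w+)":"(-?\d+)"', totals)),
            "countries": countries,
        })
    return continents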

Having analyzed these three layers, you can pull the data out with regex or any other method. Since the holiday wasn't long enough, I only implemented the domestic part.
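For completeness, here is hypothetical glue code that reuses fetch_raw_text() and parse_domestic() from the sketches above, limited to the domestic layer:

# Hypothetical usage, combining the earlier sketches (domestic data only).
if __name__ == "__main__":
    raw = fetch_raw_text()
    for prov in parse_domestic(raw):
        print(prov["area"], prov["confirmed"], "confirmed,",
              len(prov["cities"]), "cities")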

Data structure txt download: https://lanzous.com/ic6bg0f

Source code for the domestic data updater: https://lanzous.com/ic6bj7a

It took a lot to finally get two days off, so I'm going to go get some rest~!

