InfluxDB: "data loss" when inserting data

    kkfnui 2017-12-07 19:02:31 +08:00, 4062 views

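    I have just started using InfluxDB and ran into a "data loss" problem.

    I copied 4 million log rows from MySQL, but after inserting them into InfluxDB only about 700,000 remained. The rows were written in batches with the official Go client.

    1. The program runs without any errors.
    2. Since the data itself might contain duplicates, I also wrote the MySQL primary key into InfluxDB on insert, but the problem persists.

    Being new to InfluxDB, I can't tell whether this is how InfluxDB is meant to behave, a bug on my side, or something that a configuration setting would fix.

    Some configuration: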

    > SHOW RETENTION POLICIES ON mydb;
    name    duration shardGroupDuration replicaN default
    ----    -------- ------------------ -------- -------
    autogen 0s       168h0m0s           1        true

    > select count(*) from alg_read_log;
    name: alg_read_log
    time count_content_id count_id count_user_id
    ---- ---------------- -------- -------------
    0    736999           736999   736999
    12 replies    2019-10-11 14:47:45 +08:00
    #1 kkfnui (OP) 2017-12-07 19:12:46 +08:00
    I just tested again with a small amount of data: a batch insert of 20 rows left only 7 rows.
    #2 psnail 2017-12-07 19:27:28 +08:00
    Check which rows were lost and how they differ from the rows that survived.
    #3 kkfnui (OP) 2017-12-07 19:39:20 +08:00
    When inserting twenty rows:

    ID userid content time
    13 4706513 851759 2017-12-06 17:00:00 +0000 UTC
    8 5143055 4813070 2017-12-06 17:00:00 +0000 UTC
    6 2439692 4434993 2017-12-06 17:00:00 +0000 UTC
    4 5261489 2098786 2017-12-06 17:00:00 +0000 UTC
    10 6185307 11177143 2017-12-06 17:00:00 +0000 UTC
    12 6173230 37959 2017-12-06 17:00:00 +0000 UTC
    20 6647995 4964641 2017-12-06 17:00:00 +0000 UTC
    9 5388686 9843194 2017-12-06 17:00:00 +0000 UTC
    15 4975601 4978180 2017-12-06 17:00:00 +0000 UTC
    1 5203741 4554768 2017-12-06 17:00:00 +0000 UTC
    18 876162 11164468 2017-12-06 17:00:00 +0000 UTC
    11 4239087 11164092 2017-12-06 17:00:00 +0000 UTC
    5 5864698 4978165 2017-12-06 17:00:00 +0000 UTC
    7 4461930 7139200 2017-12-06 17:00:00 +0000 UTC
    2 6226807 168513 2017-12-06 17:00:00 +0000 UTC
    14 3687226 460986 2017-12-06 17:00:00 +0000 UTC
    16 1444384 11119087 2017-12-06 17:00:00 +0000 UTC
    19 3244527 5674265 2017-12-06 17:00:00 +0000 UTC
    17 6528511 11162885 2017-12-06 17:00:00 +0000 UTC
    3 5810244 10774121 2017-12-06 17:00:00 +0000 UTC


    > select * from alg_read_log;
    name: alg_read_log
    time alg_id alg_scene cat_id content_id id user_id
    ---- ------ --------- ------ ---------- -- -------
    1512579600000000000 -1 sc_d 0 11164092 11423908711164092 4239087
    1512579600000000000 1 sc_d 0 4434993 624396924434993 2439692
    1512579600000000000 10 sc_d 0 37959 12617323037959 6173230
    1512579600000000000 2 sc_d 2 11177143 10618530711177143 6185307
    1512579600000000000 21 sc_d 0 11164468 1887616211164468 876162
    1512579600000000000 23 sc_d 0 5674265 1932445275674265 3244527
    1512579600000000000 25 sc_d 0 4978165 558646984978165 5864698
    #4 kkfnui (OP) 2017-12-07 19:39:56 +08:00
    When inserting twenty-five rows:

    17 6528511 11162885 2017-12-06 17:00:00 +0000 UTC
    24 5721905 11164092 2017-12-06 17:00:00 +0000 UTC
    7 4461930 7139200 2017-12-06 17:00:00 +0000 UTC
    11 4239087 11164092 2017-12-06 17:00:00 +0000 UTC
    16 1444384 11119087 2017-12-06 17:00:00 +0000 UTC
    13 4706513 851759 2017-12-06 17:00:00 +0000 UTC
    6 2439692 4434993 2017-12-06 17:00:00 +0000 UTC
    14 3687226 460986 2017-12-06 17:00:00 +0000 UTC
    9 5388686 9843194 2017-12-06 17:00:00 +0000 UTC
    4 5261489 2098786 2017-12-06 17:00:00 +0000 UTC
    19 3244527 5674265 2017-12-06 17:00:00 +0000 UTC
    2 6226807 168513 2017-12-06 17:00:00 +0000 UTC
    23 841747 7230377 2017-12-06 17:00:00 +0000 UTC
    5 5864698 4978165 2017-12-06 17:00:00 +0000 UTC
    20 6647995 4964641 2017-12-06 17:00:00 +0000 UTC
    1 5203741 4554768 2017-12-06 17:00:00 +0000 UTC
    8 5143055 4813070 2017-12-06 17:00:00 +0000 UTC
    22 4582157 11160521 2017-12-06 17:00:00 +0000 UTC
    21 3495482 919711 2017-12-06 17:00:00 +0000 UTC
    15 4975601 4978180 2017-12-06 17:00:00 +0000 UTC
    3 5810244 10774121 2017-12-06 17:00:00 +0000 UTC
    10 6185307 11177143 2017-12-06 17:00:00 +0000 UTC
    25 5266737 11109417 2017-12-06 17:00:00 +0000 UTC
    12 6173230 37959 2017-12-06 17:00:00 +0000 UTC
    18 876162 11164468 2017-12-06 17:00:00 +0000 UTC




    > select * from alg_read_log;
    name: alg_read_log
    time alg_id alg_scene cat_id content_id id user_id
    ---- ------ --------- ------ ---------- -- -------
    1512579600000000000 -1 sc_d 0 10774121 3581024410774121 5810244
    1512579600000000000 1 sc_d 0 11109417 25526673711109417 5266737
    1512579600000000000 10 sc_d 0 37959 12617323037959 6173230
    1512579600000000000 2 sc_d 2 11177143 10618530711177143 6185307
    1512579600000000000 21 sc_d 0 11164468 1887616211164468 876162
    1512579600000000000 23 sc_d 0 5674265 1932445275674265 3244527
    1512579600000000000 25 sc_d 0 11160521 22458215711160521 4582157
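    #5 kkfnui (OP) 2017-12-07 19:41:15 +08:00
    @psnail Both runs used the same data, except that the second run inserted 5 more rows.

    As a result, rows that were present after the first run no longer show up after the second. Weird.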
    #6 psnail 2017-12-07 19:55:16 +08:00
    InfluxDB's insert format is: insert measurement,tag=value field=value timestamp
    Data is stored by time.

    For points with the same measurement, tag keys, tag values, and timestamp, the field values are overwritten by the most recent write.
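The overwrite rule described in this reply can be sketched in a few lines of Python. This is only a simplified in-memory model, not InfluxDB code: the helper `write_point`, the row values, and the choice of `alg_id` as the only tag are assumptions made for illustration (the thread never states which columns were written as tags).

```python
# Simplified model: a point is identified by (measurement, tag set, timestamp).
# Writing a second point with the same identity overwrites its field values.

def write_point(store, measurement, tags, fields, timestamp_ns):
    """Store a point; a later write with the same key replaces field values."""
    key = (measurement, tuple(sorted(tags.items())), timestamp_ns)
    store.setdefault(key, {}).update(fields)


store = {}
ts = 1512579600 * 10**9  # 2017-12-06 17:00:00 UTC in nanoseconds

# Hypothetical rows: all share one timestamp, two share the same tag value.
rows = [
    ({"alg_id": "1"}, {"id": 1, "user_id": 100}),
    ({"alg_id": "1"}, {"id": 2, "user_id": 200}),  # overwrites the row above
    ({"alg_id": "2"}, {"id": 3, "user_id": 300}),
]
for tags, fields in rows:
    write_point(store, "alg_read_log", tags, fields, ts)

surviving = len(store)  # three rows written, two points remain
```

Under this model, the number of surviving points equals the number of distinct (tag set, timestamp) combinations in the batch, regardless of how many distinct field values were written.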
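    #7 kkfnui (OP) 2017-12-07 20:48:45 +08:00
    @psnail
    I must not have read the documentation carefully; I assumed that points with different field values would not overwrite each other.

    The MySQL timestamps only have second precision, so under high concurrency many rows share the same timestamp and end up overwriting one another.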
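One possible workaround for second-precision timestamps, sketched here under assumptions, is to derive a nanosecond offset from the MySQL primary key so that rows sharing a second no longer share a timestamp. The function name `unique_timestamp_ns` is hypothetical, and the sketch assumes the points are written with nanosecond precision.

```python
NS_PER_SECOND = 1_000_000_000


def unique_timestamp_ns(epoch_seconds: int, row_id: int) -> int:
    """Use the row id as a sub-second offset on a second-precision timestamp."""
    return epoch_seconds * NS_PER_SECOND + (row_id % NS_PER_SECOND)


base = 1512579600  # 2017-12-06 17:00:00 UTC, as in the sample rows
stamps = [unique_timestamp_ns(base, row_id) for row_id in range(1, 21)]
distinct = len(set(stamps))  # one distinct timestamp per row id
```

The sub-second part of the result is no longer a real event time, and two ids that differ by a multiple of one billion within the same second would still collide. Writing the primary key as a tag would also make points distinct, but every distinct tag value creates a new series, so that option may raise series cardinality sharply.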
    #8 rrfeng 2017-12-07 20:53:49 +08:00
    You can think of it this way: in InfluxDB, the primary key is the timestamp.
    #9 rswl 2017-12-07 21:53:32 +08:00
    Points with the same timestamp overwrite each other.
    #10 psnail 2017-12-07 22:03:07 +08:00
    @rrfeng If you put it that way, it should really be the series.
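    #11 qyvlik 2019-10-11 14:08:48 +08:00
    I ran into the same problem. I added special handling for time to make sure every timestamp is different, but when inserting data exported from MySQL into InfluxDB, half of the data was still lost.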
    #12 qyvlik 2019-10-11 14:47:45 +08:00
    @qyvlik The InfluxDB version that shows the problem:
    InfluxDB shell version: 1.7.7