Hive SQL 实现问题,请大佬看下~ - V2EX
V2EX = way to explore
V2EX 是一个关于分享和探索的地方
现在注册
已注册用户请  登录
请不要在回答技术问题时复制粘贴 AI 生成的内容
shuaishuaivip
V2EX    程序员

Hive SQL 实现问题,请大佬看下~

  •  
  •   shuaishuaivip 2024-01-31 15:44:38 +08:00 2031 次点击
    这是一个创建于 669 天前的主题,其中的信息可能已经有所发展或是发生改变。
    Hive SQL 实现问题:数据展示如下:
    第一条数据的 end 等于第二条数据的 start 就认为是连续的,一直到 LS,5,6,2024-01-27 08:44:37 2024-01-27 08:44:47 这条数据就断开了,我想取第一条的 start 和断开那条数据的 end 那就查询到的数据结果是这样的,该怎么写?
    name1,start,end,start_time,end_time
    LSJA24390MS139778 55.8 56.6 2024-01-27 08:44:23 2024-01-27 08:46:51
    LSJA24390MS139778 56.7 57.3 2024-01-27 08:46:53 2024-01-27 08:48:35
    LSJA24390MS139778 57.2 57.3 2024-01-27 08:48:37 2024-01-27 08:48:47

    name1,start,end,start_time,end_time
    LSJA24390MS139778 55.8 55.8 2024-01-27 08:44:23 2024-01-27 08:44:23
    LSJA24390MS139778 NULL 55.9 2024-01-27 08:44:25 2024-01-27 08:44:35
    LSJA24390MS139778 NULL 55.9 2024-01-27 08:44:37 2024-01-27 08:44:47
    LSJA24390MS139778 NULL 56 2024-01-27 08:44:49 2024-01-27 08:44:51
    LSJA24390MS139778 NULL 56.1 2024-01-27 08:44:53 2024-01-27 08:45:11
    LSJA24390MS139778 NULL 56.1 2024-01-27 08:45:13 2024-01-27 08:45:21
    LSJA24390MS139778 NULL 56.1 2024-01-27 08:45:23 2024-01-27 08:45:23
    LSJA24390MS139778 NULL 56.3 2024-01-27 08:45:25 2024-01-27 08:45:51
    LSJA24390MS139778 NULL 56.5 2024-01-27 08:45:53 2024-01-27 08:46:21
    LSJA24390MS139778 NULL 56.5 2024-01-27 08:46:23 2024-01-27 08:46:23
    LSJA24390MS139778 NULL 56.6 2024-01-27 08:46:25 2024-01-27 08:46:51
    LSJA24390MS139778 56.7 56.7 2024-01-27 08:46:53 2024-01-27 08:46:59
    LSJA24390MS139778 NULL 56.8 2024-01-27 08:47:01 2024-01-27 08:47:11
    LSJA24390MS139778 NULL 56.8 2024-01-27 08:47:13 2024-01-27 08:47:21
    LSJA24390MS139778 NULL 56.8 2024-01-27 08:47:23 2024-01-27 08:47:23
    LSJA24390MS139778 NULL 56.9 2024-01-27 08:47:25 2024-01-27 08:47:35
    LSJA24390MS139778 NULL 57 2024-01-27 08:47:37 2024-01-27 08:47:47
    LSJA24390MS139778 NULL 57 2024-01-27 08:47:49 2024-01-27 08:47:51
    LSJA24390MS139778 NULL 57 2024-01-27 08:47:53 2024-01-27 08:47:59
    LSJA24390MS139778 NULL 57.1 2024-01-27 08:48:01 2024-01-27 08:48:11
    LSJA24390MS139778 NULL 57.2 2024-01-27 08:48:13 2024-01-27 08:48:21
    LSJA24390MS139778 NULL 57.2 2024-01-27 08:48:23 2024-01-27 08:48:23
    LSJA24390MS139778 NULL 57.3 2024-01-27 08:48:25 2024-01-27 08:48:35
    LSJA24390MS139778 57.2 57.3 2024-01-27 08:48:37 2024-01-27 08:48:47
    11 条回复    2024-02-01 09:31:42 +08:00
    moyi97
        1
    moyi97  
       2024-01-31 15:50:36 +08:00
    rownumber 确定第一条, lag 或者 lead 做个判断 取断掉那条, 应该就可以实现了吧
    Jinnrry
        2
    Jinnrry  
       2024-01-31 15:54:27 +08:00 via Android
    你这个第一条 end 和第二条 start 也不相等啊,这不差了 2 秒吗?

    如果开始结束时间完全一致的时候,先分列,把 starttime 和 endtime 合并成一个字段,然后计数,如果某个时间点计数值是 1 ,那就说明不连续。
    shuaishuaivip
        3
    shuaishuaivip  
    OP
       2024-01-31 16:03:35 +08:00
    Hive SQL 实现问题:数据展示如下:

    name1,start,end,start_time,end_time
    LSJA24390MS139778 55.8 55.8 ,2024-01-27 08:44:23 2024-01-27 08:44:23
    LSJA24390MS139778 55.8 55.9, 2024-01-27 08:44:25 2024-01-27 08:44:35
    LSJA24390MS139778 55.9 55.9 ,2024-01-27 08:44:37 2024-01-27 08:44:47
    LSJA24390MS139778 55.9 56 , 2024-01-27 08:44:49 2024-01-27 08:44:51
    LSJA24390MS139778 56 56.1, 2024-01-27 08:44:53 2024-01-27 08:45:11
    LSJA24390MS139778 56.1 56.1, 2024-01-27 08:45:13 2024-01-27 08:45:21
    LSJA24390MS139778 56.1 56.1, 2024-01-27 08:45:23 2024-01-27 08:45:23
    LSJA24390MS139778 56.1 56.3, 2024-01-27 08:45:25 2024-01-27 08:45:51
    LSJA24390MS139778 56.3 56.5, 2024-01-27 08:45:53 2024-01-27 08:46:21
    LSJA24390MS139778 56.5 56.5, 2024-01-27 08:46:23 2024-01-27 08:46:23
    LSJA24390MS139778 56.5 56.6, 2024-01-27 08:46:25 2024-01-27 08:46:51
    LSJA24390MS139778 56.7 56.7, 2024-01-27 08:46:53 2024-01-27 08:46:59
    LSJA24390MS139778 56.7 56.8, 2024-01-27 08:47:01 2024-01-27 08:47:11
    LSJA24390MS139778 56.8 56.8, 2024-01-27 08:47:13 2024-01-27 08:47:21
    LSJA24390MS139778 56.8 56.8, 2024-01-27 08:47:23 2024-01-27 08:47:23
    LSJA24390MS139778 56.8 56.9, 2024-01-27 08:47:25 2024-01-27 08:47:35
    LSJA24390MS139778 56.9 57, 2024-01-27 08:47:37 2024-01-27 08:47:47
    LSJA24390MS139778 57 57, 2024-01-27 08:47:49 2024-01-27 08:47:51
    LSJA24390MS139778 57 57, 2024-01-27 08:47:53 2024-01-27 08:47:59
    LSJA24390MS139778 57 57.1, 2024-01-27 08:48:01 2024-01-27 08:48:11
    LSJA24390MS139778 57.1 57.2, 2024-01-27 08:48:13 2024-01-27 08:48:21
    LSJA24390MS139778 57.2 57.2, 2024-01-27 08:48:23 2024-01-27 08:48:23LSJA24390MS139778 57.2 57.3, 2024-01-27 08:48:25 2024-01-27 08:48:35
    LSJA24390MS139778 57.2 57.3, 2024-01-27 08:48:37 2024-01-27 08:48:47
    实现成这样
    name1,start,end,start_time,end_time
    LSJA24390MS139778 55.8 56.6 2024-01-27 08:44:23 2024-01-27 08:46:51
    LSJA24390MS139778 56.7 57.3 2024-01-27 08:46:53 2024-01-27 08:48:35
    LSJA24390MS139778 57.2 57.3 2024-01-27 08:48:37 2024-01-27 08:48:47
    shuaishuaivip
        4
    shuaishuaivip  
    OP
       2024-01-31 16:04:41 +08:00
    start,和 end 说的不是时间哦
    ericzy
        5
    ericzy  
       2024-01-31 16:11:57 +08:00 via Android
    -- 创建一个临时表,用 lag 函数获取上一行的 end 值
    with temp as (
    select name1, start, end, start_time, end_time,
    lag(end) over (partition by name1 order by start_time) as prev_end
    from your_table
    )
    -- 从临时表中筛选出断开的数据,即 start 不等于 prev_end 的数据
    select name1, start, end, start_time, end_time
    from temp
    where start != prev_end or prev_end is null

    -- 来自 gpt ,仔细查看和使用
    shuaishuaivip
        6
    shuaishuaivip  
    OP
       2024-01-31 16:29:09 +08:00
    @ericzy gpt 我都翻烂了~行不通
    BeiChuanAlex
        7
    BeiChuanAlex  
       2024-01-31 16:53:19 +08:00
    一张表解决不了就 2 张表分层实现
    hpkaiq
        8
    hpkaiq  
       2024-01-31 18:39:06 +08:00
    with t2 as (
    select *,
    lead(`start`) over (partition by name order by `end_time`) as next_start from your_table
    )
    ,t3 as (
    select name,start_time,end_time
    from t2 where `end` != next_start or next_start is null
    )
    ,t4 as (select * ,lag(`end_time`) over (partition by name order by `end_time`) as pre_end_time
    from t3)

    select name,step_end_time,
    min(`start`) `start`,max(`end`) `end`,
    min(start_time) start_time,max(end_time) end_time
    from
    (
    select t2.*,step_end_time from
    (
    select name,end_time step_end_time,if(pre_end_time is null,'1970-01-01 00:00:00',pre_end_time) step_start_time from t4
    )step
    join
    t2 on step.name = t2.name and t2.end_time <= step_end_time and t2.start_time > step_start_time
    )tmp
    group by name,step_end_time
    order by name,step_end_time

    小试一把,不知道行不行
    hero1874
        9
    hero1874  
       2024-01-31 21:01:28 +08:00
    st 就是 start ed 是 end
    我是用 doris 写的 所以下面 0 在 hive 里应该是 null 把判断是否为 0 改成是否为 null

    第一步 用 lag 函数取出上一个的 ed 值
    select
    name,
    st,
    ed,
    start_time,
    end_time,
    lag(ed,1,0) over( partition by name order by start_time ) lag_ed1 -- 上一行的 ed 值
    from form6
    order by start_time

    第二步
    select
    t1.*,
    sum(
    case
    when lag_ed1 = 0 then 1 -- 第一行
    when lag_ed1 !=st then 1 -- 断开了
    when lag_ed1 = st then 0 -- 连续
    end )
    over( partition by name order by start_time) num
    from
    (select
    name,
    st,
    ed,
    start_time,
    end_time,
    lag(ed,1,0) over( partition by name order by start_time ) lag_ed1
    from form6
    order by start_time) t1
    order by start_time

    第三步 根据 name 和判断是否连续或者断开分组取数
    select
    name,
    min(st) st,
    max(ed) ed,
    min(start_time)start_time,
    max(end_time) end_time
    from
    (
    select
    t1.*,
    sum(
    case
    when lag_ed1 = 0 then 1
    when lag_ed1 !=st then 1
    when lag_ed1 = st then 0
    end )
    over( partition by name order by start_time) num
    from
    (select
    name,
    st,
    ed,
    start_time,
    end_time,
    lag(ed,1,0) over( partition by name order by start_time ) lag_ed1
    from form6
    order by start_time) t1
    order by start_time
    )t3
    group by name ,num
    lusansan
        10
    lusansan  
       2024-01-31 23:35:47 +08:00
    select
    name1
    ,MIN(start_number) as start_number
    ,MAX(end_number) as end_number
    ,MIN(start_time) as start_time
    ,MAX(end_time) as end_time
    from
    (
    SELECT
    *
    ,case when start_number is null then row_number() over (partition by name1 order by start_time) -1 else row_number() over (partition by name1 order by start_time) end as rn
    from
    (
    SELECT
    name1
    ,start_number
    ,end_number
    ,start_time
    ,end_time
    ,lead(start_number,1) over (partition by name1 order by start_time) lead_start_number
    FROM
    (
    select
    name1,start_number,end_number,start_time,end_time
    from
    values
    ('LSJA24390MS139778' ,55.8,55.8 ,'2024-01-27 08:44:23' ,'2024-01-27 08:44:23')
    ,('LSJA24390MS139778' ,NULL,55.9 ,'2024-01-27 08:44:25' ,'2024-01-27 08:44:35')
    ,('LSJA24390MS139778' ,NULL,55.9 ,'2024-01-27 08:44:37' ,'2024-01-27 08:44:47')
    ,('LSJA24390MS139778' ,NULL,56 ,'2024-01-27 08:44:49' ,'2024-01-27 08:44:51')
    ,('LSJA24390MS139778' ,NULL,56.1 ,'2024-01-27 08:44:53' ,'2024-01-27 08:45:11')
    ,('LSJA24390MS139778' ,NULL,56.1 ,'2024-01-27 08:45:13' ,'2024-01-27 08:45:21')
    ,('LSJA24390MS139778' ,NULL,56.1 ,'2024-01-27 08:45:23' ,'2024-01-27 08:45:23')
    ,('LSJA24390MS139778' ,NULL,56.3 ,'2024-01-27 08:45:25' ,'2024-01-27 08:45:51')
    ,('LSJA24390MS139778' ,NULL,56.5 ,'2024-01-27 08:45:53' ,'2024-01-27 08:46:21')
    ,('LSJA24390MS139778' ,NULL,56.5 ,'2024-01-27 08:46:23' ,'2024-01-27 08:46:23')
    ,('LSJA24390MS139778' ,NULL,56.6 ,'2024-01-27 08:46:25' ,'2024-01-27 08:46:51')
    ,('LSJA24390MS139778' ,56.7,56.7 ,'2024-01-27 08:46:53' ,'2024-01-27 08:46:59')
    ,('LSJA24390MS139778' ,NULL,56.8 ,'2024-01-27 08:47:01' ,'2024-01-27 08:47:11')
    ,('LSJA24390MS139778' ,NULL,56.8 ,'2024-01-27 08:47:13' ,'2024-01-27 08:47:21')
    ,('LSJA24390MS139778' ,NULL,56.8 ,'2024-01-27 08:47:23' ,'2024-01-27 08:47:23')
    ,('LSJA24390MS139778' ,NULL,56.9 ,'2024-01-27 08:47:25' ,'2024-01-27 08:47:35')
    ,('LSJA24390MS139778' ,NULL,57 ,'2024-01-27 08:47:37' ,'2024-01-27 08:47:47')
    ,('LSJA24390MS139778' ,NULL,57 ,'2024-01-27 08:47:49' ,'2024-01-27 08:47:51')
    ,('LSJA24390MS139778' ,NULL,57 ,'2024-01-27 08:47:53' ,'2024-01-27 08:47:59')
    ,('LSJA24390MS139778' ,NULL,57.1 ,'2024-01-27 08:48:01' ,'2024-01-27 08:48:11')
    ,('LSJA24390MS139778' ,NULL,57.2 ,'2024-01-27 08:48:13' ,'2024-01-27 08:48:21')
    ,('LSJA24390MS139778' ,NULL,57.2 ,'2024-01-27 08:48:23' ,'2024-01-27 08:48:23')
    ,('LSJA24390MS139778' ,NULL,57.3 ,'2024-01-27 08:48:25' ,'2024-01-27 08:48:35')
    ,('LSJA24390MS139778' ,57.2,57.3 ,'2024-01-27 08:48:37' ,'2024-01-27 08:48:47') as t (name1,start_number,end_number,start_time,end_time)
    ) p1
    ) p2
    WHERE
    start_number is not null or lead_start_number is not null
    ) t1
    group by name1,rn
    ;



    name1 start_number end_number start_time end_time
    LSJA24390MS139778 55.8 56.6 2024-01-27 08:44:23 2024-01-27 08:46:51
    LSJA24390MS139778 56.7 57.3 2024-01-27 08:46:53 2024-01-27 08:48:35
    LSJA24390MS139778 57.2 57.3 2024-01-27 08:48:37 2024-01-27 08:48:47
    shuaishuaivip
        11
    shuaishuaivip  
    OP
       2024-02-01 09:31:42 +08:00
    @lusansan 大佬,这个可行。
    关于     帮助文档     自助推广系统     a href="https://blog.v2ex.com/" class="dark" target="_blank">博客     API     FAQ     Solana     3621 人在线   最高记录 6679       Select Language
    创意工作者们的社区
    World is powered by solitude
    VERSION: 3.9.8.5 24ms UTC 04:16 PVG 12:16 LAX 20:16 JFK 23:16
    Do have faith in what you're doing.
    ubao msn snddm index pchome yahoo rakuten mypaper meadowduck bidyahoo youbao zxmzxm asda bnvcg cvbfg dfscv mmhjk xxddc yybgb zznbn ccubao uaitu acv GXCV ET GDG YH FG BCVB FJFH CBRE CBC GDG ET54 WRWR RWER WREW WRWER RWER SDG EW SF DSFSF fbbs ubao fhd dfg ewr dg df ewwr ewwr et ruyut utut dfg fgd gdfgt etg dfgt dfgd ert4 gd fgg wr 235 wer3 we vsdf sdf gdf ert xcv sdf rwer hfd dfg cvb rwf afb dfh jgh bmn lgh rty gfds cxv xcv xcs vdas fdf fgd cv sdf tert sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf sdf shasha9178 shasha9178 shasha9178 shasha9178 shasha9178 liflif2 liflif2 liflif2 liflif2 liflif2 liblib3 liblib3 liblib3 liblib3 liblib3 zhazha444 zhazha444 zhazha444 zhazha444 zhazha444 dende5 dende denden denden2 denden21 fenfen9 fenf619 fen619 fenfe9 fe619 sdf sdf sdf sdf sdf zhazh90 zhazh0 zhaa50 zha90 zh590 zho zhoz zhozh zhozho zhozho2 lislis lls95 lili95 lils5 liss9 sdf0ty987 sdft876 sdft9876 sdf09876 sd0t9876 sdf0ty98 sdf0976 sdf0ty986 sdf0ty96 sdf0t76 sdf0876 df0ty98 sf0t876 sd0ty76 sdy76 sdf76 sdf0t76 sdf0ty9 sdf0ty98 sdf0ty987 sdf0ty98 sdf6676 sdf876 sd876 sd876 sdf6 sdf6 sdf9876 sdf0t sdf06 sdf0ty9776 sdf0ty9776 sdf0ty76 sdf8876 sdf0t sd6 sdf06 s688876 sd688 sdf86