For example, say I have the following URLs (input):
https://www.showcase.com/user/home
https://www.showcase.com/bill/BlKLSJDFLJERSDF
https://www.showcase.com/bill/BSERlKLSSDFEJSDF
https://www.showcase.com/bill/BSDREWRDF
https://www.showcase.com/bill/BSERDWEDFEJSDF
# there may be 100+ URLs like these
https://www.showcase.com/bill/BlKLSJDFLJERSDF/detail
https://www.showcase.com/bill/BSERlKLSSDFEJSDF/detail
https://www.showcase.com/bill/BSDREWRDF/detail
https://www.showcase.com/bill/BSERDWEDFEJSDF/detail
# there may be 100+ URLs like these
https://www.showcase.com/topic/234566833245234566
https://www.showcase.com/topic/200000234523456683
https://www.showcase.com/topic/2586683567243w56324
# there may be 100+ URLs like these
# plus a large number of other URLs; the patterns are not fixed, so they can only be found through statistical analysis
They should be classified as (output):
https://www.showcase.com/user/home
https://www.showcase.com/bill/{param}
https://www.showcase.com/bill/{param}/detail
https://www.showcase.com/topic/{param}
So far the only approach I can think of is pattern recognition. Does anyone know of other methods?
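One purely statistical way to approach this, sketched here only as an illustration: split each URL into path segments, then walk the paths level by level and count how many distinct values appear under the same prefix. A position with many distinct values is probably a parameter, while a position with only a few is probably a fixed route name. The function name `infer_templates`, the `max_distinct` threshold, and the synthetic sample IDs are all assumptions made for this sketch; a suitable threshold would have to be tuned on real data.

```python
from collections import defaultdict
from urllib.parse import urlsplit

PARAM = "{param}"


def infer_templates(urls, max_distinct=10):
    """Collapse path positions that take more than max_distinct values.

    max_distinct is a hypothetical tuning knob, not a known-good constant.
    """
    rows = []
    for url in urls:
        parts = urlsplit(url)
        segments = [s for s in parts.path.split("/") if s]
        # Element 0 holds scheme and host; it is never collapsed.
        rows.append([f"{parts.scheme}://{parts.netloc}"] + segments)

    depth = 1
    while any(len(row) > depth for row in rows):
        # Distinct values seen at this depth, grouped by the
        # already-normalized prefix that precedes them.
        seen = defaultdict(set)
        for row in rows:
            if len(row) > depth:
                seen[tuple(row[:depth])].add(row[depth])
        for row in rows:
            if len(row) > depth and len(seen[tuple(row[:depth])]) > max_distinct:
                row[depth] = PARAM
        depth += 1

    return sorted({"/".join(row) for row in rows})


if __name__ == "__main__":
    # Synthetic stand-ins for the 100+ real URLs in each group.
    sample = ["https://www.showcase.com/user/home"]
    for i in range(50):
        sample.append(f"https://www.showcase.com/bill/B{i:08X}")
        sample.append(f"https://www.showcase.com/bill/B{i:08X}/detail")
        sample.append(f"https://www.showcase.com/topic/{2000000000 + i}")
    for template in infer_templates(sample):
        print(template)
```

Because the prefix is normalized before the next level is counted, `/bill/{param}/detail` is grouped as a single template instead of one per ID. The obvious weakness is that a fixed route with many sibling routes (more than `max_distinct`) would be wrongly collapsed, so the threshold may need to be combined with other signals such as segment length or character mix.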
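1 Coderuancun 2023-02-03 09:20:11 +08:00 Tokenize them; there are word-segmentation algorithms for this kind of thing.
2 acmerliu 2023-02-03 09:21:29 +08:00 Hidden Markov model.
3 Jooooooooo 2023-02-03 10:39:28 +08:00 Isn't this just regex?
4 34127chi 2023-02-03 13:43:41 +08:00 Isn't this just regex?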