V2EX wordpiece

Wordpiece

Definition

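WordPiece: a subword tokenization method widely used in natural language processing (NLP). It splits words into smaller "subword units" so that a model can handle rare words, inflected forms, and out-of-vocabulary (OOV) words with a fixed-size vocabulary. It is common in the tokenizers of modern language models such as BERT. (The word can also loosely mean "a part of a word" or "word fragment", but its most common use refers to the algorithm and the subwords it produces.)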
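A minimal sketch of how a trained WordPiece vocabulary splits a single word, assuming the greedy longest-match-first strategy used by BERT's tokenizer: take the longest prefix found in the vocabulary, then repeat on the remainder, marking word-internal pieces with `##`. If some position cannot be matched, the whole word becomes an unknown token. The function name `wordpiece_tokenize`, the `[UNK]` default, and the toy vocabulary are illustrative assumptions, not a specific library's API.

```python
from typing import List, Set


def wordpiece_tokenize(word: str, vocab: Set[str], unk_token: str = "[UNK]") -> List[str]:
    """Split one word into subwords by greedy longest-match-first (illustrative sketch)."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking it from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # non-initial pieces carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No vocabulary entry covers this position, so the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"un", "##aff", "##able", "play", "##ing", "##ed"}

print(wordpiece_tokenize("unaffable", toy_vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("playing", toy_vocab))    # ['play', '##ing']
print(wordpiece_tokenize("xyz", toy_vocab))        # ['[UNK]']
```

Greedy matching keeps tokenization fast and deterministic; the trade-off is that it does not search for a globally "best" segmentation.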

Pronunciation

/ˈwɜːd.piːs/

Etymology

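A compound of word ("word") + piece ("piece, fragment"), literally "a piece of a word". In computational linguistics it is used as a proper noun for a subword modeling approach, and its implementation, that splits words into smaller pieces.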

Examples

WordPiece breaks rare words into smaller units so the model can still understand them.

In our pipeline, we train a WordPiece vocabulary and tokenize all texts before feeding them into the Transformer.
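Training a WordPiece vocabulary can be sketched as follows. Google's original training procedure is not fully public, so this sketch assumes the likelihood-style merge rule described in the Hugging Face course: start from single characters (word-internal ones prefixed with `##`) and repeatedly merge the adjacent pair with the highest score freq(pair) / (freq(first) × freq(second)) until the vocabulary reaches a target size. The function names, the toy word counts, and the target size are hypothetical; words are assumed to be non-empty and already split on whitespace and punctuation.

```python
from collections import Counter
from typing import Dict, List, Tuple


def split_word(word: str) -> List[str]:
    # First character stays bare; later characters get the ## continuation prefix.
    return [word[0]] + ["##" + ch for ch in word[1:]]


def merge_pair(a: str, b: str) -> str:
    # Drop the ## of the second piece when joining the two pieces.
    return a + (b[2:] if b.startswith("##") else b)


def train_wordpiece(word_freqs: Dict[str, int], vocab_size: int) -> List[str]:
    """Grow a vocabulary by score-based pair merges (illustrative sketch)."""
    splits = {w: split_word(w) for w in word_freqs}
    vocab = sorted({p for pieces in splits.values() for p in pieces})
    while len(vocab) < vocab_size:
        piece_freq: Counter = Counter()
        pair_freq: Counter = Counter()
        for word, freq in word_freqs.items():
            pieces = splits[word]
            for p in pieces:
                piece_freq[p] += freq
            for a, b in zip(pieces, pieces[1:]):
                pair_freq[(a, b)] += freq
        if not pair_freq:
            break  # every word is already a single piece
        # Favor pairs that co-occur more often than their parts' frequencies suggest.
        best: Tuple[str, str] = max(
            pair_freq,
            key=lambda ab: pair_freq[ab] / (piece_freq[ab[0]] * piece_freq[ab[1]]),
        )
        new_piece = merge_pair(*best)
        vocab.append(new_piece)
        # Apply the merge to every word's current segmentation.
        for word, pieces in splits.items():
            merged: List[str] = []
            i = 0
            while i < len(pieces):
                if i + 1 < len(pieces) and (pieces[i], pieces[i + 1]) == best:
                    merged.append(new_piece)
                    i += 2
                else:
                    merged.append(pieces[i])
                    i += 1
            splits[word] = merged
    return vocab


# Hypothetical word counts from a tiny corpus.
corpus = {"hug": 10, "pug": 5, "hugs": 5}
print(train_wordpiece(corpus, vocab_size=7))
```

Unlike BPE, a WordPiece tokenizer does not replay these merges at inference time; it only keeps the final vocabulary and matches words against it.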

Related Words

Literary Works

  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al.): the classic paper describing a model that uses WordPiece tokenization
  • Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (Wu et al.): a representative paper discussing subword units and related tokenization strategies
  • SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing (Kudo & Richardson): tokenization work often compared and discussed alongside WordPiece