A question about string comparison in Python 3

This topic was created 3381 days ago; the information in it may have since evolved or changed.

I have two questions and would appreciate some help:

Why is a != b?
Given that a != b, why does the later path check, os.path.exists(os.path.join(current_path, b)), return True?

In [33]: os.listdir('.')
Out[33]:
['.DS_Store',
 'Closer - Travis.mp3',
 'I Really Like You - Carly Rae Jepsen.mp3',
 'Love Story - Taylor Swift.mp3',
 'Per dimenticare - Zero Assoluto.mp3',
 "Sono Bugiarda (I'm A Believer) - Caterina Caselli.mp3",
 'The Phoenix - Fall Out Boy.mp3',
 'We Are Never Ever Getting Back Together - Taylor Swift.mp3',
 'You Got Me - Colbie Caillat.mp3',
 'ガネット - 奥子.mp3',
 'ブルーバード - いきものがかり.mp3',
 '初恋 - 奥子.mp3',
 '思念是一种病(Live) - live - 张震岳.mp3',
 '最后一班车 - 刺猬.mp3',
 '梁山伯与祝英台 - 群星.mp3']

In [34]: a = os.listdir('.')[-5]

In [35]: a
Out[35]: 'ブルーバード - いきものがかり.mp3'

In [36]: b = 'ブルーバード - いきものがかり.mp3'  # copied this string with the mouse

In [37]: a == b  # strange: it was copied straight from the output, yet they are not equal
Out[37]: False

In [38]: current_path = os.getcwd()

In [39]: b_path = os.path.join(current_path, b)

In [40]: os.path.exists(b_path)  # also strange: a and b are not equal, yet this path exists
Out[40]: True

Link: https://gist.github.com/cosven/9e11707a8ebe98bc95948167a0001449

1 reply 2016-07-10 23:06:28 +08:00

mimzy

2016-07-10 23:06:28 +08:00

>>> a
'ブルーバード - いきものがかり.mp3'
>>> b
'ブルーバード - いきものがかり.mp3'

>>> a.encode('unicode_escape')
b'\\u30d5\\u3099\\u30eb\\u30fc\\u30cf\\u3099\\u30fc\\u30c8\\u3099 - \\u3044\\u304d\\u3082\\u306e\\u304b\\u3099\\u304b\\u308a.mp3'
>>> b.encode('unicode_escape')
b'\\u30d6\\u30eb\\u30fc\\u30d0\\u30fc\\u30c9 - \\u3044\\u304d\\u3082\\u306e\\u304c\\u304b\\u308a.mp3'

Look at the first character, "ブ": in a it is 2 code points, in b it is 1:
フ U+30D5 http://unicode-table.com/cn/30D5/

(manual line break here so the mark does not combine with the letter above) U+3099 http://unicode-table.com/cn/3099/

ブ U+30D6 http://unicode-table.com/cn/30D6/
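To make the difference visible without relying on how the terminal renders the characters, here is a minimal sketch that lists the code points of each form. The variable names decomposed and precomposed are only for illustration; the code points themselves are the ones shown above.

```python
import unicodedata

# Two spellings of the same visible character.
decomposed = "\u30d5\u3099"   # base kana followed by the combining voiced sound mark
precomposed = "\u30d6"        # the same character as a single code point

for label, text in (("decomposed", decomposed), ("precomposed", precomposed)):
    # Show the length and the Unicode name of every code point in the string.
    points = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]
    print(label, len(text), points)

# Plain == compares code point by code point, so the two spellings differ.
print(decomposed == precomposed)
```

The two strings have lengths 2 and 1, which is why == reports them as different even though they look the same on screen.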

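In a, each of "ブ", "バ", "ド" and "が" is stored as "kana + voiced sound mark (dakuten)". According to http://www.cnblogs.com/jessonluo/p/4801580.html, Unicode has two ways to represent an accented character: as a single precomposed code point, or as a base letter followed by a combining mark. Unicode treats the two as equivalent, but Python compares strings code point by code point, so they come out unequal.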

You can normalize both strings with unicodedata.normalize, choosing a suitable form, and then compare them, similar to what this post does: http://r9.hateblo.jp/entry/2015/05/11/233000
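As a rough sketch of that approach, the snippet below rebuilds a and b from the escape sequences printed earlier and compares them after normalization. The helper name same_text and the default of NFC are my own choices for illustration, not something from the original post; NFD would work equally well as long as both sides use the same form.

```python
import unicodedata


def same_text(x: str, y: str, form: str = "NFC") -> bool:
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, x) == unicodedata.normalize(form, y)


# a: the decomposed name as returned by os.listdir (from the unicode_escape output).
a = ("\u30d5\u3099\u30eb\u30fc\u30cf\u3099\u30fc\u30c8\u3099 - "
     "\u3044\u304d\u3082\u306e\u304b\u3099\u304b\u308a.mp3")
# b: the precomposed name that was pasted in by hand.
b = ("\u30d6\u30eb\u30fc\u30d0\u30fc\u30c9 - "
     "\u3044\u304d\u3082\u306e\u304c\u304b\u308a.mp3")

print(a == b)           # False: the raw code points differ
print(same_text(a, b))  # True: equal once both are normalized
```

If file names from os.listdir are compared against user input in many places, it is probably simpler to normalize them once when they are read rather than at every comparison.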

I don't know any Japanese at all; everything above is what I found by searching~