
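A question for fellow V2EX users: I need to process some large XML files incrementally. I found code for this in the Python Cookbook and adapted it to my use case, but it does not seem to work properly. Memory usage climbs quickly, reaching several GB after roughly a hundred thousand or so lines. I don't understand why and would appreciate any pointers. The main logic is below.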
macOS Big Sur
Python 3.8.12
from xml.etree.ElementTree import iterparse

def parse_and_remove(filename, path):
    path_parts = path.split('/')
    doc = iterparse(filename, ('start', 'end'))
    # Skip the root element
    next(doc)

    tag_stack = []
    elem_stack = []
    for event, elem in doc:
        if event == 'start':
            tag_stack.append(elem.tag)
            elem_stack.append(elem)
        elif event == 'end':
            if tag_stack == path_parts:
                yield elem
                elem_stack[-2].remove(elem)
            try:
                tag_stack.pop()
                elem_stack.pop()
            except IndexError:
                pass

data = parse_and_remove('my.xml', 'path')
client, table = getMongo()
for pothole in data:
    resDict = {
        # extract the data I need
    }
    table.insert(resDict)
client.close()

1    2i2Re2PLMaDnghL    2021-11-10 09:46:26 +08:00
1. Try switching to lxml
2. Try using XPath instead of manually iterating and comparing the path
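One thing that may be worth checking in the posted code: elements are only removed when `tag_stack == path_parts`. Because the root element is skipped, the path has to be written relative to the root, and if it never matches exactly, nothing is removed and the whole tree piles up in memory. Below is a minimal sketch of a simpler streaming pattern using only the standard library. It assumes the records are direct children of the root; the function name `iter_records`, the tag `record`, and the sample data are hypothetical and not from the original post.

```python
import io
from xml.etree.ElementTree import iterparse


def iter_records(source, tag):
    """Yield each completed <tag> element, then detach it from the tree.

    Assumption: the <tag> elements are direct children of the root.
    """
    context = iterparse(source, events=('start', 'end'))
    # The first event is the 'start' of the root; keep a reference to it.
    _, root = next(context)
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem
            # Runs when the consumer asks for the next record:
            # remove everything parsed so far from the root so it can be freed.
            root.clear()


# Hypothetical sample data standing in for my.xml
sample = io.BytesIO(
    b"<root>"
    b"<record><name>a</name></record>"
    b"<record><name>b</name></record>"
    b"</root>"
)
names = [rec.findtext('name') for rec in iter_records(sample, 'record')]
print(names)
```

With this shape, memory should stay roughly proportional to one record rather than to the whole file, as long as the caller does not keep references to the yielded elements. If the records are nested more deeply, this sketch would need to be adapted.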