import re

with open('西游记.txt', 'r', encoding='utf-8') as f:
    text = f.read()

regex = re.compile(r'.*?《》(.*?)《》.*?', re.S)
result = re.findall(regex, text)
print(len(list(result)))
for item in result:
    print(item)
Running it produces the following result:
Later, 【瑜亮老师】 spotted a problem and pointed out:
The improved code is as follows:
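import re

# The file is only read, so plain 'r' mode is enough ('r+' would also request write access).
with open('西游记.txt', 'r', encoding='utf-8') as f:
    txt = f.read()

# rex1 captures each chapter that is followed by three newlines.
rex1 = r'《》目录 (.*?)\n\n\n'
# rex2 captures the 100th chapter, which runs up to the closing sentence of the book.
rex2 = r'《》目录 (第一百回.*?《西游记》至此终。)'
result = re.findall(rex1, txt, re.S)
temp = re.findall(rex2, txt, re.S)
result += temp
# print(len(result))
for item in result:
    print(item)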
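To try the two-pattern approach without the full novel, here is a minimal sketch that runs the same regular expressions on a tiny in-memory sample. The `SAMPLE` string and the `split_chapters` helper are hypothetical and exist only for illustration. The sample assumes each chapter begins with `《》目录 ` and that every chapter except the last is followed by three newlines, which is what the patterns above imply; the real layout of 西游记.txt may differ.

```python
import re

# Hypothetical miniature of the file layout (an assumption, not the real text).
SAMPLE = (
    "《》目录 第一回 first chapter body\n\n\n"
    "《》目录 第二回 second chapter body\n\n\n"
    "《》目录 第一百回 last chapter body《西游记》至此终。"
)


def split_chapters(txt):
    """Return chapter texts using the same two patterns as the improved code."""
    # Chapters terminated by three newlines.
    rex1 = r'《》目录 (.*?)\n\n\n'
    # The final chapter has no trailing blank lines in this sample,
    # so it is matched up to the book's closing sentence instead.
    rex2 = r'《》目录 (第一百回.*?《西游记》至此终。)'
    chapters = re.findall(rex1, txt, re.S)
    chapters += re.findall(rex2, txt, re.S)
    return chapters


chapters = split_chapters(SAMPLE)
print(len(chapters))
for chapter in chapters:
    print(chapter)
```

In this sample, `rex1` alone finds only the first two chapters, because the last one is not followed by three newlines; `rex2` picks up the remaining one. Both calls pass `re.S` so that `.` also matches newlines inside a chapter.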