School of Foreign Languages sophomore Python lab report 4.2
```python
import glob
import urllib.request

import bs4

# Shared request header so the site does not reject the default Python client.
HEADERS = {'user-agent': 'Mozilla/5.0'}


def GetANews(aURL, num):
    """Download one article and save its paragraphs to bnews<num>.txt."""
    request = urllib.request.Request(aURL, headers=HEADERS)
    response = urllib.request.urlopen(request)
    soup = bs4.BeautifulSoup(response, 'html.parser')
    list1 = []
    for eachCon in soup.find_all('div', id="Content"):
        for c in eachCon.find_all('p'):
            # One paragraph per line; without '\n' the paragraphs run together.
            list1.append(c.text + '\n')
    with open('bnews' + str(num) + '.txt', 'w', encoding='utf-8') as targetf:
        targetf.writelines(list1)


# Step 1: fetch the index page and record the first 5 headlines with their URLs.
url = 'https://www.chinadaily.com.cn/business/companies'
request = urllib.request.Request(url, headers=HEADERS)
response = urllib.request.urlopen(request)
soup = bs4.BeautifulSoup(response, 'html.parser')
content = soup.find_all('div', class_='lft_art lf')

i = 1
with open('bussinews_index.txt', 'w', encoding='utf-8') as f:
    for eachCon in content:
        for it in eachCon.find_all('h4'):
            item2 = it.find('a', href=True)
            if item2 is None or i > 5:
                continue  # skip headings without a link; stop writing after 5
            # The hrefs are protocol-relative ('//www...'), so prepend the scheme.
            s = str(i) + ' ' + item2.text + ' ' + 'https:' + item2['href']
            f.write(s)
            f.write('\n')
            i = i + 1

# Step 2: read the index back; the URL is the last field on each line.
with open('bussinews_index.txt', 'r', encoding='utf-8') as f:
    file = f.readlines()
list2 = []
for data in file:
    data = data.split()
    if data:
        list2.append(data[-1])

# Step 3: download each article to its own file.
num = 1
for item in list2:
    GetANews(item, num)
    num = num + 1

# Step 4: merge the article files into one file.
# Matching 'bnews*.txt' instead of '*.txt' keeps the index file and any
# earlier allbusiness.txt out of the merged result.
data = []
target_files = sorted(glob.glob(pathname='bnews*.txt'))
for file in target_files:
    with open(file, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    data.extend(lines)
    data.extend(['\n'])
with open('allbusiness.txt', 'w', encoding='utf-8') as f:
    f.writelines(data)
```
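The script builds each article URL by prepending `'https:'` to the link's `href`, which only works when every link is protocol-relative. A minimal sketch of a more tolerant alternative, using the standard library's `urllib.parse.urljoin`, is shown here. The helper name `resolve_link` and the sample hrefs are hypothetical and not part of the original report.

```python
from urllib.parse import urljoin

# Index page used in the report; links are resolved relative to it.
INDEX_URL = 'https://www.chinadaily.com.cn/business/companies'


def resolve_link(href, base=INDEX_URL):
    """Return an absolute URL for href, resolved against the index page.

    Handles protocol-relative ('//host/path'), root-relative ('/path'),
    and already-absolute links.
    """
    return urljoin(base, href.strip())


if __name__ == '__main__':
    # Hypothetical hrefs, for illustration only.
    samples = [
        '//www.chinadaily.com.cn/a/example.html',
        '/a/example.html',
        'https://example.com/x',
    ]
    for sample in samples:
        print(resolve_link(sample))
```

In the index loop, `resolve_link(item2['href'])` could stand in for `'https:' + item2['href']` if the page ever mixes link styles.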
zhy@@ldy
December 7, 2022, 21:18