import requests
from bs4 import BeautifulSoup
import re
r = requests.get('http://www.jxufe.edu.cn/')
html = r.text
soup = BeautifulSoup(html)
html_prettified = soup.prettify()
wenzi = soup.get_text()
wenzi_substract = re.sub('\n','', wenzi)
(1)首先导入模块环境。
(2)通过request模块获取江西财经大学首页的源代码
(3)通过beautifulsoup对爬取内容进行美化并提取文字内容
(4)利用re模块将换行符替换成空