If the lxml parser is not installed yet, install it first with:
pip install lxml
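Because lxml is a separate package, a script can check for it before choosing a parser. The sketch below is not part of the original walkthrough. It uses the standard library's importlib.util.find_spec to see whether lxml is importable and falls back to Python's built-in "html.parser" otherwise. The helper name pick_parser is hypothetical.

```python
import importlib.util


def pick_parser() -> str:
    """Return 'lxml' if the lxml package is importable, else the built-in 'html.parser'."""
    # find_spec returns None when no module with that name can be found
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"


print(pick_parser())
```

The result can be passed straight to Beautiful Soup, e.g. BeautifulSoup(html, features=pick_parser()). Keep in mind that the two parsers can build slightly different trees from malformed HTML.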
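In the program below, the first two lines import BeautifulSoup and urlopen. Because the page contains Chinese text, the raw bytes returned by read() have to be decoded with decode('utf-8').
The resulting HTML is then printed. The page is a test site provided by Mofan Python (莫凡Python).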
from bs4 import BeautifulSoup
from urllib.request import urlopen
# the page contains Chinese text, so decode the raw bytes as UTF-8
html = urlopen("https://mofanpy.com/static/scraping/list.html").read().decode('utf-8')
print(html)
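The hard-coded decode('utf-8') works for this page. However, read() always returns bytes, and a more general approach is to use the charset the server declares in its Content-Type header. The following is a minimal sketch that assumes UTF-8 as the fallback when no charset is declared. fetch_html and decode_body are hypothetical helper names. The offline lines at the bottom show why decoding matters: the two Chinese characters 一月 take six bytes in UTF-8 but are two characters once decoded.

```python
from typing import Optional
from urllib.request import urlopen


def decode_body(raw: bytes, charset: Optional[str], default: str = "utf-8") -> str:
    """Decode raw response bytes, preferring the server-declared charset (assumed fallback: UTF-8)."""
    return raw.decode(charset or default)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    # The context manager closes the connection once the body has been read
    with urlopen(url, timeout=timeout) as resp:
        raw = resp.read()
        # get_content_charset() returns e.g. 'utf-8', or None if the header omits a charset
        charset = resp.headers.get_content_charset()
    return decode_body(raw, charset)


# Offline demonstration: no network access needed
raw = "一月".encode("utf-8")
text = decode_body(raw, None)
print(len(raw), len(text))  # 6 2
```

Calling fetch_html("https://mofanpy.com/static/scraping/list.html") should give the same string as the html variable above, provided the server still serves the page as UTF-8.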
The following program extracts the <li> tags whose class is month. The loop calls print(m.get_text()) to print the text of every such tag.
from bs4 import BeautifulSoup
from urllib.request import urlopen
# the page contains Chinese text, so decode the raw bytes as UTF-8
html = urlopen("https://mofanpy.com/static/scraping/list.html").read().decode('utf-8')
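# use the class attribute to narrow the search to <li class="month"> tags
soup = BeautifulSoup(html, features='lxml')
month = soup.find_all('li', {"class": "month"})
for m in month:
    # get_text() returns only the text inside the tag, without the markup
    print(m.get_text())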
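find_all accepts the class filter in more than one form. The sketch below runs against a small made-up HTML snippet (not the real contents of list.html) so it works offline. It uses Python's built-in html.parser so that lxml is not required. It compares the dictionary form used above with the class_ keyword argument and a CSS selector via soup.select. Beautiful Soup matches a class against each of an element's classes individually, so class="feb month" also counts as month.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML, invented for illustration; NOT the real list.html page
SAMPLE_HTML = """
<ul>
  <li class="month">January</li>
  <li class="feb month">February</li>
  <li class="day">Jan 1st</li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, features="html.parser")

# Three equivalent ways to select <li> tags that carry the "month" class
by_dict = [li.get_text() for li in soup.find_all("li", {"class": "month"})]
by_kwarg = [li.get_text() for li in soup.find_all("li", class_="month")]
by_css = [li.get_text() for li in soup.select("li.month")]

print(by_dict)  # ['January', 'February']
```

The class_ keyword has a trailing underscore because class is a reserved word in Python. The select form is convenient when the filter is easier to express as a CSS selector.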