An automatic crawler that fetches Chinese news from Google News headlines at a designated time every day.

AngusKung/zhNewsCrawler

zhNewsCrawler

Chinese news (corpus) crawler

Wikipedia is currently the most complete Mandarin corpus available online, but it lacks timely content such as news. This crawler was built to fill that gap: at a designated time each day, it automatically fetches the Chinese news headlines collected by Google News.
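The "designated time each day" behaviour can be illustrated with a small scheduling helper. This is only a minimal sketch using the standard library, not necessarily how this repository schedules its runs; the names `seconds_until_next_run`, `run_daily` and `crawl_once` are hypothetical and introduced here for illustration.

```python
import time
from datetime import datetime, timedelta


def seconds_until_next_run(now, hour, minute=0):
    """Return the number of seconds from `now` until the next hour:minute."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed (or is exactly now): use tomorrow's.
        target += timedelta(days=1)
    return (target - now).total_seconds()


def run_daily(crawl_once, hour, minute=0):
    """Call `crawl_once()` every day at hour:minute (loops forever).

    `crawl_once` is a hypothetical callable that performs one crawl.
    """
    while True:
        time.sleep(seconds_until_next_run(datetime.now(), hour, minute))
        crawl_once()
```

For example, `run_daily(my_crawl_function, 8, 30)` would invoke the supplied function at 08:30 local time each day. A system scheduler such as cron is an equally reasonable choice for the same job.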

Important Notice:

I do not own the file "dict.txt.big". It is the work of the jieba team, "fxsjy" et al.

Environment:

Python 2.7 with "requests", "BeautifulSoup", "jieba" and "progressbar" installed.
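A quick way to confirm that the required packages are available is to try importing each one. The following is a minimal sketch; the import names in `REQUIRED` are assumptions (in particular, "bs4" presumes BeautifulSoup 4, which may not be the version this project uses), and `missing_modules` is a hypothetical helper, not part of the repository.

```python
import importlib


def missing_modules(module_names):
    """Return the names in `module_names` that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Assumed import names; adjust "bs4" if an older BeautifulSoup is used.
REQUIRED = ["requests", "bs4", "jieba", "progressbar"]

if __name__ == "__main__":
    print(missing_modules(REQUIRED))
```

An empty list means every listed module imported successfully; any name that is printed still needs to be installed.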
