get bbs data
Method
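Fetching posts from the 鼠洞 BBS and forwarding them to a Google Group:
1. First download all the existing posts on the board, then set up forwarding.
In practice, to deal with the Big5 encoding, I first convert each post to UTF-8, then apply regular expressions (RE), and then mail the post to the Google Group.
2. Fetch posts from the news group server:
First, on group.nctu.edu.tw, point an xxx.twbbs.org name to your own IP.
This lets you connect to group.nctu.edu.tw; you can then use nntplib to connect to the server and fetch posts.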
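The Big5 handling in step 1 can be sketched roughly as follows. This is a minimal illustration, not the exact code used: the helper names (`big5_to_text`, `extract_title`, `to_utf8`) are made up for this example, and the title header is assumed to read `標題: <title>`, which may differ on a given board.

```python
import re

# Assumption: the post header contains a line "標題: <title>".
# Adjust the pattern if the board uses a different header.
TITLE_RE = re.compile(r"^標題: *(.*)$", re.MULTILINE)


def big5_to_text(raw):
    """Decode Big5 bytes; undecodable bytes become U+FFFD instead of raising."""
    return raw.decode("big5", errors="replace")


def extract_title(text):
    """Return the title found in the post header, or '' if there is none."""
    match = TITLE_RE.search(text)
    return match.group(1).strip() if match else ""


def to_utf8(raw):
    """Re-encode a Big5 post as UTF-8 bytes, ready to be mailed."""
    return big5_to_text(raw).encode("utf-8")
```

Decoding before matching means the regular expression works on characters rather than on raw Big5 bytes, whose second byte can collide with ASCII characters.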
Resource
ADSL address redirection (dynamic DNS):
http://redhat.ecenter.idv.tw/bbs/showthread.php?s=&threadid=12891
http://www.adsl.org/
http://www.5402.idv.tw/is/iptoip/no-ip/no-ip.htm
http://linux.vbird.org/linux_server/0270dynamic_dns.php#need_dynamic_noip
Applying for a twbbs.org name:
http://twbbs.org/
Tips
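If the BBS board has forwarding to the NCTU news groups enabled,
point an XXXXX.twbbs.org name to your own IP on group.nctu.edu.tw.
You can then simply run telnet group.nctu.edu.tw 119
to connect to the NCTU news group server.
A new group named group.XXXXX.account will then appear,
and you can fetch the posts from there.
The basic command is:
Switch group: group group.XXXXX.account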
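The telnet session above can also be scripted. The sketch below speaks the same NNTP commands over a plain socket: it reads the greeting, sends `GROUP`, and parses the standard `211 count first last group` reply. The function names are hypothetical, and error handling is kept to a minimum.

```python
import socket


def parse_group_response(line):
    """Parse a '211 count first last group' reply to the GROUP command."""
    parts = line.split()
    if len(parts) < 5 or parts[0] != "211":
        raise ValueError("GROUP failed: " + line)
    return int(parts[1]), int(parts[2]), int(parts[3]), parts[4]


def select_group(host, group, port=119, timeout=30):
    """Connect, read the greeting, send GROUP and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rwb")
        greeting = stream.readline().decode("ascii", "replace").strip()
        # 200 = posting allowed, 201 = read-only; both mean "ready".
        if not greeting.startswith(("200", "201")):
            raise ConnectionError("unexpected greeting: " + greeting)
        stream.write(("GROUP %s\r\n" % group).encode("ascii"))
        stream.flush()
        reply = stream.readline().decode("ascii", "replace").strip()
        stream.write(b"QUIT\r\n")
        stream.flush()
        return parse_group_response(reply)
```

A call would look like `select_group("group.nctu.edu.tw", "group.XXXXX.account")`, with XXXXX replaced by your own name. The returned first and last article numbers give the range to fetch. The nntplib module wraps the same commands, but it was removed from the standard library in Python 3.13, so a raw socket is used here.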
Source
import os
import re
import smtplib
import time
from email.message import EmailMessage

SKIP_COUNT = 300  # only posts after the first 300 are sent


def send_mail(from_addr, to_addrs, context, subject):
    """Send one post as a UTF-8 mail through the local SMTP server."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = ", ".join(to_addrs)
    msg["Subject"] = subject
    msg.set_content(context)  # a str body is encoded as UTF-8
    print("Message length is", len(context))
    with smtplib.SMTP("localhost") as server:
        # server.set_debuglevel(1)
        server.send_message(msg, from_addr, to_addrs)


def main():
    from_addr = "Robot"
    # One list entry per recipient; a single comma-joined string
    # would be treated as one address by smtplib.
    to_addrs = ["room_joke@googlegroups.com", "shenyute@gmail.com"]
    # Process the files in the current directory, oldest first.
    f_list = sorted(os.listdir("."), key=os.path.getmtime)
    counter = 0
    for file_name in f_list:
        # Post files end in '.A'.
        if not file_name.endswith(".A"):
            continue
        counter += 1
        if counter <= SKIP_COUNT:
            continue
        # Posts are stored as Big5; decode them before applying the regex.
        with open(file_name, encoding="big5", errors="replace") as f:
            context = f.read()
        subject = ""
        # The pattern was garbled on the original page; "標題" (title),
        # the usual BBS header field, is assumed here.
        result = re.search(r"標題: (.*)", context)
        if result is not None:
            subject = result.group(1).strip()
            print(subject)
        send_mail(from_addr, to_addrs, context, subject)
        time.sleep(40)  # pause between mails to avoid flooding the server


if __name__ == "__main__":
    main()
page revision: 0, last edited: 22 Jan 2009 02:34