missing comment

Ulf Gebhardt 2013-05-19 04:57:37 +02:00
parent a363992a15
commit 55935e7b22


@@ -19,7 +19,7 @@ VN:
- language checker done
TODO:
-- canonize urls -> canonize? slides?
+- DONE canonize urls -> canonize? slides? -> remember last host -> no magic here -> even using ugly global
- DONE with getNextUrlToVisit():
server timeout -> save crawled host, set timeout for crawled host
- statistics -> http://www.ke.tu-darmstadt.de/lehre/ss13/web-mining/uebung2.html
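The getNextUrlToVisit() item (on a server timeout, remember the crawled host and give it a timeout) could look roughly like the sketch below. The class name `Frontier`, the `HOST_TIMEOUT` value, and the round-robin requeueing of URLs whose host is still cooling down are assumptions for illustration, not the crawler's actual code.

```python
import time
from collections import deque
from urllib.parse import urlparse

# Hypothetical value: seconds to wait before fetching from the same host again.
HOST_TIMEOUT = 2.0


class Frontier:
    """Sketch of a URL frontier that skips hosts still inside their timeout window."""

    def __init__(self, host_timeout=HOST_TIMEOUT, clock=time.monotonic):
        self.queue = deque()
        self.last_crawl = {}  # host -> timestamp of the last fetch from that host
        self.host_timeout = host_timeout
        self.clock = clock

    def add(self, url):
        self.queue.append(url)

    def get_next_url_to_visit(self):
        """Return the first queued URL whose host is not timed out, or None."""
        now = self.clock()
        # Inspect each queued URL at most once per call.
        for _ in range(len(self.queue)):
            url = self.queue.popleft()
            host = urlparse(url).netloc
            last = self.last_crawl.get(host)
            if last is None or now - last >= self.host_timeout:
                self.last_crawl[host] = now  # remember when this host was crawled
                return url
            self.queue.append(url)  # host still cooling down: move to the back
        return None
```

Injecting `clock` lets the timeout logic be exercised without sleeping; a real crawler would keep the default `time.monotonic`.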
@@ -106,7 +106,6 @@ def blockedByRobotsTxt(url):
prohibitedSites += 1
return True
## TODO: canonical url not only check if url is valid. Transfer relative url to absolute one
def canonicalUrl(url):
global lasthost
url = url.lower().replace(" ", "")
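The TODO above canonicalUrl (transfer relative URLs to absolute ones) can be handled with the standard library's `urllib.parse.urljoin`. A minimal sketch, assuming `base` is the absolute URL of the page the link was found on and plays the role of the global `lasthost`; the function name `canonical_url` and its `base` parameter are hypothetical:

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def canonical_url(url, base):
    """Sketch: resolve `url` against `base` and normalise the case-insensitive parts.

    `base` is an assumption standing in for the original's global `lasthost`.
    """
    url = url.strip().replace(" ", "")  # mirrors the original's space stripping
    absolute = urljoin(base, url)       # relative -> absolute
    parts = urlsplit(absolute)
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()       # host names are case-insensitive
    # Drop default ports so http://host:80/ and http://host/ compare equal.
    if (scheme == "http" and netloc.endswith(":80")) or (
        scheme == "https" and netloc.endswith(":443")
    ):
        netloc = netloc.rsplit(":", 1)[0]
    path = parts.path or "/"
    # Fragments are never sent to the server, so they are dropped.
    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

Unlike the original `url.lower()`, this sketch lowercases only the scheme and host: RFC 3986 treats those as case-insensitive, while paths and queries may be case-sensitive.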