missing comment
parent a363992a15
commit 55935e7b22
@@ -19,7 +19,7 @@ VN:
 - language checker finished

 TODO:
-- canonize urls -> canonize? slides?
+- DONE canonize urls -> canonize? slides? -> remember last host -> no magic here -> even using ugly global
 - DONE with getNextUrlToVisit():
   server timeout -> save crawled host, set timeout for crawled host
 - statistics -> http://www.ke.tu-darmstadt.de/lehre/ss13/web-mining/uebung2.html
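Review note: the "server timeout -> save crawled host, set timeout for crawled host" item could be sketched roughly as below. This is only an illustration, not the code behind getNextUrlToVisit(); the names CRAWL_DELAY, next_allowed, host_of and pick_next_url are hypothetical, and the two-second delay is an assumed value.

```python
import time
from urllib.parse import urlsplit

# Assumed politeness delay in seconds between two requests to the same host.
CRAWL_DELAY = 2.0

# host -> earliest time at which the host may be contacted again
next_allowed = {}


def host_of(url):
    """Return the (lowercased) host of an absolute URL, or '' if it has none."""
    return urlsplit(url).hostname or ""


def pick_next_url(frontier, now=None):
    """Pop and return the first queued URL whose host is not in its timeout window.

    frontier is a plain list of absolute URLs. Returns None when every
    queued host is still waiting.
    """
    now = time.monotonic() if now is None else now
    for i, url in enumerate(frontier):
        host = host_of(url)
        if next_allowed.get(host, float("-inf")) <= now:
            # Remember the crawled host and block it for CRAWL_DELAY seconds.
            next_allowed[host] = now + CRAWL_DELAY
            return frontier.pop(i)
    return None
```

Passing `now` explicitly keeps the function testable without sleeping; a crawler loop would normally omit it and sleep briefly whenever None comes back.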
@@ -106,7 +106,6 @@ def blockedByRobotsTxt(url):
prohibitedSites += 1
return True

## TODO: canonicalUrl should not only check whether the url is valid, but also convert relative urls to absolute ones
def canonicalUrl(url):
    global lasthost
    url = url.lower().replace(" ", "")
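Review note: one possible way to address the TODO (turning relative URLs into absolute ones) is to resolve each link against the page it was found on instead of a global lasthost. The sketch below is an assumption about how that might look, not the commit's implementation; canonical_url and its base_url parameter are hypothetical names. Unlike the `url.lower()` call in the diff, it lowercases only the scheme and host, since paths can be case-sensitive.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def canonical_url(url, base_url):
    """Resolve url against base_url and normalise it.

    Returns the absolute URL as a string, or None if the result is not
    an http(s) URL with a host.
    """
    absolute = urljoin(base_url, url.strip())
    parts = urlsplit(absolute)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    # hostname is already lowercased by urlsplit; user info is dropped here.
    netloc = parts.hostname
    if parts.port is not None:
        netloc += ":" + str(parts.port)
    path = parts.path or "/"
    # The fragment never reaches the server, so it is removed.
    return urlunsplit((parts.scheme, netloc, path, parts.query, ""))
```

A call such as `canonical_url("../c.html", "http://Example.com/a/b.html")` resolves the relative path against the page URL, which removes the need to remember the last host between calls.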