Re-writing the spiders 08 Nov 03

I have been very busy lately re-writing the spiders for the search engine, and I have decided to write up what I did to build the spider in the vain hope that someone may find it useful one day. I digressed several times and had some fun writing a recursive one, but I eventually settled on an iterative robot that uses Postgres to store the links. This was partly because I already had a database with several million links in it. Please see the link above for more details. I have also managed to download a few thousand documents for the search engine. Parsing the documents I found while experimenting with the new robots is what caused the increase in the number of links found.
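For anyone wondering what an iterative, database-backed robot looks like in outline, here is a minimal sketch in Python. It is not the actual spider code: the `links` table, its columns, and the `crawl`, `extract_links` and `fetch` names are all made up for illustration, and it uses the standard library's `sqlite3` as a stand-in for Postgres so that it runs without a database server. The point is only that the table of links acts as the work queue, so the loop needs no recursion and can pick up where it left off.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for all links found in an HTML string."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(conn, fetch, seed, max_pages=100):
    """Iteratively crawl from seed, using a table as the work queue.

    conn is a sqlite3 connection (assumption: a stand-in for Postgres).
    fetch is any callable taking a URL and returning HTML text or None.
    Returns the number of fetch attempts made.
    """
    # Hypothetical schema: one row per URL, with a visited flag.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        "url TEXT PRIMARY KEY, visited INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT OR IGNORE INTO links (url) VALUES (?)", (seed,))
    fetched = 0
    while fetched < max_pages:
        # Take the oldest unvisited link; stop when none are left.
        row = conn.execute(
            "SELECT url FROM links WHERE visited = 0 ORDER BY rowid LIMIT 1"
        ).fetchone()
        if row is None:
            break
        url = row[0]
        conn.execute("UPDATE links SET visited = 1 WHERE url = ?", (url,))
        html = fetch(url)
        fetched += 1
        if html is not None:
            # The primary key makes duplicate links a no-op.
            for link in extract_links(url, html):
                conn.execute(
                    "INSERT OR IGNORE INTO links (url) VALUES (?)", (link,)
                )
        conn.commit()
    return fetched


# A tiny in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">a</a> <a href="/b">b</a>',
    "http://example.test/a": '<a href="/b">b</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/c">c</a>',
}

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    attempts = crawl(conn, PAGES.get, "http://example.test/")
    total = conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]
    print(attempts, "pages tried,", total, "links stored")
```

Because the queue lives in the database rather than on the call stack, a recursive version's depth limits go away, and an existing table of links can be used as the starting point simply by leaving its rows marked as unvisited. On Postgres the same idea would use `INSERT ... ON CONFLICT DO NOTHING` in place of SQLite's `INSERT OR IGNORE`.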

85.0 Million links found


About this Entry

This page contains a single entry by Harry published on November 8, 2003 12:25 AM.

