Sorting out the code 27 Oct 03

The last couple of days have been spent sorting out some of the Perl and C++. I have also expanded the stop list quite a bit. The Perl script that I was using to produce the file used to build the term-document matrix also got a bit of a working over.
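For anyone unfamiliar with the terms, here is a minimal C++ sketch of what applying a stop list while counting terms per document can look like. It is not the code I am actually running: the names normalise, build_matrix and TermDocMatrix are made up for illustration, the documents are assumed to be plain strings already in memory, and the tokenising is assumed to be nothing more than splitting on whitespace.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sparse layout: term -> (document index -> raw count).
using TermDocMatrix = std::map<std::string, std::map<int, int>>;

// Lower-case a token and drop anything that is not a letter or digit.
std::string normalise(const std::string& token) {
    std::string out;
    for (unsigned char c : token) {
        if (std::isalnum(c)) {
            out.push_back(static_cast<char>(std::tolower(c)));
        }
    }
    return out;
}

// Count how often each non-stop-list term occurs in each document.
TermDocMatrix build_matrix(const std::vector<std::string>& docs,
                           const std::set<std::string>& stop_list) {
    TermDocMatrix matrix;
    for (int d = 0; d < static_cast<int>(docs.size()); ++d) {
        std::istringstream words(docs[d]);
        std::string token;
        while (words >> token) {  // assumption: whitespace-separated tokens
            const std::string term = normalise(token);
            if (term.empty() || stop_list.count(term) > 0) {
                continue;  // skip punctuation-only tokens and stop words
            }
            ++matrix[term][d];
        }
    }
    return matrix;
}
```

Each row of the matrix is a term and each column a document, so a bigger stop list means fewer rows to store and search.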

I have increased the document list to 15700, which is still relatively small for an internet search engine, but it is now a respectable amount of text to search for a small intranet site, such as that of a small law firm. I will gradually increase this as I go along, testing to see what kind of results I get.

I have decided to write up what I have done with example code and put it on another few pages. Hopefully someone will be able to make some use of it.

Please see my:

Vector Space Search Engine

page for more details of what I have done to get this working. I am using the term "working" in its weakest sense here, since I have been unable to test it properly yet.

83.1 Million links found
10.9 Million unique links found


About this Entry

This page contains a single entry by Harry published on October 27, 2003 12:26 AM.

Getting the vector space search engine runing 25 Oct 03 was the previous entry in this blog.

Re-writing the spiders 08 Nov 03 is the next entry in this blog.
