Getting the vector space search engine running 25 Oct 03

I have spent the last few days trying to get the Vector Space Search engine running. The code is in a bit of a mess at the moment, but it's coming along. All I can say is thank god for the STL; without it I would have been in for a hell of a job. I have now managed to create a Sparse Vector Space Matrix from 2397 documents. That number needs to be increased before I can really start testing any weighting algorithms.
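As a rough illustration of what a sparse matrix built on the STL can look like, here is a minimal sketch. It is not the code from this project: the names `SparseVector`, `TermDocumentMatrix` and `add_document` are made up for the example, and it assumes plain whitespace tokenising and raw term counts as the cell values. Each document becomes one row that stores only the terms it actually contains.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One document as a sparse vector: term id -> raw term frequency.
// Terms that do not occur in the document take up no space at all.
// (Hypothetical type name, chosen for this sketch.)
using SparseVector = std::map<std::size_t, double>;

// Hypothetical container for the whole collection.
struct TermDocumentMatrix {
    std::map<std::string, std::size_t> term_ids;  // term -> column index
    std::vector<SparseVector> rows;               // one sparse row per document

    // Split the text on whitespace and store it as a new sparse row.
    // Returns the index of the row that was added.
    std::size_t add_document(const std::string& text) {
        SparseVector row;
        std::istringstream words(text);
        std::string word;
        while (words >> word) {
            // A term gets the next free column id the first time it is seen;
            // emplace leaves the existing id alone if the term is already known.
            auto it = term_ids.emplace(word, term_ids.size()).first;
            row[it->second] += 1.0;
        }
        rows.push_back(std::move(row));
        return rows.size() - 1;
    }
};
```

Adding "the cat sat on the mat" and then "the dog" would give six known terms, with the first row holding five entries and a count of 2 for "the".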

At the moment this is using 30 MB of memory, which is the maximum used during the entire process. I did have it running at 256 MB, but that was my first attempt at designing the matrix. I then showed my program a copy of Knuth volume 3; it cowered in fear and its shoe size quickly dropped to a more respectable size. I am pretty sure that I could drop this even further by writing my own data structure without using the STL, but I am happy with it at the moment.

I am not entirely happy with the output of the program yet because the inner product routine is not producing the correct output, but this should be relatively easy to fix. I really need to do a review to make sure that I am not missing any points in my methodology.
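For reference, an inner product over two sparse vectors can be sketched as below. This is only an assumption about the representation (each vector as a `std::map` from term id to weight, with the hypothetical names `SparseVector` and `inner_product`), not the routine that is misbehaving here. Because `std::map` iterates in key order, the two vectors can be walked in step and only the term ids present in both contribute to the sum.

```cpp
#include <cstddef>
#include <map>

// Sparse vector: term id -> weight. Hypothetical type for this sketch.
using SparseVector = std::map<std::size_t, double>;

// Inner (dot) product of two sparse vectors.
// Both maps are sorted by term id, so a single merge-style pass is enough.
double inner_product(const SparseVector& a, const SparseVector& b) {
    double sum = 0.0;
    auto ia = a.begin();
    auto ib = b.begin();
    while (ia != a.end() && ib != b.end()) {
        if (ia->first < ib->first) {
            ++ia;  // term only in a: contributes nothing
        } else if (ib->first < ia->first) {
            ++ib;  // term only in b: contributes nothing
        } else {
            sum += ia->second * ib->second;  // term in both vectors
            ++ia;
            ++ib;
        }
    }
    return sum;
}
```

A handy sanity check for a routine like this is a pair of small hand-made vectors where the answer can be worked out on paper, plus the case where the two vectors share no terms, which should give exactly zero.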

I also need to try and compile a better stop list. The one I am using is not particularly good. A better one is a surefire way of reducing the RAM footprint.
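To show why a stop list shrinks the matrix, here is a minimal sketch of filtering tokens before they are ever indexed. The function name `remove_stop_words` is hypothetical, and the sketch assumes whitespace tokenising and exact, case-sensitive matching against the list. Every word dropped here is a column that never has to be stored.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Return the words of `text`, in order, with every stop word removed.
// Matching is exact and case-sensitive (an assumption of this sketch).
std::vector<std::string> remove_stop_words(const std::string& text,
                                           const std::set<std::string>& stop_list) {
    std::vector<std::string> kept;
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        if (stop_list.count(word) == 0) {
            kept.push_back(word);  // not a stop word, so it gets indexed
        }
    }
    return kept;
}
```

With a stop list of "the", "on" and "a", the text "the cat sat on the mat" would be reduced to "cat", "sat" and "mat".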



About this Entry

This page contains a single entry by Harry published on October 25, 2003 12:27 AM.
