Started on a Semantic Search engine 14 Oct 03

I have started on the Semantic Search Engine. I have downloaded 250 MB of pages to test with and constructed a partial (test) word list from them. The word list holds each term's frequency of occurrence and the ID of the document it came from. The words have all been stemmed to reduce overhead; I used the Lingua::Stem module for this. I will create a full word list tomorrow if I get time. I also need to find a decent C++ library, because I don't fancy writing my own Singular Value Decomposition library (if you know what sort of maths would be involved in doing this, you also know that I am not at that level, yet! ;-). I also think Perl may be a bit slow for what I am trying to do, although I am always willing to give it a try and see what happens.
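To make the shape of that word list concrete, here is a minimal sketch in Python rather than the Perl I actually used. The names `crude_stem` and `build_word_list` and the two sample documents are made up for illustration, and the suffix stripper is only a crude stand-in for a real stemmer such as Lingua::Stem, so its stems will not match what that module produces.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately crude suffix list; a real stemmer does far more.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_word_list(docs):
    """Map each stemmed term to {doc_id: term frequency in that document}."""
    index = defaultdict(lambda: defaultdict(int))
    for doc_id, text in docs.items():
        # Lowercase the text and keep runs of letters as tokens.
        for token in re.findall(r"[a-z]+", text.lower()):
            index[crude_stem(token)][doc_id] += 1
    return index


# Two tiny made-up documents keyed by document ID.
docs = {
    1: "Searching the indexed pages",
    2: "The search engine indexes pages and searches links",
}
word_list = build_word_list(docs)

# Print each term with its per-document frequencies.
for term in sorted(word_list):
    print(term, dict(word_list[term]))
```

Each row of this structure (term, document ID, frequency) is one entry of the term-document matrix that the Singular Value Decomposition step would later work on.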

67.6 million links found
8.529 million unique links found


About this Entry

This page contains a single entry by Harry published on October 14, 2003 12:28 AM.
