C++ Libraries for sparse matrix manipulations are not easy to find 20 Oct 03

After much hunting for a library I can use to implement an LSI (Latent Semantic Indexing) search engine, I have had little luck. The library that seems right for the job is SVDPACK. It is written in Fortran and has been ported to C++; however, it has not been ported to the humble x86 architecture. It looks like I will have to write a vector space search engine instead.

I managed to write a C++ routine that reads the output of the Perl Term Document parser. This is a very simple parser that splits the words in each document on whitespace. I know there are good reasons not to do it this way, so if I get time I will come up with a better method later, but for now it will do.

My next task is to take the data read in by the C++ program and build a Term Document Matrix from it that I can manipulate easily. I need to be able to carry out the following actions, and quite a few more:

1. Count all occurrences of each word in the entire document set.
2. Calculate mean values for each word.
3. Come up with some method to rank words in the document matrix. This is to avoid the typical abuse where websites saturate pages with keywords to try to manipulate a search engine's results.

67.6 Million links found
8.529 Million unique links found


1 Comment

leila said:

program sparse matrix in c++


About this Entry

This page contains a single entry by Harry published on October 20, 2003 12:28 AM.
