November 2004 Archives
This is relatively straightforward. I recently installed a new network card to play around with and to see if I could make head or tail of the driver details, so I needed to make sure I had the driver for the card.
I installed a NetGear FA311; I had a couple of spares. The driver for this card is the natsemi driver. To see if you have the source, try the following.
]$ locate natsemi
/usr/src/kernel-source-2.6.5/drivers/net/natsemi.c
/usr/src/kernel-source-2.6.5/include/config/natsemi.h
There is no need to be the root user for any of this until you actually install the driver. I will tell you when ;)
Copy both of these files to a directory of your choice. Then, in the same directory, create a Makefile with the following text:
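# Build natsemi.c as an external (out-of-tree) module.
obj-m := natsemi.o

# Build tree of the running kernel, and the directory holding this Makefile.
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

# Note: the command line under "default:" must start with a tab, not spaces.
default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules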
Save it and then execute the following command:
]$ make
Some text should whizz past detailing what it is doing. In the directory where you ran make there should now be several new files:
natsemi.ko
natsemi.mod.c
natsemi.mod.o
natsemi.o
The one you are interested in is "natsemi.ko". As the root user change to the directory containing the "natsemi.ko" file and run
]$ insmod natsemi.ko
If all goes well there should be no messages. To see if it loaded and to satisfy your curiosity try
]$ lsmod
natsemi 18976 0
tulip 36640 0
crc32 3840 2 natsemi,tulip
af_packet 12552 4
The above is what I have on mine.
To see if the card works (Debian), edit your /etc/network/interfaces file and add the following. Note that I already have a card installed using eth0, so I have chosen eth1 for this card:
iface eth1 inet static
    address 192.168.1.10
    netmask 255.255.255.0
Then issue the commands
]$ ifup eth1
]$ ping 192.168.1.10
and you should now have the card working.
I received an invite the other day to a Google talk in London. For those that don't know me, I am fascinated by search engines, particularly Google, since there is more online about how they did their stuff than about any of the others. You can imagine my enthusiasm at the prospect of going, so I registered my interest and decided to go.
I then discovered that the talk had been organized by Pulse Group. From what I can gather they are the recruitment arm of Google for Ireland (wild guess). This rang recruitment company alarm bells for me. (I wrote an entire site to circumvent the need to go through a recruitment company; I don't trust them.)
I am making bets online that it is going to be some corporate nonsense recruitment drive with more emphasis on that than on Google's tech.
I was also disappointed by the website.
1. No document type.
2. It is mostly Flash, so a lot of people cannot see it.
3. Most of the menus are tiny and hard to read.
4. I was unable to resize the text.
5. The text is embedded in Flash, so it cannot be indexed by Google, the very people they are representing.
6. Some of the menus are mouse dependent (the mouse must hover over them to view the content).
It's another site designed by managers for managers. I suppose they have a different set of requirements than me.
Anyway, I didn't go. I would be interested in hearing from people who did go and what they thought of the talk.
As a follow-up, I was correct. A friend of mine went and was quite pissed off because it was a total waste of his time.
I have been using Eterm for some time now because Enlightenment is my normal choice on Linux and I have never really needed anything else. However, I noticed that on my machine at work I was getting some odd behavior when using ALT-TAB to switch between terminals, so I decided to try xterm instead, and I have to say I am very impressed with it.
It involves a little bit more work to set up, but then most good things do. So far I have experienced no odd behavior and I think I might adopt xterm as my default terminal; it just seems more mature and competent than Eterm.
These are the settings I started with in .Xdefaults:
xterm*Background: black
xterm*Foreground: grey
xterm*VT100*geometry: 140x28+1+1
xterm*font: 9x15
xterm*scrollBar: False
xterm*JumpScroll: on
xterm*saveLines: 4096
To load them, use:
shell]$ xrdb .Xdefaults
I have a database of just over 11000 jobs and I need to run an indexer against it for the search engine to work. Just recently this has been getting slower and slower due to other things going on with the server, so I decided to have a look at it tonight. The following is what I did and what I found:
Preliminaries.
All regexes are pre-compiled prior to comparison using something like:
$dict{qr/\b\Q$keyword\E\b/} = $keyword;
Table Name Rows
key_word_search ==: 51641
rss_jobs ==: 179 (last nine hours worth)
Total checks == 9243739 (approx)
Methods:
0. Normal index with no tuning applied. This is what has been running
for the last few months.
1. Use Perl's "index" function to pre-process the result and only try a
full regex if the string appears somewhere in one of the 3 RSS entries.
2. For each job entry we are indexing, check first to see if the job_id
and the keyword are already in the index. If yes, go to the next record.
Results.
I was not going to try to second-guess the result but I had a feeling
that M1 would be quicker. What I was surprised at is just how much
quicker. I imagine each method would see an improvement if more RAM was
given to Postgres, especially M2 and M0, but I doubt either of them would
catch M1.
Also, the trigger that inserts the job actually carries out quite a
few checks to ensure the entry does not already exist, so M2 is being
duplicated somewhat anyway, and I am not about to relax the database checks/integrity to satisfy performance. Performance can normally be
cured using some other method, as can be seen here.
Outer == No. Total Operations applied.
Inner == No. Left after filtering by Method (shown as "In" in the results below)
Matched == No. we matched and will be entered into database.
The inner rule is the method I have put in to filter the results before
I try a full regex match. The original indexer had no filter.
Method 0:
Outer == 9239317 In == 9239317 MATCH == 3009
real 8m23.868s
user 8m9.510s
sys 0m0.720s
Method 1:
Outer == 9239317 In == 14546 MATCH == 3009
real 1m30.897s
user 1m25.840s
sys 0m0.520s
As you can see here, using Perl's inbuilt "index" function I have
managed to narrow the actual operations considerably. We are relying on
the speed of index compared to an actual regex match to gain the
speed here. I imagine they have almost literally used C's
char *strstr(const char *s1, const char *s2);
or something similar.
Method 2:
Outer == 105084 In == 99293 MATCH == 23
real 2m9.680s
user 0m16.840s
sys 0m5.090s
We can see here that this method is a lot slower. I actually stopped
this one early: it had only completed just over 1% of the total
operations required and it took 2 minutes. This was always going to be slow
due to the amount of IO required, i.e. 9 million possible calls to the
database, and a binary lookup on an index of just over 800k is not
going to be that fast at the best of times.
As an exercise and to satisfy my own curiosity I tried to put M1 first
and use M2 after it to see what would happen, and the following was the
result.
Outer == 9239317 In == 14540 MATCH == 3009
real 1m42.974s
user 1m22.980s
sys 0m1.430s
We can see from this that calling out to the database is adding
overhead to the process.
Conclusion:
When we are using heavy regex intensive operations it pays to
pre-process using Perl's inbuilt "index" rather than relying on the
speed of the regex itself.
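The same two-stage idea can be sketched outside Perl. What follows is only an illustration in shell, not the indexer itself: the function name has_keyword and the sample text are made up for the example, the shell's builtin case pattern stands in for Perl's index, and an external grep -wF (whole-word, fixed-string; supported by GNU and BSD grep) stands in for the pre-compiled \b...\b regex.

```shell
# has_keyword TEXT KEYWORD
# Succeeds only if KEYWORD occurs in TEXT as a whole word.
has_keyword() {
    # Stage 1: cheap literal substring test using a shell builtin.
    # This plays the role that index() plays in the Perl indexer.
    case $1 in
        *"$2"*) ;;          # substring present, fall through to stage 2
        *) return 1 ;;      # substring absent, skip the expensive check
    esac
    # Stage 2: the expensive whole-word check (an external process),
    # standing in for the full regex match.
    printf '%s\n' "$1" | grep -qwF -- "$2"
}

# Hypothetical sample data: count what survives each stage.
text='Senior Perl developer wanted, must work properly under pressure'
inner=0
matched=0
for kw in Perl proper Java; do
    case $text in *"$kw"*) inner=$((inner + 1)) ;; esac
    if has_keyword "$text" "$kw"; then matched=$((matched + 1)); fi
done
echo "Inner == $inner MATCH == $matched"   # prints: Inner == 2 MATCH == 1
```

Here "Java" is thrown out by the cheap test alone, "proper" passes the cheap test but fails the whole-word check (it only appears inside "properly"), and "Perl" passes both.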
I lost my internet connection at the weekend and was at a bit of a loss as to what I could do so I decided to take a pop at writing a simple module for the Linux kernel. I have a copy of
Beginning Linux Programming
ISBN: 1861002971
Authors: Richard Stones and Neil Matthew
Ed: 2nd
so I turned to the back of it and started my foray into the kernel. Now you need to remember that I am not a C programmer by trade, and turning to the back of this book was a keen reminder of just how rusty my C is getting, not that it was ever rust free.
Luckily for me I have another book that is considered the C bible, i.e. K&R, and it deserves its reputation. It is a classic and I would recommend any programmer, regardless of language choice, to have a flick through it. When I was looking at some odd construct that those pointy hats had invented I had a flick through K&R and soon sorted it out.
Anyway, back to the kernel. I was quick to discover that writing a module for the 2.6 kernel is not quite as straightforward as copying from the book and trying to understand what was going on. Things have been changing and I was getting all sorts of weird (or at least weird to me) and wonderful errors when trying to compile the module.
I eventually started to read through the recent modules in the source for 2.6.5, which I am running on this box. I also have the source for a 2.4 kernel on here, so I opened 2 character drivers and compared notes between them. This is where I started to notice things that had changed. I made the changes I thought were necessary and I managed to get most of the "Hello World" module compiling, but I was still getting errors.
I had a hunt around and I found a reference to some new build procedures for 2.6.5, so off I went in search of kbuild documentation and found some more stuff that had changed in the kernel, namely the build procedure. This part was actually harder than the C that I had been struggling with.
After much swearing (I hate Makefiles and adding some more sugar is a pain in the ass) I managed to get the module compiling and I was on my way.
After a day's work I now had a module that, on load, would say
"Hello World"
and on removal
"Goodbye World"
Time well spent or not? I haven't decided yet. I wonder how often changes like this take place in the kernel and how much porting takes place because of it.
Where to go from here? I asked a few friends who know more about this stuff than I do and I got mixed advice about continuing. Some of them think the kernel is a mess because the driver API, among other things, is always changing. I cannot comment because my knowledge of the Linux kernel is limited to spelling it, and I sometimes get that wrong.
I did get some useful pointers though. The following is the best book I have found so far for someone like me who is just starting out in the kernel.
Linux Device Drivers, 2nd Edition
It is written for the 2.4 kernel but has a wealth of information that is still valid today. I have started porting the scull drivers from this to the 2.6 kernel I am running and it is proving very interesting. I printed off chapters 2 and 3 yesterday and have almost finished them (40 mins from Luton to London on the train each way helps). So far it seems to be moving along at a fair old pace; I am just hoping I can keep up.
I could have done with the following at the weekend. This tells me what I needed to know about moving from 2.4 to 2.6. I can see myself using this a lot in the next few weeks.
I had a look at Koha as an open source library system we might use at work and I promised I was going to look at
First off, it installed very easily, which was nice. We got it up and running with some problems, i.e. we had to turn Globals on, and under PHP this is normally considered a no-no. This rang alarm bells in my head but I continued on.
The next thing I noticed was the code. It might be because I am used to Perl, but the code just looked messy. This is no reason to judge it, so I had a look at the main feature, i.e. loading and understanding MARC21.
I could have saved myself a lot of time if I had noticed that they only support USMARC. I left a message on one of their mailing lists asking about the possibility of using MARC21 but heard nothing, which was another bad sign, i.e. from what I can tell it's not a very active project.
The next thing I will be looking at is CDS ISIS, which is a suite of tools written by the United Nations Educational, Scientific and Cultural Organization (UNESCO).
It would appear that Swoogle is not being very polite to web servers. It seems to hit me 2 to 4 times a second. I am probably going to ban it from UKlug because it's just not very nice to hammer someone's server as hard as that. At least Google is useful, i.e. people find my site via Google, and it still manages to be polite about it. You would think that people doing research would try to be a bit more polite about what they are doing.
I have sent a couple of emails to the technical contacts and the people running Swoogle.
If I don't get a reply I will ban their entire subnet because they appear to be using different IP addresses to spider from.
130.85.95.109
130.85.95.23
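To put a number on how hard a crawler is hitting the server, something like the following could be run over the access log. It is a rough sketch under assumptions not stated in the post: the log is taken to be in the usual Apache common/combined format (client address in field 1, timestamp to the second in field 4), the function name peak_rate is made up, and the prefix 130.85.95. is only a guess at the subnet based on the two addresses above.

```shell
# peak_rate LOGFILE PREFIX
# Print the largest number of requests seen in any single second from
# clients whose address starts with PREFIX (for example "130.85.95.").
peak_rate() {
    awk -v prefix="$2" '
        # Only count lines whose client address begins with the prefix,
        # bucketed by the timestamp field (one bucket per second).
        index($1, prefix) == 1 { hits[$4]++ }
        END {
            max = 0
            for (t in hits) if (hits[t] > max) max = hits[t]
            print max
        }
    ' "$1"
}
```

It would be called as peak_rate access.log 130.85.95. where access.log is whatever the server's log file is called.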
I heard a great saying today.
"do not cast your pearls in front of swine"
This is so true. The context I heard it in was in reference to people who would not accept Open Source as an alternative to proprietary systems, i.e. you try to convince someone that there is a tool that will do the job and it is free, but they insist on spending money on a closed system because free stuff can't possibly be as good, or maybe that's what they know and they don't want to change.
Don't bother with them; let them spend their money and go use your time on someone who deserves and appreciates it. Unfortunately some people are like horses and require their blinkers in order to work, otherwise they get spooked.
For those interested in where the saying comes from, it's the Bible. The original King James says:
"Give not that which is holy unto the dogs, neither cast ye your pearls before swine, lest they trample them under their feet, and turn again and rend you."
(Matt. 7:6).
Dean sent me the following to parse the logs and see how many unique IP addresses I was getting on a monthly basis.
grep -h 'Nov/2004' uk*.log | awk '{ print $1 }' | sort | uniq | wc -l
I wrote the following in Perl, which does the same thing, but I think I prefer Dean's:
perl -ne '/^(.*?)\s/; $a{$1}++;} END{for (keys %a){$c++;print "$_ == $a{$_}\n"} print "$c\n";' ukl*ss.log
maybe we could
perl -ane '$a{$F[0]}++};END{for (keys %a){$c++;print "$_ == $a{$_}\n"} print "$c\n";' ukl*ss.log
or maybe even
perl -ane '$a{$F[0]}++} END{for(keys %a){$c++;} print "$c\n";' ukl*ss.log
Or we could just
perl -ane '$a{$F[0]}++;END{print keys(%a)."\n";}' ukl*ss.log
Bollicks to this. I am also sure there is some clever one-liner in Perl to do this but I hardly ever use them, so I will leave it to the reader to beat it ;)
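For comparison, here is a shell sketch in the same spirit as Dean's pipeline, wrapped in two helper functions. The names unique_ips and top_ips are made up for this example, and it assumes the client address is the first whitespace-separated field of each log line, as the one-liners above do.

```shell
# unique_ips [FILE...]
# Print the number of distinct client addresses (first field of each line).
# With no FILE arguments it reads standard input.
unique_ips() {
    awk '{ print $1 }' "$@" | sort -u | wc -l | tr -d ' '
}

# top_ips [FILE...]
# Print "hits address" pairs, busiest client first, much like the
# "address == hits" listing produced by the longer Perl versions.
top_ips() {
    awk '{ print $1 }' "$@" | sort | uniq -c | sort -rn
}
```

To restrict the count to one month, the functions can be fed from grep, for example grep -h 'Nov/2004' uk*.log | unique_ips (the -h stops grep prefixing each line with the file name when several logs match).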
I noticed that someone had a look for gimpy on my blog today and I was wondering what terms people are finding my site with, so I ran the following over my logs:
perl -ne 'print "$1\n" if /google.*?&q=(.*?)(?:&|")/;' *.log | sort | uniq
I am sure there is a shorter and better way to do it but this was more than enough to have a quick look.
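One possible variation, sketched in shell, also counts how often each term turns up. It rests on a few assumptions: the referrer appears in the log line as a quoted URL with no spaces in it, the search terms are in a q= parameter, and the function name search_terms is made up for the example. It only turns + back into spaces and does not decode %xx escapes.

```shell
# search_terms [FILE...]
# Print "count term" lines for Google referrers, most frequent first.
search_terms() {
    # Pull out the value of the q= parameter from any URL containing
    # "google", then turn "+" back into spaces and tally the results.
    sed -n 's/.*google[^" ]*[?&]q=\([^&" ]*\).*/\1/p' "$@" |
        tr '+' ' ' | sort | uniq -c | sort -rn
}
```

It would be run as search_terms *.log, or fed from a pipe with no arguments.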
I have just finished the beta release of MT-SpamAssassin and so far so good. I have removed MT-Blacklist and everything is fine. I have not built the Bayesian database up completely yet since I don't have that many comments. If you want to try it you can download it here.
Please leave me some decent comments so I can seed the database ;)
Occasionally at work we need to do some simple task that involves converting images or finding their sizes etc. The problem with tasks that are "occasional" is that you can never remember the way you did it the last time.
What size is that JPEG, GIF or PNG?
How can I resize that image?
You can't be bothered firing up GIMP or some other tool, so what can you do?
@debian:$ identify truman.gif
truman1.gif GIF 258x333 258x333+0+0 PseudoClass 32c 24kb 0.000u 0:01
That was easy, wasn't it? What if we needed much more info than this? Well, that's much harder; we need to do the following:
@debian:$ identify -verbose truman.gif
The hard part is the extra typing. I will leave it to the reader to try that one (there is too much output for here).
What about those times when you just wish one of your images was half the size? Well, here comes another great tool to the rescue:
@debian:$ convert -sample 50%x50% truman.jpg truman_half.jpg
For those that are after a little bit more info on these handy little tools, head on over to IBM developerWorks.
Even the article above only scratches the surface of what convert can do.
I asked on the Movable Type support forum if anyone would be interested in a plugin that uses SpamAssassin. There were no replies to the post, so it looks like it is either no longer such an issue in the blogging world or maybe it's already been done and I have not found the link. Perhaps I posted it to the wrong forum ;) I would have thought that there would have been some interest in it but I was mistaken.
I wrote the plugin on Saturday and it is almost finished except for the pretty GUI. The Bayesian filtering is also working on it and I have tested it by scripting a few thousand spam entries into it and seeing if it would start spotting them and it did.
Thanks to the pluggable nature of Movable Type the plugin sits quite unobtrusively in it. I was after a much simpler solution than Blacklist without the separate GUI and management facilities etc and I think I could achieve this.
I intend to keep working at it and eventually use it on this blog so if you would like to try it contact me.
It's very true that you learn something new every day, and today I learned that Yahoo are using Nutch in a research capacity.
Welcome to the Yahoo! Research Labs implementation of the Nutch open source search engine (www.nutch.org). This search engine is intended as a demonstration platform for a number of search related technologies
I found it purely by chance. If you don't believe me then have a look at Yahoo's install of Nutch. I think that it's a smart move on their part because they get to see how it does its stuff and assess it. They may even be able to incorporate some of it into their own products.
I have spent a fair bit of time working on another website that had some of the most horrible HTML I have ever seen. I managed to actually upload the site last night and it is now live. I didn't design the site; I just converted it from some Dreamweaver mess to HTML Transitional that validates.
I have already made a few entries about this in my blog so here's the link.
The people at Aerospace NDT realised they were not getting enough from their website so they contacted me to see if I could do something with it. I had a look at their site, wrote up what I thought of it and gave them some advice as to what I thought could be done with it to improve its visibility etc. They seemed to like what I said because I got the job.
I am basically tasked with getting their site up the Google ranks, which I have already done, and quite substantially. I was very lucky and they were unlucky in the fact that the single greatest change required to the site so far has been the removal of the splash screen. They were unlucky in this because their last developer had left them with a site that could not be seen by the search engine, because there was not a single link off the splash screen. This also meant that visitors using certain browsers without Flash could not actually see the website.
I have made some fundamental changes to their site during the conversion from the old one, so we should see an overall increase in the Google ranks, but time will tell. I am keeping a tally for certain search terms to make sure that what we do has a positive effect on the site, so watch this space.
I am no librarian but today I got to put on my glasses and tell everyone to be quiet because I was investigating open source library systems. The first one I had to look at is Koha:
Koha is the world's first free Open Source Library System. Made in New Zealand by the Horowhenua Library Trust and Katipo Communications Ltd, the Koha system is a full catalogue, opac, circulation, member management and acquisitions package. To our knowledge Koha is used by public libraries, private collectors, university faculties, not for profit organizations, churches, schools and corporates. People from as far afield as Australia, USA, Canada, Estonia, India, Nigeria and Poland have installed Koha. Key features
This is apparently used by a lot of people and does MARC records searches etc etc.
The install was very swish (my idea of swish is not some flash GUI, a simple command line install is fine for me) which gave me the warm and fuzzies. It also came with some sample data which was nice. Different ports are used for different things which was a bit confusing because I initially went to the admin screen and was wondering where all the library data was meant to go when I discovered I needed to go to a different port number to actually use the library system.
I can't say I was too impressed with the interface. First off, it's not very intuitive. This might be because I am not a librarian and don't really understand what all these funny numbers are for, but I still couldn't get used to the look and feel of it. I suppose this could be customized with a little CSS.
The other thing I tried was to load a Z39.50 MARC record into the database from one of the online servers. This failed miserably and gave some very cryptic pop-up boxes telling me I had not filled in some mandatory fields. It took me 40 minutes to realize that there are some mandatory fields on another screen that are not marked as mandatory. After filling these in it still refused to work. On hunting around the logs I noticed that when you carried out a Z39.50 search the log would be hit every second or two until you closed the search window. I can only assume this is a bug because I cannot think why you would want to do it otherwise.
One thing in its favor is that it's written in Perl, so if we do decide to run with it I should be able to patch or add things that don't work or that don't suit our install. Tomorrow I am going to be looking at phpmylibrary, which from what I have read of it is quite nice.
We have been wanting a search engine at work for some time now so I started looking at Lucene. I downloaded it and got it running and doing some basic stuff, but what we really wanted was something web based, i.e. an out of the box solution.
I suggested we try Nutch, so I spent today getting it running. Nutch itself is a piece of cake to get working; what wasn't so easy was getting Tomcat4 working with Nutch.
After much swearing and perspiration I finally managed to get it working and it is as sweet as a nut. We indexed just over 200 Word documents in a few minutes (the test machine is an old Celeron) and gave it a whirl. It is a straight out of the box solution to your search engine problems. I was very impressed. I may have more to report on this next week because we might be putting it on one of the larger servers for a trial run.
What planet are ICANN transmitting from!
They have decided to change the policy on transferring domains, i.e. if you are unable to respond to the transfer request and deny it within 5 days, the transfer goes ahead. What does this mean and why is it bad?
I am the sole contact for all of my domains, which means that if I was on holiday and someone initiated a transfer request and I didn't respond (which I won't, because I am on holiday), I would get back home to find my domain had been given to someone else. The same thing would happen if I was in hospital. For those non-techs out there the following is a good analogy.
You decide you would like to rent in London, so you have a look around, get yourself a nice property and sign a contract for 2 years with a first option to extend if you want. You pay your deposit and move in. It's great: people learn where you live, they know where to find you and your little flat becomes a prime location. Having the option to always rent this flat is also great because you want to stay.
Then one day you go on holiday and someone who wanted the flat decides to move in. Under current rules they cannot. Under the new rules, if they knock on the door and there is no reply for five days they are able to break the lock and move in.
So when you get back someone has moved into the flat you spent so much time on and there is not a thing you can do because you didn't answer the door.
This is absolute nonsense and I can only assume ICANN are doing it because there is some way to make some money from all the court cases which are going to appear when the fraudsters start trying to snatch domains that they shouldn't have.
Luckily for me I use 123-reg.co.uk which posted me the following today:
Dear Customer,
On 12th November ICANN will introduce a new policy designed to make
transfers of non-UK domain names between Registrars quicker and easier.
From this date, if there is no acknowledgement from the domain
owner/admin contact within 5 days of a transfer request being made, the
transfer will automatically take place.
While a great step forward in ensuring domains can be freely
transferred by their owners, 123-Reg is concerned that this new system
could make it easier for your domain to be fraudulently transferred
away from 123-Reg. We would like to reassure you that we are taking
steps to guard against this happening to you. From the 12th, therefore,
all your non-UK domains registered with us will be automatically locked
so that only you can unlock them and initiate a transfer.
The new system will not affect your ability to manage your domain in
the usual way, and will simply mean that should you wish to change name
servers or transfer a domain away from 123-reg you will first need to
unlock it. This can be done quite simply from your 123-reg Control
Panel.
As we will be unable to accept liability if you unlock your domain and
an unauthorised transfer results, we strongly advise that you make sure
domains are kept locked at all times except when absolutely necessary
to change name servers or initiate a transfer.
Best Wishes,
The 123-Reg Team
Thank you, 123-reg, for protecting me from the idiocy of ICANN, which should now be named ICANN&IWILL.
I have just finished reading:
From: A Concise History of Mathematics
ISBN: 0486602559
Author: Dirk J. Struik
Edition: 4th
If there is one thing I can say without doubt, it is that this book is concise. It flies along at a blistering pace and in just over 200 pages covers several thousand years of mathematical history. If you are looking for a brief overview of the topic then this is the book.
It is also a great book with which to gauge your interest in the topic. It's well written, well researched and engaging, so if you are unable to rummage your way through it then I doubt one of the larger or more in-depth coverages would suit you. This is of course coming from someone who has not yet read one of these, but I am now looking at some of the older classics that I might try next.
One thing I have to mention is the citations. You could use this book to research topics in maths based on the amount of cited literature at the end of each chapter alone.
Personally I think the amount of work that has gone into this book is vast and in stark contrast to its size. I would recommend it to any maths enthusiast or historian.





