June 2004 Archives
Have you ever wondered how many telephone lines there are in Antarctica, or what type of government is running Bolivia? Then look no further: our friends at the CIA have decided to publish a world fact book, and it's available online. It has lots of interesting facts about more countries than you could imagine.
I have now added the facility for users to add jobs to their own web pages. It's a simple operation of cutting and pasting two lines of HTML into a web page, and a feed from the database will appear. The feed can be customised with no knowledge of CSS. Those who know some CSS can customise the feed completely by writing their own.
I went looking for a jacket the other day and decided to see if the "Savoy Tailors Guild" would be open. On searching around the internet for a local branch I came across
this article. It's quite enlightening. As a quick reference, the article suggests that the following companies will not disclose whether they are sourcing any of their materials from Burma. There are a good few household names in this list.
Harrods, Bay Trading, By Design Plc, Benetton, Ciro Citteri, Elle, Etam, French Connection, Intersport, Karen Millen, Liberty, Mothercare, TCS, Animal, Naf Naf, Jo Bloggs, Jeffrey Rogers, Pied a terre, Savoy Tailors Guild, Shellys, Calvin Klein, La Coste, Young Fashion, Great Universal Stores(GUS), Argos Ltd, Claire's Accessories, MK One, Shoe Box, First Sport, Lillywhites, Hawkshead, Urban Outfitters, Mambo, Mexx, Paul Smith, Reiss, Hobbs, Jane Norman, Miss Sixty, Boxfresh, and LK Bennett.
I have been working on a Distributed Search Engine project as a hobby for about a year now and decided to put up a few more notes about it. I am starting to gather quite a bit of information on the site, some of which is better than the rest.
I had another bright idea the other day. Please have a look at
this free page website. I am interested in any feedback about it that anyone has.
I have been meaning to throw together some thoughts about Google PageRank etc. for some time, and I finally got around to it tonight.
I have been doing some more work on the HTML parser to see if I could improve the speed a bit. I decided to change yyin to read from a file rather than stdin, and this has made quite a difference to the speed. It is now faster than HTML::Parser (but it's not as functional or tidy).
My next task is to either find a good extendable hashing library for C or call the C++ standard libraries from C. I have never had to do this before, so it could be fun. I need a C equivalent of either the C++ standard map or the non-standard hash.
So far I have not had much luck finding a C hash library that fits the bill.
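As a rough idea of what such a library would need to provide, here is a minimal sketch of a string-keyed hash table in plain C that doubles its bucket array as it fills up. Every name in it (Map, map_put, map_get and so on) is made up for illustration and does not come from any existing library. It uses simple chaining with an FNV-1a style hash, which is an assumption on my part and not necessarily what a proper extendable hashing library would do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout: none of this is from an existing library. */
typedef struct Entry {
    char *key;
    int value;
    struct Entry *next;   /* next entry in the same bucket */
} Entry;

typedef struct {
    Entry **buckets;
    size_t nbuckets;
    size_t count;
} Map;

/* FNV-1a style string hash (32-bit constants, computed in size_t). */
static size_t hash_str(const char *s) {
    size_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

Map *map_new(void) {
    Map *m = malloc(sizeof *m);
    if (!m) return NULL;
    m->nbuckets = 8;
    m->count = 0;
    m->buckets = calloc(m->nbuckets, sizeof *m->buckets);
    if (!m->buckets) { free(m); return NULL; }
    return m;
}

/* Double the bucket array and relink every existing entry. */
static int map_grow(Map *m) {
    size_t n = m->nbuckets * 2;
    Entry **nb = calloc(n, sizeof *nb);
    if (!nb) return -1;
    for (size_t i = 0; i < m->nbuckets; i++) {
        Entry *e = m->buckets[i];
        while (e) {
            Entry *next = e->next;
            size_t j = hash_str(e->key) % n;
            e->next = nb[j];
            nb[j] = e;
            e = next;
        }
    }
    free(m->buckets);
    m->buckets = nb;
    m->nbuckets = n;
    return 0;
}

/* Insert or overwrite; returns 0 on success, -1 on allocation failure. */
int map_put(Map *m, const char *key, int value) {
    assert(m != NULL && key != NULL);
    size_t i = hash_str(key) % m->nbuckets;
    for (Entry *e = m->buckets[i]; e; e = e->next)
        if (strcmp(e->key, key) == 0) { e->value = value; return 0; }
    /* Grow once there is about one entry per bucket; if growing fails,
       carry on with the old table. */
    if (m->count >= m->nbuckets && map_grow(m) == 0)
        i = hash_str(key) % m->nbuckets;
    Entry *e = malloc(sizeof *e);
    if (!e) return -1;
    e->key = malloc(strlen(key) + 1);
    if (!e->key) { free(e); return -1; }
    strcpy(e->key, key);
    e->value = value;
    e->next = m->buckets[i];
    m->buckets[i] = e;
    m->count++;
    return 0;
}

/* Returns a pointer to the stored value, or NULL if the key is absent. */
int *map_get(const Map *m, const char *key) {
    size_t i = hash_str(key) % m->nbuckets;
    for (Entry *e = m->buckets[i]; e; e = e->next)
        if (strcmp(e->key, key) == 0) return &e->value;
    return NULL;
}

void map_free(Map *m) {
    for (size_t i = 0; i < m->nbuckets; i++) {
        Entry *e = m->buckets[i];
        while (e) {
            Entry *next = e->next;
            free(e->key);
            free(e);
            e = next;
        }
    }
    free(m->buckets);
    free(m);
}
```

Usage would be along the lines of map_put(m, "word", 1) followed by map_get(m, "word"), which hands back a pointer to the stored value or NULL. Unlike std::map, this keeps nothing in sorted order, so it only stands in for the hash side of the requirement.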
I was writing a custom-built parser for the search engine, but Mark Fowler asked me why I was not using Yacc and friends. At the time, any reference I had found to parsing HTML had said that Yacc was not the correct tool: although it is possible for strict HTML etc., it does not fit well in the real world, where most HTML is rubbish.
I decided to see what flex could offer instead, and I can now safely say that flex is the exact tool that I was looking for. Thank you, Mark. Admittedly I am asking flex to do a lot more than just spit out tokens etc., but what it does suits my needs. It gives me the ability to run arbitrary code dependent upon state while parsing a document, which is exactly what I need.
I wrote a simple parser using flex++ and then decided to see how performance would compare against the Perl module HTML::Parser. Hands down, HTML::Parser is faster, and not by any short margin. I have not looked into why this is. I am now going to write one in plain C and avoid C++ to see how much difference we can get.
Well! It took me about 5 minutes to change the flex++ parser back to a C parser, and the performance improvement is quite drastic. I imagine this is because of the way flex++ is generating the lex.yy.cc file, because C++ is only marginally slower than C and some would argue it is faster.
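To show what I mean by code that depends on state, here is a small hand-written sketch in plain C. It is not flex input and not what flex generates; it is only an illustration of the same idea that flex provides with start conditions. The names (scan_html, Stats, IN_TEXT, IN_TAG) are made up for the example, and it assumes that a '>' never appears inside an attribute value, which real-world HTML will happily violate.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical states, loosely analogous to flex start conditions. */
enum State { IN_TEXT, IN_TAG };

typedef struct {
    size_t tags;        /* number of opening tags seen */
    size_t text_chars;  /* non-whitespace characters outside tags */
} Stats;

/* Walk the buffer once. The action taken for each character depends
   on the state the scanner is currently in. */
Stats scan_html(const char *buf, size_t len) {
    Stats st = {0, 0};
    enum State state = IN_TEXT;
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        switch (state) {
        case IN_TEXT:
            if (c == '<') {
                state = IN_TAG;
                /* Count only opening tags: "</..." and "<!..." are skipped
                   because the next character is not a letter. */
                if (i + 1 < len && isalpha((unsigned char)buf[i + 1]))
                    st.tags++;
            } else if (!isspace(c)) {
                st.text_chars++;
            }
            break;
        case IN_TAG:
            /* Simplification: any '>' ends the tag, even inside quotes. */
            if (c == '>')
                state = IN_TEXT;
            break;
        }
    }
    return st;
}
```

Calling scan_html(s, strlen(s)) on a buffer returns the two counters. In flex, each case in the switch would become a set of rules under its own start condition, and the work of recognising the patterns would be done by the generated scanner rather than by hand.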
I did a bit more playing around with the C version and I think I might be able to make it faster than the HTML::Parser version (of course it won't be half as functional).
Testing was done as follows:
file big.html == 10MB of reasonably formatted HTML
perl HTML::Parser (with XS) == 0.5s
flex == 0.7s
flex++ == 8s
I am no guru at flex so I imagine that it could be a lot quicker once I get to grips with it.
I know it has already been done, but I have been thinking about getting my own box hosted. The reason for this is that I seem to be using more and more resources on the one I am currently on, and I don't want to be taking the piss since it's hosted as a group share box.
This would enable me to do some things that I cannot do on the box I'm on at the moment, for instance:
I was thinking about setting up a similar facility to Geocities where people can create an account and they get some disk space to create their own website. The only requirement would be that they display some ads on their site. Other than that it would be free.
If anyone would be interested in this then leave a comment or email me.