Some RSS 27 Dec 03

I have been playing with RSS for a few days, and since most of the RSS I have seen comes from blogs, I decided to convert my plain old XHTML diary into a whizzy, RSS-compliant, new-fangled jobby. I have no reason for doing this other than possible self-promotion via my massively increased site traffic... NOT.

I can hear people scream "use X or Y, do not write your own". But where would be the fun in using someone else's RSS generator? I had a look at some of the more noteworthy blogs and noticed that there is an awful lot of commented-out text in the source of their pages. This seems a bit inconsiderate to me, because I am paying for bandwidth and every bit counts ;-). I know that's a lame excuse, but I could not help it, nor could I think of a better one. To cut a long story short, I used a very crude method to do it.

By wrapping each entry's title and text in a couple of extra "span" tags (with the classes "blogtitle" and "blogtext"), I was able to generate some compliant RSS from my blog. The joy of Perl.
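
To make the span idea concrete, here is a minimal sketch of the kind of markup involved and how HTML::Parser can pick it apart. The sample HTML, the example.org URL and the variable names (@entries, $field) are made up for illustration; only the class names "blogtitle" and "blogtext" come from my script. It is a simplified version of the approach, not the script itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical diary entry; only the class names match the real script.
my $html = <<'HTML';
<span class="blogtitle"><a href="http://www.example.org/blog/december.html#27">Some RSS</a></span>
<span class="blogtext">I have been playing with RSS for a few days.</span>
HTML

my @entries;    # one hash per entry found
my $field;      # which field incoming text belongs to, if any

my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub {
        my ($tag, $attr) = @_;
        if ($tag eq 'span') {
            my $class = lc($attr->{class} || '');
            if    ($class eq 'blogtitle') { push @entries, {}; $field = 'title'; }
            elsif ($class eq 'blogtext')  { $field = 'description'; }
        }
        elsif ($tag eq 'a' && defined $field && $field eq 'title') {
            # A link inside the title span supplies the entry's URL.
            $entries[-1]{link} = $attr->{href};
        }
    }, 'tagname,attr' ],
    text_h => [ sub {
        my ($text) = @_;
        return unless defined $field && @entries;
        # Text may arrive in several chunks, so append rather than assign.
        $entries[-1]{$field} .= $text;
    }, 'dtext' ],
    end_h => [ sub {
        my ($tag) = @_;
        undef $field if $tag eq 'span';    # leaving the span ends the field
    }, 'tagname' ],
);
$p->parse($html);
$p->eof;

for my $e (@entries) {
    print "title: $e->{title}\nlink: $e->{link}\ndescription: $e->{description}\n";
}
```

Anything outside the two spans is ignored, which is why the rest of the page can stay exactly as it was.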
The Script I used

The following script is quite rough around the edges but it gets the job done. If you have any questions about the Perl, or about why I just had to write my own, feel free to ask.

#!/usr/bin/perl

use strict;
use warnings;
use HTML::Parser;
use URI::URL;
use XML::RSS;
use LWP::Simple;

my $base     = "/hjackson";
my $base_url = "http://www.hjackson.org";

# Map each HTML page to the RSS file generated from it.
my $PAGES = {
    "$base_url/cgi-bin/blog/december.html"  => 'htdocs/blog/december.xml',
    "$base_url/cgi-bin/blog/november.html"  => 'htdocs/blog/november.xml',
    "$base_url/cgi-bin/blog/october.html"   => 'htdocs/blog/october.xml',
    "$base_url/cgi-bin/blog/september.html" => 'htdocs/blog/september.xml',
};

# Parser state: 0 = not seen, 1 = tag just opened, 2 = text captured.
my $STATE = {
    'intext'  => 0,
    'intitle' => 0,
    'inlink'  => 0,
    'inspan'  => 0,
};

# Fields of the item currently being collected.
my $RSS = {
    'link'        => "",
    'title'       => "",
    'description' => "",
};

# Called for every opening <span> or <a> tag.
sub start_tag {
    my ($self, $tag_name, $attr) = @_;
    if ( lc($tag_name) eq 'span' ) {
        # Not every span has a class, so default to '' to avoid warnings.
        my $class = lc( $attr->{class} || '' );
        if ( $class eq 'blogtitle' ) {
            $STATE->{intitle} = 1;
        }
        if ( $class eq 'blogtext' ) {
            $STATE->{intext} = 1;
        }
    }
    # A link that follows the title text supplies the item's URL.
    if ( lc($tag_name) eq 'a' and $STATE->{intitle} == 2 ) {
        $STATE->{inlink} = 1;
        $RSS->{link}     = $attr->{href};
    }
}

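# Called for every chunk of text; what happens depends on the current state.
sub text {
    my ($self, $text) = @_;

    # First text after the title span opens becomes the provisional title.
    if ( $STATE->{intitle} == 1 ) {
        $RSS->{title}     = $text;
        $STATE->{intitle} = 2;
    }

    # Text inside the title's link replaces the provisional title.
    if ( $STATE->{intitle} == 2 and $STATE->{inlink} == 1 ) {
        $RSS->{title}    = $text;
        $STATE->{inlink} = 2;
    }

    # First text after the blogtext span opens becomes the description.
    if ( $STATE->{intext} == 1 ) {
        $RSS->{description} = $text;
        $STATE->{intext}    = 2;
    }

    # Title, link and description are all captured, so emit the item.
    if (    $STATE->{intitle} == 2
        and $STATE->{intext} == 2
        and $STATE->{inlink} == 2 )
    {
        create_rss();
    }
}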

# Nothing needs doing on closing tags at the moment; kept as a hook.
sub end_tag {
    my ($self, $tag_name, $attr) = @_;
    return;
}

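my $rss;

# Add the collected item to the current feed and reset for the next one.
sub create_rss {
    $rss->add_item(
        title       => $RSS->{title},
        link        => $RSS->{link},
        description => $RSS->{description},
    );

    $RSS->{title}       = "";
    $RSS->{link}        = "";
    $RSS->{description} = "";
    $STATE->{intext}    = 0;
    $STATE->{intitle}   = 0;
    # Reset the link state too, so a stale link cannot complete the next item.
    $STATE->{inlink}    = 0;
}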

while ( my ($html_page, $xml_page) = each %{$PAGES} ) {
    my $content = get($html_page);
    # get() returns undef on failure, so skip pages that could not be fetched.
    unless ( defined $content ) {
        warn "Could not fetch $html_page\n";
        next;
    }

    $rss = XML::RSS->new( version => '1.0' );
    $rss->channel(
        title       => "Harry Jacksons Blog",
        link        => $base_url,    # the channel link should be a full URL
        description => "Just my Blog",
        dc          => {
            date      => '2000-08-23T07:00+00:00',
            subject   => "Harrys Blog",
            creator   => 'harry@hjackson.org',
            publisher => 'harry@hjackson.org',
            rights    => 'Copyright 2003, Harry Jackson',
            language  => 'en-us',
        },
        syn => {
            updatePeriod    => "hourly",
            updateFrequency => "1",
            updateBase      => "1901-01-01T00:00+00:00",
        },
    );

    # Only report the tags we care about; text is always reported.
    my @tags = ('span', 'a');
    my $p    = HTML::Parser->new( api_version => 3 );
    $p->report_tags(@tags);
    $p->handler( start => \&start_tag, "self,tagname,attr" );
    $p->handler( text  => \&text,      "self,text" );
    $p->handler( end   => \&end_tag,   "self,tagname,attr" );

    $p->parse($content);
    $p->eof;    # flush anything the parser is still buffering

    open( my $fh, '>', "$base/$xml_page" )
        or die "Cannot open $base/$xml_page: $!\n";
    print {$fh} $rss->as_string;
    close($fh) or die "Cannot close $base/$xml_page: $!\n";
}
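
A quick way to convince yourself the output is sane is to read a feed back in with XML::RSS and list its items. The sketch below builds a tiny feed in memory rather than touching my real files; the channel details, the example.org URLs and the names $feed and $check are placeholders, not anything from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::RSS;

# Build a one-item RSS 1.0 feed in memory (placeholder data).
my $feed = XML::RSS->new( version => '1.0' );
$feed->channel(
    title       => 'Example diary',
    link        => 'http://www.example.org/',
    description => 'Placeholder channel',
);
$feed->add_item(
    title       => 'Some RSS',
    link        => 'http://www.example.org/blog/december.html#27',
    description => 'I have been playing with RSS for a few days.',
);
my $xml = $feed->as_string;

# Parse the generated XML again and walk the items.
my $check = XML::RSS->new;
$check->parse($xml);
for my $item ( @{ $check->{items} } ) {
    print "$item->{title} -> $item->{link}\n";
}
```

To check a file written by the script instead, the same object's parsefile method can be pointed at one of the generated .xml files.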


About this Entry

This page contains a single entry by Harry published on December 27, 2003 12:19 AM.
