RSS Hits the Mainstream

As I mentioned a few months ago, I thought that the first portal to aggregate feeds would be on strong footing indeed. Surprise! Somebody at Yahoo! heard me.

In connection with their revamped search engine, Yahoo! has been making a lot of changes behind the scenes. One is a new beta My Yahoo! module, RSS Headlines. The new module allows you to load up to 25 feeds per page in a format that looks similar to its news module.

I’m a little disappointed that they didn’t adopt Netscape’s old model of one module = one site, but this is really a step ahead of the competition. In fact, it enticed me to change Firefox’s home page to My Yahoo! I never bothered with home pages before.

Of course, the module is still in beta, so you have to jump through some hoops to get to it. First, it’s not listed in My Yahoo! by default; you’ll need to use this link to add it. Second, it is buggy from time to time. In fact, Slashdot had it banned at one point for hitting them over 200 times in an hour. Ouch. For the most part, though, I’ve found it very stable and useful.

You can read more about the Yahoo! launch at Jeremy’s blog, or some interesting essays about the Portal, Blog, and RSS ecosystem here.

And, for good measure, you can add Waileia to My Yahoo! here: Add to My Yahoo!

Understanding the Deep Web

An interesting Salon article describes Yahoo’s new Content Acquisition Program, which offers paid inclusion for deep-searching online databases. These treasure troves of information are often missed by search engines, which travel the links between dynamic pages cautiously.

Yahoo! has the right idea – search engines today aren’t capturing the best the web has to offer, because that material is often behind query or login pages. Yahoo’s solution seems to be to extend their search engine to understand URLs of specific sites. However, many people are upset that this new program (which is basically a combination of premium offerings from their other properties) doesn’t clearly mark the “paid inclusion” links in their main index. Some people point out that paid inclusion is a conflict of interest for search engines. (One Yahoo! employee disputes this on his personal blog.)

Ultimately, I think the solution to the problem of searching the deep web will be based in XML. Perhaps what we need is a way of defining the API databases use. A language like WSDL is a good start, but WSDL doesn’t do a good job of capturing the semantics behind a web service call. What we need is a way to map the fields in a database to a common interface – something like what DBI and DB do.
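
To make the "common interface" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the site description format, the endpoint at library.example.com, and the field names are all made up, and no real search engine or database publishes anything like this today. The point is only that a spider could turn one generic query into a site-specific one if the site told it how its fields map.

```python
from urllib.parse import urlencode

# Hypothetical description a site might publish about its search form.
# "fields" maps common field names to the parameter names this site uses.
SITE_DESCRIPTION = {
    "endpoint": "http://library.example.com/search",
    "fields": {"title": "ti", "author": "au", "year": "yr"},
}


def build_query_url(site: dict, query: dict) -> str:
    """Translate a query written in common field names into this site's URL."""
    field_map = site["fields"]
    unknown = sorted(set(query) - set(field_map))
    if unknown:
        # The site has no equivalent for these fields, so refuse to guess.
        raise KeyError(f"fields not supported by this site: {unknown}")
    params = [(field_map[name], value) for name, value in query.items()]
    return site["endpoint"] + "?" + urlencode(params)


url = build_query_url(SITE_DESCRIPTION, {"author": "Nielsen", "year": "1999"})
```

A real format would also need to say something about result pages and field semantics, which is exactly the part WSDL leaves out.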

We may also want to consider ways of telling spiders a little more about the sites we run. robots.txt is great, but an expanded language could provide advanced webmasters the ability to define infinite loops better, define different presentations of the same content, specify preferred crawl schedules, and more, allowing smart robots to find even more information at a site, and categorize it intelligently.
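
For reference, here is roughly what a robot can learn from robots.txt today, using Python's standard urllib.robotparser module. The robots.txt content, the ExampleBot name, and the example.com paths are assumptions for illustration. A calendar that links to "next month" forever is a typical infinite loop, and about the only tool we have for it is a blunt Disallow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site; the paths are illustrative.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
Crawl-delay: 10
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# An ordinary archive page is allowed; the endless calendar is not.
allowed_post = parser.can_fetch("ExampleBot", "http://example.com/archives/000123.html")
allowed_calendar = parser.can_fetch("ExampleBot", "http://example.com/calendar/2004/03/")

# Crawl-delay is a widely used extension, not part of the original standard.
delay = parser.crawl_delay("ExampleBot")
```

Note what is missing: there is no way to say "these two URLs are the same content" or "this page changes weekly", which is the kind of thing an expanded language could cover.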

(Original link courtesy Slashdot.)

The URL Conundrum

Dean recently gave some good advice about how to write URLs that are people-friendly. While I agree with everything he said, it’s harder than it looks to make this work in practice.

We know that cool URIs don’t change, and that words are often better than numbers. The problem is that these two goals aren’t complementary. If you already have numbers in your URLs (which was MT’s default when I started blogging), you have three options:

  • Change the URLs, potentially breaking many, many incoming links from elsewhere;
  • Change the URLs and do fancy redirections to ensure the old ones continue to work; or
  • Keep the bad URLs.
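
The second option is less magical than it sounds. Here is a minimal sketch in Python, assuming old URLs of the form /archives/000123.html and a hand-built table of new locations. The URL pattern, the IDs, and the new paths are all hypothetical, and in practice this logic would more likely live in the web server's rewrite rules than in a script.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical mapping from old numeric entry IDs to new word-based paths.
OLD_ID_TO_NEW_PATH = {
    "000123": "/archives/rss-hits-the-mainstream/",
    "000124": "/archives/understanding-the-deep-web/",
}

# Assumed shape of the old numeric URLs.
OLD_URL_PATTERN = re.compile(r"^/archives/(\d+)\.html$")


def redirect_target(path: str) -> Optional[str]:
    """Return the new path for an old numeric URL, or None if there is none."""
    match = OLD_URL_PATTERN.match(path)
    if match is None:
        return None
    return OLD_ID_TO_NEW_PATH.get(match.group(1))


def respond(path: str) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return a permanent redirect (status, headers) for a moved URL."""
    target = redirect_target(path)
    if target is None:
        return None  # Not an old URL we know about; handle it normally.
    # 301 tells browsers and search engines that the move is permanent.
    return 301, {"Location": target}
```

The hard part is not the code but building the table: every old entry needs a row, and the table has to stay around for as long as the old links do.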

A while ago, I ran a link checker on Waileia. I was surprised to learn that all my links to a very prominent blogger were broken because this person switched from numbers to words without letting anybody know. I had to look up the new locations for all the links manually. If somebody who knows what they’re doing forgot this, imagine how it’s going to confuse a large number of casual bloggers.
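
If you have never run a link checker, the idea is simple enough to sketch in a few lines of Python using only the standard library. This is not the tool I used, just an illustration under some assumptions: it only looks at absolute http and https links in anchor tags, and it uses HEAD requests, which a few servers refuse.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.error import HTTPError
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html: str) -> List[str]:
    extractor = LinkExtractor()
    extractor.feed(html)
    return extractor.links


def http_status(url: str) -> int:
    """Return the HTTP status for url, or 0 if the server cannot be reached."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=10) as response:
            return response.status
    except HTTPError as error:
        return error.code  # The server answered, but with an error such as 404.
    except OSError:
        return 0  # DNS failure, refused connection, timeout, and so on.


def find_broken_links(html: str, fetch_status: Callable[[str], int] = http_status) -> Dict[str, int]:
    """Map each broken link in the page to the status that marked it broken."""
    broken = {}
    for url in extract_links(html):
        status = fetch_status(url)
        if status == 0 or status >= 400:
            broken[url] = status
    return broken
```

Passing in your own fetch_status function makes it easy to try the checker on a saved page without hitting anyone's server 200 times in an hour.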

HYCW offers a link to a solution at A List Apart, but I guarantee that nearly all bloggers will be clueless as to what the article is suggesting, let alone how to do it.

I’ve thought about making this change for a while, but have shied away because of the work I know it will entail. I may tackle this problem soon; if I do, I’ll try to write a how-to for the rest of you who want to do it the right way.

In the meantime, as the adage says, “If it ain’t broke, don’t fix it” – even if it’s a little unsightly.

Link Tending

One of the perks of being an Amazon.com Associate is free error reports from their other property, Alexa. I pulled one of these yesterday, and I was simply shocked to find 12 404s (ironically, the subject of this interview is having trouble with her page as I write this). Of these 12, two are obviously my mistake, while the other 10 worked at one point.

Webmasters and bloggers, heed well: a cardinal rule of the web, as expressed by Jakob Nielsen, is “never let any URL die.” No excuses – I’ve heard them all before.

So, while I go clean up all the dead links, here’s some advice:

  1. Don’t kill a perfectly good link.
  2. If you must kill a perfectly good link, use mod_rewrite to send the visitor to the new URL.
  3. Don’t kill a perfectly good link.
  4. Have a good search engine so people can find what they’re looking for.
  5. Don’t kill a perfectly good link.
  6. Have a good 404 page with common links and tips for helping your visitor find what they’re looking for.

Oh, and one more thing – don’t kill a perfectly good link.
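
As for the 404 page in item 6, one cheap way to make it helpful is to suggest pages whose addresses look like the one the visitor asked for. Here is a minimal sketch in Python using the standard difflib module. The list of known paths is hypothetical, and the 0.6 similarity cutoff is simply difflib's default, not a tuned value.

```python
from difflib import get_close_matches
from typing import List, Sequence

# Hypothetical list of paths that exist on the site.
KNOWN_PATHS = [
    "/archives/rss-hits-the-mainstream/",
    "/archives/understanding-the-deep-web/",
    "/archives/the-url-conundrum/",
    "/archives/link-tending/",
]


def suggest_paths(missing_path: str, known_paths: Sequence[str] = KNOWN_PATHS, limit: int = 3) -> List[str]:
    """Return up to `limit` known paths that resemble the missing one, best first."""
    return get_close_matches(missing_path, known_paths, n=limit, cutoff=0.6)


def not_found_body(missing_path: str) -> str:
    """Build the plain-text body of a 404 page with suggestions when there are any."""
    suggestions = suggest_paths(missing_path)
    lines = [f"Sorry, {missing_path} was not found."]
    if suggestions:
        lines.append("Did you mean:")
        lines.extend(f"  {path}" for path in suggestions)
    else:
        lines.append("Try the search box or the archive index.")
    return "\n".join(lines)
```

This is a fallback, not a substitute for redirects: a suggestion still makes the visitor do the work that a 301 would have done for them.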

If you have any questions, feel free to leave a comment. In the meantime, please excuse me – I have some dead links to clean up.