Ambiguous URLs

Simon Willison argues in favor of clear URLs and of ensuring that each piece of content on your server has only one URL. Even if the content at two different URLs is identical, machines see them as two separate resources and will treat them as such. This means that caches frequently store two copies of the same data and cache hit rates are reduced, costing you extra bandwidth and your site visitors extra download time.

Simon says…

Social link sharing sites such as del.icio.us can’t accurately aggregate links to the same resource. That … should catch your attention if you care about effectively promoting your site. Here’s a random example, plucked from today’s del.icio.us popular.

  • http://www.convinceme.net/ has 36 saves
  • http://www.convinceme.net/index.php has 148 saves
  • http://convinceme.net/ has 211 saves
  • http://convinceme.net/index.php has 38 saves

Combined, that’s 433 saves; much more impressive, and more likely to end up at the top of a social sharing site.
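The arithmetic in that example can be checked with a short sketch. The normalization rules here (drop a leading "www." and a trailing "index.php") are assumptions chosen to fit these four URLs, not something del.icio.us does, and the names canonicalize and combine_saves are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

# Save counts copied from the example above.
saves = {
    "http://www.convinceme.net/": 36,
    "http://www.convinceme.net/index.php": 148,
    "http://convinceme.net/": 211,
    "http://convinceme.net/index.php": 38,
}


def canonicalize(url):
    """Collapse the four variants onto one URL (assumed rules, see lead-in)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if path.endswith("/index.php"):
        # Keep the trailing slash, drop the file name.
        path = path[: -len("index.php")]
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def combine_saves(counts):
    """Sum save counts for URLs that normalize to the same canonical URL."""
    totals = defaultdict(int)
    for url, count in counts.items():
        totals[canonicalize(url)] += count
    return dict(totals)


print(combine_saves(saves))  # {'http://convinceme.net/': 433}
```

A link-sharing site has no safe way to apply rules like these on its own, since it can't know that two URLs really serve the same page. That is why the fix belongs on your server.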

And you’re probably hosting content at multiple URLs without even realizing it. If you have www.yoursite.com and yoursite.com pointing at the same content, you have two URLs. Likewise, if your server answers at both yoursite.com/SomePage/ and yoursite.com/somepage/, you have two URLs for that page.
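One common way to collapse these variants is to pick a single form and permanently redirect everything else to it. The following is a minimal sketch of that decision, assuming you prefer the bare hostname and all-lowercase paths. Both preferences and the function name redirect_target are illustrative choices, and lowercasing paths is only safe if your site never relies on case to tell pages apart.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(url):
    """Return the URL to 301-redirect to, or None if url is already canonical.

    Assumed policy: no "www." prefix, lowercase host and path.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.lower() or "/"
    canonical = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    return None if canonical == url else canonical


# A variant URL is sent to the canonical one...
print(redirect_target("http://www.yoursite.com/SomePage/"))
# ...while the canonical URL itself needs no redirect.
print(redirect_target("http://yoursite.com/somepage/"))
```

In practice this check usually lives in the web server or framework configuration rather than in application code. The point is that every variant ends up answering with one address.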

When one of your readers adds a feed to Feed Crier as www.yoursite.com/feed.xml and someone else adds it as yoursite.com/feed.xml, Feed Crier sees two different feeds and checks each of them. This can give you inaccurate subscriber counts and adds to the load that Feed Crier places on your server, not to mention the load it places on Feed Crier’s indexing bots.

To alleviate this problem, we monitor the content of feeds being subscribed to. If two of them have identical content, they’re flagged for review. A human editor compares each feed that’s been flagged to determine if they really are the same feed. If so, we mark one feed as the real one and the others as duplicates. This way, no matter what URL someone uses for your site, we’re grabbing the right one.
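The flagging step can be sketched as grouping feeds by a hash of their content. This illustrates the idea only and is not Feed Crier's actual code: the function name, the use of SHA-256, and the assumption that the feed bodies have already been fetched are all choices made for the example.

```python
import hashlib
from collections import defaultdict


def flag_duplicate_feeds(feeds):
    """Group feed URLs whose fetched content is byte-for-byte identical.

    feeds maps each feed URL to its fetched body as bytes. Each returned
    group is a candidate for human review, not a confirmed duplicate.
    """
    by_digest = defaultdict(list)
    for url, body in feeds.items():
        by_digest[hashlib.sha256(body).hexdigest()].append(url)
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]


feeds = {
    "http://www.yoursite.com/feed.xml": b"<rss>same items</rss>",
    "http://yoursite.com/feed.xml": b"<rss>same items</rss>",
    "http://example.org/feed.xml": b"<rss>other items</rss>",
}
for group in flag_duplicate_feeds(feeds):
    print("flag for review:", group)
```

An exact hash only catches feeds that are identical at the moment they are fetched. It misses a feed that embeds a timestamp or changes between two fetches, which is one reason to keep a human editor in the loop.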

Of course, all this would be easier if you didn’t have duplicate URLs.

© 2006 Adam Kalsey