Let’s start with the difference between URL, URI, and URN because there is a lot of confusion around these terms, and rightfully so. The W3C, together with the IETF, had to issue a clarification (published as RFC 3305) after URI and URL started being used interchangeably in RFCs.
- URI: This is the umbrella term, or superset, that contains URLs and URNs. URN and URL are URI sub-types. Some mistakenly believe that a URI is a URL without request parameters, and based on the getRequestURI method in the servlet API (which returns the request path without the query string) it’s easy to see why.
- URL: a URI that specifies the network location of a resource and how to reach it, for example using the http, ftp, or mailto schemes.
- URN: a URI that specifies a namespace and an identifier. For example, an ISBN such as urn:isbn:n-nn-nnnnnn-n is a unique identifier, but it doesn’t tell you where to fetch the resource.
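To make the URL/URN distinction a bit more concrete, here is a minimal sketch using Python’s standard urllib.parse module. The describe helper and the sample ISBN are illustrative assumptions, not anything from a spec; the point is only that a generic URI parser finds a network location in the URL and none in the URN.

```python
from urllib.parse import urlsplit


def describe(uri: str) -> dict:
    """Hypothetical helper: report the scheme, network location and remainder of a URI."""
    parts = urlsplit(uri)
    return {"scheme": parts.scheme, "netloc": parts.netloc, "rest": parts.path}


# A URL carries a network location (the host to contact).
url_parts = describe("https://gabrito.com/path/mycontroller/foo")

# A URN only carries a namespace ("isbn") and an identifier; the ISBN is a made-up sample.
urn_parts = describe("urn:isbn:0-395-36341-1")

print(url_parts)
print(urn_parts)
```

Both strings parse as URIs, but only the first one tells a client where to go to fetch anything.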
According to the W3C we shouldn’t use the term URL anymore and should instead just refer to them as URIs. If we want to be more specific, they recommend referring to the HTTP URI scheme, the FTP URI scheme, the ISBN namespace in the URN URI scheme, etc. Ugghhh. I still regularly use the acronym URL in conversation, but I don’t think it hurts to understand the difference.
Now that we have that behind us, let’s break down an example URL… errr… HTTP-scheme URI into its various components. I think knowing the terminology is important so you’re not referring to “the part after the hostname but before the question mark”. Example:
https://gabrito.com/path/mycontroller/foo?x=1&y=2#anchor (in XHTML source the & has to be written as the ampersand entity reference &amp;, which is not a typo):
- Scheme (or protocol for URLs): https
- Authority (Hostname and Port for the HTTP scheme): gabrito.com
- Path: /path/mycontroller/foo
- Path info (or extra path information): /foo. This depends on your environment, but many web environments let you specify additional path information beyond the file, servlet, CGI script, etc. This is most often used for search engine friendly URLs.
- Query (or query parameters or request parameters): x=1&y=2
- Fragment (or anchor): anchor (the part after the #)
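The breakdown above can be reproduced with a standard parser. This is a small sketch using Python’s urllib.parse; the components dictionary and its key names are just my own labels for illustration.

```python
from urllib.parse import parse_qs, urlsplit

uri = "https://gabrito.com/path/mycontroller/foo?x=1&y=2#anchor"
parts = urlsplit(uri)

# Map the parser's fields onto the terminology used in the list above.
components = {
    "scheme": parts.scheme,        # "https"
    "authority": parts.netloc,     # hostname plus optional port
    "hostname": parts.hostname,
    "port": parts.port,            # None when no explicit port is given
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,    # the part after the '#', without the '#'
}

# The query string can be decoded further into individual parameters.
params = parse_qs(parts.query)

for name, value in components.items():
    print(f"{name}: {value}")
print(params)
```

Note that a generic parser cannot tell you where the path ends and the path info begins. That split (here /path/mycontroller versus /foo) depends on how your web environment maps requests.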
The scheme and host are case-insensitive, but the other components are generally case-sensitive. It’s my preference that entire URLs be lowercase with dashes to separate words. This plays out well for SEO, where a search engine will recognize /url-versus-uri as three separate words but most will only recognize the camel case variant /urlVersusUri as one word.
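Here is a rough sketch of both ideas in Python. The function names normalize_case and dashed_slug are hypothetical, and the sketch assumes the URI has no user:password@ part in the authority, since that part is case-sensitive and would be lowercased along with the host.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_case(uri: str) -> str:
    """Lowercase only the case-insensitive parts (scheme and host); leave the rest alone."""
    parts = urlsplit(uri)
    # Assumption: no userinfo in the authority, so lowercasing netloc is safe.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, parts.fragment)
    )


def dashed_slug(segment: str) -> str:
    """Turn a camel case path segment into lowercase words separated by dashes."""
    # Insert a dash at every lowercase-or-digit to uppercase boundary, then lowercase.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", segment).lower()


normalized = normalize_case("HTTPS://Gabrito.COM/Path/Foo?X=1")
slug = dashed_slug("urlVersusUri")
print(normalized)
print(slug)
```

The path and query keep their original case in normalize_case because changing them could point at a different resource; dashed_slug is something you would apply when generating new URLs, not when rewriting existing ones.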
I know.
URI stands for Uniform Resource Identifier. URIs provide a simple and extensible means for identifying a resource.