
Case Study - URL Shortener

Greetings!

There are URL shortening services like https://tinyurl.com/ and https://bitly.com/ for an obvious reason: shortening a URL saves space! Whether it's a Twitter message, an email, or your resume, a short link is a handy way to share your valuable URL with others. Let's study the underlying techniques together.

At first glance it looks like a really simple piece of software, and it is. The actual implementation isn't that scary. However, the overall architecture gives us a whole set of ideas to explore.

Related articles

https://www.slmanju.com/2021/07/basebase62-encoding-with-java.html

What does it do?

Any URL shortener has 2 main features.
  • Give a short URL for any given URL.
  • Redirect to the original URL when we enter the shortened URL in a browser.

Just 2 features, so what is there to talk about? That is the interesting part. Once we actually implement it, it touches a wide range of architecture decisions. I personally haven't experienced all of them (I would obviously love to get that real experience).
  • How do you shorten the URL?
  • How do you store data?
  • How can a large number of concurrent users access the website without a drop in performance?
  • How do you scale the application?
Let's start!

Functional Requirements

  • Get a short URL from a long URL
  • Redirect to the original URL when a user accesses the short URL

Non-Functional Requirements

  • Our system should be highly available
  • The system should work with low latency

Estimate the capacity

Even though we have only 2 main features, the service will generate a lot of data. We can also expect far more read operations than writes.

Let's assume;
We have a 100:1 read:write ratio.
There will be 100M new URLs per month.
That means 100 * 100M = 10B reads per month.

Write requests
100M / (30 * 24 * 3600) = 40 URL/s

Read requests
40 URL/s * 100 = 4000 URL/s

This gives us about 4040 requests per second. That is not much, because modern servers can easily handle this rate.
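As a quick sanity check, the throughput arithmetic can be reproduced in a few lines of Python. This is a sketch that uses only the assumptions stated above: 100M new URLs per month, a 100:1 read:write ratio, and a 30-day month. The constant names are my own.

```python
# Back-of-the-envelope throughput estimate.
# Assumptions (from the text): 100M new URLs/month, 100:1 reads to writes,
# and a 30-day month.
NEW_URLS_PER_MONTH = 100_000_000
READ_WRITE_RATIO = 100
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds

writes_per_sec = NEW_URLS_PER_MONTH / SECONDS_PER_MONTH
reads_per_sec = writes_per_sec * READ_WRITE_RATIO

print(f"writes/s: {writes_per_sec:.1f}")  # prints 38.6
print(f"reads/s:  {reads_per_sec:.0f}")
print(f"total/s:  {writes_per_sec + reads_per_sec:.0f}")
```

If you skip the early rounding, the figures come out to about 38.6 writes/s and 3,858 reads/s. These are slightly below the rounded 40 and 4,000 used above, which is close enough for capacity planning.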
Now assume each record (one URL mapping) is about 1 KB, and that we keep the data for 5 years.

100M URLs/month * 5 years * 12 months * 1 KB = 6 TB

To improve performance we can obviously use some caching solution. Following the 80:20 rule (assuming roughly 20% of URLs generate 80% of the traffic), we will cache 20% of daily requests.

4000 * 3600 * 24 = 345,600,000 requests per day
345,600,000 * 20/100 * 1 KB ≈ 69 GB
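The storage and cache figures can be checked the same way. This sketch assumes 1 KB is 1,000 bytes (decimal units, as in the TB/GB figures above), 5 years of retention, and caching 20% of one day's reads. Integer arithmetic keeps the results exact.

```python
# Storage and cache estimate.
# Assumptions: ~1 KB (1,000 bytes) per record, 5 years of retention,
# 4,000 reads/s, and caching 20% of one day's reads.
RECORD_BYTES = 1_000
NEW_URLS_PER_MONTH = 100_000_000
RETENTION_MONTHS = 5 * 12
READS_PER_SEC = 4_000
CACHE_PERCENT = 20

storage_bytes = NEW_URLS_PER_MONTH * RETENTION_MONTHS * RECORD_BYTES
reads_per_day = READS_PER_SEC * 24 * 3600
cache_bytes = reads_per_day * CACHE_PERCENT // 100 * RECORD_BYTES

print(f"storage:   {storage_bytes / 1e12:.1f} TB")  # prints 6.0 TB
print(f"reads/day: {reads_per_day:,}")              # prints 345,600,000
print(f"cache:     {cache_bytes / 1e9:.1f} GB")     # prints 69.1 GB
```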

URL create requests - 40/s
URL read requests - 4000/s
Database storage - 6TB
Cache memory - 69GB

Let's define the algorithm

Since we need to provide a short URL, we need a mechanism that maps the original URL to something short. Let's assume the short URL should be 6 to 8 characters long. According to the figures above, we will store about 6 billion URLs over five years. A plain decimal primary key or id for that many records needs up to 10 digits, which exceeds the limit. This is where BaseN encoding comes in. A well-known way to solve this is Base62 encoding (digits plus lowercase and uppercase letters). I have already written a short article on this; see the related article linked above. In short, we will create unique ids and convert them to Base62.
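To make the idea concrete, here is a minimal Base62 sketch that converts an integer id into a short string and back. The alphabet order (0-9, a-z, A-Z) is an assumption and may differ from the one in the related Java article. Any fixed ordering works as long as encoding and decoding agree.

```python
# Minimal Base62 encoder/decoder for non-negative integer ids.
# The alphabet ordering below is an assumption; encode/decode just need to agree.
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def decode(s: str) -> int:
    """Decode a Base62 string back into the integer id."""
    n = 0
    for ch in s:
        n = n * BASE + INDEX[ch]
    return n
```

For example, `encode(62)` returns `"10"`. Six Base62 characters cover 62^6 ≈ 56.8 billion ids, so the roughly 6 billion URLs estimated for five years fit in 6 characters: `len(encode(6_000_000_000))` is 6.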
Generating unique identifiers is a tricky problem of its own, because the id generator becomes a single point that every write request has to go through. I have also written an article on how we can use MongoDB to create unique identifiers.
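The sketch below shows what "single access point" means in practice. It uses a hypothetical in-process `IdGenerator` as a stand-in for the MongoDB-backed sequence from the article. Every writer must take the same lock to get the next id, so all writes serialize at this one point. The `demo` helper is also illustrative only.

```python
# Hypothetical stand-in for a central id generator. The article uses MongoDB for
# this; here an in-process counter guarded by a lock plays that role.
import threading
from typing import List


class IdGenerator:
    """Hands out strictly increasing, unique integer ids."""

    def __init__(self, start: int = 1) -> None:
        self._next = start
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # Every write request passes through this lock: this is the
        # "single access point" that could become a bottleneck.
        with self._lock:
            value = self._next
            self._next += 1
            return value


def demo(workers: int = 8, per_worker: int = 1_000) -> List[int]:
    """Request ids from several threads and return every id handed out."""
    gen = IdGenerator()
    results: List[int] = []
    results_lock = threading.Lock()

    def worker() -> None:
        local = [gen.next_id() for _ in range(per_worker)]
        with results_lock:
            results.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

At the estimated 40 writes per second this is unlikely to be a bottleneck. Still, it is one component that every write depends on.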

How can we choose the database?

Based on the estimates above, both relational and non-relational databases would fit this workload. No database is magically faster, so we can choose whatever fits our needs as well as our budget.
One drawback is that many NoSQL databases don't give us an auto-incrementing id or sequence for the Base62 algorithm to work with. However, at around 40 writes per second, the MongoDB-based id generation mentioned above should not be a problem either. Since I need to pick one, I'll go with a NoSQL solution like MongoDB or DynamoDB for simplicity, though even MySQL would be a good fit.
We could even use an RDBMS as a key generation service, or as a pre-populated database of keys.
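A pre-populated key store can be sketched as a pool of keys generated ahead of time and handed out exactly once. The `KeyPool` class is hypothetical. In a real deployment the pool would likely be a database table, and `take` would be an atomic "move from unused to used" step (for example, inside a transaction) rather than an in-memory list pop.

```python
# Hypothetical key-generation service: random 6-character Base62 keys are
# generated up front, and each key is handed out at most once.
import random
import string
from typing import List, Optional, Set

ALPHABET = string.digits + string.ascii_letters  # 62 characters


class KeyPool:
    def __init__(self, size: int, key_length: int = 6, seed: Optional[int] = None) -> None:
        rng = random.Random(seed)
        unused: Set[str] = set()
        # Keep drawing until we have `size` distinct keys; the set silently
        # discards any duplicate draws.
        while len(unused) < size:
            unused.add("".join(rng.choices(ALPHABET, k=key_length)))
        self._unused: List[str] = sorted(unused)
        self._used: Set[str] = set()

    def take(self) -> str:
        """Remove one key from the pool and mark it as used."""
        if not self._unused:
            raise RuntimeError("key pool exhausted")
        key = self._unused.pop()
        self._used.add(key)
        return key

    def remaining(self) -> int:
        return len(self._unused)
```

Because keys are unique when they are generated, a write never has to check for collisions. The cost is keeping the pool topped up.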

HTTP status codes

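For newly shortened URLs, 201 (Created) is the obvious response. For the redirection, however, we can use either 301 (Moved Permanently) or 302 (Found). A 301 tells the browser that the move is permanent, so the browser may cache it and skip our server on later visits. A 302 is temporary, so each visit comes back through our service. Since we might want to generate statistics in the future, 302 is the better choice.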
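The status-code choice can be illustrated with a framework-agnostic sketch. The `Shortener` class, the `https://short.example` host, the in-memory dict, and the `(status, headers)` response shape are all hypothetical. The point is only which status code each operation returns, and why a 302 lets us count clicks.

```python
# Framework-agnostic sketch of the two operations and their HTTP status codes.
# Host name, storage, and response shape are illustrative assumptions.
import string
from typing import Dict, Tuple

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
Response = Tuple[int, Dict[str, str]]


def base62(n: int) -> str:
    """Encode a non-negative integer id in Base62."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


class Shortener:
    def __init__(self, host: str = "https://short.example") -> None:
        self.host = host
        self.urls: Dict[str, str] = {}
        self.clicks: Dict[str, int] = {}
        self.counter = 0

    def create(self, long_url: str) -> Response:
        # 201 Created: a new resource (the short URL) now exists.
        self.counter += 1
        code = base62(self.counter)
        self.urls[code] = long_url
        return 201, {"Location": f"{self.host}/{code}"}

    def redirect(self, code: str) -> Response:
        long_url = self.urls.get(code)
        if long_url is None:
            return 404, {}
        # 302 is temporary, so clients come back to us on each visit and we
        # can count clicks; a 301 may be cached and bypass us entirely.
        self.clicks[code] = self.clicks.get(code, 0) + 1
        return 302, {"Location": long_url}
```

For example, the first `create("https://www.example.com/some/long/path")` returns `(201, {"Location": "https://short.example/1"})`. Each later `redirect("1")` returns a 302 and increments the click count for that code.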

Conclusion

Creating a URL shortening service looks easy. Since I'm doing this as a learning exercise, there is obviously more to it than what I've covered here. However, learning is fun!!!

Happy learning :)