ProDeveloperTutorial.com

Tutorials and Programming Solutions

System Design Example 2: System Design for a URL Shortener

prodevelopertutorial February 16, 2019

Designing a URL shortener is one of the most popular system design interview questions.

The idea behind a URL shortener is simple: when a user provides a long URL, we return a short URL, and when a user provides a short URL, we return the original long URL.

One of the simplest solutions is to create a map that stores the key-value pairs on a single server. However, this solution is neither scalable nor distributed.

Hence we have to come up with a solution that is scalable and distributed.

The solution can be divided into 3 parts.

2.1. Memory Consumption on load
2.2. API that can be used
2.3. Application Layer

2.1. Memory Consumption on load

Before designing our system, let us estimate the storage space required.
Assume a Twitter-scale service with 300 million users per month. If our URL shortener gets around 30 million users per month, each creating one shortURL, that is about 1 million new shortURLs per day.

Also assume that the shortURL key is 7 characters long.

So in our DB we need to save at minimum the following fields:

LongURL -> max length 2048 characters, i.e. about 2 KB
shortURL -> max length 17 characters, i.e. 17 bytes [including the domain name]
CreatedAt -> Date -> 7 bytes
In total, this is around 2 KB of data per shortURL entry.

So for 30M new shortURLs per month we need about 60 GB/month, about 0.7 TB/year, and about 3.6 TB over 5 years.
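The estimate above can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch: it assumes 30 million new shortURLs per month and rounds each entry to 2,000 bytes in decimal units, as the text does; the names are illustrative.

```python
# Back-of-the-envelope storage estimate for the URL shortener.
# Assumptions (from the text): 30 million new shortURLs per month,
# about 2 KB per entry (2048 + 17 + 7 = 2072 bytes, rounded to 2,000).
ENTRY_BYTES = 2_000
NEW_URLS_PER_MONTH = 30_000_000
GB = 10**9
TB = 10**12


def storage_bytes(months: int) -> int:
    """Total bytes needed to store `months` worth of new entries."""
    return ENTRY_BYTES * NEW_URLS_PER_MONTH * months


if __name__ == "__main__":
    print(f"1 month: {storage_bytes(1) / GB:.0f} GB")    # 60 GB
    print(f"1 year:  {storage_bytes(12) / TB:.2f} TB")   # 0.72 TB
    print(f"5 years: {storage_bytes(60) / TB:.2f} TB")   # 3.60 TB
```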

2.2. API that can be used

Here we create two simple APIs:
“createTiny(longURL)” this will create a shortURL from the long URL.
“getLong(shortURL)” this will get the long URL from the shortURL.
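To make the API contract concrete, here is a minimal single-process sketch of the two calls in Python (`create_tiny` for createTiny, `get_long` for getLong). It deliberately uses a plain dictionary, i.e. the simple non-distributed map mentioned at the start, so it only illustrates the interface. The domain `https://sho.rt/` and the sequential numeric keys are hypothetical placeholders; key generation is discussed in section 2.4.

```python
import itertools
from typing import Dict, Optional


class InMemoryTinyUrl:
    """Illustrates createTiny/getLong with an in-memory dict (not distributed)."""

    def __init__(self, domain: str = "https://sho.rt/") -> None:  # hypothetical domain
        self._domain = domain
        self._short_to_long: Dict[str, str] = {}
        self._ids = itertools.count(1)  # placeholder key generator

    def create_tiny(self, long_url: str) -> str:
        """createTiny(longURL): store the mapping and return the shortURL."""
        key = str(next(self._ids))
        self._short_to_long[key] = long_url
        return self._domain + key

    def get_long(self, short_url: str) -> Optional[str]:
        """getLong(shortURL): return the longURL, or None if it is unknown."""
        if not short_url.startswith(self._domain):
            return None
        return self._short_to_long.get(short_url[len(self._domain):])
```

For example, `svc.create_tiny("https://example.com/some/long/path")` returns a shortURL, and passing that shortURL to `svc.get_long` returns the original long URL.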

2.3. Application Layer

In this part we discuss different methods for converting a longURL to a shortURL and vice versa, while keeping each shortURL as unique as possible.

Let us understand how a user will use the service. Consider the example below:

[Figure: system design tutorial example]

The user makes an API request over REST/HTTP or any other protocol. The request first goes to a load balancer, which distributes the traffic evenly across multiple application servers.
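"Distributing the traffic equally" can be done in several ways; round robin is one common strategy. The Python sketch below assumes round robin and uses made-up server names. Real load balancers also perform health checks and remove failed servers from the rotation, which is omitted here.

```python
import itertools
import threading
from typing import Sequence


class RoundRobinBalancer:
    """Hands out application servers in a fixed rotation so each gets an equal share."""

    def __init__(self, servers: Sequence[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)
        self._lock = threading.Lock()  # make pick() safe to call from many threads

    def pick(self) -> str:
        """Return the next server in the rotation."""
        with self._lock:
            return next(self._cycle)
```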

The application server then converts the longURL into a shortURL and stores the pair in the DB. When the user later sends a shortURL (a getLong request), the application server looks up the corresponding longURL in the DB and returns it to the client.

We can also add a cache server to store popular URLs, so that frequent lookups do not have to hit the DB. It can be Memcached, Redis or any other cache server available in the market.
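A common way to use such a cache is the cache-aside pattern: look in the cache first and, on a miss, read from the DB and store the result in the cache. The sketch below is one possible shape of this, assuming a small LRU cache built on `collections.OrderedDict` as a stand-in for Memcached or Redis, and a plain callable (here, a dict's `get`) as a stand-in for the DB lookup.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LruCache:
    """Tiny LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # drop the least recently used entry


def get_long(short_key: str, cache: LruCache,
             db_lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Cache-aside read: cache first, then the DB; cache the DB result on a miss."""
    long_url = cache.get(short_key)
    if long_url is None:
        long_url = db_lookup(short_key)
        if long_url is not None:
            cache.put(short_key, long_url)
    return long_url
```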

2.4 Now that we understand the flow, let us look at how to create a shortURL from a given longURL.

We shall discuss several methods for doing this.

First, here are the assumptions we make:

Below are the characters that are allowed in our shortURL.
“a to z”
“A to Z”
“0 to 9”
So we have 26 + 26 + 10 = 62 characters.

Our shortURL will be 7 characters long, so there are 62^7, about 3.5 trillion, possible shortURLs. As this is a very large value, at the load estimated in section 2.1 it would take thousands of years to use up all the values.
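A quick calculation shows how large this keyspace is. The sketch assumes the load from section 2.1 (30 million new shortURLs per month) stays constant.

```python
ALPHABET_SIZE = 26 + 26 + 10          # a-z, A-Z, 0-9
KEY_LENGTH = 7
KEYSPACE = ALPHABET_SIZE ** KEY_LENGTH

NEW_URLS_PER_YEAR = 30_000_000 * 12   # assumed constant load from section 2.1
years_to_exhaust = KEYSPACE / NEW_URLS_PER_YEAR

print(f"{KEYSPACE:,} possible keys")              # 3,521,614,606,208 possible keys
print(f"about {years_to_exhaust:,.0f} years")     # about 9,782 years
```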

The DB schema will be a simple key-value pair, where the key is the “shortURL” and the value is the “longURL”.

2.4.1 Method 1: Generate a random shortURL and check

In this method, we take a longURL and generate a random shortURL for it. Once we have generated the “shortURL”, there are 3 possible ways to store it.

1st Possibility: Check the DB for the “shortURL” using a “get” operation. If it is not present, “put” the key-value pair.

But this approach has a flaw: a race condition. For example, server_1 checks whether the random shortURL has already been inserted, finds that it has not, and calls “put” to insert the key-value pair. If another server checks for the same random shortURL at the same time, it also finds it free and inserts its own pair, so the same shortURL ends up pointing to 2 different longURLs.

2nd Possibility: We check and insert in a single atomic step: if the shortURL is absent, it is inserted directly into the DB (a “put if absent” operation), so no other server can insert the same key in between.

3rd Possibility: We insert the shortURL along with the longURL into the DB, then read the shortURL back and check whether the stored longURL matches the original value. If it matches, we are done. Otherwise, we generate a new shortURL, insert it into the DB and check again, repeating until we get a unique value.
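The sketch below follows the 2nd possibility. A Python dictionary guarded by a lock stands in for a database that offers an atomic "insert only if absent" write; the class and method names are hypothetical. Because the check and the write happen as one step, two concurrent requests can no longer both claim the same key, and a collision simply triggers a retry with a new random key.

```python
import secrets
import string
import threading
from typing import Dict, Optional

ALPHABET = string.ascii_letters + string.digits  # the 62 allowed characters
KEY_LENGTH = 7


class KeyValueStore:
    """Hypothetical stand-in for a DB with an atomic 'put if absent' write."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key: str, value: str) -> bool:
        """Insert only if `key` is free; return True on success."""
        # Check and write happen under one lock, so no other thread can slip in
        # between them (unlike a separate get followed by a put).
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._data.get(key)


def random_key() -> str:
    """Pick KEY_LENGTH characters uniformly at random from the 62-character alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(KEY_LENGTH))


def create_tiny_random(store: KeyValueStore, long_url: str, max_attempts: int = 5) -> str:
    """Method 1: try random keys until one is inserted successfully."""
    for _ in range(max_attempts):
        key = random_key()
        if store.put_if_absent(key, long_url):
            return key
    raise RuntimeError("no free key found; the keyspace may be nearly full")
```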

In all 3 variants, we perform at least one get operation to check whether the shortURL is already taken. Hence we move on to the second method.

2.4.2 Method 2: MD5 method.

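In this method we use the MD5 algorithm, a hash function that produces a 128-bit hash. We compute the MD5 value of the longURL, take the first 43 bits of the result, and turn them into the shortURL. (Note that 2^43, about 8.8 trillion, is larger than 62^7, about 3.5 trillion, so some 43-bit values need 8 base-62 characters; taking 41 bits keeps every shortURL within 7 characters.) Different longURLs can still produce the same prefix, so collisions are possible, and we again need a get operation to check whether the shortURL is already taken.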

The only advantage is that MD5 always gives the same result for the same input. So if 2 users try to shorten the same link, we can check our DB and return the existing shortURL instead of creating 2 different random shortURLs, which saves space.
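Taking the first bits of the MD5 digest can be sketched with Python's standard `hashlib` module. The function name is illustrative, and the prefix length is a parameter (default 43, as in the text) so a shorter prefix can be used instead.

```python
import hashlib


def md5_prefix(long_url: str, bits: int = 43) -> int:
    """Return the first `bits` bits of MD5(long_url) as a non-negative integer."""
    if not 1 <= bits <= 128:
        raise ValueError("bits must be between 1 and 128")
    digest = hashlib.md5(long_url.encode("utf-8")).digest()  # 16 bytes = 128 bits
    as_int = int.from_bytes(digest, "big")                   # most significant bit first
    return as_int >> (128 - bits)                            # drop the low-order bits
```

The same longURL always yields the same number; that number is then converted to base 62 to form the shortURL.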

So how do we convert the 43-bit hash into a shortURL?

First, read the 43 bits as a binary number and convert it to decimal.

For example, suppose the 43 bits give the decimal number 1362849. Then convert that number to base 62.

Once you convert the number to base 62, each digit is a number from 0 to 61.
Example: 1362849 in base 62 gives the digits
5, 44, 33, 27

Then all you need to do is map each digit to one of the 62 characters listed earlier, in the order [A to Z, a to z, 0 to 9].

So
0 maps to A
1 maps to B
2 maps to C
...
25 maps to Z
26 maps to a
...
61 maps to 9

With this mapping, the digits 5, 44, 33, 27 become “Fshb”, which is padded with leading “A” characters (the character for 0) to the 7-character shortURL “AAAFshb”.

This way you can generate the shortURL.
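The base-62 conversion and character mapping described above can be sketched in Python as follows. The alphabet order [A to Z, a to z, 0 to 9] follows the text; padding with the character for 0 up to 7 characters and the `decode` helper are illustrative choices.

```python
import string
from typing import List

# Digit 0 -> 'A', 25 -> 'Z', 26 -> 'a', 51 -> 'z', 52 -> '0', 61 -> '9'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
BASE = len(ALPHABET)  # 62


def to_base62_digits(n: int) -> List[int]:
    """Split a non-negative integer into base-62 digits, most significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        n, remainder = divmod(n, BASE)
        digits.append(remainder)
        if n == 0:
            break
    return digits[::-1]


def encode(n: int, length: int = 7) -> str:
    """Map each digit to a character and left-pad with 'A' (digit 0) to `length`."""
    return "".join(ALPHABET[d] for d in to_base62_digits(n)).rjust(length, ALPHABET[0])


def decode(key: str) -> int:
    """Inverse of encode(): turn a key back into its integer."""
    n = 0
    for ch in key:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

For example, `encode(1362849)` returns `"AAAFshb"`, and `decode("AAAFshb")` returns `1362849`.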

But this method also needs at least one get operation to check whether the shortURL is already present.

2.4.3 Method 3: Counter based approach

In this method we can guarantee that there will be no collisions, so we do not need a get operation. There are 2 different ways to implement the counter based approach:
2.4.3.1. Single Host
2.4.3.2. Range based approach

2.4.3.1. Single host approach:

In this approach, a single host maintains a counter, and every application server connects to this host whenever it receives a shortURL request. The host returns the current number to the app server and then increments it, so the app server can generate a unique shortURL from that number. The single host can be a database or a ZooKeeper instance. This approach has two drawbacks: the host is a single point of failure [when it is down, all the app servers are affected], and it is a bottleneck [when the number of requests is high, it may take time to process them all].
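A minimal sketch of the single counter host, assuming an in-process lock in place of a real database sequence or ZooKeeper node; the class name is hypothetical. Each call returns a number that is never handed out again, and the app server would then turn that number into a shortURL with the base-62 mapping from Method 2.

```python
import itertools
import threading


class CounterHost:
    """Single source of increasing numbers shared by all app servers (hypothetical)."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)
        self._lock = threading.Lock()  # serialises requests: simple, but a bottleneck

    def next_number(self) -> int:
        """Return the current value and advance the counter."""
        with self._lock:
            return next(self._counter)
```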

2.4.3.2. Range based approach:

In this approach, we divide the counter values into ranges. These ranges are stored in a server such as ZooKeeper, and each range is assigned to a particular app server.

For example, we divide the first 5000 numbers into 5 ranges:

1 – 1000
1001 – 2000
2001 – 3000
3001 – 4000
4001 – 5000

The first app server selects the first range, and ZooKeeper reserves that range for app server 1. Similarly, app server 2 takes the next range. Assuming there are only 2 app servers, each one generates numbers from its own range, incrementing a local counter for every new shortURL without contacting ZooKeeper.
When server 1 exhausts its range, it contacts ZooKeeper again, and ZooKeeper reserves the next free range for it.
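The range-based approach can be sketched as follows. `RangeCoordinator` is a hypothetical in-process stand-in for ZooKeeper (a real deployment would rely on ZooKeeper's own primitives to reserve ranges); each `AppServer` reserves a block of 1000 numbers and counts through it locally, contacting the coordinator only when the block runs out.

```python
import threading


class RangeCoordinator:
    """Stand-in for ZooKeeper: hands out disjoint ranges of numbers (hypothetical)."""

    def __init__(self, range_size: int = 1000) -> None:
        self._range_size = range_size
        self._next_start = 1
        self._lock = threading.Lock()

    def reserve_range(self) -> range:
        """Reserve the next free range; no two callers ever get overlapping ranges."""
        with self._lock:
            start = self._next_start
            self._next_start += self._range_size
        return range(start, start + self._range_size)


class AppServer:
    """Generates unique numbers locally from its reserved range."""

    def __init__(self, coordinator: RangeCoordinator) -> None:
        self._coordinator = coordinator
        self._ids = iter(coordinator.reserve_range())
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            n = next(self._ids, None)
            if n is None:  # range exhausted: ask the coordinator for a new one
                self._ids = iter(self._coordinator.reserve_range())
                n = next(self._ids)
            return n
```

With two servers created in order, server 1 first uses 1 to 1000, server 2 uses 1001 to 2000, and server 1 continues with 2001 once its first range is exhausted.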

Because every range is reserved for exactly one app server, this model is highly scalable and guarantees unique shortURLs.

2.5 Tools used in this design

Load balancer
REST API
ZooKeeper
NoSQL DB
CDN
MD5
Memcached

 

 


About The Author

prodevelopertutorial

Follow this blog to learn more about C, C++, Linux, Competitive Programming concepts, Data Structures.

Copyright © 2023 ProDeveloperTutorial.com