Pastebin is a service where users can store plain text, such as source code or code-review snippets, and share it with other users via a URL.
5.1 Before we design the system, below is the list of points to be considered:
2. API Creation
3. DB design
7. Message Queue
5.2 The very basic logic to satisfy is:
1. When a user pastes content, the service should return a URL.
2. When a user enters that URL, the service should return the content.
Along with the above 2 requirements, we shall also discuss the requirements below:
1. The user can set an expiration time for a URL.
2. The user can also choose a custom URL.
3. Each user is limited to creating up to 1000 URLs on the free tier.
4. The user has the option to create a public or private URL.
5.4 API creation:
Below are the 2 basic APIs that we need according to our requirements:
create_paste(content, user_name, expiration_time, custom_url, is_private);
content: The text to be pasted. The service checks its length [suppose the limit is 10 MB] and whether the type of content is acceptable.
user_name: The user name, stored when the URL is private.
expiration_time: The time at which the URL should expire, as specified by the user.
custom_url: A custom URL, if the user wants one.
is_private: Whether the URL should be private.
The second API returns the stored text when it is given the URL.
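The parameter checks described for create_paste can be sketched as below. This is only a minimal illustration: the PasteRequest type, the validate_paste helper, the Unix-timestamp expiration and the alphanumeric rule for custom URLs are assumptions made for the example, not part of the original design.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_CONTENT_BYTES = 10 * 1024 * 1024  # the 10 MB limit supposed in the text


@dataclass
class PasteRequest:
    """Hypothetical container for the create_paste parameters."""
    content: str
    user_name: str
    expiration_time: Optional[int] = None  # assumed: Unix timestamp in seconds
    custom_url: Optional[str] = None
    is_private: bool = False


def validate_paste(req: PasteRequest, now: int) -> List[str]:
    """Return a list of validation errors; an empty list means the request is acceptable."""
    errors = []
    if not req.content:
        errors.append("content is empty")
    elif len(req.content.encode("utf-8")) > MAX_CONTENT_BYTES:
        errors.append("content exceeds 10 MB")
    # A private paste needs an owner to check access against.
    if req.is_private and not req.user_name:
        errors.append("private paste requires user_name")
    if req.expiration_time is not None and req.expiration_time <= now:
        errors.append("expiration_time is in the past")
    # Assumed rule: custom URLs are restricted to letters and digits.
    if req.custom_url is not None and not req.custom_url.isalnum():
        errors.append("custom_url must be alphanumeric")
    return errors
```

A request that passes these checks would then go on to URL creation and storage; the per-user limit of 1000 URLs would need a lookup against the user table and is left out here.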
5.5 Database Design:
We need 2 tables:
5.5.1 User Table:
5.5.2 Short URL table
The paste content itself can be stored in an AWS storage service. The tables can be stored in Cassandra, since it is a distributed database.
5.6 URL creation logic:
We can use a key generation service to generate a 6-character string. The logic to create a short URL can be found in chapter 2.
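As a rough illustration of what a key generation service might do, the sketch below draws random 6-character keys from a 62-character alphabet and retries on collision. The alphabet, the in-memory set of used keys and the generate_key name are assumptions for the example; the logic in chapter 2 may differ.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters (assumed alphabet)
KEY_LENGTH = 6


def generate_key(existing: set) -> str:
    """Return a random 6-character key that is not already in `existing`.

    `existing` stands in for whatever store the real service would use
    to remember keys that have been handed out.
    """
    while True:
        key = "".join(secrets.choice(ALPHABET) for _ in range(KEY_LENGTH))
        if key not in existing:
            existing.add(key)
            return key
```

With 62 possible characters in each of 6 positions there are 62^6, or roughly 56.8 billion, possible keys, so collisions stay rare at the paste volumes discussed in this chapter.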
From the above image we can understand that when the user pastes text and clicks the enter button, the request first goes to a load balancer.
The load balancer contacts the key generation service to get the short URL. Once the short URL is generated, the content is stored in AWS, and AWS returns a URL for the content it has stored.
That content URL is then mapped to the short URL, and the mapping is stored in the DB.
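The write and read paths described above can be sketched with in-memory stand-ins. Everything named here is hypothetical: the dictionaries replace the AWS content store and the DB table, next_key replaces the key generation service, and the store:// URL scheme is made up for the example.

```python
import itertools
from typing import Optional

object_store = {}  # stand-in for the AWS content store: content URL -> text
url_table = {}     # stand-in for the short URL table: short key -> content URL
_counter = itertools.count(1)


def next_key() -> str:
    """Stand-in for the key generation service (sequential keys for simplicity)."""
    return f"k{next(_counter):05d}"


def store_content(key: str, content: str) -> str:
    """Store the text and return the URL under which it was stored."""
    content_url = f"store://pastes/{key}"  # hypothetical URL scheme
    object_store[content_url] = content
    return content_url


def save_paste(content: str) -> str:
    """Write path: get a key, store the content, then map key -> content URL."""
    key = next_key()
    content_url = store_content(key, content)
    url_table[key] = content_url
    return key


def load_paste(key: str) -> Optional[str]:
    """Read path: resolve the key to a content URL, then fetch the text."""
    content_url = url_table.get(key)
    if content_url is None:
        return None
    return object_store.get(content_url)
```

Keeping only the content URL in the DB row means the table stays small even when individual pastes are large.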
Data sharding should be added for distributed design.
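One common way to shard is to hash the short URL key and take the result modulo the number of shards. The sketch below assumes 8 shards and a hypothetical shard_for helper. A store such as Cassandra already distributes rows by partition key, so this only illustrates the idea.

```python
import hashlib

NUM_SHARDS = 8  # assumed shard count for the example


def shard_for(short_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a short URL key to a shard index in the range [0, num_shards)."""
    digest = hashlib.sha256(short_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards
```

The same key always lands on the same shard, so reads can be routed without a lookup table. Changing the shard count with plain modulo hashing remaps most keys, which is why consistent hashing is often preferred when shards are added or removed.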
Caching can be applied when users access the same data frequently. You can use a Memcached service for caching.
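The caching idea can be illustrated with a small read-through cache that evicts the least recently used entry. This is an in-process sketch only: the PasteCache class and its loader callback are hypothetical, and a real deployment would talk to a cache service such as Memcached rather than a local dictionary.

```python
from collections import OrderedDict


class PasteCache:
    """Read-through cache with least-recently-used eviction."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader        # called on a miss, e.g. a DB lookup
        self.items = OrderedDict()  # least recently used entry comes first
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key]
        self.misses += 1
        value = self.loader(key)
        if value is not None:
            self.items[key] = value
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)  # evict the least recently used entry
        return value
```

For example, `PasteCache(1000, db_lookup)` would keep the 1000 most recently read pastes in memory, where `db_lookup` stands for whatever function reads a paste from the backing store.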
5.7 Usage Calculation:
1. Pasted content will take about 1 KB
2. Short URL will take 7 bytes
3. Expiration date will take 4 bytes
4. Paste_path will take 255 bytes
5. Is_private will take 1 byte
Then the total per paste will be around 1.27 KB.
Assuming there are 10 million pastes per month, this equates to
about 12.7 GB of data per month, or
roughly 456 GB of data in 3 years.
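The arithmetic above can be reproduced with a few lines of code. The sketch takes 1 KB as 1000 bytes and 1 GB as 10^9 bytes, which is an assumption. The constant names are made up for the example.

```python
# Per-paste sizes from the list above, in bytes (1 KB taken as 1000 bytes).
CONTENT_BYTES = 1000
SHORT_URL_BYTES = 7
EXPIRATION_BYTES = 4
PASTE_PATH_BYTES = 255
IS_PRIVATE_BYTES = 1

PASTES_PER_MONTH = 10_000_000
MONTHS = 36  # 3 years

bytes_per_paste = (
    CONTENT_BYTES
    + SHORT_URL_BYTES
    + EXPIRATION_BYTES
    + PASTE_PATH_BYTES
    + IS_PRIVATE_BYTES
)
gb_per_month = bytes_per_paste * PASTES_PER_MONTH / 1e9
gb_three_years = gb_per_month * MONTHS

print(f"{bytes_per_paste} bytes per paste")
print(f"{gb_per_month:.1f} GB per month")
print(f"{gb_three_years:.0f} GB in 3 years")
```

Under these assumptions a paste takes 1267 bytes, which gives about 12.7 GB per month and about 456 GB over 3 years, before any replication or index overhead.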
This completes the system design for a Pastebin-like service.