At Talabat, we have a large monolithic application written in .NET, backed by an MS SQL database with a single read node. With continuous growth in traffic and number of users, we are facing issues with performance, scalability, and robustness. Releases are a bottleneck; shipping features independently is painful. Moving forward, microservices seem to be a clear solution to our problems.
In this article, I will explain the approach Talabat used to design microservices and move to AWS at the same time. This approach might not work for everyone, but for Talabat it looks very promising.
Microservices Design
Microservices, in general, appear to solve most of our problems. However, converting a monolithic app to microservices is not an easy task. Each part of the system is so tightly coupled that taking it apart requires extensive planning and effort. So, we recently decided to start with one of the least impactful areas of our system: a Notifications microservice. This service sends all kinds of notifications, e.g. emails, SMS, and push notifications.
Now, designing a microservices architecture is itself a big task. You don't want to split your system into very small modules and end up with nanoservices instead of microservices, as this adds significant overhead in maintenance and inter-service communication. At the same time, if you don't split your system enough, you run the risk of keeping the same issues you wanted to avoid by using microservices in the first place. Both extremes have their pros and cons, so we had to be careful while finalizing the design. We decided to go with the following approach:
After finalizing this architecture, we had to decide whether to host these services on our current physical infrastructure or move them to the cloud. Physical hosting is again a blocker for scalability and robustness: it takes anywhere from a couple of days to weeks just to set up one new server. Cloud providers, on the other hand, offer a wide range of out-of-the-box managed services that are cheap and very fast to set up. For cloud services, we opted for the most popular choice: AWS.
Using Containers
We are using Windows servers with IIS to host our applications built on the .NET Framework. Now, if we just move to AWS as is, we will still face scalability issues, as Windows servers are not easy to scale. Plus, we wanted to use the auto scaling features of AWS to handle peak hours efficiently. The best available solution was Docker containers, because container images are portable across hosts and fast to start. This was a big win in terms of infrastructure flexibility.
We can't run the .NET Framework in Linux Docker containers, since the framework is tied to Windows. We had two options here: either go with a completely different tech stack like Python, Ruby, or Node.js, or move to .NET Core, which runs cross-platform. Going with a different tech stack would have required a steep learning curve and specialists in that technology. So, .NET Core was the clear choice for us, given the small learning curve for our .NET developers and the availability of resources.
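As a rough sketch of what containerizing such a service involves (the project name and image tags here are illustrative, not our actual setup), a minimal multi-stage Dockerfile for a .NET Core service looks like this:

```dockerfile
# Build stage: restore and publish with the .NET Core SDK image
FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS build
WORKDIR /src
COPY Notifications.Api.csproj ./
RUN dotnet restore
COPY . .
RUN dotnet publish -c Release -o /app/publish

# Runtime stage: run on the smaller ASP.NET Core runtime image
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1
WORKDIR /app
COPY --from=build /app/publish .
ENTRYPOINT ["dotnet", "Notifications.Api.dll"]
```

The multi-stage build keeps the SDK out of the final image, so the runtime image stays small and starts quickly on any Linux host.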
Container Management
Creating one application in Docker and deploying it to AWS is very easy, but when you have dozens of apps running in multiple containers across multiple machines, managing them becomes a nightmare, especially when you also want auto scaling and machines are being created and destroyed automatically. So, we looked into two container management tools.
AWS has its own container management service, ECS. It is a good service with many features and tight integration with other AWS services. But we went with Kubernetes, as it is an open source project with a very active community. Moreover, if we want to change our cloud provider in the future, the migration will be much more straightforward.
AWS Setup
After we finalized everything, we started to set up resources in AWS. First of all, we needed to set up the VPC that would be used to provision any type of resource in the future. Below is the finalized VPC setup:
There are different tools available to set up Kubernetes in the cloud. We used kops, which provisions all the AWS resources needed to run a Kubernetes cluster. Once the Kubernetes cluster was running, we deployed our pods and services and linked them to AWS load balancers.
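To illustrate the kops workflow (the cluster name, state bucket, zones, and instance sizes below are placeholders, not our production values), a cluster bootstrap looks roughly like this:

```shell
# S3 bucket where kops keeps the cluster state (placeholder name)
export KOPS_STATE_STORE=s3://example-kops-state

# Generate the cluster configuration (VPC wiring, subnets, auto scaling groups)
kops create cluster \
  --name=k8s.example.com \
  --zones=eu-west-1a,eu-west-1b \
  --node-count=3 \
  --node-size=t3.medium \
  --master-size=t3.medium

# Provision the AWS resources and bring the cluster up
kops update cluster --name=k8s.example.com --yes
```

The `--yes` flag is what actually applies the changes; without it, kops only previews what it would create.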
To run an application on Kubernetes, we first set up an ALB with multiple target groups. Each target group forwards traffic to all nodes in the Kubernetes cluster on a specific Kubernetes service port; the Kubernetes service then load balances that traffic across all of the service's running pods.
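The Kubernetes side of that wiring is a NodePort service. As a sketch (the service name, label selector, and port numbers here are hypothetical), the ALB target group would point at the `nodePort` on every cluster node:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: notifications
spec:
  type: NodePort            # exposes the service on every node in the cluster
  selector:
    app: notifications      # matches the pods of the notifications deployment
  ports:
    - port: 80              # service port inside the cluster
      targetPort: 8080      # port the application container listens on
      nodePort: 30080       # node port the ALB target group sends traffic to
```

With this in place, the ALB health checks and routes to each node on port 30080, and kube-proxy distributes the traffic to whichever pods are currently running.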
Conclusion
Our Notifications service has now been live for quite some time. A single team maintains it, and we have had no major incidents; we have set up alerts and reports to monitor its performance.
For this service, we are using a Kubernetes cluster on EC2, along with RDS (PostgreSQL), ECR, EBS, Lambda, S3, SQS, KMS, and CloudWatch.
A monolithic application is useful when you are building a new product: with limited resources, fewer releases, and less traffic, having everything in one app makes sense. However, as your application grows and you gain more resources, higher traffic, and frequent deployments, moving to a microservices architecture really helps.
But microservices architecture is not a silver bullet that fixes all your problems. It poses new kinds of challenges that you need to address. This is a big topic in itself and deserves a separate article, but let me share some of them below:
- Performance monitoring
- Troubleshooting
- Inter-service communication
- Authorization chaining
- Change management
- Higher resource requirements