Yeahh.

Why?

I am part of a student chapter, and last year we were conducting a coding event. I thought: every other chapter and club is doing the same thing, so why don’t we do something different? I suggested to the team that we conduct the event on a platform we created ourselves.

And the first question I got was, “Who is going to do that?”

I thought I could do it, so I sat down and started working on it.

There’s a famous law called Parkinson’s Law which states:

“Work expands so as to fill the time available for its completion.”

While the idea to create our own coding platform was good, it came too late. I had, I think, two and a half weeks till the event.

But I decided to just give it a go. I started working on it, probably 4 - 5 hours every day. I would wake up at 5 in the morning and work till 7AM, then go to college and work on it in the evening from 7PM to 9-10PM.

And yes, I completed it. Our chapter already had a website, so I integrated this into our own website and it was ready to go. The problem I didn’t see coming was scaling. Since we were short on funds at the time, we resorted to using free hosting services to host the server. I knew this MIGHT be a problem, so I urged my team to test it multiple times.

And we did. We tested it with everyone in the core team and a few of my friends, ~20 people. Everything was working fine. Everyone was happy!

I then told my team to get help from any faculty member and have an entire class test it. That’s where we hit the wall. The server was not able to handle the load. The first 35 people were able to access the platform, but after that, slowly everyone started seeing blank screens, verryyyy long loading times, and some even got a 502.

The Problems

While the free hosting service was one of the reasons, I think the main reason was the way I designed the platform. I didn’t know about a lot of things.

Problem #1:

I wrote a pure REST API: you send a request to run a piece of code, the server runs it, evaluates it against the test cases, and sends back the result. The request blocks the whole time, a bit like long polling: the connection stays open while the server waits for the code to finish running, and only then does it send back the result. Bad.

Long polling illustration

Problem #2:

One for all, all for one. A single server was doing everything. And it was running on a free hosting service. So when the server was busy running a piece of code for one person, all the other people had to wait. Bad.

Problem #3:

No caching. Unlike the first two problems, I actually knew about this. But due to the time constraint, I didn’t implement it. Bad.


So we had to resort to using a different platform for the event. Bummer, I know.

The Solution

I have started working on the platform again, but this time the entire architecture is different. And no, I’m still going to use a free hosting service, but I have some hacks for that too ;)

Current Architecture

  • Distributed system: I have a number of servers just to run and evaluate the code, and separate servers for the other functions.
  • EDA (event-driven architecture): A user submitting code triggers an event, and one of the servers picks it up. Completion of the run triggers another event, and another server picks it up. This goes on until the final result is sent back to the user.
  • Caching: A lot of caching. I have implemented caching at every possible place. The code, the test cases, the results, the user data, everything is cached.
  • Load Balancer: I have implemented a load balancer. Okay, I wouldn’t call it an actual load balancer, because it is the bogo sort of load balancers.
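
The event-driven flow in the list above can be sketched in a few lines of Python. This is only a toy under loud assumptions: in-process queues and threads stand in for the separate servers, "running" the code is faked by upper-casing a string, and the names (`submitted`, `run_finished`, `runner`, `evaluator`) are invented for the example rather than taken from the real platform.

```python
import queue
import threading

submitted = queue.Queue()     # events: a user submitted code
run_finished = queue.Queue()  # events: a run completed
results = queue.Queue()       # final verdicts on their way back to users

STOP = object()  # sentinel that tells a worker to shut down


def runner() -> None:
    """Picks up 'submitted' events, runs the code, emits 'run finished'."""
    while (event := submitted.get()) is not STOP:
        output = event["code"].upper()  # placeholder for really running the code
        run_finished.put({"user": event["user"], "output": output})
    run_finished.put(STOP)  # pass the shutdown signal down the pipeline


def evaluator(expected: str) -> None:
    """Picks up 'run finished' events and emits a verdict per user."""
    while (event := run_finished.get()) is not STOP:
        verdict = "pass" if event["output"] == expected else "fail"
        results.put({"user": event["user"], "verdict": verdict})


workers = [
    threading.Thread(target=runner),
    threading.Thread(target=evaluator, args=("HELLO",)),
]
for worker in workers:
    worker.start()

# Submitting code just drops an event on the first queue and moves on.
submitted.put({"user": "alice", "code": "hello"})
submitted.put({"user": "bob", "code": "bye"})
submitted.put(STOP)
for worker in workers:
    worker.join()

verdicts = {}
while not results.empty():
    item = results.get()
    verdicts[item["user"]] = item["verdict"]
print(verdicts)  # {'alice': 'pass', 'bob': 'fail'}
```

Neither stage calls the other directly; each one only reacts to events on its input queue, which is what would let the stages live on different servers and be scaled separately.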

The site is still in development, here’s the Link.