PART 2: Close the Gates! Avoiding Bad Web Traffic at Scale.
DevOps Engineer Eleanor writes about the measures her team puts in place to fight bad web traffic. If you missed the first part of this article, catch up here.
Tools for Monitoring and Identifying Requests.
Web Server Logging
Like most web applications, ours produces logs for each request it processes, both on the front-end web servers and in the application itself. All of our logs are shipped to Logstash and stored in Elasticsearch. That makes it easy for us to inspect them manually when we want to learn more about a request (using Kibana to query Elasticsearch) and to monitor them automatically when we know what we’re looking for.
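As an illustration of the kind of question these logs can answer, here is a minimal sketch that builds an Elasticsearch search body counting recent requests per client IP. The field name client_ip, the function name and the default values are assumptions made for this example; only @timestamp is the field Logstash adds by default. It is not a query taken from our own setup.

```python
import json


def top_talkers_query(minutes=15, top_n=10, ip_field="client_ip"):
    """Build a search body that counts recent requests per client IP.

    `ip_field` is a hypothetical field name; use whatever your log
    pipeline actually stores the originating address under.
    """
    return {
        # We only want the aggregation, not the matching log documents.
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    # Restrict the search to the last `minutes` minutes.
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}}
                ]
            }
        },
        "aggs": {
            # Bucket requests by client IP and keep the busiest `top_n`.
            "requests_by_ip": {"terms": {"field": ip_field, "size": top_n}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(top_talkers_query(), indent=2))
```

The resulting body can be pasted into Kibana's Dev Tools console or sent with any Elasticsearch client.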
Monitoring Complete Traffic
Web server logs are rich and informative, but sometimes we need to know more about the content of a request than we usually log in order to understand more about it. For example, we might want to read all the headers of a request to identify some clues or look at the request payload. In such cases, it is useful to look at the complete request and response. We use GoReplay, a great tool for web traffic capture which is very light on resources and easy to operate. The output is a complete dump of all traffic, which we can use to analyse requests and sessions and replay them to understand their behaviour over time.
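To give a feel for what inspecting a complete request involves, the sketch below parses a raw HTTP/1.x request into its parts and lists headers that a real browser would normally send but that are missing. It makes no assumption about GoReplay's own file format: the input is just the request text, and the function names and the list of expected headers are invented for this example.

```python
def parse_raw_request(raw):
    """Split a raw HTTP/1.x request into request line, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise to lower case.
        # A repeated header simply overwrites the earlier value here.
        headers[name.strip().lower()] = value.strip()
    return {
        "method": method,
        "path": path,
        "version": version,
        "headers": headers,
        "body": body,
    }


def missing_browser_headers(headers, expected=("user-agent", "accept", "accept-language")):
    """Return the expected header names that the request did not send."""
    return [name for name in expected if name not in headers]


if __name__ == "__main__":
    raw = (
        "POST /login HTTP/1.1\r\n"
        "Host: example.com\r\n"
        "User-Agent: curl/8.0\r\n"
        "\r\n"
        "user=a&pass=b"
    )
    request = parse_raw_request(raw)
    print(request["method"], request["path"])
    print(missing_browser_headers(request["headers"]))
```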
Filtering and Blocking Unwanted Traffic.
Now that we have the information we need to identify the traffic we want to avoid, we can use it to build tools that identify and block bad traffic automatically. We use a combination of tools that counter bad traffic at different stages of the request lifetime.
We serve our application through AWS and its CloudFront content delivery network (CDN). CloudFront integrates with AWS WAF (Web Application Firewall), which allows us to write rules for blocking requests at the CDN edges. Requests can be filtered by their originating IP or by their content, and the advantage of blocking requests in this way is that they are stopped very early and never reach our dedicated infrastructure. WAF allows us to painlessly discard many requests without having to do anything special at our end, but it’s limited to simple rules that can only be updated slowly and that only look at individual requests.
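The kind of stateless, per-request decision described here can be sketched in a few lines. This is not how WAF rules are written (those are configured in AWS); it only illustrates the logic of matching a single request by origin or by content. The blocked network is a reserved documentation range and the path fragments are made-up examples.

```python
import ipaddress

# Hypothetical rule data, for illustration only.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_PATH_FRAGMENTS = ("/wp-admin", "/.env")


def should_block(client_ip, path):
    """Decide from a single request alone whether it should be rejected."""
    address = ipaddress.ip_address(client_ip)
    # Rule 1: the request originates from a blocked network.
    if any(address in network for network in BLOCKED_NETWORKS):
        return True
    # Rule 2: the request content matches a known-bad pattern.
    return any(fragment in path for fragment in BLOCKED_PATH_FRAGMENTS)


if __name__ == "__main__":
    print(should_block("203.0.113.7", "/"))
    print(should_block("198.51.100.1", "/wp-admin/setup.php"))
    print(should_block("198.51.100.1", "/products"))
```

Because each decision uses nothing but the request in hand, a rule like this cannot notice that an otherwise ordinary request is the thousandth from the same client in a minute.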
NginX request rate limiting
Some requests pass through the CDN to our front-end NginX web servers. At that point, they still look like legitimate traffic, but as they start arriving in large volume and in rapid succession, we can tell that they should be blocked. NginX offers a simple and very efficient module for limiting the request rate for a single IP. Each request adds or updates an entry in a table of originating IP addresses and the rate at which requests from that origin are made. When the rate for an origin crosses a threshold (one that is far more than a well-behaved user needs, but that a script can easily reach), the server starts slowing down responses and eventually blocks them. This is a great tool for dealing with high-volume attacks from single IPs, such as many denial-of-service attacks and very aggressive scanners and scrapers.
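The table-and-threshold behaviour described above can be sketched as a small leaky bucket per IP. This is a simplified illustration of the idea and not NginX's actual implementation; the class name, the three outcomes and the parameter values are assumptions made for the example.

```python
class RateLimiter:
    """Per-IP leaky bucket: allow, then delay, then reject."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec  # requests drained from the bucket per second
        self.burst = burst        # extra requests tolerated before rejecting
        self.table = {}           # ip -> (bucket level, time of last request)

    def check(self, ip, now):
        level, last_seen = self.table.get(ip, (0.0, now))
        # Drain the bucket for the time elapsed since the last request.
        level = max(0.0, level - (now - last_seen) * self.rate)
        if level + 1 > self.burst + 1:
            # Bucket is full: reject without adding the request to it.
            self.table[ip] = (level, now)
            return "reject"
        level += 1
        self.table[ip] = (level, now)
        # The first request in an empty bucket passes immediately;
        # anything queued behind it would be slowed down.
        return "allow" if level <= 1 else "delay"


if __name__ == "__main__":
    limiter = RateLimiter(rate_per_sec=1, burst=2)
    for _ in range(4):
        print(limiter.check("198.51.100.1", now=0.0))
    print(limiter.check("198.51.100.1", now=3.0))
```

With a rate of one request per second and a burst of two, four simultaneous requests from one address are allowed, delayed, delayed and rejected in turn, and the same address is allowed again once the bucket has drained.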
Processing requests with NginX and Lua
Finally, some of the processing we need to do in order to filter more complex and sophisticated attacks isn’t easily served by any off-the-shelf tool, since it requires custom logic that looks at specific patterns of behaviour that are typical for our application.
We want to track such behaviour using hand-crafted logic that corresponds to the behaviour of our application, but we also need to do so efficiently and as early as possible. We therefore need a tool that can process requests and run code on the front-end servers, before the requests ever have a chance to hit the application itself.
The OpenResty project provides an NginX extension module that augments NginX’s rich built-in functionality with the ability to run custom code written in the Lua language. Lua, though not as popular or well-supported as other dynamic languages, excels as an extension language and compiles just-in-time to very efficient code. The module allows us to hook our custom code into the request’s lifecycle, process incoming requests, maintain in-memory tables for tracking behaviour over time, and modify the response accordingly. All of this comes with very low overhead, which barely affects our servers’ capacity to process requests even under very high load. When an unwanted request is identified, it can be blocked there and then, without touching our application servers or interfering with our ability to serve requests from legitimate clients.
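Our production logic is written in Lua and runs inside NginX, but the idea of an in-memory table that tracks behaviour over time can be illustrated with a short Python sketch. The pattern tracked here (too many events of one kind, such as failed logins, from one client within a time window) is a hypothetical example, as are the class name and thresholds.

```python
from collections import defaultdict, deque


class BehaviourTracker:
    """Flag clients that produce too many events within a sliding window."""

    def __init__(self, window_seconds, max_events):
        self.window = window_seconds
        self.max_events = max_events
        # client id -> timestamps of that client's recent events
        self.events = defaultdict(deque)

    def record(self, client, now):
        """Record one event and return True if the client should be blocked."""
        timestamps = self.events[client]
        timestamps.append(now)
        # Forget events that have fallen out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        return len(timestamps) > self.max_events


if __name__ == "__main__":
    tracker = BehaviourTracker(window_seconds=60, max_events=3)
    for t in (0, 10, 20, 30, 100):
        print(t, tracker.record("198.51.100.1", now=t))
```

Unlike a plain rate limit, the events fed into such a tracker can be chosen by application-specific rules, which is what makes custom code at the front-end servers worthwhile.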
Monitoring firewall activity.
Now that we have the machinery for identifying and avoiding unwanted requests, we need to continually monitor its behaviour. Patterns of usage, both legitimate and unwanted, constantly change, and we want to keep on top of them, making sure that we are still handling all requests correctly, and that we re-adjust and refine our tools to respond better to attacks. Just like with all activity, we log all events, inspect them manually and report on changes automatically.
The activity of our custom NginX code is logged in real time and submitted, via Logstash, to Elasticsearch, where it is available for us to query using Kibana. Key metrics based on these logs, as well as the logs of our CloudFront distributions, are processed and sent to Datadog for monitoring and alerting. When a change is detected, like a sudden increase in the rate of requests being blocked, we are notified. In most cases, all that is left for us to do is sit back and watch as our defences correctly handle the situation. From time to time we identify new behaviour that calls for analysis and readjustment, which is also an opportunity to make our tools more accurate and efficient. Rest assured, though, the forces of the dark side do not stand still. Sooner or later we will face a new challenge, which will be another opportunity for us to develop new methods for identifying attacks and protecting ourselves. When this happens, we'll be sure to let you know!
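For readers wondering what "a sudden increase in the rate of requests being blocked" might look like as a rule, here is a minimal sketch that compares the latest count with a recent baseline. It is not the logic of our monitoring service; the function name, the multiplier and the minimum count are assumptions chosen for illustration.

```python
from statistics import mean


def is_spike(history, current, factor=3.0, min_count=20):
    """Return True if `current` is well above the recent average.

    `history` holds blocked-request counts for previous intervals.
    `min_count` avoids alerting on tiny absolute numbers.
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(history)
    return current >= min_count and current > factor * baseline


if __name__ == "__main__":
    recent = [10, 12, 11, 9]
    print(is_spike(recent, 50))
    print(is_spike(recent, 25))
```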