Striking lightning on a dark sky, by Michał Mancewicz

A few days ago we received an email from our hosting provider (SiteGround) letting us know that our site had already “90% of the allowed monthly usage of CPU” and that, once we exceeded 100%, the “webservice would be limited” and we migh have “trouble accessing it.” Quite frightening, right? It definitely was a totally undesirable situation that we had to fix as soon as possible. But… where to start?

Statistics of our website during the episode of high CPU usage
Statistics of our website during the episode of high CPU usage.

Today we wanted to share with you our experience facing a fairly common problem in websites, explaining what we did to identify the culprit and how we solved the issue. This way, if you face a similar problem, you’ll have some ideas on how to get started…

Nelio Unlocker

Switch to WordPress safely while keeping your design and content

Improve your SEO today and boost your site speed by converting your pages into HTML, CSS, and WordPress standards. With zero technical knowledge required, you’ll only pay for what you need.

Reasons why you can have high CPU usage

WordPress is a content management system written in PHP. This means that the content it serves is generated dynamically by a set of PHP scripts: every time a visitor arrives at your website, WordPress processes the request (which is something like “please send me your home page”) and generates a response (in this case, it sends the home page). Clearly, responding to a request implies a certain use of server resources: one has to look at the request itself, determine what the visitor wants to access, fetch it from the database, generate the HTML response, and so on.

One of the reasons why a cache system accelerates the loading time of your website should be quite obvious now: it basically saves this processing time. When a specific request arrives for the first time (“send me your home page”), WordPress kicks off and generates the response. If a cache is in place, it stores said response before it’s actually sent to the visitor. This way, future requests to that same resource (in our example, a home page) no longer require WordPress to process anything; the cache can send back the copy it saved before, thus saving time and resources.

With this performance in mind, it is not difficult to imagine what are the reasons why we can see high CPU usage on our server:

  • You get too many requests. If a lot of users come to your website at the sime time, or you receive many illegitimate requests (someone’s probably attacking your server), WordPress will have to process all those requests and, therefore, the use of server resources will increase.
  • Requests are slow to resolve. If you have a lot of plugins installed or some of your plugins is inefficient for whatever reason, all the requests you get will take longer than needed, because WordPress will run a lot of inefficient code.

So it looks like a cache is a good guard against these issues, right? And indeed it is. However, please keep in mind the cache doesn’t “fix” the issue; it simply “hides” it. And this is important to remember because there are WordPress functionalities that cannot be cached and, therefore, will always require WordPress’ running:

  • Tasks scheduled using WP-Cron. WP-Cron is a WordPress mechanism for scheduling tasks that will run in the future. For example, WordPress uses it to publish scheduled posts.
  • WordPress’ REST API. The REST API is an interface that one can use from third-party applications to interact with a WordPress site by sending and receiving JSON objects. Some REST API requests might be cached (namely, GET requests), but others aren’t (POST and PUT). Therefore, it’s something that you can’t generally cache…
  • AJAX Requests in WordPress. Before we had the REST API in WordPress, we had to use its AJAX API to create dynamic websites. This API is quite similar to the REST API, because we can also use it to send to and receive information from the server. It’s a different system, but it’s subject to the same limitations the REST API is.

Analyzing the problem

First we have to identify why CPU usage has increased on our website. Has the number of requests to our website increased? Is it now slower to serve individual requests? To answer these questions we have a very useful tool on our server: the access log.

The access log is a text file in which the server logs each and every request it receives along with useful information about them. Specifically, the access log tells us when a request was received (date and time), who made it (an IP), what resource it requested (a URL), whether the request was successful, etc. Here is an example from our server:

66.249.83.82 - - [22/Apr/2020:14:04:59 +0200] "GET /es/blog/imagenes-gratuitas-para-tu-blog/ HTTP/1.0" 200 22325 "-" "Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) Chrome/38.0.1025.166 Mobile Safari/535.19"
66.249.83.84 - - [22/Apr/2020:14:05:02 +0200] "GET /es/wp-content/uploads/sites/3/2018/07/aziz-acharki-549137-unsplash-540x350.jpg HTTP/1.0" 200 10566 "https://neliosoftware.com/es/blog/imagenes-gratuitas-para-tu-blog/" "Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) Chrome/38.0.1025.166 Mobile Safari/535.19"
66.249.83.82 - - [22/Apr/2020:14:05:02 +0200] "GET /es/wp-content/uploads/sites/3/2018/07/Screenshot-of-Unsplah-website-768x520.png HTTP/1.0" 200 399577 "https://neliosoftware.com/es/blog/imagenes-gratuitas-para-tu-blog/" "Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) Chrome/38.0.1025.166 Mobile Safari/535.19"
...
188.79.17.218 - - [22/Apr/2020:14:06:14 +0200] "GET /es/blog/problemas-mas-comunes-de-wordpress/ HTTP/1.0" 200 110741 "https://www.google.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.2 Safari/605.1.15"
188.79.17.218 - - [22/Apr/2020:14:06:16 +0200] "GET /es/wp-content/plugins/nelio-ab-testing/assets/dist/js/alternative-loader.js?version=52b0ff65c68ab39896d47d6ff673fd59 HTTP/1.0" 200 2763 "https://neliosoftware.com/es/blog/problemas-mas-comunes-de-wordpress/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.2 Safari/605.1.15"
188.79.17.218 - - [22/Apr/2020:14:06:16 +0200] "GET /es/wp-includes/css/dist/block-library/style.min.css?ver=5.4 HTTP/1.0" 200 7627 "https://neliosoftware.com/es/blog/problemas-mas-comunes-de-wordpress/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.2 Safari/605.1.15"
...

Let’s take a closer look at it:

  • 66.249.83.82. This is the IP of the device that made the request.
  • 22/Apr/2020:14:04:59 +0200. This is the exact date and time of the request.
  • GET /es/blog/imagenes-gratuitas-para-tu-blog/ HTTP/1.0. Then we see the visitor requested (GET) a post from our blog (in Spanish).
  • Mozilla/5.0 (Linux; Android ... This is the browser’s User-Agent and gives us some information on the device and operating system that made the request.

Notice how our server received multiple requests from the same IP (66.249.83.82) after the first one. This might look like an attack, but it actually isn’t: web pages usually include several assets (images, scripts, styles) and it’s completely normal that a visitor accessing a certain web page in your site will perform multiple requests to retrieve them all.

Well, in our case we were able to verify that, indeed, we had an unusually high number of requests. Extremely high, actually. And we knew that because the log file was way bigger than usual.

A possible explanation would be that there was a peak of visits for some reason… but according to Google Analytics, that wasn’t the case. So something different was happening.

A more detailed analysis of the access log allowed us to identify the following fact: more than 15% of all the requests we were receiving came from the same IP. And (drum roll) that IP was our own web server!

Gif showing a man having a great idea

Identifying the culprit

At this point we finally knew that it was our own server the one making so many requests that it generated a peak in its CPU usage. But why? Why was this happening? Who was generating those requests? These are tougher questions to answer.

First we looked at our logs again, filtering by IP and trying to identify a pattern that could shed some light on the issue at hand:

35.214.244.124 - - [22/Apr/2020:14:06:08 +0200] "GET /es HTTP/1.0" 301 - "https://neliosoftware.com/es" "php-requests/1.7-3470169"
35.214.244.124 - - [22/Apr/2020:14:06:08 +0200] "GET /es?..." "php-requests/1.7-3470169"
35.214.244.124 - - [22/Apr/2020:14:06:18 +0200] "GET /es?..." "php-requests/1.7-3470169"
35.214.244.124 - - [22/Apr/2020:14:07:21 +0200] "GET /es?..." "php-requests/1.7-3470169"
35.214.244.124 - - [22/Apr/2020:14:07:24 +0200] "GET /es?..." "php-requests/1.7-3470169"
...

and we found one: the User-Agent of all these abnormal requests was php-requests/1.7-3470169. Interesting!

WordPress has several functions with which to trigger requests: wp_remote_request. If you take a look at the source code of these functions, you will see that they basically wrap a few methods from a class called WP_Http. This class is interesting because, by default, all the requests it makes set the User-Agent to “WordPress/version”, so it was possible that this had something to do with our problem. But it didn’t… yet.

If we keep inspecting WordPress’ source code, we see that WP_Http uses another WordPress class internally to actually make the requests: Requests. And boy is this class interesting: at the very beginning of its definition we see it defines a constant named VERSION whose value is 1.7-3470169. And a little bit later, it uses this constant to build the User-Agent we found in our logs : php-requests/1.7-3470169.

Brilliant! We have now confirmed that all these weird requests we get come from our WordPress site. This probably means the culprit is a plugin… but which one?

The idea we had to figure this out was quite simple: if we modify the User-Agent so that it includes the name of the plugin that uses the Requests class, we’ll see the name of the plugin in our server’s access log. And this is actually quite easy to achieve. All we did was edit the Requestsget_default_options function with the following snippet:

$trace = debug_backtrace();
$files = [];
foreach ( $trace as $log ) {
  if ( false !== strpos( $log['file'], '/wp-content/plugins/' ) ) {
    $files[] = $log['file'];
  }
}
if ( empty( $files ) ) {
  $debug = 'no-plugin';
} else {
  $plugins = array_map( function( $x ) {
    return preg_replace(
      '/.*\/wp-content\/plugins\/([^\/]+)\/.*/',
      '$1',
      $x
    );
  }, $files );
  $plugins = array_unique( $plugins );
  $debug   = implode( ' ', $plugins );
}
$defaults['useragent'] .= " (NelioDebug {$debug})";

Let’s see what it does step by step:

  • First we get the execution stack with debug_backtrace. This is a PHP function that generates a backtrace, revealing all the functions that have been called to reach the current one.
  • For each element in the execution stack we have information such as the function that has been invoked, the file and line in which it is defined, the arguments it has been called with, etc. What we want to focus on, however, is the file where the function was defined: if it’s in wp-content/plugins, we know for sure it’s a function defined in a plugin.
  • Once we have processed all the elements in the stack, we simply need to get the names (if any) of all the plugins we’ve found and include them in the useragent variable.

After extending WordPress as described, we quickly started to see who the culprit was:

35.214.244.124 - - [23/Apr/2020:10:59:08 +0200] "GET /es HTTP/1.0" 301 - "https://neliosoftware.com/es" "php-requests/1.7-3470169 (NelioDebug culprit)"
35.214.244.124 - - [23/Apr/2020:10:59:20 +0200] "GET /?..." "php-requests/1.7-3470169 (NelioDebug culprit)"
35.214.244.124 - - [23/Apr/2020:10:59:36 +0200] "GET /?..." "php-requests/1.7-3470169 (NelioDebug culprit)"
35.214.244.124 - - [23/Apr/2020:11:00:01 +0200] "GET /es?..." "php-requests/1.7-3470169 (NelioDebug culprit)"
35.214.244.124 - - [23/Apr/2020:11:05:21 +0200] "GET /es?..." "php-requests/1.7-3470169 (NelioDebug culprit)"
...

Solving the problem

Once we knew the name of the offending plugin, we simply contacted its developer and asked for help. They replied back and told us how to overcome the issue while they worked on a proper solution. Luckily for us, things got better quickly:

Statistics from our website after correcting a problem with CPU usage
Statistics from our website after asking for help.

As you can see, one of the great things about free software is that we can explore the source code of the applications we use and adapt it to our needs. In this case, we were able to modify WordPress itself so that it could reveal some information we needed.

I hope you liked the trick of using debug_backtrace to find out who run a certain function. Sure, it’s not an orthodox method, but it’s quick to implement and, so far, it’s always proven to be extremely helpful.

Featured image by Michał Mancewicz on Unsplash.

6 responses to “Finding the Culprit of High CPU Usage in our WordPress Server”

  1. Thomas Avatar
    Thomas

    “All we did was edit the Requests‘ get_default_options function with the following snippet”…

    And where/how did you enter this codesnippet?

    1. David Aguilera Avatar

      I edited WordPress’ itself. In particular, the file I linked thereafter: wp-includes/class-requests.php.

      Usually, editing WordPress core files is not something I’d recommend… but in this particular instance it was quite helpful 🙂

      1. SH Avatar
        SH

        Hello – would you be able to advise where abouts we can add the snippet for the tracing? I understand in /wp-includes/class-requests.php – but did you just paste it in after line 526, or somewhere else? Any help much appreciated!

        1. David Aguilera Avatar

          I think I added it right after the function signature (right after line 502).

          1. SH Avatar
            SH

            I modified the class-requests.php file with your snippet, but can’t seem to get it logging out the actual culprit info – maybe I am entering in the wrong please?

          2. SH Avatar
            SH

            Just realised that the entries in the access log don’t have php-requests/ in it, which is probably why! We have 2200 lines in the access log over a 5 minute period on a fairly low usage WordPress site… Doesn’t seem to have anything odd, just GET requests to different plugin folders etc. But during this time my CPU usage was hitting 100%, approx 35% on mySQL, and 4 php-fpm processes for about 15% each on WordPress. Any tips on how I can find out where exactly the usage was from?

Leave a Reply

Your email address will not be published. Required fields are marked *

I have read and agree to the Nelio Software Privacy Policy

Your personal data will be located on SiteGround and will be treated by Nelio Software with the sole purpose of publishing this comment here. The legitimation is carried out through your express consent. Contact us to access, rectify, limit, or delete your data.