Hosting High Traffic LearnDash Sites
Update: Please note that this post is a summary of tests we conducted to assess the number of simultaneous learners a site can support, not total learners in a system. LearnDash can accommodate tens of thousands of users in the database, but most hosts can’t accommodate that number of learners being on the site at the same time. The tests in this article were to assess how many simultaneous learners the hosting plans could accommodate.
At Uncanny Owl, we’ve been building and managing LearnDash sites for several years. In that time, you might think that we’d have a hosting solution for every possible type of LearnDash site, but we simply don’t. We have LearnDash sites on Flywheel, WP Engine, Cloudways, Kinsta, even DigitalOcean and Vultr instances. We set each site up on a platform that’s a suitable fit for the site and volume of learners, and generally we target accommodating 25 to 50 simultaneous learners. That might sound like very few, but we find it’s enough even for sites with several thousand learners. It’s just very rare for a large group of learners to visit the site and complete learning activities at the same time. As such, we target the site’s actual load to make sure our clients aren’t paying for more than the site needs.
Recently, a client asked us to develop a plan for ramping up to tens of thousands of learners quite quickly. It’s been a long time since we last did load testing on LearnDash, and we’ve never done it with higher-end tiers on managed providers. We decided to try out the client’s existing LearnDash site that was reasonably heavy (University theme with Visual Composer, a few dozen plugins and various customizations) to see how it would hold up. The site wasn’t one we created, but it’s representative of what a lot of people create with LearnDash.
We loaded the site up on 3 different hosts: a Professional plan at WP Engine ($99/month), a Business plan at Kinsta ($287/month) and an 8-core/16GB Vultr instance at Cloudways ($264/month). We knew the load would really push the cores to their breaking point, so we had high hopes for the 8-core Vultr VM. We created a complex script for Blazemeter that had 1,000 total users sign in to WordPress and navigate and complete close to 60 LearnDash pages. Each page had a lot of shortcodes and a number of images or embedded videos. We even had users complete a quiz. Clicks were mostly timed about 30 seconds apart, which was on the aggressive side for LearnDash use, but we wanted to see what it took to break the servers. Activity was spread out over an hour, with the number of users gradually ramping up. Because the activities didn’t take an hour to complete, we planned to see some virtual users roll off the test as others started (so we would never see 1,000 simultaneous learners, but we did want to see a few hundred).
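As a rough sanity check on a test plan like this, steady-state concurrency can be estimated with Little's law (concurrent users ≈ arrival rate × time each user spends on the site). The sketch below is only a ballpark: it assumes exactly 60 pages, a flat 30 seconds between clicks, and users starting evenly across the hour. The real script's timings varied, and slow responses stretch sessions, so actual concurrency under load can be higher.

```python
def estimate_concurrency(total_users, ramp_minutes, pages, seconds_per_page):
    """Little's law: concurrency = arrival rate x session duration."""
    arrival_rate_per_min = total_users / ramp_minutes
    session_minutes = pages * seconds_per_page / 60
    # Concurrency can never exceed the total number of users in the test.
    return min(total_users, arrival_rate_per_min * session_minutes)


# Assumed figures, rounded from the test description (not measured values).
peak = estimate_concurrency(
    total_users=1000, ramp_minutes=60, pages=60, seconds_per_page=30
)
print(round(peak))  # about 500 under these assumptions
```

Under these assumptions a session lasts about 30 minutes, so the estimate only applies once the test has been running at least that long.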
Cloudways Results
It’s hard to tell from the chart, but Cloudways was great until about 75 users. Then page load times climbed to 5 seconds and kept climbing. The test ended up peaking at a very high number of users because pages got too slow for virtual users to finish the script and roll off. By 100 users the response time was an unacceptable 10 seconds, and by 400 users, the error rate shot up. Because Cloudways kept getting worse and falling further behind, average page load time ended up being 67 seconds. The server was fine for 75 learners, but definitely no more. All cores were maxed out after that point. One thing to note: this server was held back a bit by not using a CDN. Both other sites benefitted from a CDN.
Kinsta Results
The Kinsta graph looks a lot better, right? The response time looks stable, until you notice that it’s stable at about 25 seconds. That’s far too long for users to wait. Response time on Kinsta is good until about 100 users. After that, it’s just too slow, and at 300 users we start seeing errors on top of that. On a side note, this was our first time using Kinsta and the developer tools just weren’t as useful or efficient as on WP Engine. It’s not reflected in the numbers, but it’s definitely a consideration when we’re choosing where to host a site.
WP Engine Results
We ended this test early because we had the information we needed after 20 minutes. WP Engine failed early too, right at about 85 users. What was different on WP Engine was that load times didn’t increase; when response times got high, WP Engine just threw (a lot of) 502 errors. A poor user experience, to be sure, but still better than Cloudways serving pages in 60+ seconds; at least users saw something. And the key takeaway is that the site was OK up to 85 simultaneous learners, which surprised us, given that WP Engine was the only shared environment and it’s roughly one third the price of the others.
None of the results really impressed us; we hoped that at least one of the servers could have accommodated over 100 simultaneous learners. For now, at least, we put that particular client site on WP Engine. Performance is good enough for the current number of learners, it’s the lowest cost of the 3, we have an upgrade path there for additional learners, and the site tools we need are the easiest to work with. We just wish we didn’t have to always deal with caching and session/cookie issues on complex LearnDash sites hosted on WP Engine.
If you’re managing a LearnDash site that has a lot of simultaneous learners, we would love to hear how you’re hosting it in the comments!
One thing we pride ourselves on at Kinsta is our expert support. With a few easy changes (for example, running the site on HHVM, fine-tuning MySQL and so on) we could’ve doubled or tripled the concurrent user number you got with the default setup. We custom-tailor the server software settings for almost all of our clients to be able to max out the hardware resources in each of our plans. When you’re ready to move hosting again, please contact us. Thanks!
Thanks Mark! HHVM wouldn’t have worked for this site, and I did speak to Sean and Support a few times to discuss what we were doing; tuning the server wasn’t offered. We’ll take another look after you make the move to Google and complete some more work on the client tools. I shared all of my feedback with Sean; I hope it’s helpful.
This is really interesting. I have a high traffic LD site and am having lots of problems. We really need to get the site stabilised so that we can grow comfortably. I’m starting to question whether LD can handle the kind of traffic we are getting at once. We work with schools so it is not unusual to have 100+ pupils logged on at once. Because they use the site as classes, we have many users on at the same time. Compounding the issue even further, they will all be doing LD quizzes and lessons at the same time. Yesterday, using a 10 GB VM with 5 cores, 30 pupils overloaded the CPU limits of the hardware. This is a small number of users and one that I did not expect to cause these issues.
Do you think that LD has an absolute top end, regardless of the hardware?
Would love to hear your thoughts!
It’s not so much whether LearnDash itself can handle the load; it’s really a function of what’s on the WordPress site. On heavier sites with a big theme and lots of plugins, we can see 200+ SQL queries to generate each page, plus there might be 50+ assets being loaded, including media. It’s a lot, and caching options are limited! If there’s a significant amount of LearnDash content (e.g. thousands of topics) we’ve also seen related slowdowns; we’ve had to make changes to the LearnDash files/queries in some cases to help here. As a starting point, first we’re certainly going to try to keep things lean on the theme, plugin and customization side. It’s often not possible in a big way, but we’re at least mindful of it. Then we look at infrastructure. Certainly we’re going to be using caching and a CDN to help. But it is often a case of throwing hardware at it… I don’t know that there’s a top end, as hardware can always be scaled up, but it does quickly get more difficult and expensive.
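To put those per-page numbers in perspective, here is a back-of-the-envelope calculation. It assumes each learner clicks once every 30 seconds (the pace used in the load test in the article) and that no page is served from cache. Both are simplifying assumptions, not measurements from a real site.

```python
def database_queries_per_second(concurrent_learners, seconds_between_clicks,
                                queries_per_page):
    """Rough database load if every page view runs all of its queries."""
    page_views_per_second = concurrent_learners / seconds_between_clicks
    return page_views_per_second * queries_per_page


# Assumed inputs: 100 learners, one click every 30 s, 200 queries per page.
qps = database_queries_per_second(100, 30, 200)
print(f"{qps:.0f} queries/second")  # roughly 667 under these assumptions
```

Even with modest concurrency, a heavy page multiplies quickly into hundreds of queries per second, which is why trimming the theme and plugins comes before scaling hardware.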
Nice to see you put our service (Cloudways) to the test. We are confident we can score a bit higher if we optimize your sites correctly. Probably using Redis cache, CDN, and MariaDB will improve the performance of your sites.
Thanks for your comment! It’s not listed above, but I was actually using MariaDB and Memcache for the testing. No CDN and no Redis, however.
This is a nice post. It’s obvious you put some time in experimenting with different setups.
I should point out that LearnDash currently is being run on sites with tens of thousands of users without issue. There are generally two components that will dictate performance (of any site really, but also applies to LearnDash):
1. Server resources
2. Amount of course content
We always advise that if customers plan on having thousands of users on the site at any point in time with a lot of course content, they choose a scalable hosting plan that can accommodate these requirements.
A hosting plan at $10-25 per month probably isn’t going to be sufficient.
For item #2 we do have some influence on our end.
In fact, our next point release has updates specifically around speed & performance. We’ve applied this fix on a few beta sites and the results have been positive. This update is right around the corner and will benefit learning programs of all sizes.
For anyone who has additional questions (or even concerns) around their learning program size and expected performance, do not hesitate to write us at http://www.learndash.com/contact. We would be more than happy to discuss.
Thanks for your comment!
You raise a great point, and one I need to emphasize more: our test was to assess simultaneous learners completing activities on a LearnDash site, not total users in the system. LearnDash platforms can absolutely accommodate tens of thousands of users in WordPress, but without a lot of work and a lot of servers, they can’t accommodate them all at the same time. This test did not consider total users in the system at all. We also didn’t change any LearnDash queries, which we sometimes do on larger sites to improve performance (our test site only had about 50 LearnDash pages, so the benefit here would have been negligible). Our intent was to take a somewhat typical LearnDash site and see how some WordPress hosts held up when we threw a lot of learners at it at the same time.
A very interesting article – many thanks.
We have been using LearnDash, though as our users now number over 10,000, we too are increasingly experiencing issues when multiple users log in at the same time. Our site is specifically geared towards schools, so typically classes of approx. 30 users are logging in at any one time.
Having increased our hosting resources considerably, we’ve found that this has had only a marginal impact. The main issue is the quizzing element: if users are midway through a quiz, a spike in traffic can knock them out of the quiz, and as there is no ‘save’ button, they lose all of their answers. The result is a frustrating user experience.
For LearnDash to truly work as an e-learning system, I really do think this needs to be looked into further as only having a very limited capacity of users able to log in at one time will result in larger elearning organisations having to look elsewhere. Unfortunately this may be what we will have to do too.
Thanks for your comment and sharing your experience! We have found that throwing more cores at LearnDash in particular helps (or splitting the infrastructure across multiple servers), so it’s interesting that you’re not finding that to be a benefit. It would be interesting (perhaps outside of the comments) to hear what you started with and moved to.
We’ve found the same challenges with LearnDash quizzes and try not to use them for tests with a lot of questions. In fact, we built a custom assessment tool that passes answers to the server in real time (there’s 2-way communication throughout) to ensure nothing ever gets lost and so quizzes can be resumed easily. It may be something we eventually release, but right now we use it for very specific use cases only; it’s not a drop-in replacement for LearnDash quizzes.
Have you looked at similar scalability with Sensei? I know you guys did a great review over here https://www.uncannyowl.com/wordpress-lms-showdown-sensei-vs-learndash/ and I was wondering how it compares.
We have not done any load testing with Sensei. Sensei interest from prospects has deteriorated a lot for us over the last 2 years and it’s very rare for us to work with Sensei. As such, we don’t really do any development or testing with it now.
What about running on the cloud using Aurora and S3 from AWS? It looks like you did a cloud-based test above. Did that not help? Or were these all traditional servers? Were they shared or private servers?
Since this was published we’ve done additional testing in other environments, including a big Google Cloud instance, but we don’t set up and manage our own environments. Everything is through some kind of WordPress host. As such, we can’t compare with a custom auto-scaling environment that you might set up on AWS and it’s not something we would test.
We are experiencing the same problem.
Apparently it’s purely CPU load. However, we have only a few dozen concurrent accesses and 8 cores are loaded at nearly 100%.
We look forward to having at least ten times those concurrent accesses, and throwing in 80 cores would be absurd.
What are your numbers? How many concurrent accesses do you have and how many cores are you using?
Sounds like you’re on the right track with those numbers. 🙂 Optimize the site, minimize plugins, use PHP 7, find any way you can to minimize CPU load, and try a managed WordPress host with a fast stack. Recently we’ve seen the best load and scalability results on Pressidium. We won’t be updating the results in this post, though, because we tested using a different site, so a direct comparison isn’t meaningful.
I was having the same problems with our site crashing under load as the others in this post. Maxing out the cores (16 cores and 16GB RAM) and the memory on our hosting plan did not help. I took your advice and moved over to Pressidium and lo and behold we no longer have any problems with the site crashing. We were able to have 346 candidates log in and take an exam at one time with no problem, and a couple of days later we had 216 concurrent logins taking the same exam. So moving over to Pressidium seemed to do the trick. Thanks for that bit of information… it made all the difference.
That’s great news! Thanks so much for commenting and letting us know how your experience has been. We have a lot of data for sites that we’ve set up but we rarely get feedback about other LearnDash sites, so it’s very helpful to know that our experiences aren’t unique.
We upgraded to PHP 7 and started using Redis cache server. Result: in the same conditions where we used to have 100% CPU load, we now have 20%.
That’s great news! Thanks for following up to share the results.
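For readers wondering why an object cache such as Redis can cut CPU load so sharply, the idea can be sketched as a "cache-aside" lookup: expensive results are computed once and reused until they expire. The example below is illustrative only. It uses a plain in-memory dictionary as a stand-in for Redis, and the `ObjectCache` class, the cache key and the `load_course_outline` function are hypothetical names, not part of WordPress, LearnDash or any Redis client.

```python
import time


class ObjectCache:
    """Tiny in-memory stand-in for an object cache such as Redis."""

    def __init__(self, ttl_seconds=300):
        self.ttl_seconds = ttl_seconds
        self._store = {}  # key -> (expiry time, value)

    def get_or_compute(self, key, compute):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive work
        value = compute()  # cache miss: do the work once
        self._store[key] = (now + self.ttl_seconds, value)
        return value


calls = {"db": 0}  # counts how often the "database" is really hit


def load_course_outline():
    """Hypothetical expensive query, simulated with a counter."""
    calls["db"] += 1
    return ["Lesson 1", "Lesson 2", "Quiz"]


cache = ObjectCache()
for _ in range(100):  # 100 page views of the same course
    outline = cache.get_or_compute("course:42:outline", load_course_outline)

print(calls["db"])  # the expensive query ran once, not 100 times
```

Per-learner data such as quiz progress is much harder to cache than shared data like a course outline, which is one reason logged-in LMS traffic benefits less from caching than anonymous traffic.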
Thanks for sharing your experience. I also have not had success with just increasing CPU and RAM.
Which Pressidium plan are you on?
I’m on the Pressidium Professional plan because I’m going to move some of my other sites over to Pressidium as well. I believe you get the same architecture benefit at all of their levels. Again, I tried everything and this was the only platform that worked for me.
Are you still happy with Pressidium? How many simultaneous/total users is it able to handle on the professional plan – still the same number you mentioned previously or more? I’m wondering if you’ve tested with more users.
I’m very happy with Pressidium. We’ve not had any downtime associated with simultaneous user logins since we moved to their platform. For total capacity issues, I would have you consult with the folks at Pressidium on that. So yes, I’m extremely happy and I’m looking to increase the number of learners in the coming year and I trust the Pressidium platform to handle it. Also, note that I’m using Cloudflare as well.
I know this topic is a bit old, but are you still on Pressidium?
I’m developing a LearnDash platform, and am inclined to host with them after reading this post and comments.
We expect about 100-150 concurrent users at the beginning.
I’m thinking of starting with the Professional plan on Pressidium, and also using Cloudflare to reduce latency on static content (my audience is in Brazil).
Some newbie questions – are all WP plugins equal in terms of impact on response time for courses, bandwidth, etc? Does it make a difference if we add plugins individually or wrap them up when we select a WP theme that has some of the plugins included? Thanks.
Every plugin has a tremendously different impact on site performance. Some plugins might be a few lines of code, others can be tens of thousands. A theme with lots of plugins included is often much slower than a clean, basic theme with some additional light plugins installed. Assume the more something does, the more of a performance impact it may have.
Excellent responsiveness to my question – thanks and understood. Is there a way to test a theme’s server load prior to installing?
No, it’s not possible to test themes in that way prior to installing them. There are so many variables, including PHP version, caching, other environmental variables; it would be difficult to get realistic and representative data.
Understood and thanks for your help.
The user experience shared here is immensely useful. I totally forgot that I contributed to this thread back in Jan 2016. My, how things have changed since then!
We’re now at 35,000 users and are hitting 150 simultaneous logins. Sounds good, right? Not. The site is critically unstable because of it.
This week we chucked 18 cores and 60GB at the problem. The site doesn’t fall over but page timeouts are frequent and it can take 60+ seconds to get a page to load.
Our user base is growing at an exponential rate. We have no alternative but to build our own scalable stack from the ground up. That’s OK. We’ve worked hard and the business has reached the next level. However, we need to serve our current users with the site we have until we can move into our new home. We are going to try several of the suggestions here but we think we need about 600 concurrent users over the next 9 months. I wonder whether, if every single suggestion above is implemented, we can truly reach this level of load whilst powering it with LearnDash…?
I’m starting to think that we are going to need to split our site into regional, independent sites with subdomains to ensure that we can serve our customers. We are working through these problems as a matter of urgency so I’ll share our experience when we’ve worked through this and stabilised the situation.
Hi Aaron, we can talk more on our call tomorrow, but one really high volume site we’re currently supporting can regularly hit 200 simultaneous learners. Under heavy load we do see LearnDash pages hit 5 second load times in the front end for learners.
Until a few days ago, loading quiz pages and profile edit pages could 503 the site, but that was entirely because of the volume of LearnDash content (over 1,100 quizzes, 4,200 topics, etc.). This week we rebuilt LearnDash queries on the quiz listing page and took load time from 100 seconds (luckily it’s an admin only page) to 5 seconds and the admin profile page from 150 seconds to 3, without losing any features. 🙂 On this site it’s going to make a BIG difference. And on the user-facing side we still have capacity to scale up (it’s currently in a dedicated 5-cluster environment) as much as needed.
Pressidium has been very helpful in optimizing a hosting environment for LearnDash performance and doing some benchmarking for us. They’ve gone out of their way to help and between their optimizations and our query optimizations, we can support a lot of students.
What do I need to ask our hosting plan providers if we want our site to be able to support 100-200 simultaneous users? We have 50GB storage, and 100GB internet traffic, but the comments above talk about cores and RAM. I have asked our provider, and they have stated that we’re on a Shared Server so we do not have a dedicated CPU/RAM.
We have not gone live yet, and intend to do it in phases. Phase I will only be a total of 20 users.
If you’re in a shared environment, and it’s not a managed WordPress provider, there’s no way to answer this question. You would need to do load testing, but even then the results would depend on other sites and other environmental variables. There’s nothing the host could tell you either because they won’t really know how the site is used or what operations are going to be really heavy (without a detailed investigation that they won’t be able to do). I would say that what you describe can’t safely handle over 100 users simultaneously, but the only way to get any kind of measure of what it can handle is with load testing. And in a shared environment, your host won’t like you doing that either.
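For anyone who does want a first, very rough measurement, the shape of a load test can be sketched in a few lines. This is a minimal illustration, not the Blazemeter script used in the article: `simulate_learner`, `run_load_test` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in that simulates responses so the sketch runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def simulate_learner(fetch, pages, think_seconds):
    """One virtual learner: request each page in turn, pausing between clicks."""
    timings, errors = [], 0
    for page in pages:
        start = time.perf_counter()
        try:
            fetch(page)
        except Exception:
            errors += 1  # count failed requests instead of stopping
        timings.append(time.perf_counter() - start)
        time.sleep(think_seconds)
    return timings, errors


def run_load_test(fetch, pages, learners, think_seconds=30.0):
    """Run `learners` virtual learners at once and summarise the results."""
    with ThreadPoolExecutor(max_workers=learners) as pool:
        results = list(pool.map(
            lambda _: simulate_learner(fetch, pages, think_seconds),
            range(learners),
        ))
    all_timings = [t for timings, _ in results for t in timings]
    return {
        "requests": len(all_timings),
        "errors": sum(errors for _, errors in results),
        "avg_seconds": mean(all_timings),
        "max_seconds": max(all_timings),
    }


def fake_fetch(page):
    """Stand-in for a real HTTP request so the sketch runs offline."""
    time.sleep(0.01)
    if page == "/broken":
        raise RuntimeError("simulated 502")


summary = run_load_test(
    fake_fetch, ["/course", "/lesson-1", "/broken"], learners=5, think_seconds=0
)
print(summary)
```

To point this at a real site, `fake_fetch` would be replaced with a function that makes an HTTP request (for example with `urllib.request.urlopen`). A realistic LearnDash test also needs logged-in sessions and quiz submissions, which this sketch does not attempt, and it should only ever be run against a site and host you are permitted to load test.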
In the intro you mention regularly setting up LearnDash sites to handle up to 50 simultaneous users. What would you suggest for minimum hosting requirements at that level? I have a client wanting help with their LD site that has a couple dozen courses, 30ish plug-ins, and just moved to the lowest level SiteGround cloud hosting (2 CPU/4GB RAM). We’ve already had slowing and exceeding the quotas with roughly 40 simultaneous users. She thinks it’s poor quality hosting, but I’m thinking it’s just not enough. Any advice greatly appreciated.
It’s unfortunately just too difficult to say. It depends on the plugins you use, typical user activity, the PHP version, volume of course content, customizations, etc. The data in this article is based on the exact same site in different environments, but the results may be entirely different for another site; this article just removes site variables so we could look at performance of this sample site across different hosts. The only way to know if hosting is the reason for performance issues is to trace why things slow down. Though, for comparison purposes, you could try it out on more than one host to see if it’s different.
Not meaning to dig in the archives, but is there an update to this?
Sorry David, we don’t have an update here. As we no longer offer consulting services (only products), we’re less involved with hosting selection and load testing real sites in a variety of contexts. We may pick this up again at some point, as it would be interesting, it’s just outside of the work we routinely do now.