This article summarizes best practices for dealing with service downtime and excessive server resource usage caused by customers on a shared hosting server.
The most important goal is to provide a more stable hosting solution for the majority of your clients. If you keep server load and service downtime to a minimum, and thereby optimize website performance for all users on the server, you are already providing a much better hosting solution for them.
Most web hosting users expect two things above all:
- High service uptime
- Fast responsiveness for all services on their hosting account
If you can provide both, you already have a happy paying customer who will most probably recommend your services and bring you new clients.
Determine Problems and Provide Appropriate Solutions
You should always make sure you understand what the problem is with the server in question and which users are causing it. Providing an appropriate solution is just as important. Here are a few examples:
- If the customer's website was compromised or is simply running a poorly written script, point out the specific script causing trouble and ask him to fix the problem, upgrade the corresponding application, or remove it completely.
- If the customer's website is getting an increasing number of visits, advise him to consider upgrading to a more powerful solution that can better handle the needs of his website.
Upsell dedicated solutions and gain revenue
Upgrade users with very popular websites and/or websites that are causing high resource consumption to a dedicated or semi-dedicated solution. This is one of the best ways to gain revenue and customer respect. Instead of trying to force a customer away from your services you should always seek a way to provide him with an appropriate solution.
If a paying customer who is causing high server load, and thus endangering the performance of other websites on the same server, is moved to a more powerful solution, this is a win-win situation:
- You are not losing but instead gaining revenue
- The customer is using a more professional solution that would better suit his needs
Identifying Problematic User Accounts
Identifying Problematic Servers
First of all, prioritize the issues. Always review the servers with the most service downtime and the highest CPU statistics first. This is where the corresponding tools of the 1H Admin Portal will help.
Prioritize Issue Handling
When investigating issues with hosting servers, prioritize cases by importance. In the Downtime Statistics for your servers, start with the servers at the top of the list (those with the greatest number of recorded service outages and the longest total downtime). Observe those servers and their logs. The Downtime Stats also show which services were down, which greatly helps in determining the cause of the issue.
How to Determine if a Server is Problematic
Significant downtime on a single server is certainly a problem. You should be aiming for at least 99.5% web uptime. As you resolve issues with problematic users on the server, this figure should rise to 99.9% web uptime. In case of service outages, review the server and the users on it and take the necessary actions to minimize the downtime.
The CPU Stats for the server are also extremely important. Most commonly you will want to check for servers with higher than usual script execution times. It is advisable to thoroughly review the Hive - CPU Statistics section for more information about this interface and how to find the needed information.
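To make these targets concrete, it can help to convert an uptime percentage into the amount of downtime it permits. The following is a minimal sketch, assuming a 30-day reporting period; the function names are illustrative and are not part of the 1H Admin Portal.

```python
def allowed_downtime_minutes(target_percent: float, period_days: int = 30) -> float:
    """Return the downtime, in minutes, that an uptime target permits over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - target_percent) / 100.0


def achieved_uptime_percent(downtime_minutes: float, period_days: int = 30) -> float:
    """Return the uptime percentage given the recorded downtime in minutes."""
    total_minutes = period_days * 24 * 60
    return 100.0 * (1.0 - downtime_minutes / total_minutes)


if __name__ == "__main__":
    for target in (99.5, 99.9):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows {minutes:.1f} minutes of downtime in 30 days")
```

Under these assumptions, 99.5% uptime allows about 216 minutes of downtime in 30 days, while 99.9% allows only about 43 minutes.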
Identifying Specific Accounts Causing Resource Over-usage
Once you have identified a server with an issue, the next step is to check which specific users are causing problems on it.
It is always important to determine the problematic accounts on the server and provide an appropriate solution for those clients. The accounts causing issues will most commonly appear among the top users in the Local CPU Stats for the server, so you will be able to identify them quickly.
Here are several numbers to look for when trying to determine whether a single user account is causing excessive load on the server. These are average values that have proven to work well for hosting companies:
CPU Time Usage
- 1000/hour per account
- 24000/24hours per account
Script Executions
- 800/hour per account
- 20000/24hours per account
Those are example values and you should not be concerned at all about accounts that are within those limits. Note that the CPU Time Usage refers to the Real Time graph for the account and the Script Executions refers to the Executions graph.
Accounts that are keeping server resource usage around the provided values should be monitored in the future in order to make sure they will not become problematic for the server. Normally you would not have more than several users on a single server that go close to those values.
Accounts that are far above the limits mentioned will be problematic for a shared hosting server, and a resolution should be sought. Such accounts will be either abusive or running a very popular website. There are several common cases that result in accounts causing excessive resource usage or even service downtime:
- The customer account was compromised. In this case the malicious scripts should be removed.
- Poorly written scripts. In this case the scripts should be either optimized or removed.
- A very popular website with lots of visitors. The customer should consider upgrading to a more powerful solution.
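The three situations described above (well within the limits, around the limits, far above the limits) can be sketched as a simple classification. This is only an illustration: the hourly thresholds are the example values from this article, with the 800/hour figure assumed to apply to script executions, while the 0.8 and 1.5 ratios that define "around" and "far above" are assumptions chosen for the example and should be tuned to your own servers.

```python
from dataclasses import dataclass

# Example hourly thresholds taken from the article.
CPU_TIME_PER_HOUR = 1000
EXECUTIONS_PER_HOUR = 800

# Assumed ratios (not from the article): at or above 80% of a limit the
# account is worth monitoring; above 150% it is treated as problematic.
MONITOR_RATIO = 0.8
PROBLEM_RATIO = 1.5


@dataclass(frozen=True)
class HourlyUsage:
    cpu_time: float   # CPU time in the last hour, same unit as the Real Time graph
    executions: int   # script executions in the last hour


def classify(usage: HourlyUsage) -> str:
    """Classify an account as 'ok', 'monitor' or 'problematic'."""
    # Use whichever metric is closest to (or furthest above) its limit.
    ratio = max(
        usage.cpu_time / CPU_TIME_PER_HOUR,
        usage.executions / EXECUTIONS_PER_HOUR,
    )
    if ratio > PROBLEM_RATIO:
        return "problematic"
    if ratio >= MONITOR_RATIO:
        return "monitor"
    return "ok"


if __name__ == "__main__":
    print(classify(HourlyUsage(cpu_time=950, executions=300)))
```

The same check can be repeated with the 24-hour values if daily totals are available.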
How to check for problematic accounts?
As explained above - all accounts with high server resource usage can be checked in the Local CPU Stats interface for the server. Detailed statistics about the account resource usage can be found there.
At that point you will see the problematic account on the server. It is now important to determine what is causing the high load for this specific account. For this purpose the local logs for the server where the account is located need to be checked.
First it is advisable to check the SuExec log in order to determine which scripts are executed most often for the user in question. The main SuExec log is:
- If the total process count is not very high but the CPU Time is consistently high for all processes, then the issue is with the script itself.
Note that you will already have this information from the Local CPU Statistics interface for the server. Script executions, total CPU Time and average CPU Time per script execution are displayed there. It is good to check the SuExec log in order to determine which is the exact script causing the high resource usage.
Important: If the 1H Guardian software is running on your servers, it is a good idea to check the Guardian kill log for terminated processes belonging to this user. The Guardian kill log is:
/usr/local/1h/var/log/guardian-kills.log - All processes killed by the Guardian are recorded here
If a process was killed by the Guardian, it was either due to Critical Load on the server at that moment or because the process had been running for too long and exceeded the process execution limits set in Guardian. Those are described in more detail under the Guardian Configuration section.
If user processes are constantly being killed for running too long, this is a problem with the script that is running. In the Guardian kill log you will also see the exact command that the user ran, so you can determine which script is causing the high CPU Time usage.
- If the average execution time for a process is not high but the total script execution count is higher than normal, there are two possible cases:
- The website might be very popular and getting a lot of visits
- There might be a self-referring script that is causing loops for the web pages
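The two cases distinguished in this section (slow scripts versus an unusually high execution count) can be told apart from the figures shown in the Local CPU Statistics interface. The sketch below is a hypothetical helper, not part of the 1H software; the "normal" baselines are supplied by the caller and are assumptions you would derive from your own servers.

```python
def likely_cause(
    executions: int,
    total_cpu_time: float,
    normal_executions: int,
    normal_avg_cpu_time: float,
) -> str:
    """Suggest where to look first, based on execution count and average CPU time."""
    if executions == 0:
        return "no executions"
    avg_cpu_time = total_cpu_time / executions
    slow = avg_cpu_time > normal_avg_cpu_time   # each run is expensive
    busy = executions > normal_executions       # the script runs very often
    if slow and busy:
        return "slow script under high traffic"
    if slow:
        return "slow script"
    if busy:
        return "high traffic or request loop"
    return "within normal range"


if __name__ == "__main__":
    # 100 executions using 500 units of CPU time: few runs, each one expensive.
    print(likely_cause(100, 500.0, normal_executions=800, normal_avg_cpu_time=1.0))
```

A "slow script" result points to the SuExec and Guardian kill logs, while "high traffic or request loop" points to the access statistics for the domain.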
At this point it is advisable to check the domlogs for the domain name that is used for the specific account's website. The domlog for a specific domain name is:
Where $domain is the actual domain name for the website.
An alternative is to check the Web Statistics for the website. Those can be reviewed in detail via cPanel, and we recommend Awstats as it provides the most detailed information.
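When reading a domlog by hand, a quick count of the most requested paths often reveals a looping or heavily hit script. The following is a minimal sketch, assuming the domlog is in the Apache common or combined log format; the `top_paths` function and the sample lines are illustrative only, and the log path is whatever applies on your server.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches the quoted request field, e.g. "GET /index.php?x=1 HTTP/1.1"
REQUEST_RE = re.compile(
    r'"(?:GET|POST|HEAD|PUT|DELETE|OPTIONS|PATCH) (\S+)[^"]*"'
)


def top_paths(lines: Iterable[str], limit: int = 10) -> List[Tuple[str, int]]:
    """Return the most requested paths (query strings stripped) with their hit counts."""
    counts: Counter = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            path = match.group(1).split("?", 1)[0]
            counts[path] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /forum/search.php?q=a HTTP/1.1" 200 512 "-" "bot"',
        '1.2.3.5 - - [10/Oct/2023:13:55:37 +0000] "GET /forum/search.php?q=b HTTP/1.1" 200 512 "-" "bot"',
        '1.2.3.6 - - [10/Oct/2023:13:55:38 +0000] "GET /index.php HTTP/1.1" 200 1024 "-" "Mozilla"',
    ]
    for path, hits in top_paths(sample):
        print(hits, path)
```

To run it against a real log, pass an open file object instead of the sample list, for example `top_paths(open(path, errors="replace"))`, where `path` is the domlog location on your server.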
- Check the Mail log. It is possible that the high server load is caused by multiple mail processes spawned by the user. For example if you are running Exim for your servers - check the
Important: Why should you go through all these steps?
As explained below, it is of utmost importance to provide relevant information to the customer. When you know the exact issue and what is causing trouble on the server, you will be able to propose a suitable solution to your client.
How to prioritize handling problematic accounts?
Note that it is not good practice to take one server and try to lower the resource usage of all accounts over the limits at once. Most commonly there will be one or just a few users that are far above everyone else on the server. You should make sure issues are resolved with those first. There are two main reasons for this:
- Those top users either:
- have had their account compromised (for example via an outdated application), or
- have an extremely popular website
If their website was compromised, immediate action should be taken to resolve the case. This way you would certainly gain a loyal customer.
If the website is extremely popular, and especially if its popularity is growing, it is of great importance for the customer to consider upgrading to a more powerful solution as soon as possible. This way he will be safe from a possible service outage caused by sharing the server resources with other users. His popular website will also no longer throttle other websites on the same server.
Which leads us to reason number 2:
- In most cases, once the users with the highest usage have taken the appropriate actions, overall resource usage for the other accounts on the server will drop.
Another useful feature of the 1H Admin Portal is the Top Users Statistics, available under the Hive - CPU Stats section of the Portal. There you can check the details of the Top Users from all servers added to existing Server Groups in the Admin Portal.
As this tool lists users on all servers in descending order, it can easily be used to find all users with high resource consumption across all servers.
Communication with the Customer
Here we will provide several guidelines you should use when communicating with a customer that is running an account causing significant resource over-usage.
Provide relevant information and statistics
When addressing a customer regarding an issue with his account, make sure he is treated respectfully. Every user will want to know what the problem is with his account and will request as much information as possible. When contacting a customer, try to follow these guidelines:
- Contact him both via email and through your company's internal communication channel (Help Desk, Trouble Ticket System or similar).
- Give him enough time to address the issue and get back to you with questions or comments. The exact period might vary depending on the severity of the issue, but give at least one or two business days before taking any drastic measures.
- Provide as much information as possible. This might include:
- The total number of script executions and the CPU time usage.
- Which script or application is causing the high resource usage.
- What the options are for the user. Provide at least two.
- How much time he has to react and what actions you will take if the issue is not addressed in time.
Explain where the customer could verify the provided data
Each user can be provided with the full report generated for his account in Hive. Do not hesitate to give all information to your customer and specify how he is exceeding the server resource usage for a standard hosting account. In most cases relevant information can also be easily obtained from Awstats, available in cPanel. There the user can check total visits, page views and hits for his account, as well as detailed information about the URLs that were accessed the most and therefore caused the most script executions.
Important: Google Analytics is another nice way of measuring website popularity. However, the Google Analytics code might not be included on all pages of the website hosted under the specific user account, in which case it will not generate full statistics. A customer might argue that according to Google his website is not nearly that popular. In such cases you should explain that the server statistics are generated on the server side and are thus more reliable, as no data is missed.
Explain how the website popularity/resource usage affects the server
It is important that the user understands, from the initial notification onward, that his website using a large share of the server resources might affect other websites on the same shared hosting server. If the website is very popular and getting a lot of visits, it is understandable that he will need a more powerful dedicated solution for its needs. This way the user himself will not depend on other users on the same server and will not risk downtime due to outside factors. If the resource usage is caused by a malicious or poorly written script, the account might endanger the functionality of other websites on the same server, so the issue should be addressed by either removing the script or resolving the problem with it.
Provide appropriate resolution
Upgrade to a more powerful solution
If the customer's website is growing in popularity, the solution to offer is an upgrade to a more powerful dedicated or semi-dedicated plan. This way he will receive a more powerful platform for the needs of his website, while at the same time not endangering website performance for other users on the shared hosting server and, of course, bringing the hosting company more revenue.
It is good practice to offer several possible courses of action the customer might take. In this case it is obvious that he cannot simply lower the total visits to his website, so upgrading to a more powerful solution is necessary. However, you can still offer at least two possible upgrade solutions:
- A server that will be capable of fully satisfying the needs of his website for its current state.
- A more powerful solution, preferable if the website is trending toward becoming substantially more popular. Suggest this option if the user expects a large increase in website popularity and thus more and more visits.
Disabling/uninstalling/upgrading specific script or component
It is easy to notice in the server logs if the high resource usage is caused by a poorly written or compromised script. For example, if a bulletin board script is flooded with spam posts, its search function will cause very long script executions due to the database search query that is executed. If the search form is also unprotected and does not require login, spam bots alone could cause substantial server load. The bulletin board in question might not be an active one, as in most cases active forums have moderators and administrators that take care of them. In such a case it is useless to propose a service upgrade to the customer. He will not need it, and in the worst scenario you might even lose him. What should be proposed is either that the script is removed, or that the spam is removed and the script secured.
For the example given above you can propose the following options to the customer:
- Completely remove the compromised script files and the database for it if he does not need it.
- If the script is in use:
- Remove the unwanted posts
- Secure the account creation and make posting possible for registered users only
- Secure the Search form - disable it for guest users on the board
If the issue is addressed in such a way, the customer will be more than happy to cooperate.
As you will be working with your customer towards resolutions that will sometimes include removing scripts from his account as well as upgrading to a more expensive hosting solution, it is best to avoid any legal issues that might arise.
- The customer should agree that the resource usage for his account will not exceed the limits that you have decided to set. For example, those might include:
- Maximum number of simultaneous script executions
- Maximum number of script executions for a given period
- Maximum CPU usage (as a percentage) for a single account at any given time
- Maximum CPU Time usage for an hour and for a calendar day
- Maximum number of email messages sent simultaneously
- Maximum number of email messages for a given period
- The customer should agree to not allow malicious code execution from his account.