Best Practices


This article summarizes best practices for dealing with service downtime and excessive server resource usage caused by customers on a shared hosting server.

It covers the goals you should set for yourself, how to identify problematic servers and user accounts, and additions you may want to make to your shared hosting Terms of Use.

Goals

The following are the main goals you should set for yourself when working with the data that the Downtime Statistics and CPU Statistics provide about your servers' users.

Better Shared Hosting

The most important goal is to provide a more stable hosting solution for the majority of your clients. If you keep server load and service downtime to a minimum, and thereby optimize website performance for all users on the server, you are already providing a much better hosting solution for them.

Most web hosting users expect two things, both of which are very important to them:

If you can provide both, you already have a happy paying customer who will most probably recommend your services and bring you new clients.

The CPU Stats and Downtime Statistics for your servers will greatly help you in achieving this.

Determine Problems and Provide Appropriate Solutions

Always make sure you understand what the problem with the server in question is and which users are causing it. Providing an appropriate solution is also one of the main goals you should set for yourself. Here are a few examples:

Upsell dedicated solutions and gain revenue

Upgrade users whose websites are very popular or cause high resource consumption to a dedicated or semi-dedicated solution. This is one of the best ways to gain both revenue and customer respect. Instead of trying to force a customer away from your services, always look for a way to offer them an appropriate solution.

When a paying customer whose account causes high server load, and thus endangers the performance of other websites on the same server, is moved to a more powerful solution, it is a win-win situation:


Identifying Problematic User Accounts

Identifying Problematic Servers

First of all, it is important to prioritize the issues. Always review the servers with the most service downtime and the highest CPU usage first. This is where the corresponding tools of the 1H Admin Portal will help.

Prioritize Issue Handling

When investigating issues with hosting servers, prioritize cases by importance. In the Downtime Statistics for your servers, start with the servers at the top of the list (those with the greatest number of recorded service outages and the longest total downtime). Observe those servers and their logs. The Downtime Stats also show which services were down, which greatly helps in determining the cause of the issue.
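The ordering described above (most recorded outages first, then longest total downtime) can be sketched in Python. This is only an illustration: the record fields `name`, `outages` and `downtime_minutes` and the sample figures are hypothetical, and the sketch does not reflect any actual Admin Portal export format.

```python
from typing import NamedTuple


class ServerDowntime(NamedTuple):
    # Hypothetical record; the field names are illustrative only.
    name: str
    outages: int            # number of recorded service outages
    downtime_minutes: int   # total downtime over the reporting period


def prioritize(servers):
    """Return servers worst first: most outages, then longest total downtime."""
    return sorted(
        servers,
        key=lambda s: (s.outages, s.downtime_minutes),
        reverse=True,
    )


# Made-up sample data to show the resulting review order.
sample = [
    ServerDowntime("server1", 2, 15),
    ServerDowntime("server2", 7, 40),
    ServerDowntime("server3", 7, 95),
]
for server in prioritize(sample):
    print(server.name, server.outages, server.downtime_minutes)
```

Sorting on a tuple keeps the rule explicit: the outage count decides first, and total downtime only breaks ties.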

Then check the servers with the highest number of executions and the highest CPU load under CPU Stats in the Admin Portal interface.

How to Determine if a Server is Problematic

Significant downtime on a single server is certainly a problem. You should aim for at least 99.5% web uptime. As you resolve the issues with problematic users on the server, this figure should rise to 99.9% web uptime for your servers. In case of service outages, review the server and the users on it and take the necessary actions to minimize the downtime.
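To put the uptime targets in perspective, the short Python sketch below converts an uptime percentage into the downtime it allows. The 30-day reporting period is an assumption made for the example. Over 30 days, 99.5% uptime allows about 216 minutes of downtime, while 99.9% allows about 43 minutes.

```python
def allowed_downtime_minutes(uptime_percent, period_days=30):
    """Minutes of downtime permitted in the period at the given uptime target.

    period_days defaults to 30, which is an assumption for this example.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


for target in (99.5, 99.9):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.0f} minutes of downtime")
```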

The CPU Stats for the server are also extremely important. Most commonly you will want to look for servers with higher than usual script execution times. Review the Hive - CPU Statistics section thoroughly for more information about this interface and how to find the information you need.

Identifying Specific Accounts Causing Resource Over-usage

Once you have determined which server has an issue, the next step is to check which specific users are causing problems on it.

It is always important to determine the problematic accounts on the server and provide an appropriate solution for those clients. The accounts causing issues will most commonly appear among the top users in the Local CPU Stats for the server, so you will be able to identify them quickly.

Here are several numbers you should be looking for when trying to determine whether a single user account is causing excessive load for the server. Those are average values and have proven to work fine for hosting companies:

CPU Time Usage

Script Executions

Those are example values and you should not be concerned at all about accounts that are within those limits. Note that the CPU Time Usage refers to the Real Time graph for the account and the Script Executions refers to the Executions graph.

Accounts that are keeping server resource usage around the provided values should be monitored in the future in order to make sure they will not become problematic for the server. Normally you would not have more than several users on a single server that go close to those values.

Accounts that are far above the mentioned limits will be problematic for a shared hosting server, and a resolution should be sought. Such accounts are either abusive or running a very popular website. There are several common cases that result in accounts causing excessive resource usage or even service downtime.


How to check for problematic accounts?

As explained above - all accounts with high server resource usage can be checked in the Local CPU Stats interface for the server. Detailed statistics about the account resource usage can be found there.

At that point you will see the problematic account on the server. It is now important to determine what is causing the high load for this specific account. For this purpose the local logs for the server where the account is located need to be checked.

First, check the SuExec log to determine which scripts are executed most often for the user in question. The main SuExec log is:

/usr/local/apache/logs/suexec_log

Note that you will already have this information from the Local CPU Statistics interface for the server. Script executions, total CPU Time and average CPU Time per script execution are displayed there. It is good to check the SuExec log in order to determine which is the exact script causing the high resource usage.
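As a rough illustration of this check, the Python sketch below counts how often each script was executed for one user. It assumes the log lines follow the stock Apache suexec shape shown in the comment; your server's log may differ, so verify the pattern against a few real lines before relying on it. The function name `top_scripts` and the sample user `alice` are made up for the example.

```python
import re
from collections import Counter

# Assumed line shape (stock Apache suexec); check it against your own log:
# [2010-03-01 12:00:01]: uid: (alice/alice) gid: (alice/alice) cmd: index.php
SUEXEC_LINE = re.compile(
    r"uid: \((?P<target>[^/)]+)/(?P<user>[^)]+)\) gid: \([^)]*\) cmd: (?P<cmd>.+)$"
)


def top_scripts(lines, user, limit=10):
    """Return (script, executions) pairs for one user, most executed first."""
    counts = Counter()
    for line in lines:
        match = SUEXEC_LINE.search(line.strip())
        # Lines that do not match the assumed shape (e.g. error lines) are skipped.
        if match and user in (match.group("target"), match.group("user")):
            counts[match.group("cmd")] += 1
    return counts.most_common(limit)


# Made-up sample lines; in practice pass an open file object instead.
sample = [
    "[2010-03-01 12:00:01]: uid: (alice/alice) gid: (alice/alice) cmd: search.php",
    "[2010-03-01 12:00:02]: uid: (bob/bob) gid: (bob/bob) cmd: index.php",
    "[2010-03-01 12:00:03]: uid: (alice/alice) gid: (alice/alice) cmd: search.php",
    "[2010-03-01 12:00:04]: uid: (alice/alice) gid: (alice/alice) cmd: index.php",
]
for script, executions in top_scripts(sample, "alice"):
    print(executions, script)
```

To run it against the real log, open `/usr/local/apache/logs/suexec_log` and pass the file object as `lines`. The file is read line by line, so large logs are not loaded into memory at once.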

Important: In case the 1H Guardian software is running for your servers it is good to check the Guardian kill log for terminated processes for this user. The Guardian kill log is:

/usr/local/1h/var/log/guardian-kills.log - All processes killed by the Guardian are recorded here 

If a process was killed by the Guardian, it was either because of Critical Load on the server at that moment or because the process ran for too long and exceeded the process execution limits set in Guardian. Those limits are described in more detail under the Guardian Configuration section.

If user processes are constantly being killed for running too long, the problem lies with the script being run. The Guardian kill log also shows the exact command that the user ran, so you can determine which script is causing the high CPU Time usage.


At this point it is advisable to check the domlogs for the domain name that is used for the specific account's website. The domlog for a specific domain name is:

/usr/local/apache/domlogs/$domain

Where $domain is the actual domain name for the website.
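A quick way to see which URLs receive the most requests is to count them in the domlog. The Python sketch below assumes the domlog uses the Apache common or combined log format shown in the comment, which may not match your configuration. The function name `top_paths` and the sample lines are made up for the example.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed line shape (Apache common/combined log format):
# 203.0.113.5 - - [01/Mar/2010:12:00:01 +0000] "GET /search.php?q=x HTTP/1.1" 200 5120 "-" "agent"
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+)[^"]*" (?P<status>\d{3}) ')


def top_paths(lines, limit=10):
    """Count requests per URL path (query string dropped), most requested first."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match:
            counts[urlsplit(match.group("target")).path] += 1
    return counts.most_common(limit)


# Made-up sample lines; in practice pass an open file object instead.
sample = [
    '203.0.113.5 - - [01/Mar/2010:12:00:01 +0000] "GET /forum/search.php?q=a HTTP/1.1" 200 5120 "-" "bot"',
    '203.0.113.6 - - [01/Mar/2010:12:00:02 +0000] "GET /index.php HTTP/1.1" 200 1024 "-" "browser"',
    '203.0.113.5 - - [01/Mar/2010:12:00:03 +0000] "POST /forum/search.php HTTP/1.1" 200 4096 "-" "bot"',
]
for path, hits in top_paths(sample):
    print(hits, path)
```

Dropping the query string groups all requests to the same script together, which is usually what matters when looking for the script behind the load. To run it against a real log, open the domlog file for the domain and pass the file object as `lines`.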

An alternative is to check the Web Statistics for the website. They can be reviewed in detail via cPanel, and we recommend AWStats as it provides the most detailed information.


/var/log/exim_mainlog

Important: Why should you go through all these steps?

As explained below, it is of utmost importance to provide relevant information to the customer. Once you know the exact issue and what is causing trouble on the server, you will be able to propose a suitable solution for your client.


How to prioritize handling problematic accounts?

Note that it is not good practice to take one server and try to lower the resource usage of every account above the limits at once. Most commonly there will be a single user, or just a few users, far above everyone else on the server. Make sure the issues with those users are resolved first. There are two main reasons for this:

If their website was compromised, immediate action should be taken to resolve the case. This way you will certainly gain a loyal customer.

If the website is extremely popular, and especially if its popularity is growing, it is very important for the customer to consider upgrading to a more powerful solution as soon as possible. This protects them from a possible service outage caused by other users competing for the same server resources. It also keeps their popular website from degrading the performance of other websites on the same server.

Which leads us to reason number 2:


Top Users

Another useful feature of the 1H Admin Portal is the Top Users Statistics, available under the Hive - CPU Stats section of the Portal. There you can check the details of the Top Users from all servers added to existing Server Groups in the Admin Portal.

Because this tool lists users from all servers in descending order, it can easily be used to find every user with high resource consumption across all servers.

Communication with the Customer

Here we will provide several guidelines you should use when communicating with a customer that is running an account causing significant resource over-usage.

Provide relevant information and statistics

When addressing a customer about an issue with their account, make sure they are treated respectfully. Every user will want to know what the problem with their account is and will request as much information as possible. When contacting a customer, try to follow these guidelines:

Explain where the customer could verify the provided data

Each user can be provided with the full report generated for their account in Hive. Do not hesitate to give all the information to your customer and specify how they are exceeding the server resource usage expected of a standard hosting account. In most cases relevant information can also easily be obtained from AWStats, available in cPanel. There the user can check total visits, page views and hits for the account, as well as detailed information about which URLs were accessed the most and therefore caused the most script executions.

Important: Google Analytics is another good way of measuring website popularity. However, the Google Analytics code might not be included on every page of the website hosted under the user account, in which case it does not generate full statistics. A customer might argue that according to Google the website is not nearly that popular. In such cases, explain that the server statistics are generated on the server side and are therefore more reliable, as no data is missed.

Explain how the website popularity/resource usage affects the server

It is important that the user understands from the initial notification that a website using a large share of the server resources might affect other websites on the same shared hosting server. If the website is very popular and getting a lot of visits, it is understandable that it will need a more powerful dedicated solution. This way the user will not depend on other users on the same server and will not risk downtime due to outside factors. If the resource usage is caused by a malicious or poorly written script, the account might endanger the functionality of other websites on the same server, so the issue should be addressed by either removing the script or fixing the problem with it.

Provide appropriate resolution

Upgrade to a more powerful solution

If the customer's website is growing in popularity, the solution to offer is an upgrade to a more powerful dedicated or semi-dedicated solution. The customer receives a more powerful platform for the needs of the website, the website no longer endangers the performance of other users' websites on the shared hosting server, and the hosting company gains more revenue.

It is good practice to offer the customer at least several possible courses of action. In this case it is fairly obvious that the customer cannot simply lower the total visits to the website, so upgrading to a more powerful solution is necessary. However, you can still offer at least two possible upgrade solutions:

Disabling/uninstalling/upgrading specific script or component

It is easy to notice in the server logs when high resource usage is caused by poorly written or compromised scripts. For example, if a bulletin board script is flooded with spam posts, its search function will cause very long script executions because of the database search query that is executed. If the search form is also unprotected and does not require login, spam bots alone could cause substantial server load. The bulletin board in question might not even be active, since active forums usually have moderators and administrators who take care of them. In such a case it is useless to propose a service upgrade to the customer. They will not need it, and in the worst case you might even lose them. What should be proposed is either removing the script, or removing the spam and securing the script.

For the example above, you can propose the following options to the customer:

If the issue is addressed in this way, the customer will be more than happy to cooperate.

Specify that the customer is in Terms of Use violation

Always explain to the customer that the account is in violation of your Terms of Use. There should be no need to explain the case in detail, as the customer should be aware of the exact terms, but it is good to always provide a link so the terms can be checked easily.

Terms of Use Additions

Because you will be working with your customer towards resolutions that will sometimes include removing scripts from the account or upgrading to a more expensive hosting solution, it is best to avoid any legal issues that might arise.

Here are a few Terms of Use Additions you might want to include if you have not done so:
