Should you turn a Blind Eye to known activity?

The concept of a RAG (Red/Amber/Green) status when monitoring systems and performance is well understood.  As the analyst responsible for the system, you set a threshold against a performance metric: be it CPU utilisation, memory, file system, or anything else that seems particularly interesting or pertinent to you.  You then sit back, wait until the end of the day/week/month, and watch for a RED or AMBER threshold being breached.  Excellent stuff!  You can get on with more interesting things while the system monitors itself and tells you whether anything has happened that you should pay attention to.

In large environments, where you may have many hundreds or thousands of devices to monitor, threshold-based reporting is essential.  No-one should be expected to watch graphs of device metrics continually on the off-chance that a utilisation reaches a level that is 'interesting'.

The problem comes with systems that are regularly doing something that would breach the thresholds you've set.  The most common instance of this is a backup.  When the system backup runs, there will be a very high level of CPU, memory and IO activity.  If you raise your system thresholds so that this high level does not cause an alert, then you will be setting things too high.  A backup that causes 90% CPU isn't necessarily a problem when it occurs overnight, whereas a looping process that causes 90% CPU in the middle of your working day will be.  Setting the thresholds to ignore the backup also means that you miss the looping process; setting them to catch the looping process means that you'll be alerted to your backups every night.

What to do?

Most people know the story of 'The Boy Who Cried Wolf'.  I'm not sure that a fairy tale about 'The Capacity Manager that cried RAG' would have been quite so popular, but the concept is the same.  If you continually warn people about things that are not a problem, they will learn to ignore what you say.  They will then take no notice of you when there really is a problem.

It is not sufficient just to make a personal note to 'ignore device XYZ when it breaches', since you'll still have to check that the breach occurred when you expected it to, rather than at any other time.

Maybe we should just not bother to monitor that device?

That is a little harsh!  Just because a device has a regular period of known high activity you are suggesting that we exclude it from all our alerting?  No, we need a more intelligent solution than that.

Exclusions

The solution that we need to employ is that of excluding specific hours of specific days from the alerting process.  If we know that device XYZ is always busy doing its backup between 1am and 3am, then we exclude those hours from the alerting.  This way, the high activity during those hours will not cause a RED or AMBER alert to be raised, but high activity at all other times will.  This has an extra, often overlooked, benefit.
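As a sketch of how such an exclusion might work (illustrative only: the device name, threshold, and window here are assumptions, not taken from any particular monitoring tool):

```python
from datetime import datetime

# Hypothetical exclusion table: each entry maps a device to the hours of the
# day during which threshold breaches are expected and should be suppressed.
EXCLUSIONS = {
    "XYZ": [(1, 3)],  # nightly backup runs between 01:00 and 03:00
}

def should_alert(device, cpu_pct, when, threshold=80.0):
    """Return True if a reading breaches the threshold outside any exclusion window."""
    if cpu_pct < threshold:
        return False
    for start_hour, end_hour in EXCLUSIONS.get(device, []):
        if start_hour <= when.hour < end_hour:
            return False  # known activity: suppress the alert
    return True
```

With this in place, 90% CPU on device XYZ at 02:00 is suppressed, while the same reading at 04:00 (a backup overrunning, or a looping process) still raises an alert.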

The high activity that we're trying to exclude is commonly referred to as a 'backup', but it might equally be any other long-running 'batch' type workload.  Other examples are large mailshots, synchronising data between devices, or uploading branch data to Head Office.  The key capacity measurement in these cases is not going to be how utilised the resources are, but how long it takes for the job to complete.  If we return to our example above, we note that the overnight job runs between 1am and 3am.  We've excluded those hours from our alerting, so we don't get prompted about problems on this device due to the overnight work.  If, however, the overnight work starts to take longer and longer to complete (due to an increase in the amount of data that has to be handled), then eventually it will not complete until 3:30 or even 4am.  Since we are not excluding the hours after 3am, the higher utilisation at this time will cause a RAG threshold to be breached, and we will be alerted to that fact.

We have simultaneously managed to prevent false alarms from our overnight work, while ensuring that we definitely get alerted when something happens that is 'out of the ordinary'.

I have used exclusions to great effect at many of my clients.  The list of excluded devices and metrics gets reviewed once a quarter to ensure that it is still appropriate.  A device might be completely excluded for a few months because it is undergoing extreme stress testing; once that testing is completed, we want to delete the exclusion and resume threshold monitoring as normal.  Equally, the overnight batch work of archiving old emails might be re-scheduled; our exclusion will have to move to match the new timing.

Exclusions should be used in conjunction with dynamic thresholds... more about those in a future blog posting.

Posted in Blog Posts, Capacity Management

Fairness and Simplicity in the UK Tax System

Last week's Budget did nothing for freelancers.  For many years we have been told by successive chancellors that they want to bring "fairness" into the tax system, but from a freelancer's point of view the changes have been anything but "fair".  The Coalition government trumpeted the introduction of "simplicity", but the results have yet to materialise and the tax system seems as complicated as ever.

Is it REALLY so difficult?

 

What is Fair?

As a starting point to discussing the issue of "fairness" in the tax system, one first has to decide what is "fair".  According to the dictionary, something is "fair" if it is 'even-handed, without favouring one party or another'.  In the tax system, this would mean that everyone is treated the same.  However, I don't believe that is what people REALLY mean when they talk about "fairness".  A truly "fair" tax system would impose a single rate of taxation on all payers and all forms of income, regardless of who (or what) they may be, what the level of that income might be, or where it might originate.  Multi-national companies would be paying the same percentage rate on their profits as an individual.  The same amount of "tax free" income would be available to the pensioner with their savings account as to the multi-millionaire with their various business interests.

"Fairness" in the tax system is completely dependent upon where you stand.  There will be some people earning in excess of £60k per year who think it "unfair" that they have to pay 40% PAYE tax on their income.  However, there will be others earning £20k per year who consider it equally "unfair" that those relatively rich people aren't paying MORE than 40%!

A truly "fair" and equitable system would put a single rate on all income; let's say (for argument's sake) that rate was 33%.  A higher-rate taxpayer currently paying above this rate would consider it a "fair" level, whereas a basic-rate taxpayer (currently at 20%) would consider the increase deeply un"fair".

That's the trouble with "fairness".  There are winners and losers.  Anyone that is a winner will be happy with the fairness, whereas anyone that is a "loser" from the change will not be happy.

Fairness in the tax system would therefore appear to be contrary to the first "maxim" of taxation specified by Adam Smith in "The Wealth of Nations" (http://en.wikipedia.org/wiki/The_Wealth_of_Nations): proportionality.  Smith believed that taxation should be in proportion to respective abilities to pay, rather than necessarily in proportion to respective benefits accrued.

 

What is Simplicity?

Well... that "fairness" discussion was not at all straightforward.  It is pretty obvious that the tax system will never be fair, because the meaning of "fairness" is not universal.  How about "simplicity"?  Surely everyone wants a simple tax system?

Once again, the dictionary should be our starting point.  "Simplicity" means that something has few parts and is not difficult to analyse or understand.  This time, Adam Smith appears to be in agreement.  His second maxim was of "transparency".  A transparent tax system allows each taxpayer to understand what it is they are required to pay (and when, and how).  If the tax system is complex, then it becomes more difficult to understand one's liabilities.  If the tax system could be started from a blank sheet of paper, then achieving simplicity would be... well... simple!  A single set of rates and thresholds could be set.  These rates and thresholds would apply to all taxpayers, and therefore everyone would know what it is that they had to pay.

However, we don't have a blank sheet of paper.  We have a complicated set of rules that have evolved over time. 

Just looking at the starting rate for taxes on income, the amount that one has to pay is influenced by one's age (there are different thresholds for those aged under 65, 65-75, and over 75), one's profession (Capital Gains Tax rates are different for "entrepreneurs" compared to everyone else), and even the health of one's partner (the Blind Person's tax allowance is transferable to a partner).

This is before all the other allowances, benefits-in-kind, and taxes etc have been considered.

Complicated!

It would be much simpler to have a single starting rate for income taxes.  However, once again, perception will be key to this.  As was seen in last week's budget, trying to align this starting rate means that there will be perceived "winners" and "losers".  The under-65s will see their starting rate increase to match the rates of the over-75s.  While this is happening, the over-75s will be feeling a certain "unfairness" as their benefit under the current system is slowly eroded until it no longer exists.  Which brings us straight back to where we came in!

The Impossible Journey

So we could achieve "Simplicity" and true "Fairness", but the perception would be that it is "Unfair", and explaining the move from any current tax system to a new one would be anything but "Simple".

Posted in Blog Posts, Freelancing

Queuing Theory with Excel - extra

Only a few days ago, I blithely said

The Excel formula of 1/(Poisson(c ,c\rho ,false)*exp(c\rho )/(1-\rho ) + Poisson(c ,c\rho ,true)*exp(c\rho ) - Poisson(c ,c\rho ,false)*exp(c\rho ))

is equal to
\biggl [1 + \frac{(c\rho)^c}{c!(1-\rho)} + \sum\limits_{n=1}^{c-1} \frac{(c\rho)^n}{n!} \biggr]^{-1}=\wp_0

Now that you have the "tricky" \wp_0 value, all the other values in M/M/c queuing theory are MUCH easier to calculate.
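For anyone who prefers to sanity-check this outside Excel, here is a minimal Python sketch of the traditional formula (the function name is mine; the maths is exactly the bracketed expression above):

```python
from math import factorial

def p0(c, rho):
    """P(system empty) for an M/M/c queue.

    Implements the bracketed expression:
    [ sum_{n=0}^{c-1} (c*rho)^n / n!  +  (c*rho)^c / (c! * (1 - rho)) ]^{-1}
    (the leading 1 in the formula above is just the n = 0 term of the sum).
    """
    a = c * rho  # offered load
    body = sum(a**n / factorial(n) for n in range(c))
    body += a**c / (factorial(c) * (1 - rho))
    return 1 / body
```

A handy sanity check: for c = 1 this collapses to the familiar M/M/1 result \wp_0 = 1 - \rho.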

It has been suggested to me, that it might help if I made this formula a little easier still.

Never one to shirk a challenge.....

What can we do with this?
Starting with the formula

1/(Poisson(c ,c\rho ,false)*exp(c\rho )/(1-\rho ) + Poisson(c ,c\rho ,true)*exp(c\rho ) - Poisson(c ,c\rho ,false)*exp(c\rho ))

The exp(c\rho ) gets used quite a lot in that formula, so we can take it out and use it only once.
1/(exp(c\rho ) * (Poisson(c ,c\rho ,false)/(1-\rho ) + Poisson(c ,c\rho ,true) - Poisson(c ,c\rho ,false)))

Already, that looks much simpler to the eye.

Or if you want to limit the number of times you have to use the Poisson formula, you could then put those two Poisson(c ,c\rho ,false) statements together and say
1/(exp(c\rho ) * (Poisson(c ,c\rho ,true) + (Poisson(c ,c\rho ,false)*(\frac{\rho}{1-\rho} ) )))
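That simplified formula can also be transliterated into Python to check the algebra, using a hand-rolled stand-in for Excel's POISSON function (the function names below are mine):

```python
from math import exp, factorial

def poisson(k, mean, cumulative):
    """Stand-in for Excel's POISSON(k, mean, cumulative)."""
    pmf = lambda n: exp(-mean) * mean**n / factorial(n)
    if cumulative:
        return sum(pmf(n) for n in range(k + 1))
    return pmf(k)

def p0_excel_style(c, rho):
    """Direct transliteration of the simplified Excel formula:
    1/(EXP(c*rho) * (POISSON(c,c*rho,TRUE) + POISSON(c,c*rho,FALSE)*rho/(1-rho)))
    """
    a = c * rho
    return 1 / (exp(a) * (poisson(c, a, True)
                          + poisson(c, a, False) * rho / (1 - rho)))
```

Both this and the traditional summation should agree to within floating-point rounding.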

Checking that you've got it right
Of course, it is always useful to check that you have entered a formula correctly.  For this purpose, I always try a "simple" calculation.

Let's look at an example queue in which there are 5 servers (c ), events occur every 6 minutes and the service time is 20 minutes.
The arrival rate is \lambda = \frac{1}{6}
The service rate is \mu = \frac{1}{20}
So the Utilisation U or \rho is \frac{\lambda}{c\mu} which is \frac{0.167}{5 * 0.05} = 0.6667
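That arithmetic, in a couple of lines of Python (the variable names are mine):

```python
c = 5          # number of servers
lam = 1 / 6    # arrival rate: one event every 6 minutes
mu = 1 / 20    # service rate: a 20-minute service time

rho = lam / (c * mu)  # utilisation
print(round(rho, 4))  # 0.6667
```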

Traditional Formula
\biggl [1 + \frac{(c\rho)^c}{c!(1-\rho)} + \sum\limits_{n=1}^{c-1} \frac{(c\rho)^n}{n!} \biggr]^{-1}

\biggl [1 + \frac{(5*0.6667)^5}{5!(1-0.6667)} + \sum\limits_{n=1}^{4} \frac{(5*0.6667)^n}{n!} \biggr]^{-1}

\biggl [1 + 10.28806 + (3.33333 + 5.55556 + 6.17284 + 5.14403) \biggr]^{-1} = 0.03175

Excel Formula
1 / (exp(c\rho ) * (Poisson(c ,c\rho ,true) + (Poisson(c ,c\rho ,false) * (\frac{\rho}{1-\rho} ) )))

1 / (EXP(5*0.6667) * (POISSON(5,5*0.6667,TRUE) + (POISSON(5,5*0.6667,FALSE) * (0.6667/(1-0.6667)))))

1 / (28.0363 * (0.8788 + (0.1223 * (2) ) ) ) = 0.03175

So, I now know that I have the right Excel formula (with the right brackets in the right places).  I can now continue with my queuing model, fully able to calculate the queue probability in a single cell, even for very large systems with a large number of servers.

Posted in Blog Posts, Capacity Management