The fun part of it all is that it is as true today as it was in 1999 when DNA wrote it. No matter what generation you happen to be, be it Boomers, X, Y, Millennials, the forthcoming post-Millennials, etc–you have the things you’re accustomed to which are your “normal,” and things which you did not experience at the right time of your life. These things are abnormal to you.
I think of myself as a “learning machine.” I’m addicted to learning new things, refreshing knowledge by triggering on something that was mentally filed away, my linguistic hobby, etc. A day in which I didn’t learn something was a day wasted.
Now we have March Madness upon us. I’ve been watching part of the ACC Basketball Championship, I will watch Selection Sunday, and I’ll be quite distracted during the NCAA Basketball tourney. Even then, I use the commercial breaks to soak up new information, hopefully useful information.
What does this tell me about being a CEO, an entrepreneur? I’m not sure. We focus on the message for the business users, who are usually over 35, yet not set in their ways.
What does this make me? I’m not sure about that either. I turn 44 this month and I’ve adapted to everything I’ve come across in life so far. So far I grok innovations created after 2005. I hope to keep doing that in the future.
M&M Mars company offered a coupon for free chocolate, but didn’t bother to think through the effects of this promotion on their web site. Any marketer should know that you get negative brand awareness if you offer something positive, like free chocolate, and then make the experience painful. This is what Mars company did this morning. The RealChocolate.com site asks you to register to win free chocolate each week for the summer and if you tried before noon or so PDT today, you probably only got frustrated. While this promotion may not have as much immediate draw as Oprah, it certainly garners a lot of attention from the deal sites such as Consumerist and DealNews.
This sort of negative publicity is easily avoidable. Proper testing methodology includes flash crowd testing from the open internet, performing an end-to-end transaction. Their IIS servers can be made to scale, if configured and built out correctly, but that needs to be proven before the customers report that it didn’t work.
Just as important is timing. Good testing methodology means good communication between the marketing-communications teams and the web operations teams. Testing like this needs to be done far enough in advance that you have time to fix something or correct an issue before you go live. In other words, testing the day before isn’t good enough and is more or less a waste of time and resources. Start at least a week in advance, find out what happens under potential load scenarios, practice remediation strategies, etc.
Liza Minnelli, the famous daughter of the famous Judy Garland, caused more traffic than the Sydney Opera House web site could handle, and the site crashed. The article doesn’t say how much traffic they received, only mentioning that the technicians took hours to get the site operational again. That tells me that the crash wasn’t just because of a high traffic spike by itself, because otherwise the site would have recovered after the traffic left. Moreover, they appear not to have had a monitoring service, so they may not have even known that the site was experiencing problems until customers started calling to complain.
It is ironic that firms set up websites to handle customer traffic to lower costs and reduce the amount of operator staff to take calls. This crash flooded their call operators and caused negative publicity.
Proper load testing takes time and money. The Return On Investment is usually rather easy to see when you compare it with the damage caused by the web site crashing during an important event like this. This was probably one of the most popular events to be at the Opera House in a while, and I doubt that the Opera House management performed end-to-end load testing as they should have. I see this so often and it doesn’t have to happen.
I frequently see confusion regarding concurrent virtual users (VUs) versus VUs per hour, or what should be called sessions per hour or transactions per hour. When modeling web traffic on the open Internet, the rate-based metrics are better suited to finding out what will really happen. If a site slows down, do your users know that *before* they arrive at your site or after they get there? The concurrent user model assumes that a new user doesn’t arrive until the previous user leaves. If the last user in the queue isn’t leaving, that is, the user is stuck on the system trying to perform a task, then no new user arrives. This simply doesn’t happen. A user doesn’t know and frankly doesn’t care how many other users are on your site until the user gets to the site and discovers that it is about to die–or that the user would rather die than use this site.
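To make the difference concrete, here is a minimal sketch of the two load models. The pool of 100 virtual users, the 30 second pause between sessions and the 11,250 sessions per hour target are all hypothetical numbers picked for illustration, and the function names are mine, not from any load testing tool.

```python
def closed_model_rate(users: int, session_s: float, pause_s: float) -> float:
    """Sessions started per hour when a fixed pool of virtual users loops.

    A virtual user only starts a new session after finishing the previous
    one, so the arrival rate is users / (session time + pause time).
    """
    return users * 3600.0 / (session_s + pause_s)


def open_model_rate(target_per_hour: float, session_s: float) -> float:
    """Sessions started per hour when arrivals follow a schedule.

    session_s is deliberately ignored: real users keep arriving no matter
    how slow the site has become.
    """
    return target_per_hour


USERS = 100          # hypothetical virtual user pool
PAUSE_S = 30.0       # hypothetical pause before a virtual user starts again
TARGET = 11_250.0    # hypothetical rate-based target, sessions per hour

# As the site slows down, the closed model quietly backs off the load.
for session_s in (2.0, 30.0, 120.0):
    closed = closed_model_rate(USERS, session_s, PAUSE_S)
    opened = open_model_rate(TARGET, session_s)
    print(f"session {session_s:>5.0f}s  closed: {closed:>7.0f}/h  open: {opened:>7.0f}/h")
```

With these assumed numbers both models start at the same rate while the site is healthy, but the concurrent user model sends less and less traffic exactly when the site gets into trouble.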
Transaction rate and the number of virtual users concurrently on the system affect the application server differently. Transaction rate primarily takes CPU to process the delivered pages, while the number of concurrent users primarily affects memory. Both are important, but they are independent variables. If the site performs well and the scripts are modelled correctly, then the transaction rate and total number of concurrent users will match your web analytics. If the site degrades, CPU is still maxed out, but memory may not immediately be maxed out. However, as the number of concurrent users increases, memory utilization will also increase, as well as database connections, etc.
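The link between the two variables can be sketched with Little's Law, which says the average number of sessions in progress equals the arrival rate times the average session duration. The 36,000 sessions per hour and the session durations below are hypothetical values, not measurements from any real test.

```python
def concurrent_sessions(sessions_per_hour: float, avg_session_s: float) -> float:
    """Little's Law: average sessions in progress = arrival rate x duration."""
    return sessions_per_hour / 3600.0 * avg_session_s


RATE_PER_HOUR = 36_000  # hypothetical, held constant by a rate-based test

# Same arrival rate, but sessions take longer as the site degrades.
for avg_session_s in (60, 120, 300):
    users = concurrent_sessions(RATE_PER_HOUR, avg_session_s)
    print(f"avg session {avg_session_s:>3}s -> about {users:.0f} concurrent users")
```

The rate never changes in this sketch, yet the number of users holding memory and database connections grows as the sessions stretch out.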
This means that applying a rate-based metric to drive the load combined with the right scripts and use cases will drive the best load to allow you to see the behavior of your application under high loads.
The question of geo-distributed testing is really 3 questions. First, where should you be generating the load? Second, how many locations are required to generate load? Third, can you use sample agents in some locations instead of having load generators everywhere?
The reasons for geographic distribution of load are both simple and complex. On one hand, you are testing outside of the firewall for 2 primary reasons: 1) that is where your users are located and 2) to do end-to-end testing.
If you are only testing externally to have an end-to-end test, then you could just as easily do the load test in a loop-back scenario, i.e. generate the load on a circuit that sends the traffic out on one interface/circuit and it comes back in through the primary ingress point(s). If you have enough bandwidth and load generation, this is pretty simple and you can even use NISTnet to try to emulate latency. However, it is really only half of the reason for performing external tests. A loop-back test doesn’t really tell you about latency, even if you try to emulate it. Moreover, you assumed that your users were sitting in your data center or lab, which is pretty unlikely.
If you wish to discover the customer’s experience of the site under load, you need to have real geo-distribution. For SUTs where there is only 1 bandwidth provider and the volume of the test is relatively small, 2 locations will probably suffice. This is especially true for situations where the customer base is centrally located in a small number of locations, for example a local retail chain that is only present in a few states. If you have customers coming to you nationally or internationally, then you need more. Given the demographics of North America, I recommend either of the following options: 2-3 load generation sites distributed across the time zones and on different ISPs with 5-6 sample locations spread out among the rest of the high-traffic areas, or my standard practice of 9 load generation locations domestically—3 east, 3 central and 3 west. If you are international, then you’d need to think about whether your traffic is European or APAC. This also lets me avoid crashing individual Content Distribution Network POPs, although it still happens. For some reason, they get annoyed at me for this.
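As a rough sketch of the 9 location layout, the snippet below splits a target rate evenly across 3 east, 3 central and 3 west load generators. The site labels and the 100,000 sessions per hour target are hypothetical, and an even split is only a starting assumption; a real plan would weight each location by where the traffic actually comes from.

```python
def split_load(total_per_hour: int, sites: list[str]) -> dict[str, int]:
    """Divide a target rate across sites, spreading any remainder one by one."""
    base, extra = divmod(total_per_hour, len(sites))
    return {site: base + (1 if i < extra else 0) for i, site in enumerate(sites)}


# Hypothetical labels for 3 east, 3 central and 3 west load generators.
SITES = [f"{region}-{n}" for region in ("east", "central", "west") for n in (1, 2, 3)]
TARGET_PER_HOUR = 100_000  # hypothetical total sessions per hour

plan = split_load(TARGET_PER_HOUR, SITES)
for site, rate in plan.items():
    print(f"{site:<10} {rate:>6} sessions/hour")
```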
So think about the reasons you’re even testing outside of the firewall. If it is only to do a simple end-to-end test, then don’t bother paying a provider or anyone else and just loop the traffic. If you want a good representation of your traffic, plan properly and distribute the load as well as you can. Professional load test service providers do more than just deliver some hits.
This one will be short, because there simply isn’t that much to say. Your home page is one of the most important pages on your site in terms of the visitor’s experience. If your site requires registration, authentication or identification, nearly all users must go through this page. It is the proverbial front door to your site and application.
On a recent load test, the test had to be aborted after 9 minutes, while they were only at 25% of the planned total load level of 385,000 sessions per hour. They were using a LAMJ architecture, and each home page hit generated a long running SQL query. Even very patient users, who are tolerant of slowdowns and errors, will not stick around if the home page takes several minutes. However, this site didn’t even do that! Two minutes into the test, the pages simply said:
“Whoops! The social network is currently down for maintenance. Please be patient, we’re working on it!”
As you may imagine, their home page is now very fast–0.07 seconds in fact. That is a very fast error message that every user is seeing on the home page, and it would deliver the same for every other page too if the user actually made it that far. I don’t think I need to mention the usefulness of all users seeing that error message.
What caused this slowdown and crash you may ask? I’m glad you did. The long running queries exhausted the JDBC connection pool and maxed out the available number of connections, which is what caused the immediate error page.
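A back-of-the-envelope sketch shows how quickly this happens. The 385,000 sessions per hour and the 25% level come from the test described above; the 5 second query time, the pool size of 100 and the assumption of exactly one query per home page hit are hypothetical, since I am not publishing the real values here.

```python
PLANNED_SESSIONS_PER_HOUR = 385_000  # planned total load from the test
LOAD_FRACTION = 0.25                 # level reached when the test was aborted

QUERY_S = 5.0     # hypothetical duration of the long running home page query
POOL_SIZE = 100   # hypothetical maximum size of the JDBC connection pool


def connections_needed(queries_per_s: float, query_s: float) -> float:
    """Little's Law: connections in use = query arrival rate x query duration."""
    return queries_per_s * query_s


# Assume one home page hit, and therefore one query, per session.
hits_per_s = PLANNED_SESSIONS_PER_HOUR * LOAD_FRACTION / 3600.0
in_use = connections_needed(hits_per_s, QUERY_S)

print(f"home page hits per second: {hits_per_s:.1f}")
print(f"connections held on average: {in_use:.0f} (pool size {POOL_SIZE})")
print("pool exhausted" if in_use > POOL_SIZE else "pool has headroom")
```

Under these assumed numbers the pool is already oversubscribed at a quarter of the planned load, and every second the query gets slower makes it worse.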
The only good thing I can say is at least they didn’t just print stack traces with DSN information contained in them. I’ve been shocked at the content of some of the stack traces I’ve seen on production sites when they encounter an error, but that’s another post.
Alright, now that you’re back, think about what I want to see.
Don’t tell me you’re a good communicator, show me you’re a good communicator. Spelling and grammar errors turn me off faster than Douglas Adams’ fetid dingo kidneys. In today’s electronic medium, communication is more often via bits than spoken word. A poorly written resume is not worth the time you took to send it.
Don’t tell me you’re experienced. Especially don’t tell me you have “extensive experience” with something. That could mean that you think your 2 weeks of letting the guy in the cube next to you code in Mono is the equivalent of someone else’s 5-8 years. I don’t buy it! I want lists of your experience, when you gained the experience and what projects you used it on.
The Squawkfox article mentions “Team Player”, but I’m interested in prospects who can work independently. I like that you can work in a team, and I want you to think about the big picture, but I don’t have time to babysit and I cannot stand micro-managers. If I have to micromanage someone, it will only be for a very short time while I give them careful directions to HR.
Be ready to explain gaps in your work history, or even worse, why you worked on so many projects. If you worked with several customers while at your consulting job, list them under the same job heading. I want someone who will stick with my project and complete it, not use me to help you find another job or even worse, work on your own company on the side.
Having a good resume is not hard, and I actually read most of the ones I get. Be concise, be confident and remember that the true purpose of a resume isn’t to get you a job but to get you an interview.