Long Dark Night of the Servers
Greetings Folks --
As I'm sure a lot of you have seen, we've had some problems this week with reader blogging and comments: principally delays, but also server errors and the double posting that can result from the combination of the two.
So I wanted to take a moment to explain what's going on and what we're doing about it.
The main and immediate cause of the problems is the massive upsurge of traffic during the conventions. To give a sense of perspective, on weekdays the TPM network of sites usually gets between 400k and 500k page views a day. This week we've had three straight days over 1 million page views. Equally important, your rate of commenting has grown substantially, exceeding the rate of growth in page views. Both tax the servers' capacity, and that's basically responsible for all the problems we've been having.
We've been adding server capacity all through this cycle to keep up with the growth of the network. Already this summer we could see that we needed to take a substantial step up in our server capacity, and we've begun building an entirely new setup at a new server hosting company, which should dramatically and consistently improve the quality and speed of our sites. That's slated to happen in the middle of this month. So we're almost there. But the current tsunami of readership made this last week a rough ride.
To explain one change, many of you have noticed that the avatars have disappeared from the comments threads. That's temporary. For reasons that are too difficult to explain here, the avatars were accounting for a very substantial percentage of the load on the servers. So we've taken them down temporarily to get us through the next ten days. We're going to try to bring them back early next week. But at worst they'll be back when we make the switchover around the 15th.
Finally, we recently showed you the first mock-ups of the community and discussion tracking system we're soon going to be rolling out here at TPM. We're also about to roll out a new blogging interface which will include much more robust editing features and the ability to preview posts, which I know many of you have been waiting for, patiently (or in some cases not so patiently, but still understandably), for some time. We're close. But we've needed to wait to get onto the new server setup to start rolling those out.
So, that's the gist of what's going on. I can't promise you that reader blogging and commenting will be perfect for the next ten days or so, though I suspect they'll be considerably better than they have been over the last several days. But around the 15th we'll start moving over to our new server setup. That should dramatically improve site performance and put us in a position -- and this is critical -- where we can quickly and smoothly increase capacity as need demands. Then by late September we should be able to begin rolling out the community, discussion and blogging interface changes I mentioned above.
If you have any questions or comments, please comment in this thread.





In a fashion typical of what is going on around here, your whole post didn't come through.
Is there a fix at hand, or are we just waiting this out until the traffic dies down? We are losing good people on a daily basis...
And I HATE having avatars going away!
September 5, 2008 1:59 AM | Reply | Permalink
For once, the incomplete post wasn't the site's fault but mine. The back-end interface works a little differently at TPMCafe than at TPM, where I do most of my posting. And I clicked save before I was supposed to. The full post should be up now.
September 5, 2008 2:15 AM | Reply | Permalink
here's the link...
http://walterreed.tech.googlepages.com/home
these guys are a joke...
September 5, 2008 11:19 AM | Reply | Permalink
I'm not complaining, but because the multiple reposts quickly moving stuff offscreen have made me question whether some things might be worth the effort, I'd like to suggest that in the interim, you change the text on the "500" page to caution against resubmitting. I haven't done any research into what software package you're using, but the error pages are generally customizable and because yours tells us to write to some dude in Australia, I'm figuring that it's not canned and could be easily customizable to say "don't resubmit".
Otherwise, hey, good luck. It's election season and an uncommonly popular subject, so one can easily understand why you've been caught offguard by all the traffic.
September 5, 2008 2:36 AM | Reply | Permalink
A big part of the problem is that, in fact, they are not trapping those 500 server errors and redirecting to a custom error page. That page is the default page that is distributed with the web server.
IMO, there is no excuse for the site management to have not deployed custom error pages -- this is about Web 102 for a site this size. That is not Marshall's fault, he has to depend on the good judgement of the technicians, who were AWOL in the judgement department.
If custom pages had been deployed, this article could have been deployed on the error page, saving everybody a lot of grief.
It's especially ridiculous because no additional code is required to deploy them in Apache. It can all be done by configuration files. So, we're not talking about "oh, we would have to hire a developer" or some nonsense like that.
Thanks.
mp
September 5, 2008 1:54 PM | Reply | Permalink
I didn't realize the Australian dude was the default for this package, but yeah, it's not difficult to toss up a custom error page and it'll only take a couple of minutes.
Right now at this very minute, the Top 20 reader posts are actually only 7 individual entries with all but two repeated over and over.
I commented to what I thought was the last try to one of these posts, but now I see that the one I chose is now in the middle and I could easily put up a post of my own, but since it'll probably only be visible for twenty minutes (at most), I'm of the mind to not bother.
Possibly unlike many others, I use the Cafe page as the front door to this site because I like hearing what everyone has to say and the interaction that comments allow. I do occasionally visit the other pages, but since the main page doesn't let me speak, I generally only visit it when I've run out of reader posts or when someone on the outside points toward it.
September 5, 2008 2:56 PM | Reply | Permalink
Thanks for the update...I'm relatively new to the site, but it has become one of my favorite parts of my day. I'll wait patiently for things to improve!
September 5, 2008 2:40 AM | Reply | Permalink
Thanks for the update, Josh. Half the frustration is not knowing what's going on.
September 5, 2008 2:45 AM | Reply | Permalink
I don't envy you or your IT staff. Changing over in the middle of this is not a minor undertaking.
The growth in traffic indicates that perhaps the country is becoming more involved in the political process and is actively seeking information to guide its choices in November. That has to be a good thing overall, and I think it favors Democrats. And don't worry about the small inconveniences with the site. Relative to what is at stake they're minor, and we can certainly get by without avatars.
September 5, 2008 2:57 AM | Reply | Permalink
I'm not sure this will work for everyone, but until the problems are fixed ... :-)
When it looks like the comment is heading for '500' limbo, I click the 'STOP' icon on Firefox's toolbar and then click on the 'TPMCafe' link at the top of the page to reload all. So far the comment has been posted right away every time.
September 5, 2008 3:11 AM | Reply | Permalink
That's not all the tech problems you're having. I let you know several months ago that every post I ever wrote for TPM Cafe disappeared from the site. I even provided a URL for one of the axed posts. I never even got an apology let alone explanation.
Is this any way to run an airline (to quote an old ad tagline)? You bet it isn't.
September 5, 2008 5:22 AM | Reply | Permalink
Thanks Josh!
At least now we know what is causing our hair-pulling and blood pressure fluctuations.
Looking forward to the new interface!
September 5, 2008 7:24 AM | Reply | Permalink
Josh, Thanks. TPM has come a long way from that single, narrow column of text down the middle of my browser window that it was in 2000.
The participants on this site should have been able to figure out that it was convention-related traffic that caused the delays, loss of avatars, etc. I figured this out but even I fell victim to the embarrassing double-post effect once.
A new interface? Now THAT'S change we can believe in! I hope transition goes well. Good luck and best wishes for your continued success.
September 5, 2008 8:56 AM | Reply | Permalink
Thanks for the update, Josh. Keep doing what you're doing. This set of sites is my primary source for news now. Keep your standards high, and focus mainly on the content. You have a great group of reporters.
Readers and posters need to be more patient. When one of my comments takes forever and goes to 500-land, I know the post is really there. I just re-load the page (rather than re-posting). And there it is. Takes a while, but let's face it folks: if you're reading and posting comments, you have some free time on your hands. (Some of you have a LOT of free time, apparently.)
-- ARG
September 5, 2008 10:54 AM | Reply | Permalink
http://walterreed.tech.googlepages.com/home
September 5, 2008 11:20 AM | Reply | Permalink
Like many users, this past week has been one that has tried my patience. I am hopeful your changes will take effect quickly and work as promised.
One of the things I would suggest, to keep posts available to readers longer is a "jump" ("read more here"... or "continue reading here") that would shorten long posts while still giving readers a taste of the content of the post.
The multiple post problem has demonstrated how quickly -- even without the multiples -- a large volume of posts would be moved out of sight of most TPM readers -- who come for the front page but stay for the reader blogs.
To help fix that problem, why not create a list of blogs -- still on the front page of the Reader Blogs section -- of that day's blogs with just the title and author. If you've started commenting on one thread it is frustrating to try to find it after it is out of the recent list.
Also instead of archiving multiple days of reader blogs, please do them by single day. Again, easier to track and find items you're interested in.
I'd also like to see Reader Posts separated from TPMCafe. Give us a nav button near TPMtv all our own.
Finally, whichever hosting company you've chosen, please make certain they can provide all of your requirements on the fly. Capacity, bandwidth, backup/restore should all be manageable quickly and transparently to us users. As traffic increases, so should your resources. But enough of my lectures. Bring on the new stuff. Bring it here! Bring it now!!!
September 5, 2008 11:23 AM | Reply | Permalink
♪ Fingers crossed and prayers ascending! ♪
September 5, 2008 11:52 AM | Reply | Permalink
Josh,
regular updates are valuable. (but don't overload the server)
September 5, 2008 12:58 PM | Reply | Permalink
Josh,
regular updates are valuable. (but don't overload the server with them)
September 5, 2008 12:59 PM | Reply | Permalink
Josh,
Is there anything we can do to help out?
Financial assistance?
I want to see TPM become Great. I'd like to see Candidates refer to our site as a source for ideas.
Assisting in rapid response, giving another source for material to rebut the opposition.
I don't think it's vain on my part to believe that, as little skill as we have in being heard over the clamor of professional counselors to the candidates, WE have a perspective that might be useful, if only we could get real-time exposure.
So please hurry up on the needed repairs, so that TPM can be looked at as a reliable Real Time Aid, a TOOL in an era that demands rapid response. Otherwise we're just old news.
Which sites do the candidates rely on for input? That's the site I'd like to be heard from.
September 5, 2008 1:15 PM | Reply | Permalink
Simple fix for the future:
Stop using an entire week for your archiving.
Keep the archives in the same size packets as the first page, in reverse chronological order, like most websites do. I.e., twenty posts per page or whatever, with links to "next page" and "previous page" at the bottom.
I never understood why you like having the archives in big weekly glumps. You've always done it on the main TPM page. It's always been a pain and uses a lot of power to load.
Not everyone is going to visit every day. If they want to read on past what fits on one page, they have to load an entire week and scroll down.
It's sort of dismissive of your own content, as if what you wrote beyond what fits on one page is no longer of interest except to researchers. It shows a prejudice to "breaking" and that posts a few hours old are no longer of interest.
The more you grow the worse it will get if you keep archives in big weekly glumps, and people can't flip page to page.
September 5, 2008 1:38 PM | Reply | Permalink
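The page-by-page archive suggested in the comment above could be sketched roughly as follows. This is only an illustration in Python, not how TPM's software actually works: the `Post` and `Page` types, the `paginate` function and the default of twenty posts per page are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Post:
    # Hypothetical post record; a real blog would carry more fields.
    title: str
    posted_at: datetime


@dataclass
class Page:
    posts: List[Post]
    number: int
    previous_page: Optional[int]  # page with newer posts, None on page 1
    next_page: Optional[int]      # page with older posts, None on the last page


def paginate(posts: List[Post], page_number: int, per_page: int = 20) -> Page:
    """Return one fixed-size archive page, newest posts first."""
    if page_number < 1 or per_page < 1:
        raise ValueError("page_number and per_page must be at least 1")

    # Reverse chronological order, as on the front page.
    ordered = sorted(posts, key=lambda p: p.posted_at, reverse=True)

    # Ceiling division; an empty archive still has one (empty) page.
    total_pages = max(1, -(-len(ordered) // per_page))
    if page_number > total_pages:
        raise ValueError("page_number is past the end of the archive")

    start = (page_number - 1) * per_page
    return Page(
        posts=ordered[start:start + per_page],
        number=page_number,
        previous_page=page_number - 1 if page_number > 1 else None,
        next_page=page_number + 1 if page_number < total_pages else None,
    )
```

With something like this, a reader who wants older posts only ever loads twenty at a time and follows the "next page" link, rather than loading an entire week in one go.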
And another suggestion: hire IT people who pay attention to what's going on on your site and are willing to alter instructions pages while problems like this are going on.
As another commenter has suggested, if someone had simply put a temporary "don't resubmit when you get an error message" instruction on the "blog now" page when the traffic grew, you wouldn't be having half the problems you're currently having. The problem is being exponentially enlarged by users not being informed with proper instructions.
In the three years I've been a regular user, the site never seemed to have had IT people who offer actual service. Rather, it has always seemed as if your editorial staff has to watch for tech and software problems and address them, and they don't have time to do that. That ends up being very expensive in many ways.
September 5, 2008 1:59 PM | Reply | Permalink
And another suggestion: hire IT people who pay attention to what's going on on your site and are willing to alter instructions pages while problems like this are going on.
As another commenter has suggested, if someone had simply put a temporary "don't resubmit when you get an error message" instruction on the "blog now" page when the traffic grew, you wouldn't be having half the problems you're currently having. The problem is being exponentially enlarged by users not being informed with proper instructions.
In the three years I've been a regular user, the site never seemed to have had IT people who offer actual service. Rather, it has always seemed as if your editorial staff has to watch for tech and software problems and address them, and act as liaisons with whoever was doing the IT, and they don't have time to do that. That ends up being very expensive in many ways.
September 5, 2008 2:02 PM | Reply | Permalink
I don't want to seem harsh, but here you are with a double post about how TPM could remedy the problem if only they had IT guys who could tell people not to double post.
Kind-of undercuts your suggestion.
Just sayin'...
-- ARG
September 6, 2008 11:31 AM | Reply | Permalink
Someone keeps posting letters from apparent friends of Sarah Palin and this is the personal stuff that has to stop. I have no doubt there are those taking advantage of the multiple postings even though they know the error pages mean the post went through.
Drive-by bloggers dropping false rumors into our laps to see if we take the bait.
Josh, do you have editorial control over this site?
September 5, 2008 3:45 PM | Reply | Permalink
While it's offtopic for this thread: The letter is real and was originally circulated on Monday. Since that time, both the New York Times and the Anchorage Daily News have interviewed the lady and have allowed her to expand upon some points, while I've seen at least one person familiar with her comment that she's really one of those gadflies who attend every city council meeting.
IOW: The woman's comments are obviously based in fact, because at least two reputable outlets have spoken to her and the most anyone else can say is that she characterizes some elements through her own prism. Nonetheless, it really is an old meme, and the fact that it's reposted a couple of times a day, by people who apparently aren't hearing about the "500" problem and who may not have seen the previous fifty or so postings, has been clogging up the stream. But they're not the only ones by any means.
September 5, 2008 3:59 PM | Reply | Permalink
You know, looking at Josh peering over his glasses at us, it just struck me how darnright uppity he looks...
;o)
(thanks for the update on the updates -- the changes sound great!)
September 5, 2008 8:24 PM | Reply | Permalink
My biggest complaint so far is that my posts don't get higher rated more. I can't figure out how that's plausibly TPM's fault though.
September 6, 2008 5:18 AM | Reply | Permalink
Josh,
Thanks for the update, and hang in there.
I'm a long-time reader but new member, and one of the features I currently like is the lack of avatars and unnecessary clutter. I vastly prefer the simplicity and faster loads, with less stress on the network (speaking as a systems software engineer).
You should expect to continue growing, and expect huge spikes around the debate dates, and of course, a deluge of Katrina-like proportions on election day itself. Prepare well! :)
September 6, 2008 2:55 PM | Reply | Permalink
Well Done, Top Drawer, and all that Rot!
I've been a web developer and host since 1996, and I'm in the Radia Perlman camp when she admonishes us to "instead of worrying which pathway the packets take through the network, we should instead rejoice that they make it there at all."
The problem here seems to be "that last 10%" of the work, what separates the pros from the amateurs. "New setup with new host" is just what the doctor ordered.
All the very best, and thanks for arming us with what I consider to be the real Power of the People!
September 9, 2008 3:18 PM | Reply | Permalink
I don't know if anyone will see this comment, as it's attached to a thread from over a week ago, but I'm having serious problems with the site now. On any new thread, the "profile bar" no longer shows the friendly "Hi Wordie" and it appears that I'm not logged in, but there is no place on the page to log in, and no way to do so on any other page either. I also cannot comment at all. The bottom of each post says, "Post a comment" but there's no box in which to type. This has been going on for a few days. I've mentioned it to Al Shaw, but so far no solution appears to be on the horizon.
I thought the problem might be on my end, but the fact that I'm able to access this page and also still able to recommend posts suggests to me it's on yours. I wonder if this affects other TPM users, and it strikes me that it could conceivably lower TPM's click count and therefore its revenues.
I'm sure hoping that the new stuff that's supposed to be coming soon (I could read Josh's more recent post, but not post this there) will fix this. I've been signed up here for a couple of years now, and it feels horrible to be shut out. And especially so right now, when things are so intense in the election. I hate to seem impatient, but hurry up guys!
September 14, 2008 6:00 PM | Reply | Permalink