Like Ra's Naughty Forum - Site performance, Server errors, outages and tunings

The admins decided to move the VPS to another host.

No response for a week now. If there are no updates, I will escalate again tomorrow.

Escalated.

Timeline:

01 September 2011 - logged with a reference to an earlier ticket
02 September 2011 - Arvixe: "other VPS'es do not affect yours"
02 - 29 September - Me: "the disks are overloaded, VPS is mostly down", Arvixe: "there's no problem"
29 September - VPS crashed
29 September - Arvixe discovered a dead disk in the RAID array
12 October - the VPS was inadvertently moved to another host. Apparently to mitigate the disk overload on Sep 29, but the problem was solved by replacing a disk
12 October - both old and new VPS'es were up simultaneously, ~12 hours downtime
14 October - discovered a new downtime pattern. Disks are overloaded from 12:00 to 13:00 GMT every day, and the VPS is inaccessible during this time.
14 October - Arvixe: nobody else complains, most likely it's your VPS problem
06 November - implemented caching and other performance tweaks to reduce the disk load. VPS is still down from 12 to 14 GMT
10 November - Arvixe: shut down your cron for 24 hours
12 November - node maintenance to address system performance (no details)
14 November - shutting cron down did not help. Made a suggestion that all crons and log processing run simultaneously on all VPS'es thus creating a cumulative overload.
16 November - Arvixe confirmed that no cron randomization was performed on this node.
16 November - Me: suggested looking into the cPanel log processing time (by default all logs are rotated at 12:00 GMT) and reducing the maximum log size from 300MB to 100MB to shorten the processing time
17 November - Arvixe is going to implement cron randomization on the node
19 November - running backups (standard Arvixe backup) overloaded the disks for 2 hours and corrupted memory, which led to 6h downtime
21 November - no changes, VPS is always down from 12 to 13 GMT. Arvixe suggested moving to another node. I agreed.
27 November - the situation gets worse with every week (see the attached graphs). The VPS is still not moved.
29 November - Arvixe is going to "migrate the VPS shortly"
01 December - No changes, the VPS is still not migrated.
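
The cron randomization mentioned in the 16-17 November entries could look roughly like the sketch below. This is only an illustration, not what Arvixe runs: the helper names `staggered_cron_slot` and `cron_line`, the host names, the command path and the two-hour window are all assumptions. The idea is to derive a stable offset from a hash of each VPS host name, so that daily jobs stop firing at the same minute on every VPS of a node.

```python
import hashlib


def staggered_cron_slot(hostname, base_hour=12, window_minutes=120):
    """Return a stable (hour, minute) slot for a daily job on one VPS.

    Hypothetical helper: spreads jobs over `window_minutes` starting at
    `base_hour`, using a hash of the host name as the offset.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    offset = int.from_bytes(digest[:4], "big") % window_minutes
    total = base_hour * 60 + offset
    return (total // 60) % 24, total % 60


def cron_line(hostname, command):
    """Build a crontab entry that runs `command` once a day in the slot."""
    hour, minute = staggered_cron_slot(hostname)
    return f"{minute} {hour} * * * {command}"


if __name__ == "__main__":
    # Host names and command path are made up for the example.
    for name in ("vps01.example.com", "vps02.example.com", "vps03.example.com"):
        print(cron_line(name, "/usr/local/bin/rotate-logs"))
```

Because the offset comes from a hash and not from a random number, each VPS keeps the same slot from day to day, while different VPSes are likely to land on different minutes.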

The monthly contract with Arvixe expires in 16 days. If the issue is not solved in 3 days, I will start migrating to another provider.

Just a heads up..

Today, I hit the forum, said there were 21 users in the last 5 minutes (not what I would call a very heavy load), and the forums were continually saying server at max load, try back later.

Not sure if that's useful information or not, just thought I'd pass it along..

Good luck!

(01 Dec 2011, 14:55 )Like Ra Wrote: [ -> ]Escalated.

Wow.. I just read through your whole report there, you are definitely getting jerked around. Your graphs can't be accurate, how can you be serving up 40,000 page views an hour, with your MAX number of users on-line at any one time being 97? Something is wrong.

Either the report is including other sites that are being hosted, or some bots are just constantly scraping your site, or some of the images on your site are being linked to and posted elsewhere where there is more traffic.

If the average user is like me, and pulls up a page, reads through it, and moves on to the next page, I might get like 10-15 pages in a 30 minute period. Call it 20 pages, or 40 per hour. With 100 people all on-line for that whole hour, you'd still only be at 4000 pages.

Are those graphs from the server or from your forum/blog software? Have you looked at the forums/blog software logs/reports to see if any of the data matches what the server's report is telling you?

I'd start the move.. Even if your current provider gets their act together and suddenly fixes everything, their apparent lack of knowledge and overabundance of cluelessness only means it will happen again.

Again, good luck!

(02 Dec 2011, 20:03 )catransvestic Wrote: [ -> ]the forums were continually saying server at max load

The disks are overloaded. No requests can be served, http/php processes accumulate and take all the memory, and swapping puts additional load on the disks.
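
A rough way to see this from inside a Linux VPS is to compare two samples of the aggregate `cpu` line in `/proc/stat` and check how much of the elapsed CPU time was spent waiting on I/O. The sketch below is a minimal illustration with assumed helper names (`cpu_times`, `iowait_share`, `read_cpu_line`) and an arbitrary 5 second interval. How well the figure reflects load caused by neighbouring VPSes depends on the virtualization, so treat it as an indicator only.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line from /proc/stat into a list of ints."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep user..steal only; the guest fields are already counted in user/nice.
    return [int(x) for x in fields[1:9]]


def iowait_share(before, after):
    """Fraction of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    total = sum(deltas)
    # Field order: user nice system idle iowait ...; iowait is index 4.
    return deltas[4] / total if total > 0 else 0.0


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    import time

    first = read_cpu_line()
    time.sleep(5)
    second = read_cpu_line()
    print(f"iowait share over 5 s: {iowait_share(first, second):.1%}")
```

A share that stays high during the daily 12:00 to 13:00 GMT window and drops outside it would be consistent with the disk overload described above.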

21 simultaneous users is indeed nothing.

(02 Dec 2011, 20:03 )catransvestic Wrote: [ -> ]how can you be serving up 40,000 page views an hour

It's a monthly summary graph, not an average. So you have to divide 40000 by 30.
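
To spell out the arithmetic, here is a quick sketch. The 30-day month and the helper name `average_per_hour` are assumptions; the 40 pages per user per hour and 100 users come from the estimate quoted earlier in the thread.

```python
def average_per_hour(monthly_sum, days=30):
    """Average hourly page views when the graph sums one hour over a month."""
    return monthly_sum / days


if __name__ == "__main__":
    avg = average_per_hour(40_000)
    # Estimate from the thread: 40 pages per user per hour, 100 users online.
    estimated_ceiling = 40 * 100
    print(f"average page views per hour: {avg:.0f}")
    print(f"below the {estimated_ceiling} estimate: {avg < estimated_ceiling}")
```

So the real hourly figure is roughly a third of the 4000 pages per hour estimated for 100 users, which fits the user counts.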

(02 Dec 2011, 20:03 )catransvestic Wrote: [ -> ]I'd start the move

Yeah, I think this is the only solution. Gonna try over the weekend.

(02 Dec 2011, 22:40 )Like Ra Wrote: [ -> ]

Is building your own server, and just renting the rack space and internet connection an option? I host a dozen websites (albeit none with large quantities of image files) on a simple Mac Mini running Mac OS X server. It's virtually bullet proof. RAM is cheap, Disk space is cheap, CPU cycles are cheap as well.. You'll still have your issues, but you won't have to rely on anyone else to fix them. And there are huge communities of people to ping with questions should you run into any problems.

(02 Dec 2011, 23:13 )catransvestic Wrote: [ -> ]Is building your own server, and just renting the rack space and internet connection an option?

Too expensive.

(02 Dec 2011, 23:13 )catransvestic Wrote: [ -> ]I host a dozen websites (albeit none with large quantities of image files) on a simple Mac Mini running Mac OS X server.

Actually, I'm pretty happy with the current configuration. The problem is simultaneous log processing on all VPSes, which kills the disks.

(02 Dec 2011, 23:13 )catransvestic Wrote: [ -> ]RAM is cheap, Disk space is cheap, CPU cycles are cheap as well.

Why is dedicated hosting so expensive, then? 😁

(02 Dec 2011, 23:13 )catransvestic Wrote: [ -> ]but you won't have to rely on anyone else to fix them.

And I do not right now. I manage my own VPS. The problems begin when others affect mine. And here I can't do anything. Hence tickets.

Anyway. I'm going to look at the DNS service Linode provides (I need two NS servers, right?) and buy a VPS from them. I have exactly two weeks to configure, test, play and decide if it's a better solution.

(02 Dec 2011, 23:13 )catransvestic Wrote: [ -> ]albeit none with large quantities of image files

Read-only is not a big problem. Read-write is a problem.

Funny, everything works very smoothly and fast right now. But it's so unstable when the disk load increases.

Just ordered and created a new VPS. Sooooo fast!!! OK, let's hope for the best.