Thursday 27 December 2007

The attempted 1and1 cover up

I can't believe this: 1and1 are telling me that my server is pingable. I try from my IP address - bl**dy hell, it's working.

I log in via SSH, check the logs.... Hang on, someone from 1and1 rebooted the server 5 minutes ago!!!!
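For anyone wanting to make the same check, here is a minimal Python sketch of working out when a Linux box last booted. It assumes the standard /proc/uptime file, whose first field is the number of seconds since boot. The function names are made up for illustration, and this is not a record of the exact commands used at the time.

```python
from datetime import datetime, timedelta


def boot_time_from_uptime(uptime_text, now):
    """Return the boot time implied by the contents of /proc/uptime.

    uptime_text looks like "300.00 1234.56"; the first field is the
    number of seconds the machine has been up.
    """
    seconds_up = float(uptime_text.split()[0])
    return now - timedelta(seconds=seconds_up)


def read_boot_time(path="/proc/uptime"):
    """Read the uptime file (Linux only) and return the boot time."""
    with open(path) as f:
        return boot_time_from_uptime(f.read(), datetime.now())
```

If the boot time that read_boot_time() returns is only minutes old and you didn't reboot the box yourself, somebody else did.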


"I am fed up with 1and1 not taking me seriously.... obviously someone at 1and1 has rebooted it and now it is responsive.....
Next time read the notes....
I also can NOW access it via SSH - however that is because you will see from the server logs that it was only rebooted 5 minutes ago..... and it WASN'T me.......
There is still a problem as to why the Serial Console stops working, and that the SSH and all network connectivity also stops working - all of my other 1and1 servers are fully accessible...
This was meant to be escalated on Saturday - and yet it still isn't resolved.........
Stop rebooting my servers and thinking that this is an adequate fix......If you don't understand the problem I don't want you touching it, as it is obvious that very few people at 1and1 have the necessary intelligence or level of expertise to actually fault find this particular problem.

Regards

Netwarriors"

Getting Dizzy now

1and1 try to blame my inability to connect on where I am connecting from:

"Thank you for contacting us.
The contract id 16075630 has an IP address of 87.*.*.*.
This IP is pingable and accessible via ssh.
If you cannot reach it via ping or SSH then this may be a networking problem.
Please send in your IP address, you can find it by visiting a site such as
http://ipchicken.com and also send in a trace route.
This information will assist me in discovering the problem.
If you have any further questions please do not hesitate to contact us."

Finally a response but....

Hang on, I gave 1and1 the fault reference number, and the subject of the email also has tracking tokens in it to automatically match emails to faults and accounts... Notice the 1and1 engineer is the same engineer that couldn't be bothered last time. He knows the account, knows the fault, and the fault hasn't been closed yet, so what is going on?

"Thank you for contacting us.
In order to assist you, I need the contract that you are having a problem with.
Please reply back with the details.
If you have any further questions please do not hesitate to contact us.
--
Sincerely,
B**** E***

Dedicated Server Support
1&1 Internet"

5 days go past

What the f**k is going on........ 5 days have gone past and nothing from 1and1, not even the normal email from 1and1 stating that your email has been received....

Time to start chasing things up a bit

"Fault Ref: 8*******

I have raised a number of faults with regards to my Root Server, and yet 1and1 still have not rectified either the problem with the image or the problem with the hardware. For the 20th time since taking out my root server in October I still am unable to log in via the serial console connection.

I request therefore as promised under the above fault number that the issue be raised with your server fault team immediately………

As the only way to regain access to my server I am refusing to reboot it this time. Every time I reboot the server 1and1 tell me that there is no problem with the server because they can now logon…..this time I want 1and1 to see that there is actually a problem, and to stop treating me like some uneducated idiot that doesn’t have a clue about administering a Linux Server. This is a Fedora Core 6 (x64) minimal image. The only change I have made to the server is to install MySQL and to add another user and remove Root access for SSH………..

This does not explain why every 2-3 days I have to reboot the server via the 1and1 Control Panel. This is completely unacceptable and I need this sorted out now. I am supposed to have 24/7 support with this server, including Bank Holidays, and yet I have not received any further correspondence nor is the server fixed since Saturday 22nd December………..

You are now in breach of your contract with me, and demand a full refund for this appalling service. I also would like this server fixed today, without fail. I demand that someone from the server fault team contact me to resolve this issue. You have had nearly 2 months to solve this problem, and it still there.

Kind regards


Netwarriors"

Friday 21 December 2007

Erm, 24 hours gone by and server is down again

After re-imaging my server, which I hoped would rectify the issues I was having, the server died a death again. From now on I start emailing complaints@1and1.com to keep them in the picture:

"My server is yet again unresponsive this morning, for the 20th time since I took the server out with yourselves. This server as explained is a vital development server and cannot keep having to reboot it so that I can access it. This is unacceptable.
Nothing was added or changed to the server after re-imaging with Fedora Core 6 (x64) Minimal yesterday as advised, and yet the server is unresponsive to both SSH and the Serial Console.
This is totally unacceptable now as I have pointed out that this server has problems and it has been ignored by 1and1.....
This must be rectified immediately along with a full refund for the past two months I have been unable to use the server.....
I believe under this package I get a Hardware Guarantee and swap out and ask that my server be swapped out immediately. You are further delaying an important project of which I am now losing money and will be talking to my solicitors to seek compensation from 1and1 and it is obvious that you cannot provide the service I am paying for.
I expect this to be sorted out immediately without any delay, along with a refund for the pathetic service you are offering.
Regards


Netwarriors"

Thursday 20 December 2007

Apologies from 1and1

I receive an apologetic email from 1and1:

"Thank you for contacting us.
Sorry for the misunderstanding. We will forward your issue to our server department for them to double check further. For compensation benefits we suggest you email directly our complaints department at complaints@1and1.com.
If you have any further questions please do not hesitate to contact us.
--
Sincerely,
H****** A*** F*****
Technical Support
1&1 Internet
---------------"

Getting Angry now

I take the opportunity to point out that reading the fault is important, and to also point out a couple of important issues:

1) RAID should only rebuild disks if there is an error. If the RAID is rebuilding disks then there is an error - and if there is an error it should be investigated. RAID shouldn't have to rebuild disks after every CLEAN shutdown. If it does, this is indicative of a further fault.

2) I have re-imaged the server due to 1and1 having an error in their image

3) I have identified a potential problem with their images, dating back to November 2007, and therefore I deserve a refund.

4) And most important - I shouldn't have to reboot a server every time I want it to serve some information.....
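Point 1 is something you can check rather than argue about. Here is a minimal Python sketch, assuming the server uses Linux software RAID (md) and so exposes /proc/mdstat. That is an assumption about the setup, not something 1and1 confirmed, and the function names are made up for illustration.

```python
import re


def find_rebuilding_arrays(mdstat_text):
    """Return (array, action, percent) for each md array that is rebuilding.

    mdstat_text is the contents of /proc/mdstat. An array that is being
    rebuilt has a progress line such as:
        [==>........]  resync = 12.6% (13184/104320) finish=0.1min
    """
    results = []
    current_array = None
    for line in mdstat_text.splitlines():
        # Array header lines start at column 0, e.g. "md0 : active raid1 ..."
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current_array = header.group(1)
            continue
        # Progress lines appear only while a resync or recovery is running
        progress = re.search(r"(resync|recovery)\s*=\s*([\d.]+)%", line)
        if progress and current_array:
            results.append(
                (current_array, progress.group(1), float(progress.group(2)))
            )
    return results


def read_mdstat(path="/proc/mdstat"):
    """Read the md status file (Linux with software RAID only)."""
    with open(path) as f:
        return find_rebuilding_arrays(f.read())
```

Run read_mdstat() straight after a clean reboot. An empty list means nothing is being rebuilt, which is what a healthy array should show. Anything else after a clean shutdown is the behaviour I am complaining about.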


"I think you are missing the point....
I had to reboot my server to get it to come back up again. This isn't normal behaviour and you will see that this is the fourth time I have had to reboot my server to get it to respond.
I also notice from various newsgroups that the Fedora Core 6 image that 1and1 created for the Business Root Server 1 wasn't done correctly causing the Softlock error for the CPU.
I have re-imaged my server with your new Fedora Core 6 minimal image to see if the new image you have created has stopped the bugs...
I would appreciate it H****** if you would read the ticket and what I have done, rather than just react to your opinions.......
I want a refund for the last two months as it is either a poor image that 1and1 created that has been causing the server to stop responding - as mentioned here:
http://www.1and1faq.com/forums/showthread.php?p=4866 OR there is a fault with the hardware.
Where I appreciate that in previous emails relating to this same problem it would appear that RAID was in fact doing its job and re-synching the disks, this is still not adequate behaviour for a server and RAID is a mechanism to stop data being lost rather than it being relied upon to resync my disks every time the server is rebooted..... This again points to an underlying problem with either the hardware or the image that 1and1 have supplied for these servers.....

Regards

Netwarriors"

Bu**er it's Christmas

Well, no surprise really - I got no response, so I started doing some digging around. According to a number of newsgroups, the SOFTLOCK error on the CPU was caused by a bug in the Fedora Core 6 image that 1and1 were installing on the servers.

As I doubted that I would get a response, even though the support contract does actually cover Bank Holidays - Christmas included - I decided to re-image the server........

Considering 1and1 can see when your server is rebooted and re-imaged, it was incredibly annoying to then receive this message from 1and1:


"Thank you for contacting us.
We have checked your issue then tried to ping your server here in our end and it is responding. Can you try to reboot your server into normal mode then access again serial console.
If you have any further questions please do not hesitate to contact us.
--
Sincerely,
H***** A*** F*****
Technical Support
1&1 Internet"

F**k it, this is cr*p

Couldn't believe this, so I decided that the next time the server became unresponsive I would leave it alone and let 1and1 get on with it..... Due to other commitments, I sort of left the mess with 1and1 until the problem presented itself again - as I knew it would.

I had been really busy, so don't know exactly when it happened, but I didn't notice it until I started winding down for the Christmas Holidays and was eager to pick up the development where I left off....

Open up my SSH Client - connect to my server.....Oh, no login prompt.

Open up my SSH Client - connect to the Serial Console - what's all this crap over the screen?


"login as: u*********
Using keyboard-interactive authentication.
Password:
Trying 172.19.95.73...
Escape character is '^]'.
BUG: soft lockup detected on CPU#0!
Call Trace:
[] softlockup_tick+0xfa/0x120 [] __do_softirq+0x5f/0xd0 [] update_process_times+0x57/0x90 [] smp_local_timer_interrupt"


Everything is unresponsive again, so let's try Support again:

"Problem still exists with my Root Server.
I cannot log on to my Root Server this morning via SSH. Whilst connecting via the Serial Console I am greeted with the following message:
login as: u46918866
Using keyboard-interactive authentication.
Password:
Trying 172.19.95.73...
Escape character is '^]'.
BUG: soft lockup detected on CPU#0!
Call Trace:
[] softlockup_tick+0xfa/0x120 [] __do_softirq+0x5f/0xd0 [] update_process_times+0x57/0x90 [] smp_local_timer_interrupt
Please can you arrange for this hardware to be replaced Urgently. I have waited for nearly 2 months for 1and1 to resolve the issues I am experiencing with this hardware.
I also would like to talk to your Customer Support Department for a refund for the past two months as this server has been unusable.
Regards

Netwarriors"