Sunday 13 January 2008
No response from 1and1
Oh hang on, what's this in my inbox:
"Thank you for contacting us.
We understand that you're dedicated server had experienced some
downtime.
As you may or may not know, we are working around the clock on a move to
a new and greatly improved data center. We have been trying to move
into this new data center without any downtime or delays. However, we
know this isn't the case for you. We know that you have been seeing a
number of different issues, including downtime for your server.
We also know that our explanations and apologies only compensate for
your issues so much. This is why I have added one (1) month of free
hosting to your account. You will see this free month of hosting on
your next invoice/statement.
If you have any further questions please do not hesitate to contact us.
--
Sincerely,
J***** Q****
Technical Support
1&1 Internet"
One month's compensation? How long has this migration been taking?
Saturday 12 January 2008
Inform 1and1 of blog
My server is still non-responsive. If I reboot it, it locks up again after 3-4 days.....
"WHY IS IT THE 12TH OF JANUARY AND STILL NO REPLY TO MY EMAILS BELOW….. AS YOU WILL SEE IT HAS BEEN ATLEAST 5 DAYS BEFORE YOUR HAVE RESPONSED, AND BEFORE THAT IT WAS 2 WHOLE WEEKS. THIS FAULT WAS RAISED ON THE 20TH DECEMEBR 2007 AND YOU ARE STILL NOT TAKING IT SERIOUSLY.
I AM SUPPOSED TO HAVE 24/7 SUPPORT WITH HARDWARE SWAPOUT, AND MY SERVER IS STILL UNRESPONSIVE AND YOU HAVE THEREFORE BROKEN THAT CONTRACT.
I ALSO REQUIRE ALL MONIES SPENT ON THIS ACCOUNT BE CREDITED BACK IMMEDIATELY AS I STILL HAVE NOT BEEN ABLE TO USE THIS ACCOUNT…….
I EXPECT SOME SORT OF RESOPNSE TODAY AS THIS IS PATHETIC…..
IF YOU FAIL TO REPOND I WILL BE PUTTING UP A WEBSITE ON THE INTERNET PUBLISHING EVERY EMAIL I HAVE SENT YOU ALONG WITH YOUR POOR RESPONSES AND I WILL ENSURE THAT YOU START LOSING CUSTOMERS TO YOUR COMPETITORS. I WILL ALSO BE INCLUDING A DIARY OF EVENTS WITH EVERY RECORDED TELEPHONE CONVERSATION I HAVE HAD WITH YOUR SUPPORT STAFF……IS IT ANY SURPRISE WHY YOUR STAFF ARE LEAVING AND GOING TO OTHER COMPANIES SUCH AS RACKSPACE????
GET THIS SORTED……………."
Sunday 6 January 2008
Server locks up again
Within 48 hours of the supposed hardware test, the server locks up again, and once more the soft lockup CPU error appears in the logs:
"My server is still causing problems and locking up.
The logs state that the server stopped responding:
Jan 3 20:27:19 s15278325 kernel: BUG: soft lockup detected on CPU#0!
Jan 3 20:27:19 s15278325 kernel:
Jan 3 20:27:19 s15278325 kernel: Call Trace:
Jan 3 20:27:19 s15278325 kernel:
Jan 3 20:27:19 s15278325 kernel: [
Jan 3 20:27:19 s15278325 kernel: [
Jan 3 20:27:19 s15278325 kernel: [
Jan 3 20:27:19 s15278325 kernel: [
Jan 3 20:27:19 s15278325 kernel: [
Jan 3 20:27:19 s15278325 kernel: [
Jan 3 20:27:19 s15278325 kernel:
Jan 3 20:27:19 s15278325 kernel: [
Jan 3 20:27:19 s15278325 kernel: [
Jan 3 20:27:19 s15278325 kernel: [
Jan 3 20:27:19 s15278325 kernel: [
Jan 3 20:27:19 s15278325 kernel:
Jan 4 06:01:04 s15278325 dhclient: DHCPREQUEST on eth0 to 87.*.*.* port 67
Jan 4 06:01:47 s15278325 last message repeated 3 times
Also according to dmesg:
BUG: soft lockup detected on CPU#0!
Call Trace:
[
[
[
[
[
[
[
[
[
[
This server still has a problem and I need this sorted out. I understand that you have ran a full test on the hardware and have found no problems, however as you can see from the logs there is still a problem. I need this rectified – if you do not have the skills to resolve this issue then please escalate it urgently to a senior admin.
Kind regards
Netwarriors"
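For anyone wanting to pull these entries out of their own /var/log/messages, here is a minimal sketch in Python. It assumes the syslog line format quoted in the email above; the function name find_soft_lockups is made up for illustration and is not part of any tool.

```python
import re

# Matches kernel soft lockup lines in the syslog format quoted above, e.g.
# "Jan 3 20:27:19 s15278325 kernel: BUG: soft lockup detected on CPU#0!"
LOCKUP_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+"
    r"kernel: BUG: soft lockup detected on CPU#(\d+)!"
)


def find_soft_lockups(lines):
    """Return (timestamp, hostname, cpu) for every soft lockup line found."""
    hits = []
    for line in lines:
        match = LOCKUP_RE.match(line)
        if match:
            stamp, host, cpu = match.groups()
            hits.append((stamp, host, int(cpu)))
    return hits
```

Feeding it the lines of the log file, for example via open("/var/log/messages"), gives a dated list of lockups that can be pasted straight into a support ticket.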
Friday 4 January 2008
Logs are sent
Everyone at 1and1 is ignoring the problem and nobody cares........ Eventually I request that the hardware be tested, and 1and1 request authorisation....
I gave authorisation on 27th December 2007 and yet the server wasn't checked until 3rd January 2008!!!! Surprise, surprise, no faults are found, and yet nobody at 1and1 thinks it is strange that the server has to be rebooted for me to get access...
1and1 also don't think it is strange for the serial console to stop working....
This either means they set very low expectations for their network, or the calibre of people 1and1 employ is incredibly low....
Thursday 27 December 2007
The attempted 1and1 cover up
I log in via SSH, check the logs.... Hang on, someone from 1and1 rebooted the server 5 minutes ago!!!!
"I am fed up with 1and1 not taking me seriously.... obviously someone at 1and1 has rebooted it and now it is responsive.....
Next time read the notes....
I also can NOW access it via SSH - however that is because you will see from the server logs that it was only rebooted 5 minutes ago..... and it WASN'T me.......
There is still a problem as to why the Serial Console stops working, and that the SSH and all network connectivity also stops working - all of my other 1and1 servers are fully accessible...
This was meant to be escalated on Saturday - and yet it still isn't resolved.........
Stop rebooting my servers and thinking that this is an adequate fix......If you don't understand the problem I don't want you touching it, as it is obvious that very few people at 1and1 have the necessary intelligence or level of expertise to actually fault find this particular problem.
Regards
Netwarriors"
Getting Dizzy now
"Thank you for contacting us.
The contract id 16075630 has an IP address of 87.*.*.*.
This IP is pingable and accessible via ssh.
If you cannot reach it via ping or SSH then this may be a networking problem.
Please send in your IP address, you can find it by visiting a site such as http://ipchicken.com and also send in a trace route.
This information will assist me in discovering the problem.
If you have any further questions please do not hesitate to contact us."
Finally a response but....
"Thank you for contacting us.
In order to assist you, I need the contract that you are having a problem with.
Please reply back with the details.
If you have any further questions please do not hesitate to contact us.
--
Sincerely,
B**** E***
Dedicated Server Support
1&1 Internet"
5 days go past
Time to start chasing things up a bit
"Fault Ref: 8*******
I have raised a number of faults with regards to my Root Server, and yet 1and1 still have not rectified either the problem with the image or the problem with the hardware. For the 20th time since taking out my root server in October I still am unable to log in via the serial console connection.
I request therefore as promised under the above fault number that the issue be raised with your server fault team immediately………
As the only way to regain access to my server I am refusing to reboot it this time. Every time I reboot the server 1and1 tell me that there is no problem with the server because they can now logon…..this time I want 1and1 to see that there is actually a problem, and to stop treating me like some uneducated idiot that doesn’t have a clue about administering a Linux Server. This is a Fedora Core 6 (x64) minimal image. The only change I have made to the server is to install MySQL and to add another user and remove Root access for SSH………..
This does not explain why every 2-3 days I have to reboot the server via the 1and1 Control Panel. This is completely unacceptable and I need this sorted out now. I am supposed to have 24/7 support with this server, including Bank Holidays, and yet I have not received any further correspondence nor is the server fixed since Saturday 22nd December………..
You are now in breach of your contract with me, and demand a full refund for this appalling service. I also would like this server fixed today, without fail. I demand that someone from the server fault team contact me to resolve this issue. You have had nearly 2 months to solve this problem, and it still there.
Kind regards
Netwarriors"
Friday 21 December 2007
Erm, 24 hours gone by and server is down again
"My server is yet again unresponsive this morning, for the 20th time since I took the server out with yourselves. This server as explained is a vital development server and cannot keep having to reboot it so that I can access it. This is unacceptable.
Nothing was added or changed to the server after re-imaging with Fedora Core 6 (x64) Minimal yesterday as advised, and yet the server is unresponsive to both SSH and the Serial Console.
This is totally unacceptable now as I have pointed out that this server has problems and it has been ignored by 1and1.....
This must be rectified immediately along with a full refund for the passed two months I have been unable to use the server.....
I believe under this package I get a Hardware Guarantee and swap out and ask that my server be swapped out immediately. You are further delaying an important project of which I am now losing money and will be talking to my solicitors to seek compensation from 1and1 and it is obvious that you cannot provide the service I am paying for.
I expect this to be sorted out immediately without any delay, along with a refund for the pathetic service you are offering.
Regards
Netwarriors"
Thursday 20 December 2007
Apologies from 1and1
"Thank you for contacting us.
Sorry for the misunderstanding. We will forward your issue to our server department for them to double check further. For compensation benefits we suggest you email directly our complaints department at complaints@1and1.com.
If you have any further questions please do not hesitate to contact us.
--
Sincerely,
H****** A*** F*****
Technical Support
1&1 Internet
---------------"
Getting Angry now
I take the opportunity to point out that reading the fault is important, and also to point out a few important issues:
1) RAID should only rebuild disks if there is an error. If the RAID is rebuilding disks then there is an error - and if there is an error it should be investigated. RAID shouldn't have to rebuild disks after every CLEAN shutdown. If it does, this is indicative of a further fault.
2) I have re-imaged the server due to 1and1 having an error in their image.
3) I have identified a potential problem with their images, dating back to November 2007, and therefore I deserve a refund.
4) And most important - I shouldn't have to reboot a server every time I want it to serve some information.....
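Point 1 is easy to check for yourself. Here is a minimal sketch in Python, assuming dmesg output in the format quoted elsewhere in this blog; the function name unclean_arrays is hypothetical. It only lists which md arrays reported an unclean start; it does not say why.

```python
import re

# Matches the md driver message quoted in this blog, e.g.
# "md: md6: raid array is not clean -- starting background reconstruction"
UNCLEAN_RE = re.compile(
    r"md: (md\d+): raid array is not clean -- starting background reconstruction"
)


def unclean_arrays(dmesg_lines):
    """Return the names of md arrays that started a reconstruction at boot."""
    names = []
    for line in dmesg_lines:
        match = UNCLEAN_RE.search(line)
        if match:
            names.append(match.group(1))
    return names
```

If this returns the same array after every clean shutdown, that is the pattern complained about in point 1.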
"I think you are missing the point....
I had to reboot my server to get it to come back up again. This isn't normal behaviour and you will see that this is the forth time I have had to reboot my server to get it to respond.
I also notice from various newsgroups that the Fedora Core 6 image that 1and1 created for the Business Root Server 1 wasn't done correctly causing the Softlock error for the CPU.
I have re-imaged my server with your new Fedora Core 6 minimal image to see if the new image you have created has stopped the bugs...
I would appreciate it H****** if you would read the ticket and what I have done, rather than just react to your opinions.......
I want a refund for the last two months as it is either a poor image that 1and1 created that has been causing the server to stop responding - as mentioned here: http://www.1and1faq.com/forums/showthread.php?p=4866 OR there is a fault with the hardware.
Where I appreciate that in previous emails relating to this same problem it would appear that RAID was in fact doing it's job and re-synching the disks, this is still not adequate behaviour for a server and RAID is a mechanism to stop data being lost rather than it being relied upon to resync my disks every time the server is rebooted..... This again points to an underlying problem with either the hardware or the image that 1and1 have supplied for these servers.....
Netwarriors"
Bu**er it's Christmas
As I doubted that I would get a response, even though the support contract does actually cover Bank Holidays - Christmas included - I decided to re-image the server........
Considering 1and1 can see when your server is rebooted and re-imaged, it was incredibly annoying to then receive this message from 1and1:
"Thank you for contacting us.
We have checked your issue then tried to ping your server here in our end and it is responding. Can you try to reboot your server into normal mode then access again serial console.
If you have any further questions please do not hesitate to contact us.
--
Sincerely,
H***** A*** F*****
Technical Support
1&1 Internet"
F**k it, this is cr*p
I had been really busy, so don't know exactly when it happened, but I didn't notice it until I started winding down for the Christmas Holidays and was eager to pick up the development where I left off....
Open up my SSH Client - connect to my server.....Oh, no login prompt.
Open up my SSH Client - connect to the Serial Console - what's all this crap over the screen?
"login as: u*********
Using keyboard-interactive authentication.
Password:
Trying 172.19.95.73...
Escape character is '^]'.
BUG: soft lockup detected on CPU#0!
Call Trace:"
Everything is unresponsive again, so let's try Support again:
"Problem still exists with my Root Server.
I cannot logon to me Root Server this morning via SSH. Whilst connecting via the Serial Console I am greeted with the following message:
login as: u46918866
Using keyboard-interactive authentication.
Password:
Trying 172.19.95.73...
Escape character is '^]'.
BUG: soft lockup detected on CPU#0!
Call Trace:
Please can you arrange for this hardware to be replaced Urgently. I have waited for nearly 2 months for 1and1 to resolve the issues I am experiencing with this hardware.
I also would like to talk to your Customer Support Department for a refund for the passed two months as this server has been unusable.
Regards
Netwarriors"
Wednesday 21 November 2007
Erm..... what the F**k - Why 4 days to respond?
"Thank you for contacting us.
If DHCP is working properly now, it's likely this was a temporary issue.
I'm not aware of any ongoing issues to this extent.
The message you received about the raid simply means the raid had to reconstruct some data, this may have occurred if the system was shutdown improperly and thus some data hadn't been synchronized between the two drives. This does not in and of itself mean that a hard drive has failed, as this would present an entirely different set of errors.
If you have any further questions please do not hesitate to contact us.
--
Sincerely,
J***** B*********
Technical Support
1&1 Internet"
Saturday 17 November 2007
Where's my bloody light-sabre
"The DHCPACK did not happen until the server was rebooted - as you will see from the details shown at the bottom of this email, a DHCPACK was not received on NOV11 04:00 onwards - it was only after I initialised a reboot that a DHCPACK was received....
Last Sunday the Server became unresponsive and had to be forced a reboot form the Control Panel.
Kind regards
Netwarriors"
All servers have logs, and most logs have dates and times on them. How they managed not to see that a DHCPACK was not received amazes me. Nor were they able to comprehend that the DHCPACK only arrived after the server was rebooted - obviously an underlying issue with 1and1.
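Checking a log for this takes seconds. Here is a minimal sketch in Python, assuming dhclient entries in the syslog format quoted in this blog; the function name unanswered_requests is made up for illustration. It treats a DHCPREQUEST as unanswered if no DHCPACK line appears before the next DHCPREQUEST or the end of the log, and it skips the "last message repeated" lines.

```python
import re

# Matches dhclient entries in the syslog format quoted in this blog, e.g.
# "Nov 11 04:44:23 s15278325 dhclient: DHCPREQUEST on eth0 to ... port 67"
DHCP_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+dhclient(?:\[\d+\])?: (DHCP\w+)"
)


def unanswered_requests(lines):
    """Return timestamps of DHCPREQUEST lines that never got a DHCPACK."""
    pending = None      # timestamp of the request still waiting for an ack
    unanswered = []
    for line in lines:
        match = DHCP_RE.match(line)
        if not match:
            continue    # not a dhclient line (e.g. "last message repeated")
        stamp, kind = match.groups()
        if kind == "DHCPREQUEST":
            if pending is not None:
                unanswered.append(pending)
            pending = stamp
        elif kind == "DHCPACK":
            pending = None
    if pending is not None:
        unanswered.append(pending)
    return unanswered
```

Run against the Nov 11 extract I sent in, this reports the 04:44:23 request as unanswered, which is exactly the point support missed.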
Wow a response in less than a week
"Thank you for contacting us.
When was the last time the server shutdown unexpectantly?
I do not see anything indicative of a hard disk failure in the logs.
Do you get any errors when working with the file system?
If the server continues to reboot then we can have the hardware replaced.
I do see an ack from the DHCP server.
DHCPACK from 87.106.137.249
If you have any further questions please do not hesitate to contact us.
--
Sincerely,
B**** E***
Technical Support
1&1 Internet"
I would like to ask you all a question - look at the log I submitted earlier in the post - where was the DHCPACK???? When have I suggested the server keeps rebooting?
Try it out, then ask for information
"Password is the same as the initial password and hasn’t changed. The only thing is ROOT access is disabled via SSH, and will have to go through the serial console.
Kind regards
Netwarriors"
Finally a response from the Dark Side
"Thank you for contacting us.
Please reset the root password so we can futher investigate.
Reply back when the password has been reset to the value found in the
1&1 control panel.
If you have any further questions please do not hesitate to contact us.
--
Sincerely,
B**** E***
Dedicated Server Support
1&1 Internet"
Friday 16 November 2007
Surprise - no response
"I still haven't had a response to this email below:
------------------------------------------------
So can you explain this entry:
md: md6: raid array is not clean -- starting background reconstruction
Please also explain why there is no DHCPACK:
Nov 11 04:44:23 s15278325 dhclient: DHCPREQUEST on eth0 to
87.106.137.249
There must be a reason that the server stops responding, it's your Fedora Core 6 minimal image....and your hardware.
So please explain why the serial console stopped working.....
Kind regards
Netwarriors"
Thursday 15 November 2007
Return of the "Computer says no...."
Absolutely astonished at their previous response:
"So can you explain this entry:
md: md6: raid array is not clean -- starting background reconstruction
Please also explain why there is no DHCPACK:
Nov 11 04:44:23 s15278325 dhclient: DHCPREQUEST on eth0 to
87.106.137.249
There must be a reason that the server stops responding, it's your Fedora Core 6 minimal image....and you hardware.
So please explain why the serial console stopped working.....
Kind regards
Netwarriors"
The Philippines strike back.......
Why is it, then, that 3 days passed before I received a response:
"Thank you for contacting us.
I performed a memory and hard drive test on your server though found no errors at this time. Those messages below are not actually any sort of errors with the raid, what you are seeing is the raid autodetection process as the system looks for partitions with matching UUID's to enable as part of the raid.
If you have any further questions please do not hesitate to contact us.
--
Sincerely,
J***** B*********
Technical Support
1&1 Internet
---------------"
Monday 12 November 2007
In a galaxy far, far away........
The server specification included a Serial Console connection that gives the user access to the server should a serious mistake be made, such as accidentally locking yourself out so that SSH will no longer allow your connections. The Serial Console connects the user directly to a console on the server that is independent of SSH and is effectively the same as working on the server locally in 'text console' mode.
This is vital for a development server, as quite often a simple mistake can prevent the user from gaining access. For example, a poorly written script could use up a large % of the CPU time, leaving too little RAM or CPU available to open a new SSH connection to the server..... The Serial Console isn't affected by this, and on 90% of occasions it is possible to log on to the server and terminate the problematic script.
Three days after taking out my new server and selecting the Linux image I required - Fedora Core 6 (x64) - my server just stopped responding. Originally I thought this was a bit strange but didn't think much of it and decided to log on via the Serial Console........
Oh S**t, the Serial Console didn't work either. I logged on to my 1and1 control panel, and rebooted the server.........
After looking through the logs I noticed a number of concerning things, and sent the following to 1and1:
"I suspect that there is a hardware error with a new server package that I took out with 1and1. After about 3 days of non-activity the server becomes unresponsive. It is built with Fedora Core 6 minimal and the only things added to it were Apache, MySQL and PHP.
Logging in via Serial Console gives nothing and the server has to be rebooted from the 1and1 control panel before I get telnet access.
Dmesg shows what appears to be a failing RAID:
md: md6: raid array is not clean -- starting background reconstruction
raid1: raid set md6 active with 2 out of 2 mirrors
md: considering sdb5 ...
md: adding sdb5 ...
md: sdb1 has different UUID to sdb5
md: adding sda5 ...
md: sda1 has different UUID to sdb5
md: resync of RAID array md6
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
md: using 128k window, over a total of 4891648 blocks.
md: created md5
md: bind
md: bind
md: running:
raid1: raid set md5 active with 2 out of 2 mirrors
md: considering sdb1 ...
md: adding sdb1 ...
md: adding sda1 ...
md: created md1
md: bind
md: bind
md: running:
raid1: raid set md1 active with 2 out of 2 mirrors
md: ... autorun DONE.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 380k freed
md: Autodetecting RAID arrays.
It would also appear that there is a problem with you DHCP server as no DHCP request is acknowledged as per /var/log/messages:
Nov 11 04:07:06 s15278325 syslogd 1.4.1: restart.
Nov 11 04:44:23 s15278325 dhclient: DHCPREQUEST on eth0 to 87.106.137.249 port 67
Nov 11 04:45:07 s15278325 last message repeated 4 times
Nov 11 04:46:19 s15278325 last message repeated 5 times
Nov 11 04:47:34 s15278325 last message repeated 4 times
Nov 11 04:48:50 s15278325 last message repeated 5 times
Nov 11 04:49:55 s15278325 last message repeated 4 times
Nov 11 04:51:05 s15278325 last message repeated 4 times
Nov 11 04:52:10 s15278325 last message repeated 5 times
Nov 11 04:53:13 s15278325 last message repeated 6 times
Nov 11 04:54:16 s15278325 last message repeated 4 times
Nov 11 04:55:23 s15278325 last message repeated 7 times
Nov 11 04:56:43 s15278325 last message repeated 6 times
Nov 11 04:57:48 s15278325 last message repeated 5 times
Nov 11 04:58:51 s15278325 last message repeated 6 times
Nov 11 04:59:52 s15278325 last message repeated 4 times
This is the third time this has happened to this server and would appreciate someone looking into it. It is not yet a current production server so if 1and1 need to carry out reboot and testing on the server to check the hardware then this will be fine as it will not affect any of our services. I finish work on Friday and by the time I log on Monday morning the server hardware is non-responsive again.
This needs to be rectified ASAP as this is a development server for a large World Wide Record Company and this will be holding up their development for their global website.
Kind regards
Netwarriors"
Sunday 11 November 2007
The Beginning
Like many others, I needed a hosting company, but because this was just an 'idea' I didn't want to spend enormous amounts of money; I wanted to just try it and see how it went.
I did, for my sins, decide to host a server with 1and1 Internet, a choice I wish I had never made. Having been in the IT industry for 15 years, I have seen good hosting, and I have seen some very, very bad hosting....... I will leave you to read through this blog to decide which category 1and1 falls into.
This blog contains factual information regarding a fault I experienced with a server I had with 1and1. This information is not derogatory, and it is not embellished in any way. This blog accurately records the tribulations I have suffered whilst dealing with this company.