The Answer Gang

By Jim Dennis, Ben Okopnik, Dan Wilder, Breen, Chris, and the Gang, the Editors of Linux Gazette... and You!
Send questions (or interesting answers) to linux-questions-only@ssc.com

There is no guarantee that your questions here will ever be answered. Readers at confidential sites must provide permission to publish. However, you can be published anonymously - just let us know!

Please need help !!! ext2 problem !!!

From Angel Lacal

Answered By Thomas Adam, Mike Orr, John Karns, Heather Stern

Please, please, help me... I'm desperate !!!!!!!!!

[Thomas] Have some coffee, that always helps

I was shutting down our office server this evening when I realised that the samba daemon didn't stop cleanly... When I booted up again I saw the problem... A Windows 2000 PC in my office was still ON and connected to the server !!!!!!

[Thomas] It would be if the samba daemon (smbd) did not "shut down" as you say

We have three IDE disks:

hda: It's the boot disk
hdc: It's the disk where we work
hdd: It's a disk where we make the backups.

Problem: The backup was unfinished... so it was completely unusable.

[Mike] You'll have to delete the backup and run it again.

To protect against problems like this in the future, consider a journalling filesystem. In a journalling filesystem like ext3 and ReiserFS, there's a separate file where the filesystem logs every action before it does it. Then, if the computer gets shut down or crashes between steps or in the middle of a step, on the next boot, it can use the journal to continue where it left off. See the article http://www.linuxgazette.com/issue68/dellomodarme.html for more information. Ext3 claims to be backwards compatible with ext2, and is more mature now than when the article was written. Nevertheless, you may want to experiment with a journalling filesystem on a test machine first to get used to it before putting it on your production server.
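Because ext3 keeps the ext2 on-disk layout and adds a journal to it, an existing ext2 partition can usually be converted in place rather than reformatted. The sketch below is not from the thread: it assumes an e2fsprogs new enough to know about ext3 and a kernel that can mount it, and /dev/hdc1 and /work are placeholder names. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: convert an existing ext2 partition to ext3 in place.
# Device and mount point are placeholders, not taken from the thread.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

convert_to_ext3() {
    dev=$1
    mnt=$2
    run umount "$dev"
    run e2fsck -f "$dev"              # start from a filesystem known to be consistent
    run tune2fs -j "$dev"             # add a journal; existing data stays in place
    run mount -t ext3 "$dev" "$mnt"
}

# Example: convert_to_ext3 /dev/hdc1 /work
```

After a real conversion the filesystem type in /etc/fstab would also need changing from ext2 to ext3, otherwise the partition is still mounted as plain ext2 and the journal goes unused.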

[Heather] You might also consider forms of backups which are not complete copies of the filesystem... since the point is about recovery. Sure it makes things speedier if you can just drop a fresh drive into place, but it all depends. A copy of a well-configured kickstart floppy or 'dpkg --get-selections' list, and backups of the couple of dozen textfiles you had to tweak as a sysadmin before things were ready to start filling with data, can reduce the size of your regular backup tapes or cdrw media-packs to something much easier to cart around.

Problem2: After being warned that I had inconsistency problems, I logged in as root and ran e2fsck:

e2fsck /dev/hdd

GOD !!! It was full of bad Inodes, references, duplicates... etc...

[Thomas] Umm, definite data corruption. A pity that you did not post us a sample of the "ls -al" output. I would have been interested in it.

Problem3: I began to run e2fsck over /dev/hdc .... AND THE SAME THING !!! I hit CTRL-C ... just when it started... What do I do now ?????? Continue with e2fsck ??? Or try something else ???

[Mike]

********** !!! WAIT !!! WAIT !!! WAIT !!! ******************
*********** MAYDAY, MAYDAY!!!!! ****************************

You do NOT run fsck on entire drives (/dev/hdc, /dev/hdd). You run it on PARTITIONS (/dev/hdc1, /dev/hdd5, etc.), unless you've set up the entire drive as a single partition without a partition table, which is not normally done.

Of course, with floppies the partition and the drive are the same thing, since floppies don't have partition tables, but we're talking about hard disks.
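A cheap guard against that mistake is to look at the device name before handing it to a checker. The sketch below assumes the classic /dev/hdXN and /dev/sdXN naming, where a trailing number marks a partition. It is a naming heuristic only: it cannot tell whether a whole drive really does carry a bare filesystem, and the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: tell whole-drive names from partition names before running fsck.
# Assumes /dev/hdXN or /dev/sdXN naming; other schemes are not handled.
looks_like_partition() {
    case "$1" in
        /dev/[hs]d[a-z][0-9]*) return 0 ;;   # e.g. /dev/hdc1, /dev/hdd5
        *) return 1 ;;                        # e.g. /dev/hdc (whole drive)
    esac
}

for dev in /dev/hdc /dev/hdc1 /dev/hdd5; do
    if looks_like_partition "$dev"; then
        echo "$dev: partition, a reasonable fsck target"
    else
        echo "$dev: whole drive, list its partitions first (fdisk -l $dev)"
    fi
done
```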

Errr... Mmmm... yeah... I discovered that later... but, well... at last I noticed my mistake and could save the partition with the superblocks' secret position kindly given by mke2fs... heheehehe--

[John] From man fsck:

In actuality, fsck is simply a front-end for the various file system checkers (fsck.fstype) available under Linux. The file system-specific checker is searched for in /sbin first, then in /etc/fs and /etc, and finally in the directories listed in the PATH environment variable. Please see the file system-specific checker manual pages for further details.

So it wouldn't give any different results than e2fsck, since it would be running e2fsck.

It seems the partition is broken... The disk can't be mounted, obviously... What can I do ??????

PLEASE, PLEASE, PLEASE.... I'm desperate !!!!!!!!!!!!!!!!! HELP ME !!!!!!!!!

[Mike] Continue with fsck. That will get the partitions in a consistent state so that you can determine what is intact and what has been lost. Most of the time, fsck can fully repair things.

[Heather] I can vouch for that. One time Jim Dennis, who wasn't yet the famous Answer Guy at the time, really mangled a drive by, ummm, telling X to use the drive controller's memory as video controller memory. Darn typos.

[Mike] Like when Linus told his terminal emulator to dial his hard drive... that's what motivated him to implement file permissions in Linux.

[Heather] Anyways, he fsck'd it over and over because it was still fixing things. In 6 passes - the system was actually, to our shock, almost usable. Please don't try to repeat that at home, kids... unless you enjoy breaking things horribly.

[Mike] Sometimes it will put recovered files in /MOUNT_POINT/lost+found with filenames like "#12345" (since the original path/filename has been lost). If you get files like that, you'll have to determine whether they are worth keeping (they may be temporary files you don't care about), whether they are complete, and what their proper paths/filenames are.

Fortunately, another part of fsck gives a clue as to what the original filenames are. For every recovered file, there's an orphaned directory entry somewhere that's no longer attached to any file. Fsck will report these as "link count is wrong for file FILENAME, is 1, should be 0" or something like that, and will adjust the link count and/or delete the filename. Write down the filenames-without-files it reports, and use that list as you go through the lost+found files. Reconstruct the original files as best you can, and move them back to their original places.
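A first pass over lost+found can be as simple as listing what is there and how big each entry is, to set beside the list of orphaned filenames noted during the fsck run. This is a minimal sketch; the function name is made up, and /work/lost+found in the example is a placeholder for wherever the partition is mounted.

```shell
#!/bin/sh
# Sketch: print "name kind size" for every entry in a lost+found directory.
triage_lost_found() {
    dir=$1
    for f in "$dir"/*; do
        [ -e "$f" ] || continue              # empty directory: the glob matched nothing
        if [ -d "$f" ]; then
            kind=dir
            size=-
        else
            kind=file
            size=$(($(wc -c < "$f")))        # size in bytes, whitespace stripped
        fi
        echo "$(basename "$f") $kind $size"
    done
}

# Example: triage_lost_found /work/lost+found
```

Zero-length entries are rarely worth reconstructing; for the rest, looking at the contents is usually the only way to decide which orphaned filename each one belonged to.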

Normally, you don't have to deal with lost+found files at all, and it's even less common for them to contain important data. In your case, I can't tell whether the errors you describe are an ordinary fsck or something especially severe. It really depends on the quantity of errors. Here's a rundown of the most common errors (from memory):

"Deleted inode has zero dtime." -> unimportant.

"link count wrong for file FILENAME" -> may or may not be important. The filesystem has to modify the file itself (the inode) and the filename (a directory entry) separately, so the crash happened between the two steps. Fsck prefers to preserve data, so it usually does the right thing. You'll probably find that your file still exists under one of its names at least (or as a lost+found file), and then it'll just be a matter of redoing links to it that you already created or deleted.

"block bitmap differences." -> a few of these are common. If you get hundreds of them, I would be concerned. However, often it will correct those hundreds and you'll never have trouble with them again. It all depends on what caused the differences, which is something fsck doesn't know.

There are other common errors, but I can't remember them offhand.

After fsck successfully completes, run it again to look for more errors. Continue running it until you get the "clean" message, then run "fsck -f" once more. (The "-f" option forces fsck to run, even if it doesn't think it needs to.) Repeat for your other ext2 partitions.
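The repeat-until-clean routine can be sketched as a small loop. This assumes the exit statuses documented in the e2fsck manual page (0 for no errors, 1 or 2 for errors corrected, 4 and above for errors left uncorrected or an operational failure). The function name and the pass limit are made up for illustration.

```shell
#!/bin/sh
# Sketch: rerun a filesystem checker until it exits 0 ("no errors").
# Exit statuses follow the e2fsck manual: 1 and 2 mean errors were corrected,
# 4 or more means something was left unfixed or the run itself failed.
fsck_until_clean() {
    max_passes=$1
    shift                       # the remaining arguments are the checker command
    pass=1
    while [ "$pass" -le "$max_passes" ]; do
        status=0
        "$@" || status=$?
        echo "pass $pass: exit status $status"
        if [ "$status" -eq 0 ]; then
            return 0            # clean: nothing left to repair
        fi
        if [ "$status" -ge 4 ]; then
            return "$status"    # uncorrected errors or an operational failure
        fi
        pass=$((pass + 1))      # 1..3: errors were corrected, so check again
    done
    return 1                    # still finding errors after max_passes
}

# Example (hypothetical partition, must be unmounted):
#   fsck_until_clean 6 e2fsck -f /dev/hdc1
```

Passing -f on every pass is a deliberate simplification: it forces a full check each time, so the final forced run described above is already covered by the last pass.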

[John] If it were me in that situation and the data was very critical, I would use dd to do a raw copy of the partitions to another medium, either tape or another disk with sufficient capacity to hold the data. Then run the e2fsck, but perhaps specifying an alternate superblock with the -b option. See man e2fsck for more details.

Then if e2fsck fails to fix the problem satisfactorily, you would at least have the option of restoring from the dd dump and trying other options.
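A minimal sketch of that safety copy, assuming the destination has room for the whole partition. The function name and the paths in the example are placeholders. The cmp step rereads both sides, so it doubles the read time but confirms the copy is byte-for-byte identical before any repair is attempted.

```shell
#!/bin/sh
# Sketch: take a raw copy of a partition (or any file) and verify it.
image_partition() {
    src=$1
    dst=$2
    dd if="$src" of="$dst" bs=64k || return 1
    cmp "$src" "$dst"           # exits non-zero if the copy differs from the source
}

# Example (hypothetical paths; the source partition must not be mounted):
#   image_partition /dev/hdc1 /mnt/spare/hdc1.img
#   e2fsck -n /mnt/spare/hdc1.img    # e2fsck also accepts an image file
```

On a drive that is throwing read errors, plain dd stops at the first one; that case needs different handling and is outside this sketch.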

... our reader replies ...

Thanks a lot for your VERY helpful information,

[Mike] And thank YOU for telling us what succeeded. It would make an excellent 2-Cent Tip if it weren't already being formatted as an Answer Gang thread.

We're planning to switch to ext3, because we get a lot of these errors; win2K + linux-samba is a risky situation.

[Mike] I just saw yesterday afternoon that Red Hat 7.2 is out, and its default filesystem is ext3. That may encourage other distributions to switch too.

Ah... at last I could repair my damaged filesystem... Want to know how? At the deepest point of my desperation I ran "mke2fs" on the backup disk... And then I saw the light. Both damaged disks were very much alike, and when I formatted the backup disk, mke2fs gave me the clue: THE POSITION OF THE [SECRET REBEL BASE] BACKUP SUPERBLOCKS !!!!

They weren't at 8193 as the docs said, nor 32 as Linux Unleashed claims. They were at 32XXX, 9XXXX, 12XXXX (I don't remember the exact numbers) and so on...
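The numbers differ from the documented 8193 because backup superblock locations scale with the block size. As a sketch of the arithmetic, assuming mke2fs defaults (blocks per group equal to 8 x block size, the sparse_super feature, and a first data block of 1 only for 1024-byte blocks), the backups sit at the start of group 1 and of groups that are powers of 3, 5 and 7. The function name is made up; on a real system the output of mke2fs -n or dumpe2fs is the authority, not this calculation.

```shell
#!/bin/sh
# Sketch: where backup superblocks land under default mke2fs parameters.
# Assumes blocks-per-group = 8 * block size and sparse_super enabled.
backup_superblocks() {
    block_size=$1
    count=$2                                 # how many locations to print
    per_group=$((block_size * 8))
    if [ "$block_size" -eq 1024 ]; then
        first=1                              # 1 KiB blocks: block 0 is the boot block
    else
        first=0
    fi
    for g in 1 3 5 7 9 25 27 49 81 125; do   # group 1, then powers of 3, 5 and 7
        [ "$count" -gt 0 ] || break
        echo $((g * per_group + first))
        count=$((count - 1))
    done
}

backup_superblocks 1024 3    # 8193 24577 40961, one per line
backup_superblocks 4096 3    # 32768 98304 163840, one per line
```

With 1024-byte blocks the first backup is at 8193, the figure the docs quote. With 4096-byte blocks the list starts 32768, 98304, which at least fits the pattern of the first two numbers recalled above.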

So I tried directly "fsck -b [One of those superblock positions] -n /dev/hdc1" ... I saw that most of the inode messages disappeared. So I decided to run fsck without -n and pray. GOD !!! IT WORKED !!!
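The same information can be had without sacrificing a disk: mke2fs has its own -n option, which prints what it would do, backup superblock locations included, without writing anything. The sketch below assumes e2fsprogs is installed; /dev/hdc1 and 32768 in the example are placeholders. The locations mke2fs -n reports are only right if it is run with the same parameters used when the filesystem was first created. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: find backup superblocks non-destructively, then do a read-only check.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

list_backup_superblocks() {
    # mke2fs -n: show what would be done, create nothing.
    run mke2fs -n "$1"
}

check_from_backup() {
    dev=$1
    sb=$2
    # e2fsck -n: open read-only and answer "no" to every question.
    run e2fsck -n -b "$sb" "$dev"
}

# Example (hypothetical device and block number):
#   list_backup_superblocks /dev/hdc1
#   check_from_backup /dev/hdc1 32768
```

Only once the read-only pass looks sane does it make sense to repeat the e2fsck call without -n, as described above.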

But I had a little problem... The inode tables presumably "repaired" by fsck in the first, interrupted run were lost. But it was a minor problem. The rest of the disk was recovered.

So a little moral for this fairy tale: NEVER run fsck when the system asks you to.

[Mike] Normally, running fsck is what people want to do. Your case was unusual.

[Heather] fsck was wanted, but you needed the "secret rebel base" parameters to use a superblock that hadn't yet been afflicted by the reformatting side of The Force.

FIRST ask the people who know something about filesystems.

[Mike] Or at least, people who have run fsck many times and know what the typical errors are like, and what is a typical quantity of errors.

SECOND find all the information about your HD and run fsck with the -n option

THIRD weigh up the consequences.

FOURTH back your disk up with "dd" and mount the backup as a loop device

[Mike] I sometimes suggest people back up their disk with dd (dd if=/dev/hda of=/some/file/on/another/partition) but people usually say they don't have enough disk space for it.

[Heather] Which question really comes down to, how important is that data to you, really? One customer I had was very desperate - and there sure wasn't room to spare - we sent it up an ssh pipe to another system, while booted off of a mini-distro. Besides, if you can't trust the local hard disk... you're going to put the bits on the local hard disk? Maybe not!
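The pipe-it-elsewhere approach can be sketched as a function that feeds dd's output to whatever command is given, so the image never touches the suspect machine's own disks. This assumes the rescue system has dd and ssh available; the function name, host, user and remote path are all placeholders.

```shell
#!/bin/sh
# Sketch: stream a raw image into another command instead of a local file.
stream_image() {
    src=$1
    shift                        # the remaining arguments are the receiving command
    dd if="$src" bs=64k | "$@"
}

# Example (host, user and paths are placeholders):
#   stream_image /dev/hdc1 ssh backup@otherhost 'cat > /srv/images/hdc1.img'
```

The function returns the exit status of the receiving command, so a failed ssh connection shows up as a non-zero result.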

FIFTH... GO FOR IT !

Thanks for your attention. A happy user.

HTML script maintained by Heather Stern of Starshine Technical Services, http://www.starshine.org/

This page edited and maintained by the Editors of Linux Gazette Copyright © 2001 Published in issue 72 of Linux Gazette November 2001
