fsck segfault on a big partition, 4.6

by Rob Sheldon on 2010-01-26T23:42:23+00:00
Hi,
So, the short version is that I have a server with OpenBSD 4.6 that can't
fsck its big partition; fsck fails with a segfault every time. If I "ulimit
-d unlimited" before fsck'ing, it just takes a little longer to segfault.
It produces no other output. IIRC, the partition is roughly 6 TB. Two
questions then: is there any way through this that doesn't involve
newfs'ing the partition, and is there a "right" way to do a partition of
that size in OpenBSD given fsck's 1G hard limit?
The longer version: this is a backup server running backuppc for a
corporate client ("large enough number of workstations") that does research
work ("some really big files"). I _thought_ I had read the big filesystem
FAQ carefully, but somehow missed that fsck simply couldn't handle anything
over 1TB without doing funny things during the fs setup. So, this
particular partition was backuppc's data directory, and it was set up with
the default block sizes. Also possibly noteworthy: there's no swap, the OS
and other partitions are all running off of a USB flash drive for various
reasons.
If I have to wipe the partition and start over, it's not a disaster. This
was a newer server, the old backup server was still online and still had
some disk left, so I get to keep my butt out of a sling. But, if I'm going
to have to do that, then I also need to consider whether it might just be
better to use a different OS. (No foul intended, I'm a big fan of OpenBSD,
but it just might not be the right tool for this job.)
There's no dmesg attached because I'm not on-site with the server at the
moment, and because AFAICT this is a known problem.
Thanks,
- R.
--
[__ Robert Sheldon
[__ Founder, No Problem
[__ Information technology support and services
[__ Software and web design and development
[__ (530) 575-0278
[__ "You must be the change you wish to see in the world." -- Mahatma
Gandhi

Re: fsck segfault on a big partition, 4.6

by L. V. Lammert on 2010-01-27T00:14:57+00:00.
On Wed, 27 Jan 2010, Rob Sheldon wrote:
> Hi,
>
> So, the short version is that I have a server with OpenBSD 4.6 that can't
> fsck its big partition; fsck fails with a segfault every time. If I "ulimit
> -d unlimited" before fsck'ing, it just takes a little longer to segfault.
> It produces no other output. IIRC, the partition is roughly 6 TB. Two
> questions then: is there any way through this that doesn't involve
> newfs'ing the partition, and is there a "right" way to do a partition of
> that size in OpenBSD given fsck's 1G hard limit?
>
Don't know if this is related to a problem I had on a machine recently, ..
however I found that if I hung the 'bad' drive on ANOTHER machine, the
fsck ran just fine!
Might be worth a try, ..
Lee

Re: fsck segfault on a big partition, 4.6

by Tobias Ulmer on 2010-01-27T01:58:23+00:00.
On Wed, Jan 27, 2010 at 12:38:47AM +0000, Rob Sheldon wrote:
> Hi,
>
> So, the short version is that I have a server with OpenBSD 4.6 that can't
> fsck its big partition; fsck fails with a segfault every time. If I "ulimit
> -d unlimited" before fsck'ing, it just takes a little longer to segfault.
> It produces no other output. IIRC, the partition is roughly 6 TB. Two
> questions then: is there any way through this that doesn't involve
> newfs'ing the partition, and is there a "right" way to do a partition of
> that size in OpenBSD given fsck's 1G hard limit?
Amd64 allows 8G. Increase newfs blocksize to 64k (make sure you don't
run out of inodes), that should lessen the memory requirements a bit
and make fsck runs a little faster.
I have my doubts about OpenBSD as a (backup) file server with large
filesystems, there might be a more appropriate OS for the job.
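
As a rough illustration of why a larger block size should lower fsck's memory needs, here is a small sketch. It is not the estimate code mentioned elsewhere in the thread, and every constant in it is an assumption made for illustration: that fsck keeps one bit per fragment, that it keeps about 4 bytes of state per inode, and that newfs creates one inode per four fragments by default. The sector count is the size of sd0a from the disklabel posted later in the thread.

```python
# Rough, hypothetical estimate of fsck bookkeeping memory on an FFS partition.
# Every cost constant here is an assumption for illustration only; none of
# them is taken from the fsck_ffs or newfs source.

SECTOR_BYTES = 512
PARTITION_SECTORS = 11718749121   # size of sd0a in the disklabel posted later


def estimate_fsck_bytes(sectors, fsize, frags_per_inode=4,
                        state_bytes_per_inode=4):
    """Return (fragments, inodes, estimated bytes of fsck bookkeeping)."""
    size_bytes = sectors * SECTOR_BYTES
    fragments = size_bytes // fsize
    # Assumed inode density: one inode per `frags_per_inode` fragments.
    inodes = size_bytes // (frags_per_inode * fsize)
    bitmap_bytes = (fragments + 7) // 8            # assumed: one bit per fragment
    inode_bytes = inodes * state_bytes_per_inode   # assumed per-inode state
    return fragments, inodes, bitmap_bytes + inode_bytes


# Compare the default 16k/2k layout with a 64k/8k layout.
for bsize, fsize in ((16384, 2048), (65536, 8192)):
    frags, inodes, total = estimate_fsck_bytes(PARTITION_SECTORS, fsize)
    print(f"bsize={bsize} fsize={fsize}: {inodes} inodes, "
          f"about {total / 2**30:.2f} GiB")
```

Under these assumed constants the estimate shrinks by roughly a factor of four when the fragment size goes from 2k to 8k, and so does the inode count, which is why the advice above comes with a warning about running out of inodes.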

Re: fsck segfault on a big partition, 4.6

by Otto Moerbeek on 2010-01-27T05:50:31+00:00.
On Wed, Jan 27, 2010 at 12:38:47AM +0000, Rob Sheldon wrote:
> Hi,
>
> So, the short version is that I have a server with OpenBSD 4.6 that can't
> fsck its big partition; fsck fails with a segfault every time. If I "ulimit
> -d unlimited" before fsck'ing, it just takes a little longer to segfault.
> It produces no other output. IIRC, the partition is roughly 6 TB. Two
> questions then: is there any way through this that doesn't involve
> newfs'ing the partition, and is there a "right" way to do a partition of
> that size in OpenBSD given fsck's 1G hard limit?
No, there is no other way. I posted a small piece of code some time
ago that estimates, during newfs, the amount of memory needed for an fsck.
These days, amd64 is the only platform that increases the limit
(MAXDSIZE) to 8G. Even so, you are venturing into untested territory: we
(myself at least) just do not have the hardware to test anything
beyond 2T.
>
> The longer version: this is a backup server running backuppc for a
> corporate client ("large enough number of workstations") that does research
> work ("some really big files"). I _thought_ I had read the big filesystem
> FAQ carefully, but somehow missed that fsck simply couldn't handle anything
> over 1TB without doing funny things during the fs setup. So, this
> particular partition was backuppc's data directory, and it was set up with
> the default block sizes. Also possibly noteworthy: there's no swap, the OS
> and other partitions are all running off of a USB flash drive for various
> reasons.
The SEGVs may be related to not having swap. Running OpenBSD in
overcommitted state is not what you want.
>
> If I have to wipe the partition and start over, it's not a disaster. This
> was a newer server, the old backup server was still online and still had
> some disk left, so I get to keep my butt out of a sling. But, if I'm going
> to have to do that, then I also need to consider whether it might just be
> better to use a different OS. (No foul intended, I'm a big fan of OpenBSD,
> but it just might not be the right tool for this job.)
>
> There's no dmesg attached because I'm not on-site with the server at the
> moment, and because AFAICT this is a known problem.
A pity, since it does matter what platform you run on. fsck needing a
lot of memory is indeed a known problem, but the SEGVs are not. You
might want to check if they still occur when you have enough swap.
-Otto

Re: fsck segfault on a big partition, 4.6

by Rob Sheldon on 2010-01-27T12:55:00+00:00.
On Tue, 26 Jan 2010 19:10:47 -0600 (CST), "L. V. Lammert" wrote:
> On Wed, 27 Jan 2010, Rob Sheldon wrote:
>
> Don't know if this is related to a problem I had on a machine recently, ..
> however I found that if I hung the 'bad' drive on ANOTHER machine, the
> fsck ran just fine!
To be honest, I'm not sure how I'd set that up without a ton of effort.
The 6TB are done through multiple drives (raid 6) through an Areca raid
controller; without having an identical machine to swap the hardware into,
I don't think I could pull that off. Even if I did have an identical system
to do that with, I doubt it would gain me anything in this case.
Thanks for the tip though. :-)
- R.
--
[__ Robert Sheldon
[__ Founder, No Problem
[__ Information technology support and services
[__ Software and web design and development
[__ (530) 575-0278
[__ "You must be the change you wish to see in the world." -- Mahatma
Gandhi

Re: fsck segfault on a big partition, 4.6

by Rob Sheldon on 2010-01-27T13:12:47+00:00.
On Wed, 27 Jan 2010 07:42:42 +0100, Otto Moerbeek wrote:
> On Wed, Jan 27, 2010 at 12:38:47AM +0000, Rob Sheldon wrote:
>
>> Hi,
>
> These days, amd64 is the only platform that increases the limit
> (MAXDSIZE) to 8G. Though you venture into untested territory, we
> (myself at least) just do not have the hardware to test anything
> beyond 2T.
OK. I just went back and looked at the order sheet for this thing, and it
looks like it shipped with enough RAM to require amd64, so it should be
(had better be!) running that kernel.
I'd like to help, if at all possible. I should be able to get on-site with
the client for at least a couple of hours today, and I can probably draw
this out for a few days before I have to get the server back on-line. I can
provide a dmesg and any other system specs without too much trouble -- is
there any way to help track down the exact source of the segfault?
> The SEGVs may be related to not having swap. Running OpenBSD in
> overcommitted state is not what you want.
What do you mean by "overcommitted state" -- not enough resources? The
only thing this machine is supposed to do is run backuppc, which is just
rsync with some Perl scripts. The old backup server was doing the same job
with less resources for quite a while. The old server did have a swap
partition, but as near as I could tell it was rarely used. ...In fact, I
just logged in to the old server; it has an 8G swap partition, and top says
it's not using any of it.
So here's something I don't understand then: in the generic kernel, will
fsck allocate more than 1G if swap is available, or is it still limited to
just 1G?
>> There's no dmesg attached because I'm not on-site with the server at the
>> moment, and because AFAICT this is a known problem.
>
> A pity, since it does matter what platform you run on. fsck needing a
> lot of memory is indeed a known problem, but the SEGVs are not. You
> might want to check if they still occur when you have enough swap.
OK. I'll get that info to you, and anything else you need (that I can
handle), and I'll futz around with it and see if I can cable in a spare
drive for swap.
- R.
--
[__ Robert Sheldon
[__ Founder, No Problem
[__ Information technology support and services
[__ Software and web design and development
[__ (530) 575-0278
[__ "You must be the change you wish to see in the world." -- Mahatma
Gandhi

Re: fsck segfault on a big partition, 4.6

by Otto Moerbeek on 2010-01-27T13:36:35+00:00.
On Wed, Jan 27, 2010 at 02:06:20PM +0000, Rob Sheldon wrote:
> On Wed, 27 Jan 2010 07:42:42 +0100, Otto Moerbeek wrote:
> > On Wed, Jan 27, 2010 at 12:38:47AM +0000, Rob Sheldon wrote:
> >
> >> Hi,
> >
> > These days, amd64 is the only platform that increases the limit
> > (MAXDSIZE) to 8G. Though you venture into untested territory, we
> > (myself at least) just do not have the hardware to test anything
> > beyond 2T.
>
> OK. I just went back and looked at the order sheet for this thing, and it
> looks like it shipped with enough RAM to require amd64, so it should be
> (had better be!) running that kernel.
>
> I'd like to help, if at all possible. I should be able to get on-site with
> the client for at least a couple of hours today, and I can probably draw
> this out for a few days before I have to get the server back on-line. I can
> provide a dmesg and any other system specs without too much trouble -- is
> there any way to help track down the exact source of the segfault?
>
> > The SEGVs may be related to not having swap. Running OpenBSD in
> > overcommitted state is not what you want.
>
> What do you mean by "overcommitted state" -- not enough resources? The
> only thing this machine is supposed to do is run backuppc, which is just
> rsync with some Perl scripts. The old backup server was doing the same job
> with less resources for quite a while. The old server did have a swap
> partition, but as near as I could tell it was rarely used. ...In fact, I
> just logged in to the old server; it has an 8G swap partition, and top says
> it's not using any of it.
The point is that fsck_ffs needs loads of memory.
>
> So here's something I don't understand then: in the generic kernel, will
> fsck allocate more than 1G if swap is available, or is it still limited to
> just 1G?
Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per
process. What happens if more memory is allocated than the available
swap is that the kernel will kill random processes to free swap. That
might be what is going on in your case. Also, in some cases a lack of
physical memory might kill processes.
-Otto
>
> >> There's no dmesg attached because I'm not on-site with the server at the
> >> moment, and because AFAICT this is a known problem.
> >
> > A pity, since it does matter what platform you run on. fsck needing a
> > lot of memory is indeed a known problem, but the SEGVs are not. You
> > might want to check if they still occur when you have enough swap.
>
> OK. I'll get that info to you, and anything else you need (that I can
> handle), and I'll futz around with it and see if I can cable in a spare
> drive for swap.
>
> - R.
>
> --
> [__ Robert Sheldon
> [__ Founder, No Problem
> [__ Information technology support and services
> [__ Software and web design and development
> [__ (530) 575-0278
> [__ "You must be the change you wish to see in the world." -- Mahatma
> Gandhi
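
The per-process limit discussed above is the data segment resource limit, the one that "ulimit -d" adjusts. As a minimal sketch (not something posted in the thread), the current soft and hard values can be inspected with Python's standard resource module; the helper name describe_limit is made up for this example.

```python
import resource

GIB = 2**30


def describe_limit(value):
    """Format an rlimit value in GiB, or as 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / GIB:.1f} GiB"


# RLIMIT_DATA is the per-process data segment limit that "ulimit -d" changes.
soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
print("data size soft limit:", describe_limit(soft))
print("data size hard limit:", describe_limit(hard))
```

Checking the hard value before running fsck shows how far "ulimit -d unlimited" can actually raise the soft limit on a given machine.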

Re: fsck segfault on a big partition, 4.6

by frantisek holop on 2010-01-27T14:04:50+00:00.
hmm, on Wed, Jan 27, 2010 at 03:28:12PM +0100, Otto Moerbeek said that
> Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per
> process. What happens if more memory is allocated than the available
> swap is that the kernel will kill random processes to free swap. That
> might be what is going on in your case. Also, in some cases a lack of
> physical memory might kill processes.
the kernel will kill random processes? are we talking about linux's OOM
here or openbsd? since when is this in openbsd? i seem to recall
some debate where openbsd devs found that idea ridiculous. i know i do,
and the machine should panic instead of starting shooting down processes.
-f
--
to get a loan you must prove you don't need it.

Re: fsck segfault on a big partition, 4.6

by Joe Gidi on 2010-01-27T14:16:23+00:00.
On Wed, January 27, 2010 9:28 am, Otto Moerbeek wrote:
> Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per
> process. What happens if more memory is allocated than the available
> swap is that the kernel will kill random processes to free swap. That
> might be what is going on in your case. Also, in some cases a lack of
> physical memory might kill processes.
>
> -Otto
Does this mean that amd64 can now handle >4G of RAM, or is that a separate
issue?
--
Joe Gidi
joe@entropicblur.com
On Wed, 27 Jan 2010 16:00:32 +0100, frantisek holop wrote:
> hmm, on Wed, Jan 27, 2010 at 03:28:12PM +0100, Otto Moerbeek said that
>
> the kernel will kill random processes? are we talking about linux's OOM
> here or openbsd? since when is this in openbsd? i seem to recall
> some debate where openbsd devs found that idea ridiculous. i know i do,
> and the machine should panic instead of starting shooting down processes.
I remember reading a thread here about killing random processes a long
time ago, but I don't recall the results of that. I can't find it (quickly)
in the archives.
If you (and all) don't mind, if there's going to be any debate about this,
I'd like to see it under a different thread instead.
- R.
--
[__ Robert Sheldon
[__ Founder, No Problem
[__ Information technology support and services
[__ Software and web design and development
[__ (530) 575-0278
[__ "You must be the change you wish to see in the world." -- Mahatma
Gandhi

Re: fsck segfault on a big partition, 4.6

by Ted Unangst on 2010-01-27T14:46:13+00:00.
On Wed, Jan 27, 2010 at 10:00 AM, frantisek holop wrote:
> hmm, on Wed, Jan 27, 2010 at 03:28:12PM +0100, Otto Moerbeek said that
>> Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per
>> process. What happens if more memory is allocated than the available
>> swap is that the kernel will kill random processes to free swap. That
>> might be what is going on in your case. Also, in some cases a lack of
>> physical memory might kill processes.
>
> the kernel will kill random processes? are we talking about linux's OOM
> here or openbsd? since when is this in openbsd? i seem to recall
> some debate where openbsd devs found that idea ridiculous. i know i do,
> and the machine should panic instead of starting shooting down processes.
Some archs will kill processes, some will panic. i386 and amd64
should both panic I believe.

Re: fsck segfault on a big partition, 4.6

by Robert on 2010-01-27T14:52:57+00:00.
frantisek holop wrote:
> the kernel will kill random processes? are we talking about linux's OOM
> here or openbsd? since when is this in openbsd? i seem to recall
> some debate where openbsd devs found that idea ridiculous. i know i do,
> and the machine should panic instead of starting shooting down processes.
>
> -f
Am I missing something here?
If the OS runs out of (any) memory then there is already a serious
problem. In such a case I would prefer that the kernel kills some random
applications but protects itself, so that I can login on the console and
check what's going on. It might even be possible to make a clean reboot
(avoiding a long fsck).
A kernel panic is IMHO the worst option.
?
Please explain your point of view, or why the devs consider it a bad
idea (a quick search on the list didn't show anything).
(I understand that in case of kernel development a panic would be useful
as it shows information, but I consider the "daily usage" case)
regards,
Robert
PS:
What is the actual situation in OpenBSD? Does it have some OOM killer?

Re: fsck segfault on a big partition, 4.6

by Otto Moerbeek on 2010-01-27T15:32:44+00:00.
On Wed, Jan 27, 2010 at 10:31:40AM -0500, Ted Unangst wrote:
> On Wed, Jan 27, 2010 at 10:00 AM, frantisek holop wrote:
> > hmm, on Wed, Jan 27, 2010 at 03:28:12PM +0100, Otto Moerbeek said that
> >> Depends on the arch. i386 is limited to 1G, amd64 is limited to 8G per
> >> process. What happens if more memory is allocated than the available
> >> swap is that the kernel will kill random processes to free swap. That
> >> might be what is going on in your case. Also, in some cases a lack of
> >> physical memory might kill processes.
> >
> > the kernel will kill random processes? are we talking about linux's OOM
> > here or openbsd? since when is this in openbsd? i seem to recall
> > some debate where openbsd devs found that idea ridiculous. i know i do,
> > and the machine should panic instead of starting shooting down processes.
>
> Some archs will kill processes, some will panic. i386 and amd64
> should both panic I believe.
Somewhere in my memory it says that on i386 at least, it can happen
that a trap handler isn't able to allocate a physical page, which
eventually leads to a SEGV of the process.
But my memory isn't what it used to be, and I do not have time to dig
into this further right now.
-Otto

Re: fsck segfault on a big partition, 4.6

by frantisek holop on 2010-01-27T18:11:52+00:00.
hmm, on Wed, Jan 27, 2010 at 04:35:19PM +0100, Robert said that
> If the OS runs out of (any) memory then there is already a serious
there's plenty of discussion about the virtues/stupidity
of the OOM killer approach, including various "pardon" policies.
google for "out of fuel linux" for amusement.
> problem. In such a case I would prefer that the kernel kills some
> random applications but protects itself, so that I can login on the
> console and check what's going on. It might even be possible to make
riiight. and how pray if that random process happens to be the
ssh daemon or some other process supporting your infrastructure?
if a process is out of control, i'd rather have the system complain
loudly and angrily. i am not keen on seeing mysterious missing
processes, user/customer complaints because of untraceable failures
of transactions, tasks, jobs, whatever.
-f
--
fish and guests smell in three days.

Re: fsck segfault on a big partition, 4.6

by Rob Sheldon on 2010-01-27T19:49:22+00:00.
On Wed, 27 Jan 2010 07:42:42 +0100, Otto Moerbeek wrote:
> On Wed, Jan 27, 2010 at 12:38:47AM +0000, Rob Sheldon wrote:
>
>> There's no dmesg attached because I'm not on-site with the server at the
>> moment, and because AFAICT this is a known problem.
>
> A pity, since it does matter what platform you run on. fsck needing a
> lot of memory is indeed a known problem, but the SEGVs are not. You
> might want to check if they still occur when you have enough swap.
OK, I was able to visit for a few minutes today, enough to get the machine
answering ssh again.
First, disklabel so you know what it actually has:
$ sudo disklabel sd1
# /dev/rsd1c:
type: SCSI
disk: SCSI disk
label: Transcend 4GB
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 488
total sectors: 7843840
rpm: 3600
interleave: 1
boundstart: 63
boundend: 7839720
drivedata: 0
16 partitions:
# size offset fstype [fsize bsize cpg]
a: 7839657 63 4.2BSD 2048 16384 1 # /
c: 7843840 0 unused
$ sudo disklabel sd0
# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: ARC-1220-VOL#00
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 729458
total sectors: 11718749184
rpm: 10000
interleave: 1
boundstart: 63
boundend: 3128808178
drivedata: 0
16 partitions:
# size offset fstype [fsize bsize cpg]
a: 11718749121 63 4.2BSD 2048 16384 1
c: 11718749184 0 unused
...and the dmesg...
$ dmesg
OpenBSD 4.6 (GENERIC.MP) #81: Thu Jul 9 21:26:19 MDT 2009
deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3486973952 (3325MB)
avail mem = 3370655744 (3214MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xcfedf000 (39 entries)
bios0: vendor Phoenix Technologies LTD version "1.2a" date 12/19/2008
bios0: Supermicro X7SB4/E
acpi0 at bios0: rev 2
acpi0: tables DSDT FACP _MAR MCFG APIC BOOT SPCR ERST HEST BERT EINJ SLIC
SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT
acpi0: wakeup devices PXHA(S5) PXHB(S5) PEX_(S5) LAN_(S5) USB4(S5)
USB5(S5) USB7(S5) ESB2(S5) EXP1(S5) EXP5(S5) EXP6(S5) USB1(S5) USB2(S5)
USB3(S5) USB6(S5) ESB1(S5) PCIB(S5) KBC0(S1) MSE0(S1) COM1(S5) COM2(S5)
PWRB(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Pentium(R) Dual-Core CPU E5200 @ 2.50GHz, 2494.07 MHz
cpu0:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,EST,TM2,CX16,xTPR,NXE,LONG
cpu0: 2MB 64b/line 8-way L2 cache
cpu0: apic clock running at 199MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Pentium(R) Dual-Core CPU E5200 @ 2.50GHz, 2493.75 MHz
cpu1:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,EST,TM2,CX16,xTPR,NXE,LONG
cpu1: 2MB 64b/line 8-way L2 cache
ioapic0 at mainbus0 apid 2 pa 0xfec00000, version 20, 24 pins
ioapic1 at mainbus0 apid 3 pa 0xfecc0000, version 20, 24 pins
ioapic2 at mainbus0 apid 4 pa 0xfecc0400, version 20, 24 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 2 (PXHA)
acpiprt2 at acpi0: bus 3 (PXHB)
acpiprt3 at acpi0: bus 4 (PEX_)
acpiprt4 at acpi0: bus 7 (EXP1)
acpiprt5 at acpi0: bus 13 (EXP5)
acpiprt6 at acpi0: bus 15 (EXP6)
acpiprt7 at acpi0: bus 17 (PCIB)
acpicpu0 at acpi0: C3, PSS
acpicpu1 at acpi0: C3, PSS
acpibtn0 at acpi0: PWRB
acpivideo0 at acpi0: IGD0
ipmi at mainbus0 not configured
cpu0: Enhanced SpeedStep 2493 MHz: speeds: 2500, 2400, 2000, 1600, 1200
MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 3200/3210 Host" rev 0x01
ppb0 at pci0 dev 1 function 0 "Intel 3200/3210 PCIE" rev 0x01: apic 2 int
16 (irq 5)
pci1 at ppb0 bus 1
ppb1 at pci1 dev 0 function 0 "Intel PCIE-PCIE" rev 0x09
pci2 at ppb1 bus 2
"Intel IOxAPIC" rev 0x09 at pci1 dev 0 function 1 not configured
ppb2 at pci1 dev 0 function 2 "Intel PCIE-PCIE" rev 0x09
pci3 at ppb2 bus 3
"Intel IOxAPIC" rev 0x09 at pci1 dev 0 function 3 not configured
ppb3 at pci0 dev 6 function 0 "Intel 3210 PCIE" rev 0x01: apic 2 int 16
(irq 5)
pci4 at ppb3 bus 4
ppb4 at pci4 dev 0 function 0 "Intel IOP333 PCIE-PCIX" rev 0x00
pci5 at ppb4 bus 5
arc0 at pci5 dev 14 function 0 "Areca ARC-1220" rev 0x00: apic 2 int 18
(irq 11)
arc0: 8 ports, 256MB SDRAM, firmware V1.46 2009-01-06
scsibus0 at arc0: 16 targets
sd0 at scsibus0 targ 0 lun 0: SCSI3
0/direct fixed
sd0: 5722045MB, 512 bytes/sec, 11718749184 sec total
ppb5 at pci4 dev 0 function 2 "Intel IOP333 PCIE-PCIX" rev 0x00
pci6 at ppb5 bus 6
uhci0 at pci0 dev 26 function 0 "Intel 82801I USB" rev 0x02: apic 2 int 16
(irq 5)
uhci1 at pci0 dev 26 function 1 "Intel 82801I USB" rev 0x02: apic 2 int 17
(irq 10)
uhci2 at pci0 dev 26 function 2 "Intel 82801I USB" rev 0x02: apic 2 int 18
(irq 11)
ehci0 at pci0 dev 26 function 7 "Intel 82801I USB" rev 0x02: apic 2 int 18
(irq 11)
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb6 at pci0 dev 28 function 0 "Intel 82801I PCIE" rev 0x02: apic 2 int 16
(irq 5)
pci7 at ppb6 bus 7
ppb7 at pci0 dev 28 function 4 "Intel 82801I PCIE" rev 0x02: apic 2 int 16
(irq 5)
pci8 at ppb7 bus 13
em0 at pci8 dev 0 function 0 "Intel PRO/1000MT (82573E)" rev 0x03: apic 2
int 16 (irq 5), address 00:30:48:ba:3e:00
ppb8 at pci0 dev 28 function 5 "Intel 82801I PCIE" rev 0x02: apic 2 int 17
(irq 11)
pci9 at ppb8 bus 15
em1 at pci9 dev 0 function 0 "Intel PRO/1000MT (82573L)" rev 0x00: apic 2
int 17 (irq 10), address 00:30:48:ba:3e:01
uhci3 at pci0 dev 29 function 0 "Intel 82801I USB" rev 0x02: apic 2 int 23
(irq 7)
uhci4 at pci0 dev 29 function 1 "Intel 82801I USB" rev 0x02: apic 2 int 22
(irq 10)
uhci5 at pci0 dev 29 function 2 "Intel 82801I USB" rev 0x02: apic 2 int 18
(irq 11)
ehci1 at pci0 dev 29 function 7 "Intel 82801I USB" rev 0x02: apic 2 int 23
(irq 7)
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb9 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0x92
pci10 at ppb9 bus 17
vga1 at pci10 dev 4 function 0 "ATI ES1000" rev 0x02
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
radeondrm0 at vga1: apic 2 int 22 (irq 10)
drm0 at radeondrm0
pcib0 at pci0 dev 31 function 0 "Intel 82801IR LPC" rev 0x02
ichiic0 at pci0 dev 31 function 3 "Intel 82801I SMBus" rev 0x02: apic 2
int 17 (irq 10)
iic0 at ichiic0
lm1 at iic0 addr 0x2d: W83627HF
wbng0 at iic0 addr 0x2f: w83793g
spdmem0 at iic0 addr 0x50: 2GB DDR2 SDRAM non-parity PC2-5300CL5
spdmem1 at iic0 addr 0x51: 2GB DDR2 SDRAM non-parity PC2-5300CL5
spdmem2 at iic0 addr 0x52: 2GB DDR2 SDRAM non-parity PC2-5300CL5
spdmem3 at iic0 addr 0x53: 2GB DDR2 SDRAM non-parity PC2-5300CL5
"Intel 82801I Thermal" rev 0x02 at pci0 dev 31 function 6 not configured
usb2 at uhci0: USB revision 1.0
uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb3 at uhci1: USB revision 1.0
uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb4 at uhci2: USB revision 1.0
uhub4 at usb4 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb5 at uhci3: USB revision 1.0
uhub5 at usb5 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb6 at uhci4: USB revision 1.0
uhub6 at usb6 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb7 at uhci5: USB revision 1.0
uhub7 at usb7 "Intel UHCI root hub" rev 1.00/1.00 addr 1
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0:
spkr0 at pcppi0
wbsio0 at isa0 port 0x2e/2: W83627HF rev 0x41
lm2 at wbsio0 port 0x290/8: W83627HF
lm1 detached
mtrr: Pentium Pro MTRR support
umass0 at uhub1 port 4 configuration 1 interface 0 "JetFlash Mass Storage
Device" rev 2.00/1.00 addr 2
umass0: using SCSI over Bulk-Only
scsibus1 at umass0: 2 targets, initiator 0
sd1 at scsibus1 targ 1 lun 0: SCSI2
0/direct removable
sd1: 3830MB, 512 bytes/sec, 7843840 sec total
uhidev0 at uhub5 port 1 configuration 1 interface 0 "Dell Dell USB Entry
Keyboard" rev 1.10/1.78 addr 2
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 modifier keys, 6 key codes
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
softraid0 at root
root on sd1a swap on sd1b dump on sd1b
...that's odd, it's showing swap (and dump) on sd1b, but there's no such
thing:
$ sudo df /dev/sd1b
df: /dev/sd1b: Device not configured
...maybe it really doesn't like running without swap?
Oh wait, it's showing only 3G of memory installed. I just physically
checked the machine, and it has 4 full banks of 2G each. amd64 should be
able to address that, right?
That could certainly explain why fsck is unhappy.
Thanks,
- R.
--
[__ Robert Sheldon
[__ Founder, No Problem
[__ Information technology support and services
[__ Software and web design and development
[__ (530) 575-0278
[__ "You must be the change you wish to see in the world." -- Mahatma
Gandhi
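
The numbers in the dmesg and disklabel output above can be cross-checked with a little arithmetic. The sketch below only restates values copied from that output; the variable names are made up for this example, and the remark about boundend is an observation about the figures, not a diagnosis confirmed anywhere in the thread.

```python
MIB = 2**20
SECTOR_BYTES = 512

# Values copied from the dmesg and disklabel output above.
real_mem_bytes = 3486973952
dimm_sizes_mib = [2048, 2048, 2048, 2048]   # four spdmem entries of 2GB each
sd0_total_sectors = 11718749184
sd0_cylinders = 729458
sd0_sectors_per_cylinder = 16065
sd0_boundend = 3128808178

# How much of the installed RAM does the kernel report?
seen_mib = real_mem_bytes // MIB
installed_mib = sum(dimm_sizes_mib)
print(f"kernel sees {seen_mib} MiB of {installed_mib} MiB installed")

# Size of the RAID volume, which should agree with the "sd0:" dmesg line.
volume_mib = sd0_total_sectors * SECTOR_BYTES // MIB
print(f"sd0 is {volume_mib} MiB")

# Observation only: boundend equals cylinders * sectors/cylinder reduced
# modulo 2**32. Whether this reflects a real 32-bit truncation somewhere
# is an assumption, not something the thread establishes.
wrapped = (sd0_cylinders * sd0_sectors_per_cylinder) % 2**32
print("boundend matches 32-bit wrap:", wrapped == sd0_boundend)
```

The memory figures confirm the gap noted in the post: the kernel reports well under half of the RAM that the four spdmem lines describe.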

Re: fsck segfault on a big partition, 4.6

by Otto Moerbeek on 2010-01-27T20:14:12+00:00.
On Wed, Jan 27, 2010 at 08:43:40PM +0000, Rob Sheldon wrote:
> On Wed, 27 Jan 2010 07:42:42 +0100, Otto Moerbeek wrote:
> > On Wed, Jan 27, 2010 at 12:38:47AM +0000, Rob Sheldon wrote:
> >
> >> There's no dmesg attached because I'm not on-site with the server at the
> >> moment, and because AFAICT this is a known problem.
> >
> > A pity, since it does matter what platform you run on. fsck needing a
> > lot of memory is indeed a known problem, but the SEGVs are not. You
> > might want to check if they still occur when you have enough swap.
>
> OK, I was able to visit for a few minutes today, enough to get the machine
> answering ssh again.
>
> First, disklabel so you know what it actually has:
>
> $ sudo disklabel sd1
> # /dev/rsd1c:
> type: SCSI
> disk: SCSI disk
> label: Transcend 4GB
> flags:
> bytes/sector: 512
> sectors/track: 63
> tracks/cylinder: 255
> sectors/cylinder: 16065
> cylinders: 488
> total sectors: 7843840
> rpm: 3600
> interleave: 1
> boundstart: 63
> boundend: 7839720
> drivedata: 0
>
> 16 partitions:
> # size offset fstype [fsize bsize cpg]
> a: 7839657 63 4.2BSD 2048 16384 1 # /
> c: 7843840 0 unused
>
> $ sudo disklabel sd0
> # /dev/rsd0c:
> type: SCSI
> disk: SCSI disk
> label: ARC-1220-VOL#00
> flags:
> bytes/sector: 512
> sectors/track: 63
> tracks/cylinder: 255
> sectors/cylinder: 16065
> cylinders: 729458
> total sectors: 11718749184
> rpm: 10000
> interleave: 1
> boundstart: 63
> boundend: 3128808178
> drivedata: 0
>
> 16 partitions:
> # size offset fstype [fsize bsize cpg]
> a: 11718749121 63 4.2BSD 2048 16384 1
> c: 11718749184 0 unused
>
> ...and the dmesg...
>
> $ dmesg
> OpenBSD 4.6 (GENERIC.MP) #81: Thu Jul 9 21:26:19 MDT 2009
> deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 3486973952 (3325MB)
> avail mem = 3370655744 (3214MB)
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xcfedf000 (39 entries)
> bios0: vendor Phoenix Technologies LTD version "1.2a" date 12/19/2008
> bios0: Supermicro X7SB4/E
> acpi0 at bios0: rev 2
> acpi0: tables DSDT FACP _MAR MCFG APIC BOOT SPCR ERST HEST BERT EINJ SLIC
> SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT
> acpi0: wakeup devices PXHA(S5) PXHB(S5) PEX_(S5) LAN_(S5) USB4(S5)
> USB5(S5) USB7(S5) ESB2(S5) EXP1(S5) EXP5(S5) EXP6(S5) USB1(S5) USB2(S5)
> USB3(S5) USB6(S5) ESB1(S5) PCIB(S5) KBC0(S1) MSE0(S1) COM1(S5) COM2(S5)
> PWRB(S3)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Pentium(R) Dual-Core CPU E5200 @ 2.50GHz, 2494.07 MHz
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,EST,TM2,CX16,xTPR,NXE,LONG
> cpu0: 2MB 64b/line 8-way L2 cache
> cpu0: apic clock running at 199MHz
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Pentium(R) Dual-Core CPU E5200 @ 2.50GHz, 2493.75 MHz
> cpu1:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,EST,TM2,CX16,xTPR,NXE,LONG
> cpu1: 2MB 64b/line 8-way L2 cache
> ioapic0 at mainbus0 apid 2 pa 0xfec00000, version 20, 24 pins
> ioapic1 at mainbus0 apid 3 pa 0xfecc0000, version 20, 24 pins
> ioapic2 at mainbus0 apid 4 pa 0xfecc0400, version 20, 24 pins
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 2 (PXHA)
> acpiprt2 at acpi0: bus 3 (PXHB)
> acpiprt3 at acpi0: bus 4 (PEX_)
> acpiprt4 at acpi0: bus 7 (EXP1)
> acpiprt5 at acpi0: bus 13 (EXP5)
> acpiprt6 at acpi0: bus 15 (EXP6)
> acpiprt7 at acpi0: bus 17 (PCIB)
> acpicpu0 at acpi0: C3, PSS
> acpicpu1 at acpi0: C3, PSS
> acpibtn0 at acpi0: PWRB
> acpivideo0 at acpi0: IGD0
> ipmi at mainbus0 not configured
> cpu0: Enhanced SpeedStep 2493 MHz: speeds: 2500, 2400, 2000, 1600, 1200
> MHz
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "Intel 3200/3210 Host" rev 0x01
> ppb0 at pci0 dev 1 function 0 "Intel 3200/3210 PCIE" rev 0x01: apic 2 int
> 16 (irq 5)
> pci1 at ppb0 bus 1
> ppb1 at pci1 dev 0 function 0 "Intel PCIE-PCIE" rev 0x09
> pci2 at ppb1 bus 2
> "Intel IOxAPIC" rev 0x09 at pci1 dev 0 function 1 not configured
> ppb2 at pci1 dev 0 function 2 "Intel PCIE-PCIE" rev 0x09
> pci3 at ppb2 bus 3
> "Intel IOxAPIC" rev 0x09 at pci1 dev 0 function 3 not configured
> ppb3 at pci0 dev 6 function 0 "Intel 3210 PCIE" rev 0x01: apic 2 int 16
> (irq 5)
> pci4 at ppb3 bus 4
> ppb4 at pci4 dev 0 function 0 "Intel IOP333 PCIE-PCIX" rev 0x00
> pci5 at ppb4 bus 5
> arc0 at pci5 dev 14 function 0 "Areca ARC-1220" rev 0x00: apic 2 int 18
> (irq 11)
> arc0: 8 ports, 256MB SDRAM, firmware V1.46 2009-01-06
> scsibus0 at arc0: 16 targets
> sd0 at scsibus0 targ 0 lun 0: SCSI3
> 0/direct fixed
> sd0: 5722045MB, 512 bytes/sec, 11718749184 sec total
> ppb5 at pci4 dev 0 function 2 "Intel IOP333 PCIE-PCIX" rev 0x00
> pci6 at ppb5 bus 6
> uhci0 at pci0 dev 26 function 0 "Intel 82801I USB" rev 0x02: apic 2 int 16
> (irq 5)
> uhci1 at pci0 dev 26 function 1 "Intel 82801I USB" rev 0x02: apic 2 int 17
> (irq 10)
> uhci2 at pci0 dev 26 function 2 "Intel 82801I USB" rev 0x02: apic 2 int 18
> (irq 11)
> ehci0 at pci0 dev 26 function 7 "Intel 82801I USB" rev 0x02: apic 2 int 18
> (irq 11)
> usb0 at ehci0: USB revision 2.0
> uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
> ppb6 at pci0 dev 28 function 0 "Intel 82801I PCIE" rev 0x02: apic 2 int 16
> (irq 5)
> pci7 at ppb6 bus 7
> ppb7 at pci0 dev 28 function 4 "Intel 82801I PCIE" rev 0x02: apic 2 int 16
> (irq 5)
> pci8 at ppb7 bus 13
> em0 at pci8 dev 0 function 0 "Intel PRO/1000MT (82573E)" rev 0x03: apic 2
> int 16 (irq 5), address 00:30:48:ba:3e:00
> ppb8 at pci0 dev 28 function 5 "Intel 82801I PCIE" rev 0x02: apic 2 int 17
> (irq 11)
> pci9 at ppb8 bus 15
> em1 at pci9 dev 0 function 0 "Intel PRO/1000MT (82573L)" rev 0x00: apic 2
> int 17 (irq 10), address 00:30:48:ba:3e:01
> uhci3 at pci0 dev 29 function 0 "Intel 82801I USB" rev 0x02: apic 2 int 23
> (irq 7)
> uhci4 at pci0 dev 29 function 1 "Intel 82801I USB" rev 0x02: apic 2 int 22
> (irq 10)
> uhci5 at pci0 dev 29 function 2 "Intel 82801I USB" rev 0x02: apic 2 int 18
> (irq 11)
> ehci1 at pci0 dev 29 function 7 "Intel 82801I USB" rev 0x02: apic 2 int 23
> (irq 7)
> usb1 at ehci1: USB revision 2.0
> uhub1 at usb1 "Intel EHCI root hub" rev 2.00/1.00 addr 1
> ppb9 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0x92
> pci10 at ppb9 bus 17
> vga1 at pci10 dev 4 function 0 "ATI ES1000" rev 0x02
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> radeondrm0 at vga1: apic 2 int 22 (irq 10)
> drm0 at radeondrm0
> pcib0 at pci0 dev 31 function 0 "Intel 82801IR LPC" rev 0x02
> ichiic0 at pci0 dev 31 function 3 "Intel 82801I SMBus" rev 0x02: apic 2
> int 17 (irq 10)
> iic0 at ichiic0
> lm1 at iic0 addr 0x2d: W83627HF
> wbng0 at iic0 addr 0x2f: w83793g
> spdmem0 at iic0 addr 0x50: 2GB DDR2 SDRAM non-parity PC2-5300CL5
> spdmem1 at iic0 addr 0x51: 2GB DDR2 SDRAM non-parity PC2-5300CL5
> spdmem2 at iic0 addr 0x52: 2GB DDR2 SDRAM non-parity PC2-5300CL5
> spdmem3 at iic0 addr 0x53: 2GB DDR2 SDRAM non-parity PC2-5300CL5
> "Intel 82801I Thermal" rev 0x02 at pci0 dev 31 function 6 not configured
> usb2 at uhci0: USB revision 1.0
> uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb3 at uhci1: USB revision 1.0
> uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb4 at uhci2: USB revision 1.0
> uhub4 at usb4 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb5 at uhci3: USB revision 1.0
> uhub5 at usb5 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb6 at uhci4: USB revision 1.0
> uhub6 at usb6 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb7 at uhci5: USB revision 1.0
> uhub7 at usb7 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> isa0 at pcib0
> isadma0 at isa0
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
> pckbc0 at isa0 port 0x60/5
> pckbd0 at pckbc0 (kbd slot)
> pckbc0: using irq 1 for kbd slot
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pcppi0 at isa0 port 0x61
> midi0 at pcppi0:
> spkr0 at pcppi0
> wbsio0 at isa0 port 0x2e/2: W83627HF rev 0x41
> lm2 at wbsio0 port 0x290/8: W83627HF
> lm1 detached
> mtrr: Pentium Pro MTRR support
> umass0 at uhub1 port 4 configuration 1 interface 0 "JetFlash Mass Storage
> Device" rev 2.00/1.00 addr 2
> umass0: using SCSI over Bulk-Only
> scsibus1 at umass0: 2 targets, initiator 0
> sd1 at scsibus1 targ 1 lun 0: SCSI2
> 0/direct removable
> sd1: 3830MB, 512 bytes/sec, 7843840 sec total
> uhidev0 at uhub5 port 1 configuration 1 interface 0 "Dell Dell USB Entry
> Keyboard" rev 1.10/1.78 addr 2
> uhidev0: iclass 3/1
> ukbd0 at uhidev0: 8 modifier keys, 6 key codes
> wskbd1 at ukbd0 mux 1
> wskbd1: connecting to wsdisplay0
> softraid0 at root
> root on sd1a swap on sd1b dump on sd1b
>
> ...that's odd, it's showing swap (and dump) on sd1b, but there's no such
> thing:
>
> $ sudo df /dev/sd1b
> df: /dev/sd1b: Device not configured
>
> ...maybe it really doesn't like running without swap?
>
> Oh wait, it's showing only 3G of memory installed. I just physically
> checked the machine, and it has 4 full banks of 2G each. amd64 should be
> able to address that, right?
No, currently the amount of physical memory an amd64 can address is limited.
-Otto
>
> That could certainly explain why fsck is unhappy.
>
> Thanks,
>
> - R.

Re: fsck segfault on a big partition, 4.6

by Rob Sheldon on 2010-01-27T20:47:37+00:00.
On Wed, 27 Jan 2010 22:06:19 +0100, Otto Moerbeek wrote:
>
> No, currently the amount of physical memory an amd64 can address is
> limited.
Well, F___. :-(
The rule here then is, if you've got a partition bigger than 1TB, you
*must* have swap?
- R.

Re: fsck segfault on a big partition, 4.6

by Brad Tilley on 2010-01-27T21:29:32+00:00.
On Wed, 27 Jan 2010 20:43 +0000, "Rob Sheldon" wrote:
[snip]
> softraid0 at root
> root on sd1a swap on sd1b dump on sd1b
>
> ...that's odd, it's showing swap (and dump) on sd1b, but there's no such
> thing:
>
> $ sudo df /dev/sd1b
> df: /dev/sd1b: Device not configured
>
> ...maybe it really doesn't like running without swap?
It's there. disklabel -vh sd1 and you'll see b is swap. Try swapctl as well... also dmesg | grep swap:
root on sd1a swap on sd1b dump on sd1b
^^^^^^^^^^^^
> Oh wait, it's showing only 3G of memory installed. I just physically
> checked the machine, and it has 4 full banks of 2G each. amd64 should be
> able to address that, right?
I think you would need a bigmem enabled kernel.

> That could certainly explain why fsck is unhappy.
>
> Thanks,
>
> - R.

Re: fsck segfault on a big partition, 4.6

by Stuart Henderson on 2010-01-27T21:33:29+00:00.
On 2010-01-27, Rob Sheldon wrote:
> The longer version: this is a backup server running backuppc for a
> corporate client ("large enough number of workstations") that does research
> work ("some really big files"). I _thought_ I had read the big filesystem
> FAQ carefully, but somehow missed that fsck simply couldn't handle anything
> over 1TB without doing funny things during the fs setup.
"The default is to create an inode for each 8192 bytes of data space".
They aren't especially funny things; if you have a fairly large
filesystem with files most people would now call "medium" or larger,
you'll probably be rather surprised at the difference in fsck time
if you lower the inode density a bit...
If it's not essential data I don't think I'd waste time trying
to fsck it. Force a read-only mount and copy off any backuppc config
you need first. Then disklabel, allocate some swap, consider
splitting into smaller chunks, and newfs with more appropriate
settings; you'll still have the main OS install on the other
partitions. Or, indeed, use a different OS if you prefer.
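
To put rough numbers on the inode-density point, here is a back-of-the-envelope sketch. It assumes the 512-byte sectors and the size of the `a` partition from the disklabel quoted earlier in the thread. The function names are made up for illustration, and the inode count newfs really picks also depends on cylinder-group layout, so treat the results as estimates only.

```shell
#!/bin/sh
# Rough estimate only: newfs rounds per cylinder group, so real counts differ.

SECTOR_BYTES=512            # from "512 bytes/sec" in the dmesg
PART_SECTORS=11718749121    # size of sd0a from the disklabel

# part_bytes SECTORS -> partition size in bytes
part_bytes() {
    echo $(( $1 * SECTOR_BYTES ))
}

# inode_estimate SECTORS BYTES_PER_INODE -> approximate number of inodes
inode_estimate() {
    echo $(( $1 * SECTOR_BYTES / $2 ))
}

echo "partition bytes:         $(part_bytes $PART_SECTORS)"
echo "inodes at  8192 B/inode: $(inode_estimate $PART_SECTORS 8192)"
echo "inodes at 65536 B/inode: $(inode_estimate $PART_SECTORS 65536)"
```

The density itself is what newfs's `-i` option sets; the 65536 value is only an example for comparison, not a recommendation for this filesystem.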

Re: fsck segfault on a big partition, 4.6

by Brad Tilley on 2010-01-27T21:43:36+00:00.
Whoops... re-reading, I see that I missed your disklabel output... sorry.
On Wed, 27 Jan 2010 17:25 -0500, "Brad Tilley" wrote:
> On Wed, 27 Jan 2010 20:43 +0000, "Rob Sheldon"
> wrote:
>
> [snip]
>
> > softraid0 at root
> > root on sd1a swap on sd1b dump on sd1b
> >
> > ...that's odd, it's showing swap (and dump) on sd1b, but there's no such
> > thing:
> >
> > $ sudo df /dev/sd1b
> > df: /dev/sd1b: Device not configured
> >
> > ...maybe it really doesn't like running without swap?
>
> It's there. disklabel -vh sd1 and you'll see b is swap. Try swapctl as
> well... also dmesg | grep swap:
>
> root on sd1a swap on sd1b dump on sd1b
> ^^^^^^^^^^^^
>
> > Oh wait, it's showing only 3G of memory installed. I just physically
> > checked the machine, and it has 4 full banks of 2G each. amd64 should be
> > able to address that, right?
>
> I think you would need a bigmem enabled kernel.
>
> > That could certainly explain why fsck is unhappy.
> >
> > Thanks,
> >
> > - R.

Re: fsck segfault on a big partition, 4.6

by nixlists on 2010-01-27T22:25:30+00:00.
On Wed, Jan 27, 2010 at 10:35 AM, Robert wrote:
> frantisek holop wrote:
>>
>> the kernel will kill random processes? are we talking about linux's OOM
>> here or openbsd? since when is this in openbsd? i seem to recall
>> some debate where openbsd devs found that idea ridiculous. i know i do,
>> and the machine should panic instead of starting shooting down processes.
>>
>> -f
>
> Am I missing something here?
> If the OS runs out of (any) memory then there is already a serious problem.
> In such a case I would prefer that the kernel kills some random
> applications
> but protects itself, so that I can login on the console and check what's
> going on. It might even be possible to make a clean reboot (avoiding a long
> fsck).
> A kernel panic is IMHO the worst option.
Why kill random processes that may not be misbehaving and/or cause a
kernel panic when you want to kill the process(es) that leak memory or
are hungry in the first place? It's possible to avoid kernel panics in
this case IMO, and not kill random processes.
When starting daemons (and other stuff you suspect can be hungry), you
can use the shell's 'ulimit' to tell the kernel to kill the process
should it try to allocate more memory than you think it needs.
look up setrlimit(2)
The 'chpst' utility from the 'runit' package or 'softlimit' from
daemontools is more convenient for this purpose than the shell. Many,
if not most people run their daemons without memory limits though.
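
A minimal sketch of the shell approach, assuming a POSIX-style sh with a `ulimit -d` that takes kilobytes (as ksh's does). `run_limited` is a hypothetical helper name, not a standard utility. Note that hitting the data limit normally makes the allocation fail rather than killing the process outright; what happens after that is up to the program.

```shell
#!/bin/sh
# run_limited KBYTES COMMAND [ARGS...]
# Runs COMMAND with its data-segment limit set to KBYTES kilobytes.
# The subshell keeps the lowered limit from affecting the calling shell.
run_limited() {
    (
        limit_kb=$1
        shift
        ulimit -d "$limit_kb" || exit 1
        exec "$@"
    )
}

# Example: show the limit the child actually sees (64 MB here).
run_limited 65536 sh -c 'ulimit -d'
```

In a daemon start script the last line would name the daemon instead of `sh -c 'ulimit -d'`.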

Re: fsck segfault on a big partition, 4.6

by Denis Doroshenko on 2010-01-27T23:58:19+00:00.
On 1/28/10, nixlists wrote:
> Why kill random processes that may not be misbehaving and/or cause a
> kernel panic when you want to kill the process(es) that leak memory or
> are hungry in the first place? It's possible to avoid kernel panics in
> this case IMO, and not kill random processes.
aren't you missing the point of the original comment made by Otto?
consider a situation, when all the processes in the system "are
behaving", none of them violates their rlimits, but they all together
have allocated more memory than the box contains (RAM + swap).
so the OS needs to do something. what should it do? should it just
panic? or maybe losing one process is better than losing them all?
then, what are the criteria for choosing processes to be killed?..
wondering if "random" means the process with PID 1 could be one of them...

Re: fsck segfault on a big partition, 4.6

by Johan Beisser on 2010-01-28T00:23:31+00:00.
On Wed, Jan 27, 2010 at 4:53 PM, Denis Doroshenko wrote:
> so the OS needs to do something. what should it do? should it just
> panic? or may be losing one process is better than losing them all?
> then, what are the criteria for choosing processes to be killed?..
>
> wondering if "random" means the process with PID 1 could be one of them...
The process killer in modern Linux 2.6 doesn't quite suffer from the same
stupidity as early versions. That doesn't mean I like it, but it's unlikely
to cause you nearly as much pain.
Back to your regularly scheduled OpenBSD fsck(8) discussion.

Re: fsck segfault on a big partition, 4.6

by nixlists on 2010-01-28T00:25:53+00:00.
On Wed, Jan 27, 2010 at 7:53 PM, Denis Doroshenko wrote:
> On 1/28/10, nixlists wrote:
>> Why kill random processes that may not be misbehaving and/or cause a
>> kernel panic when you want to kill the process(es) that leak memory or
>> are hungry in the first place? It's possible to avoid kernel panics in
>> this case IMO, and not kill random processes.
>
> aren't you missing the point of original comment made by Otto?
>
> consider a situation, when all the processes in the system "are
> behaving", none of them violates their rlimits, but they all together
> have allocated more memory than the box contains (RAM + swap).
The idea is to limit memory such that running out of RAM+swap is not
possible, or unlikely. You can set the limit on the allowed number of
processes as well.
You know how much memory you have, you know how much memory to give to
your processes. You can set limits. IOW, you should tell the system
which processes to kill when they use too much, and how many processes
to run - not let the system reach the OOM state and start killing
random processes (and I think this is stupid) or panic.
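
As a rough illustration of that budgeting, the sketch below divides the memory available by a per-process cap to get a process count that cannot overcommit. The "avail mem" figure is taken from the dmesg posted earlier in the thread (that box has no swap); the 64 MB cap and the `max_procs` name are assumptions made up for the example.

```shell
#!/bin/sh
# max_procs TOTAL_KB PER_PROC_KB -> how many processes fit if each one
# is allowed to grow to PER_PROC_KB (integer division, rounds down).
max_procs() {
    echo $(( $1 / $2 ))
}

AVAIL_KB=$(( 3370655744 / 1024 ))   # "avail mem" from the dmesg, no swap
PER_PROC_KB=65536                   # hypothetical 64 MB cap per process

echo "processes that fit: $(max_procs "$AVAIL_KB" "$PER_PROC_KB")"
```

The resulting count is the sort of number that would go to `ulimit -p`, with the cap itself going to `ulimit -d`.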
> so the OS needs to do something. what should it do? should it just
> panic? or may be losing one process is better than losing them all?
> then, what are the criteria for choosing processes to be killed?..
Again, the configuration should be such that reaching the OOM state is
unlikely. If this state is reached after all, I think letting the
kernel go berserk and kill random processes isn't helping much.
> wondering if "random" means the process with PID 1 could be one of them...

Re: fsck segfault on a big partition, 4.6

by bofh on 2010-01-28T01:28:43+00:00.
On Wed, Jan 27, 2010 at 8:14 PM, nixlists wrote:
> On Wed, Jan 27, 2010 at 7:53 PM, Denis Doroshenko
> wrote:
>> aren't you missing the point of original comment made by Otto?
>>
>> consider a situation, when all the processes in the system "are
>> behaving", none of them violates their rlimits, but they all together
>> have allocated more memory than the box contains (RAM + swap).
>
> The idea is to limit memory such that running out of RAM+swap is not
> possible, or unlikely. You can set the limit on the allowed number of
> processes as well.
$ ulimit -m
971876
$ dmesg | grep real\ mem
real mem = 1039691776 (991MB)
So... this box should run only one process?
$ ps -auxww|wc
54 713 4936
If I were to use the max memory usage of each process, I would need a
53Gig ram machine?
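
The arithmetic behind that figure can be sketched as follows, assuming one of the 54 lines counted by wc is the ps header and that every process is allowed the `ulimit -m` value shown above. `worst_case_kb` is a hypothetical helper name.

```shell
#!/bin/sh
# worst_case_kb NPROCS LIMIT_KB -> total KB if every process hit its limit
worst_case_kb() {
    echo $(( $1 * $2 ))
}

NPROCS=53          # 54 lines of ps output minus the header line
LIMIT_KB=971876    # value printed by "ulimit -m" above

total_kb=$(worst_case_kb "$NPROCS" "$LIMIT_KB")
echo "worst case: ${total_kb} KB (about $(( total_kb / 1024 )) MB)"
```

That lands in the same ballpark as the 53Gig mentioned, against the 991MB the box actually has.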
--
http://www.glumbert.com/media/shift
http://www.youtube.com/watch?v=tGvHNNOLnCk
"This officer's men seem to follow him merely out of idle curiosity."
-- Sandhurst officer cadet evaluation.
"Securing an environment of Windows platforms from abuse - external or
internal - is akin to trying to install sprinklers in a fireworks
factory where smoking on the job is permitted." -- Gene Spafford
learn french: http://www.youtube.com/watch?v=30v_g83VHK4

Re: fsck segfault on a big partition, 4.6

by Ted Unangst on 2010-01-28T02:57:32+00:00.
Obviously, as any competent sysadmin like nixlists knows, you should
restrict all your processes to a max of 20 megs.
On Jan 27, 2010, at 9:23 PM, bofh wrote:
> On Wed, Jan 27, 2010 at 8:14 PM, nixlists wrote:
>> On Wed, Jan 27, 2010 at 7:53 PM, Denis Doroshenko
>> wrote:
>>> aren't you missing the point of original comment made by Otto?
>>>
>>> consider a situation, when all the processes in the system "are
>>> behaving", none of them violates their rlimits, but they all
>>> together
>>> have allocated more memory than the box contains (RAM + swap).
>>
>> The idea is to limit memory such that running out of RAM+swap is not
>> possible, or unlikely. You can set the limit on the allowed number of
>> processes as well.
>
>
> $ ulimit -m
> 971876
> $ dmesg | grep real\ mem
> real mem = 1039691776 (991MB)
>
> So... this box should run only one process?
>
> $ ps -auxww|wc
> 54 713 4936
>
> If I were to use the max memory usage of each process, I would need a
> 53Gig ram machine?

Re: fsck segfault on a big partition, 4.6

by nixlists on 2010-01-28T04:46:19+00:00.
On Wed, Jan 27, 2010 at 9:23 PM, bofh wrote:
>> The idea is to limit memory such that running out of RAM+swap is not
>> possible, or unlikely. You can set the limit on the allowed number of
>> processes as well.
>
>
> $ ulimit -m
> 971876
> $ dmesg | grep real\ mem
> real mem = 1039691776 (991MB)
>
> So... this box should run only one process?
>
> $ ps -auxww|wc
> 54 713 4936
>
> If I were to use the max memory usage of each process, I would need a
> 53Gig ram machine?
Hmm seems like someone is playing dumb or trolling... Have you read
the man pages? Read setrlimit(2), read your shell's man page. Read the
login.conf man page.
$ man ksh:
[snip]
-d n Impose a size limit of n kilobytes on the size of the data
area.
-f n Impose a size limit of n blocks on files written by the
shell and its child processes (files of any size may be
read).
-H Set the hard limit only (the default is to set both hard
and soft limits).
-l n Impose a limit of n kilobytes on the amount of locked
(wired) physical memory.
-m n Impose a limit of n kilobytes on the amount of physical
memory used.
-n n Impose a limit of n file descriptors that can be open at
once.
-p n Impose a limit of n processes that can be run by the user
at any one time.
-S Set the soft limit only (the default is to set both hard
and soft limits).
-s n Impose a size limit of n kilobytes on the size of the
stack area.
-t n Impose a time limit of n CPU seconds spent in user mode to
be used by each process.
[/snip]
I use 'chpst' from the runit package in my run scripts though.
$ man chpst
[snip]
-m bytes
limit memory. Limit the data segment, stack segment,
locked physical pages, and total of all segment per
process to bytes bytes each.
-d bytes
limit data segment. Limit the data segment per
process to bytes bytes.
-o n limit open files. Limit the number of open file
descriptors per process to n.
-p n limit processes. Limit the number of processes per
uid to n.
-f bytes
limit output size. Limit the output file size to
bytes bytes.
-c bytes
limit core size. Limit the core file size to bytes
bytes.
[/snip]
I just use '-m' with it.
An additional layer of protection from setrlimit() is great to have
even if your daemon limits itself. If there's a bug and a process
starts eating away at memory, it will be killed.
Services as run by 'runit' are supervised by 'runsv' so if a daemon
dies (for any reason, just think of some reasons) it will get
restarted in a second. With runit you can configure some services not
to get restarted, run a script when a service exits, etc, etc. More
features than 'daemontools', but daemontools-compatible.
smarden.org/runit
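
One thing that is easy to trip over in the two excerpts above: ksh's `ulimit -d` takes kilobytes while chpst's `-m` takes bytes. Here is a small sketch that derives both from one figure in megabytes. The helper names, the 64 MB value and `./mydaemon` are placeholders, and the two commands are comparable rather than identical, since `chpst -m` also limits stack and locked pages.

```shell
#!/bin/sh
# ksh's "ulimit -d" takes kilobytes, chpst's "-m" takes bytes (per the
# man page excerpts above), so convert from one human-friendly MB figure.
mb_to_kb()    { echo $(( $1 * 1024 )); }
mb_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

LIMIT_MB=64    # hypothetical cap, pick your own

# Print the two comparable invocations; ./mydaemon is a placeholder.
echo "ulimit -d $(mb_to_kb $LIMIT_MB); exec ./mydaemon"
echo "exec chpst -m $(mb_to_bytes $LIMIT_MB) ./mydaemon"
```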

Re: fsck segfault on a big partition, 4.6

by Robert on 2010-01-28T05:36:41+00:00.
nixlists wrote:
> The idea is to limit memory such that running out of RAM+swap is not
> possible, or unlikely. You can set the limit on the allowed number of
> processes as well.
I do use ulimit / login.conf for some processes, but does anybody really
use it for *all possible* processes on each production machine?
Including the necessary research into what could be the max. memory they
*might* need in a spike situation?
I honestly doubt that...
So I think the "safe option" is so far to have enough physical RAM for
the usual workload (based on an estimate), and then add a generous swap
space for the worst cases.
Does this sound practical? Or am I running into other issues with a 20GB
swap?
regards,
Robert

Re: fsck segfault on a big partition, 4.6

by nixlists on 2010-01-28T07:44:55+00:00.
On Thu, Jan 28, 2010 at 1:24 AM, Robert wrote:
> nixlists wrote:
>>
>> The idea is to limit memory such that running out of RAM+swap is not
>> possible, or unlikely. You can set the limit on the allowed number of
>> processes as well.
>
> I do use ulimit / login.conf for some processes, but does anybody really use
> it for *all possible* processes on each production machine?
I set memory limits on most daemons. Especially on the 'net-connected
stuff for obvious reasons.
> Including the necessary research into what could be the max. memory they
> *might* need in a spike situation?
> I honestly doubt that...
Better estimate/guesstimate and limit some services than not at all.

Re: fsck segfault on a big partition, 4.6

by Kenneth R Westerback on 2010-01-28T12:14:38+00:00.
On Wed, Jan 27, 2010 at 10:48:01PM -0500, Ted Unangst wrote:
> Obviously, as any competent sysadmin like nixlists knows, you should
> restrict all your processes to a max of 20 megs.
64KB is enough for anyone. Giving people more resources they may
misuse is just "stupid". And swap is doubly so since if you properly
tote up all resources for all tasks and combination of tasks you
may ever run and limit them appropriately you will never use swap,
thus it is just a waste of disk space. Plus if you run with disk
cache enabled you are probably losing the data anyway.
In fact this whole 'virtual' memory thing is a crock. You should just
know what all of your tasks will use and hard code that into the
source. Then they can always run at the same physical address and
life is much better.
And don't get me started on the silliness of 'shared' libraries. Not
every task mix needs all those routines. So when we are statically
figuring out the physical memory locations for every combination of
tasks we will ever run we should determine exactly what routines
are needed and put them into each program. Preferably tailored to only
those situations we know the program will encounter. After all, those
error checks are for situations we will have thought about and avoided.
In short, this whole OS thing is a giant scam. And compilers ...
.... Ken
>
> On Jan 27, 2010, at 9:23 PM, bofh wrote:
>
> >On Wed, Jan 27, 2010 at 8:14 PM, nixlists wrote:
> >>On Wed, Jan 27, 2010 at 7:53 PM, Denis Doroshenko
> >> wrote:
> >>>aren't you missing the point of original comment made by Otto?
> >>>
> >>>consider a situation, when all the processes in the system "are
> >>>behaving", none of them violates their rlimits, but they all
> >>>together
> >>>have allocated more memory than the box contains (RAM + swap).
> >>
> >>The idea is to limit memory such that running out of RAM+swap is not
> >>possible, or unlikely. You can set the limit on the allowed number of
> >>processes as well.
> >
> >
> >$ ulimit -m
> >971876
> >$ dmesg | grep real\ mem
> >real mem = 1039691776 (991MB)
> >
> >So... this box should run only one process?
> >
> >$ ps -auxww|wc
> > 54 713 4936
> >
> >If I were to use the max memory usage of each process, I would need a
> >53Gig ram machine?