RE: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix

All,
Sorry to come in late and top post. I've been out of the office and I'm
trying to get to the gist of this issue.
Exactly what is the problem? I'm not familiar with kdump so I don't have
a clue about what's going on. 
There are a couple of reset features supported by _some_ cciss
controllers. I'd have to go back to the open spec to see what's in the
public domain. We're trying to get the open spec updated and more
complete but we're waiting on the lawyers. :(

mikem

> -----Original Message-----
> From: Eric W. Biederman [mailto:[email protected]] 
> Sent: Monday, June 26, 2006 11:01 AM
> To: [email protected]
> Cc: Maneesh Soni; Andrew Morton; [email protected]; 
> [email protected]; Miller, Mike (OS Dev); 
> [email protected]; [email protected]
> Subject: Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver 
> initialization issue fix
> 
> Vivek Goyal <[email protected]> writes:
> 
> > On Mon, Jun 26, 2006 at 08:17:27AM -0600, Eric W. Biederman wrote:
> >> Vivek Goyal <[email protected]> writes:
> >> 
> >> > On Mon, Jun 26, 2006 at 07:41:00AM +0530, Maneesh Soni wrote:
> >> >
> >> > Maneesh, keeping this code under a config option becomes a problem
> >> > when we have a relocatable kernel. At some point we have got to
> >> > have a relocatable kernel so that people don't have to build two
> >> > kernels. In fact this is becoming a pain area for distros. That's
> >> > the reason I thought of making it a command line parameter.
> >> 
> >> Ok. Even if we do this with a command line, we need to have a clean
> >> concept. If the concept is ignore devices with a brittle init
> >> routine, that is comprehensible and potentially useful for other
> >> reasons than crash dumps.
> >> 
> >
> > Looks like there are two problems to be solved.
> >
> > - Framework/capability to mark and isolate the drivers, either at
> >   compile time or run time, which are not hardened enough to
> >   initialize properly when the underlying device is in an operational
> >   or unknown state.
> >
> > - Actually hardening a driver to be able to initialize in a
> >   potentially unreliable environment.
> >
> >
> > Solving the first problem will help more in terms of people knowing in
> > advance that certain drivers are known to have problems in specific
> > environments, and a user has the option of skipping the
> > execution/compilation of those drivers. (This is something close to
> > what CONFIG_EXPERIMENTAL does.)
> >
> > The second problem deals more with actually hardening the driver and
> > not skipping its compilation/execution.
> >
> > I think people would like to change a driver's behaviour at run time.
> > For example, if they are booting in an unreliable environment they
> > would like to reset the device; otherwise they would skip that, as the
> > BIOS has already done that for them.
> 
> In the general case the device reset does not hurt.  Yes 
> there is the case of the slow scsi probe.  But a lot of that 
> appears to be a poor implementation of the scsi probe.  So I 
> can see a kernel command line option to play fast and loose 
> but we should be safe and thorough by default.
> 
> The more code paths you introduce the harder code is to 
> maintain and test.  The earlier discussion suggested you 
> cannot harden some drivers.  We can take action against 
> drivers like that simply and easily.
> 
> Hacks in the driver initialization are a completely different story.
> 
> > But it looks like not all devices have the capability to be reset
> > from software. In those cases one probably needs to put in some
> > hooks, relax the driver's consistency checks, etc. in a special boot
> > environment.
> 
> Forget the concept of a special boot environment.  A buggy
> BIOS or rebooting after being in Windows can potentially have
> the same effect as a kdump environment.
> 
> > Here I am trying to solve the second problem so that a driver comes
> > to know that it is initializing in a special boot environment and it
> > can modify its behavior at run time.
> 
> As Andrew said, that encourages hacks.
> For the specific megaraid example it would be simple enough 
> to always ignore the condition and just print a warning.
> 
> There is no such thing as a special boot environment; there
> are only quality of implementation differences.  And in a
> kexec on a panic scenario the quality of implementation is terrible.
> 
> >> If the concept is crashdump, it is a poorly defined concept and all
> >> of Andrew's objections apply.
> >> 
> >
> > I think this parameter is generic enough and not limited to
> > crashdumps. If a user decides to implement a different scheme than
> > kdump in kexec on panic and boot a customized kernel, he can very
> > well use this parameter to make sure that the next kernel is able to
> > at least boot and not panic in between.
> 
> The name crashboot is certainly not generic enough to make it 
> clear what it means or to make it sound interesting outside 
> of a crashdump scenario.
> 
> > Solving the first problem will help in doing a plain kexec. We can
> > simply mark the drivers known to have problems and slowly people can
> > fix those drivers. Fixing the driver in this case is different
> > because most likely driver authors will provide a shutdown routine in
> > the driver so that the device can be shut down and then one can boot
> > into the second kernel. Till then a user can happily skip the drivers
> > known to have problems.
> >
> > In summary, we have two problems to solve. Currently I am focused on
> > solving the second problem, which enables booting a kernel in an
> > unreliable environment, doing some minimal specific operation, and
> > then booting back to the regular kernel. So I think just introducing a
> > command line parameter which drivers can use to determine that they
> > are initializing in a special environment solves it and is generic
> > enough.
> >
> > Options like CONFIG_BRITTLE_INIT or skipping execution of a brittle
> > driver based on a command line option seem to be the solution for
> > the first problem.
> 
> Among other things it is social engineering to solve the 
> first problem.
> 
> > Please correct me if I am wrong. I know little about drivers.
> >
> >
> >> > I remember a few months back, Eric had mentioned that he has
> >> > patches for a relocatable kernel ready for review for i386 and
> >> > x86_64. Eric, do you have any plans to post the patches for review?
> >> 
> >> I have some code that I keep intending to get to.  It has probably
> >> bit rotted since I wrote it, but it shouldn't be too bad to clean up.
> >> Unfortunately the whole crashdump thing is fairly low on my priority
> >> list.
> >> 
> >
> > I am willing to work on it. Building from scratch always takes more
> > time. If you are willing, I will be more than happy to build on top
> > of your patches.
> 
> I will see what I can dig up.
> 
> Eric
> 
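
For readers trying to picture the two positions in the quoted thread, here
is a rough user-space sketch, not kernel code and not anyone's actual patch.
It models Eric's suggestion that a driver should reset its device by default,
with a command line option to skip the reset and "play fast and loose". The
option name "fastprobe", the struct fake_dev, and both functions are
hypothetical names invented for this illustration; none of them come from the
cciss driver or from the patches under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `flag` appears as a whole space-separated word in `cmdline`. */
static bool cmdline_has_flag(const char *cmdline, const char *flag)
{
    assert(cmdline != NULL && flag != NULL);
    size_t len = strlen(flag);
    if (len == 0)
        return false;
    for (const char *p = cmdline; (p = strstr(p, flag)) != NULL; p++) {
        bool starts_word = (p == cmdline) || (p[-1] == ' ');
        bool ends_word = (p[len] == '\0') || (p[len] == ' ');
        if (starts_word && ends_word)
            return true;
    }
    return false;
}

/* Hypothetical device model: DEV_ACTIVE means a previous kernel left it running. */
enum dev_state { DEV_CLEAN, DEV_ACTIVE };

struct fake_dev {
    enum dev_state state;
    bool can_soft_reset;   /* not every controller supports a software reset */
    int resets;            /* how many times the device has been reset */
};

/*
 * Hypothetical driver init: reset by default, skip the reset only when the
 * (made-up) "fastprobe" option is present. Returns 0 on success, -1 if the
 * device is still in an unknown state and cannot be initialized safely.
 */
static int fake_driver_init(struct fake_dev *dev, const char *cmdline)
{
    bool skip_reset = cmdline_has_flag(cmdline, "fastprobe");

    if (!skip_reset && dev->can_soft_reset) {
        dev->state = DEV_CLEAN;
        dev->resets++;
    }
    return (dev->state == DEV_CLEAN) ? 0 : -1;
}
```

With a device left active by a crashed kernel, fake_driver_init() succeeds on
a plain command line because the reset runs, and fails if "fastprobe" is
given or the device cannot be reset from software. That last case is the one
Vivek raises in the thread, and the sketch does not attempt to solve it.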