Re: sg regression in 2.6.16-rc5

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 1 Mar 2006, Douglas Gilbert wrote:

> Linus Torvalds wrote:
> > 
> > On Wed, 1 Mar 2006, Matthias Andree wrote:
> > 
> >>On Tue, 28 Feb 2006, Douglas Gilbert wrote:
> >>
> >>
> >>>You can stop right there with the 1 MB reads. Welcome
> >>>to the new, blander sg driver which now shares many
> >>>size shortcomings with the block subsystem.
> >>
> >>What is the reason to break user-space applications like this?
> > 
> > 
> > Did you read the whole thread? It was a low-level SCSI driver issue, where 
> > nothing broke user space, but the command was just fed to the drive 
> > differently, which then hit a limit in the driver.
> 
> Linus,
> That is an optimistic take. The maximum data carrying
> capacity of a single SCSI command via the SG_IO ioctl
> depends on the maximum data carrying capacity of a
> scatter gather list. Assuming all scatter gather list
> elements carry the same amount of data then the
> maximum capacity is:
> 'max_bytes_per_element * max_num_elements'
> 
> Only the latter figure is a "low-level SCSI driver issue"
> whose maximum seems to be SG_ALL (255). It is the former
> figure that has changed. The sg driver in lk 2.6.15 used
> __get_free_pages() with the order set to get 32 KB where
> as the generic routine used now get a single page (usually
> 4 KB). Kai Makisara proposed changes in the SCSI LLD
> template that made things better in my experiments with
> scsi_debug.
> 
> However today James Bottomley confirmed that relying on
> coalescing pages that may be adjacent is not deterministic:
> http://marc.theaimsgroup.com/?l=linux-scsi&m=114122991606658&w=2
> 
If this is James's final opininion, then the changes to st for 2.6.16 
to use scsi_execute_async _must be reverted_ before final 2.6.16. They 
cause user-visible regression.

I don't think relying on coalescing in this case is non-deterministic: we 
know that enough pages are adjacent to coalesce. This is because the pages 
have been split into bios from larger kernel space buffers. (Conceptually 
I don't like this splitting and re-merging but it works provided the HBA 
parameters are good.)

I am a little frustrated with this whole thing. Several people have talked 
about switching st to use the block layer. Mike Christie finally did the 
work and the details were discussed on linux-scsi. I thought that 
everybody agreed on the details and I tested that the code works for st. 
Now it seems that there was no agreement!

> That leaves a worst case scatter gather list data capacity
> of (4 * 255) KB if the SCSI LLD (or SATA) uses SG_ALL. That
> is still just under the 1 MB bar that started this thread.
> 
This is not acceptable.

> So I guess we might find out how many people do big,
> single SCSI command data, transfers when lk 2.6.16 comes
> out.
> 
Learn the hard way ;-( I know that there are applications that read and 
write large tape blocks but I think they will hit the problems much later. 
These are production systems that are probably not updated frequently. 
When they find out this, they probably have to move away from Linux.

I have talked about st but I think the same arguments apply mostly to sg. 

-- 
Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux