On Friday 26 May 2006 01:13, Mike Christie wrote:
> Bryan Holty wrote:
> > On Tuesday 21 March 2006 14:48, Bryan Holty wrote:
> >> On Tuesday 21 March 2006 13:17, Mike Christie wrote:
> >>> Bryan Holty wrote:
> >>>> On Tuesday 21 March 2006 10:19, Dan Aloni wrote:
> >>>>> On Tue, Mar 21, 2006 at 09:54:54AM -0600, James Bottomley wrote:
> >>>>>> This is a good email to discuss on the scsi list:
> >>>>>> [email protected]; whom I've added to the cc list.
> >>>>>>
> >>>>>> On Tue, 2006-03-21 at 10:38 +0200, Dan Aloni wrote:
> >>>>>>> Improper calculation of the number of pages causes bio_alloc() to
> >>>>>>> be called with nr_iovecs=0, and slab corruption later.
> >>>>>>>
> >>>>>>> For example, a simple scatterlist that fails: {(3644,452), (0,
> >>>>>>> 60)}, (offset, size). bufflen=512 => nr_pages=1 => breakage. The
> >>>>>>> proper page count for this example is 2.
> >>>>>>
> >>>>>> Such a scatterlist would likely violate the device's underlying
> >>>>>> boundaries and is not legal ... there's supposed to be special code
> >>>>>> checking the queue alignment and copying the bio to an aligned
> >>>>>> buffer if the limits are violated. Where are you generating these
> >>>>>> scatterlists from?
> >>>>>
> >>>>> These scatterlists can be generated using the sg driver. Though I am
> >>>>> actually running a customized version of the sg driver, it seems the
> >>>>> conversion from a userspace array of sg_iovec_t to scatterlist stays
> >>>>> the same and also applies to the original driver (see
> >>>>> st_map_user_pages()).
> >>>>
> >>>> Hello,
> >>>> I am seeing the same issue when using direct io with sg. sg will
> >>>> perform direct io on any data that is aligned with the device's
> >>>> dma_align. The default for drivers that do not specify one is 512. sg
> >>>> builds the scatter gather list from the user specified location,
> >>>> offsetting the first entry in the list if not page aligned. This is
> >>>> the case that causes the improper allocation of "nr_iovecs" in
> >>>> scsi_req_map_sg and the later slab corruption.
> >>>>
> >>>> I don't think it is necessary to calculate nr_pages from the
> >>>> entire list. Only sgl[0] is allowed to have an offset, so we can
> >>>> calculate from that as follows.
> >>>> --- a/drivers/scsi/scsi_lib.c 2006-03-03 13:17:22.000000000 -0600
> >>>> +++ b/drivers/scsi/scsi_lib.c 2006-03-21 11:36:39.389763804 -0600
> >>>> @@ -368,12 +368,15 @@
> >>>> int nsegs, unsigned bufflen, gfp_t gfp)
> >>>> {
> >>>> struct request_queue *q = rq->q;
> >>>> - int nr_pages = (bufflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
> >>>> + int nr_pages = 0;
> >>>> unsigned int data_len = 0, len, bytes, off;
> >>>> struct page *page;
> >>>> struct bio *bio = NULL;
> >>>> int i, err, nr_vecs = 0;
> >>>> -
> >>>> +
> >>>> + if (nsegs)
> >>>
> >>> you can drop that test
> >>>
> >>>> + nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> >>>> +
> >>>
> >>> I think we can do this without looping but I think this is broken. If
> >>> we had a slight variant of Dan's example but we have a page and some
> >>> change in the first entry {3644, 4548} and {0, 60} in the last one, that
> >>> would only calculate two pages but we want three.
> >>>
> >>> I think we have to calculate the first and last entries but the
> >>> middle ones we can assume have no offset and lengths that are multiples
> >>> of a page.
> >>
> >> You are absolutely correct.
> >> bufflen is not page masked and when added to the offset of the first
> >> entry, will generate the correct nr_pages.
> >>
> >> nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> >> ==
> >> nr_pages = ((bufflen & PAGE_MASK) + (PAGE_SIZE-1) + sgl[0].offset +
> >> sgl[nsegs-1].length) >> PAGE_SHIFT;
> >>
> >> Either will calculate correctly.
> >>
> >> --
> >> Bryan Holty
> >> -
> >> To unsubscribe from this list: send the line "unsubscribe linux-kernel"
> >> in the body of a message to [email protected]
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >> Please read the FAQ at http://www.tux.org/lkml/
> >
> > Based on above, I think the most intuitive fix would be the offset
> > addition of the first entry to the initialization of nr_pages.
> >
> > Without this change, for instance, with 4K io's, every sg io that is
> > dma_aligned for direct io but not page aligned will cause slab
> > corruption and an oops.
> >
> > I am able to run a number of tests with sg that cause the boundary to be
> > crossed, and with this fix there is no slab corruption or data
> > corruption.
> >
> > Thanks Dan, I had been hunting for this for a couple of days!!
> >
> > Thoughts??
> >
> > Signed-off-by: Bryan Holty <[email protected]>
> >
> > --- a/drivers/scsi/scsi_lib.c 2006-03-03 13:17:22.000000000 -0600
> > +++ b/drivers/scsi/scsi_lib.c 2006-03-22 06:09:09.669599539 -0600
> > @@ -368,7 +368,7 @@
> > int nsegs, unsigned bufflen, gfp_t gfp)
> > {
> > struct request_queue *q = rq->q;
> > - int nr_pages = (bufflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > + int nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > unsigned int data_len = 0, len, bytes, off;
> > struct page *page;
> > struct bio *bio = NULL;
> > --
>
> Was this patch ok? I was looking at some other problem and noticed this
> was not merged. The original calculation that I did in the code is
> wrong. I thought the calculation in this patch was correct, but I have
> shown that my math skills are lacking so I may be wrong.
Yes, this patch fixes the problem. We have been running with it since it was
originally posted, without problems. We crash almost immediately without the
patch in place. We are using sg with direct_io heavily.
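
For anyone who wants to check the arithmetic, here is a minimal userspace
sketch of the two calculations, run against the two scatterlists mentioned
in this thread. It is not the kernel code: the 4 KiB page size, the
struct sg_entry type and the function names nr_pages_old(),
nr_pages_patched() and nr_pages_by_entry() are all assumptions made for
illustration. The per-entry count is only there as a cross-check and
relies on the same assumption discussed above, that only the first entry
carries an offset.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages; the kernel gets these from its own headers. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Hypothetical stand-in for the offset/length fields of struct scatterlist. */
struct sg_entry {
    unsigned int offset;
    unsigned int length;
};

/* Original calculation: ignores the offset of the first entry. */
static unsigned int nr_pages_old(unsigned int bufflen)
{
    return (bufflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

/* Calculation from the patch: add sgl[0].offset before rounding up. */
static unsigned int nr_pages_patched(const struct sg_entry *sgl,
                                     unsigned int bufflen)
{
    return (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

/* Cross-check: count the pages touched by each entry and sum them. */
static unsigned int nr_pages_by_entry(const struct sg_entry *sgl, size_t nsegs)
{
    unsigned int total = 0;

    assert(nsegs > 0);
    for (size_t i = 0; i < nsegs; i++)
        total += (sgl[i].offset + sgl[i].length + PAGE_SIZE - 1) >> PAGE_SHIFT;
    return total;
}
```

With Dan's list {(3644,452), (0,60)} and bufflen=512, the old calculation
gives 1 page while the patched one and the per-entry count both give 2.
With Mike's variant {(3644,4548), (0,60)} and bufflen=4608, the patched
calculation and the per-entry count both give 3.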
--
Thanks,
Bryan Holty