Re: AMD64 Machine hardlocks when using memset

Paul Jackson wrote:

The x86_64 memset(), both in user space and the kernel, for whatever gcc
I have, and for a current kernel, uses the "repz stos" or "rep stosq"
prefixed instruction for the bulk of the copy.  This combination is a
long running, interruptible Intel string instruction that loops on
itself until the CX register decrements to zero.

Was your windows app using "stos"?

I'll wager a nickel that the actual crash you see comes when the
processor has to handle an interrupt while in the middle of this
instruction.

I'll wager a dime it's hardware, though interrupt activity may be
required to provoke it.

I ended up making a test program which essentially did the same thing except not using memset (just moving an int* up repeatedly and setting the value there to 0). That worked fine on both Windows and Linux. I then tried such a program using a long* compiled as 64-bit on Linux, that also worked fine. It seems like I can only reproduce it when memset is actually used.
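
For reference, a test of that kind could look roughly like the sketch below. This is not the original test program, only a minimal illustration of the idea; the function names are made up here. The volatile qualifier is an addition: some compilers will recognize a plain zeroing loop and replace it with a call to memset, which would defeat the purpose of the test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original test program): zero `count`
   ints by walking a pointer, with no call to memset().  The volatile
   qualifier forces one real store per element, so the compiler cannot
   collapse the loop into a memset() call. */
void zero_ints_by_hand(int *buf, size_t count)
{
	volatile int *p = buf;

	assert(buf != NULL || count == 0);
	while (count--)
		*p++ = 0;
}

/* Same idea with long, which is 8 bytes wide when built as 64-bit on
   Linux, so each store writes a full 64-bit word. */
void zero_longs_by_hand(long *buf, size_t count)
{
	volatile long *p = buf;

	assert(buf != NULL || count == 0);
	while (count--)
		*p++ = 0;
}
```

Calling either function on a large malloc'd buffer in place of memset(buf, 0, size) gives the comparison described above.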

I don't remember exactly what the Windows memset was using, that was on my work machine - it was inline assembly though, and I do know that it had only one instruction for the whole set, so it was likely "repz stos" or something similar to that.

As it turns out, the memset in my version of glibc x86_64 is not using such a string instruction though - it seems to be using two different sets of instructions depending on the size of the memset (not sure exactly how they're calculating the threshold between these). For sizes below the threshold, this is the inner loop - it's using normal mov instructions:


3:	/* Copy 64 bytes.  */
	mov	%r8,(%rcx)
	mov	%r8,0x8(%rcx)
	mov	%r8,0x10(%rcx)
	mov	%r8,0x18(%rcx)
	mov	%r8,0x20(%rcx)
	mov	%r8,0x28(%rcx)
	mov	%r8,0x30(%rcx)
	mov	%r8,0x38(%rcx)
	add	$0x40,%rcx
	dec	%rax
	jne	3b

For sizes above the threshold though, this is the inner loop. It's using movnti, which is an SSE2 cache-bypassing (non-temporal) store:


11:	/* Copy 64 bytes without polluting the cache.  */
	/* We could use	movntdq    %xmm0,(%rcx) here to further
	   speed up for large cases but let's not use XMM registers.  */
	movnti	%r8,(%rcx)
	movnti  %r8,0x8(%rcx)
	movnti  %r8,0x10(%rcx)
	movnti  %r8,0x18(%rcx)
	movnti  %r8,0x20(%rcx)
	movnti  %r8,0x28(%rcx)
	movnti  %r8,0x30(%rcx)
	movnti  %r8,0x38(%rcx)
	add	$0x40,%rcx
	dec	%rax
	jne	11b
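
To try the two store types separately from memset, something along these lines might be useful. It is only a sketch of the same size-based dispatch in C, not glibc's code: the function name and the 128 KiB cutoff are invented for illustration (the real threshold is not known here), and the 8-way unrolling is left out. On x86-64 with GCC or Clang, _mm_stream_si64() from <emmintrin.h> is expected to compile to movnti; on other targets the sketch falls back to ordinary stores.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>	/* _mm_stream_si64(), _mm_sfence() */
#define HAVE_MOVNTI 1
#else
#define HAVE_MOVNTI 0
#endif

/* Hypothetical cutoff, chosen only for illustration. */
#define NT_THRESHOLD_BYTES (128 * 1024)

/* Fill `words` 8-byte words starting at `dst` with `pattern`.
   Small fills use ordinary stores; large fills on x86-64 use
   non-temporal stores that bypass the cache. */
void fill_words(unsigned long long *dst, unsigned long long pattern,
		size_t words)
{
	size_t i;

	assert(dst != NULL || words == 0);

#if HAVE_MOVNTI
	if (words * sizeof(*dst) >= NT_THRESHOLD_BYTES) {
		for (i = 0; i < words; i++)
			_mm_stream_si64((long long *)&dst[i],
					(long long)pattern);
		/* Non-temporal stores are weakly ordered; the fence makes
		   them globally visible before we return. */
		_mm_sfence();
		return;
	}
#endif
	for (i = 0; i < words; i++)
		dst[i] = pattern;
}
```

Running the same large fill once with the threshold as above and once with it raised out of reach would show whether the lockup follows the non-temporal stores or not.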

I'm wondering if one does a ton of these cache-bypassing stores whether something gets hosed because of that. Not sure what that could be though. I don't imagine the chipset is involved with any of that on the Athlon 64 - either the CPU or RAM seems the most likely suspect to me.


--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
