On 10Dec2010 14:28, stan <gryt2@xxxxx> wrote:
| On Fri, 10 Dec 2010 03:11:25 +0000 (UTC)
| "Amadeus W.M." <amadeus84@xxxxxxxxxxx> wrote:
| > I have a binary file with data. Each block of 48 bytes is a record. I
| > want to extract the first 8 bytes within each record. I'm thinking
| > this should be possible with dd, but gawk, perl - anything goes. It
| > just has to be fast, because the data files are ~ 1Gb.
| >
| > I can do this in C++ but I was just wondering if it can be done with
| > existing well tested tools.
|
| The binary aspect makes it tricky. If they were EOL delimited records,
| lots of tools could do this.
|
| Here's a python function, not checked though. It does require that you
| have enough memory to slurp the file into memory. Put it in a file,
| edit for the filenames, and run it as python <filename>. I guess it
| should take less than a minute, but not sure, should be fine for one
| off.
|
| def extract (filename1 = None, filename2 = None):
|   if filename1 != None and filename2 != None:

I'd not bother with this check - it is a special purpose function that
will not be misused, and if it _is_ misused it will fail silently, which
is not good.

|     infile = open (filename1, "rb")
|     slurp = infile.read () # at least as much memory as the file size
|     infile.close ()
|     outfile = open (filename2, "wb")
|     while len (slurp) > 0:
|       record = slurp [:48] # extract a record
|       first8 = record [:8] # slice off first 8 positions
|       outfile.write (first8) # write them out, no separator
|       slurp = slurp [48:] # chop them off the file

This step is Very Expensive. Each slurp[48:] copies the whole remainder
of the data, so the loop does work proportional to the square of the
file size. Don't reallocate a 1GB string every 48 bytes, just pull out
the pieces you need.

|     outfile.close ()
|
| extract (filename1 = "your input filename with path",
|          filename2 = "your output filename with path")

Untested example:

  import sys

  def get8of48(fp):
    # read one 48-byte record at a time, yield its first 8 bytes
    while True:
      chunk = fp.read(48)
      if len(chunk) == 0:
        break
      yield chunk[:8]
      if len(chunk) != 48:
        # trailing partial record: report it on stderr
        sys.stderr.write("warning: short read from %s (%d bytes)\n"
                         % (fp, len(chunk)))

  for chunk8 in get8of48(open("your filename here", "rb")):
    # ... do something with chunk8, the 8-byte chunk ...
    pass

Shorter and faster and using less memory.

Cheers,
-- 
Cameron Simpson <cs@xxxxxxxxxx> DoD#743
http://www.cskk.ezoshosting.com/cs/

The general consensus on covered [litter] boxes is that they are a Good Thing,
and having bought one myself now and tried it out for a couple of weeks, I
agree. No more litter sprayed halfway across the city.
        - krw@xxxxxxxxxxxx (Kenneth Wood)
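
Addendum: a minimal sketch of the same idea that writes the 8-byte
prefixes straight to an output file, reading many records per call to cut
down on the number of Python-level read() calls. The names
extract_prefixes, batch_records, RECORD_SIZE and PREFIX_SIZE are invented
here and do not come from the thread. It assumes a buffered binary file
object whose read(n) returns fewer than n bytes only at end of file, and
unlike the generator above it skips a trailing partial record rather than
emitting its first bytes. Not timed against a real 1GB file.

```python
RECORD_SIZE = 48   # bytes per record, per the original question
PREFIX_SIZE = 8    # leading bytes to keep from each record


def extract_prefixes(infp, outfp, batch_records=65536):
    """Copy the first PREFIX_SIZE bytes of each whole record from infp to outfp.

    Returns the number of trailing bytes that did not form a whole record.
    Assumption: infp.read(n) returns a short block only at end of file.
    """
    batch_bytes = RECORD_SIZE * batch_records
    leftover = 0
    while True:
        block = infp.read(batch_bytes)
        if not block:
            break
        # Only whole records are processed; a partial tail is counted, not written.
        whole = len(block) - len(block) % RECORD_SIZE
        leftover = len(block) - whole
        outfp.write(b"".join(block[i:i + PREFIX_SIZE]
                             for i in range(0, whole, RECORD_SIZE)))
    return leftover
```

It would be called with two files opened in binary mode, "rb" for the
input and "wb" for the output. Memory use is bounded by batch_records
times 48 bytes, whatever the size of the input. A nonzero return value
means the input ended in the middle of a record.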