Argh! Will the endian madness ever...


SpaceTiger
02-11-2005, 11:16 AM
So I spent the past few hours trying to just read this output file from a simulation that one of my colleagues had run. "Why would that take a few hours?", you may ask. Well, I'll tell you.

What I had was a 1.5 GB file that had been written in Fortran (unknown version) and the following block of code (sent by email) for reference:

NPART = 512*512*512
NPART16 = NPART/16
WRITE(1)ZR
C
WRITE(1)(XV(1,N),N= 1, NPART16)
WRITE(1)(XV(1,N),N= NPART16+1, 2*NPART16)
WRITE(1)(XV(1,N),N= 2*NPART16+1, 3*NPART16)...

It went on from there in basically the same fashion as the last three lines. There were several problems with this, some of which didn't become apparent until later, but right away it was clear that I hadn't been told what the data types were. This would mean a process of trial and error, the scope of which would depend on the set of data types available in Fortran.

So, I did a little reading on Fortran. It turns out that there are many data types (of varying sizes) that can be declared, as well as a variety of output formats. The block of code seemed to indicate that the data was "unformatted" (presumably to save space), but it didn't specify whether the file had been opened for "sequential" or "direct" access (this would have been given in the OPEN statement). I tried all possible combinations of data types and I/O modes, but nothing gave me understandable results.
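One rough way to tell the two access modes apart, offered as a sketch: most compilers (ifort and gfortran by default, though this is compiler-dependent rather than required by the standard) wrap each sequential unformatted record in a 4-byte length marker at both ends, while direct-access records are written back to back with no markers. The hypothetical helper `sniff_sequential` below checks whether the first record's leading and trailing markers agree under either byte order.

```python
import io
import struct
from typing import BinaryIO, Optional

def sniff_sequential(f: BinaryIO) -> Optional[str]:
    """Return '<' (little-endian) or '>' (big-endian) if the file opens with
    a plausible sequential unformatted record, else None.

    Assumes 4-byte record-length markers before and after each record.
    """
    f.seek(0, io.SEEK_END)
    size = f.tell()
    f.seek(0)
    head = f.read(4)
    if len(head) < 4:
        return None
    for order in ("<", ">"):
        (n,) = struct.unpack(order + "i", head)
        # Leading marker + payload + trailing marker must fit in the file.
        if 0 <= n and n + 8 <= size:
            f.seek(4 + n)
            tail = f.read(4)
            if struct.unpack(order + "i", tail)[0] == n:
                return order
    return None

# Usage: a single big-endian record holding three 4-byte reals.
payload = struct.pack(">3f", 1.0, 2.0, 3.0)
demo = io.BytesIO(struct.pack(">i", 12) + payload + struct.pack(">i", 12))
print(sniff_sequential(demo))  # >
```

A raw direct-access block has no markers, so its first four bytes usually decode to a length that doesn't fit the file in either byte order, and the helper returns None.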

Eventually, I resorted to a byte-by-byte analysis of the file. First, I calculated the exact number of bytes I expected the output file to have based on the snippet of code (including the 8 bytes Fortran typically adds to each record written by WRITE, a 4-byte record-length marker at each end). After some temporary confusion resulting from the ridiculous definition of "kilobyte" (1,024 bytes instead of the reasonable 1,000), I was able to match the file size to the data types. The result came out exactly right if all variables were 4-byte data types (reals or integers, presumably).
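The size bookkeeping can be sketched like this, assuming 4-byte values and 4-byte record markers. The layout in the last line (a 4-byte ZR header followed by the 16 blocks for one component of XV) is hypothetical, since the emailed snippet is truncated and doesn't say what type ZR is or how many components follow.

```python
NPART = 512 * 512 * 512    # particles, from the emailed snippet
NPART16 = NPART // 16      # particles per WRITE statement
VALUE_BYTES = 4            # assumption: 4-byte REAL or INTEGER values
MARKER_BYTES = 4           # assumption: 4-byte record marker at each end

def sequential_file_size(payload_sizes):
    """Bytes in a sequential unformatted file whose records hold the
    given payload sizes (each record gets a marker at both ends)."""
    return sum(p + 2 * MARKER_BYTES for p in payload_sizes)

block = NPART16 * VALUE_BYTES                    # one WRITE(1)(XV(1,N),...) record
print(block)                                     # 33554432
print(sequential_file_size([block]))             # 33554440

# Hypothetical layout: a 4-byte ZR header plus the 16 blocks for XV(1,:).
print(sequential_file_size([4] + [block] * 16))  # 536871052
```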

Despite seemingly knowing the exact format of the file, however, I still couldn't get reasonable results. I was about to give up when I remembered something my roommate once told me about byte order. A Google search revealed that some machines store multi-byte values "big endian" (most significant byte first) and some "little endian" (least significant byte first). Most PCs are little endian, while mainframes are generally big endian. The data had been written by a cluster, so I figured there was some chance I'd have to convert.
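In Python's `struct` module, '>' means big-endian and '<' little-endian. This sketch shows what happens when big-endian data is read with the wrong byte order: a record marker of the size computed above shrinks to 2, and a float 1.0 turns into garbage, which is the kind of "not understandable" output described.

```python
import struct

# Record marker for one XV block (33,554,432 bytes = 0x02000000),
# stored the way a big-endian machine would write it: bytes 02 00 00 00.
marker = struct.pack(">i", 33_554_432)
print(struct.unpack(">i", marker)[0])   # 33554432  (read big-endian)
print(struct.unpack("<i", marker)[0])   # 2         (read little-endian)

# The same mix-up turns the float 1.0 (bytes 3f 80 00 00) into a tiny
# denormal number.
one = struct.pack(">f", 1.0)
print(struct.unpack("<f", one)[0])      # roughly 4.6e-41
```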

How did I do this? Well, it turns out that some Fortran compilers give you the option of converting between the two byte orders (at compile time) for input and output. Of course, the one I was using didn't have that option, so I had to scour the documentation on our network for alternative compilers. I eventually found one called "ifort" (i for Intel) and set the relevant flag. Finally, my data came out!
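For the compile-time route, Intel's ifort has a `-convert big_endian` option and gfortran has `-fconvert=big-endian`. If neither is at hand, the swap can also be done after reading the raw bytes. The sketch below does that in Python, assuming 4-byte record markers and 4-byte REALs; `read_records` and `record` are hypothetical helper names, not part of any library.

```python
import io
import struct
import sys
from array import array
from typing import BinaryIO, Iterator

def read_records(f: BinaryIO, order: str = ">") -> Iterator[array]:
    """Yield each sequential unformatted record as an array of 4-byte floats
    in this machine's native byte order.

    Assumes 4-byte record markers and 4-byte REAL data. order is '>' for a
    big-endian file, '<' for a little-endian one.
    """
    native = "<" if sys.byteorder == "little" else ">"
    while True:
        head = f.read(4)
        if not head:
            return                              # clean end of file
        if len(head) < 4:
            raise ValueError("truncated record marker")
        (n,) = struct.unpack(order + "i", head)
        if n < 0 or n % 4:
            raise ValueError(f"bad record length {n}")
        payload = f.read(n)
        tail = f.read(4)
        if len(payload) != n or len(tail) != 4 or struct.unpack(order + "i", tail)[0] != n:
            raise ValueError("record markers do not match")
        values = array("f")
        values.frombytes(payload)               # raw bytes, file byte order
        if order != native:
            values.byteswap()                   # flip every 4-byte value
        yield values

# Usage: two records laid out as a big-endian machine would write them.
def record(*xs: float) -> bytes:
    body = struct.pack(f">{len(xs)}f", *xs)
    return struct.pack(">i", len(body)) + body + struct.pack(">i", len(body))

demo = io.BytesIO(record(1.0, 2.0, 3.0) + record(0.5))
print([list(r) for r in read_records(demo)])   # [[1.0, 2.0, 3.0], [0.5]]
```

Checking the trailing marker against the leading one is a cheap sanity test: reading with the wrong byte order almost always trips it.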

What a nuisance. I understand that my advisors expect a little ingenuity from me, but this is ridiculous! A little more information would have been nice.

Bah, well, at least I learned a thing or two.

----
"And dreams may come
That are everlasting
Though all just plastic too..."

Isildur
02-11-2005, 06:57 PM
> After some temporary confusion resulting
> from the ridiculous definition of "kilobyte" (1,024 bytes
> instead of the reasonable 1,000)

While defining a kilobyte as 1,000 bytes might make more sense etymologically speaking, it would make very little sense from a programming perspective, since that isn't a round number in binary or hexadecimal.

1k1IN: A Dark Comedy About 2 Roommates (http://1001insomniacnights.com)

SwampGas
02-11-2005, 07:01 PM
> While defining a kilobyte as 1,000 bytes might make more
> sense etymologically speaking, it would make very little
> sense from a programming perspective, since that isn't a
> round number in binary or hexadecimal.

Computer values are calculated by 2^N. 2^10 = 1024, not 1000.


Isildur
02-11-2005, 08:19 PM
> > While defining a kilobyte as 1,000 bytes might make more
> > sense etymologically speaking, it would make very little
> > sense from a programming perspective, since that isn't a
> > round number in binary or hexadecimal.
>
> Computer values are calculated by 2^N. 2^10 = 1024, not
> 1000.
>

I know that, obviously. Did you even read my post? I was pointing out that 1000 = 0x3E8 = %1111101000

...as opposed to 1024 = 0x400 = %10000000000
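Python's built-in `hex()` and `bin()` make the comparison easy to check; `is_power_of_two` is just an illustrative helper name.

```python
def is_power_of_two(n: int) -> bool:
    # A power of two has exactly one 1 bit, so clearing the lowest set bit
    # (n & (n - 1)) leaves zero.
    return n > 0 and n & (n - 1) == 0

for n in (1000, 1024):
    print(n, hex(n), bin(n), is_power_of_two(n))
# 1000 0x3e8 0b1111101000 False
# 1024 0x400 0b10000000000 True
```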


Edit: Oh, wait, I guess you meant to reply to SpaceTiger.

Edited by Isildur on 02/11/05 04:20 PM.

SpaceTiger
02-11-2005, 11:49 PM
> While defining a kilobyte as 1,000 bytes might make more
> sense etymologically speaking, it would make very little
> sense from a programming perspective, since that isn't a
> round number in binary or hexadecimal.

But the etymological problem was exactly the one I was talking about. The prefix "kilo" is supposed to imply "1000", not "1024". If they wanted to define it another way, they should have used a different prefix, especially since I've seen both definitions used as standard.


Ugly Joe
02-11-2005, 11:54 PM
> If they wanted to define it another
> way, they should have used a different prefix, especially
> since I've seen both definitions used as standard.

There is (http://mathworld.wolfram.com/Kibibyte.html). I've only seen the term used in one program, though (a BT client).
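The kibibyte is one of the binary prefixes the IEC standardized in the late 1990s (kibi = 2^10, mebi = 2^20, gibi = 2^30), leaving kilo, mega and giga with their SI meanings. A sketch that reports the same byte count both ways; the 1.5 GiB figure is a hypothetical example, not the measured size of the file in the first post, and `human` is an illustrative helper name.

```python
SI = [("GB", 10**9), ("MB", 10**6), ("kB", 10**3)]
IEC = [("GiB", 2**30), ("MiB", 2**20), ("KiB", 2**10)]

def human(n_bytes: int, units) -> str:
    """Express n_bytes in the largest unit from `units` that fits."""
    for name, size in units:
        if n_bytes >= size:
            return f"{n_bytes / size:.2f} {name}"
    return f"{n_bytes} B"

size = 1_610_612_736        # hypothetical: exactly 1.5 GiB
print(human(size, SI))      # 1.61 GB
print(human(size, IEC))     # 1.50 GiB
```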


SpaceTiger
02-12-2005, 12:10 AM
> There is. I've only seen the term used in one program,
> though (a BT client).

Nice, I'm officially using this term from now on.


MegaManJuno
02-12-2005, 02:26 AM
While I agree that there is a good place for this, it's too bad they came up with a prefix that makes it sound like baby-talk when you speak it.

"Oh.. wook at all de wittle kibibytes..."


Reaper man
02-12-2005, 04:01 AM
to clear everything up:

a kilobyte is 1024 when you're talking about data stored on a computer (ie 345KB = 353,280 bytes)
a kilobyte is 1000 when you're talking about network speed/bandwidth, but it's usually measured in kilobits
(i.e. 56 KB/s = 56,000 B/s = 448 kb/s = 448,000 b/s)
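Both rules of thumb can be checked with a couple of lines of arithmetic, here in Python:

```python
# On disk, binary kilobytes: 1 KB = 1024 bytes.
print(345 * 1024)                                   # 353280

# Line rates, decimal kilobytes: 1 KB = 1000 bytes, 8 bits per byte.
bytes_per_s = 56 * 1000
bits_per_s = bytes_per_s * 8
print(bytes_per_s, bits_per_s, bits_per_s // 1000)  # 56000 448000 448
```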

sig not found...

SpaceTiger
02-12-2005, 04:09 AM
> a kilobyte is 1024 when you're talking about data stored on
> a computer (ie 345KB = 353,280 bytes)
> a kilobyte is 1000 when you're talking about network
> speed/bandwidth, but it's usually measured in kilobits

Eh, these are good rules of thumb, but there are definitely exceptions. I found this link (http://www.t1shopper.com/tools/calculate/) to be quite informative.


hcs
02-12-2005, 05:27 AM
> a kilobyte is 1024 when you're talking about data stored on
> a computer (ie 345KB = 353,280 bytes)

The materials for my hard drive claim that the hard drive industry uses powers of 10 to describe their hard drive sizes.

-hcs (http://hcs.freeshell.org/index.cgi)
http://www.eden.rutgers.edu/~agashlin/nowplaying.php

mFC
02-12-2005, 05:29 AM
a kilobyte is never taken to mean 1000 bytes in any computer context, as sam pointed out... 2^10 is a kilobyte, 2^20 is a megabyte, etc. it doesn't add up to much at first, but when you're talking about gigabytes, there's a 73,741,824-byte difference.
and with bits to bytes, all you have to do is divide by eight.
56 Kbit/s = 7 KB/s, of course not counting protocol overhead or whatever.
not to sound rude, but I'm surprised everyone doesn't know this... it's kind of vital in terms of programming
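The numbers are easy to check. One hedge worth adding: modem speeds like 56 kbit/s use decimal kilobits (56,000 bits/s), so the 7 KB/s figure only comes out even with 1000-byte kilobytes; with 1024-byte kilobytes it is about 6.84.

```python
gap = 2**30 - 10**9
print(gap)                       # 73741824

bytes_per_s = 56_000 / 8         # modem line rates use decimal kilobits
print(bytes_per_s)               # 7000.0
print(bytes_per_s / 1000)        # 7.0 (decimal kB/s)
print(bytes_per_s / 1024)        # 6.8359375 (binary KiB/s)
```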

Chris

mFC
02-12-2005, 05:32 AM
that's a marketing ploy though, always has been. lots of websites have fine print saying that they're using powers of 10 instead of proper notation


SpaceTiger
02-12-2005, 05:37 AM
> not to sound rude, but I'm surprised everyone doesn't know
> this... it's kind of vital in terms of programming

Meh, I've written hundreds or thousands of codes and this is the first time it even mattered.
