SpaceTiger
02-11-2005, 11:16 AM
So I spent the past few hours just trying to read this output file from a simulation that one of my colleagues had run. "Why would that take a few hours?", you may ask. Well, I'll tell you.
What I had was a 1.5 GB file that had been written in fortran (unknown version) and the following block of code (sent by email) for reference:
NPART = 512*512*512
NPART16 = NPART/16
WRITE(1)ZR
C
WRITE(1)(XV(1,N),N= 1, NPART16)
WRITE(1)(XV(1,N),N= NPART16+1, 2*NPART16)
WRITE(1)(XV(1,N),N= 2*NPART16+1, 3*NPART16)...
It went on from there in basically the same fashion as the last three lines. There were several problems with this, some of which didn't become apparent until later, but right away it was clear that I wasn't told what the data types were. This would mean a process of trial and error, the scope of which would depend on the set of data types available in fortran.
So, I did a little reading on fortran. It turns out that there are many data types (of varying sizes) that can be declared, as well as a variety of output formats. The block of code seemed to indicate that the data was "unformatted" (presumably to save space), but it didn't specify whether the file was opened for "sequential" or "direct" access (this would have been given in the "open" statement). I tried every combination of data types and I/O modes I could think of, but nothing gave me understandable results.
Eventually, I resorted to a byte-by-byte analysis of the file. First, I calculated the exact number of bytes I expected the output file to have based on the snippet of code (including the 8 extra bytes that fortran adds for each call of WRITE; as far as I can tell these are record-length markers, typically 4 bytes before and 4 bytes after the data). After some temporary confusion resulting from the ridiculous definition of "kilobyte" (1,024 bytes instead of the reasonable 1,000), I was able to match the file size to the data types. The result came out exactly right if all variables were 4-byte data types -- reals or integers, presumably.
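For anyone who wants to repeat this kind of size check, here is a rough sketch in Python (not fortran, only because it is quick to run). It assumes each sequential unformatted record has a 4-byte length marker before and after the data, which is common but compiler-dependent. The 16 chunk records are a guess for illustration, since the snippet above trails off, and the function names are made up.

```python
NPART = 512 * 512 * 512
NPART16 = NPART // 16


def record_bytes(n_values, bytes_per_value, marker_bytes=4):
    """Size on disk of one record: leading marker + data + trailing marker (assumed layout)."""
    return marker_bytes + n_values * bytes_per_value + marker_bytes


def expected_file_bytes(values_per_record, bytes_per_value, marker_bytes=4):
    """Total size of a file made of records with the given value counts."""
    return sum(record_bytes(n, bytes_per_value, marker_bytes) for n in values_per_record)


# ZR as a one-value record, then a hypothetical 16 chunks of NPART16 values each.
layout = [1] + [NPART16] * 16

# Try each candidate width and compare against the real size of the file on disk.
for width in (4, 8):
    print(width, "bytes per value ->", expected_file_bytes(layout, width), "bytes")
```

Whichever width reproduces the actual file size to the byte (for example from `os.path.getsize`) is the likely candidate.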
Despite seemingly knowing the exact format of the file, however, I still couldn't get reasonable results. I was about to give up when I remembered something my roommate once told me about byte organization. A Google search revealed that some machines read and write multi-byte data in the "big endian" system (most significant byte first) and some in the "little endian" system (least significant byte first). Most PCs are little endian, while mainframes are generally big endian. The data had been written by a cluster, so I figured there was some chance I'd have to convert.
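One quick way to check the byte order is to look at the first four bytes of the file. This is a sketch under two assumptions: that the file starts with a 4-byte record marker, and that the first record (ZR) is a single 4-byte value, so the marker should decode to 4. The helper name `guess_byte_order` is hypothetical.

```python
import struct


def guess_byte_order(first_four_bytes, expected_length=4):
    """Return '<' (little endian) or '>' (big endian) if the marker decodes
    to expected_length in that byte order, else None."""
    for order in ("<", ">"):
        (value,) = struct.unpack(order + "i", first_four_bytes)
        if value == expected_length:
            return order
    return None


# What a big-endian machine would write for a record length of 4.
marker = struct.pack(">i", 4)

print(guess_byte_order(marker))
# The same bytes read on a little-endian machine give a nonsense length.
print(struct.unpack("<i", marker)[0])
```

On the real file, the four bytes would come from something like `open(path, "rb").read(4)`.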
How did I do this? Well, it turns out that some fortran compilers give you the option (set at compile time) of converting between the two systems on input and output. Of course, the one I was using didn't have that option, so I had to scour the documentation on our network for alternative compilers. I finally found one called "ifort" (i for Intel) and I set the relevant flag. Finally, my data came out!
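For comparison, the same conversion can be done without a compiler flag by reading the records with an explicit byte order. The following is only a sketch in Python, not what was done above. It assumes 4-byte record markers and 4-byte reals, and it runs on a tiny fake file built in memory rather than the real data. All function names are made up.

```python
import io
import struct


def read_records(stream, order=">"):
    """Yield the raw payload of each record, checking both length markers."""
    fmt = order + "i"
    while True:
        head = stream.read(4)
        if not head:
            return  # clean end of file
        if len(head) < 4:
            raise ValueError("truncated record marker")
        (length,) = struct.unpack(fmt, head)
        payload = stream.read(length)
        tail = stream.read(4)
        if (len(payload) != length or len(tail) != 4
                or struct.unpack(fmt, tail)[0] != length):
            raise ValueError("record markers do not match; wrong byte order or layout?")
        yield payload


def as_reals(payload, order=">"):
    """Decode a payload as 4-byte reals in the given byte order."""
    return struct.unpack(f"{order}{len(payload) // 4}f", payload)


def fake_record(values, order=">"):
    """Build one record the way a big-endian writer is assumed to lay it out."""
    body = struct.pack(f"{order}{len(values)}f", *values)
    mark = struct.pack(order + "i", len(body))
    return mark + body + mark


# A scalar record followed by a three-value record, both big endian.
data = fake_record([0.5]) + fake_record([1.0, 2.0, 3.0])
for payload in read_records(io.BytesIO(data)):
    print(as_reals(payload))
```

Reading the same bytes with `order="<"` raises the `ValueError`, which is a handy sign that the byte order is wrong. For a file of this size an array-based reader would be more practical than `struct`.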
What a nuisance. I understand that my advisors expect a little ingenuity from me, but this is ridiculous! A little more information would have been nice.
Bah, well, at least I learned a thing or two.