Author Topic: What is de-facto File Format for Storing/Transporting Digital Signal Data?  (Read 6879 times)


Offline Mechatrommer (Topic starter)

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
for storing/transporting/transferring digital signal data on a PC, such as DSO data, DAQ data and the like. i can think of several types:
1) fixed-bit-length integers, such as 8-bit DSO or 12-bit DAQ samples
2) fixed-byte-length floating-point data
either would do. preferably the file format standard and its description are available online, so it can be implemented in a software application (import/export). please advise if you know of any, thanks.
Nature: Evolution and the Illusion of Randomness (Stephen L. Talbott): Its now indisputable that... organisms “expertise” contextualizes its genome, and its nonsense to say that these powers are under the control of the genome being contextualized - Barbara McClintock
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 9489
  • Country: gb
The most widely used file formats for waveforms are the formats originally designed for audio, like .wav files.
 

Offline Mechatrommer (Topic starter)

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
sorry, i forgot to mention: the format should contain critical information about the acquisition properties of the signal, such as sampling rate, scaling and offset where applicable... and it should not be ASCII text based like CSV. storing hundreds of megasamples in CSV requires gigabytes of storage and super-inefficient transcoding from ASCII to float numbers.

The most widely used file formats for waveforms are the formats originally designed for audio, like .wav files.
i suspect this is a fixed-bit-length type, not a floating-point type? i struggled a long time ago to find the format specification, though. can it be extended to store floating-point numbers?
 

Offline Armxnian

  • Regular Contributor
  • *
  • Posts: 214
  • Country: us
  • Computer Engineering Student
.wav is a versatile format. It can store 32-bit signed integers and 64-bit floats, everything in between, and various other audio encodings.
 

Offline BravoV

  • Super Contributor
  • ***
  • Posts: 7549
  • Country: 00
  • +++ ATH1
If I'm not mistaken the max file size of wav is 4 GB ?

Offline ChunkyPastaSauce

  • Supporter
  • ****
  • Posts: 539
  • Country: 00
RF64 is an extended version of .wav ... it can store up to 16 exabytes...

The file format is going to depend on the number of channels, whether the channels are independent of one another (for example, a different time reference for each signal), variable vs constant sampling rate, real-time requirements and jitter, read-while-write, etc...

« Last Edit: March 09, 2016, 07:23:30 am by ChunkyPastaSauce »
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 9489
  • Country: gb
sorry, i forgot to mention: the format should contain critical information about the acquisition properties of the signal, such as sampling rate, scaling and offset where applicable... and it should not be ASCII text based like CSV. storing hundreds of megasamples in CSV requires gigabytes of storage and super-inefficient transcoding from ASCII to float numbers.

The most widely used file formats for waveforms are the formats originally designed for audio, like .wav files.
i suspect this is a fixed-bit-length type, not a floating-point type? i struggled a long time ago to find the format specification, though. can it be extended to store floating-point numbers?
You suspect wrong. For linear data you can store 8, 16, 24, or 32 bit signed or unsigned integer samples. Floating point samples are also supported. I am not sure about double precision floating point. Various compressed formats can also be stored in wave files. You can have multiple parallel channels. The header contains information about the sample type, the number of channels, etc. It's actually quite flexible.

Traditional .wav files have a 4GB limit. Some systems limit you to 2GB, because some people don't seem to understand the difference between signed and unsigned. As others have said, there is an extended .wav format for the 64 bit world. Unfortunately it's not nearly as widely implemented.
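
A minimal sketch of the header round trip described above, using Python's standard `wave` module. The helper names `write_wav` and `read_wav` and the 1 MS/s rate are made up for illustration. As far as I know the stdlib module only handles integer PCM, so floating-point WAV files would need a different library.

```python
import io
import struct
import wave


def write_wav(stream, samples, sample_rate_hz, n_channels=1):
    """Write signed 16-bit little-endian PCM samples into a RIFF/WAVE container."""
    with wave.open(stream, "wb") as wav:
        wav.setnchannels(n_channels)
        wav.setsampwidth(2)              # bytes per sample: 2 -> 16-bit
        wav.setframerate(sample_rate_hz)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


def read_wav(stream):
    """Return (sample rate, channel count, bytes per sample, samples)."""
    with wave.open(stream, "rb") as wav:
        raw = wav.readframes(wav.getnframes())
        count = len(raw) // wav.getsampwidth()
        # This sketch assumes 16-bit data, matching write_wav above.
        samples = list(struct.unpack("<%dh" % count, raw))
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth(), samples


# An in-memory buffer stands in for a file on disk.
buf = io.BytesIO()
write_wav(buf, [0, 1000, -1000, 32767, -32768], sample_rate_hz=1_000_000)
buf.seek(0)
rate, channels, width, data = read_wav(buf)
print(rate, channels, width, data)
```

The sample rate, channel count and sample width travel in the header, so the reader does not need to be told them separately.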
 

Offline Karel

  • Super Contributor
  • ***
  • Posts: 2267
  • Country: 00
I'm not aware of a de facto file format for time series data for test & measurement.
However, there's a de facto standard for biomedical time series data, which can also be used for general test & measurement like DSOs.
The name of this format is EDF (European Data Format).
The advantage of EDF is that there are lots of tools and viewers available that can read/write those files, like Scilab, R, Octave, Matlab, Polyman, EDFbrowser, etc.
There's also a free C/C++ library to read/write EDF.

The pros of EDF: it's a simple format that's easy to implement. The header contains all necessary data to store sensitivity, samplerate, channel name, etc.
Also, it supports channels with different samplerates.

Cons: EDF stores the samples as 16-bit two's complement integers, while you probably need only 8 bits of resolution.
Technically it doesn't make a difference, apart from the resulting file size...

https://en.wikipedia.org/wiki/European_Data_Format
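
To illustrate how the sensitivity stored in the header is used: each EDF signal declares a physical minimum/maximum and a digital minimum/maximum, and the stored 16-bit codes map linearly between them. The sketch below only shows that arithmetic; the function names and the ±4 V range are assumptions for the example, not part of any EDF library.

```python
def edf_gain_offset(phys_min, phys_max, dig_min, dig_max):
    """Linear map implied by the EDF header: physical = gain * digital + offset."""
    gain = (phys_max - phys_min) / (dig_max - dig_min)
    offset = phys_min - gain * dig_min
    return gain, offset


def to_physical(codes, gain, offset):
    """Convert stored integer codes to physical units (e.g. volts)."""
    return [gain * c + offset for c in codes]


def to_digital(values, gain, offset, dig_min, dig_max):
    """Convert physical values to integer codes, clamped to the digital range."""
    return [max(dig_min, min(dig_max, round((v - offset) / gain))) for v in values]


# Hypothetical channel: a +/-4 V input range stored in the full 16-bit range.
PHYS_MIN, PHYS_MAX = -4.0, 4.0
DIG_MIN, DIG_MAX = -32768, 32767

gain, offset = edf_gain_offset(PHYS_MIN, PHYS_MAX, DIG_MIN, DIG_MAX)
volts_in = [-3.3, 0.0, 1.25, 3.999]
codes = to_digital(volts_in, gain, offset, DIG_MIN, DIG_MAX)
volts_out = to_physical(codes, gain, offset)
print(codes, volts_out)
```

The round trip is only accurate to within half a code (`gain / 2`), which is the quantization step of the stored format.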

 

Offline Mechatrommer (Topic starter)

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
You suspect wrong.
yeah, i'd always been taught that *.wav is a song file format for listening to music :palm:

The name of this format is EDF (European Data Format).
i have to pay $35.95 just to get the standard, sigh
« Last Edit: March 09, 2016, 10:32:23 am by Mechatrommer »
 

Offline jeremy

  • Super Contributor
  • ***
  • Posts: 1079
  • Country: au
I normally use HDF5, and have recorded traces in the hundreds of gigabytes without issue. HDF5 is very common in the scientific computing and high performance computing areas.
 

Offline HAL-42b

  • Frequent Contributor
  • **
  • Posts: 423
EDF (European Data Format).

This looks really useful. Thanks for sharing.

Extending it with different bit depths might be useful.
 

Offline Karel

  • Super Contributor
  • ***
  • Posts: 2267
  • Country: 00
Wave format:

Pros: Common format, easy to implement, supports 8-bit samples (smaller filesize).

Cons: no possibility to store: sensitivity/scaling, offset, units (Volts, milli-Volts, amps, etc.), description of measurement/experiment.

 

Offline Karel

  • Super Contributor
  • ***
  • Posts: 2267
  • Country: 00
You suspect wrong.
yeah, i'd always been taught that *.wav is a song file format for listening to music :palm:

The name of this format is EDF (European Data Format).
i have to pay $35.95 to get the standard, sigh

The EDF format is a truly open format and available for free without any registration.

Here's the link to the one and only official EDF format website:

http://edfplus.info/

 

Offline Karel

  • Super Contributor
  • ***
  • Posts: 2267
  • Country: 00
EDF (European Data Format).

This looks really useful. Thanks for sharing.

Extending it with different bit depths might be useful.

Please don't deviate from the (de facto) standards, so that compatibility is maintained.

There's already a 24-bit version of EDF which is called BDF:

http://www.biosemi.com/faq/file_format.htm

http://www.teuniz.net/edfbrowser/bdfplus%20format%20description.html



 

Offline Mechatrommer (Topic starter)

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
is there a need for anything bigger than, say, a 4-byte float or an 8-byte double for time series data, like DSO or DMM data? what measurement can be greater than 3.402823E38 or smaller than 1.401298E-45 in magnitude, for either sign?

is there a format that can distinguish between, or store both, interleaved and non-interleaved layouts, i.e. a chunk of channel 1 followed by a chunk of another channel, and then back to a chunk of channel 1 again?

the wav specification and terms are geared towards audio data. EDF is specific to medical data, it even has its own "patient name" field, and most header fields are ASCII based. there is good in this since it's an "endian-free" format, but is the world really split into two big groups over endianness?

HDF5 sounds application-blind, i mean it generically describes itself simply as a "data model", but it is downloaded as a "kit", "api" and "tools". why do i have to download all that to implement a storage format?

do EDF, WAV and HDF5 have a "time-frame" field so we can synchronize data with another channel, or even with other files?
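
A small check of the two magnitudes quoted above, using only the standard `struct` module: they are the largest finite value and the smallest positive (subnormal) value of a 4-byte IEEE 754 float. The helper names are invented for this sketch. It also shows that the practical limit of a 4-byte float is its 24-bit significand rather than its range.

```python
import struct


def float32_from_bits(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def survives_float32(value):
    """True if the value is unchanged by a round trip through a 4-byte float."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0] == value


largest = float32_from_bits(0x7F7FFFFF)             # largest finite float32
smallest_subnormal = float32_from_bits(0x00000001)  # smallest positive float32
print(largest, smallest_subnormal)

# Integers are exact in float32 only up to 2**24.
print(survives_float32(2**24), survives_float32(2**24 + 1))
```

Under that assumption, raw codes from 8-bit to 24-bit converters fit exactly in a 4-byte float, while long accumulated or averaged values may be where an 8-byte double starts to matter.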
 

Offline Karel

  • Super Contributor
  • ***
  • Posts: 2267
  • Country: 00
EDF is specific to medical data, it even has its own "patient name" field, and most header fields are ASCII based

Why do you think this is a problem?

Focus on the availability of tools and libraries and on the possibility of easily opening it in common programs for analysis.

 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 28013
  • Country: nl
    • NCT Developments
I'd go for CSV (or any text-based format) because it is self-documenting. Any binary format is prone to interpretation differences and loss of readability. The length limit of a WAV file is another typical problem you'll have with a binary format. Actually a text-based format can be more size efficient because you only store relevant information (significant digits) versus enough room for any number.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline dom0

  • Super Contributor
  • ***
  • Posts: 1483
  • Country: 00
HDF5

Widely used, good software support, supports practically all data types and dimensions of data, supports essentially arbitrary meta data.

It might be a bit "heavyweight" if all you want to do is store a handful of DSO traces or something like that.
 

Offline Karel

  • Super Contributor
  • ***
  • Posts: 2267
  • Country: 00
Actually a text-based format can be more size efficient because you only store relevant information (significant digits) versus enough room for any number.

Text-based formats are much bigger (multiple times) compared to binary formats.

For example: the value -1.2345 takes 7 bytes; add the CSV separator character and the newline character and it is 9 bytes in total.
The same value can be stored in a binary format using only 2 bytes (for example as a 16-bit integer, with the scale factor kept once in the header).
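
The byte counts above can be checked with a few lines of Python. This is only a sketch: the 0.1 mV scale factor is an assumed value that a binary format would keep once in its header, not per sample.

```python
import struct

value = -1.2345

# Text form: the digits, one separator and one newline per sample.
csv_field = "%.4f,\n" % value
csv_bytes = len(csv_field.encode("ascii"))

# Binary form: a signed 16-bit integer code plus a scale factor stored once.
SCALE = 1e-4                       # assumed: volts per count
code = round(value / SCALE)        # integer code that represents the value
binary_field = struct.pack("<h", code)
binary_bytes = len(binary_field)

# Decoding multiplies the stored code by the scale factor again.
decoded = struct.unpack("<h", binary_field)[0] * SCALE

print(csv_bytes, binary_bytes, decoded)
```

For this value the text form is 4.5 times larger, before counting the cost of parsing the digits back into numbers.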

 

Offline jeremy

  • Super Contributor
  • ***
  • Posts: 1079
  • Country: au
I don't see anything mentioning what language you are using, but for Python I use h5py. It's very simple to use compared to the C library and treats the hdf5 file like a collection of numpy arrays; the exact numerical format is up to you. It also automatically chunks files, checksums and compresses them if you enable the right options.

Without compression but with checksums, I was able to easily stream data to an ssd at 350MB/s while remaining inside python in less than 100 lines of code
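
Not the code from that project, just a minimal sketch of the same idea with h5py and numpy: a resizable, chunked dataset with Fletcher-32 checksums and no compression, extended block by block. The dataset name `ch1`, the attribute names and the values are made up for the example. HDF5 does not prescribe them.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "capture.h5")
BLOCK = 4096  # samples appended per write

with h5py.File(path, "w") as f:
    dset = f.create_dataset(
        "ch1",
        shape=(0,),
        maxshape=(None,),    # unlimited length, so the dataset can grow
        dtype="int16",
        chunks=(BLOCK,),     # storage is laid out in fixed-size chunks
        fletcher32=True,     # per-chunk checksum, no compression
    )
    # Acquisition metadata stored as attributes (names are hypothetical).
    dset.attrs["sample_rate_hz"] = 1.0e9
    dset.attrs["volts_per_count"] = 1.0e-3
    dset.attrs["offset_volts"] = 0.0

    for i in range(4):
        data = np.full(BLOCK, i, dtype=np.int16)  # stand-in for acquired data
        start = dset.shape[0]
        dset.resize(start + BLOCK, axis=0)
        dset[start:start + BLOCK] = data

with h5py.File(path, "r") as f:
    dset = f["ch1"]
    n_samples = dset.shape[0]
    rate = float(dset.attrs["sample_rate_hz"])
    first_volts = dset[:BLOCK] * dset.attrs["volts_per_count"] + dset.attrs["offset_volts"]
    last_code = int(dset[n_samples - 1])

print(n_samples, rate, last_code)
```

Passing `compression="gzip"` to `create_dataset` would enable compression as well, at the cost of write throughput.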
 

Offline Mechatrommer (Topic starter)

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
I'd go for CSV (or any text-based format) because it is self-documenting.
it's self-documenting until you try to save or read 24 megapoints and up... by that time it's not self-documenting but self-annoying. you haven't saved 1 Mpts of CSV from a rigol yet |O

I don't see anything mentioning what language you are using,
any language one can name. the format should be language-independent, shouldn't it?
« Last Edit: March 09, 2016, 05:15:48 pm by Mechatrommer »
 

Offline dom0

  • Super Contributor
  • ***
  • Posts: 1483
  • Country: 00
"self documenting" and "cat <file> is self explanatory" are two different things.

Although I concur that CSV isn't really well suited for larger amounts of data.
 

Offline Mechatrommer (Topic starter)

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
Without compression but with checksums, I was able to easily stream data to an ssd at 350MB/s while remaining inside python in less than 100 lines of code
come to think of it, it makes sense now. it must be one complicated format that somehow needs some sort of adaptive algorithm to compress, stream, etc. the data, probably tuning itself to the machine and multithreading config it's running on. the drawback is that we are tied to the system. i'll check back again when i have time, thanks for the suggestion...
 

