How to generate binary type input file #196

Open
atri7887 opened this issue Aug 23, 2022 · 11 comments

Comments

@atri7887

I have a text file that has multiple 64 bit binary data(example given below) extracted from PUF primitives. How do I convert it to a suitable input file? This input file creation method should ideally be present in the readme file.

File: binary_signature.txt
1001001101010100111010001000011010000111010100100101011100011011
1011001100011001100110110111001000111001111110000101110111000001
0101100010010010111110010011110110101111011010100100000100011100
.
.
.
1110011011000000100100100011100011000101110001110111010110110001

@joshuaehill
Contributor

Short answer: The tool automatically translates values, so if you limit the file to ASCII '0' and ASCII '1' (with no new lines, or any other characters in the file), that will do the right thing.

The "correct" answer is to present this binary data as a string of '0x00' bytes (for '0') and '0x01' bytes (for '1').

@atri7887
Author

atri7887 commented Aug 24, 2022

Can you please present a sample for this?-> "The "correct" answer is to present this binary data as a string of '0x00' bytes (for '0') and '0x01' bytes (for '1')"

Also, is it absolutely necessary to present 1 million data samples to run the tool?
Will it fail the test in its absence?

@joshuaehill
Contributor

I'm not really sure what language you are using; so it is difficult to produce a comprehensive answer that would be useful to you. This isn't a task that is specific to this program, so you should probably consult a tutorial site for the language you are using, and take a look at how that language deals with binary file I/O.

As an example, if you are interested in writing binary files in C, you could use calls like fwrite to accomplish this. In this case, you will want to make sure that the types that you are using are sized appropriately (e.g., uint8_t). Such a tutorial site for C is here. Adapting a program example from that site:

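#include <stdint.h>
#include <stdio.h>

/* Noise-source-specific (signature assumed here); returns one 1-bit sample. */
extern uint8_t get_noise_sample(void);

int main(void) {
   FILE *fp;
   uint8_t sample;

   fp = fopen("data.bin", "wb"); // "wb": write raw bytes, not text
   if (fp == NULL) {
      return 1;
   }

   for (int j = 0; j < 1000000; j++) {
      sample = get_noise_sample(); // get a 1-bit value (0 or 1) from the noise source
      fwrite(&sample, sizeof(uint8_t), 1, fp); // store it as one 0x00 or 0x01 byte
   }

   fclose(fp);

   return 0;
}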
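
For data that already exists as ASCII '0' and '1' characters (like the binary_signature.txt excerpt earlier in this thread), a conversion along the following lines might work. This is only a sketch: the function name bits_text_to_bytes is made up for illustration, and it assumes every bit is to be treated as one 1-bit sample, which may or may not be appropriate for a given noise source.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: copy every ASCII '0' or '1' read from `in` to `out`
 * as a single 0x00 or 0x01 byte. Whitespace (including line breaks) is
 * skipped. Returns the number of samples written, or -1 if any other
 * character is found or a write fails. */
long bits_text_to_bytes(FILE *in, FILE *out) {
    long count = 0;
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            continue; /* line breaks are not samples */
        }
        if (c != '0' && c != '1') {
            return -1; /* anything else means the input is not a bit string */
        }
        uint8_t sample = (uint8_t)(c - '0');
        if (fwrite(&sample, sizeof sample, 1, out) != 1) {
            return -1;
        }
        count++;
    }
    return count;
}
```

To use it, open the text file with fopen(..., "r") and the output file with fopen(..., "wb"), pass both streams to the function, and close them afterwards. The output would then be run with bits_per_symbol set to 1.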

@atri7887
Author

Thanks a lot. This is really helpful. Also, can you kindly add this comment to the readme/ user guide for completeness?

@thanoojarao

I have a test file named sample.txt with 1024 samples data as a string without new line
Is this correct file format? or should I need to do any changes
command to run test : ./ea_non_iid -i-v sample.txt 8
sample.txt:
0x4d0xb20x850x7a0x850x7d0xc20x3d0x820x7d0x820x7d0x820x3c0xc3..................0x7d0x980xe30xbc0xc30xbc0xe0x710x8e0x610xde0x00xf70x80x3e0x810.........................................................0xac0x130xee0x310x8e0x75

@Chaosequals

Hello, I'm trying to understand NIST-SP800-90b, specifically the use of the parameter [bits_per_symbol] in ea_non_iid.

For the given binary_signature.txt, shall the [bits_per_symbol] be set as 64?

However, I understand that [bits_per_symbol] should be small enough to fit within a single byte. Does this mean I should divide the original data into 8 segments, with each segment being 8 bits (therefore, bits_per_symbol = 8)? I'm concerned that doing so might alter the original data's physical meaning.

Could you provide some guidance on this? Thanks.

The provided File: binary_signature.txt by atri7887:
1001001101010100111010001000011010000111010100100101011100011011
1011001100011001100110110111001000111001111110000101110111000001
0101100010010010111110010011110110101111011010100100000100011100
.
.
.
1110011011000000100100100011100011000101110001110111010110110001

@fogking

fogking commented Dec 28, 2023

@joshuaehill Can we know the implementation of get_noise_sample()?

I can't see the values in the sample as binary, could you post the source code or the process of creating the sample in the README?

@fogking

fogking commented Jan 3, 2024

Resolved the issue.
I created a sample file with []bytes that I wanted to test and it passed.

@joshuaehill
Contributor

joshuaehill commented Jan 4, 2024

I have a test file named sample.txt with 1024 samples data as a string without new line Is this correct file format? or should I need to do any changes command to run test : ./ea_non_iid -i-v sample.txt 8 sample.txt: 0x4d0xb20x850x7a0x850x7d0xc20x3d0x820x7d0x820x7d0x820x3c0xc3..................0x7d0x980xe30xbc0xc30xbc0xe0x710x8e0x610xde0x00xf70x80x3e0x810.........................................................0xac0x130xee0x310x8e0x75

A text file containing samples in the format of text strings like "0x4d" is absolutely not the correct format.

Please understand, this is not an issue with the tool, but is instead a quite general issue regarding how binary files work on your platform. Please take a look at some binary file I/O introductions for whatever computer language you are most comfortable with. You want to produce and use "binary" files containing the stated octets, not "text" files.
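
To make the text-versus-binary distinction concrete: the text "0x4d" is four characters (bytes 0x30 0x78 0x34 0x64), whereas the octet 0x4d is a single byte. If samples only exist as such text, they could be parsed back into octets roughly as sketched below. The function names are hypothetical, and the sketch assumes each token is "0x" followed by one or two hex digits with no separator, which is a guess at the format shown in the question.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_value(int c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: parse text such as "0x4d0xb20x85" into octets.
 * Stores at most `cap` octets in `out`. Returns the number of octets
 * parsed, or -1 on malformed input or if `out` is too small. */
long hex_text_to_octets(const char *text, uint8_t *out, size_t cap) {
    size_t n = 0;
    const char *p = text;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
            continue;
        }
        if (p[0] != '0' || p[1] != 'x') return -1; /* token must start with "0x" */
        p += 2;

        int hi = hex_value((unsigned char)*p);
        if (hi < 0) return -1; /* at least one digit is required */
        p++;
        unsigned value = (unsigned)hi;

        /* A second digit belongs to this token unless it begins the next "0x". */
        int lo = hex_value((unsigned char)*p);
        if (lo >= 0 && !(p[0] == '0' && p[1] == 'x')) {
            value = value * 16 + (unsigned)lo;
            p++;
        }

        if (n >= cap) return -1;
        out[n++] = (uint8_t)value;
    }
    return (long)n;
}
```

The resulting buffer would still need to be written with fwrite to a file opened in "wb" mode; that binary file, not the text, is what the tool expects.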

@joshuaehill
Contributor

@joshuaehill Can we know the implementation of get_noise_sample()?

Please see the SP 800-90B document for context. This is the function used to abstract the noise source interface. The interface is entropy-source specific.

@joshuaehill
Contributor

Hello, I'm trying to understand NIST-SP800-90b, specifically the use of the parameter [bits_per_symbol] in ea_non_iid.

For the given binary_signature.txt, shall the [bits_per_symbol] be set as 64?

It is surely possible that your noise source produces 64-bit output, but if this is the case, you are going to need to map these outputs down to at most 8-bit-wide symbols.

Be advised that this mapping essentially establishes the probability of each mapped symbol by adding together the probabilities of all the original symbols that map to it. This may mask problems with the underlying noise source, so it should be done carefully.
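
The summing effect can be illustrated with a small sketch. The function name, the arrays p, q, and map, and the example numbers are all hypothetical; nothing here comes from the tool itself.

```c
#include <stddef.h>

/* Hypothetical illustration: given the probability p[x] of each original
 * symbol x and a mapping map[x] to a smaller symbol space, accumulate
 * q[y] = sum of p[x] over all x with map[x] == y. */
void mapped_distribution(const double *p, const unsigned *map, size_t n_in,
                         double *q, size_t n_out) {
    for (size_t y = 0; y < n_out; y++) {
        q[y] = 0.0;
    }
    for (size_t x = 0; x < n_in; x++) {
        if (map[x] < n_out) { /* ignore out-of-range targets */
            q[map[x]] += p[x];
        }
    }
}
```

For instance, four original symbols with probabilities 0.375, 0.125, 0.125, 0.375, mapped to their low bit (map = {0, 1, 0, 1}), give q = {0.5, 0.5}. The mapped data looks perfectly uniform even though the original distribution is not, which is the kind of masking described above.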

However, I understand that [bits_per_symbol] should be small enough to fit within a single byte. Does this mean I should divide the original data into 8 segments, with each segment being 8 bits (therefore, bits_per_symbol = 8)?

No, this would cause the tool to produce nonsense.

I'm concerned that doing so might alter the original data's physical meaning.

Indeed.

There is some discussion on how to do this "mapping down" (i.e., "reducing the symbol space") in SP 800-90B Section 6.4, though this is only one possible approach. An alternate approach that applies in some physical systems is to use the approach that I outlined in Comment #20. Alternately, in some physical systems it makes more sense to map various value ranges to different abstract symbols (e.g., discretizing by mapping a sampled input voltage level to one of several distinguished symbols). The appropriate mapping approach is very specific to the noise source.
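
As a rough sketch of the range-based approach, a sampled value could be assigned a symbol by comparing it against a list of ascending thresholds. The function name discretize and the threshold values are invented for illustration; suitable boundaries depend entirely on the noise source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the number of thresholds that `value` meets or
 * exceeds. `thresholds` must be sorted in ascending order and n_thresholds
 * must be at most 255, so the result fits in one byte. With k thresholds
 * there are k + 1 possible symbols (0 through k). */
uint8_t discretize(double value, const double *thresholds, size_t n_thresholds) {
    size_t symbol = 0;
    while (symbol < n_thresholds && value >= thresholds[symbol]) {
        symbol++;
    }
    return (uint8_t)symbol;
}
```

With three thresholds (say 0.5, 1.0, and 1.5 volts) this yields four symbols, so the resulting file would be run with bits_per_symbol set to 2.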
