Update enrichedDataset.md
deveyNull authored Dec 1, 2017
1 parent 006771a commit d1b1781
Showing 1 changed file with 22 additions and 19 deletions.
41 changes: 22 additions & 19 deletions enrichedDataset.md
@@ -1,28 +1,31 @@
## I will use this to describe the csv. It's kinda stream of consciousness now, but it will become prettier.
## I will use this to describe the csv.

##### [0]: domainName
##### [1]: count, count # just in general useful for all of this... if you use total values for things like bytes or packets io, should be used to scale results.
##### domainName:
##### count:

## Word Magic: return([countUnique, percentageUnique, modeCount, percentageMode])
### For every item below there are 4 columns.
##### [2-6] temp0, subdomain array #super important for DNS, less so for http
##### [6-11] temp1, agent array #unlikely, ignore
##### temp0 = subdomain array: Super important for DNS, less likely to be used for HTTP because there are so many other places to hide data.
##### temp1 = user agent array: Unlikely to be used by anyone, but it could happen.
temp2, uri array #super important for http, encoded in URI
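
The four "Word Magic" columns named in the return signature above could be computed roughly as follows. This is a minimal sketch, not the repository's actual code: the function name `word_magic` is hypothetical, and it assumes "percentage" means a fraction between 0 and 1 rather than a value scaled to 100.

```python
from collections import Counter


def word_magic(values):
    """Return [countUnique, percentageUnique, modeCount, percentageMode].

    `values` is a list of strings (for example, the subdomains seen for
    one domain). Percentages are fractions of the total number of items;
    that scaling is an assumption, not something the notes specify.
    """
    total = len(values)
    if total == 0:
        # No observations: report zeros instead of dividing by zero.
        return [0, 0.0, 0, 0.0]

    counts = Counter(values)
    count_unique = len(counts)
    # most_common(1) yields [(value, occurrences)] for the most frequent item.
    mode_count = counts.most_common(1)[0][1]
    return [count_unique, count_unique / total, mode_count, mode_count / total]
```

For a subdomain list such as `["a", "b", "a", "c"]`, this yields three unique values and a mode that accounts for half of the observations.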

## Math Magic: (return([countUnique, percentageUnique, average, minimum, maximum, entropyStat, variationStat, skewStat, kurtosisStat])
### For every item in this list, there are 9 columns for each statistics function returned
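
The nine "Math Magic" columns could be produced by a helper along these lines. It is a sketch under stated assumptions, since the notes do not define the statistics: `math_magic` is a hypothetical name, `entropyStat` is taken to be the Shannon entropy of the value distribution, `variationStat` the coefficient of variation, and skew and kurtosis use population moments (with excess kurtosis). The original code may well use a statistics library with different conventions.

```python
import math
from collections import Counter


def math_magic(values):
    """Return [countUnique, percentageUnique, average, minimum, maximum,
    entropyStat, variationStat, skewStat, kurtosisStat] for a numeric list.

    The definitions of the last four statistics are assumptions.
    """
    n = len(values)
    if n == 0:
        return [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

    counts = Counter(values)
    count_unique = len(counts)
    mean = sum(values) / n

    # Population central moments (divide by n, not n - 1).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n

    # Shannon entropy (bits) of the empirical distribution of values.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Coefficient of variation; defined as 0 when the mean is 0.
    variation = math.sqrt(m2) / mean if mean else 0.0
    # Skewness and excess kurtosis; defined as 0 for constant input.
    skew = m3 / m2 ** 1.5 if m2 else 0.0
    kurtosis = m4 / m2 ** 2 - 3.0 if m2 else 0.0

    return [count_unique, count_unique / n, mean, min(values), max(values),
            entropy, variation, skew, kurtosis]
```

Each array in the list below would be passed through the same function, which is why every item contributes nine columns.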

##### temp_0, delta time list # very important, periodicity?
##### magicDurationArray, durations #possibly important
##### magicOrigBytesArray, bytes sent #yes * maybe something can be done with ratios here
##### magicRespBytesArray, bytes received #yes
##### magicOrigPacketsArray, packets sent #yes
##### magicOrigIpBytesArray, ip bytes sent #yes
##### magicRespPacketsArray, packets received #yes
##### magicRespIpBytesArray, ip bytes received #yes * maybe something can be done with ratios here
##### temp_2, uri length #important
##### temp_3, uri depth #important
##### temp_4, uri entropy #important
##### temp_5, agent length #unlikely to matter, #unlikely to matter
##### temp_6, agent depth #unlikely to matter, #unlikely to matter
##### temp_7, agent entropy #unlikely to matter, recommend ignore
##### temp_0 = delta time list: Stats from an array of the time differences between connections... a poor man's time series analysis. There are much better ways to do this most likely, for now, most likely effective.
##### magicDurationArray = connection durations: Stats from an array of the connection lengths. File under, possibly important.
### TIME TO DO: Actual time series analysis
##### magicOrigBytesArray = bytes sent: Important
##### magicRespBytesArray = bytes received: Ditto and #yes
##### magicOrigPacketsArray = packets sent: Ditto and #yes
##### magicOrigIpBytesArray = ip bytes sent: Ditto and #yes
##### magicRespPacketsArray = packets received: Ditto and #yes
##### magicRespIpBytesArray = ip bytes received: Ditto and #yes
#### Bytes To Do: Various Producer/Consumer Ratios
##### temp_2 = uri length: Length of the URI, longer = sketchier.
##### temp_3 = uri depth: Stats from array of directory depths in URI.
##### temp_4 = uri entropy: Stats from array of uri entropy, can be significantly optimized.
### URI TO DO: Longest common substring stuff, URI hexadecimal count, entropy in final subdirectory.
##### temp_5 = agent length: #unlikely to matter, #unlikely to matter
##### temp_6 = agent depth: #unlikely to matter, #unlikely to matter
##### temp_7 = agent entropy: #unlikely to matter, recommend ignore
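
The input arrays behind several of these items (delta times, URI depth, URI entropy) could be derived as sketched here before being summarized into columns. The helper names are hypothetical, and the definitions are assumptions: depth counts non-empty path segments before any query string, and entropy is character-level Shannon entropy in bits.

```python
import math
from collections import Counter


def delta_times(timestamps):
    """Differences between consecutive connection times, after sorting."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def uri_depth(uri):
    """Number of non-empty path segments, ignoring any query string."""
    path = uri.split("?", 1)[0]
    return len([segment for segment in path.split("/") if segment])


def shannon_entropy(text):
    """Character-level Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    counts = Counter(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical sample data for one domain.
times = [10.0, 12.5, 11.0]
uris = ["/a/b/c.php?x=1/2", "/index.html"]

deltas = delta_times(times)                    # feeds the delta time list
lengths = [len(u) for u in uris]               # feeds uri length
depths = [uri_depth(u) for u in uris]          # feeds uri depth
entropies = [shannon_entropy(u) for u in uris]  # feeds uri entropy
```

Near-constant delta times hint at periodic beaconing, while long, deep, high-entropy URIs are the pattern the notes flag as suspicious.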
