For each Activity resource (the technical term the API reference uses for posts) a separate JSON file is created. Its filename appears to be named according to the following structure:
YYYYMMDD - UNIQUE_POST_TITLE.json
or, represented as a Regular Expression:
(?<year>[0-9]{4})(?<month>[0-9]{2})(?<day>[0-9]{2}) - (?<unique_post_title>.{0-42})\.json
where YYYYMMDD
is the Activity's creation date consisting of a 4 digit year, 2 digit month and 2 digit day of month, and UNIQUE_POST_TITLE
is a unique (within the scope of the same day) Activity identifier that's at most 42 characters long.
The UNIQUE_POST_TITLE
generally is the first 39 characters of the Activity's content.
However, if this would cause a duplicate filename within the same day, the last characters of this fragment will be replaced with a (\d+)
suffix (e.g. (1)
) where the number between the parentheses is an integer which starts at 1, and gets incremented by 1 till the suffixed title fragment is unique again within the scope of the day.
This structure isn't complete yet, as it's based on just a couple of example JSON files.
The data is a standard JSON file. Its structure can be visualised as a Hash or associative array as follows:
activity_post = {
'url': 'https://plus.google.com/ACTIVITY_USER_ID/posts/ACTIVITY_ID',
'creationTime': 'YYYY-mm-dd HH:MM:SS:zzzz',
'updateTime': 'YYYY-mm-dd HH:MM:SS:zzzz',
'author':
{
'displayName': 'DISPLAY_NAME_WITHOUT_NICKNAME',
'profilePageUrl': 'https://plus.google.com/ACTIVITY_USER_ID',
'avatarImageUrl': 'https://lh3.googleusercontent.com/-RANDOM/PATH/TO/RESOURCE/photo.jpg',
'resourceName': 'users/ACTIVITY_USER_ID'
},
'album':
{
'media':
[
{
'url': 'https://lh3.googleusercontent.com/-RANDOM/PATH/TO/RESOURCE/photo.jpg',
'contentType': 'image/*',
'width': 1234,
'height': 4321,
'resourceName': 'media/MEDIA_RESOURCE_ID',
}
]
},
'content': 'HTML_FORMATTED_CONTENT',
'link':
{
'title': 'LINK_TITLE',
'url': 'LINK_ABSOLUTE_URL',
'imageUrl': 'LINK_ABSOLUTE_IMAGE_URL'
},
'resharedPost': {
'url': 'https://plus.google.com/+ORIGINAL_ACTIVITY_USER_ID/posts/ORIGINAL_ACTIVITY_ID',
'author': {
'displayName': 'ORIGINAL_ACTIVITY_USER_DISPLAY_NAME',
'profilePageUrl': 'https://plus.google.com/+ORIGINAL_ACTIVITY_USER_ID',
'avatarImageUrl': 'https://lh3.googleusercontent.com/-RANDOM/PATH/TO/RESOURCE/photo.jpg',
'resourceName': 'users/ORIGINAL_ACTIVITY_NUMERIC_USER_ID'
},
'content': 'ORIGINAL_ACTIVITY_HTML_FORMATTED_CONTENT',
'resourceName': 'users/ORIGINAL_ACTIVITY_NUMERIC_USER_ID/posts/ORIGINAL_ACTIVITY_ID'
},
'comments':
[
'creationTime': 'YYYY-mm-dd HH:MM:SS:zzzz',
'author':
{
'displayName': 'DISPLAY_NAME_WITHOUT_NICKNAME',
'profilePageUrl': 'https://plus.google.com/USER_ID',
'avatarImageUrl': 'https://lh3.googleusercontent.com/-RANDOM/PATH/TO/RESOURCE/photo.jpg',
'resourceName': 'users/USER_ID'
},
'content': 'HTML_FORMATTED_CONTENT',
'postURL': 'https://plus.google.com/ACTIVITY_USER_ID/posts/ACTIVITY_ID',
'resourceName': 'users/ACTIVITY_USER_ID/posts/THREAD_ID/comments/COMMENT_ID'
],
'resourceName': 'users/ACTIVITY_USER_ID/posts/THREAD_ID',
'plusOnes':
[
'plusOner':
{
'displayName': 'DISPLAY_NAME_WITHOUT_NICKNAME',
'profilePageUrl': 'https://plus.google.com/USER_ID',
'avatarImageUrl': 'https://lh3.googleusercontent.com/-RANDOM/PATH/TO/RESOURCE/photo.jpg',
'resourceName': 'users/USER_ID'
},
],
'postAcl':
{
'visibleToStandardAcl':
{
'circles':
[
{
'type': 'CIRCLE_TYPE_PUBLIC'
},
{
'type': 'CIRCLE_TYPE_YOUR_CIRCLES'
},
{
'type': 'CIRCLE_TYPE_EXTENDED_CIRCLES'
},
{
'resourceName': 'circles/USER_ID-CIRCLE_ID',
'type': 'CIRCLE_TYPE_USER_CIRCLE',
'displayName': 'CIRCLE_DISPLAY_NAME'
}
],
'users':
[
{
'resourceName': 'users/USER_ID',
'displayName': 'DISPLAY_NAME_WITHOUT_NICKNAME'
}
]
},
'communityAcl':
{
'community':
{
'resourceName': 'communities/COMMUNITY_ID',
'displayName': 'COMMUNITY_DISPLAY_NAME'
},
'users':
[
{
'resourceName': 'users/USER_ID',
'displayName': 'DISPLAY_NAME_WITHOUT_NICKNAME'
}
]
},
'eventAcl':
{
'event':
{
'resourceName': 'events/EVENT_ID'
}
},
'collectionAcl':
{
'collection':
{
'resourceName': 'collections/COLLECTION_ID',
'displayName': 'COLLECTION_DISPLAY_NAME'
}
}
}
}
(NOTE: This Ruby structure has not been kept up to date; for now it's merely still here as illustration/clarification)
What follows is a list of all possible JSON keys encountered after analysing a set of complete archive of Google+ Stream posts from 2011-06-30 till 2019-04-02, from various accounts, including a G Suite account:
activityId
album
album.media
album.media[]
album.media[].contentType
album.media[].description
album.media[].height
album.media[].localFilePath
album.media[].resourceName
album.media[].url
album.media[].width
author
author.avatarImageUrl
author.displayName
author.profilePageUrl
author.resourceName
collectionAttachment
collectionAttachment.coverPhotoUrl
collectionAttachment.displayName
collectionAttachment.owner
collectionAttachment.owner.avatarImageUrl
collectionAttachment.owner.displayName
collectionAttachment.owner.profilePageUrl
collectionAttachment.owner.resourceName
collectionAttachment.permalink
collectionAttachment.resourceName
comments
comments[]
comments[].author
comments[].author.avatarImageUrl
comments[].author.displayName
comments[].author.profilePageUrl
comments[].author.resourceName
comments[].commentActivityId
comments[].content
comments[].creationTime
comments[].link
comments[].link.imageUrl
comments[].link.title
comments[].link.url
comments[].media
comments[].media.contentType
comments[].media.height
comments[].media.localFilePath
comments[].media.resourceName
comments[].media.url
comments[].media.width
comments[].postUrl
comments[].resourceName
comments[].updateTime
communityAttachment
communityAttachment.coverPhotoUrl
communityAttachment.displayName
communityAttachment.resourceName
content
creationTime
link
link.imageUrl
link.title
link.url
location
location.displayName
location.latitude
location.longitude
location.physicalAddress
media
media.contentType
media.description
media.height
media.localFilePath
media.resourceName
media.url
media.width
plusOnes
plusOnes[]
plusOnes[].plusOner
plusOnes[].plusOner.avatarImageUrl
plusOnes[].plusOner.displayName
plusOnes[].plusOner.profilePageUrl
plusOnes[].plusOner.resourceName
poll
poll.choices
poll.choices[]
poll.choices[].description
poll.choices[].imageLocalFilePath
poll.choices[].imageUrl
poll.choices[].resourceName
poll.choices[].voteCount
poll.choices[].votes
poll.choices[].votes[]
poll.choices[].votes[].voter
poll.choices[].votes[].voter.avatarImageUrl
poll.choices[].votes[].voter.displayName
poll.choices[].votes[].voter.profilePageUrl
poll.choices[].votes[].voter.resourceName
poll.imageLocalFilePath
poll.imageUrl
poll.totalVotes
poll.viewerPollChoiceResourceName
postAcl
postAcl.collectionAcl
postAcl.collectionAcl.collection
postAcl.collectionAcl.collection.displayName
postAcl.collectionAcl.collection.resourceName
postAcl.collectionAcl.users
postAcl.collectionAcl.users[]
postAcl.collectionAcl.users[].displayName
postAcl.collectionAcl.users[].resourceName
postAcl.communityAcl
postAcl.communityAcl.community
postAcl.communityAcl.community.displayName
postAcl.communityAcl.community.resourceName
postAcl.communityAcl.users
postAcl.communityAcl.users[]
postAcl.communityAcl.users[].displayName
postAcl.communityAcl.users[].resourceName
postAcl.eventAcl
postAcl.eventAcl.event
postAcl.eventAcl.event.displayName
postAcl.eventAcl.event.resourceName
postAcl.isDomainRestricted
postAcl.isLegacyAcl
postAcl.isPublic
postAcl.visibleToStandardAcl
postAcl.visibleToStandardAcl.circles
postAcl.visibleToStandardAcl.circles[]
postAcl.visibleToStandardAcl.circles[].displayName
postAcl.visibleToStandardAcl.circles[].resourceName
postAcl.visibleToStandardAcl.circles[].type
postAcl.visibleToStandardAcl.users
postAcl.visibleToStandardAcl.users[]
postAcl.visibleToStandardAcl.users[].displayName
postAcl.visibleToStandardAcl.users[].resourceName
postKind
resharedPost
resharedPost.album
resharedPost.album.media
resharedPost.album.media[]
resharedPost.album.media[].contentType
resharedPost.album.media[].description
resharedPost.album.media[].height
resharedPost.album.media[].resourceName
resharedPost.album.media[].url
resharedPost.album.media[].width
resharedPost.author
resharedPost.author.avatarImageUrl
resharedPost.author.displayName
resharedPost.author.profilePageUrl
resharedPost.author.resourceName
resharedPost.communityAttachment
resharedPost.communityAttachment.coverPhotoUrl
resharedPost.communityAttachment.displayName
resharedPost.communityAttachment.resourceName
resharedPost.content
resharedPost.link
resharedPost.link.imageUrl
resharedPost.link.title
resharedPost.link.url
resharedPost.media
resharedPost.media.contentType
resharedPost.media.description
resharedPost.media.height
resharedPost.media.resourceName
resharedPost.media.url
resharedPost.media.width
resharedPost.resourceName
resharedPost.url
reshares
reshares[]
reshares[].resharer
reshares[].resharer.avatarImageUrl
reshares[].resharer.displayName
reshares[].resharer.profilePageUrl
reshares[].resharer.resourceName
resourceName
updateTime
url
This list was generated with the following command:
find /path/to/{first,second,third}/extracted/takeout-archive/Google+\ Stream/Posts/ -iname '*.json' -type f -exec jq -rs 'paths|map(.|tostring)|join(".")' '{}' \; | sed -E 's/^[0-9]{1,}(\.|$)//' | sed -E 's/\.[0-9]{1,}/[]/g' | sort -u | tee unique_keys_in_takeout_posts.txt
I used gsed
instead of sed
to indicate it's GNU sed, rather than for instance the default sed that comes with for example macOS; though the regular expression is probably simple enough to work regardless.
A .
(period) indicates that the following key is a child, and []
(square brackets) indicate that the preceding parent key is an array.
So, reshares
is an array of which the child members contain the hash key resharer
, which contain the keys avatarImageUrl
, displayName
, profilePageUrl
and resourceName
.
In JSON this would be equivalent to:
"reshares": [
{
"resharer": {
"avatarImageUrl": "some url",
"displayName": "some display name",
"profilePageUrl": "some url",
"resourceName": "some/resource/path"
}
}
]
ACTIVITY_USER_ID
is the numeric Person ID of the user who posts the Activity;ACTIVITY_ID
a unique identifier consisting of alpha-numeric characters ([a-zA-Z0-9]
); do note that this id is not the same as theActivity#id
used by the Google+ REST API. In fact, none of theid
s andresourceName
s supplied in the JSON file seem to be usable through the API.USER_ID
is the numeric Person ID of the user within the scope of the current content. For instance, within the scope of 'comments', it is the Person ID of the user who posted the comment;DISPLAY_NAME_WITHOUT_NICKNAME
is the display name of the profile, however the nickname does not seem to be included, even if the user had set his nickname to be included within his display name.HTML_FORMATTED_CONTENT
is the content (post's or comment's body) formatted using HTML, rather than containing Google's own markdown-like markup/formatting language.LINK_TITLE
is the title of the linked (external) webpage, likely extracted from the webpage's<title>
HTML tag.LINK_ABSOLUTE_URL
is the absolute URL of the linked (external) webpage (e.g.https://somesite.example/path/to/some/page
).LINK_ABSOLUTE_IMAGE_URL
is the absolute URL of the linked (external) image resource (e.g.https://somesite.example/path/to/some/image.jpg
).THREAD_ID
appears to be a unique identifier for a comment thread for an activity. Though it also seems to be a secondary Activity ID (perhaps used internally?) as it is not only used within comment threads, it also is used in the Activity's resourceName.COMMENT_ID
appears to be a unique identifier for a comment within a comment thread for an activity. It appears to consist of two sub-fragments, delimited by a hyphen (-). The first fragments appears to be the same for all comments within the same thread, and the last (shortest) fragment seems to be unique within that thread.COMMUNITY_ID
unique numeric identifier for the community in which the Activity was posted.COMMUNITY_DISPLAY_NAME
string, display name for the community in which the Activity was posted.EVENT_ID
unique alpha-numeric identifier for the Event Activity that was posted.MEDIA_RESOURCE_ID
an alpha-numeric URL-encoded identifier, or possibly a URL-encoded BASE64 content string.COLLECTION_ID
an alpha-numeric identifier for the Collection the Activity is in.COLLECTION_DISPLAY_NAME
the display name of the Collection.media
.width
andmedia
.height
are integer values representing their dimensions in pixels.ORIGINAL_ACTIVITY_HTML_FORMATTED_CONTENT
likeHTML_FORMATTED_CONTENT
but then for the Original Activity item rather than the reshared item.ORIGINAL_ACTIVITY_NUMERIC_USER_ID
likeACTIVITY_USER_ID
but then belonging to the user who originally posted the Activity item, rather than the reshared Activity item. Is an Integer represented as a String.ORIGINAL_ACTIVITY_USER_ID
likeACTIVITY_USER_ID
, but will be in +UserAliasID format if the user has set a user alias. If no alias was set, it'll be a numeric id instead.ORIGINAL_ACTIVITY_USER_DISPLAY_NAME
likeDISPLAY_NAME_WITHOUT_NICKNAME
, but then for the User who originally posted the Activity, rather than the reshared Activity item.ORIGINAL_ACTIVITY_ID
likeACTIVITY_ID
but then for the original Activity item, rather than the reshared item.
Within datetime strings:
YYYY
is a 4 digit year,mm
a 2 digit month (01-12),dd
a 2 digit day of the month (01-31),HH
a 2 digit hour (00-23),MM
a 2 digit minute of the hour (00-59),SS
2 digits indicating the seconds of the minute (00-59) andzzzz
2x2 digits indicating the hours and minutes offset of UTC.
For the postAcl
(Access Control List) data, there is some exclusivity regarding the items, but for completeness/overview sake all possible items are listed.
It's likely that you can have either a visibleToStandardAcl
item, an eventAcl
item, a communityAcl
item, or a collectionAcl
item and none of them combined with each other.
Furthermore, within visibleToStandardAcl
.circles
, the CIRCLE_TYPE_PUBLIC
-, CIRCLE_TYPE_YOUR_CIRCLES
- and CIRCLE_TYPE_EXTENDED_CIRCLES
-type circle items are likely mutually-exclusive, which means you can't combine one or more of them with each other. You can likely find one of them combined with zero, one or more CIRCLE_TYPE_USER_CIRCLE
-type circle item though.
Finally, within communityAcl
the users
item is optional.