
Conversation

@assakafpix
Contributor

Summary

This PR adds support for the Google Admin Reports API to the steampipe-plugin-googleworkspace plugin by introducing five new tables:

  • googleworkspace_admin_reports_admin_activity
  • googleworkspace_admin_reports_drive_activity
  • googleworkspace_admin_reports_login_activity
  • googleworkspace_admin_reports_mobile_activity
  • googleworkspace_admin_reports_token_activity

These tables allow Steampipe users to query Workspace audit logs for key activities and events across their domain.


Motivation

Auditing and monitoring activity within Google Workspace is essential for security, compliance, and operational oversight. Currently, this plugin does not support querying Google’s Admin Reports API.

By introducing these new tables, users can now:

  • Monitor admin actions such as user or group changes
  • Detect suspicious logins or token usage
  • Track file access and sharing activity in Google Drive
  • Audit mobile device enrollments and policy violations

Changes

  • Added five new table files under googleworkspace/table_googleworkspace_admin_reports_*.go
  • Integrated with the Reports API via the Google SDK
  • Defined schemas and hydration functions for each table
  • Wrote documentation and example queries for each table

@cbruno10 cbruno10 requested a review from Copilot July 5, 2025 15:57

Copilot AI left a comment


Pull Request Overview

This PR integrates the Google Admin Reports API into the Steampipe Google Workspace plugin by adding five new tables that surface audit log data for token, mobile, login, Drive, and admin activities.

  • Introduces new table definitions under googleworkspace/table_googleworkspace_admin_reports_*.go
  • Adds a ReportsService in service.go, updates plugin mappings, and injects the required Admin Reports scope
  • Provides documentation and example queries for each new table

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| googleworkspace/table_googleworkspace_admin_reports_admin_activity.go | New table for admin activity events |
| googleworkspace/table_googleworkspace_admin_reports_drive_activity.go | New table for Drive activity events |
| googleworkspace/table_googleworkspace_admin_reports_login_activity.go | New table for login activity events |
| googleworkspace/table_googleworkspace_admin_reports_mobile_activity.go | New table for mobile activity events |
| googleworkspace/table_googleworkspace_admin_reports_token_activity.go | New table for token activity events |
| googleworkspace/service.go | Added ReportsService and Admin Reports audit scope |
| googleworkspace/plugin.go | Registered the five new tables in the plugin’s TableMap |
| docs/index.md | Updated APIs/Credentials section to include Admin Reports API |
| docs/tables/*.md | Added documentation for the new tables |
Comments suppressed due to low confidence (2)

googleworkspace/table_googleworkspace_admin_reports_token_activity.go:45

  • [nitpick] The description says 'List of event names' but extractFirstEventName returns only the first event name. Update the description to reflect that only the first event name is returned.
                Description: "List of event names for this activity",

googleworkspace/table_googleworkspace_admin_reports_admin_activity.go:1

  • No unit tests were added for the new admin reports tables or their transform functions. Consider adding tests to cover list functions and transform helpers to ensure correct behavior.
package googleworkspace

Comment on lines 44 to 47

```go
Name:        "event_name",
Description: "List of event names for this activity",
Type:        proto.ColumnType_STRING,
Transform:   transform.FromField("Events").Transform(extractEventNames),
```

Copilot AI Jul 5, 2025


The event_name column is defined as STRING but the extractEventNames transform returns a slice of strings. Consider changing the column type to JSON or a string array to match the returned value.

Suggested change

```diff
  Name: "event_name",
  Description: "List of event names for this activity",
- Type: proto.ColumnType_STRING,
+ Type: proto.ColumnType_JSON,
  Transform: transform.FromField("Events").Transform(extractEventNames),
```

Comment on lines 50 to 53

```go
Name:        "event_name",
Description: "List of event names for this activity",
Type:        proto.ColumnType_STRING,
Transform:   transform.FromField("Events").Transform(extractEventNames),
```

Copilot AI Jul 5, 2025


The event_name column is defined as STRING but the extractEventNames transform returns a slice of strings. Update the column type to JSON or an appropriate array type.

Suggested change

```diff
  Name: "event_name",
  Description: "List of event names for this activity",
- Type: proto.ColumnType_STRING,
+ Type: proto.ColumnType_JSON,
  Transform: transform.FromField("Events").Transform(extractEventNames),
```

```go
{
	Name:        "event_name",
	Description: "List of event names for this activity",
	Type:        proto.ColumnType_STRING,
```

Copilot AI Jul 5, 2025


The event_name column is defined as STRING but you are using extractEventNames which returns a slice. Change the column type to JSON or an array to align with the return value.

Comment on lines 100 to 170

```go
// If the user supplied a time qualifier, translate it to StartTime/EndTime parameters
if quals := d.Quals["time"]; quals != nil {
	var startTime, endTime time.Time
	for _, q := range quals.Quals {
		if ts := q.Value.GetTimestampValue(); ts != nil {
			t := ts.AsTime()
			switch q.Operator {
			case "=":
				startTime, endTime = t, t
			case ">":
				startTime = t.Add(time.Nanosecond)
			case ">=":
				startTime = t
			case "<":
				endTime = t
			case "<=":
				endTime = t
			}
		}
	}
	if !startTime.IsZero() {
		call.StartTime(startTime.Format(time.RFC3339))
	}
	if !endTime.IsZero() {
		call.EndTime(endTime.Format(time.RFC3339))
	}
}

// Pagination setup
pageToken := ""
const apiMaxPageSize = 1000

// Determine initial page size based on SQL LIMIT
var initialPageSize int64 = apiMaxPageSize
if limit := d.QueryContext.Limit; limit != nil && *limit < initialPageSize {
	initialPageSize = *limit
}
call.MaxResults(initialPageSize)

for {
	if pageToken != "" {
		call.PageToken(pageToken)
	}
	resp, err := call.Do()
	if err != nil {
		plugin.Logger(ctx).Error("googleworkspace_admin_reports_admin_activity.list", "api_error", err)
		return nil, err
	}
	for _, activity := range resp.Items {
		d.StreamListItem(ctx, activity)
		if d.RowsRemaining(ctx) == 0 {
			return nil, nil
		}
	}
	if resp.NextPageToken == "" {
		break
	}
	pageToken = resp.NextPageToken

	if limit := d.QueryContext.Limit; limit != nil {
		remaining := d.RowsRemaining(ctx)
		if remaining > 0 && remaining < apiMaxPageSize {
			call.MaxResults(int64(remaining))
		} else {
			call.MaxResults(apiMaxPageSize)
		}
	} else {
		call.MaxResults(apiMaxPageSize)
	}
}
```


Copilot AI Jul 5, 2025


The time qualifier parsing and pagination logic is duplicated across multiple table implementations. Consider extracting common helpers to reduce code duplication and improve maintainability.

Suggested change (the removed lines are a verbatim copy of the qualifier-parsing and pagination block quoted above; the replacement is):

```go
// Parse time qualifiers using helper function
startTime, endTime := parseTimeQualifiers(d)
if !startTime.IsZero() {
	call.StartTime(startTime.Format(time.RFC3339))
}
if !endTime.IsZero() {
	call.EndTime(endTime.Format(time.RFC3339))
}

// Handle pagination using helper function
err = handlePagination(ctx, d, call)
if err != nil {
	plugin.Logger(ctx).Error("googleworkspace_admin_reports_admin_activity.list", "pagination_error", err)
	return nil, err
}
```

@ParthaI
Contributor

ParthaI commented Jul 7, 2025

Hi @assakafpix, 👋

Great to see the PR — thanks for putting it together!

I had a few suggestions regarding the table design that I’d love your thoughts on. Currently, the PR adds five separate tables with the same schema, key columns, and API call — the only difference being the input parameter (applicationName).

Would it be possible to consolidate these into a single table that can serve all reports dynamically based on the input parameter? This could help improve maintainability and performance.

📌 Proposed table design:

  • Introduce a single table named googleworkspace_admin_report_activity (open to naming improvements).
  • Add a required key column application_name to represent the application type to query (e.g., admin, drive, mobile, login, token).
  • Optionally, default this column to a specific value, but requiring it in the ListConfig would help keep things explicit.
  • Update the documentation to mention the purpose of application_name and list supported values.
  • Example table definition outline:
```go
func tableGoogleworkspaceAdminReportActivity(ctx context.Context) *plugin.Table {
    return &plugin.Table{
        Name:        "googleworkspace_admin_report_activity",
        Description: "Audit logs of administrative actions across your Workspace domain.",
        List: &plugin.ListConfig{
            Hydrate: listGoogleworkspaceAdminReportActivities,
            KeyColumns: plugin.KeyColumnSlice{
                {Name: "application_name", Require: plugin.Required},
                // Additional optional key quals
            },
            Tags: map[string]string{"service": "admin", "product": "reports", "action": "activities.list"},
        },
        Columns: []*plugin.Column{
            {
                Name:        "application_name",
                Description: "Application name for which the events are to be retrieved.",
                Type:        proto.ColumnType_STRING,
                Transform:   transform.FromQual("application_name"),
            },
            {
                Name:        "time",
                Description: "Timestamp of the activity (Id.Time) in RFC3339 format",
                Type:        proto.ColumnType_TIMESTAMP,
                Transform:   transform.FromField("Id.Time"),
            },
            // Additional columns here
        },
    }
}
```

And the listGoogleworkspaceAdminReportActivities function would use application_name dynamically:

```go
applicationName := d.EqualsQuals["application_name"].GetStringValue()
call := service.Activities.List("all", applicationName)
```

Example usage:

```sql
select * from googleworkspace_admin_report_activity where application_name = 'admin';
select * from googleworkspace_admin_report_activity where application_name = 'drive';
select * from googleworkspace_admin_report_activity where application_name = 'mobile';
```

This approach simplifies the codebase and allows us to reuse logic while still offering flexibility to query per application.

Let me know what you think — happy to iterate on this!

Thanks again! 🙌

@assakafpix
Contributor Author

assakafpix commented Jul 7, 2025

Hi @ParthaI!

Thanks for the quick response! Merging the five tables into one is a really good idea that I hadn’t considered. The main trade-off, however, is that I’ve already added table-specific columns (for example, file_name in the Drive table, app_name in the Token table, and user_email in the Admin table). I could drop these specialized fields, but then our queries start to feel cumbersome.

For instance, if I want all the Google Drive logs related to my file Passwords.txt:

Current setup (five separate tables):

```sql
SELECT *
FROM googleworkspace_admin_reports_drive_activity
WHERE file_name = 'Passwords.txt';
```

Single “big” table (no specialized columns):

Option 1, using jsonb_array_elements:

```sql
SELECT a.*
FROM googleworkspace_admin_reports_activity AS a
CROSS JOIN LATERAL jsonb_array_elements(a.events) AS evt
CROSS JOIN LATERAL jsonb_array_elements(evt->'parameters') AS param
WHERE a.application_name = 'drive'
  AND param->>'name' = 'doc_title'
  AND param->>'value' = 'Passwords.txt';
```

Option 2, using JSONPath:

```sql
SELECT *
FROM googleworkspace_admin_reports_activity
WHERE application_name = 'drive'
  AND events @? '$[*].parameters[*] ? (@.name == "doc_title" && @.value == "Passwords.txt")';
```

Both feel a bit heavy for what should be a simple filter on file_name. We could keep all specialized columns in the unified table and simply select the ones we need in each query, but that may become tedious to maintain.

Given your experience with Steampipe, which approach do you think strikes the best balance between simplicity, performance, and maintainability?

@ParthaI
Contributor

ParthaI commented Jul 8, 2025

Hello @assakafpix,

Thank you for the detailed update — I really appreciate the thorough explanation.

Here are my thoughts:

  • Since the API is the same and returns a consistent Go structure across all activity report types, I would lean toward creating a single consolidated table.

  • Creating separate tables based on input parameters seems redundant and could introduce unnecessary complexity.

  • The table schema should be derived directly from the API response. Specifically:

    • We should only extract fields into the schema if the corresponding API response is a JSON object — not an array of JSON objects.
    • It appears that the event_name data type is currently set as a string, but since we are iterating over multiple events and the extractEventNames function returns an array of strings, this should be updated to use a JSON column type.
    • Regarding the file_name extraction using the extractEventParameter function — the current approach seems to return the value as soon as a match is found. However, since multiple events may contain the same property within a single row, this method could return inaccurate data.

Based on the above, here’s what I would suggest:

  • Keep the table design simple:

    • Use a single table for all activity report types.
    • Structure the schema to match the API response closely. The additional extracted fields like time, actor_email, and unique_qualifier are appropriate since they are part of a JSON object and not an array.
    • Remove the use of the extractEventParameter function, and any related columns.
    • Avoid extracting properties from the Events field since it is an array of JSON objects.
    • Add a column named application_name and make it a required qualifier.
    • Any other input parameters that are specific to an application_name should be added as additional optional key qualifiers.

Additionally, it's important that the documentation for the table clearly explains its behavior:

  • Include notes on how to use the table and which optional qualifiers apply to which application_name if any.
  • For reference, you can follow the approach in the "Important Notes" section of this example.
  • The table documentation should also provide example queries for different application_name values, including how to extract specific event properties where applicable.

Please let me know if you have any questions or concerns.
Thanks again!

@assakafpix
Contributor Author

Hi @ParthaI!
Thanks for the feedback! I’ve merged all five tables into a single one. Please let me know if there’s anything else I can improve.
Thanks in advance!

Contributor

@ParthaI ParthaI left a comment


Hi @assakafpix, I’ve added a few review comments — whenever you have a moment, could you please take a look? Thank you!

@misraved, feel free to review the table naming as well and share any suggestions you might have. Your input is always appreciated!

```go
List: &plugin.ListConfig{
	Hydrate: listGoogleworkspaceAdminReportsActivities,
	KeyColumns: plugin.KeyColumnSlice{
		{Name: "application_name", Require: plugin.Required, Operators: []string{"="}},
```

Suggested change

```diff
- {Name: "application_name", Require: plugin.Required, Operators: []string{"="}},
+ {Name: "application_name", Require: plugin.Required},
```

```go
{Name: "time", Require: plugin.Optional, Operators: []string{">", ">=", "<", "<=", "="}},
{Name: "actor_email", Require: plugin.Optional},
{Name: "ip_address", Require: plugin.Optional},
{Name: "event_names", Require: plugin.Optional},
```

The event_names column is of type JSON. Would it make sense to include it as an optional qualifier?

If so, here are a couple of points to consider:

  • It should be implemented in line with the approach we discussed in the AWS plugin.
    Reference: PR #2466 – AWS plugin

  • If the input parameter supports only a single event name, we could consider introducing a new column named event_name (singular).
    This column could be populated directly from the query input — only when the user specifies it explicitly in the WHERE clause.

```go
Name:        "ip_address",
Description: "IP address associated with the activity (IpAddress)",
Type:        proto.ColumnType_STRING,
Transform:   transform.FromField("IpAddress"),
```

Is a custom transform function (transform.FromField("IpAddress")) necessary in this case? Would the plugin-level default transform function (DefaultTransform: transform.FromCamel().NullIfZero()) defined in plugin.go be sufficient?

```go
		call.EndTime(endTime.Format(time.RFC3339))
	}
}
```


Could you also take care of the remaining optional qualifiers defined in the list config? I noticed they don't appear to be used when making the API call.

```go
		break
	}
	pageToken = resp.NextPageToken
	if limit := d.QueryContext.Limit; limit != nil {
```

Could you please update the pagination logic to align with the coding structure used in other tables? The goal is to maintain consistency across the codebase unless there's a strong reason to deviate from the standard approach.

Comment on lines 15 to 25

```markdown
You can optionally filter by:

* `time` — filter events before/after a specific timestamp (RFC3339 format).
* `actor_email` — the email of the user or service account performing the action.
* `ip_address` — the source IP of the event.
* `event_names` — names of the audit events (e.g., `create_file`, `deleted`, `created_note`).

**Important Notes**

* **Required Qualifier**: You must include `where application_name = '<app>'` in every query (no default).
* **Time Filter**: For performance, use the optional `time` qualifier to limit the result set to a specific period.
```

Could you please update the Important Notes section to follow the format we use in other tables? For reference, you can take a look at the gcp_logging_log_entry table in the GCP plugin: link. It would also be helpful to include a link to the list of supported values for the application_name column, if available.

```markdown
## Examples

### List all Drive events in the last hour
```


```markdown
### List all password changes performed by administrators on users
```


@assakafpix
Contributor Author

Hi @ParthaI,
Thank you for your feedback, it really helps me deepen my understanding of Steampipe. I’ve implemented the following:

  • New event_name column (type STRING):

    • When filtered (in WHERE event_name = …), it’s populated directly from the query input and pushed down to the Reports API.

    • Otherwise, it falls back to the first element of the JSON events array.
      This “else” ensures that if, for example, you run:

```sql
SELECT *
FROM googleworkspace_admin_reports_activity
WHERE application_name = 'login';
```

then subsequently:

```sql
SELECT *
FROM googleworkspace_admin_reports_activity
WHERE application_name = 'login'
  AND event_name = 'login_failure';
```

the second query will not hit a stale cache entry (since event_name is now always hydrated).
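The fallback described above boils down to a small pure function. A sketch with simplified, invented types (the real hydrate works with the SDK's qualifiers and the Reports API's Activity.Events):

```go
package main

import "fmt"

// event is a simplified stand-in for a Reports API activity event.
type event struct{ Name string }

// eventNameColumn returns the qualifier value when the user filtered on
// event_name (so the pushed-down API filter and the column value agree), and
// otherwise falls back to the first event's name, so the column is always
// hydrated and cached filtered/unfiltered results stay distinguishable.
func eventNameColumn(qual string, events []event) string {
	if qual != "" {
		return qual // populated directly from the WHERE clause
	}
	if len(events) > 0 {
		return events[0].Name // fallback: first element of the events array
	}
	return ""
}

func main() {
	evts := []event{{Name: "login_success"}, {Name: "login_challenge"}}
	fmt.Println(eventNameColumn("", evts))              // login_success
	fmt.Println(eventNameColumn("login_failure", evts)) // login_failure
}
```

Note the fallback only surfaces the first event of a row, which is the behavior the reviewer questions in the next comment.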

  • Documentation updates:
    I added a link to the full list of supported application names and updated the Important notes section.

  • I added the rest of the needed quals to improve performance.

  • I updated the pagination logic.

Please let me know if you see any issues or have further suggestions. Thanks again!

Contributor

@ParthaI ParthaI left a comment


Hi @assakafpix,
Great to see that most of the review comments have been addressed — it’s looking good overall!

I’ve just added a comment regarding how the event_name column/optional key qualifier should behave. When you get a chance, could you please:

  • Take a final look at that comment regarding the column event_name and push the corresponding changes.

  • Fix the missing period (.) at the end of each column description for consistency.

  • Improve the column descriptions across the table:

    • Avoid including response property names or structures in the descriptions.
    • Provide meaningful explanations of what each column represents.
  • Add any missing descriptions for the columns.

Thanks again for your work on this!

```go
	return names, nil
}

func getEventName(ctx context.Context, d *plugin.QueryData, h *plugin.HydrateData) (interface{}, error) {
```

Just a quick note — the current implementation always returns the first event name from the Events array, which doesn't appear to be correct in all scenarios.

What I was suggesting earlier is to use an optional key qualifier with exact match caching, like so:

```go
import "github.com/turbot/steampipe-plugin-sdk/v5/query_cache"

{Name: "event_name", Require: plugin.Optional, CacheMatch: query_cache.CacheMatchExact}, // CacheMatch ensures the cache is bypassed if the 'event_name' in the WHERE clause doesn't exactly match.
```

Then, in the column definition, we can rely on the qualifier:

```go
// This value will only be populated if the user provides it in the WHERE clause.
{
    Name:        "event_name",
    Description: "The name of the event.",
    Type:        proto.ColumnType_STRING,
    Transform:   transform.FromQual("event_name"),
},
```

The rest of the code can remain unchanged. Let me know if you'd like me to help refactor it.

@rajlearner17 rajlearner17 added help wanted We are asking the community to submit a PR to resolve this issue and removed help wanted We are asking the community to submit a PR to resolve this issue labels Jul 11, 2025
@assakafpix
Contributor Author

Hi @ParthaI,
Thanks for the feedback. I just changed the event_name column. I also added the missing qual for the customer_id column.
Let me know if there’s anything else you’d like me to address.

@ParthaI ParthaI changed the base branch from main to add-table-admin-audit-report July 15, 2025 06:20
@ParthaI ParthaI merged commit 8103bba into turbot:add-table-admin-audit-report Jul 15, 2025
misraved pushed a commit that referenced this pull request Jul 21, 2025