Add example of accessing S3 files #25

Open
hrbrmstr opened this issue Dec 21, 2018 · 3 comments

Comments

@hrbrmstr
Owner

https://issues.apache.org/jira/browse/DRILL-6662 makes it possible to avoid hardcoded credentials, so it finally makes sense to add some examples of querying S3 data.

@davidski

If your example could include referencing a specific regional endpoint, that would be A++ good. My first attempt at getting IAM roles and a regionally scoped bucket call to work failed and I've yet to go back and make another attempt.

@hrbrmstr added this to the 0.8.0 release milestone Jan 22, 2019
@hrbrmstr self-assigned this Jan 22, 2019
@hrbrmstr
Owner Author

I gave it a quick try the day 1.15.0 came out but didn't go back to it.

@davidski

Looks like I just needed to come back to this. Got this working on a us-west-2 S3 endpoint with the following (excessively verbose) storage configuration:

{
  "type": "file",
  "connection": "s3a://cloudy-mccloudface",
  "config": {
    "fs.s3a.aws.credentials.provider": "com.amazonaws.auth.InstanceProfileCredentialsProvider",
    "fs.s3a.endpoint": "s3.us-west-2.amazonaws.com"
  },
  "workspaces": {
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "csvs": {
      "location": "/csvs",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json",
      "extensions": [
        "json"
      ]
    },
    "avro": {
      "type": "avro"
    },
    "sequencefile": {
      "type": "sequencefile",
      "extensions": [
        "seq"
      ]
    },
    "csvh": {
      "type": "text",
      "extensions": [
        "csvh"
      ],
      "extractHeader": true,
      "delimiter": ","
    }
  },
  "enabled": true
}

A bit too shagged out to write this up properly right now, so dumping the config as a reminder for later.
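
For anyone who wants a less verbose starting point, here is a rough Python sketch that rebuilds a trimmed version of the config above and shows one way it might be pushed to a Drillbit. Treat it as an assumption-laden sketch, not a tested recipe: the helper names `build_s3_plugin` and `register_plugin` are made up for illustration, the `http://localhost:8047` address assumes a local Drillbit on the default web port, the format list is cut down to three entries, and every workspace is marked read-only (the original `tmp` workspace was writable). The `POST /storage/{name}.json` call follows Drill's REST API, but it has not been run against a live cluster here.

```python
import json
import urllib.request


def build_s3_plugin(bucket, region, workspaces=None):
    """Return a trimmed Drill file-storage config for an S3 bucket in `region`."""
    if workspaces is None:
        workspaces = {"root": "/"}
    return {
        "type": "file",
        "connection": f"s3a://{bucket}",
        "config": {
            # Take credentials from the EC2 instance profile (IAM role)
            # instead of hardcoding keys in the plugin config.
            "fs.s3a.aws.credentials.provider":
                "com.amazonaws.auth.InstanceProfileCredentialsProvider",
            # Pin the regional endpoint, as in the config above.
            "fs.s3a.endpoint": f"s3.{region}.amazonaws.com",
        },
        # Every workspace is read-only in this sketch.
        "workspaces": {
            name: {
                "location": location,
                "writable": False,
                "defaultInputFormat": None,
                "allowAccessOutsideWorkspace": False,
            }
            for name, location in workspaces.items()
        },
        # Reduced format list; add the others back as needed.
        "formats": {
            "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","},
            "parquet": {"type": "parquet"},
            "json": {"type": "json", "extensions": ["json"]},
        },
        "enabled": True,
    }


def register_plugin(name, config, drill_url="http://localhost:8047"):
    """POST the config to Drill's REST API (assumes a reachable Drillbit)."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    request = urllib.request.Request(
        f"{drill_url}/storage/{name}.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build the config only; nothing is sent over the network here.
plugin = build_s3_plugin(
    "cloudy-mccloudface", "us-west-2", {"root": "/", "csvs": "/csvs"}
)
print(json.dumps(plugin, indent=2))
```

Calling `register_plugin("s3west", plugin)` would then register it under the hypothetical name `s3west`, after which files under `/csvs` should be addressable through the `s3west.csvs` workspace in a query.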
