
Commit

Merge pull request #21 from DARPA-ASKEM/working-with-data
Add list of available and recommended packages
mecrouch authored Nov 25, 2024
2 parents b0b7fef + 955cc37 commit acc920e
Showing 2 changed files with 134 additions and 11 deletions.
144 changes: 133 additions & 11 deletions docs/datasets/transform-dataset.md
@@ -56,7 +56,7 @@ In a workflow, the Transform dataset operator takes one or more datasets or simu

You can choose any step in your transformation process as the thumbnail preview.

<figure markdown>![Transform dataset that takes two datasets as inputs and displays choropleth comparison maps](../img/data/transform-operator.png)<figcaption markdown>**How it works**: The Transform dataset operator is an interactive code notebook with a [pandas](https://pandas.pydata.org/) :octicons-link-external-24:{ alt="External link" title="External link" } dataframe. See the [pandas User Guide](https://pandas.pydata.org/docs/user_guide/index.html#user-guide) :octicons-link-external-24:{ alt="External link" title="External link" } for information on using pandas to work with data.</figcaption></figure>
<figure markdown>![Transform dataset that takes two datasets as inputs and displays choropleth comparison maps](../img/data/transform-operator.png)<figcaption markdown>**How it works**: The Transform dataset operator is an interactive code notebook with a [pandas](https://pandas.pydata.org/) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } dataframe. See the [pandas User Guide](https://pandas.pydata.org/docs/user_guide/index.html#user-guide) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for information on using pandas to work with data.</figcaption></figure>

<div class="grid cards" markdown>

@@ -84,7 +84,7 @@ You can choose any step in your transformation process as the thumbnail preview.

## Modify data in the Transform dataset code notebook

Inside the Transform dataset operator is a code notebook in which you can prompt an AI assistant to answer questions about or modify your data. If you're comfortable writing code, you can also edit anything the assistant generates or add your own custom code (in a variety of languages).
Inside the Transform dataset operator is a code notebook. In the notebook, you can prompt an AI assistant to answer questions about or modify your data. If you're comfortable writing code, you can edit anything the assistant creates or add your own custom code.

![Transform dataset code notebook in which the AI assistant creates a new comparison](../img/data/transform-notebook.png)

@@ -125,8 +125,11 @@ The Transform dataset AI assistant interprets plain language to answer questions

??? list "Prompt or question the AI assistant"

1. Use the text box at the top of the page to enter a question or describe the transformation you want to make and then click <span class="sr-only" id="submit-icon-label">Submit</span> :octicons-paper-airplane-24:{ style="transform: rotate(-45deg);" title="Submit" aira-labelledby="Submit" }.
2. Scroll down to the new code cell to inspect the transformation.
1. Click in the text box at the top of the page and then perform one of the following actions:
- Select one of the suggested prompts and edit it to fit your dataset and the transformation you want to make.
- Enter a question or describe the transformation you want to make.
2. Click <span class="sr-only" id="submit-icon-label">Submit</span> :octicons-paper-airplane-24:{ style="transform: rotate(-45deg);" title="Submit" aria-labelledby="submit-icon-label" }.
3. Scroll down to the new code cell to inspect the transformation.

??? list "Choose where to insert a prompt"

@@ -184,7 +187,132 @@ When the response is complete, the code cell may also contain:

### Add or edit code

At any time, you can edit the code generated by the AI assistant or enter your own custom code.
At any time, you can edit the code generated by the AI assistant or enter your own custom code. The notebook environment supports the following languages, each extended with commonly used data manipulation and scientific operation libraries.

???+ note

The use of Julia is currently disabled.

<div class="grid cards" markdown>

- :simple-python:{ .lg .middle aria-hidden="true" } __[Python](https://docs.python.org/3.10/) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } libraries__

---

- [pandas](https://pandas.pydata.org/pandas-docs/version/1.3/user_guide/index.html) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for organizing, cleaning, and analyzing data tables and time series.
- [numpy](https://numpy.org/doc/1.24/user/absolute_beginners.html) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for handling large arrays of numbers and performing mathematical operations.
- [scipy](https://docs.scipy.org/doc/scipy-1.11.4/index.html) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for performing advanced scientific operations, including optimization, integration, and interpolation.
- [pickle](https://docs.python.org/3/library/pickle.html) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for saving and reloading complex data structures.

- :simple-julia:{ .lg .middle aria-hidden="true" } __[Julia](https://docs.julialang.org/en/v1.10/) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } libraries__

---

- [DataFrames](https://dataframes.juliadata.org/stable/man/getting_started/) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for manipulating data tables.
- [CSV](https://csv.juliadata.org/stable/#Overview) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for reading, writing, and processing CSV files.
- [HTTP](https://juliaweb.github.io/HTTP.jl/stable/) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for sending and receiving data over the Internet.
- [JSON3](https://github.com/quinnj/JSON3.jl) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for working with JSON data.
- [DisplayAs](https://github.com/tkf/DisplayAs.jl) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for displaying data.

- :simple-r:{ .lg .middle aria-hidden="true" } __[R](https://cran.r-project.org/doc/manuals/r-release/R-intro.html) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } libraries__

---

- [data.frame](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/data.frame) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for manipulating data tables.


</div>
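
As a minimal sketch of the pickle use case listed above (saving and reloading complex data structures), the following round-trips a nested dictionary through bytes. The `results` structure and its field names are hypothetical placeholders, not part of any Terarium dataset.

```python
import pickle

# Hypothetical nested structure standing in for intermediate results.
results = {
    "scenario": "baseline",
    "parameters": {"beta": 0.3, "gamma": 0.1},
    "infected": [1, 3, 9, 27],
}

# Serialize to bytes, then restore. pickle.dump and pickle.load do the
# same thing with an open binary file instead of a bytes object.
payload = pickle.dumps(results)
restored = pickle.loads(payload)

# The restored object is an equal but independent copy of the original.
print(restored == results)
```

Only unpickle data you created yourself or otherwise trust, because loading a pickle can execute arbitrary code.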

???+ tip

More libraries are available in the code notebook, but you may need to import them before use.

1. To list the available packages, click :octicons-plus-24:{ aria-hidden="true" } **Add a cell** and then enter and run:

=== "Python"

```bash
pip list
```

=== "Julia"

```julia
using Pkg
Pkg.status()
```

=== "R"

```r
installed.packages()
```

2. To import a package, click :octicons-plus-24:{ aria-hidden="true" } **Add a cell** and then enter and run:

=== "Python"

```python
import <package_name>
```

=== "Julia"

```julia
using <package_name>
```

=== "R"

```r
library(<package_name>)
```
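
If you want the package list as Python data rather than printed text, a sketch like the following may help. It assumes the notebook's Python kernel is version 3.8 or later so that the standard-library `importlib.metadata` module is available; the helper name `installed_packages` is made up for illustration.

```python
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) pairs for installed distributions."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            pairs.add((name, dist.version))
    return sorted(pairs, key=lambda pair: pair[0].lower())


# Print the packages in a pip-style "name==version" format.
for name, version in installed_packages():
    print(f"{name}=={version}")
```

The returned list can be filtered or searched like any other Python list, for example to confirm a specific version before relying on it.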

??? tip "Additional libraries that may be useful for data transformations"

<div class="grid cards" markdown>

- __Data manipulation and analysis__
---
- [dask](https://docs.dask.org/en/stable/) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for handling large datasets and computations efficiently.
- [geopandas](https://geopandas.org/en/v0.13.2/docs.html) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for working with geographic data in tables.
- [xarray](https://docs.xarray.dev/en/v0.19.0/) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for managing and analyzing multidimensional datasets.


- __Data visualization__
---
- [cartopy](https://scitools.org.uk/cartopy/docs/latest/) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for creating maps and visualizing geographic data.
- [matplotlib](https://matplotlib.org/3.7.5/users/index.html) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for creating static, animated, and interactive visualizations.

- __Machine learning__
---
- [scikit-learn](https://scikit-learn.org/1.4/user_guide.html) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for creating machine learning models.
- [torch](https://pytorch.org/docs/2.5/) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for building, training, and experimenting with machine learning models.

- __Image processing__
---
- [scikit-image](https://scikit-image.org/docs/stable/) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for processing and analyzing images.

- __Graph and network analysis__
---
- [networkx](https://networkx.org/documentation/stable/reference/) :octicons-link-external-24:{ aria-hidden="true" alt="External link" title="External link" } for working with networks and graphs.

</div>
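
Before importing one of these libraries, you may want to check that it is present in your environment. The following is a rough sketch using the standard-library `importlib.util.find_spec`. The module names are the usual import names for the packages listed above (scikit-learn and scikit-image are imported as `sklearn` and `skimage`), and the helper `check_available` is hypothetical.

```python
from importlib.util import find_spec

# Import names for the libraries listed above. Note that scikit-learn and
# scikit-image are imported as `sklearn` and `skimage`.
OPTIONAL_MODULES = [
    "dask", "geopandas", "xarray", "cartopy", "matplotlib",
    "sklearn", "torch", "skimage", "networkx",
]


def check_available(modules):
    """Map each top-level module name to True if it can be found for import."""
    return {name: find_spec(name) is not None for name in modules}


availability = check_available(OPTIONAL_MODULES)
for name, ok in availability.items():
    print(f"{name}: {'available' if ok else 'not installed'}")
```

This only checks whether a module can be located. It does not import the module, so it is quick even for large libraries.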

??? list "Change the language of the code notebook"

The Transform dataset AI assistant writes Python code by default. You can switch among Python, R, and Julia at any time.

- Use the **language** dropdown above the code cells.

??? list "Make changes to a transformation"

@@ -202,12 +330,6 @@ At any time, you can edit the code generated by the AI assistant or enter your o
1. Select the cell above where you want to insert the new code cell.
2. Click :octicons-plus-24:{ aria-hidden="true" } **Add a cell**.

??? list "Change the language of generated code"

The Transform dataset AI assistant writes Python code by default. You can switch between Python, R, or Julia code at any time.

- Use the **language** drop down above the code cells.

## Save transformed data

Saved datasets appear in your Resources panel and the output of the Transform dataset operator.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -15,6 +15,7 @@ theme:
- navigation.instant
- navigation.footer
- content.code.copy
- content.tabs.link
logo_dark_mode: img/terarium-logo-dark.svg
logo_light_mode: img/terarium-logo-light.svg
palette: