More Go-package restructuring (#748)
johnkerl authored Nov 12, 2021
1 parent f597ec3 commit bc72cd1
Showing 187 changed files with 552 additions and 783 deletions.
29 changes: 1 addition & 28 deletions .gitignore
@@ -1,41 +1,14 @@
go/mlr
go/mlr.exe
mlr
mlr.exe
a.out
*.dSYM
catc
catc0
catm
gmon.out
*.o
*.pyc
.sw?
.*.sw?
tags
*~
mlr-[0-9.]*.tar.*
push2
data/.gitignore

docs/_build
docs6/_build

man/man1

miller-*.src.rpm
mlr.exe
mlr.linux.x86_64
mlr.macosx

data/big.*
data/nmc?.*

experiments/dsl-parser/one/src
experiments/dsl-parser/one/main
experiments/dsl-parser/two/src
experiments/dsl-parser/two/main
experiments/cli-parser/cliparse
experiments/cli-parser/cliparse.exe

docs/src/polyglot-dkvp-io/__pycache__
docs/site/
1 change: 1 addition & 0 deletions .vimrc
@@ -0,0 +1 @@
map \d :w<C-m>:!clear;echo Building ...; echo; make build<C-m>
32 changes: 19 additions & 13 deletions Makefile
@@ -2,19 +2,25 @@ PREFIX=/usr/local
INSTALLDIR=$(PREFIX)/bin

build:
go build

check:
# Unit tests (small number)
go test -v mlr/internal/pkg/...
# Regression tests (large number)
#
# See ./regression_test.go for information on how to get more details
# for debugging. TL;DR is for CI jobs, we have 'go test -v'; for
# interactive use, instead of 'go test -v' simply use 'mlr regtest
# -vvv' or 'mlr regtest -s 20'. See also src/auxents/regtest.
go test -v
go build github.com/johnkerl/miller/cmd/mlr

check: unit_test regression_test

# Unit tests (small number)
unit_test:
go test github.com/johnkerl/miller/internal/pkg/...

# Regression tests (large number)
#
# See ./regression_test.go for information on how to get more details
# for debugging. TL;DR is for CI jobs, we have 'go test -v'; for
# interactive use, instead of 'go test -v' simply use 'mlr regtest
# -vvv' or 'mlr regtest -s 20'. See also internal/pkg/auxents/regtest.
regression_test:
go test -v regression_test.go

# DESTDIR is for package installs; nominally blank when this is run interactively.
# See also https://www.gnu.org/prep/standards/html_node/DESTDIR.html
install: build
cp mlr $(DESTDIR)/$(INSTALLDIR)
make -C man install
@@ -51,4 +57,4 @@ release_tarball: build check
./create-release-tarball

# Go does its own dependency management, outside of make.
.PHONY: build check fmt dev
.PHONY: build check unit_test regression_test fmt dev
57 changes: 27 additions & 30 deletions README-go-port.md
@@ -1,9 +1,6 @@
# Quickstart for developers

* `go build` -- produces the `mlr` executable
* Miller has tens of unit tests and thousands of regression tests:
* `go test mlr/src/...` runs the unit tests.
* `go test` or `mlr regtest` runs the regression tests in `test/cases/`. Using `mlr regtest -h` you can see more options available than are exposed by `go test`.
See `Makefile` in the repo base directory.

# Continuous integration

@@ -57,10 +54,10 @@ During the coding of Miller, I've been guided by the following:
* Names of files, variables, functions, etc. should be fully spelled out (e.g. `NewEvaluableLeafNode`), except for a small number of most-used names where a longer name would cause unnecessary line-wraps (e.g. `Mlrval` instead of `MillerValue` since this appears very very often).
* Code should not be too clever. This includes some reasonable amounts of code duplication from time to time, to keep things inline, rather than lasagna code.
* Things should be transparent. For example, `mlr -n put -v '$y = 3 + 0.1 * $x'` shows you the abstract syntax tree derived from the DSL expression.
* Comments should be robust with respect to reasonably anticipated changes. For example, one package should cross-link to another in its comments, but I try to avoid mentioning specific filenames too much in the comments and README files since these may change over time. I make an exception for stable points such as [mlr.go](./mlr.go), [mlr.bnf](./src/parsing/mlr.bnf), [stream.go](./src/stream/stream.go), etc.
* Comments should be robust with respect to reasonably anticipated changes. For example, one package should cross-link to another in its comments, but I try to avoid mentioning specific filenames too much in the comments and README files since these may change over time. I make an exception for stable points such as [mlr.go](./mlr.go), [mlr.bnf](./internal/pkg/parsing/mlr.bnf), [stream.go](./internal/pkg/stream/stream.go), etc.
* *Miller should be pleasant to write.*
* It should be quick to answer the question *Did I just break anything?* -- hence the `build` and `reg_test/run` regression scripts.
* It should be quick to find out what to do next as you iteratively develop -- see for example [cst/README.md](https://github.com/johnkerl/miller/blob/master/go/src/dsl/cst/README.md).
* It should be quick to find out what to do next as you iteratively develop -- see for example [cst/README.md](./internal/pkg/dsl/cst/README.md).
* *The language should be an asset, not a liability.*
* One of the reasons I chose Go is that (personally anyway) I find it to be reasonably efficient, well-supported with standard libraries, straightforward, and fun. I hope you enjoy it as much as I have.

@@ -79,10 +76,10 @@ sequence of key-value pairs. The basic **stream** operation is:

So, in broad overview, the key packages are:

* [src/stream](./src/stream) -- connect input -> transforms -> output via Go channels
* [src/input](./src/input) -- read input records
* [src/transforming](./src/transforming) -- transform input records to output records
* [src/output](./src/output) -- write output records
* [internal/pkg/stream](./internal/pkg/stream) -- connect input -> transforms -> output via Go channels
* [internal/pkg/input](./internal/pkg/input) -- read input records
* [internal/pkg/transforming](./internal/pkg/transforming) -- transform input records to output records
* [internal/pkg/output](./internal/pkg/output) -- write output records
* The rest are details to support this.
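The input -> transforms -> output flow listed above can be illustrated with plain Go channels. This is a minimal sketch under simplifying assumptions, not Miller's actual `stream` code: the `record` type (an ordered slice of key-value pairs standing in for `Mlrmap`), the DKVP-only reader and writer, and the functions `readRecords`, `transform`, `writeRecords`, and `runPipeline` are all hypothetical. Using a nil record to mark end of stream is modeled loosely on this README's nil-record conventions.

```go
package main

import (
	"fmt"
	"strings"
)

// pair and record are hypothetical stand-ins for Miller's Mlrmap:
// an ordered sequence of key-value pairs.
type pair struct{ key, value string }
type record []pair

// readRecords parses DKVP-style lines such as "a=1,b=2", sends one record
// per line downstream, then sends a nil record to mark end of stream.
func readRecords(lines []string, out chan<- record) {
	for _, line := range lines {
		rec := record{} // non-nil even for an empty line, so never mistaken for end of stream
		for _, field := range strings.Split(line, ",") {
			if kv := strings.SplitN(field, "=", 2); len(kv) == 2 {
				rec = append(rec, pair{kv[0], kv[1]})
			}
		}
		out <- rec
	}
	out <- nil
}

// transform applies fn to each record and forwards the nil end-of-stream marker.
func transform(fn func(record) record, in <-chan record, out chan<- record) {
	for rec := range in {
		if rec == nil {
			out <- nil
			return
		}
		out <- fn(rec)
	}
}

// writeRecords formats records back to DKVP until it sees the nil marker.
func writeRecords(in <-chan record, done chan<- []string) {
	var lines []string
	for rec := range in {
		if rec == nil {
			break
		}
		fields := make([]string, len(rec))
		for i, p := range rec {
			fields[i] = p.key + "=" + p.value
		}
		lines = append(lines, strings.Join(fields, ","))
	}
	done <- lines
}

// runPipeline wires reader -> transformers -> writer, one goroutine per stage.
func runPipeline(lines []string, fns ...func(record) record) []string {
	src := make(chan record)
	go readRecords(lines, src)
	var in <-chan record = src
	for _, fn := range fns {
		next := make(chan record)
		go transform(fn, in, next)
		in = next
	}
	done := make(chan []string)
	go writeRecords(in, done)
	return <-done
}

func main() {
	addC := func(r record) record { return append(r, pair{"c", "x"}) }
	for _, line := range runPipeline([]string{"a=1,b=2", "a=3,b=4"}, addC) {
		fmt.Println(line) // prints a=1,b=2,c=x then a=3,b=4,c=x
	}
}
```

Each stage runs in its own goroutine, and the unbuffered channels hand records along one at a time, so the reader, any number of transformers, and the writer can work on different records concurrently. The sketch's writer collects its lines into a slice only to keep the example easy to check.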

## Directory-structure details
@@ -94,31 +91,31 @@ So, in broad overview, the key packages are:
* This package defines the grammar for Miller's domain-specific language (DSL) for the Miller `put` and `filter` verbs. And, GOCC is a joy to use. :)
* It is used on the terms of its open-source license.
* [golang.org/x/term](https://pkg.go.dev/golang.org/x/term):
* Just a one-line Miller callsite for is-a-terminal checking for the [Miller REPL](https://github.com/johnkerl/miller/blob/go-mod/go/src/auxents/repl/README.md).
* Just a one-line Miller callsite for is-a-terminal checking for the [Miller REPL](./internal/pkg/auxents/repl/README.md).
* It is used on the terms of its open-source license.
* See also [./go.mod](go.mod). Setup:
* `go get github.com/goccmack/gocc`
* `go get golang.org/x/term`

### Miller per se

* The main entry point is [mlr.go](./mlr.go); everything else in [src](./src).
* [src/entrypoint](./src/entrypoint): All the usual contents of `main()` are here, for ease of testing.
* [src/platform](./src/platform): Platform-dependent code, which as of early 2021 is the command-line parser. Handling single quotes and double quotes is different on Windows unless particular care is taken, which is what this package does.
* [src/lib](./src/lib):
* Implementation of the [`Mlrval`](./src/types/mlrval.go) datatype which includes string/int/float/boolean/void/absent/error types. These are used for record values, as well as expression/variable values in the Miller `put`/`filter` DSL. See also below for more details.
* [`Mlrmap`](./src/types/mlrmap.go) is the sequence of key-value pairs which represents a Miller record. The key-lookup mechanism is optimized for Miller read/write usage patterns -- please see [mlrmap.go](./src/types/mlrmap.go) for more details.
* [`context`](./src/types/context.go) supports AWK-like variables such as `FILENAME`, `NF`, `NR`, and so on.
* [src/cli](./src/cli) is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer-chain of `put` then `filter`, and a JSON record-writer.
* [src/cliutil](./src/cliutil) contains datatypes for the CLI-parser, which was split out to avoid a Go package-import cycle.
* [src/stream](./src/stream) is as above -- it uses Go channels to pipe together file-reads, to record-reading/parsing, to a chain of record-transformers, to record-writing/formatting, to terminal standard output.
* [src/input](./src/input) is as above -- one record-reader type per supported input file format, and a factory method.
* [src/output](./src/output) is as above -- one record-writer type per supported output file format, and a factory method.
* [src/transforming](./src/transforming) contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next.
* [src/transformers](./src/transformers) is all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on. I put it here, not in `transforming`, so all files in `transformers` would be of the same type.
* [src/parsing](./src/parsing) contains a single source file, `mlr.bnf`, which is the lexical/semantic grammar file for the Miller `put`/`filter` DSL using the GOCC framework. All subdirectories of `src/parsing/` are autogen code created by GOCC's processing of `mlr.bnf`. If you need to edit `mlr.bnf`, please use [tools/build-dsl](./tools/build-dsl) to autogenerate Go code from it (using the GOCC tool). (This takes several minutes to run.)
* [src/dsl](./src/dsl) contains [`ast_types.go`](src/dsl/ast_types.go) which is the abstract syntax tree datatype shared between GOCC and Miller. I didn't use a `src/dsl/ast` naming convention, although that would have been nice, in order to avoid a Go package-dependency cycle.
* [src/dsl/cst](./src/dsl/cst) is the concrete syntax tree, constructed from an AST produced by GOCC. The CST is what is actually executed on every input record when you do things like `$z = $x * 0.3 * $y`. Please see the [src/dsl/cst/README.md](./src/dsl/cst/README.md) for more information.
* The main entry point is [mlr.go](./mlr.go); everything else in [internal/pkg](./internal/pkg).
* [internal/pkg/entrypoint](./internal/pkg/entrypoint): All the usual contents of `main()` are here, for ease of testing.
* [internal/pkg/platform](./internal/pkg/platform): Platform-dependent code, which as of early 2021 is the command-line parser. Handling single quotes and double quotes is different on Windows unless particular care is taken, which is what this package does.
* [internal/pkg/lib](./internal/pkg/lib):
* Implementation of the [`Mlrval`](./internal/pkg/types/mlrval.go) datatype which includes string/int/float/boolean/void/absent/error types. These are used for record values, as well as expression/variable values in the Miller `put`/`filter` DSL. See also below for more details.
* [`Mlrmap`](./internal/pkg/types/mlrmap.go) is the sequence of key-value pairs which represents a Miller record. The key-lookup mechanism is optimized for Miller read/write usage patterns -- please see [mlrmap.go](./internal/pkg/types/mlrmap.go) for more details.
* [`context`](./internal/pkg/types/context.go) supports AWK-like variables such as `FILENAME`, `NF`, `NR`, and so on.
* [internal/pkg/cli](./internal/pkg/cli) is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer-chain of `put` then `filter`, and a JSON record-writer.
* [internal/pkg/cliutil](./internal/pkg/cliutil) contains datatypes for the CLI-parser, which was split out to avoid a Go package-import cycle.
* [internal/pkg/stream](./internal/pkg/stream) is as above -- it uses Go channels to pipe together file-reads, to record-reading/parsing, to a chain of record-transformers, to record-writing/formatting, to terminal standard output.
* [internal/pkg/input](./internal/pkg/input) is as above -- one record-reader type per supported input file format, and a factory method.
* [internal/pkg/output](./internal/pkg/output) is as above -- one record-writer type per supported output file format, and a factory method.
* [internal/pkg/transforming](./internal/pkg/transforming) contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next.
* [internal/pkg/transformers](./internal/pkg/transformers) is all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on. I put it here, not in `transforming`, so all files in `transformers` would be of the same type.
* [internal/pkg/parsing](./internal/pkg/parsing) contains a single source file, `mlr.bnf`, which is the lexical/semantic grammar file for the Miller `put`/`filter` DSL using the GOCC framework. All subdirectories of `internal/pkg/parsing/` are autogen code created by GOCC's processing of `mlr.bnf`. If you need to edit `mlr.bnf`, please use [tools/build-dsl](./tools/build-dsl) to autogenerate Go code from it (using the GOCC tool). (This takes several minutes to run.)
* [internal/pkg/dsl](./internal/pkg/dsl) contains [`ast_types.go`](internal/pkg/dsl/ast_types.go) which is the abstract syntax tree datatype shared between GOCC and Miller. I didn't use an `internal/pkg/dsl/ast` naming convention, although that would have been nice, because doing so would have created a Go package-dependency cycle.
* [internal/pkg/dsl/cst](./internal/pkg/dsl/cst) is the concrete syntax tree, constructed from an AST produced by GOCC. The CST is what is actually executed on every input record when you do things like `$z = $x * 0.3 * $y`. Please see the [internal/pkg/dsl/cst/README.md](./internal/pkg/dsl/cst/README.md) for more information.
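The CLI-parser description above mentions building a transformer chain from `put ... then filter ...`. As a hypothetical illustration (not Miller's parser), the verb portion of the command line can be split on the literal word `then`. `splitThenChain` is an invented helper, and it assumes the main flags (`--icsv --ojson`) and the trailing filename have already been removed; Miller's real parser also knows each verb's own flags, which this sketch ignores.

```go
package main

import (
	"errors"
	"fmt"
)

// splitThenChain is a hypothetical helper: it splits verb arguments into one
// argument list per verb, using the literal word "then" as the separator.
func splitThenChain(args []string) ([][]string, error) {
	var chain [][]string
	current := []string{}
	for _, arg := range args {
		if arg != "then" {
			current = append(current, arg)
			continue
		}
		if len(current) == 0 {
			return nil, errors.New("missing verb before 'then'")
		}
		chain = append(chain, current)
		current = []string{}
	}
	if len(current) == 0 {
		return nil, errors.New("missing verb at end of chain")
	}
	return append(chain, current), nil
}

func main() {
	// Verb arguments from:
	//   mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv
	// with the main flags and the filename already stripped off.
	chain, err := splitThenChain([]string{"put", "$sum = $a + $b", "then", "filter", "$sum > 1000"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, verb := range chain {
		fmt.Printf("transformer %d: %q\n", i+1, verb)
	}
}
```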

## Nil-record conventions

@@ -150,15 +147,15 @@ nil through the reader/transformer/writer sequence.

## More about mlrvals

[`Mlrval`](./src/types/mlrval.go) is the datatype of record values, as well as expression/variable values in the Miller `put`/`filter` DSL. It includes string/int/float/boolean/void/absent/error types, not unlike PHP's `zval`.
[`Mlrval`](./internal/pkg/types/mlrval.go) is the datatype of record values, as well as expression/variable values in the Miller `put`/`filter` DSL. It includes string/int/float/boolean/void/absent/error types, not unlike PHP's `zval`.

* Miller's `absent` type is like JavaScript's `undefined` -- it's for times when there is no such key, as in a DSL expression `$out = $foo` when the input record is `x=3,y=4` -- there is no `$foo` so `$foo` has `absent` type. Nothing is written to the `$out` field in this case. See also [here](http://johnkerl.org/miller/doc/reference.html#Null_data:_empty_and_absent) for more information.
* Miller's `void` type is like JavaScript's `null` -- it's for times when there is a key with no value, as in `$out = $x` when the input record is `x=,y=4`. This is an overlap with `string` type, since a void value looks like an empty string. I've gone back and forth on this (including when I was writing the C implementation) -- whether to retain `void` as a distinct type from empty-string, or not. I ended up keeping it as it made the `Mlrval` logic easier to understand.
* Miller's `error` type is for things like doing type-uncoerced addition of strings. Data-dependent errors are intended to result in `(error)`-valued output, rather than crashing Miller. See also [here](http://johnkerl.org/miller/doc/reference.html#Data_types) for more information.
* Miller's number handling makes auto-overflow from int to float transparent, while preserving the possibility of 64-bit bitwise arithmetic.
* This is different from JavaScript, which has only double-precision floats and thus no support for 64-bit numbers (note however that there is now [`BigInt`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt)).
* This is also different from C and Go, wherein casts are necessary -- without which int arithmetic overflows.
* See also [here](http://johnkerl.org/miller/doc/reference.html#Arithmetic) for the semantics of Miller arithmetic, which the [`Mlrval`](./src/types/mlrval.go) class implements.
* See also [here](http://johnkerl.org/miller/doc/reference.html#Arithmetic) for the semantics of Miller arithmetic, which the [`Mlrval`](./internal/pkg/types/mlrval.go) class implements.

## Software-testing methodology

7 changes: 3 additions & 4 deletions mlr.go → cmd/mlr/main.go
@@ -1,4 +1,4 @@
// Package main is the entry point for Miller.
// This is the entry point for the mlr executable.
package main

import (
@@ -9,10 +9,9 @@ import (
"runtime/pprof"
"strconv"

"mlr/internal/pkg/entrypoint"
"github.com/johnkerl/miller/internal/pkg/entrypoint"
)

// main is the entry point for Miller.
func main() {

// Respect env $GOMAXPROCS, if provided, else set default.
@@ -36,7 +35,7 @@ func main() {
// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
// CPU profiling
//
// We do this here not in the command-line parser since
// We do this here, not in the command-line parser, since
// pprof.StopCPUProfile() needs to be called at the very end of everything.
// Putting this pprof logic into a go func running in parallel with main,
// and properly stopping the profile only when main ends via chan-sync,
6 changes: 3 additions & 3 deletions docs/src/build.md
@@ -68,7 +68,7 @@ In this example I am using version 6.1.0 to 6.2.0; of course that will change fo

* Update version found in `mlr --version` and `man mlr`:

* Edit `go/src/version/version.go` from `6.1.0-dev` to `6.2.0`.
* Edit `internal/pkg/version/version.go` from `6.1.0-dev` to `6.2.0`.
* Edit version in `docs/mkdocs.yml` from `6.1.0` to `6.2.0`.
* Run `make dev` in the Miller repo base directory
* The ordering in this makefile rule is important: the first build creates `mlr`; the second runs `mlr` to create `manpage.txt`; the third includes `manpage.txt` into one of its outputs.
@@ -79,7 +79,7 @@ In this example I am using version 6.1.0 to 6.2.0; of course that will change fo
* `make release_tarball`
* This creates `miller-6.2.0-dev.tar.gz` which we'll upload to GitHub, the URL of which will be in our `miller.spec`
* Get `mlr.{arch}` binaries from latest successful build from [https://github.com/johnkerl/miller/actions](https://github.com/johnkerl/miller/actions), or, build them on buildboxes.
* Prepare the source RPM following [./README-RPM.md](README-RPM.md).
* Prepare the source RPM following `README-RPM.md`.

* Create the GitHub release tag:

@@ -119,7 +119,7 @@ git push -u origin miller-6.1.0

* Afterwork:

* Edit `go/src/version/version.go` to change version from `6.2.0` to `6.2.0-dev`.
* Edit `internal/pkg/version/version.go` to change version from `6.2.0` to `6.2.0-dev`.
* `cd go`
* `./build`
* Commit and push.
6 changes: 3 additions & 3 deletions docs/src/build.md.in
@@ -52,7 +52,7 @@ In this example I am using version 6.1.0 to 6.2.0; of course that will change fo

* Update version found in `mlr --version` and `man mlr`:

* Edit `go/src/version/version.go` from `6.1.0-dev` to `6.2.0`.
* Edit `internal/pkg/version/version.go` from `6.1.0-dev` to `6.2.0`.
* Edit version in `docs/mkdocs.yml` from `6.1.0` to `6.2.0`.
* Run `make dev` in the Miller repo base directory
* The ordering in this makefile rule is important: the first build creates `mlr`; the second runs `mlr` to create `manpage.txt`; the third includes `manpage.txt` into one of its outputs.
@@ -63,7 +63,7 @@ In this example I am using version 6.1.0 to 6.2.0; of course that will change fo
* `make release_tarball`
* This creates `miller-6.2.0-dev.tar.gz` which we'll upload to GitHub, the URL of which will be in our `miller.spec`
* Get `mlr.{arch}` binaries from latest successful build from [https://github.com/johnkerl/miller/actions](https://github.com/johnkerl/miller/actions), or, build them on buildboxes.
* Prepare the source RPM following [./README-RPM.md](README-RPM.md).
* Prepare the source RPM following `README-RPM.md`.

* Create the GitHub release tag:

@@ -103,7 +103,7 @@ GENMD-EOF

* Afterwork:

* Edit `go/src/version/version.go` to change version from `6.2.0` to `6.2.0-dev`.
* Edit `internal/pkg/version/version.go` to change version from `6.2.0` to `6.2.0-dev`.
* `cd go`
* `./build`
* Commit and push.
File renamed without changes.