Fix typos (Datafusion -> DataFusion) (apache#1993)

* Fix typos (Datafusion -> DataFusion)
* revert change to proto files
milevin · Mar 12, 2022 · b702e08
1 parent 8e09b49

Showing 18 changed files with 48 additions and 48 deletions.
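
The commit message describes a case-sensitive rename that deliberately leaves proto files alone. A minimal sketch of that rule follows, assuming a plain substring replacement; `fix_casing` and `SKIPPED_SUFFIXES` are hypothetical names for illustration, not tooling from this repository, and the commit may well have been made by hand.

```python
from pathlib import PurePosixPath

# Hypothetical constants: the misspelling and its correction from the commit title.
OLD, NEW = "Datafusion", "DataFusion"

# Assumption: proto files are skipped because the commit message says the
# change to them was reverted.
SKIPPED_SUFFIXES = {".proto"}


def fix_casing(path: str, text: str) -> str:
    """Return text with 'Datafusion' corrected, unless path is a skipped file type."""
    if PurePosixPath(path).suffix in SKIPPED_SUFFIXES:
        return text
    # str.replace is case-sensitive, so 'datafusion' (crate and path names) and
    # 'DATAFUSION' (constants) are left untouched.
    return text.replace(OLD, NEW)
```

Because the match is a plain substring, identifiers such as `DatafusionError` are corrected as well, which matches the CHANGELOG hunk below.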
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -20,7 +20,7 @@
 Changelogs are maintained separately for each subproject. Please check out the
 changelog file within each subproject folder for more details:
 
-* [Datafusion CHANGELOG](./datafusion/CHANGELOG.md)
+* [DataFusion CHANGELOG](./datafusion/CHANGELOG.md)
 * [Ballista CHANGELOG](./ballista/CHANGELOG.md)
 
 For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/arrow/blob/master/CHANGELOG.md).
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -132,7 +132,7 @@ python -m pytest -v integration-tests/test_psql_parity.py
 
 ### Criterion Benchmarks
 
-[Criterion](https://docs.rs/criterion/latest/criterion/index.html) is a statistics-driven micro-benchmarking framework used by Datafusion for evaluating the performance of specific code-paths. In particular, the criterion benchmarks help to both guide optimisation efforts, and prevent performance regressions within Datafusion.
+[Criterion](https://docs.rs/criterion/latest/criterion/index.html) is a statistics-driven micro-benchmarking framework used by DataFusion for evaluating the performance of specific code-paths. In particular, the criterion benchmarks help to both guide optimisation efforts, and prevent performance regressions within DataFusion.
 
 Criterion integrates with Cargo's built-in [benchmark support](https://doc.rust-lang.org/cargo/commands/cargo-bench.html) and a given benchmark can be run with
 
@@ -160,7 +160,7 @@ The benchmark will automatically remove any generated parquet file on exit, howe
 
 ### Upstream Benchmark Suites
 
-Instructions and tooling for running upstream benchmark suites against Datafusion and/or Ballista can be found in [benchmarks](./benchmarks).
+Instructions and tooling for running upstream benchmark suites against DataFusion and/or Ballista can be found in [benchmarks](./benchmarks).
 
 These are valuable for comparative evaluation against alternative Arrow implementations and query engines.
 
@@ -227,7 +227,7 @@ dot -Tpdf < /tmp/plan.dot > /tmp/plan.pdf
 
 ## Specification
 
-We formalize Datafusion semantics and behaviors through specification
+We formalize DataFusion semantics and behaviors through specification
 documents. These specifications are useful to be used as references to help
 resolve ambiguities during development or code reviews.
 

diff --git a/ballista/rust/core/src/execution_plans/mod.rs b/ballista/rust/core/src/execution_plans/mod.rs
@@ -15,7 +15,7 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! This module contains execution plans that are needed to distribute Datafusion's execution plans into
+//! This module contains execution plans that are needed to distribute DataFusion's execution plans into
 //! several Ballista executors.
 
 mod distributed_query;

diff --git a/conbench/benchmarks.py b/conbench/benchmarks.py
@@ -38,4 +38,4 @@ def _f(self):
 @conbench.runner.register_benchmark
 class CargoBenchmarks(_criterion.CriterionBenchmark):
     name = "datafusion"
-    description = "Run Arrow Datafusion micro benchmarks."
+    description = "Run Arrow DataFusion micro benchmarks."
diff --git a/datafusion-physical-expr/src/expressions/cast.rs b/datafusion-physical-expr/src/expressions/cast.rs
@@ -30,7 +30,7 @@ use datafusion_common::ScalarValue;
 use datafusion_common::{DataFusionError, Result};
 use datafusion_expr::ColumnarValue;
 
-/// provide Datafusion default cast options
+/// provide DataFusion default cast options
 pub const DEFAULT_DATAFUSION_CAST_OPTIONS: CastOptions = CastOptions { safe: false };
 
 /// CAST expression casts an expression to a specific data type and returns a runtime error on invalid cast

diff --git a/datafusion/CHANGELOG.md b/datafusion/CHANGELOG.md
@@ -32,7 +32,7 @@
 - Remove non idiomatic `DataFusionError::into_arrow_external_error` in favor of From conversion [\#1645](https://github.com/apache/arrow-datafusion/pull/1645) ([alamb](https://github.com/alamb))
 - Remove `Accumulator::update` and `Accumulator::merge` [\#1582](https://github.com/apache/arrow-datafusion/pull/1582) ([Jimexist](https://github.com/Jimexist))
 - implement `Hash` for various types and replace `PartialOrd` [\#1580](https://github.com/apache/arrow-datafusion/pull/1580) ([Jimexist](https://github.com/Jimexist))
-- Replace `DatafusionError` with `GenericError` in `ObjectStore` interface [\#1541](https://github.com/apache/arrow-datafusion/pull/1541) ([matthewmturner](https://github.com/matthewmturner))
+- Replace `DataFusionError` with `GenericError` in `ObjectStore` interface [\#1541](https://github.com/apache/arrow-datafusion/pull/1541) ([matthewmturner](https://github.com/matthewmturner))
 - Make `FLOAT` SQL type map to `Float32` rather than `Float64` [\#1423](https://github.com/apache/arrow-datafusion/pull/1423) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([liukun4515](https://github.com/liukun4515))
 - Map `REAL` SQL type to `Float32` rather than `Float64` to be consistent with pg  [\#1390](https://github.com/apache/arrow-datafusion/pull/1390) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([hntd187](https://github.com/hntd187))
 
@@ -79,7 +79,7 @@
 - Add support for `ORDER BY` on unprojected columns [\#1415](https://github.com/apache/arrow-datafusion/pull/1415) ([viirya](https://github.com/viirya))
 - Support decimal for `min` and `max` aggregate [\#1407](https://github.com/apache/arrow-datafusion/pull/1407) ([liukun4515](https://github.com/liukun4515))
 - Consolidate `ConstantFolding` and `SimplifyExpression` [\#1375](https://github.com/apache/arrow-datafusion/pull/1375) ([alamb](https://github.com/alamb))
-- Datafusion cli quiet mode command to contain option bool [\#1345](https://github.com/apache/arrow-datafusion/pull/1345) ([Jimexist](https://github.com/Jimexist))
+- DataFusion cli quiet mode command to contain option bool [\#1345](https://github.com/apache/arrow-datafusion/pull/1345) ([Jimexist](https://github.com/Jimexist))
 - Implement `array_agg` aggregate function [\#1300](https://github.com/apache/arrow-datafusion/pull/1300) ([viirya](https://github.com/viirya))
 - Add a command to switch output format in cli [\#1284](https://github.com/apache/arrow-datafusion/pull/1284) ([capkurmagati](https://github.com/capkurmagati))
 - Support `=`, `<`, `<=`, `>`, `>=`, `!=`, `is distinct from`, `is not distinct from` for `BooleanArray` [\#1163](https://github.com/apache/arrow-datafusion/pull/1163) ([alamb](https://github.com/alamb))
@@ -94,7 +94,7 @@
 - CTE/WITH .. UNION ALL confuses name resolution in WHERE [\#1509](https://github.com/apache/arrow-datafusion/issues/1509)
 - ORDER BY min\(x\) results in error  `Plan("No field named 'foo.x'. Valid fields are 'MIN(foo.x)'.")` [\#1479](https://github.com/apache/arrow-datafusion/issues/1479)
 - Sort discards field metadata on the output schema [\#1476](https://github.com/apache/arrow-datafusion/issues/1476)
-- Datafusion should not strip out timezone information from existing types [\#1454](https://github.com/apache/arrow-datafusion/issues/1454)
+- DataFusion should not strip out timezone information from existing types [\#1454](https://github.com/apache/arrow-datafusion/issues/1454)
 - Error on some queries: "column types must match schema types, expected XXX but found YYY" [\#1447](https://github.com/apache/arrow-datafusion/issues/1447)
 - Query failing to return any results when filter is an equality check on strings \(bad statistics in parquet\) [\#1433](https://github.com/apache/arrow-datafusion/issues/1433)
 - Field names containing period such as `f.c1` cannot be named in SQL query [\#1432](https://github.com/apache/arrow-datafusion/issues/1432)
@@ -111,7 +111,7 @@
 - Fix single\_distinct\_to\_groupby for arbitrary expressions [\#1519](https://github.com/apache/arrow-datafusion/pull/1519) ([james727](https://github.com/james727))
 - Fix SortExec discards field metadata on the output schema [\#1477](https://github.com/apache/arrow-datafusion/pull/1477) ([alamb](https://github.com/alamb))
 - fix calculate in many\_to\_many\_hash\_partition test. [\#1463](https://github.com/apache/arrow-datafusion/pull/1463) ([Ted-Jiang](https://github.com/Ted-Jiang))
-- Add Timezone to Scalar::Time\* types,   and better timezone awareness to Datafusion's time types [\#1455](https://github.com/apache/arrow-datafusion/pull/1455) ([maxburke](https://github.com/maxburke))
+- Add Timezone to Scalar::Time\* types,   and better timezone awareness to DataFusion's time types [\#1455](https://github.com/apache/arrow-datafusion/pull/1455) ([maxburke](https://github.com/maxburke))
 - Support identifiers with `.` in them [\#1449](https://github.com/apache/arrow-datafusion/pull/1449) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([alamb](https://github.com/alamb))
 - Fixes for working with functions in dataframes, additional documentation [\#1430](https://github.com/apache/arrow-datafusion/pull/1430) ([tobyhede](https://github.com/tobyhede))
 - \[Minor\] Fix `send_time` metric for hash-repartition [\#1421](https://github.com/apache/arrow-datafusion/pull/1421) ([Dandandan](https://github.com/Dandandan))
@@ -130,7 +130,7 @@
 
 - Clarify docs about `Accumulator::update` and `Accumulator::update_batch` [\#1542](https://github.com/apache/arrow-datafusion/pull/1542) ([alamb](https://github.com/alamb))
 - Fix duplicated `cargo run --example parquet_sql` [\#1482](https://github.com/apache/arrow-datafusion/pull/1482) ([sergey-melnychuk](https://github.com/sergey-melnychuk))
-- add documentation to Datafusion cli's new commands [\#1348](https://github.com/apache/arrow-datafusion/pull/1348) ([liukun4515](https://github.com/liukun4515))
+- add documentation to DataFusion cli's new commands [\#1348](https://github.com/apache/arrow-datafusion/pull/1348) ([liukun4515](https://github.com/liukun4515))
 - fix some clippy warnings from nightly channel [\#1277](https://github.com/apache/arrow-datafusion/pull/1277) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([Jimexist](https://github.com/Jimexist))
 
 **Performance improvements:**
@@ -470,7 +470,7 @@
 - delete redundant code [\#973](https://github.com/apache/arrow-datafusion/issues/973)
 - How to build DataFusion python wheel  [\#853](https://github.com/apache/arrow-datafusion/issues/853)
 -   Add support for partition pruning [\#204](https://github.com/apache/arrow-datafusion/issues/204)
-- \[Datafusion\] Support joins on TimestampMillisecond columns [\#187](https://github.com/apache/arrow-datafusion/issues/187)
+- \[DataFusion\] Support joins on TimestampMillisecond columns [\#187](https://github.com/apache/arrow-datafusion/issues/187)
 -  TPC-H Query 21 [\#173](https://github.com/apache/arrow-datafusion/issues/173)
 -  TPC-H Query 13 [\#164](https://github.com/apache/arrow-datafusion/issues/164)
 -  TPC-H Query 8 [\#162](https://github.com/apache/arrow-datafusion/issues/162)
@@ -509,7 +509,7 @@ For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/ar
 - Box ScalarValue:Lists, reduce size by half size [\#788](https://github.com/apache/arrow-datafusion/pull/788) ([alamb](https://github.com/alamb))
 - JOIN conditions are order dependent [\#778](https://github.com/apache/arrow-datafusion/pull/778) ([seddonm1](https://github.com/seddonm1))
 - Show the result of all optimizer passes in EXPLAIN VERBOSE [\#759](https://github.com/apache/arrow-datafusion/pull/759) ([alamb](https://github.com/alamb))
-- \#723 Datafusion add option in ExecutionConfig to enable/disable parquet pruning [\#749](https://github.com/apache/arrow-datafusion/pull/749) ([lvheyang](https://github.com/lvheyang))
+- \#723 DataFusion add option in ExecutionConfig to enable/disable parquet pruning [\#749](https://github.com/apache/arrow-datafusion/pull/749) ([lvheyang](https://github.com/lvheyang))
 - Update API for extension planning to include logical plan [\#643](https://github.com/apache/arrow-datafusion/pull/643) ([alamb](https://github.com/alamb))
 - Rename MergeExec to CoalescePartitionsExec [\#635](https://github.com/apache/arrow-datafusion/pull/635) ([andygrove](https://github.com/andygrove))
 - fix 593, reduce cloning by taking ownership in logical planner's `from` fn [\#610](https://github.com/apache/arrow-datafusion/pull/610) ([Jimexist](https://github.com/Jimexist))
@@ -520,7 +520,7 @@ For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/ar
 - Use 4.x arrow-rs from crates.io rather than git sha [\#395](https://github.com/apache/arrow-datafusion/pull/395) ([alamb](https://github.com/alamb))
 - Return Vec\<bool\> from PredicateBuilder rather than an `Fn` [\#370](https://github.com/apache/arrow-datafusion/pull/370) ([alamb](https://github.com/alamb))
 - Refactor: move RowGroupPredicateBuilder into its own module, rename to PruningPredicateBuilder [\#365](https://github.com/apache/arrow-datafusion/pull/365) ([alamb](https://github.com/alamb))
-- \[Datafusion\] NOW\(\) function support [\#288](https://github.com/apache/arrow-datafusion/pull/288) ([msathis](https://github.com/msathis))
+- \[DataFusion\] NOW\(\) function support [\#288](https://github.com/apache/arrow-datafusion/pull/288) ([msathis](https://github.com/msathis))
 - Implement select distinct [\#262](https://github.com/apache/arrow-datafusion/pull/262) ([Dandandan](https://github.com/Dandandan))
 - Refactor datafusion/src/physical\_plan/common.rs build\_file\_list to take less param and reuse code [\#253](https://github.com/apache/arrow-datafusion/pull/253) ([Jimexist](https://github.com/Jimexist))
 - Support qualified columns in queries [\#55](https://github.com/apache/arrow-datafusion/pull/55) ([houqp](https://github.com/houqp))
@@ -718,7 +718,7 @@ For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/ar
 -   RFC Roadmap for 2021 \(DataFusion\) [\#140](https://github.com/apache/arrow-datafusion/issues/140)
 -   Implement hash partitioning [\#131](https://github.com/apache/arrow-datafusion/issues/131)
 -   Grouping by column position [\#110](https://github.com/apache/arrow-datafusion/issues/110)
--  \[Datafusion\] GROUP BY with a high cardinality doesn't seem to finish [\#107](https://github.com/apache/arrow-datafusion/issues/107)
+-  \[DataFusion\] GROUP BY with a high cardinality doesn't seem to finish [\#107](https://github.com/apache/arrow-datafusion/issues/107)
 - \[Rust\]  Add support for JSON data sources [\#103](https://github.com/apache/arrow-datafusion/issues/103)
 - \[Rust\]  Implement metrics framework [\#95](https://github.com/apache/arrow-datafusion/issues/95)
 - Publically export Arrow crate from datafusion  [\#36](https://github.com/apache/arrow-datafusion/issues/36)

diff --git a/datafusion/src/execution/context.rs b/datafusion/src/execution/context.rs
@@ -833,7 +833,7 @@ pub struct ExecutionConfig {
     /// Should DataFusion repartition data using the partition keys to execute window functions in
     /// parallel using the provided `target_partitions` level
     pub repartition_windows: bool,
-    /// Should Datafusion parquet reader using the predicate to prune data
+    /// Should DataFusion parquet reader using the predicate to prune data
     parquet_pruning: bool,
     /// Runtime configurations such as memory threshold and local disk for spill
     pub runtime: RuntimeConfig,

diff --git a/datafusion/src/physical_plan/planner.rs b/datafusion/src/physical_plan/planner.rs
@@ -583,7 +583,7 @@ impl DefaultPhysicalPlanner {
                             // columns with names like `SUM(t1.c1)`, `t1.c1 + t1.c2`, etc.
                             //
                             // If we run these logical columns through physical_name function, we will
-                            // get physical names with column qualifiers, which violates Datafusion's
+                            // get physical names with column qualifiers, which violates DataFusion's
                             // field name semantics. To account for this, we need to derive the
                             // physical name from physical input instead.
                             //

diff --git a/dev/release/README.md b/dev/release/README.md
@@ -21,18 +21,18 @@
 
 ## Sub-projects
 
-The Datafusion repo contains 2 different releasable sub-projects: Datafusion, Ballista
+The DataFusion repo contains 2 different releasable sub-projects: DataFusion, Ballista
 
-We use Datafusion release to drive the release for the other sub-projects. As a
-result, Datafusion version bump is required for every release while version
+We use DataFusion release to drive the release for the other sub-projects. As a
+result, DataFusion version bump is required for every release while version
 bumps for the Python binding and Ballista are optional. In other words, we can
-release a new version of Datafusion without releasing a new version of the
+release a new version of DataFusion without releasing a new version of the
 Python binding or Ballista. On the other hand, releasing a new version of the
-Python binding or Ballista always requires a new Datafusion version release.
+Python binding or Ballista always requires a new DataFusion version release.
 
 ## Branching
 
-Datafusion currently only releases from the `master` branch. Given the project
+DataFusion currently only releases from the `master` branch. Given the project
 is still in early development state, we are not maintaining an active stable
 release backport branch.
 
@@ -177,11 +177,11 @@ Send the email output from the script to dev@arrow.apache.org. The email should
 
 ```
 To: dev@arrow.apache.org
-Subject: [VOTE][Datafusion] Release Apache Arrow Datafusion 5.1.0 RC0
+Subject: [VOTE][DataFusion] Release Apache Arrow DataFusion 5.1.0 RC0
 
 Hi,
 
-I would like to propose a release of Apache Arrow Datafusion Implementation,
+I would like to propose a release of Apache Arrow DataFusion Implementation,
 version 5.1.0.
 
 This release candidate is based on commit: a5dd428f57e62db20a945e8b1895de91405958c4 [1]
@@ -193,9 +193,9 @@ and vote on the release.
 
 The vote will be open for at least 72 hours.
 
-[ ] +1 Release this as Apache Arrow Datafusion 5.1.0
+[ ] +1 Release this as Apache Arrow DataFusion 5.1.0
 [ ] +0
-[ ] -1 Do not release this as Apache Arrow Datafusion 5.1.0 because...
+[ ] -1 Do not release this as Apache Arrow DataFusion 5.1.0 because...
 
 [1]: https://github.com/apache/arrow-datafusion/tree/a5dd428f57e62db20a945e8b1895de91405958c4
 [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-5.1.0

diff --git a/dev/release/create-tarball.sh b/dev/release/create-tarball.sh
@@ -80,10 +80,10 @@ echo ""
 echo "---------------------------------------------------------"
 cat <<MAIL
 To: dev@arrow.apache.org
-Subject: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion ${version} RC${rc}
+Subject: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion ${version} RC${rc}
 Hi,
 
-I would like to propose a release of Apache Arrow Datafusion Implementation,
+I would like to propose a release of Apache Arrow DataFusion Implementation,
 version ${version}.
 
 This release candidate is based on commit: ${release_hash} [1]
@@ -98,9 +98,9 @@ encouraged to test the release and vote with "(non-binding)".
 
 The standard verification procedure is documented at https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates.
 
-[ ] +1 Release this as Apache Arrow Datafusion ${version}
+[ ] +1 Release this as Apache Arrow DataFusion ${version}
 [ ] +0
-[ ] -1 Do not release this as Apache Arrow Datafusion ${version} because...
+[ ] -1 Do not release this as Apache Arrow DataFusion ${version} because...
 
 [1]: https://github.com/apache/arrow-datafusion/tree/${release_hash}
 [2]: ${url}
@@ -129,4 +129,4 @@ gpg --armor --output ${tarball}.asc --detach-sig ${tarball}
 echo "Uploading to apache dist/dev to ${url}"
 svn co --depth=empty https://dist.apache.org/repos/dist/dev/arrow ${SOURCE_TOP_DIR}/dev/dist
 svn add ${distdir}
-svn ci -m "Apache Arrow Datafusion ${version} ${rc}" ${distdir}
+svn ci -m "Apache Arrow DataFusion ${version} ${rc}" ${distdir}
diff --git a/dev/release/release-tarball.sh b/dev/release/release-tarball.sh
@@ -65,7 +65,7 @@ cp -r ${tmp_dir}/dev/* ${tmp_dir}/release/${release_version}/
 svn add ${tmp_dir}/release/${release_version}
 
 echo "Commit release"
-svn ci -m "Apache Arrow Datafusion ${version}" ${tmp_dir}/release
+svn ci -m "Apache Arrow DataFusion ${version}" ${tmp_dir}/release
 
 echo "Clean up"
 rm -rf ${tmp_dir}

diff --git a/docs/README.md b/docs/README.md
@@ -26,7 +26,7 @@ inside a Python virtualenv.
 
 - Python
 - `pip install -r requirements.txt`
-- Datafusion python package. You can install the latest version by running `maturin develop` inside `../python` directory.
+- DataFusion python package. You can install the latest version by running `maturin develop` inside `../python` directory.
 
 ## Build
 

diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -35,9 +35,9 @@
 
 # -- Project information -----------------------------------------------------
 
-project = 'Arrow Datafusion'
+project = 'Arrow DataFusion'
 copyright = '2022, Apache Software Foundation'
-author = 'Arrow Datafusion Authors'
+author = 'Arrow DataFusion Authors'
 
 
 # -- General configuration ---------------------------------------------------