Feature/no answer pipeline #183

viktors264 · 2022-12-29T13:39:07Z

Added a new noAnswer key and updated the generic pipeline aggregation to include all responses without an answer.

SachaG · 2023-01-17T23:54:29Z

This is a good start! But it's missing a key feature, which is that the no_answer key should be added to buckets, not just facets.

What I mean is that currently this gives us data like this (in this case, "years of experience" with the "gender" facet):

facets: [
        {
          type: 'gender',
          id: 'noAnswer',
          buckets: [
            { id: 'range_5_10', count: 66 },
            { id: 'range_10_20', count: 49 },
            { id: 'range_less_than_1', count: 20 },
          ]
        },
        {
          type: 'gender',
          id: 'not_listed',
          buckets: [
            { id: 'range_2_5', count: 35 },
            { id: 'range_5_10', count: 34 },
            { id: 'range_10_20', count: 39 },
          ]
        },
        {
          type: 'gender',
          id: 'male',
          buckets: [
            { id: 'range_2_5', count: 7970 },
            { id: 'range_10_20', count: 5470 },
            { id: 'range_5_10', count: 7362 },
          ]
        },

So you've added the "years of experience" breakdown for people who didn't answer the "gender" question.

But within each "years of experience" array of buckets, we also want to know how many people didn't answer the years of experience question. So the data we actually want would look more like this:

facets: [
        {
          type: 'gender',
          id: 'noAnswer',
          buckets: [
            { id: 'range_5_10', count: 66 },
            { id: 'range_10_20', count: 49 },
            { id: 'range_less_than_1', count: 20 },
            { id: 'no_answer', count: 123 }, // people who answered neither gender nor years of experience
          ]
        },
        {
          type: 'gender',
          id: 'not_listed',
          buckets: [
            { id: 'range_2_5', count: 35 },
            { id: 'range_5_10', count: 34 },
            { id: 'range_10_20', count: 39 },
            { id: 'no_answer', count: 123 }, // people who picked "not_listed" as gender but didn't answer "years of experience"
          ]
        },

Additionally, we want this no_answer bucket to appear even when no facet is selected. So we also want this:

"facets": [
              {
                "id": "default", // this is what we get when no facet is selected
                "buckets": [
                  {
                    "id": "range_less_than_1",
                    "count": 1272,
                  },
                  {
                    "id": "range_1_2",
                    "count": 4177,
                  },
                  {
                    "id": "range_2_5",
                    "count": 8710,
                  },
                  {
                    "id": "no_answer",
                    "count": 123,
                  },
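One way to get both shapes out of a single aggregation is to group on a compound key of facet value and field value, substituting no_answer whenever either one is null or missing. The sketch below is hypothetical: `buildGroupStage`, `toFacets`, and the field paths passed to them are illustrative names, not code from generic_pipeline.ts, and it uses the 'no_answer' id for both unanswered facets and unanswered buckets.

```typescript
// Hypothetical sketch, not the PR's actual generic_pipeline.ts code.
const NO_ANSWER = 'no_answer'

type Stage = Record<string, unknown>

interface GroupResult { _id: { facet: string; value: string }; count: number }
interface Bucket { id: string; count: number }
interface Facet { type: string; id: string; buckets: Bucket[] }

// Count respondents per (facet value, field value) pair. $ifNull also matches
// missing fields, so people who skipped either question land in 'no_answer'.
function buildGroupStage(fieldPath: string, facetPath?: string): Stage {
    return {
        $group: {
            _id: {
                // With no facet selected, every document goes into the 'default' facet
                facet: facetPath ? { $ifNull: [`$${facetPath}`, NO_ANSWER] } : 'default',
                value: { $ifNull: [`$${fieldPath}`, NO_ANSWER] },
            },
            count: { $sum: 1 },
        },
    }
}

// Reshape the flat $group output into the facets/buckets structure shown above.
function toFacets(results: GroupResult[], facetType: string): Facet[] {
    const byFacet = new Map<string, Bucket[]>()
    for (const { _id, count } of results) {
        const buckets = byFacet.get(_id.facet) ?? []
        buckets.push({ id: _id.value, count })
        byFacet.set(_id.facet, buckets)
    }
    return Array.from(byFacet.entries()).map(([id, buckets]) => ({ type: facetType, id, buckets }))
}
```

For example, a `$group` result `{ _id: { facet: 'not_listed', value: 'no_answer' }, count: 123 }` becomes a `not_listed` facet containing a `no_answer` bucket. A multi-select field would still need an `$unwind` before this stage.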

SachaG · 2023-01-17T23:48:12Z

api/src/data/keys.yml

@@ -9,6 +9,7 @@ age:
  - range_more_than_65

 years_of_experience:
+  - no_answer


Because every field will need a no_answer key, I don't think it makes sense to add it here; it's probably better to add it directly in generic.ts somewhere.

I'm not sure we actually need to add it, though, or maybe only at the GraphQL level… at least I don't think we use those keys in the pipeline? I'll double-check.
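A minimal sketch of the GraphQL-level option, assuming each field's key list is available as a string array in the resolver code; `withNoAnswerKey` is a hypothetical helper name, not an existing function in the codebase.

```typescript
// Hypothetical helper (name and location are assumptions): add the no_answer
// key once in code instead of listing it under every field in keys.yml.
const NO_ANSWER = 'no_answer'

function withNoAnswerKey(keys: readonly string[]): string[] {
    // Skip the append if the field already declares the key, to avoid duplicates
    return keys.indexOf(NO_ANSWER) !== -1 ? [...keys] : [...keys, NO_ANSWER]
}
```

Calling `withNoAnswerKey(['range_less_than_1', 'range_1_2'])` returns the same keys with `'no_answer'` appended; calling it again on that result leaves it unchanged.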

Previously

> This is a good start! But it's missing a key feature, which is that the no_answer key should be added to buckets, not just facets.

Generic pipeline updated: I added the "no_answer" key to the buckets section. Also, I tested the case where no facet is selected and got the results you described.

> Because every field will need a no_answer key I don't think it makes sense to add it here, it's probably better to add it directly in generic.ts somewhere.

Yes, I absolutely agree. If the 'no_answer' key is needed for every field, it's better to add it in generic.ts or at the GraphQL level (if keys.yml isn't used in the pipeline). I checked and didn't find any direct use of the keys in the Mongo aggregation, but you should double-check.

SachaG · 2023-01-17T23:57:47Z

api/src/compute/generic_pipeline.ts

+                  //     $unwind: {
+                  //         path: `$${facetPath}`
+                  //     }
+                  // }


I guess we can safely remove this whole step then?

Removed comments. Changed aggregation.

SachaG · 2023-01-18T00:10:49Z

By the way, that no_answer bucket already appears in the survey results, but currently it's calculated manually in the chart itself (total number of respondents minus the sum of respondents in the other columns). I think it would be cleaner to do it at the API level.

(Also I guess it wouldn't be too hard to do it outside the aggregation pipeline in the rest of the JS code if the pipeline can't easily do it)
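If the pipeline can't easily produce the bucket, the chart's subtraction could move into the API's JS code after the aggregation. This is a sketch under stated assumptions: `addNoAnswerBucket` is a hypothetical name, and `total` is assumed to be the number of respondents in the facet. The subtraction only holds for single-choice questions; for multi-select fields a respondent can be counted in several buckets, so the difference is not meaningful there. The sketch throws when the bucket counts exceed the total, which is one symptom of that case.

```typescript
interface Bucket { id: string; count: number }

// Sketch of the chart's calculation moved to the API side:
// no_answer = total respondents - sum of the other buckets.
// Only valid for single-choice questions (each respondent in at most one bucket).
function addNoAnswerBucket(buckets: Bucket[], total: number): Bucket[] {
    const answered = buckets.reduce((sum, bucket) => sum + bucket.count, 0)
    if (answered > total) {
        // A multi-select field can count one respondent in several buckets
        throw new RangeError('bucket counts exceed total; field may be multi-select')
    }
    return [...buckets, { id: 'no_answer', count: total - answered }]
}
```

With buckets of 30 and 20 and a total of 60, the appended `no_answer` bucket has a count of 10.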

viktors264 · 2023-01-18T04:15:21Z

> By the way, that no_answer bucket already appears in the survey results, but currently it's manually calculated in the chart itself (number of total respondents - sum of respondents in the other columns). I think it would be cleaner to do it at the API level.
>
> (Also I guess it wouldn't be too hard to do it outside the aggregation pipeline in the rest of the JS code if the pipeline can't easily do it)

Yes, of course; it's better to do the calculations inside the API.

SachaG · 2023-01-18T05:33:14Z

Good progress! But now I'm running into a different issue. It doesn't work when querying for a field where people can pick multiple options at the same time.

For example, with the following GraphQL query:

query raceEthnicityQuery {
    survey(survey: state_of_js) {
        demographics {
            race_ethnicity: race_ethnicity(filters: {}, options: {}) {
                keys
                year(year: 2022) {
                    year
                    completion {
                        total
                        percentage_survey
                        count
                    }
                    facets {
                        id
                        type
                        completion {
                            total
                            percentage_question
                            percentage_survey
                            count
                        }
                        buckets {
                            id
                            count
                            percentage_question
                            percentage_survey
                        }
                    }
                }
            }
            
        }
    }
}

I get this:

results: [
    {
      facets: [
        {
          type: 'default',
          id: 'default',
          buckets: [
            { id: [ 'multiracial', 'white_european' ], count: 33 },
            {
              id: [
                'black_african',
                'east_asian',
                'hispanic_latin',
                'middle_eastern',
                'multiracial',
                'native_american_islander_australian',
                'south_asian',
                'south_east_asian'
              ],
              count: 1
            },
            {
              id: [ 'multiracial', 'hispanic_latin', 'white_european' ],
              count: 2
            },
            {
              id: [ 'multiracial', 'white_european', 'middle_eastern' ],
              count: 2
            },
            { id: [ 'east_asian', 'multiracial' ], count: 1 },
            {
              id: [ 'south_east_asian', 'south_asian', 'east_asian' ],
              count: 3
            },
            {
              id: [
                'black_african',
                'east_asian',
                'hispanic_latin',
                'middle_eastern',
                'native_american_islander_australian',
                'multiracial',
                'south_asian',
                'south_east_asian',
                'white_european',
                'not_listed'
              ],
              count: 1
            },
            { id: [ 'south_east_asian' ], count: 1000 },
            { id: [ 'multiracial', 'south_east_asian' ], count: 1 },
            {
              id: [
                'east_asian',
                'native_american_islander_australian',
                'south_asian',
                'white_european'
              ],
              count: 1
            },
etc.

As you can see, it's using every existing combination of answers as a unique id key instead of counting each option separately. The correct output (from the main branch) would be:

  results: [
    {
      facets: [
        {
          type: 'default',
          id: 'default',
          buckets: [
            { id: 'multiracial', count: 727 },
            { id: 'east_asian', count: 1710 },
            { id: 'white_european', count: 19790 },
            { id: 'middle_eastern', count: 1158 },
            { id: 'hispanic_latin', count: 2795 },
            { id: 'south_asian', count: 1731 },
            { id: 'native_american_islander_australian', count: 142 },
            { id: 'not_listed', count: 795 },
            { id: 'south_east_asian', count: 1221 },
            { id: 'black_african', count: 1074 }
          ]
        }
      ],
      year: 2022
    }
  ]
}
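The difference comes from grouping on an array field directly: MongoDB uses the whole array as the group key, so every distinct combination (in order) becomes its own bucket, while an `$unwind` step first emits one document per selected option. The plain-TypeScript illustration below mimics both behaviors on made-up sample answers; it does not query MongoDB.

```typescript
// Illustration only: plain TypeScript mimicking a $group on a multi-select
// field, with and without an $unwind step first. Sample answers are made up.
type Answer = string[]

function countWithoutUnwind(answers: Answer[]): Map<string, number> {
    const counts = new Map<string, number>()
    for (const answer of answers) {
        // The whole array acts as the group key, so each combination is its own bucket
        const key = JSON.stringify(answer)
        counts.set(key, (counts.get(key) ?? 0) + 1)
    }
    return counts
}

function countWithUnwind(answers: Answer[]): Map<string, number> {
    const counts = new Map<string, number>()
    for (const answer of answers) {
        // $unwind emits one document per selected option
        for (const option of answer) {
            counts.set(option, (counts.get(option) ?? 0) + 1)
        }
    }
    return counts
}

const sample: Answer[] = [
    ['multiracial', 'white_european'],
    ['white_european'],
    ['east_asian', 'multiracial'],
]
```

`countWithoutUnwind(sample)` yields three buckets of 1, one per combination, while `countWithUnwind(sample)` yields `multiracial: 2`, `white_european: 2`, `east_asian: 1`.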


vercel · 2023-01-18T16:16:48Z

Someone is attempting to deploy a commit to the Devographics Team on Vercel.

A member of the Team first needs to authorize it.

viktors264 · 2023-01-18T16:25:20Z

> Good progress! But now I'm running into a different issue. It doesn't work when querying for a field where people can pick multiple options at the same time.

I added back the $unwind operator with the preserveNullAndEmptyArrays option, so it doesn't skip null or empty fields. It seems we can't remove the $unwind operator after all. I tested your case and it works now; I also tested the previous cases locally and they seem to work.
It's difficult for me to know and test every case, so let me know if something is wrong.
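A minimal sketch of the stage pair described here, assuming the field path is passed in as a string; `buildUnwindAndGroup` is a hypothetical helper, not the code in the commit. `preserveNullAndEmptyArrays: true` keeps documents whose field is null, missing, or an empty array, and `$ifNull` then maps those to no_answer.

```typescript
type Stage = Record<string, unknown>

// Sketch of an $unwind + $group pair for a field that may be single- or
// multi-select. The field path is an assumption; this is not the PR's code.
function buildUnwindAndGroup(fieldPath: string): Stage[] {
    return [
        {
            $unwind: {
                path: `$${fieldPath}`,
                // Without this, documents where the field is null, missing, or an
                // empty array would be dropped, and no_answer could never be counted
                preserveNullAndEmptyArrays: true,
            },
        },
        {
            $group: {
                // Preserved documents have the field null or absent; map both to no_answer
                _id: { $ifNull: [`$${fieldPath}`, 'no_answer'] },
                count: { $sum: 1 },
            },
        },
    ]
}
```

Since MongoDB 3.2, `$unwind` treats a non-array value as a single-element array, so the same stages also work for single-choice fields.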

Viktors Dobkevics added 2 commits December 29, 2022 15:26

Added noAnswer key and updated generic_pipeline aggregation

8e880a6

Added missing industry_selectior keys

79a4fef

Devographics deleted a comment from netlify bot Jan 17, 2023

Devographics deleted a comment from vercel bot Jan 17, 2023

Devographics deleted a comment from netlify bot Jan 17, 2023

SachaG requested changes Jan 18, 2023

Viktors Dobkevics added 2 commits January 18, 2023 05:45

Added no_answer key to buckets, updated generic_pipeline.ts

8f2ef57

Removed no_answer key from keys.yml

42afe3d

Merge branch 'main' into feature/no-answer-pipeline

02475a5

Devographics deleted a comment from vercel bot Jan 18, 2023

Viktors Dobkevics added 2 commits January 18, 2023 18:09

Added unwind with preserveNullAndEmptyArrays option to fix field with multi select option

28fdd21

Merge branch 'feature/no-answer-pipeline' of github.com:viktors264/Monorepo into feature/no-answer-pipeline

90d5f8a
