90%: Add computed fields #40

Merged Mar 25, 2015 (34 commits)

@rsinger
Member

rsinger commented Feb 24, 2015

This is a spike: a proof of concept for generating dynamic table values from values elsewhere in the generated table row document.

It also introduces the notion of ‘temp’ fields that aren’t stored in the collection: these fieldNames have a "temporary": true property.

Example of a table spec using this:

{
    "_id": "t_wibble",
    "type": "owl:Thing",
    "from": "CBD_foobar",
    "fields" : [
        {
            "fieldName": "foobarLink",
            "value": "link"
        },
        {
            "fieldName": "mainType",
            "predicates": ["rdf:type"],
            "temporary": true
        },
        {
            "fieldName": "lastModMain",
            "predicates": ["dct:modified"],
            "temporary": true
        },
        {
            "fieldName": "mainTitle",
            "predicates": ["dct:title"],
            "temporary": true
        }
    ],
    "joins": {
        "dct:hasPart" : {
            "fields" : [
                {
                    "fieldName": "partTitle",
                    "predicates": ["dct:title"],
                    "temporary": true
                },
                {
                    "fieldName": "partType",
                    "predicates": ["rdf:type"],
                    "temporary": true
                },
                {
                    "fieldName": "lastModPart",
                    "predicates": ["dct:modified"],
                    "temporary": true
                }
            ]
        },
        "dct:isPartOf" : {
            "fields" : [
                {
                    "fieldName": "partOfTitle",
                    "predicates": ["dct:title"],
                    "temporary": true
                },
                {
                    "fieldName": "partOfType",
                    "predicates": ["rdf:type"],
                    "temporary": true
                },
                {
                    "fieldName": "lastModPartOf",
                    "predicates": ["dct:modified"],
                    "temporary": true
                }
            ]
        }
    },
    "counts": [
        {
            "fieldName": "referenceCount",
            "property": "dct:references"
        },
        {
            "fieldName": "referencedCount",
            "property": "dct:isReferencedBy"
        }
    ],
    "computed_fields": [
        {
            "fieldName": "title",
            "value": {
                "conditional": {
                    "if":[["bibo:Article", "bibo:Chapter"], "contains", "$mainType"],
                    "then":"$mainTitle",
                    "else": {
                        "conditional":{
                            "if":[["bibo:Article", "bibo:Chapter"], "contains", "$partType"],
                            "then":"$partTitle",
                            "else":{"conditional":{
                                "if":[["bibo:Article", "bibo:Chapter"], "contains", "$partOfType"],
                                "then":"$partOfTitle",
                                "else":"Unknown!!"
                            }}
                        }
                    }
                }
            }
        },
        {
            "fieldName": "type",
            "value": {
                "replace": {
                    "search":"bibo:",
                    "replace":"",
                    "subject":"$mainType"
                }
            }
        },
        {
            "fieldName": "referenceDiff",
            "value": {
                "conditional": {
                    "if":["$referenceCount",">","$referencedCount"],
                    "then":{
                        "arithmetic":["$referenceCount","-","$referencedCount"]
                    },
                    "else":{
                        "arithmetic":["$referencedCount","-","$referenceCount"]
                    }
                }
            }
        }
    ]            
}

Example of the document output by the spec above:

{
    "foobarLink": "http://example.com/1234",
    "referenceCount": 32,
    "referencedCount": 14,
    "title": "I was the isPartOf title",
    "type": "Book",
    "referenceDiff": 18
}
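
To make the evaluation order concrete, here is a minimal Python sketch of how the 'conditional', 'replace' and 'arithmetic' functions and the "temporary" flag could combine to produce the document above. It is an illustration only, not the PHP in this branch: `resolve`, `evaluate`, `build_row` and the operator tables are hypothetical names, the sample titles and types are made up, and the 'title' conditional is shortened to two levels.

```python
import operator

# Illustrative operator tables; these names are assumptions, not Tripod identifiers.
COMPARISONS = {
    ">": operator.gt,
    "contains": lambda haystack, needle: needle in haystack,
}
ARITHMETIC = {"+": operator.add, "-": operator.sub}


def resolve(value, row):
    """Return a "$name" reference's value, a nested function's result, or a literal."""
    if isinstance(value, str) and value.startswith("$"):
        return row.get(value[1:])  # unset fields resolve to None
    if isinstance(value, dict):
        return evaluate(value, row)
    return value


def evaluate(func, row):
    """Evaluate a single-key function object such as {"replace": {...}}."""
    ((name, args),) = func.items()
    if name == "conditional":
        left, op, right = args["if"]
        passed = COMPARISONS[op](resolve(left, row), resolve(right, row))
        return resolve(args["then" if passed else "else"], row)
    if name == "replace":
        return resolve(args["subject"], row).replace(args["search"], args["replace"])
    if name == "arithmetic":
        left, op, right = args
        return ARITHMETIC[op](resolve(left, row), resolve(right, row))
    raise ValueError(f"unknown function: {name}")


def build_row(row, computed_fields, temporary):
    """Add computed fields in order, then drop the temporary ones."""
    row = dict(row)
    for field in computed_fields:
        row[field["fieldName"]] = resolve(field["value"], row)
    return {key: val for key, val in row.items() if key not in temporary}


types = ["bibo:Article", "bibo:Chapter"]
computed_fields = [
    {"fieldName": "title", "value": {"conditional": {
        "if": [types, "contains", "$mainType"],
        "then": "$mainTitle",
        "else": {"conditional": {
            "if": [types, "contains", "$partOfType"],
            "then": "$partOfTitle",
            "else": "Unknown!!"}}}}},
    {"fieldName": "type", "value": {"replace": {
        "search": "bibo:", "replace": "", "subject": "$mainType"}}},
    {"fieldName": "referenceDiff", "value": {"conditional": {
        "if": ["$referenceCount", ">", "$referencedCount"],
        "then": {"arithmetic": ["$referenceCount", "-", "$referencedCount"]},
        "else": {"arithmetic": ["$referencedCount", "-", "$referenceCount"]}}}},
]
# Made-up intermediate row: values as they might look before temporary fields are dropped.
row = {
    "foobarLink": "http://example.com/1234",
    "mainType": "bibo:Book", "mainTitle": "Main title",
    "partOfType": "bibo:Chapter", "partOfTitle": "I was the isPartOf title",
    "referenceCount": 32, "referencedCount": 14,
}
temporary = {"mainType", "mainTitle", "partOfType", "partOfTitle"}
result = build_row(row, computed_fields, temporary)
```

With this sample row, `result` holds the same six keys and values as the example document: the temporary fields feed the computed ones and are then discarded.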

break;
default:
$value = null;
}
Contributor

Arithmetic operators contain: "+", "-", "*", "/", "%", "**"

Did we miss "**" from this switch?

Member Author

No - it should have been removed from the valid operators: exponents were introduced in PHP 5.6, so I don't think we should support them here.
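
As a rough illustration of keeping the list of valid operators and the evaluation switch in sync, the Python sketch below drives both from one table. It is not the PHP in this branch; `ARITHMETIC_OPERATORS` and `apply_arithmetic` are hypothetical names, and the `None` fallback only mirrors the `default: $value = null;` branch in the diff fragment under review.

```python
import operator

# Single source of truth: the operators from the review comment, minus "**".
ARITHMETIC_OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "%": operator.mod,
}


def is_valid_operator(op):
    """Validation and evaluation read the same table, so they cannot drift apart."""
    return op in ARITHMETIC_OPERATORS


def apply_arithmetic(left, op, right):
    """Apply a whitelisted operator; anything else yields None."""
    func = ARITHMETIC_OPERATORS.get(op)
    if func is None:
        return None
    return func(left, right)
```

Because "**" is absent from the table, `is_valid_operator("**")` is false and `apply_arithmetic(2, "**", 3)` returns `None` rather than a number.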

@malcyL
Contributor

malcyL commented Feb 25, 2015

Reading through this, I was thinking there is a lot of parsing code - is there some library in PHP that might do this for us? A PHP yacc/bison? I guess we are already parsing the tablespec like this, and this is just extending what we have.

So - other than my question above on "**" - 👍

@rsinger
Member Author

rsinger commented Feb 25, 2015

I was also concerned about the size of the validation code (which I think is what you mean by parsing), especially given that it runs on every request. I wonder about keeping the checks but making the part that runs in MongoTripod::loadConfig() much smaller, and putting a more comprehensive validator in a separate script.

@rsinger
Member Author

rsinger commented Feb 25, 2015

The only downside to this is the case where your tripod config is generated on the fly (as it is in TARL).

@lordtatty
Contributor

My first thought is that there is an element of "code is data, and data is code" here, which is a problem that languages such as Lisp have already solved. I have found a couple of sites where people discuss embedding conditionals within JSON, as we are trying to do, and Lisp-like s-expressions seem to be the favoured approach because they map directly onto this problem. Part way down http://alandipert.tumblr.com/post/7193534410/discover-lisp-in-your-web-browser-with-javascript the author proposes a JSONscript language (I googled; it doesn't actually exist yet), but it might provide some interesting ideas for us here. Perhaps we could write a simple parser for JSONscript and open source it. The top answer to this Stack Overflow post proposes a similar design: http://stackoverflow.com/questions/20737045/representing-logic-as-data-in-json

Having said that, there is nothing inherently wrong with this approach if we are too far down the rabbit hole to change it now - I was just thinking in terms of standing on the shoulders of giants, and such. It would also help with the above thoughts on validation by separating out that code from tripod itself.

@rsinger
Member Author

rsinger commented Feb 25, 2015

@lordtatty Are you proposing that the "value" value for the fieldName "type" should look something like:

["_replace_",["bibo:","","$mainType"]]

and the "value" value for fieldName "referenceDiff" should look like:

[
    "_conditional_",
    [
        "if",
        [
            ">",
            ["$referenceCount", "$referencedCount"]
        ],
        "then",
        [
            "_arithmetic_",
            [
                "-",
                ["$referenceCount","$referencedCount"]
            ]
        ],
        "else",
        [
            "_arithmetic_",
            [
                "-",
                ["$referencedCount","$referenceCount"]
            ]
        ]
    ]
]

@lordtatty
Contributor

It's been a little while since I touched something like lisp myself, but for the latter I was thinking something along the lines of:

        {
            "fieldName": "referenceDiff",
            "value": {
                "_evaluated_": [
                    "-",
                    ["MAX", "$referenceCount", "$referencedCount"],
                    ["MIN", "$referenceCount", "$referencedCount"]
                ]
            }
        }

Maybe a function like MAXMIN could make it easier, so we don't need two separate MAX and MIN calls, but in terms of raw functionality, something like that.
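
For illustration, an s-expression like the one above can be evaluated with a very small recursive function. The Python sketch below assumes the first list element is always a function name and that "$name" strings refer to fields on the row; `evaluate_sexpr` and `FUNCTIONS` are hypothetical names, not part of Tripod or of any existing JSONscript implementation.

```python
# Hypothetical function table for the s-expression form discussed above.
FUNCTIONS = {
    "-": lambda a, b: a - b,
    "MAX": max,
    "MIN": min,
}


def evaluate_sexpr(expr, row):
    """Evaluate [name, arg, ...] lists recursively against a row of field values."""
    if isinstance(expr, str) and expr.startswith("$"):
        return row[expr[1:]]  # "$referenceCount" -> row["referenceCount"]
    if isinstance(expr, list):
        head, *args = expr  # assumption: every list is a function call
        values = [evaluate_sexpr(arg, row) for arg in args]
        return FUNCTIONS[head](*values)
    return expr  # numbers and plain strings are literals


reference_diff = [
    "-",
    ["MAX", "$referenceCount", "$referencedCount"],
    ["MIN", "$referenceCount", "$referencedCount"],
]
diff = evaluate_sexpr(reference_diff, {"referenceCount": 32, "referencedCount": 14})
```

One trade-off of treating every list as a call is that literal lists, such as the type list passed to "contains" in the table spec, would need their own quoting convention.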

@rsinger changed the title from "30%: Add computed fields" to "90%: Add computed fields" Mar 2, 2015
@rsinger
Member Author

rsinger commented Mar 24, 2015

I think all of the comments here have been addressed:

  • Docs updated to include computed field functions and temporary fields
  • _function_ has been removed in favor of just 'conditional', 'replace', etc. '_link_' has been deprecated in favor of 'link' and now logs a warning. Warning logging was also added
  • Functions (predicate/computed field) need to be consolidated and modularized #44
  • The naming convention for temporary fields has been replaced with adding a 'temporary': true property on the fieldspec

{
"fieldName": "type",
"value": {
"_replace_" : { // _replace_ is a function name
Contributor

Should the underscores be removed?

@@ -296,6 +318,14 @@ protected function loadConfig(Array $config)
{
throw new MongoTripodConfigException("View spec does not contain " . _ID_KEY);
}
if(!isset($spec['from']) || !in_array($spec['from'], $this->getPods($storeName)))
Contributor

You've got a validateTableSpec function to validate table specs, but validation of view specs (and search specs) is done inline. Worth moving that to a function to keep it consistent?

Member Author

Originally, I had done this, but, actually, the amount of validation done for search and view specs is the same as we do inline for table specs.

Really, it's just that we do a lot more validation for table specs.

@scaleupcto
Contributor

👍

"then" and "else" values may be of any type, including another function. To replicate 'else if', you can use another
conditional function as the value of 'else'.

Arithmetic functions
Contributor

Do you want something in here to tell you what arithmetic functions are available?

"replace": {
"search": "bibo:",
"replace": "",
"subject": "$x"
Contributor

I noticed in the PR that uses this functionality that you have something like:
"subject": "$!concatenatedType" - what does that do?

Contributor

I think that "!concatenatedType" is the name of the field; it just confused me a bit because it looked like it could be an operator.

Member Author

That was based on an earlier version of this branch.

Member Author

READ THE COMMENTS, @rgubby!

@rgubby
Contributor

rgubby commented Mar 25, 2015

looks good to me! 👍

rsinger added a commit that referenced this pull request Mar 25, 2015
@rsinger merged commit b5e00c3 into master Mar 25, 2015
@rsinger deleted the issue/22/save-conditional-table-row-data branch March 25, 2015 12:05