Proposal for treebank-parser tree structure #45

Open
turbopape opened this issue Nov 14, 2016 · 5 comments

Comments

@turbopape

turbopape commented Nov 14, 2016

Hey @dakrone,

I am particularly interested in the treebank-parser.

One cool representation would be a one-to-one translation of the tree's string representation into a Clojure list, with the first element being the tag and the rest being the chunk!
This would be easier to read, and it sticks with Lisp's usual way of representing data!
This could be done with some reader tricks:

(load-string (str "(quote "
                  (first (treebank-parser ["This is a sentence ."]))
                  ")"))
;;=> (TOP (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN sentence))) (. .)))

But it would be better to have it generated while the parse is being done...
Whadda ya think?
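
To make the "first element is the tag, the rest is the chunk" idea concrete, here is a minimal sketch that works on the quoted list shown above using only core functions. The name `pos-pairs` is made up for illustration and is not part of clojure-opennlp.

```clojure
;; The tree literal is copied from the load-string output above.
(def tree
  '(TOP (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN sentence))) (. .))))

;; The tag is the first element; the children are the rest.
(def root-tag (first tree))
(def root-children (rest tree))

(defn pos-pairs
  "Hypothetical helper: returns [tag word] pairs for every node whose
  children are all leaves, in left-to-right order."
  [t]
  (->> (tree-seq seq? rest t)                      ; walk every sub-list, skipping tags
       (filter #(and (seq? %) (not-any? seq? (rest %))))
       (map (fn [[tag word]] [tag word]))))
```

With this shape, `(pos-pairs tree)` walks the sentence in order and pairs each part-of-speech tag with its word, without any custom tree type.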

@ghost

ghost commented Nov 14, 2016

Rafik - I think that representation is a good idea. (I’m new to both Clojure and OpenNLP, but I’m interested in the project and learning as I go.)

@dakrone
Owner

dakrone commented Nov 17, 2016

@turbopape I could see that being a pretty good representation, but I didn't want to include that out of the box, since having people use load-string is kind of dangerous (it evaluates whatever string it is given). Might be worth adding to the readme though!
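
For anyone who wants the list form without evaluating anything, one possible alternative is `clojure.edn/read-string`, which only reads data and never runs it. This is a sketch under assumptions, not something the library provides: the input string below is the parser output quoted earlier in this thread, and the EDN reader still treats commas as whitespace and may reject tags that are not valid symbols.

```clojure
(require '[clojure.edn :as edn])

;; Parser output copied from the example earlier in the thread.
(def parse-str
  "(TOP (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN sentence))) (. .)))")

;; Reads the string as plain data: a nested list of symbols.
(def tree (edn/read-string parse-str))

;; Reading does not evaluate. This yields a list whose first element is the
;; symbol println; nothing is printed.
(def harmless (edn/read-string "(println \"side effect\")"))
```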

@turbopape
Author

Yes, I agree. I didn't want to go for the load-string solution; I only included it as an example. Instead, I wanted to investigate whether we could rework the map representation so that it can produce the representation I am suggesting...
Many thanks!

@dakrone
Owner

dakrone commented Nov 17, 2016

@turbopape certainly, I'm definitely down for adding more representations. I figure we can keep the map representation and add functions that output it in different formats, depending on the user's taste.
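
As a rough illustration of that approach, the sketch below converts a map-based tree into the nested-list form. It assumes each node is a map with `:tag` and `:chunk` keys, where `:chunk` is a string, a single child map, or a sequence of child maps; that shape is an assumption for this example and should be checked against what `make-tree` actually returns. The name `tree->sexp` is hypothetical.

```clojure
(defn tree->sexp
  "Hypothetical converter: turns an assumed {:tag ... :chunk ...} node into
  a list of the form (tag child ...). Leaves are returned unchanged."
  [node]
  (if (map? node)
    (let [{:keys [tag chunk]} node
          ;; Normalise a single child (map or string) into a one-element vector.
          children (if (sequential? chunk) chunk [chunk])]
      (cons tag (map tree->sexp children)))
    node))

;; Hand-written sample in the assumed shape, not real parser output.
(def sample
  {:tag 'TOP
   :chunk {:tag 'S
           :chunk [{:tag 'NP :chunk {:tag 'DT :chunk "This"}}
                   {:tag 'VP :chunk [{:tag 'VBZ :chunk "is"}
                                     {:tag 'NP :chunk [{:tag 'DT :chunk "a"}
                                                       {:tag 'NN :chunk "sentence"}]}]}]}})
```

Keeping the map as the canonical form and deriving the list from it means existing callers are unaffected, and other output formats can be added as separate functions in the same way.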

@eihli

eihli commented Nov 6, 2020

The load-string method loses commas, because the Clojure reader treats a comma as whitespace.

(let [text-lines ["Hello, world!"]]
  (->> text-lines
       (map tokenize)
       (map (partial string/join " "))
       parse
       (map #(str "(quote " % ")"))
       (map load-string)))
;; => ((TOP (FRAG (INTJ (UH Hello)) () (NP (NN world)) (. !))))

With a slight modification to tr in treebank.clj, you can get an s-expression that exactly matches the string parse and is easily turned into a zipper.

(def ^:private s-parser
  (insta/parser
   "E = <'('> T <WS> (T | (E <WS?>)+) <')'> <WS?> ; T = #'[^)\\s]+' ; WS = #'\\s+'"))

;; Only this function modified. Including above and below for reference.
(defn- tr
  "Transforms treebank string into series of s-like expressions."
  [ptree & [tag-fn]]
  (let [t (or tag-fn symbol)]
    (if (= :E (first ptree))
      (concat
       (list (t (second (second ptree))))
       (map #(tr % tag-fn) (drop 2 ptree)))
      (second ptree))))

(defn make-tree
  "Make a tree from the string output of a treebank-parser."
  [tree-text & [tag-fn]]
  (tr (s-parser tree-text) tag-fn))

One nice thing you can do with a tree like this is use the default seq zipper (clojure.zip/seq-zip) to iterate over and manipulate the parse tree.

(-> parsed-s-expression
    (zip/seq-zip)
    zip/down
    zip/down
    zip/rightmost
    (zip/append-child '(. "!"))
    zip/root)
;; => ((TOP (FRAG (INTJ (UH "Hello")) (, ",") (NP (NN "world")) (. "!") (. "!"))))
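
The example above covers manipulation; for the iteration side, here is a small sketch that walks the same kind of tree with `zip/next` and collects the leaf words. The tree literal is hand-written to mirror the string-leaf shape shown above (the comma node is left out, since a bare comma cannot be written in a quoted literal), and `leaf-words` is a hypothetical name.

```clojure
(require '[clojure.zip :as zip])

;; Hand-written tree with symbol tags and string leaves.
(def tree
  '(TOP (FRAG (INTJ (UH "Hello")) (NP (NN "world")) (. "!"))))

(defn leaf-words
  "Hypothetical helper: visits every location depth-first and keeps the
  string leaves, in sentence order."
  [t]
  (->> (zip/seq-zip t)
       (iterate zip/next)                  ; depth-first traversal of locations
       (take-while (complement zip/end?))  ; stop at the end marker
       (map zip/node)
       (filter string?)))
```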
